In this video, we’re going to do a very high-level review of a bioinformatics pipeline for calling microbial DNA variants. In the interest of time, we’re just going to go straight through the protocol. We won’t go into very much detail about the rationale for each step or explain each step in detail. We’ll assume that this is an Illumina sequencing run with 2 x 150 base paired-end reads. We’re going to use Geneious and Geneious Prime for the analysis.
We start with the raw FASTQ files from a sequencer, in this case an Illumina sequencer. We generally use Illumina BaseSpace Sequence Hub to manage our sequencing runs, and one of the things you can do with BaseSpace is configure it to automatically strip adapter sequences out of the reads in the FASTQ files. So we’ll assume that adapters have already been removed through Illumina BaseSpace.
We also do a FastQC quality control check on all the runs. We use FastQC to look at the read distribution, read quality, and some other metrics, just to make sure the run went well. We then import the FASTQ files into Geneious, and I have another video on how to do that. There are some issues there that you have to be aware of.
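To give a feel for what a read quality check looks at, here is a minimal Python sketch that computes the mean Phred score of each read in a FASTQ file. It is not how FastQC itself works. It assumes the standard four-line FASTQ record layout and Phred+33 quality encoding, and the function names `parse_fastq` and `mean_phred` are made up for this illustration.

```python
import io

def parse_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of file (or a blank line) ends the parse
        seq = handle.readline().strip()
        handle.readline()  # the '+' separator line, ignored here
        qual = handle.readline().strip()
        yield header[1:], seq, qual  # drop the leading '@' from the read name

def mean_phred(qual, offset=33):
    """Mean Phred score of one read, assuming Phred+33 encoded qualities."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)

# Tiny in-memory example: 'I' encodes Q40 and '!' encodes Q0 under Phred+33.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!II\n")
for name, seq, qual in parse_fastq(example):
    print(name, mean_phred(qual))  # r1 40.0, then r2 20.0
```

A run where most reads have a low mean score would be a reason to look more closely before going further.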
We will run a tool called BBMerge, which merges paired-end reads, basically combining each overlapping pair of forward and reverse reads into a single read. Then we will filter and trim low-quality reads using BBDuk, which will help with the downstream analysis.
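The ideas behind merging and trimming can be sketched in a few lines of Python. This is only an illustration, not the algorithm BBMerge or BBDuk actually uses: it merges a pair only when the overlap matches exactly, and it trims low-quality bases from the right end only. The function names and the `min_overlap` and `min_q` thresholds are assumptions chosen for the example, and qualities are assumed to be Phred+33 encoded.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))

def merge_pair(forward, reverse, min_overlap=10):
    """Merge a read pair if the end of the forward read exactly matches
    the start of the reverse-complemented reverse read.
    Returns the merged sequence, or None if no overlap is found."""
    rc = reverse_complement(reverse)
    # Try the longest possible overlap first, then shorter ones.
    for olen in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-olen:] == rc[:olen]:
            return forward + rc[olen:]
    return None

def trim_right(seq, qual, min_q=20, offset=33):
    """Remove bases from the right end while their quality is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]

# A 30-base fragment sequenced as two 20-base reads that overlap by 10 bases.
fragment = "ACGTTGCAAGGCTTAACCGGTACGATCGAT"
fwd = fragment[:20]
rev = reverse_complement(fragment[10:])
print(merge_pair(fwd, rev) == fragment)  # True

# '#' encodes Q2, so the last two bases are trimmed.
print(trim_right("ACGTAC", "IIII##"))  # ('ACGT', 'IIII')
```

Real tools tolerate mismatches in the overlap and use the quality scores when deciding whether to merge, which this sketch leaves out.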
We will do a sequence alignment of the sample reads against a reference genome, in this case a bacterial genome.
Then, finally, the ultimate goal of the workflow, of course, is to do the variant calls, looking for mutations and short indels. We use the Find Variations/SNPs function in Geneious to do that.
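As a rough picture of what variant calling does with the aligned reads, here is a minimal Python sketch. It is not the Geneious implementation. It assumes the reads are already aligned to the reference without gaps, so it can only find single-base substitutions and not indels, and the `min_coverage` and `min_fraction` thresholds are illustrative values, not Geneious defaults.

```python
from collections import Counter

def call_snps(reference, aligned_reads, min_coverage=3, min_fraction=0.8):
    """Report positions where most reads disagree with the reference.

    aligned_reads is a list of (start, sequence) tuples, with a 0-based
    start position on the reference and no gaps in the alignment.
    Returns (1-based position, ref base, alt base, depth, alt fraction).
    """
    # Count the bases observed at every reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for i, base in enumerate(seq):
            pos = start + i
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_coverage:
            continue  # too few reads to make a call here
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            variants.append((pos + 1, reference[pos], base, depth, n / depth))
    return variants

reference = "ACGTACGTAC"
reads = [(0, "ACGTTCGT"), (2, "GTTCGTAC"), (1, "CGTTCGTA")]
print(call_snps(reference, reads))  # [(5, 'A', 'T', 3, 1.0)]
```

All three reads carry a T where the reference has an A at position 5, so that position is reported with a depth of 3 and a variant fraction of 1.0.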
Thanks for watching this episode of Basic Bioinformatics.