Episode 1: Overview of a Microbial DNA Variants Pipeline

This video is a high-level review of a bioinformatics pipeline for microbial DNA variants. We won’t go into the rationale for each step or explain each step in detail, since this is meant only as a review of the process. We will assume an Illumina sequencing run with 2 x 150 base paired-end reads, and we will use Geneious and Geneious Prime for the analysis.

We start with the raw FASTQ files from the Illumina sequencer. We generally use Illumina BaseSpace Sequence Hub to manage our sequencing runs. One of the things you can do with BaseSpace is configure it to automatically strip adapter sequences from the reads in the FASTQ files. So for this overview we will assume that adapters are being removed through Illumina BaseSpace.
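To make the idea of adapter removal concrete, here is a minimal sketch in Python. It is not how BaseSpace does it internally; it only illustrates the concept. The function name trim_adapter, the min_overlap parameter, and the exact-match strategy are our own assumptions for illustration, and the adapter string is the commonly cited start of the Illumina TruSeq adapter.

```python
# Illustrative adapter prefix (commonly cited TruSeq adapter start); an assumption for this sketch.
ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(seq, qual, adapter=ADAPTER, min_overlap=5):
    """Remove a 3' adapter from a read and its quality string.

    Looks for a full exact adapter match anywhere in the read, then for a
    partial adapter prefix hanging off the 3' end. Real trimmers also
    tolerate mismatches; this sketch does not.
    """
    idx = seq.find(adapter)
    if idx == -1:
        # Try progressively shorter adapter prefixes at the very end of the read.
        longest = min(len(adapter) - 1, len(seq))
        for k in range(longest, min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                idx = len(seq) - k
                break
    if idx == -1:
        return seq, qual  # no adapter found, read is unchanged
    return seq[:idx], qual[:idx]
```

A read that runs past the end of a short insert would come back with only the insert bases, and a read with no adapter is returned unchanged.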

We also run a FastQC quality control check on all the runs. We use FastQC to look at the read distribution, read quality, and other metrics to make sure the run went well. Then we import the FASTQ files into Geneious, which we discuss in our video Importing Illumina Files into Geneious.
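As a rough picture of what "read quality" means here, the sketch below parses FASTQ records and computes a mean Phred score per read. It is not FastQC and covers only one of the metrics FastQC reports. The helper names read_fastq and mean_phred are hypothetical, and we assume well-formed four-line records with Phred+33 quality encoding, which is what current Illumina instruments produce.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) from an iterable of FASTQ lines.

    Assumes well-formed records of exactly four lines each.
    """
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        seq = next(it).strip()
        next(it)  # the '+' separator line
        qual = next(it).strip()
        yield header[1:], seq, qual  # drop the leading '@'


def mean_phred(qual, offset=33):
    """Mean Phred quality of one read, assuming Phred+33 encoding."""
    return sum(ord(c) - offset for c in qual) / len(qual)
```

In practice you would pass an open file handle to read_fastq and flag reads or whole runs whose mean quality falls below whatever cutoff your lab uses.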

Next we run a tool called BBMerge, which merges paired-end reads, basically combining each overlapping pair of forward and reverse reads into a single read. Then we filter and trim low-quality reads using BBDuk, which helps with the downstream analysis.
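The two ideas in this step, merging an overlapping read pair and trimming low-quality bases, can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm BBMerge or BBDuk actually uses: the function names, the exact-match overlap rule, and the min_overlap and min_q defaults are all assumptions we made for the example.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(r1, r2, min_overlap=10):
    """Merge a forward read with its reverse read on their longest exact overlap.

    Returns the merged sequence, or None if no overlap of at least
    min_overlap bases is found. Real mergers tolerate mismatches and use
    base qualities; this sketch requires an exact match.
    """
    r2_rc = reverse_complement(r2)
    for k in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        if r1[-k:] == r2_rc[:k]:
            return r1 + r2_rc[k:]
    return None


def quality_trim_3prime(seq, qual, min_q=20, offset=33):
    """Trim bases off the 3' end until one has quality >= min_q (Phred+33)."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]
```

With 2 x 150 reads, a pair sequenced from a fragment shorter than 300 bases overlaps in the middle, which is why merging can recover one longer read from the pair.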

Next we align the sample reads against a reference genome, in this case a bacterial genome. With the alignment in place, we can call variants, looking for mutations and short indels. We use the ‘Find Variations/SNPs’ function in Geneious to do that.
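To show what a variant call boils down to once reads are aligned, here is a minimal pileup-style sketch in Python. It is not what Geneious does internally. The function name call_snps, the (start, sequence) read format, and the min_coverage and min_fraction thresholds are assumptions for illustration, and the sketch handles only gap-free alignments, so it reports single-base changes but not indels.

```python
from collections import Counter, defaultdict


def call_snps(reference, aligned_reads, min_coverage=3, min_fraction=0.8):
    """Call single-base variants from gap-free aligned reads.

    aligned_reads is a list of (start, sequence) tuples, where start is the
    0-based reference position of the first base of the read.
    Returns a list of (position, ref_base, alt_base, depth, alt_fraction).
    """
    pileup = defaultdict(Counter)
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pileup[start + offset][base] += 1

    variants = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        base, n = counts.most_common(1)[0]
        # Report a variant only when coverage is sufficient and the most
        # common base differs from the reference in most of the reads.
        if depth >= min_coverage and base != reference[pos] and n / depth >= min_fraction:
            variants.append((pos, reference[pos], base, depth, n / depth))
    return variants
```

The coverage and frequency cutoffs are the knobs that decide whether a mismatch is treated as a real variant or as sequencing noise, so they are worth choosing deliberately for your organism and sequencing depth.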