Basic Bioinformatics: Sequencing Coverage

In a previous video, we aligned our sample reads against a reference genome, in this case Pseudomonas aeruginosa.  We’re using the Geneious tool to look at the alignment and variant calls.  What I wanted to do here was briefly discuss “coverage”: what it means and how to interpret it.

I want to click on our demo run to bring up the alignment viewer in Geneious – this is what it brings up.  What we want to do here is configure the viewer so that it’s set up to look at coverage.  The first thing we’ll do is click on Contig View to make sure we’re in that tab.  Over on the control panel on the right we have some options.  I want to go down to the Annotations and Tracks option, so I’ll click on that.  Then we want to choose CDS, the coding sequence, and make sure that it’s enabled.  And then we’re going to disable all the other tracks.  We just want the CDS track in this case.

Then there are some other options here for zooming in and out.  We can zoom all the way in to the read level, the nucleotide base level, or zoom all the way out.  We can also click along this coverage map to go to a specific location.  But what I really want to do here is zoom all the way out.  Then we want to make sure we’re at the top of the list of reads.

There’s another option on the right here where we can go to a specific locus or region of the genome.  In this case, I want to go to the first locus, which takes us all the way to the left.  Now let’s go back and scroll out here.  This first track shows the coverage across the genome: the reads mapped against the reference genome, from the first locus all the way over to about 6.5 million bases, which is roughly the size of this Pseudomonas genome.  The second track is the actual reference sequence for Pseudomonas aeruginosa, and below that are all the sample reads, both forward and reverse, arranged in order.

One thing we want to note here is the coverage.  If you look across the alignment of the reads against the reference genome, the coverage is quite uniform all the way across, and that’s really important.  You like to see uniform coverage across the entire genome.  In this case, I think the coverage was about 52x with a small standard deviation.  So it’s nice uniform coverage all the way across the genome, and that’s always a very good sign.
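To put a number on "uniform", here is a minimal Python sketch of how a mean depth and standard deviation could be computed from per-base depth values.  The function name and the depth values are made up for illustration; this is not how Geneious computes its statistics, just the same idea under simple assumptions.

```python
from statistics import mean, pstdev


def coverage_summary(depths):
    """Return (mean depth, standard deviation, coefficient of variation).

    `depths` is a list with one read-depth value per reference position.
    """
    avg = mean(depths)
    sd = pstdev(depths)  # population standard deviation over all positions
    cv = sd / avg if avg else float("inf")  # smaller CV means more uniform
    return avg, sd, cv


# Hypothetical depths for a short stretch of the genome (not real data).
example_depths = [50, 52, 54, 51, 53, 52, 50, 54]
avg, sd, cv = coverage_summary(example_depths)
print(f"mean={avg:.1f}x  sd={sd:.2f}  cv={cv:.3f}")
```

A small standard deviation relative to the mean is what "uniform coverage" looks like numerically.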

If the coverage is too low, we would probably see a number of artificial gaps in here.  We don’t want to see those.  A gap means there are no reads mapping to that part of the reference genome.  In this case, we don’t really see that.  If there were gaps here, it would mean that the sample reads really do not map to the reference there, that the region is simply missing from the sample, or, quite often, that bacteriophage sequences are inserted into the genome.  In this case it looks like pretty nice uniform coverage.
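If you had the per-base depth values exported, spotting gaps like that could be sketched roughly as follows.  The function name, the `min_depth` threshold, and the example depths are assumptions for illustration only.

```python
def find_gaps(depths, min_depth=1):
    """Return (start, end) pairs, 1-based and inclusive, where depth < min_depth."""
    gaps = []
    start = None  # start of the gap we are currently inside, if any
    for pos, depth in enumerate(depths, start=1):
        if depth < min_depth:
            if start is None:
                start = pos
        elif start is not None:
            gaps.append((start, pos - 1))
            start = None
    if start is not None:  # a gap that runs to the end of the reference
        gaps.append((start, len(depths)))
    return gaps


# Hypothetical depths with two uncovered stretches (not real data).
example_depths = [50, 48, 0, 0, 0, 51, 49, 0, 52]
print(find_gaps(example_depths))
```

Raising `min_depth` above 1 would also flag regions that are covered but too thinly to trust.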

What we want to do then is drill down to the actual nucleotide level.  We can zoom in, go to the first position, the first locus, and go all the way to the top here.  What you’ll see is a set of reads.  This is the actual first read in the alignment, and you can see that its first base is thymine, and it matches the thymine in the reference genome.  The second is thymine and it matches as well, the third is thymine, and the fourth is adenine and it also matches the reference.  Then adenine, adenine, guanine and so on.  You can see it’s matching perfectly all the way across the read.  This was a 2×150 base pair run, so if we go to the right, we should see that the first read is about 150 base pairs in length, and that’s indeed the case.  You can see that all the way across this particular sample read, it’s matching the reference one-for-one at every nucleotide base.
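That base-by-base check can be sketched in a few lines of Python.  This assumes a simple ungapped alignment (no insertions or deletions), and it uses only the first seven bases read out above; the function name and the `offset` parameter are hypothetical.

```python
def mismatches(read, reference, offset=0):
    """List (reference position, reference base, read base) where they differ.

    Positions are 1-based.  `offset` is the 0-based index in the reference
    where the read starts.  Assumes the read aligns without gaps.
    """
    found = []
    for i, read_base in enumerate(read):
        ref_base = reference[offset + i]
        if read_base != ref_base:
            found.append((offset + i + 1, ref_base, read_base))
    return found


# First seven bases as read out in the walkthrough: T, T, T, A, A, A, G.
reference_start = "TTTAAAG"
read_start = "TTTAAAG"
print(mismatches(read_start, reference_start))  # no mismatches, so an empty list
```

An empty list corresponds to the read matching one-for-one; any entries would be candidate variants or sequencing errors.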

If we go back to the start and look here, I think I can show what it’s reporting the coverage to be.  If you look down at the bottom in this area, then hover the pointer up here, it shows the coverage is about 78 at that particular locus.  Well, right now we only see a single read, so it looks like it’s 1x there.  However, if we scroll down, I think all the way to the bottom here, there’s another set of reads that are also mapped along the first part of the reference.  If we start counting these up, I’ll bet there are about 78 of them.  I’m not going to count them here, we’ll just take their word for it.  It looks like if you added up all the reads at that first locus it’s probably 78.  So that means 78x coverage at the first locus.

If we go over here, you see the coverage at the bottom changes to 68.  The coverage goes down just a little bit there, and I bet if we scroll down here we’ll probably find some other mapped reads somewhere.  If we add up all the read bases at that locus, it’s going to be about 68.  So that’s 68x coverage.  You can go all the way across the genome doing this.  Here it says the coverage is 67, and over here it’s 67.  That’s the definition of coverage: it’s the number of read bases at a particular locus that map to the reference genome, in other words the number of reads stacked up over that position.  And so, if you need to talk about coverage, you can use the Geneious alignment viewer to check on that.  In the next video, I think we’ll talk about how to look at the variants.
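The counting we skipped doing by hand can be written as a short Python sketch.  It assumes each mapped read is reduced to a (start, end) span on the reference, 1-based and inclusive, with every span lying inside the reference; the function names and the toy reads are made up for illustration.

```python
def depth_at(position, reads):
    """Count reads whose aligned span covers the given 1-based position."""
    return sum(1 for start, end in reads if start <= position <= end)


def depth_profile(reads, length):
    """Per-base depth for positions 1..length, using a difference array.

    Assumes every read satisfies 1 <= start <= end <= length.
    """
    diff = [0] * (length + 2)
    for start, end in reads:
        diff[start] += 1      # the read begins covering here
        diff[end + 1] -= 1    # and stops covering just after its end
    depths, running = [], 0
    for pos in range(1, length + 1):
        running += diff[pos]
        depths.append(running)
    return depths


# Four toy reads on a 10-base reference (not real data).
reads = [(1, 5), (1, 4), (3, 8), (6, 10)]
print(depth_at(1, reads))
print(depth_profile(reads, 10))
```

`depth_at` is the "count the reads stacked over this locus" step, and `depth_profile` gives the same number for every position in one pass, which is the kind of value the coverage readout shows as you move along the genome.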
