In a previous video we talked about aligning bacterial sample reads against a reference genome, in this case Pseudomonas aeruginosa. What I want to do in this brief demo is show you how to save the variant calls to an Excel spreadsheet and how to interpret them a little bit.
So I already have the alignment viewer set up in Geneious, and it’s showing a reference genome with the reads aligned to it. We’re also positioned at the first locus in the genome. Down here we have the list of all the variants that were called during this particular alignment. Over on the right, we have CDS, or coding sequence, checked. We also have the variant track checked so we can look at all the variants that are in this alignment.
So we can do a couple of things here. We can go down and click on this small right arrow, and that’ll take us to the first polymorphism, a SNP in this case. What it’s showing here is that we have cytosine in the reference genome and guanine in the sample read. If we hover over these we can get metadata about the SNP itself. This is also coordinated with the table down below, which shows the polymorphism in more detail. It’s showing that we have a SNP here, a transversion from cytosine to guanine, so this is going from a pyrimidine to a purine. This one is also intergenic, because there’s no gene annotation showing above it.
If we go to the next variant, we see this one’s actually occurring in a gene, and it gives the gene name. In this case it is some kind of hemolysin gene, and it’s showing that this is a transition type of SNP, in this case from pyrimidine to pyrimidine: thymine to cytosine. We can scroll through these if we want to, through every single SNP, polymorphism, or variant that’s out there.
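If you ever want to check the transition versus transversion call yourself, here is a minimal Python sketch of the rule. It is not how Geneious does it internally; the function name `classify_snp` is made up for illustration, and it only handles single-base changes.

```python
# Purines and pyrimidines, the two chemical classes of DNA bases.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change.

    classify_snp is a hypothetical helper, not part of Geneious.
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError(f"expected single bases A, C, G or T, got {ref!r} and {alt!r}")
    if ref == alt:
        raise ValueError("reference and sample base are identical, so there is no SNP")
    # Same class on both sides is a transition; crossing classes is a transversion.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_snp("C", "G"))  # first SNP in the demo: cytosine to guanine
print(classify_snp("T", "C"))  # second SNP in the demo: thymine to cytosine
```

The two calls mirror the two SNPs we just looked at: the first crosses from a pyrimidine to a purine, the second stays within the pyrimidines.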
One thing we can do, however, is customize this table any way we want. So, for example, if we click on the columns button here, it shows all the columns that are available to you. If you choose “Manage Columns” it shows the available fields that are in there. We have selected just a subset of those fields, because some of them are more important than others. I’ll explain what these are when we get to the step where we’re looking at the actual Excel spreadsheet. I’ll just show you that we pull these from the available list over to the selected list, click OK, and then these fields show up here.
One thing you can do, which is pretty nice, is export this table. Just click “Export Table” and it’ll save it as a .csv file. We’ll go ahead and do that, and that saved it as a .csv file in one of our directories. What I’ll show you next is how to import that into an Excel spreadsheet and look at it.
Here is the spreadsheet that I was talking about. What we’ve done is import the .csv file into Excel, reformat it a little bit to make it look better, and then save it out as an Excel spreadsheet. What we have here is the variant call list. So let me just go through the different fields that are here.
The first one is the name field. Unfortunately, the default naming in Geneious is not very useful, so what we normally do is number these and then order them like we see here.
The next two fields are “min” and “max”, which give the position of each variant in the genome. We’ve sorted these on the minimum field, so they increase from the smallest minimum. Then for each one of these records it shows the length of the polymorphism. It can be just a single base change or it can be a multi-base change, so insertions, deletions, small indels and that sort of thing.
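If you would rather do the numbering and sorting in a script than by hand in Excel, here is a minimal sketch using only Python’s standard library. The header names (`Name`, `Minimum`, `Maximum`, `Length`, `Polymorphism Type`) are assumptions about what the exported file contains, so check them against your own .csv. The rows are made up for illustration, and `load_variants` is a hypothetical helper.

```python
import csv
import io

# Made-up rows standing in for the exported table; header names are assumed.
SAMPLE_CSV = """\
Name,Minimum,Maximum,Length,Polymorphism Type
T,2087,2087,1,SNP (transition)
C,483,483,1,SNP (transversion)
-AG,9150,9151,2,Deletion
"""


def load_variants(handle):
    """Read variant rows, sort them by start position, and number them."""
    rows = list(csv.DictReader(handle))
    # CSV values arrive as text, so convert before comparing positions.
    rows.sort(key=lambda row: int(row["Minimum"]))
    # Replace the default names with a simple running number.
    for number, row in enumerate(rows, start=1):
        row["Name"] = str(number)
    return rows


variants = load_variants(io.StringIO(SAMPLE_CSV))
for row in variants:
    print(row["Name"], row["Minimum"], row["Maximum"], row["Length"], row["Polymorphism Type"])
```

For a real export you would pass an open file instead of the sample string, for example `open("variants.csv", newline="")`, where `variants.csv` is a placeholder file name.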
The next field is the type of polymorphism that we see here. There are seven different kinds. There are transversions and transitions. A transversion is going to be from purine to pyrimidine or vice versa. A transition is from purine to purine or pyrimidine to pyrimidine. I think we also have insertions, deletions, substitutions, and I think stop codons show up here once in a while. All the different polymorphism types will be listed in this column. We then have amino acid changes. If there is an amino acid change it’ll show up in this list. In the first one here there was no amino acid change, so that’s just left blank. However, if we go down to this one, we have the one-letter code for amino acids, which is used throughout this list. So K -> E, I think that’s lysine to glutamic acid. You can look up the one-letter codes online. So if there is an amino acid change, it’ll show up here, and those could be pretty important.
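Rather than looking up the one-letter codes each time, you could expand them with a small lookup table. This is a sketch that assumes the change is written the way it appears here, as `K -> E`; the function name `describe_change` is made up for illustration.

```python
# Standard one-letter amino acid codes, plus "*" for a stop codon.
AMINO_ACIDS = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartic acid",
    "C": "cysteine", "Q": "glutamine", "E": "glutamic acid", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
    "*": "stop",
}


def describe_change(change: str) -> str:
    """Expand a change such as 'K -> E' into full amino acid names.

    describe_change is a hypothetical helper; the 'X -> Y' format is assumed.
    """
    before, after = (part.strip() for part in change.split("->"))

    def expand(residues: str) -> str:
        # A change can span several residues, so expand each letter.
        return "-".join(AMINO_ACIDS[letter] for letter in residues)

    return f"{expand(before)} to {expand(after)}"


print(describe_change("K -> E"))
```

A blank cell means no amino acid change, so you would skip those rows before calling the function.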
The next field is the coding sequence, and if it’s blank that usually means it’s an intergenic polymorphism, so there’s no name associated with it. Once in a while the GenBank file that this is based on may not have an actual name for the gene, so keep that in mind as another reason it might show up blank. But if it’s in a genic region, it will show the name of the gene as it appears in the GenBank file.
Then if there is a codon change, it will show up in this column, the codon change column. Sometimes there is a codon change and sometimes there isn’t.
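To see how a codon change relates to the amino acid change, here is a minimal sketch that translates the two codons with the standard genetic code and compares them. The function name `codon_effect` and the labels it returns (`synonymous`, `substitution`, `stop gained`) are made up for illustration and are not the wording Geneious uses. The example codons are also just illustrative; AAG to GAG is one codon change that gives K -> E.

```python
from itertools import product

# Standard genetic code in the usual compact form: codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... with the third base changing fastest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}


def codon_effect(ref_codon: str, alt_codon: str):
    """Translate both codons and describe the change (hypothetical helper)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        kind = "synonymous"      # same amino acid, protein unchanged
    elif alt_aa == "*":
        kind = "stop gained"     # new stop codon cuts the protein short
    else:
        kind = "substitution"    # one amino acid swapped for another
    return ref_aa, alt_aa, kind


print(codon_effect("AAG", "GAG"))  # lysine codon to glutamic acid codon
print(codon_effect("CTT", "CTC"))  # both codons encode leucine
```

This only covers in-frame single-codon changes. Insertions and deletions that shift the reading frame need the surrounding sequence, which is why it is easier to rely on the table for those.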
The final column, and probably the most important, is called “protein effect”. If the codon change causes some kind of structural or functional change in the protein, it’ll show up here. So, for example, there might be an amino acid substitution in a protein, there might be deletions, there might be frame shifts, or something else that occurs in the protein or enzyme. Presumably, if those are deleterious changes, they might change the structure and then the function of the protein or enzyme they affect, and that is usually a fairly serious problem.
You’ll probably want to spend most of your time looking at the protein effects, if there are any, and look at those polymorphisms first. Of course, you can filter and rank order these any way you want. That is basically how we generate the variant call list in Excel from our Geneious alignments.
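As one way to do that filtering and ranking outside of Excel, here is a minimal sketch that keeps only the rows with a protein effect and puts the presumably more disruptive ones first. The column names, the effect labels (`Frame Shift`, `Truncation`, `Substitution`), the severity order, and the gene names are all assumptions made up for illustration, so adjust them to whatever your own export contains.

```python
# Hypothetical severity ranking, lower number first; labels are assumed.
SEVERITY = {"Frame Shift": 0, "Truncation": 1, "Substitution": 2}

# Made-up rows in the shape csv.DictReader would return.
rows = [
    {"Name": "1", "Minimum": "483", "CDS": "", "Protein Effect": ""},
    {"Name": "2", "Minimum": "2087", "CDS": "geneA", "Protein Effect": "Substitution"},
    {"Name": "3", "Minimum": "9150", "CDS": "geneB", "Protein Effect": "Frame Shift"},
    {"Name": "4", "Minimum": "12004", "CDS": "geneC", "Protein Effect": "Substitution"},
]


def prioritize(variants):
    """Keep variants with a protein effect, most severe first."""
    affected = [v for v in variants if v.get("Protein Effect", "").strip()]
    # Unknown effect labels sort after the known ones; ties go by position.
    return sorted(
        affected,
        key=lambda v: (
            SEVERITY.get(v["Protein Effect"], len(SEVERITY)),
            int(v["Minimum"]),
        ),
    )


for variant in prioritize(rows):
    print(variant["Name"], variant["CDS"], variant["Protein Effect"])
```

The intergenic row with a blank protein effect drops out, and the frame shift moves to the top of the list.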