Episode 2: Importing Illumina Files into Geneious

Welcome to Basic Bioinformatics brought to you by The Sequencing Center.

In this video we are going to show how to import Illumina paired-end FASTQ files into Geneious Prime.  We have quite a few clients who use Geneious for bioinformatics data analysis, and they will often ask “What’s the correct way to import paired-end FASTQ files into Geneious?”  So I’ll show you how to do that in this video.

What I have here is the Windows operating system running on Amazon AWS WorkSpaces.  It’s a virtual OS environment, and as part of that we get WorkDocs, which is a shared file system between your local workstation and the cloud.

So I’m going to just open up WorkDocs here, and I’ve got a couple of FASTQ files in gzip-compressed format.  I renamed them from their standard Illumina naming convention to look like this: one says forward_read_R1 and the other one says reverse_read_R2.  R1 is typically the forward read in a paired-end read pair, and R2 is the reverse read.  So these two files go together, the forward and reverse read.  Okay, so we’re going to try to import both of these into Geneious.
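If you want to check the R1/R2 naming outside of Geneious, here is a minimal Python sketch.  It is not part of Geneious, and the helper name read_direction is hypothetical.  It assumes the file names carry an R1 or R2 token set off by an underscore, dot, or hyphen, and the .fastq.gz extension in the examples is also an assumption.

```python
import re

# Hypothetical helper, not a Geneious feature: guess the read direction
# from an R1/R2 token in the file name.
_READ_TOKEN = re.compile(r"(?:^|[_.\-])R([12])(?=[_.\-]|$)")


def read_direction(filename):
    """Return 'forward' for R1, 'reverse' for R2, or None if no token is found."""
    match = _READ_TOKEN.search(filename)
    if match is None:
        return None
    return "forward" if match.group(1) == "1" else "reverse"


if __name__ == "__main__":
    # File names below are assumed examples, including the extension.
    for name in ["forward_read_R1.fastq.gz", "reverse_read_R2.fastq.gz"]:
        print(name, "->", read_direction(name))
```

The same token appears in the standard Illumina names, so the sketch should also work on files that were not renamed, as long as the token is delimited the same way.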

The first thing I want to do is launch Geneious Prime.  We have the 2019 release; I’m pretty sure this is the current version, 2019.1.3.  It takes a second or two to actually load up Geneious Prime.

Okay, so once Geneious is loaded up it gives you the interface that you see here.  I’m not going to go over this.  We can talk about the Geneious interface in a different video.  I’ll just assume you’re somewhat familiar with it.

To import the paired-end files we go to File and down to Import, and then we choose From File.  If we do that it will load up the WorkDocs directory, and there are our forward and reverse read files again.  Now we have to select both of these.  On my Mac laptop I have to Shift + left-click on each file to do that.  How you select both of them may be totally different on your machine, but it’s important to have both of them selected because we’re going to import them together.

So now we have them both selected, and then we just choose Import, and it brings up this dialog box.  I’m going to reset this to the defaults because that may be what you see the first time you bring up this import dialog box.  It says FASTQ Sequences Import, and the first drop-down box at the top is the Read Technology.  In my case it’s Illumina by default.  If you actually open the drop-down box, there are a couple of other vendors out there: PacBio and Nanopore, which are long-read instruments, and then Sanger, 454, SOLiD, and so forth.  So we want to choose Illumina for that.

Then you have the option of either not pairing these two files or pairing them.  We definitely want to use the paired-end option.  We could load them up individually, but we don’t want to do that.  We want to have the power of paired-end reads, so we want to choose the paired-end, inward-pointing option there.  If you look at the drop-down box, there are a couple of other options, Mate-Pair and so forth, that don’t really apply to what we’re doing here, so we just go with the paired-end read.

In the next drop-down box there are a couple of options.  The one we want to choose is pairs of files, and it explains what that is.  It says “the first sequence in one file is paired with the first sequence in the other file”.  What they mean is, if you look in the actual FASTQ files, the first record in the forward read file will be paired with the first record in the reverse read file.  So the first records in the forward and reverse reads are going to be paired together, and that’s what we want.  I believe I have something like 1.6M reads in the forward read file and 1.6M reads in the reverse read file, and if you pair those together you wind up with, I guess, 3.2M reads.
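To see what “first record paired with first record” means in practice, here is a small Python sketch that walks two gzipped FASTQ files side by side and checks that the read IDs line up.  This is not how Geneious does it internally, just an independent sanity check.  It assumes standard four-line FASTQ records, and the function names fastq_ids and count_pairs are hypothetical.

```python
import gzip
from itertools import zip_longest


def fastq_ids(path):
    """Yield the read ID from each four-line record of a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:
                continue  # only the first line of each record is a header
            tokens = line[1:].split()  # drop the leading '@'
            if not tokens:
                continue  # tolerate a blank line at a record boundary
            read_id = tokens[0]
            # Older headers mark the mate with a trailing /1 or /2.
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id


def count_pairs(forward_path, reverse_path):
    """Count position-matched pairs; raise ValueError if the files disagree."""
    pairs = 0
    for fwd, rev in zip_longest(fastq_ids(forward_path), fastq_ids(reverse_path)):
        if fwd is None or rev is None:
            raise ValueError("files contain different numbers of records")
        if fwd != rev:
            raise ValueError(f"record {pairs + 1}: {fwd!r} does not match {rev!r}")
        pairs += 1
    return pairs
```

If both files hold about 1.6M records, count_pairs should return about 1.6M pairs, which is the roughly 3.2M individual reads mentioned above (two reads per pair).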

Then finally, the insert size is automatically chosen to be 500 here.  I won’t explain what insert size is in this video.  We’ll just leave it at 500; that’s roughly the correct number for these paired-end reads.  Then we click OK.  Then it brings up a progress bar and shows you how it’s importing these files into Geneious.

Okay, so the file import is done, and if you look along the top bar here it shows what it actually did.  First of all, we can enable or disable the paired-end read import file.  It gives it a name; usually it’s called Paired Reads.  There’s a brief description, “Paired reads created from the forward and reverse read files”.  It gives you the size of the file, the absolute path it used, and so forth, so some metadata there about the actual import.

In the second section here, if you click on the Sequence View tab, it’s pretty interesting.  It brings up all the reads that it imported, and if you notice here, the first two are going to be paired.  There are these sort of box-and-whisker plots, one forward facing, one reverse facing.  So the first one is actually the first record from the forward read file, and the second one is the first record in the reverse read file.  So these two were paired together, just like we expect them to be.

On the right-hand side, it just shows you some more information about the actual read itself.  I’m not going to go into any detail here, just to let you know that it looks like it did import these all correctly.  You can scroll down to the bottom and, just like I said, there are about 3.2M reads in there, paired-end.  Go back to the top, and that’s pretty much how you import the paired-end reads into Geneious.

Thanks for watching this episode of Basic Bioinformatics.
