Episode 2: Importing Illumina Files into Geneious

In this video we show how to import Illumina paired-end FASTQ files into Geneious Prime. Quite a few of our clients use Geneious for bioinformatics data analysis, and they often ask, “What’s the correct way to import paired-end FASTQ files into Geneious?”

We currently run a Windows instance on Amazon WorkSpaces, which is a cloud-hosted virtual desktop environment. As part of that service we get Amazon WorkDocs, a file system shared between your local workstation and the cloud.

After opening WorkDocs, we can see I have a couple of FASTQ files in gzip-compressed format. I renamed them from the standard Illumina naming convention to look like this: one is forward_read_R1 and the other is reverse_read_R2. R1 is typically the forward read of a paired-end read pair, and R2 is the reverse read. These two files, the forward and reverse reads, go together, and we are ready to import both of them into Geneious.
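With the standard Illumina naming (for example SampleName_S1_L001_R1_001.fastq.gz), the two files of a pair differ only in the _R1/_R2 tag, so pairs can be matched by name before importing. Here is a minimal Python sketch, assuming that convention; pair_illumina_files is a hypothetical helper, not part of Geneious. Note that it would not pair renamed files like forward_read_R1 and reverse_read_R2, because their prefixes differ.

```python
import re

# Matches the read tag "_R1" or "_R2" when it is followed by "_" or ".",
# e.g. SampleName_S1_L001_R1_001.fastq.gz (assumed naming convention)
READ_TAG = re.compile(r"_R([12])(?=[_.])")


def pair_illumina_files(names):
    """Return a sorted list of (R1, R2) file-name pairs.

    Files are paired when their names are identical apart from the read
    tag. Names without a tag, or without a partner file, are left out.
    """
    groups = {}
    for name in names:
        tags = list(READ_TAG.finditer(name))
        if not tags:
            continue  # not an R1/R2 file
        tag = tags[-1]  # use the last tag in case the sample name contains one
        key = name[:tag.start()] + name[tag.end():]
        groups.setdefault(key, {})[tag.group(1)] = name
    return sorted(
        (reads["1"], reads["2"])
        for reads in groups.values()
        if "1" in reads and "2" in reads
    )


if __name__ == "__main__":
    files = [
        "SampleA_S1_L001_R2_001.fastq.gz",
        "SampleA_S1_L001_R1_001.fastq.gz",
        "SampleB_S2_L001_R1_001.fastq.gz",  # no R2 partner, so it is skipped
    ]
    # prints [('SampleA_S1_L001_R1_001.fastq.gz', 'SampleA_S1_L001_R2_001.fastq.gz')]
    print(pair_illumina_files(files))
```

Using the last tag in the name is a small safeguard for sample names that happen to contain “_R1_” themselves.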

The first thing I want to do is launch Geneious Prime. In this video we are using Geneious Prime 2019; at the time of filming, the latest version is 2019.1.3. Be patient, as Geneious Prime may take a moment to load. Once it has loaded, you see the interface shown here.

To import the paired-end files, we go to File > Import > From File. This opens the WorkDocs directory, and we can see our forward and reverse read files again. Now we need to select both of them. On my Mac laptop I do that with Shift + left-click on each file. The way you select multiple files may differ on your machine, but it’s important to select both, because we will import them together.

With both files selected, we click Import, which brings up this dialog box. I will reset it to the defaults, since that is probably what you will see the first time you open this import dialog. It is titled FASTQ Sequences Import, and the first drop-down box is Read Technology. In my case it defaults to Illumina. There are options for other vendors, such as PacBio and Nanopore, which are long-read instruments, as well as Sanger, 454, SOLiD, and so forth. For this demonstration we just choose Illumina.

Next, you have the option to pair your files. We definitely want to use the paired-end option. The drop-down box also lists a couple of other options, such as Mate-Pair, that don’t apply to what we’re doing here, so we go with paired-end reads.

The next drop-down box has a couple of options. The one we want is Pairs of Files, described as “the first sequence in one file is paired with the first sequence in the other file”. In other words, if you look in the actual FASTQ files, the first record in the forward read file is paired with the first record in the reverse read file, the second with the second, and so on, which is what we want. I believe I have about 1.6M reads in the forward read file and 1.6M reads in the reverse read file, so pairing them gives 1.6M pairs, or 3.2M reads in total.
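Because Pairs of Files matches records purely by position, it can be worth checking that two files actually line up before importing them. Below is a minimal Python sketch of such a check, assuming plain four-line FASTQ records and that both mates carry the same read ID (the header text before the first space, with any trailing /1 or /2 removed). The function names are hypothetical, and this is not how Geneious itself validates files.

```python
import gzip
from itertools import zip_longest


def read_fastq(path):
    """Yield (header, sequence, quality) from a gzipped 4-line-per-record FASTQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                return  # end of file
            seq = handle.readline().rstrip("\n")
            handle.readline()  # "+" separator line
            qual = handle.readline().rstrip("\n")
            yield header, seq, qual


def read_id(header):
    """Drop the leading '@', anything after the first space, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def check_pairs(r1_path, r2_path):
    """Pair record N of R1 with record N of R2 and return the total read count."""
    pairs = 0
    for fwd, rev in zip_longest(read_fastq(r1_path), read_fastq(r2_path)):
        if fwd is None or rev is None:
            raise ValueError(f"files have different record counts after {pairs} pairs")
        if read_id(fwd[0]) != read_id(rev[0]):
            raise ValueError(f"pair {pairs + 1} IDs differ: {fwd[0]} vs {rev[0]}")
        pairs += 1
    return 2 * pairs  # each pair contributes one forward and one reverse read
```

For example, check_pairs("forward_read_R1.fastq.gz", "reverse_read_R2.fastq.gz") would return about 3.2M for the files in this video if every one of the 1.6M records lines up. The files are streamed record by record, so memory use stays small even for large runs.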

Finally, the insert size defaults to 500 bp here. We will leave it at 500, which is roughly correct for these paired-end reads. Then we click OK, and a progress bar shows the files being imported into Geneious.

The import is done, and the panel along the top shows what it actually did. First, we can enable or disable the imported paired-read document. It is given a name, usually Paired Reads, and a brief description: “Paired reads created from the forward and reverse read files”. It also shows the file size, the absolute path used, and so forth, so some metadata is imported as well.

In the second section, clicking the Sequence View tab brings up all the imported reads. Notice that the first two are paired: there are these box-and-whisker-style glyphs, one facing forward and one facing in reverse. The first is the first record from the forward read file, and the second is the first record from the reverse read file. So these two were paired together, just as we expect.
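The order seen in Sequence View, with each forward read followed immediately by its reverse mate, can be pictured as an interleaved list. This small Python sketch only illustrates that layout and is not Geneious’s internal data structure; interleave and mate_index are hypothetical names. In such a list, the mate of the read at 0-based position i sits at position i XOR 1.

```python
from itertools import chain


def interleave(forward, reverse):
    """Return [f1, r1, f2, r2, ...] so each pair sits side by side."""
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse lists must be the same length")
    return list(chain.from_iterable(zip(forward, reverse)))


def mate_index(i):
    """0-based position of the mate of the read at position i (0<->1, 2<->3, ...)."""
    return i ^ 1


if __name__ == "__main__":
    reads = interleave(["fwd1", "fwd2"], ["rev1", "rev2"])
    print(reads)                 # prints ['fwd1', 'rev1', 'fwd2', 'rev2']
    print(reads[mate_index(2)])  # prints rev2
```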

The right-hand side shows more information about the reads themselves. If you scroll down to the bottom, you can see there are about 3.2M paired-end reads in there. That’s how you import paired-end reads into Geneious.
