We must cleanse our reads of adapters lest they wreak havoc on our sequencing results.
During the library preparation process, Illumina adapter sequences are attached to the DNA fragments that will be sequenced. The adapter sequences are required for attaching the fragments to flow cells and for attaching indexes to reads. When sequencing is complete, it’s important to remove, or trim off, any adapter sequences from the reads. This step improves downstream data processing, such as sequence alignment and de novo assembly.
The exact DNA sequence of the adapters depends on the library preparation kit that is used for sequencing. Illumina Nextera XT is one of our most used kits. The adapter sequence for this kit is:
The adapter sequences for other kits may be different, so be sure to check which kit was used for library prep and get the appropriate sequence from the adapter sequences manual.
We deliver sequencing results in FASTQ and FASTA file formats. FASTQ files are automatically generated by Illumina sequencers. They represent the raw data files from which all downstream processing begins. Our bioinformatics data analysis pipelines automatically convert FASTQ files to FASTA files, so you will receive both file formats at the end of each sequencing run. Adapter trimming software applications may require FASTQ or FASTA formatted files. If you wish to use any of these applications, both FASTQ and FASTA files are available for that purpose.
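To illustrate how the two formats relate, here is a minimal sketch of a FASTQ-to-FASTA conversion. It assumes every FASTQ record occupies exactly four lines (header, sequence, separator, qualities). The function name `fastq_to_fasta` is hypothetical, and this is not necessarily how our pipeline performs the conversion.

```shell
# Sketch: convert four-line FASTQ records on stdin to FASTA on stdout.
# Assumes each record is exactly four lines: @header, sequence, +, qualities.
fastq_to_fasta() {
    awk '
        NR % 4 == 1 { print ">" substr($0, 2) }   # swap the leading "@" for ">"
        NR % 4 == 2 { print }                     # keep the sequence line as-is
    '
}
```

For a gzipped file, the function could be used as `gzip -dc read1.fastq.gz | fastq_to_fasta > read1.fasta`. Note that the quality scores are discarded, which is why tools that use base qualities need the FASTQ file.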
Trimmomatic is a popular tool for trimming adapter sequences from Illumina reads. The Trimmomatic manual describes how to install and run the application, and documents all of the required and optional command-line parameters. If you decide to use Trimmomatic for trimming adapter sequences from Illumina reads, a minimal command that only performs adapter trimming may look like this:
java -jar trimmomatic-0.39.jar PE -threads 4 read1.fastq.gz read2.fastq.gz read1_paired.fastq.gz read1_unpaired.fastq.gz read2_paired.fastq.gz read2_unpaired.fastq.gz ILLUMINACLIP:adapters.fasta:2:30:10
- Most sequencing runs use paired-end reads, so we specify “PE” in the command line.
- To speed up the application, we specify the number of threads to use with “-threads”, up to the maximum number of available processor threads.
- There are always two FASTQ files in a paired-end run: one file for forward reads and one file for reverse reads. We specify both files in the parameter list.
- For each read file, we specify the name of a paired output file and an unpaired output file.
- The adapter sequences are contained in a FASTA-formatted file. The ILLUMINACLIP parameter specifies the name of this file, followed by three additional colon-separated fields: seedMismatches, palindromeClipThreshold and simpleClipThreshold (2, 30 and 10 in our example). See the manual for more information about how to set these three fields.
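The relationship between the input and output file names described in the list above can be sketched as a small shell helper. This is only an illustration: `build_trim_cmd` is a hypothetical function, it assumes the inputs end in `.fastq.gz`, and the jar name, default thread count and ILLUMINACLIP values simply mirror the example command. It prints the command rather than running it, so you can inspect it first.

```shell
# Sketch: print a Trimmomatic paired-end command for one pair of read files.
# build_trim_cmd is a hypothetical helper, not part of Trimmomatic.
# Usage: build_trim_cmd <forward.fastq.gz> <reverse.fastq.gz> <adapters.fasta> [threads]
build_trim_cmd() (
    r1=$1
    r2=$2
    adapters=$3
    threads=${4:-4}                 # default to 4 threads, as in the example
    p1=${r1%.fastq.gz}              # strip the extension to build output names
    p2=${r2%.fastq.gz}
    # Each read file gets one paired and one unpaired output file.
    printf '%s ' java -jar trimmomatic-0.39.jar PE -threads "$threads" \
        "$r1" "$r2" \
        "${p1}_paired.fastq.gz" "${p1}_unpaired.fastq.gz" \
        "${p2}_paired.fastq.gz" "${p2}_unpaired.fastq.gz"
    # Adapter file plus seedMismatches:palindromeClipThreshold:simpleClipThreshold.
    printf '%s\n' "ILLUMINACLIP:${adapters}:2:30:10"
)
```

Calling `build_trim_cmd read1.fastq.gz read2.fastq.gz adapters.fasta` prints the command for that pair of files. The function body runs in a subshell, so its variables do not leak into the calling shell.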
In our example, using the Nextera XT library prep kit, the “adapters.fasta” file would look like this:
This is a standard FASTA-formatted file. The first line is the header: a greater-than character (>) followed by an arbitrary string. The second line contains the adapter sequence. The file can contain multiple adapter sequences by using the multi-FASTA format, in which each sequence has its own header line. Trimmomatic output files will show which reads (if any) were trimmed.
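Because a malformed adapter file is an easy mistake to make, it can help to check the file before running the trimmer. The following is a minimal sketch, assuming one sequence line per record and only the characters A, C, G, T and N in sequences. `check_adapter_fasta` is a hypothetical helper and is not part of Trimmomatic.

```shell
# Sketch: check that an adapter file looks like (multi-)FASTA before use.
# check_adapter_fasta is a hypothetical helper, not part of Trimmomatic.
# Prints the number of records; returns non-zero if the file is malformed.
check_adapter_fasta() {
    awk '
        /^>/ {                                   # header line starts a new record
            if (n > 0 && !have_seq) bad = 1      # previous header had no sequence
            n++; have_seq = 0; next
        }
        /^[ACGTNacgtn]+$/ {                      # sequence line
            if (n == 0) bad = 1                  # sequence before any header
            have_seq = 1; next
        }
        /^$/ { next }                            # ignore blank lines
        { bad = 1 }                              # anything else is unexpected
        END {
            if (n == 0 || !have_seq) bad = 1     # empty file or trailing header
            if (bad) exit 1
            print n
        }
    ' "$1"
}
```

For example, `check_adapter_fasta adapters.fasta` prints the number of adapter records found, or fails with a non-zero exit status if the file does not match the expected layout.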