How to read FastQ files

FastQ File Format

Illumina sequencing instruments generate FastQ files when a sequencing run is finished.  FastQ files are the starting point for all downstream bioinformatics data analysis.  

The file name suffix for a FastQ file is:  .fastq

For example, a typical FastQ file name could be:  sample.fastq

FastQ files are often found in gzip compressed format with the file name: sample.fastq.gz

The Illumina FastQ file format is shown below.

Each record in a FastQ file consists of four lines:

  •     Sequence identifier
  •     Nucleotide sequence
  •     Quality score identifier line (always a single “+” (plus) sign)
  •     Quality scores
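
The four-line layout can be read with a few lines of Python. The following is a minimal sketch, not a reference implementation: the names FastqRecord, open_fastq and read_fastq are made up for this illustration, and it assumes every record occupies exactly four lines with no blank lines in between.

```python
import gzip
from itertools import islice
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str  # line 1, without the leading "@"
    sequence: str    # line 2
    quality: str     # line 4


def open_fastq(path: str) -> TextIO:
    """Open a plain (.fastq) or gzip-compressed (.fastq.gz) file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one FastqRecord for each group of four lines."""
    while True:
        lines = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not lines:
            return  # end of file
        if len(lines) < 4:
            raise ValueError("file ends in the middle of a record")
        header, sequence, separator, quality = lines
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at {header!r}")
        yield FastqRecord(header[1:], sequence, quality)
```

A typical call would be: with open_fastq("sample.fastq.gz") as handle, loop over read_fastq(handle). Reading records lazily, one at a time, keeps memory use low even for very large files.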

The first line contains the following elements:

@<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos> <read>:<is filtered>:<control number>:<sample number>

Table 1.  Elements in first line of a FastQ file record.

Element            Requirements                    Description
@                  @                               Each sequence identifier line starts with @.
<instrument>       a–z, A–Z, 0–9 and underscore    Instrument ID.
<run number>       Numerical                       Run number on instrument.
<flowcell ID>      a–z, A–Z, 0–9                   Flowcell ID.
<lane>             Numerical                       Lane number.
<tile>             Numerical                       Tile number.
<x-pos>            Numerical                       X coordinate of cluster.
<y-pos>            Numerical                       Y coordinate of cluster.
<read>             Numerical                       Read number: 1 for a single read or Read 1 of a paired-end run, 2 for Read 2 of a paired-end run.
<is filtered>      Y or N                          Y if the read is filtered (did not pass), N otherwise.
<control number>   Numerical                       0 when none of the control bits are on, otherwise an even number.
<sample number>    Numerical                       Sample number.
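
As an illustration of the layout above, the identifier line can be split into its elements as follows. This is a sketch under stated assumptions: ReadHeader and parse_header are hypothetical names, and the last field is kept as text rather than converted to a number, because some files carry an index sequence in that position.

```python
from typing import NamedTuple


class ReadHeader(NamedTuple):
    instrument: str
    run_number: int
    flowcell_id: str
    lane: int
    tile: int
    x_pos: int
    y_pos: int
    read: int
    is_filtered: bool
    control_number: int
    sample_number: str  # kept as text (assumption: may hold an index sequence)


def parse_header(line: str) -> ReadHeader:
    """Split an Illumina-style identifier line into its eleven elements."""
    if not line.startswith("@"):
        raise ValueError("identifier line must start with '@'")
    # A single space separates the cluster location from the read flags.
    location, _, flags = line[1:].strip().partition(" ")
    loc_fields = location.split(":")
    flag_fields = flags.split(":")
    if len(loc_fields) != 7 or len(flag_fields) != 4:
        raise ValueError(f"unexpected identifier layout: {line!r}")
    instrument, run, flowcell, lane, tile, x_pos, y_pos = loc_fields
    read, filtered, control, sample = flag_fields
    return ReadHeader(
        instrument, int(run), flowcell, int(lane), int(tile),
        int(x_pos), int(y_pos), int(read), filtered == "Y",
        int(control), sample,
    )
```

For the identifier line @MN00537:51:000H2K25G:1:11101:2213:1092 1:N:0:9, this gives instrument MN00537, run number 51, flowcell ID 000H2K25G, lane 1, tile 11101, coordinates 2213 and 1092, read 1, not filtered, control number 0 and sample number 9.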

The second line contains the nucleotide sequence of a single read (DNA fragment).

The third line contains a quality score identifier and is always a “+” (plus) sign.

The fourth line contains the basecall quality scores, one for each nucleotide in the sequence shown in line two.  The scores are Phred +33 encoded: each numerical quality score is stored as the single ASCII character whose code is the score plus 33 (https://bit.ly/2OLYC6m).
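
Decoding a quality string amounts to subtracting 33 from each character code. The sketch below uses two hypothetical helper names, decode_phred33 and error_probability. The second helper applies the standard Phred definition, Q = -10 × log10(P), which is not stated on this page but is how these scores are conventionally interpreted.

```python
from typing import List


def decode_phred33(quality: str) -> List[int]:
    """Convert a Phred +33 quality string into numerical quality scores."""
    return [ord(char) - 33 for char in quality]


def error_probability(score: int) -> float:
    """Estimated probability that a basecall with this score is wrong."""
    return 10 ** (-score / 10)


scores = decode_phred33("F#/A=")
print(scores)  # [37, 2, 14, 32, 28]
```

Here the character F (ASCII 70) stands for a score of 37, while # (ASCII 35) stands for a score of 2, the value that accompanies the N basecalls in the example record further down.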

The number of records in a FastQ file equals the number of reads generated during a sequencing run.  On an Illumina MiniSeq instrument there can be up to 100M records in a single file.
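
Because every record takes exactly four lines, the number of reads in a file can be estimated by counting lines and dividing by four. A minimal sketch, assuming a well-formed file with no blank lines (count_records is a name chosen for this illustration):

```python
import gzip


def count_records(path: str) -> int:
    """Count FastQ records in a plain or gzip-compressed file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        line_count = sum(1 for _ in handle)  # streams the file line by line
    if line_count % 4 != 0:
        raise ValueError(f"{path}: line count {line_count} is not a multiple of 4")
    return line_count // 4
```

The file is streamed rather than loaded into memory, so the same function works for files with many millions of records.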

 

Example FastQ Record

Here is an example of a single FastQ file record:

@MN00537:51:000H2K25G:1:11101:2213:1092 1:N:0:9
CTCCAGTCCTTACTCCCATATCTAACCTCTTACCCCTACNTCATAGGTANACATTTTAATGAAT
+
FFFFFFFFFFFFAFFFFFFFF=FFFFAFFFFFFF/AFFF#FFFFFFFFF#FFFFFFFF

 

Paired-End Reads

An Illumina paired-end sequencing run generates two FastQ files for each sample.  The files have this naming convention:

xxx_R1.fastq.gz

xxx_R2.fastq.gz

where “xxx” is a file prefix and

R1 = file contains “forward” reads

R2 = file contains “reverse” reads

Most downstream data analysis tools automatically recognize that the R1 and R2 files are paired with one another, and most will ask you to import both files at once.  It is therefore important to keep both files in the same location for future reference.
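
For scripts that handle the pairing themselves, the naming convention above can be used to locate the matching file. The sketch below is an illustration only: mate_path and same_cluster are hypothetical helpers, and same_cluster assumes that both reads of a pair share the part of the identifier line before the space (instrument through y-pos) and differ only in the fields after it.

```python
import os


def mate_path(r1_path: str) -> str:
    """Derive the R2 file name from an R1 file name, e.g. xxx_R1.fastq.gz."""
    directory, name = os.path.split(r1_path)
    # Replace the last "_R1" in the file name only, not in the directory.
    prefix, marker, suffix = name.rpartition("_R1")
    if not marker:
        raise ValueError(f"no '_R1' found in file name: {name!r}")
    return os.path.join(directory, prefix + "_R2" + suffix)


def same_cluster(identifier_1: str, identifier_2: str) -> bool:
    """True when two identifier lines describe the same cluster."""
    return identifier_1.split(" ", 1)[0] == identifier_2.split(" ", 1)[0]


print(mate_path("xxx_R1.fastq.gz"))  # xxx_R2.fastq.gz
```

Checking same_cluster on the identifier lines while reading both files in step is a cheap way to catch R1 and R2 files that have fallen out of order.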