A common project type we see at The Sequencing Center comes from researchers who want to identify which organisms are present in their samples. Metagenomic sequencing lets these researchers take a general environmental sample, run it through a sequencing device, and parse out the organisms present.
Metagenomics is growing in popularity across many fields, including crop sciences, environmental remediation, and ecological studies. For example, a large-scale agricultural operation may be struggling to reach standard crop yields because of suspected problems with bacteria in the soil. As a researcher, you could take a soil sample from the fields and perform a metagenomic sequencing run to better understand which organisms are present.
While the sample type differs from what is considered a “standard sample”, the sequencing process itself is very similar. The main difference lies in the post-sequencing data analysis. A typical sequencing run involves a 1:1 alignment of the sequenced reads against a single reference genome, producing an aligned genome and a variant file. In metagenomics, reads from many organisms likely need to be aligned against multiple genomes, producing multiple variant files. A standard bioinformatics pipeline won’t work in this case.
Metagenomic Bioinformatics Pipeline
Fortunately, there is an excellent bioinformatics pipeline called Kraken, which provides a rapid way to assign taxonomic labels to the reads from a general metagenomic sequencing run. The pipeline is based on k-mer matching, a gold-standard technique in the field. For a typical run, a Kraken analysis takes an hour or two to produce an output file containing taxonomic identifications down to the strain level.
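To give a feel for what k-mer based labeling means, here is a minimal Python sketch. It is a toy illustration, not Kraken's actual implementation: the reference sequences are made up, the k-mer length is far shorter than what is used in practice, and k-mers shared between references are simply skipped, whereas Kraken resolves them through the taxonomy. The function names are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Map each k-mer to the set of reference labels that contain it."""
    index = {}
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(label)
    return index


def classify(read, index, k):
    """Return the label with the most unambiguous k-mer hits, or None."""
    votes = Counter()
    for kmer in kmers(read, k):
        labels = index.get(kmer, ())
        if len(labels) == 1:  # toy simplification: skip shared k-mers
            votes[next(iter(labels))] += 1
    return votes.most_common(1)[0][0] if votes else None


# Made-up toy sequences, not real genomes.
REFERENCES = {
    "organism_A": "ATGCGTACGTTAGCCGATAGGCTA",
    "organism_B": "TTGACCGGTAACCTTGGAACGTCA",
}
INDEX = build_index(REFERENCES, k=5)

print(classify("CGTACGTTAGCC", INDEX, k=5))  # read taken from organism_A
print(classify("GGGGGGGGGGGG", INDEX, k=5))  # no matching k-mers
```

Each read is labeled by the reference whose k-mers it shares, and a read with no matching k-mers stays unclassified. Because this is exact lookup rather than full alignment, it is fast, which is what makes the approach practical for large metagenomic runs.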
With the Kraken pipeline, researchers can produce species-level identification and raw genome data for each identified species. However, it’s worth noting that Kraken does not automatically produce variant files. If you’re looking only for which species are present in a sample, Kraken by itself is the best pipeline. Should you need variant files, you will need to take the genomes of the identified species and compare them against a highly annotated reference genome.
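If species presence is all you need, the summary report is usually the file to look at. The sketch below assumes a six-column, tab-separated report in the layout Kraken's report tool produces (percentage of reads, reads in the clade, reads assigned directly, rank code, NCBI taxonomy ID, indented name) and pulls out the species-level rows. The sample rows and read counts are invented for illustration, and the function name is hypothetical.

```python
import csv
import io


def species_from_report(report_text, min_percent=0.0):
    """Return (name, taxid, clade_reads, percent) for species-level rows."""
    rows = []
    reader = csv.reader(io.StringIO(report_text), delimiter="\t")
    for fields in reader:
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        percent, clade_reads, _direct, rank, taxid, name = fields[:6]
        # Rank code "S" marks a species-level entry in the assumed layout.
        if rank.strip() == "S" and float(percent) >= min_percent:
            rows.append(
                (name.strip(), int(taxid), int(clade_reads), float(percent))
            )
    # Most abundant species first.
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Invented example report; the counts are not real data.
SAMPLE_REPORT = (
    " 12.50\t1250\t1250\tU\t0\tunclassified\n"
    " 87.50\t8750\t10\tD\t2\t  Bacteria\n"
    " 60.00\t6000\t6000\tS\t562\t        Escherichia coli\n"
    " 27.40\t2740\t2740\tS\t1423\t        Bacillus subtilis\n"
)

for name, taxid, reads, percent in species_from_report(SAMPLE_REPORT):
    print(f"{name} (taxid {taxid}): {reads} reads, {percent:.1f}%")
```

Raising `min_percent` is a simple way to drop very low-abundance species, which are more likely to be noise.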
For that, there is a database called Patric, which houses one of the most comprehensive collections of bacterial and archaeal genome data. By performing a BLAST query to find regions of similarity between the sequenced metagenomic species and the genomes within Patric, we can generate a variant analysis, giving researchers that added depth of data.
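As a rough sketch of how the BLAST results might be triaged before any variant analysis, the Python below assumes BLAST was run with tabular output (`-outfmt 6`), whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. It keeps the best-scoring reference hit per query sequence. The sequence IDs, sample lines, and thresholds are hypothetical and would need tuning for a real project.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identity (pident)
    length: int      # alignment length
    evalue: float
    bitscore: float


def parse_hits(text):
    """Parse default 12-column BLAST tabular output into Hit records."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        hits.append(
            Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10]), float(f[11]))
        )
    return hits


def best_hits(hits, min_identity=90.0, max_evalue=1e-5):
    """Keep the highest-bitscore hit per query that passes both thresholds."""
    best = {}
    for hit in hits:
        if hit.identity < min_identity or hit.evalue > max_evalue:
            continue
        if hit.query not in best or hit.bitscore > best[hit.query].bitscore:
            best[hit.query] = hit
    return best


# Invented example lines; IDs and scores are placeholders, not real results.
SAMPLE_BLAST = (
    "contig_1\tref_genome_A\t99.20\t1500\t12\t0\t1\t1500\t101\t1600\t0.0\t2700\n"
    "contig_1\tref_genome_B\t91.00\t1480\t130\t3\t1\t1480\t55\t1534\t0.0\t1900\n"
    "contig_2\tref_genome_B\t85.00\t300\t45\t0\t1\t300\t10\t309\t2e-40\t160\n"
)

for query, hit in best_hits(parse_hits(SAMPLE_BLAST)).items():
    print(f"{query} -> {hit.subject} ({hit.identity:.1f}% identity)")
```

The mismatches and gaps within the retained alignments are the starting point for the variant analysis; the filtering step only decides which reference genome each sequence should be compared against.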
In all, the pipeline is relatively straightforward, and the full analysis takes approximately 2-3 hours. To recap, a comprehensive analysis has three steps:
- Perform a metagenomic sequencing run
- Use the Kraken bioinformatics pipeline for taxonomic labeling
- Perform a BLAST query of the identified species against their respective genomes in Patric
If you’re a researcher interested in metagenomic sequencing and are looking for a company to partner with, then get in touch with us for a free consultation on your project.