Antibiotic resistance genes in bacteria are a hot area of research. The subject is fairly complicated, and analyzing these genes is challenging because it requires properly structured data, which can be hard to find. In this post, we’ll cover a simple way to analyze antibiotic resistance genes within your dataset.
Back to Basics: What is antibiotic resistance?
In general, antibiotic resistance occurs when an antibiotic can no longer effectively control bacterial growth or kill the bacteria within an organism. This can happen for various reasons, but one of the primary ones is that the bacteria carry genes that have mutated in such a way that the antibiotic no longer affects them.
Why is this research important?
Bacteria can be harnessed in many ways for good. However, they can also be very harmful and even deadly to humans. We have many drugs, such as amoxicillin and tetracycline, that can kill bacteria with ease. Unfortunately for us, bacteria evolve rapidly and can quickly become resistant to therapeutic measures if antibiotics are used too frequently. This makes human bacterial infections much harder to treat. In 1995, a study conducted by the US Office of Technology Assessment estimated an economic impact of $1.3 billion (not inflation adjusted) due to just six different types of bacteria. That study covered the USA alone; the global figure remains unknown, although it is likely much higher.
Getting to the Analysis State: Data Conversion
Oftentimes, researchers and scientists want to jump straight to the end state, where they just BLAST their file against a database and out come the answers. Unfortunately, it isn’t that simple, and there is a bit of complexity involved. The first step is to convert your data into a form that can be compared against the database.
First, make sure that your raw sequencing file is in FASTQ format. If you ran your sample through an Illumina DNA sequencer, the data is likely already in this format.
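If you are unsure whether a file really is FASTQ, a quick sanity check can help before you start a long assembly. The sketch below is a minimal illustration rather than a full validator: it assumes the common four-line-per-record layout (header starting with "@", sequence, separator starting with "+", quality string of the same length as the sequence) and plain, uncompressed text. The function name looks_like_fastq is made up for this example.

```python
import io
from itertools import islice


def looks_like_fastq(handle, max_records=1000):
    """Return True if the first records follow the four-line FASTQ layout.

    Assumes unwrapped records (one line each for sequence and quality).
    """
    checked = 0
    while checked < max_records:
        # Read the next four lines as one candidate record.
        record = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not record:
            break  # clean end of input
        if len(record) < 4:
            return False  # truncated record
        header, sequence, separator, quality = record
        if not header.startswith("@") or not separator.startswith("+"):
            return False
        if len(sequence) != len(quality):
            return False
        checked += 1
    # An empty input is not a usable FASTQ file.
    return checked > 0


# Tiny in-memory example; with a real file you would pass open(path).
sample = io.StringIO("@read1\nACGT\n+\nIIII\n")
print(looks_like_fastq(sample))  # prints True
```

For a real run you would pass an open file handle instead of the in-memory sample. Gzipped files would need to be opened with gzip.open in text mode first.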
Next, you’ll need to assemble your FASTQ reads into contigs. To do this, you can use the SPAdes Genome Assembler. After installing it, follow the section of its user guide on assembling long Illumina paired reads. To perform the assembly, run the following command from a command-line shell (spades.py is a Python script, but it is launched from the terminal rather than from inside the Python interpreter):
spades.py -k 21,33,55,77 --careful -1 <forward_reads_file> -2 <reverse_reads_file> -o <output_directory>
What this command effectively does is assemble your sample FASTQ reads into a scaffolded FASTA file, scaffolds.fasta, which SPAdes writes inside the output directory (the -o option takes a directory, not a file name) and which we can then BLAST against different databases. The -1 and -2 options point to the forward and reverse read files; if your paired reads are interleaved in a single file, use --12 <your_reads_file> instead. More specifically, the pipeline first breaks the sequence reads into k-mers (that is where the -k 21,33,55,77 values come into play), then assembles the k-mer data into contigs and scaffolds.
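To make the k-mer idea concrete, here is a small sketch of what breaking a read into k-mers means. It only illustrates the decomposition step. SPAdes itself goes much further, building de Bruijn graphs from the k-mers at each k value, and this is not its implementation. The read and the value k=4 are made up to keep the output short.

```python
from collections import Counter


def kmers(sequence, k):
    """Yield every overlapping substring of length k from the sequence."""
    for start in range(len(sequence) - k + 1):
        yield sequence[start:start + k]


# A made-up 10-base read; real reads are typically 100 bases or longer.
read = "ATGGCGTGCA"

# Count how often each 4-mer occurs in the read.
counts = Counter(kmers(read, 4))
for kmer, count in counts.items():
    print(kmer, count)
```

A read of length n yields n - k + 1 k-mers, so the 10-base read above produces seven 4-mers. Larger k values give fewer but more specific k-mers, which is one reason assemblers are often run with several k values.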
Note: The SPAdes user manual will be your best friend should you get lost.
Performing the Comparison for Analysis Data
Following the assembly, we’re ready to perform a BLAST query against an antibiotic resistance gene database. For this, we’re going to search against the Comprehensive Antibiotic Resistance Database, or CARD for short.
To start, click on “Analyze” from the main screen. Next, choose “BLAST”, which will perform a standard BLAST query against the database. Then copy and paste your FASTA-format sequences and select your flavor of BLAST. The choice should match your input: BLASTP expects protein sequences, so for nucleotide scaffolds straight out of the assembler you would pick a nucleotide-query flavor such as BLASTN or BLASTX instead. If you have a specific BLAST query that you want to perform, you do have the option of uploading a file; however, we typically don’t use this. Finally, confirm that you’re not a robot and hit “Submit”.
Should there be any hits from the query, you’ll be able to download the hit file, which contains information about each hit, most notably the ARO_Name (e.g. aminocoumarin resistant alaS) and ARO_Category (e.g. antibiotic inactivation enzyme).
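Once you have the hit file, a few lines of Python can summarize it. The sketch below assumes the download is tab-delimited text with a header row that includes the ARO_Name and ARO_Category columns mentioned above. The exact layout of the file is an assumption here, so check your own download and adjust the delimiter or column names if they differ. The function name summarize_hits and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def summarize_hits(handle, delimiter="\t"):
    """Count hits per ARO_Category and collect the distinct ARO_Name values.

    Assumes a delimited text file whose header row contains the
    ARO_Name and ARO_Category columns (an assumption about the format).
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    hits_per_category = Counter()
    gene_names = set()
    for row in reader:
        hits_per_category[row["ARO_Category"]] += 1
        gene_names.add(row["ARO_Name"])
    return hits_per_category, sorted(gene_names)


# Made-up rows standing in for a downloaded hit file.
example = io.StringIO(
    "ARO_Name\tARO_Category\n"
    "example gene 1\tantibiotic inactivation enzyme\n"
    "example gene 2\tantibiotic inactivation enzyme\n"
)
categories, names = summarize_hits(example)
print(categories.most_common())
print(names)
```

With a real file you would replace the in-memory example with open(path, newline="") and pass that handle to summarize_hits.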
There we have it! A pretty straightforward and simple way of taking bacterial sequencing data and analyzing it against an antibiotic resistance gene database.
Want us to perform the analysis for you? Not a problem!