How do you design gene panels?

Gene Panels

Gene panels are a cost-effective alternative to whole genome sequencing.  When studying specific disease states or phenotypes, we often know or suspect that certain genes, mutant alleles or genomic regions are the root cause of the disease.  Instead of sequencing entire genomes, we can design highly targeted gene panels that query only specific genomic regions of interest (ROIs).  This approach can significantly reduce the cost of sequencing, lighten the burden of data analysis and make it practical to examine a much larger cohort of patients or research subjects.

 

Numerous bioinformatics tools are available for designing gene panels.  In this brief note we’ll walk through the design process using Illumina Design Studio, which generates gene panels that run on Illumina short-read sequencers.  Its intuitive, cloud-based interface lets us set up a gene panel in a few minutes.

 

The Design Process

Fig. 1

The first step in the design process is to choose whether you want to create a DNA or RNA panel (Fig. 1).  For our purposes here we’ll choose a DNA panel.

 

Fig. 2

Next, we need to choose the appropriate gene panel assay technology (Fig. 2).  In Design Studio we’re presented with four options:

 

| Assay Name | Application |
| --- | --- |
| AmpliSeq for Illumina Gene | Multi-pool design for targeting genomic regions longer than a single amplicon, where several amplicons are required to cover one particular target. |
| AmpliSeq for Illumina Hotspot | Single-pool design for targeting short genomic regions, such as SNPs and small indels, covered by non-overlapping amplicons in one single pool. |
| Enrichment | Ideal for larger gene panels with 500 kb - 25 Mb total targeted sequence length. |
| AmpliSeq for Illumina On-Demand | Dual-pool design for targeting germline CDS regions, with pre-defined, wet-lab tested, 275 bp amplicons. |

Table 1

 

As Table 1 suggests, for panels that cover multiple exons, multiple genes or reasonably long genomic regions, we most often choose the AmpliSeq for Illumina Gene assay.  If we’re targeting very short genomic regions, such as a handful of SNPs or small indels, we’ll likely choose the AmpliSeq for Illumina Hotspot assay.  In contrast, if the total targeted sequence is exceptionally large (roughly 500 kb to 25 Mb), we’ll likely choose the Enrichment assay.  Finally, for well-characterized germline coding regions we’ll choose the AmpliSeq for Illumina On-Demand assay.
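The selection logic in Table 1 can be summarized as a small decision function.  This is only a sketch: `PanelSpec`, `suggest_assay` and the order in which the rules are checked are assumptions made for illustration and are not part of Design Studio.  Only the 500 kb to 25 Mb range is taken from Table 1.

```python
from dataclasses import dataclass


@dataclass
class PanelSpec:
    """Hypothetical description of a planned panel (not a Design Studio object)."""
    total_target_bp: int         # summed length of all targeted regions
    hotspots_only: bool = False  # panel targets only SNPs / small indels
    germline_cds: bool = False   # well-characterized germline coding regions


def suggest_assay(spec: PanelSpec) -> str:
    """Return the Table 1 assay that best matches the panel (heuristic only)."""
    # Assumed rule order: size first, then the two special cases, then the default.
    if 500_000 <= spec.total_target_bp <= 25_000_000:
        return "Enrichment"
    if spec.hotspots_only:
        return "AmpliSeq for Illumina Hotspot"
    if spec.germline_cds:
        return "AmpliSeq for Illumina On-Demand"
    return "AmpliSeq for Illumina Gene"


# A panel of a few genes falls through to the default assay.
print(suggest_assay(PanelSpec(total_target_bp=80_000)))
```

In practice the choice also depends on factors this sketch ignores, such as sample type and budget, so treat the output as a starting point.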

 

Fig. 3

The most commonly designed gene panels target the human genome (Fig. 3).  In Design Studio we can choose either the current human reference assembly (GRCh38) or the earlier one (GRCh37, also known as hg19).  Unless there is a compelling reason to do otherwise, such as compatibility with existing data tied to GRCh37, we’ll choose the current assembly, which is the most accurate and best annotated.  In recent years whole genome sequences have also become available for many species beyond human.  Design Studio shows the available options under “Extended Species”.

 

Fig. 4

Next, we give the gene panel a meaningful name and fill in the description field (Fig. 4).  In this demo we’ll target the well-studied BRCA1 gene.

 

Fig. 5

The next step is to choose a sample type (Fig. 5).  By default Design Studio creates a “Regular” gene panel, which is the most common type.  DNA from FFPE (formalin-fixed, paraffin-embedded) tissue samples is often degraded and requires a special panel design, so Design Studio offers gene panels for FFPE sample types.  cfDNA (circulating free DNA) is typically fragmented as well and, like FFPE DNA, requires a special panel design.

 

For “Regular” sample types we can choose a maximum amplicon length, although we generally use the default of 275 bp (base pairs).  Shorter maximum lengths of 140 bp and 175 bp are also available, while the longer 375 bp option is available only for the Illumina MiSeq.

 

| High Stringency | Medium Stringency | Low Stringency |
| --- | --- | --- |
| Only high stringency primers. | Slight reduction in primer stringency that will increase coverage and may increase off-target risk. | More relaxed primer stringency that will increase coverage with greater off-target risk. |
| If coverage is sufficient, this design is the best option. | For higher coverage of GC-rich regions. | For higher coverage of GC-rich regions and tolerance for off-target risk. |

Table 2

Design Studio includes a “Stringency” option that trades primer specificity against target coverage (Table 2).  In most cases we choose the default “High” stringency.  However, for GC-rich genomic loci where high-stringency primers leave gaps, a lower stringency level may improve coverage at the cost of greater off-target risk.
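Before lowering the stringency, it can help to check whether a target really is GC-rich.  The sketch below computes the GC fraction of a sequence.  The function names and the 0.65 cut-off are illustrative assumptions, and Design Studio does not publish a threshold that we know of.

```python
def gc_fraction(sequence: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous bases such as N are ignored
    return gc / total if total else 0.0


def is_gc_rich(sequence: str, threshold: float = 0.65) -> bool:
    """Flag a region as GC-rich; the threshold is an assumed, adjustable value."""
    return gc_fraction(sequence) >= threshold


# Toy sequence, not taken from any real gene.
print(is_gc_rich("GCGCGCAT"))
```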

 

Fig. 6

The next step is to enter a gene list or genomic ROIs for the panel (Fig. 6).  On the “Gene” tab, we can enter common gene names in the “Gene Name” field and click “Add Gene” to build a gene list.  When selecting genes by name, Design Studio presents two options, “CDS only” and “Exon only”, which are described in Table 3.

 

| CDS Only | Exon Only |
| --- | --- |
| Limits probe design to coding sequence only. This option excludes the 5' and 3' untranslated regions (UTRs). | Limits probe design to coding sequence plus the 5' and 3' untranslated regions (UTRs). |

Table 3

The main difference between the two options is that “CDS only” excludes the 5′ and 3′ UTRs, whereas “Exon only” includes them.  In general, “CDS only” generates fewer amplicons than “Exon only”, so you’ll have to decide whether UTRs are relevant to your research project.
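To see how much sequence the two options differ by, the sketch below clips a transcript’s exons to its coding region and compares the totals.  The exon and CDS coordinates are toy values invented for illustration, and the helper names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def clip_to_cds(exons: List[Interval], cds_start: int, cds_end: int) -> List[Interval]:
    """Keep only the parts of each exon that lie inside the coding region."""
    clipped = []
    for start, end in exons:
        s, e = max(start, cds_start), min(end, cds_end)
        if s < e:  # drop exons that are entirely UTR
            clipped.append((s, e))
    return clipped


def total_length(intervals: List[Interval]) -> int:
    """Summed length of non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


# Toy transcript: three exons, coding region from 150 to 700.
exons = [(100, 200), (300, 450), (600, 800)]
cds = clip_to_cds(exons, cds_start=150, cds_end=700)

print("Exon only:", total_length(exons), "bp")
print("CDS only: ", total_length(cds), "bp")
```

The difference between the two totals is the UTR sequence that “Exon only” adds to the design.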

 

Fig. 7

Instead of using gene lists you can specify the exact genomic coordinates, or ROIs, that the panel should cover (Fig. 7).  This option gives you complete control over which genomic sequences to include in the panel.  For example, you can include individual exons, sets of exons, individual introns, sets of introns, a mix of exons and introns, 5′ and/or 3′ UTRs, promoter regions, enhancer regions or any other specific genomic sequence or combination of sequences.  In our example gene panel we supply the name of the chromosome where the BRCA1 gene is located (chr17), the start and stop coordinates, and a name for this set of coordinates (BRCA1).

If you need assistance in identifying genomic coordinates, the NCBI Genome Data Viewer, UCSC Genome Browser, Congenica, Genomenon and similar tools are useful for this purpose.
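Genome browsers typically display a region as a string such as `chr17:1,000-2,000`.  The sketch below splits such a string into the chromosome, start and stop values that a coordinate form asks for.  `Region` and `parse_region` are hypothetical helpers, and the coordinates are placeholders rather than the real BRCA1 locus.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (as displayed by most browsers)
    end: int    # 1-based, inclusive
    name: str


_REGION_RE = re.compile(r"^(chr[0-9A-Za-z_]+):([\d,]+)-([\d,]+)$")


def parse_region(text: str, name: str) -> Region:
    """Parse 'chrN:start-end' (thousands separators allowed) into a Region."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognized region string: {text!r}")
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"Invalid coordinates in: {text!r}")
    return Region(match.group(1), start, end, name)


def region_length(region: Region) -> int:
    """Length in bp of a 1-based inclusive region."""
    return region.end - region.start + 1


# Placeholder coordinates for illustration only.
demo = parse_region("chr17:1,000-2,000", "demo_region")
print(demo, region_length(demo))
```

Always confirm that the coordinates you copy refer to the same genome assembly you selected for the panel, since positions differ between GRCh37 and GRCh38.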

 

Fig. 8

A third option for specifying genes or ROIs for a panel is to import a CSV file (for example, one exported from Excel) or a BED file that lists the genes or regions (Fig. 8).  Design Studio includes template CSV and BED files to help you get started.  Note that Design Studio also provides guidance on chromosome naming formats, as these differ among species.
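If your regions are already in a script or spreadsheet, a BED file can be generated programmatically.  The sketch below writes the four standard leading BED columns (chromosome, start, end, name) and converts 1-based inclusive coordinates to BED’s 0-based, end-exclusive convention.  Whether Design Studio expects extra columns or a header line is not covered here, so compare the output with the template file it provides.  The function name and the region shown are placeholders.

```python
import io
from typing import Iterable, Tuple

# (chromosome, start, end, name) with 1-based inclusive coordinates
RegionTuple = Tuple[str, int, int, str]


def to_bed(regions: Iterable[RegionTuple]) -> str:
    """Render regions as tab-separated BED lines (0-based start, exclusive end)."""
    out = io.StringIO()
    for chrom, start, end, name in regions:
        # BED start is 0-based, so subtract 1; the exclusive end equals the
        # 1-based inclusive end and stays unchanged.
        out.write(f"{chrom}\t{start - 1}\t{end}\t{name}\n")
    return out.getvalue()


# Placeholder region, not the real BRCA1 coordinates.
bed_text = to_bed([("chr17", 1001, 2000, "demo_region")])
print(bed_text, end="")
```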

 

Fig. 9

Whether you choose the “Gene”, “Coordinate” or “File” option, Design Studio builds a set of amplicons that cover the chosen gene(s) or ROIs and displays the number of amplicons required for the panel (Fig. 9).
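As a sanity check on the reported amplicon count, you can compute a crude lower bound from the total target length and the maximum amplicon length.  This is back-of-the-envelope arithmetic, not how Design Studio tiles amplicons: real designs need more amplicons because primers take up part of each amplicon, neighboring amplicons overlap, and targets are scattered.  The function name and the target lengths are hypothetical.

```python
from typing import Iterable


def min_amplicons(target_lengths: Iterable[int], max_amplicon_bp: int = 275) -> int:
    """Lower bound on amplicon count: ceil(total target bp / max amplicon bp)."""
    if max_amplicon_bp <= 0:
        raise ValueError("max_amplicon_bp must be positive")
    total_bp = sum(target_lengths)
    return -(-total_bp // max_amplicon_bp)  # integer ceiling division


# Hypothetical target lengths in bp.
print(min_amplicons([1000, 500, 150]))
```

A reported count far above this bound is expected; a count below it would suggest that part of the target is not covered.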

 

Fig. 10

The final step is to submit your panel design to Design Studio.  Depending on the number of genes or ROIs, the design process can take anywhere from a few minutes to 30 minutes.  Design Studio returns a report about the gene panel and stores the design in a private cloud-based repository for future reference (Fig. 10).  In our sample BRCA1 panel, note the high level of coverage (99.61%) across the chosen gene.
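A coverage percentage of this kind can be understood as the share of target bases that fall inside at least one amplicon.  The sketch below computes that share for intervals on a single chromosome.  It is an assumed definition for illustration, since the report’s exact calculation is not described here, and the intervals are toy values.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage_percent(targets: List[Interval], amplicons: List[Interval]) -> float:
    """Percentage of target bases covered by at least one amplicon."""
    targets, amplicons = merge(targets), merge(amplicons)
    total = sum(end - start for start, end in targets)
    covered = 0
    for t_start, t_end in targets:
        for a_start, a_end in amplicons:
            overlap = min(t_end, a_end) - max(t_start, a_start)
            if overlap > 0:
                covered += overlap
    return 100.0 * covered / total if total else 0.0


# Toy example: the first 10 bp of the second target are missed.
targets = [(0, 100), (200, 300)]
amplicons = [(0, 60), (50, 100), (210, 300)]
print(f"{coverage_percent(targets, amplicons):.2f}%")
```

Merging both lists first ensures that overlapping amplicons are not counted twice.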

 

Next Steps

Although tools like Design Studio are useful for creating gene panels, it still takes skill and experience to generate cost-effective, useful panels.  Contact us if you’re interested in exploring gene panels for your own research projects.