Episode 8: How to Read an HLA Report

Today, I’d like to review the HLA typing report that we generate for every sample in an HLA sequencing run. We’ll go through the different fields and quickly review what they mean and how to interpret them.

So obviously, the report has a sample name, which comes from your sample nomenclature. Below it, the typing result is shown in a spreadsheet format. The first column lists the HLA genes that were sequenced in the run: first the Class I genes HLA-A, HLA-B, and HLA-C, then the Class II genes, in this case DRB1, DRB4, DRB5, DQB1, DQA1, DPB1, and DPA1. In some runs, we also see DRB3. That depends on the patient: DRB3, DRB4, and DRB5 are not present on every haplotype, so which of them appear varies from sample to sample. We typically run 11-locus sequencing runs, although we can do different combinations of the genes. You could run just the Class I genes (HLA-A, B, and C), just the Class II genes, or any combination.
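To make the panel options concrete, here is a minimal Python sketch that models the 11-locus panel as two tuples and builds a Class I only, Class II only, or full panel. The names `CLASS_I_LOCI`, `CLASS_II_LOCI`, `build_panel`, and `missing_loci` are hypothetical and not part of our reporting software; the locus list itself comes from the report described above.

```python
# Hypothetical model of the gene panel described above; the names are
# illustrative and not taken from the reporting software.
CLASS_I_LOCI = ("HLA-A", "HLA-B", "HLA-C")
CLASS_II_LOCI = ("DRB1", "DRB3", "DRB4", "DRB5", "DQB1", "DQA1", "DPB1", "DPA1")


def build_panel(class_i: bool = True, class_ii: bool = True) -> tuple[str, ...]:
    """Return the loci for a run; the defaults give the full 11-locus panel."""
    panel: tuple[str, ...] = ()
    if class_i:
        panel += CLASS_I_LOCI
    if class_ii:
        panel += CLASS_II_LOCI
    return panel


def missing_loci(panel: tuple[str, ...], reported: list[str]) -> list[str]:
    """Return panel loci that have no row in a report, e.g. an absent DRB3."""
    return [locus for locus in panel if locus not in reported]


if __name__ == "__main__":
    full_panel = build_panel()
    print(len(full_panel))  # 11
    # Loci listed in the example report discussed above (no DRB3 row).
    reported = ["HLA-A", "HLA-B", "HLA-C", "DRB1", "DRB4", "DRB5",
                "DQB1", "DQA1", "DPB1", "DPA1"]
    print(missing_loci(full_panel, reported))  # ['DRB3']
```

A missing DRB3 row is therefore expected for some patients and is not by itself a failed locus.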

There are two columns showing the allele 1 and allele 2 typing. We typically type to 4-field resolution. In another screencast, we’re going to go over the HLA nomenclature, but briefly: in a 4-field result, the first field is the allele group identifier, for example the 02 in an HLA-A typing of 02:01:01:01.

The second field identifies the specific allele within that group. The third field distinguishes alleles that differ by a synonymous DNA substitution within the coding region, and the fourth field distinguishes alleles that differ by DNA substitutions in the non-coding regions. I won’t go into more detail here, but you can see that for each HLA gene, the report shows the 4-field typing for that allele. In some cases, you might see only three fields even though we asked for four. This just means the typing algorithm could not resolve the fourth field, so it isn’t displayed.
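As a rough illustration of the field structure, the following Python sketch splits an allele name on its colons and labels each field with the meaning given above. The `AlleleName` class, `parse_allele`, and `describe` are hypothetical helpers written for this walkthrough; the sketch accepts an optional `GENE*` prefix and rejects anything that is not one to four numeric fields (for example, a name with trailing letters).

```python
from dataclasses import dataclass
from typing import Optional

# Meaning of each colon-separated field, as described in the text above.
FIELD_MEANINGS = (
    "allele group",
    "specific allele within the group",
    "synonymous substitution in the coding region",
    "substitution in a non-coding region",
)


@dataclass(frozen=True)
class AlleleName:
    gene: Optional[str]      # e.g. "DQB1" if a "DQB1*" prefix was given
    fields: tuple[str, ...]  # e.g. ("02", "01", "01", "01")

    @property
    def resolution(self) -> int:
        """Number of fields reported (4 for full 4-field typing)."""
        return len(self.fields)


def parse_allele(name: str) -> AlleleName:
    """Split '02:01:01:01' or 'DQB1*03:01:41' into gene and fields."""
    gene: Optional[str] = None
    if "*" in name:
        gene, name = name.split("*", 1)
    fields = tuple(name.split(":"))
    if not 1 <= len(fields) <= 4 or not all(f.isdigit() for f in fields):
        raise ValueError(f"expected 1 to 4 numeric fields, got {name!r}")
    return AlleleName(gene, fields)


def describe(name: str) -> list[str]:
    """Label each field of an allele name with its meaning."""
    allele = parse_allele(name)
    return [f"field {i + 1} ({FIELD_MEANINGS[i]}): {value}"
            for i, value in enumerate(allele.fields)]


if __name__ == "__main__":
    for line in describe("DQB1*03:01:41"):
        print(line)
```

A 3-field result such as 03:01:41 simply yields a `resolution` of 3, which matches the case above where the fourth field is not displayed.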

Then we have two more columns, CWD 1 and CWD 2, which give the common and well-documented (CWD) status of each allele. A “C” means the allele is commonly seen within a certain population; this status is recorded in the IMGT database. A “WD” means “well documented”, which indicates there is a substantial amount of literature associated with that particular allele. In some cases, the identifier will be “CWD”, meaning the allele is both common and well documented. “NO” just means that there was no CWD status for that allele in the IMGT database.
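If you are post-processing reports, a small lookup is enough to translate the CWD column into plain language. This Python sketch assumes the only codes are the four mentioned in this walkthrough (C, WD, CWD, NO); `CWD_MEANINGS` and `describe_cwd` are hypothetical names, and an unexpected code raises an error rather than being guessed at.

```python
# Meanings of the CWD codes as explained above; the code set is assumed
# from the examples in this walkthrough, not from a formal specification.
CWD_MEANINGS = {
    "C": "common",
    "WD": "well documented",
    "CWD": "common and well documented",
    "NO": "no CWD status in the IMGT database",
}


def describe_cwd(code: str) -> str:
    """Translate a CWD column value into plain language."""
    key = code.strip().upper()
    if key not in CWD_MEANINGS:
        raise ValueError(f"unrecognized CWD code: {code!r}")
    return CWD_MEANINGS[key]


if __name__ == "__main__":
    for code in ("C", "WD", "CWD", "NO"):
        print(code, "->", describe_cwd(code))
```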

In some cases, as you see here for DRB4 and DRB5, the typing algorithm could not make a typing call for allele 2. The allele 2 cell is then blank, and the CWD field for that allele is blank as well.
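One way to represent a row of the typing table, including blank cells, is a dataclass with optional fields. In this hypothetical sketch (`TypingRow` and `loci_without_full_call` are illustrative names), `None` or an empty string stands for a blank cell, and the helper flags loci like DRB4 and DRB5 above where an allele call is missing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TypingRow:
    """One row of the typing result table; None stands for a blank cell."""
    locus: str
    allele_1: Optional[str]
    allele_2: Optional[str]
    cwd_1: Optional[str] = None
    cwd_2: Optional[str] = None


def loci_without_full_call(rows: list[TypingRow]) -> list[str]:
    """Return loci where either allele column was left blank (no call)."""
    return [row.locus for row in rows if not row.allele_1 or not row.allele_2]


if __name__ == "__main__":
    rows = [
        # Hypothetical complete row; the allele values are placeholders.
        TypingRow("HLA-A", "02:01:01:01", "02:01:01:01"),
        # DRB5 as in the example report: allele 2 and its CWD cell are blank.
        TypingRow("DRB5", "01:01:01:01", None),
    ]
    print(loci_without_full_call(rows))  # ['DRB5']
```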

Finally, this table includes the review status. Typically this will say “Not reviewed”, which means that we just generated the report and delivered it to the client. There are two other levels: a first review and a second review. If the status shows that the report has been reviewed at the first level, that simply means we performed a quality control (QC) check at the client’s request, looking at the data to make sure that everything is fine. If the sample passes our QC tests, the status shows as “First review”.
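The review levels can be modeled as an ordered enum. This is a hypothetical Python sketch: `ReviewStatus` and `after_qc` are illustrative names, the “Second review” label text is assumed, and the rule that a passing QC check moves a report up exactly one level is an assumption, since the second level is not described in detail here.

```python
from enum import Enum


class ReviewStatus(Enum):
    """Review levels described above; 'Second review' wording is assumed."""
    NOT_REVIEWED = "Not reviewed"
    FIRST_REVIEW = "First review"
    SECOND_REVIEW = "Second review"


def after_qc(current: ReviewStatus, passed_qc: bool) -> ReviewStatus:
    """Move up one review level if QC passed; otherwise keep the status."""
    levels = list(ReviewStatus)  # definition order: lowest to highest
    index = levels.index(current)
    if not passed_qc or index == len(levels) - 1:
        return current
    return levels[index + 1]


if __name__ == "__main__":
    status = ReviewStatus.NOT_REVIEWED
    status = after_qc(status, passed_qc=True)
    print(status.value)  # First review
```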

Below the typing result block is a section called allele ambiguities. An ambiguity means the typing algorithm could not make a single definitive typing call: because of the high sequence homology among HLA alleles, exons, and genes, more than one allele is consistent with the data. Those cases are listed under allele ambiguities.

There are three columns: the “Major fields” (fields 1 and 2), the third field, and the fourth field. Under the major fields column, the first entry, for DRB5, shows a typing result of 01:01:01:01, but it also lists 01:40, which differs in the second field. So the call is either 01:01 or 01:40; the algorithm could not distinguish between the two, so it reports this as an allele ambiguity.

Further down, there are other entries for DRB5 and DQB1. In the third field column, we can see a couple of ambiguities. DQB1, for example, shows the typing call as 03:01:01:01, but it could also be 03:01:41, 03:01:43, or 03:01:44. These alternatives differ from the call in the third field, which is why they appear in the third field column.

Similarly, the fourth field column shows fourth-field ambiguities. The HLA-A typing is shown as 02:01:01:01, but it could also be 02:01:01:16, 02:01:01:31, or 02:01:01:50; these differ from the call only in the fourth field. The CWD status is also listed for each of these alleles in case you need it.
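The column an ambiguity lands in follows from the first field where the alternative allele differs from the call. This Python sketch is an illustration of that rule, not the typing algorithm’s actual code; `ambiguity_column` and `group_ambiguities` are hypothetical names, and the examples are the DRB5, DQB1, and HLA-A entries discussed above.

```python
def ambiguity_column(call: str, alternative: str) -> str:
    """Return the report column in which an alternative allele would appear.

    Columns follow the report: "major fields" (a difference in field 1 or 2),
    "third field", or "fourth field". An optional 'GENE*' prefix is ignored.
    """
    call_fields = call.split("*")[-1].split(":")
    alt_fields = alternative.split("*")[-1].split(":")
    # Find the first field position where the two names disagree.
    for position, (a, b) in enumerate(zip(call_fields, alt_fields), start=1):
        if a != b:
            if position <= 2:
                return "major fields"
            return "third field" if position == 3 else "fourth field"
    raise ValueError(f"{alternative!r} does not differ from {call!r} in any shared field")


def group_ambiguities(call: str, alternatives: list[str]) -> dict[str, list[str]]:
    """Sort alternative alleles into the three ambiguity columns."""
    columns: dict[str, list[str]] = {"major fields": [], "third field": [], "fourth field": []}
    for alternative in alternatives:
        columns[ambiguity_column(call, alternative)].append(alternative)
    return columns


if __name__ == "__main__":
    # Examples taken from the walkthrough above.
    print(ambiguity_column("01:01:01:01", "01:40"))     # major fields
    print(ambiguity_column("03:01:01:01", "03:01:41"))  # third field
    print(group_ambiguities("02:01:01:01",
                            ["02:01:01:16", "02:01:01:31", "02:01:01:50"]))
```

Comparing field by field, rather than comparing whole strings, is what lets a 2-field alternative like 01:40 be placed correctly against a 4-field call.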

If we scroll down a little, there is some metadata that goes with the report. It shows the IMGT/HLA database version that was used; in this case, version 3.35.0. That is a slightly older version: we normally use the most current IMGT release and switch to each new release as soon as we possibly can. This sample report was run when 3.35.0 was the current version.
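To check whether a report used an older database release, compare version numbers numerically rather than as text. In this hypothetical Python sketch, `parse_version` and `is_outdated` are illustrative names, and `CURRENT_RELEASE` is a made-up placeholder, not the actual latest IMGT/HLA release.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a release string such as '3.35.0' into (3, 35, 0)."""
    return tuple(int(part) for part in version.strip().split("."))


def is_outdated(report_version: str, current_version: str) -> bool:
    """True if the report used an older database release than the current one."""
    return parse_version(report_version) < parse_version(current_version)


if __name__ == "__main__":
    CURRENT_RELEASE = "3.50.0"  # hypothetical placeholder, not the real latest release
    print(is_outdated("3.35.0", CURRENT_RELEASE))  # True
    # Plain string comparison would get this wrong: "3.9.0" > "3.35.0" as text.
    print("3.9.0" > "3.35.0", is_outdated("3.9.0", "3.35.0"))  # True True
```

Tuples of integers compare element by element, so 3.35.0 correctly sorts after 3.9.0.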

Next, it lists the source files that were used for this report. In this case, they are Illumina paired-end short-read files, supplied as gzip-compressed FASTQ files. The “R1” and “R2” in the file names denote the forward and reverse reads. Finally, the report shows the sequencing platform type, which in this case was Illumina.
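When checking the source files, it can help to confirm that every forward read file has a reverse mate. This Python sketch assumes Illumina-style names in which the read is tagged `_R1` or `_R2` followed by `_` or `.` (for example `Sample01_S1_L001_R1_001.fastq.gz`); `pair_fastqs` is a hypothetical helper and the file names are illustrative.

```python
import re

# Read tag in Illumina-style names, e.g. "Sample01_S1_L001_R1_001.fastq.gz".
# Assumes the tag is "_R1" or "_R2" followed by "_" or ".".
_READ_TAG = re.compile(r"_R([12])(?=[_.])")


def pair_fastqs(filenames: list[str]) -> list[tuple[str, str]]:
    """Match each R1 (forward) file with its R2 (reverse) mate."""
    mates: dict[str, dict[str, str]] = {}
    for name in filenames:
        tags = list(_READ_TAG.finditer(name))
        if not tags:
            raise ValueError(f"no _R1/_R2 tag in {name!r}")
        tag = tags[-1]  # use the last tag in case the sample name contains one
        key = name[:tag.start()] + name[tag.end():]
        mates.setdefault(key, {})[tag.group(1)] = name
    pairs = []
    for key in sorted(mates):
        reads = mates[key]
        if set(reads) != {"1", "2"}:
            raise ValueError(f"unpaired read file(s): {sorted(reads.values())}")
        pairs.append((reads["1"], reads["2"]))
    return pairs


if __name__ == "__main__":
    files = ["Sample01_S1_L001_R2_001.fastq.gz", "Sample01_S1_L001_R1_001.fastq.gz"]
    print(pair_fastqs(files))
```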

That’s it for the HLA report. I hope this was helpful in interpreting it.
