Table 1 Prediction accuracy of PHLAT and other methods in benchmarking datasets

From: Inference of high resolution HLA types using genome-wide RNA or DNA sequencing reads

| HLA resolution | Dataset | Read length | PHLAT accuracy | HLAminer accuracy | HLAminer apparent accuracy | HLAforest accuracy | seq2HLA accuracy |
|---|---|---|---|---|---|---|---|
| 4-digit | HapMap RNAseq | 2×37 bp | 92.3% | 39.8% | 43.0% | 84.2% | ~32% |
| 4-digit | 1000 Genome WXS | 2×100 bp | 95.0% | 55.0% | 71.0% | 77.0% | - |
| 4-digit | HapMap WXS | 2×101 bp | 93.3% | 53.3% | 84.4% | 45.6% | - |
| 4-digit | Amplicon seq | 2×250 bp | 100% | 50.0% | 55.0% | - | - |
| 2-digit | HapMap RNAseq | 2×37 bp | 99.1% | 71.1% | 71.6% | 97.3% | 97.2% |
| 2-digit | 1000 Genome WXS | 2×100 bp | 97.0% | 83.0% | 85.0% | 95.0% | 90.0% |
| 2-digit | HapMap WXS | 2×101 bp | 95.6% | 78.9% | 88.9% | 81.1% | 93.3% |
| 2-digit | Amplicon seq | 2×250 bp | 100% | 95.0% | 95.0% | - | - |
  1. The accuracies and apparent accuracies are calculated as described in Methods. The accuracies of the existing methods are taken from their original publications where those publications examined the datasets; otherwise they are derived by applying the methods locally (Additional file 1: Table S1, Additional file 4: Table S4, Additional file 5: Table S5 and Additional file 6: Table S6). The four-digit accuracy of seq2HLA on the HapMap RNAseq dataset (~32%) is taken from the main text of its publication [28]. For all other datasets, seq2HLA is applied only at two-digit resolution. The accuracy of seq2HLA predictions is calculated without any p-value threshold, which produces fewer false negatives and hence higher accuracies than imposing a p-value cutoff of 0.1 as described earlier [28].