Inference of high resolution HLA types using genome-wide RNA or DNA sequencing reads

BMC Genomics

Table 1 Prediction accuracy of PHLAT and other methods in benchmarking datasets

HLA resolution	Dataset	Read length	PHLAT accuracy	HLAminer accuracy	HLAminer apparent accuracy	HLAforest accuracy	seq2HLA accuracy
4-digit	HapMap RNAseq	2×37 bp	92.3%	39.8%	43.0%	84.2%	~32%
	1000 Genome WXS	2×100 bp	95.0%	55.0%	71.0%	77.0%	-
	HapMap WXS	2×101 bp	93.3%	53.3%	84.4%	45.6%	-
	Amplicon seq	2×250 bp	100%	50.0%	55.0%	-	-
2-digit	HapMap RNAseq	2×37 bp	99.1%	71.1%	71.6%	97.3%	97.2%
	1000 Genome WXS	2×100 bp	97.0%	83.0%	85.0%	95.0%	90.0%
	HapMap WXS	2×101 bp	95.6%	78.9%	88.9%	81.1%	93.3%
	Amplicon seq	2×250 bp	100%	95.0%	95.0%	-	-

Accuracies and apparent accuracies are calculated as described in Methods. Accuracies of the existing methods are taken from their original publications where those publications examined the dataset; otherwise they are derived by applying the methods locally (Additional file 1: Table S1, Additional file 4: Table S4, Additional file 5: Table S5 and Additional file 6: Table S6). The four-digit accuracy of seq2HLA on the HapMap RNAseq dataset (~32%) is taken from the main text of its publication [28]. For all other datasets, seq2HLA is applied only at two-digit resolution. The accuracy of seq2HLA predictions is calculated without any p-value threshold, which produces fewer false negatives, and hence higher accuracies, than imposing the p-value cutoff of 0.1 described earlier [28].

ISSN: 1471-2164