We present an improved version of a previously described cSBT assay, with dramatic improvements in resolution achieved through additional universal primer binding sites in the HLA class I open reading frame and the addition of a universal DRB amplicon; we also present preliminary data from amplicons for additional class II loci. Our results demonstrate a scalable technique for use on either the Roche/454 GS Junior or the GS-FLX platform with Titanium chemistry.
The use of a cDNA template, combined with clonal second-generation pyrosequencing, allows comprehensive genotyping of all HLA class I loci from only two PCR amplifications, instead of the nine or more exon- and locus-specific amplifications required by genomic DNA template methods (both Sanger and Roche/454-based). Other amplicons, such as those presented for DRB and other class II loci, can be added to achieve high-resolution genotyping as needed. The updated method allows dozens or hundreds of patient samples to be genotyped simultaneously for HLA class I (and DRB or other loci) to true high (allele-specific) resolution, using fewer amplicons per sample and at a low per-sample cost. This holds both for large-scale screening projects (thousands of samples) on a GS-FLX instrument and for smaller projects (dozens to hundreds of samples) on a GS Junior instrument. Per-sample cost depends on the level of multiplexing and other factors, but can be estimated at approximately $70 to $100 for a 48- to 96-sample multiplexed experiment on a GS Junior instrument (Additional file 1: Table S10). Our results also demonstrate that cSBT has higher throughput and higher resolution than existing methods, and does not rely on population-specific allele frequency inferences.
Higher throughput than reported here (>92 samples typed at a time) is possible for the entire pre-emPCR process in a laboratory that uses liquid-handling robotics for RNA extraction, cDNA synthesis, PCR amplification, SPRI purification, quantification, and pooling, similar to the methods employed by Erlich et al. Although the cSBT method requires 288 amplicons to type 96 samples for class I and DRB loci, this is a dramatic reduction compared with the >960 amplicons needed for comparably accurate typing from genomic DNA starting material using locus- and exon-specific amplification primers [22, 23, 28].
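The amplicon comparison above is simple arithmetic, sketched below under stated assumptions. The cSBT figure (two class I amplicons plus one DRB amplicon per sample) comes from the text; the genomic DNA figure of ten amplicons per sample is an assumed lower bound chosen only because it reproduces the ">960 amplicons per 96 samples" figure. The constant names are illustrative.

```python
# Per-sample amplicon counts. cSBT: two class I amplicons + one DRB amplicon
# (from the text). Genomic DNA: ASSUMED lower bound of ten locus- and
# exon-specific amplicons per sample, matching the ">960 per 96" figure.
CSBT_PER_SAMPLE = 2 + 1
GDNA_PER_SAMPLE_MIN = 10


def total_amplicons(n_samples: int, per_sample: int) -> int:
    """Number of separate PCR products needed to type a batch of samples."""
    return n_samples * per_sample


batch = 96
csbt = total_amplicons(batch, CSBT_PER_SAMPLE)
gdna = total_amplicons(batch, GDNA_PER_SAMPLE_MIN)
print(f"cSBT: {csbt} amplicons; gDNA: at least {gdna} amplicons")
print(f"fold reduction: at least {gdna / csbt:.1f}x")
```

Run as written, this prints 288 and 960 amplicons and a reduction of at least 3.3-fold; because the genomic DNA count is a lower bound, the true reduction is at least this large.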
An important caveat to the presented data is that the first two genotyping experiments used different levels of multiplexing, and these limited data are insufficient to establish an “ideal” multiplexing level for cSBT. The first GS Junior sequencing run, which multiplexed 116 amplicons (equivalent to 58 class I samples, 39 class I + DRB samples, or 15 class I + all class II loci samples), was highly accurate, and this is the currently recommended multiplexing level. Investigators who consistently recover abundant data may wish to increase multiplexing toward the level of the second sequencing run (271 amplicons multiplexed), which in the presented data performed nearly as well as the less multiplexed run once the minimum sequence-count requirements were lowered.
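Converting an amplicon budget into sample equivalents, as in the 116-amplicon run above, can be sketched as follows. The per-sample counts for the class I and class I + DRB panels follow from the text; the full class II panel is left out because its per-sample amplicon count is not stated here. The sample-equivalents quoted above (58 and 39) match the exact ratio rounded to the nearest sample, whereas planning a real run would take the floor so the pool does not exceed the budget. The function and panel names are illustrative.

```python
import math

# Amplicons per sample for each panel, as described in the text.
# The "all class II loci" panel is omitted: its count is not given here.
PANELS = {"class I": 2, "class I + DRB": 3}


def samples_per_run(amplicons_in_run: int, per_sample: int) -> tuple[float, int]:
    """Return (exact sample-equivalents, whole samples that fit the budget)."""
    exact = amplicons_in_run / per_sample
    return exact, math.floor(exact)


for run_size in (116, 271):  # the two multiplexing levels reported in the text
    for panel, per_sample in PANELS.items():
        exact, whole = samples_per_run(run_size, per_sample)
        print(f"{run_size} amplicons, {panel}: "
              f"{exact:.1f} equivalents, {whole} whole samples")
```

For the 116-amplicon run this gives 58 class I samples and 38.7 class I + DRB equivalents (38 whole samples).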
Another consideration is the availability of high-quality RNA, which cSBT requires. HLA typing facilities have traditionally collected DNA for use with established assays. DNA is easier to collect and store, whereas RNA requires more robust preservation and extraction methods. Limited access to RNA may make cSBT unsuitable for some situations, particularly clinical settings that have access only to poor-quality starting material; in such cases a genomic DNA-based method may be more appropriate. Ongoing studies with collaborators are evaluating saliva, buccal swabs, and tissue biopsies as RNA starting material, and initial results with saliva have yielded successful amplicon generation. However, class II genotyping will likely require whole blood or PBMC as starting material, since class II transcripts are unlikely to be expressed in other tissues.
This general approach should be adaptable to improved second- and third-generation sequencing techniques that may allow even longer amplicons to be sequenced completely [25, 26]. As longer read-length clonal sequencing technologies become available (such as the GS-FLX+ from Roche/454 or SMRT sequencing from Pacific Biosciences), it should be possible to use the exon 1 and exon 7 primer sites described here to amplify a single PCR product instead of the two tiled amplicons currently required. We have already amplified and sequenced this ~1 kb amplicon with Roche/454 Titanium sequencing, although the high-quality read lengths were insufficient to provide complete coverage (data not shown). However, a single amplicon increases the likelihood that a subset of alleles with primer incompatibilities could be missed, whereas the two-tiled amplicon approach produces redundant data from two separate amplifications. The two class I amplicons therefore act as an important control, ensuring that an allele with primer incompatibilities for one amplicon is detected by the other; for example, HLA-B*18:01:01:01 was detected in sample HLA-Ref20 despite a primer mismatch for one of the amplicons. In our highly multiplexed experiment, loss of one amplicon read direction was more common, even for alleles without primer mismatches, and this reduced our ability to resolve some alleles fully, although we could still detect signatures of alleles missing from a single read group. A second class I amplicon also facilitates analysis of apparently homozygous loci, which can be artifacts of poorly amplified or potentially novel alleles. Although the sequencing platform and read length may change, the cSBT primers and many of the analysis methods presented here would be applicable to any cDNA-based assay.
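The control logic of the two tiled class I amplicons can be illustrated with a small sketch. It assumes, hypothetically, that allele calls have already been made separately for each amplicon at a locus; it then takes the union, flags alleles seen in only one amplicon (a possible primer incompatibility or lost read group) and flags loci where only one allele was seen (an apparent homozygote worth reviewing). The data structure, function names and example calls are illustrative, not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class LocusCall:
    locus: str
    alleles: set[str]          # union of alleles detected by either amplicon
    single_amplicon: set[str]  # detected by only one amplicon: check primers
    apparent_homozygote: bool  # only one allele seen: possible dropout/novel


def merge_amplicon_calls(locus: str, amplicon_a: set[str],
                         amplicon_b: set[str]) -> LocusCall:
    """Combine per-amplicon allele calls for one locus (hypothetical input)."""
    alleles = amplicon_a | amplicon_b
    return LocusCall(
        locus=locus,
        alleles=alleles,
        # symmetric difference: alleles present in exactly one amplicon
        single_amplicon=amplicon_a ^ amplicon_b,
        apparent_homozygote=len(alleles) == 1,
    )


# Invented example calls: one allele is missed by the second amplicon.
call = merge_amplicon_calls(
    "HLA-B",
    amplicon_a={"HLA-B*18:01:01:01", "HLA-B*07:02:01"},
    amplicon_b={"HLA-B*07:02:01"},
)
print(sorted(call.single_amplicon))  # ['HLA-B*18:01:01:01']
```

With a single amplicon, the flagged allele in this example would simply be absent and the locus would look homozygous, which is the failure mode the two-amplicon design guards against.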
Methods with more limited coverage than cSBT can lead to incorrect genotype assignments. For instance, in our reference panel we determined that two allele calls originally labeled as HLA-C*07:01 were in fact HLA-C*07:18, which encodes a distinct protein. This illustrates an advantage of our approach, which does not assume that an ambiguously amplified sequence is the most likely allele in a given population. Additionally, our genotyping results were usually accurate to six-digit resolution, an unprecedented achievement for a high-throughput HLA genotyping assay. While it is currently unknown whether synonymous coding variants distinguished at six-digit resolution have differential effects on allele expression or disease correlations, it should now be possible to examine this question.
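The resolution levels discussed above map onto the colon-separated fields of HLA allele names: the first field is the allele group, the second the specific protein, the third synonymous coding differences ("six-digit" resolution) and the fourth non-coding differences. The minimal sketch below parses names and compares them at a chosen number of fields; it deliberately ignores expression suffixes such as N or L, and the function names are illustrative.

```python
def parse_allele(name: str) -> tuple[str, list[str]]:
    """Split e.g. 'HLA-C*07:18' into ('HLA-C', ['07', '18'])."""
    gene, sep, fields = name.partition("*")
    if not sep:
        raise ValueError(f"not an HLA allele name: {name!r}")
    return gene, fields.split(":")


def truncate(name: str, n_fields: int) -> str:
    """Reduce an allele name to its first n_fields fields.

    Two fields correspond to four-digit (protein-level) resolution and three
    fields to six-digit resolution (synonymous coding differences).
    Expression suffixes such as 'N' or 'L' are not handled in this sketch.
    """
    gene, fields = parse_allele(name)
    return f"{gene}*{':'.join(fields[:n_fields])}"


def same_protein(a: str, b: str) -> bool:
    """True if two alleles share gene, allele group and protein (two fields)."""
    return truncate(a, 2) == truncate(b, 2)


print(same_protein("HLA-C*07:01", "HLA-C*07:18"))  # False
print(truncate("HLA-B*18:01:01:01", 3))            # HLA-B*18:01:01
```

The first check reflects the HLA-C*07:01 versus HLA-C*07:18 case: the two names differ in the second field, so they denote different proteins even though they share an allele group.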
Our preliminary investigations into typing other class II loci have shown that the presented primers generate sequenceable products that map to known alleles. However, more work is needed to validate these amplicons properly, as we have not determined the cause of the observed differences between reference SSOP typing and the presented cSBT typing. We provide these data, and the primers used, so that investigators can carry out further work on these loci if desired.
Two other potential applications of expanded cSBT are novel allele discovery and extending the reference sequence length of known alleles. For the first, we have shown that cSBT can effectively screen large cohorts for novel alleles, which can then be characterized with traditional, accepted sequencing methods. For the second, the majority of named HLA class I alleles (3,439 alleles) are known only from exon 2 and 3 sequence. If cSBT were applied to samples carrying these rare alleles, exons 4, 5, and 6 and parts of exons 1 and 7 could be determined quickly, improving the length and quality of the IMGT reference database.
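Identifying which reference entries cSBT could extend amounts to a set difference between the exons a reference sequence already covers and the exons the class I amplicons span. The sketch below assumes a hypothetical mapping from allele name to covered exons (loading it from the IMGT/HLA database is not shown) and considers only exons 2 to 6, which the two tiled amplicons cover in full; partially covered exons 1 and 7 are left out for simplicity.

```python
# Full exons spanned by the two tiled cSBT class I amplicons (exons 2 to 6).
# Exons 1 and 7 are only partly covered, so this sketch ignores them.
CSBT_FULL_EXONS = frozenset({2, 3, 4, 5, 6})


def extension_targets(reference_exons: dict[str, set[int]]) -> dict[str, set[int]]:
    """Map each allele to the full exons cSBT could add to its reference entry.

    `reference_exons` is a hypothetical mapping from allele name to the exons
    present in its reference sequence. Alleles whose reference already covers
    exons 2 to 6 are omitted from the result.
    """
    targets = {}
    for allele, exons in reference_exons.items():
        missing = CSBT_FULL_EXONS - exons
        if missing:
            targets[allele] = set(missing)
    return targets


# Invented entries: one known only from exons 2 and 3, one fully sequenced.
print(extension_targets({"allele_X": {2, 3}, "allele_Y": {1, 2, 3, 4, 5, 6, 7}}))
# {'allele_X': {4, 5, 6}}
```

For an allele known only from exons 2 and 3, the result is exons 4, 5, and 6, matching the gain described in the paragraph above.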