Comparison of RNA- or LNA-hybrid oligonucleotides in template-switching reactions for high-speed sequencing library preparation
© Harbers et al.; licensee BioMed Central Ltd. 2013
Received: 14 May 2013
Accepted: 24 September 2013
Published: 30 September 2013
Analyzing the RNA pool or transcription start sites requires effective means to convert RNA into cDNA libraries for digital expression counting. With current high-speed sequencers, it is necessary to flank the cDNAs with specific adapters. Adding template-switching oligonucleotides to reverse transcription reactions is the most commonly used approach when working with very small quantities of RNA even from single cells.
Here we compared the performance of DNA-RNA, DNA-LNA and DNA oligonucleotides in template-switching during nanoCAGE library preparation. Test libraries from rat muscle and HeLa cell RNA were prepared in technical triplicates and sequenced for comparison of the gene coverage and distribution of the reads within transcripts. The DNA-RNA oligonucleotide showed the highest specificity for capped 5′ ends of mRNA, whereas the DNA-LNA provided similar gene coverage with more reads falling within exons.
While confirming the cap-specific preference of DNA-RNA oligonucleotides in template-switching reactions, our data indicate that DNA-LNA hybrid oligonucleotides could potentially find other applications in random RNA sequencing.
KeywordsCAGE Template-switching LNA Transcriptome Quantitative sequencing
New generations of high-speed sequencing instruments in combination with dedicated sample preparation protocols can provide results within a few days. We have developed nanoCAGE to identify each transcript by a single sequencing read [1, 2]. The nanoCAGE protocol is most suitable for digital expression profiling with benchtop sequencers as it can use a lower number of reads than any RNA-seq method and avoids normalization of read counts over transcript length. Including a cap-enrichment step, nanoCAGE focuses on sequencing tags from the 5′ end of mRNA that directly indicate expression at defined genomic locations, the transcription start sites (TSS), after mapping to a reference genome.
NanoCAGE libraries are prepared in a two-step process comprising a reverse transcription reaction followed by stepwise PCR. During the reverse transcription reaction a template-switching oligonucleotide is added to directly introduce 5′ adapter sequences for PCR and Illumina sequencing. Further, the 5′ adapter can include different barcodes to facilitate multiplex sequencing . The 3′ adapter sequences needed for PCR and Illumina sequencing are introduced by the reverse-transcription primers and the later PCR. We prefer the use of random primers to cover also non-polyadenylated transcripts in the libraries.
Template-switching allows for cDNA preparation from very small RNA amounts, where even protocols have been published for the analysis of single-cells [4–6]. However, the selectivity for the cap-dependent enrichment of 5′ ends in mRNA can vary: template-switching has been used in cap-specific reactions for the preparation of full-length cDNA libraries  and 5′ RACE experiments , but also unspecific reactions are possible, for example in the preparation of RNA-seq libraries from fragmented RNA . Depending on the intended use of the template-switching reaction, it is therefore very important to find optimal conditions for cap-specific and cap-independent reverse transcription reactions. Commonly RNA-hybrid oligonucleotides are used in template-switching reactions because they provide cap-selectivity [7, 10]. The use of DNA oligonucleotides in template-switching reactions has also been reported by , but it is hard to draw general conclusions from that work, since, there is no comparison with DNA/RNA hybrids, and the target RNA was prokaryotic, and therefore not capped.
DNA/RNA hybrids have inherent disadvantages compared to DNA oligonucleotides such as their higher price and the intellectual property on their use . We therefore investigated alternatives, with the additional goal of increasing the efficiency or the cap specificity of the reaction. Here we compare nanoCAGE libraries prepared using DNA oligonucleotides as well as RNA- and LNA-hybrid oligonucleotides in the template-switching reactions. LNA (Locked Nucleic Acids) are nucleic acid analogs with a higher DNA and RNA binding affinity than classical DNA oligonucleotides . Therefore we tested whether LNA can mimic RNA-like features in template-switching reactions and increase the performance of nanoCAGE library preparation.
Results and discussion
Yields, amplification, and rates of strand-invasion artifacts in libraries made with the RNA- (R3), DNA- (D3), or LNA-based (L3) TS oligonucleotides
qPCR yield (Ct value)
Semisupressive PCR cycles
Strand invasion (%)
9.1 ± 0.7
73.6 ± 3.4
48.6 ± 6.9
38.6 ± 8.1
35.9 ± 4.2
Next we examined whether all nanoCAGE libraries covered the same set of genes First, we compared the number of different loci detected in each library, after normalizing their sequencing depth to a fixed number of mapped tags by random sampling of 30,000 tags per library. A similar number of genes were detected in each set of technical triplicates (Additional file 1: Table S1), where the coefficient of variation was never higher than 8%. After pooling the triplicates 5,414 (R3), 5,025 (L3) and 5,922 (D3) known loci were identified. Second, we intersected the lists of loci and measured that 2,643 loci were common to all pools. This apparently low overlap is typical in digital expression analysis of samples where the distribution of expression levels is scale-free, because most of the detected loci have low counts, and therefore are likely to be missed even in libraries that are similar by design. To illustrate this, we calculated the number of common loci after replacing the L3 and D3 pools by random samplings of R3 with no replacement, and identified 2,818 common loci. This indicates similar gene coverage by all the libraries despite the low sequencing depth, and also underlines that even with less than 100,000 aligned reads, nanoCAGE libraries can cover thousands different loci. Given the similar gene coverage of the libraries, we then calculated gene-wise Pearson correlation coefficients between the libraries after pooling all the tags aligning to the same gene and obtained correlations ranging between 0.87 and 0.99 (Figure 1C). Lastly, we inspected the significantly over- and under-represented tag clusters in all possible pairwise comparisons of the R3, D3 and L3 triplicates after removal of the strand-invasion artifacts (Additional file 2: Table S2). Using edgeR (Robinson et al., 2010), we found that the R3 libraries had 872 and 626 clusters over-represented compared to L3 and D3 respectively (with false discovery rates lower than 0.05). Respectively, 63 and 69% of the clusters had a known gene symbol and visual inspection confirmed that they corresponded to 5′ regions. The most significantly enriched clusters in R3 compared to L3 were signal recognition particle RNAs (7SL RNA). The same enrichment was also significant, although it did not rank as high, when comparing R3 to D3. Concomitantly, 7SL RNA was also found at significantly higher levels in D3 than in L3. This is the main difference between D3 and L3, where we found only 44 differentially represented clusters in total. Visual inspection of the clusters over-represented in D3 or L3 compared to R3 suggest that they are mostly comprised of remaining strand-invasion artifacts that were below the filter’s threshold, or low-complexity regions that were GT- (D3) or GA-rich (L3), which might reflect the sequence of the barcodes used for these libraries. Altogether, the overlap and differential representation studies confirm that the main differences between the libraries are the prevalence of strand invasion and the distribution of tags within gene models.
To examine whether the specificity of template-switching is dependent on temperature , we varied in the next experiment the temperature of the reverse transcription from 37 to 57°C. To further confirm the robustness of our observations across different RNA samples, we prepared these libraries from total HeLa cell RNA. High temperatures of 52 and 57°C caused the formation of large numbers of oligonucleotide artifacts (tag dust) with all types of oligonucleotides (42 ± 11% of all the reads, Additional file 1: Table S1). The annotation of the aligned reads also indicated that lower temperatures at 37°C or 42°C are preferable for RNA and LNA mediated template-switching reactions (Figure 1D). The experiment confirmed with a human RNA source that shifting from RNA to LNA reduces the promoter rate while increasing the coverage of tags mapping within transcripts regardless of the temperature. We explain the higher number of tags aligning within known regions in this experiment compared with the previous one using rat muscle RNA by the better annotation of the human genome in comparison with the rat genome.
Our results indicate that the cap-specificity of template-switching reactions depends on the nature of the nucleotides interacting with the first-strand cDNA during the template-switching reaction. Interestingly, RNA nucleotides give the highest cap specificity. While DNA/LNA duplexes are also expected to be more stable than DNA/DNA duplexes  they showed clearly reduced cap specificity as compared to RNA nucleotides. Therefore, the binding affinity alone does not explain the cap specificity during template-switching reactions. The yield of template-switched cDNAs was also higher with oligonucleotides ending with R3 compared with D3 and L3 (Table 1). We therefore hypothesize that the RNA template-switching oligonucleotides are more efficient with capped templates. A key difference is that capped templates induce the reverse-transcriptase to extend the first-strand cDNA with cytosines because it reverse-transcribes the cap [16–18], while with non-capped templates, extensions are more rare , or of a different nature [19, 20]. We speculate that only the presence of DNA/RNA base pairs facilitate template-switching at capped ends, and that DNA-DNA, DNA-LNA and DNA-RNA oligonucleotides have the same efficiency on blunt RNA/cDNA hybrids. The higher promoter rate with R3 would therefore reflect an additional production of template-switched cDNAs from capped templates. This dilution of the non-cap-specific signal would also explain the lower proportion of strand invasion in R3.
DNA oligonucleotides have the strongest tendency for strand-invasion, found in almost 75% of the alignments, which makes them impractical for cap selection. A careful assessment of the impact of these strand-invasion artifacts would be necessary before using DNA oligonucleotides for expression profiling. In addition, a large number of the remaining reads did not align in known genes, resulting in a net loss of usable reads for digital expression analysis and for template-switching-based methods in general. Compared to DNA-based libraries, RNA and LNA-based libraries had a deeper coverage over gene bodies, which in the case of LNA comprises mostly non-promoter, exonic signal (Figure 1A). While no longer providing the promoter/TSS specificity of nanoCAGE, LNA-mediated template-switching could become an option in the preparation of digital expression libraries or revised versions of RNA-seq protocols . Further work would be needed to assess whether LNA-mediated template-switching could be preferable for digital expression profiling starting from partially degraded RNA samples, where the cap structure of many transcripts has been lost.
All nanoCAGE libraries were prepared in technical triplicates according to . To simplify the library preparation, we used only two different numbers of PCR cycles (Table 1), instead of optimizing for each sample. As a consequence, the number of reads per sample was not optimally balanced. Another possible reason may be that we pooled the barcoded samples (see below and Additional file 1: Table S1) by using 80 ng of an equimassic mixture (not equimolar) of the cDNA PCR products, as templates for the Library PCR (See Salimullah et al., 2011  for details). Nevertheless, this does not affect our analysis since we either normalized the total number of reads, or used methods that are robust to variations of sequencing depth.
The sequences of the template-switching oligonucleotides sequence were as follows, where bases are desoxynucleotides except signaled by a lower case “l” for LNA and “r” for RNA, and where XXXXXX is a sample barcode of sequence CACTGA, GCTCTC or TCGCGT for R3; ATCGTG, CACGAT or GTATAC for D3; ACAGAT, CTGACG or GAGTGA for L1; AGTAGC, GCTGCA or TCGAGC for L2 and ATCATA, CGATGA or TATAGC for L3. The 8 N bases are a unique molecular identifier [21, 22], which was not used in this analysis.
Reads were sequenced on a MiSeq instrument (Illumina, USA) version 1 loaded at 10 or 12 pM, using single reads of length 70 or 64 nt, and the standard nanoCAGE sequencing primer . The sequences were analyzed on a Debian system  version 6. Barcodes were extracted, linkers were removed, and reads were trimmed to 31 nt with FASTX-toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). Reads resembling empty constructs or oligonucleotide artifacts were removed with TagDust . Reads were then aligned to the rat genome (5.0) and the human genome (hg19) with BWA 0.6.2 . Reads matching the reference ribosomal sequence V01270 (rat) U13369.1 (human) were flagged “failed” with the rRNAdust program, an efficient implementation of Myers’ bit parallel dynamic programming algorithm  using both SIMD instructions and threads (written by T. Lassmann). Adding the mitochondrial rRNA (NCBI gene ID 170603) to the filter would cause the removal of approximately one third more sequences. The alignments were post-filtered to remove strand-invasion artifacts as in . The CAGE tags were then clustered following the same principles as in , with no expression threshold. The filtering and clustering programs are available in the Additional file 3, with example scripts reproducing the digital expression comparison of the R3, L3 and D3 libraries. The genome annotations were retrieved from ENSEMBL’s biomart server (rat build 70) or the UCSC genome browser (human) and we used custom scripts to extract the coordinates of promoters (which we defined as +/− 100 nt flanking the start site of transcript models). Each CAGE tag cluster was annotated against this reference data using BEDTools . The resulting data was mined and processed using R (http://www.r-project.org/). Subsamplings were done with the ‘vegan’ package and graphs were prepared with the ‘ggplot2’ package and assembled with Inkscape (http://inkscape.org/).
Availability of supporting data
The FASTQ files are being deposited to DDBJ and are also available at http://genome.gsc.riken.jp/plessy-20130430/NCms10010.fastq.xz (rat muscle) and http://genome.gsc.riken.jp/plessy-20130430/NCms10013.fastq.xz (HeLa). The CAGE tags were then clustered using scripts available at http://genome.gsc.riken.jp/plessy-20130430/PromoterPipeline_20130430.tar.gz and filtered using the rRNAdust program available at http://genome.gsc.riken.jp/plessy-20130430/rRNAdust_1.02.tar.gz and the script available at http://genome.gsc.riken.jp/plessy-20130430/find_strand_invasion-20130307.pl. A digital expression comparison of the R3, L3 and D3 libraries is available at http://genome.gsc.riken.jp/plessy-20130430/analysis.html and its source is available at http://genome.gsc.riken.jp/plessy-20130430/analysis.Rmd.
Cap analysis gene expression
Locked nucleic acid.
We thank K.K. DNAFORM (Yokohama, Japan) for the kind gift of the rat skeleton muscle RNA sample. This work was supported by a research grant from the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT) for the RIKEN Omics Science Center to Yoshihide Hayashizaki, and by a research grant from MEXT to the RIKEN Center for Life Science Technologies.
RIKEN Omics Science Center ceased to exist as of April 1st 2013, due to RIKEN reorganization.
- Plessy C, Bertin N, Takahashi H, Simone R, Salimullah M, Lassmann T, Vitezic M, Severin J, Olivarius S, Lazarevic D: Linking promoters to functional transcripts in small samples with nanoCAGE and CAGEscan. Nat Methods. 2010, 7 (7): 528-534. 10.1038/nmeth.1470.PubMed CentralView ArticlePubMed
- Salimullah M, Sakai M, Plessy C, Carninci P: NanoCAGE: a high-resolution technique to discover and interrogate cell transcriptomes. Cold Spring Harb Protoc. 2011, 2011 (1): pdb prot5559-PubMed CentralView ArticlePubMed
- Tang DT, Plessy C, Salimullah M, Suzuki AM, Calligaris R, Gustincich S, Carninci P: Suppression of artifacts and barcode bias in high-throughput transcriptome analyses utilizing template switching. Nucleic Acids Res. 2012, 41 (3): e44-PubMed CentralView ArticlePubMed
- Fan JB, Chen J, April CS, Fisher JS, Klotzle B, Bibikova M, Kaper F, Ronaghi M, Linnarsson S, Ota T: Highly parallel genome-wide expression analysis of single mammalian cells. PLoS One. 2012, 7 (2): e30794-10.1371/journal.pone.0030794.PubMed CentralView ArticlePubMed
- Islam S, Kjallquist U, Moliner A, Zajac P, Fan JB, Lonnerberg P, Linnarsson S: Highly multiplexed and strand-specific single-cell RNA 5' end sequencing. Nat Protoc. 2012, 7 (5): 813-828. 10.1038/nprot.2012.022.View ArticlePubMed
- Ramskold D, Luo S, Wang YC, Li R, Deng Q, Faridani OR, Daniels GA, Khrebtukova I, Loring JF, Laurent LC: Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat biotechnol. 2012, 30 (8): 777-782. 10.1038/nbt.2282.PubMed CentralView ArticlePubMed
- Zhu YY, Machleder EM, Chenchik A, Li R, Siebert PD: Reverse transcriptase template switching: a SMART approach for full-length cDNA library construction. Biotechniques. 2001, 30 (4): 892-897.PubMed
- Pinto FL, Lindblad P: A guide for in-house design of template-switch-based 5' rapid amplification of cDNA ends systems. Anal Biochem. 2010, 397 (2): 227-232. 10.1016/j.ab.2009.10.022.View ArticlePubMed
- Cloonan N, Forrest AR, Kolle G, Gardiner BB, Faulkner GJ, Brown MK, Taylor DF, Steptoe AL, Wani S, Bethel G: Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat Methods. 2008, 5 (7): 613-619. 10.1038/nmeth.1223.View ArticlePubMed
- Chenchik A, Zhu Y, Diatchenko L, Siebert P: Methods and compositions for full-length cDNA Cloning using a template-switching oligonucleotide. 1997, U.S. patent number 5962272
- Jepsen JS, Sorensen MD, Wengel J: Locked nucleic acid: a potent nucleic acid analog in therapeutics and biotechnology. Oligonucleotides. 2004, 14 (2): 130-146. 10.1089/1545457041526317.View ArticlePubMed
- Li H, Durbin R: Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010, 26 (5): 589-595. 10.1093/bioinformatics/btp698.PubMed CentralView ArticlePubMed
- Lassmann T, Hayashizaki Y, Daub CO: TagDust–a program to eliminate artifacts from next generation sequencing data. Bioinformatics. 2009, 25 (21): 2839-2840. 10.1093/bioinformatics/btp527.PubMed CentralView ArticlePubMed
- Ouhammouch M, Brody EN: Temperature-dependent template switching during in vitro cDNA synthesis by the AMV-reverse transcriptase. Nucleic Acids Res. 1992, 20 (20): 5443-5450. 10.1093/nar/20.20.5443.PubMed CentralView ArticlePubMed
- Natsume T, Ishikawa Y, Dedachi K, Tsukamoto T, Kurita N: Hybridization energies of double strands composed of DNA, RNA, PNA and LNA. Chem physics letters. 2007, 434 (1–3): 133-138.View Article
- Hirzmann J, Luo D, Hahnen J, Hobom G: Determination of messenger RNA 5'-ends by reverse transcription of the cap structure. Nucleic Acids Res. 1993, 21 (15): 3597-3598. 10.1093/nar/21.15.3597.PubMed CentralView ArticlePubMed
- Ohtake H, Ohtoko K, Ishimaru Y, Kato S: Determination of the capped site sequence of mRNA based on the detection of cap-dependent nucleotide addition using an anchor ligation method. DNA Res. 2004, 11 (4): 305-309. 10.1093/dnares/11.4.305.View ArticlePubMed
- Lavie L, Maldener E, Brouha B, Meese EU, Mayer J: The human L1 promoter: variable transcription initiation sites and a major impact of upstream flanking sequence on promoter activity. Genome Res. 2004, 14 (11): 2253-2260. 10.1101/gr.2745804.PubMed CentralView ArticlePubMed
- Oz-Gleenberg I, Herzig E, Hizi A: Template-independent DNA synthesis activity associated with the reverse transcriptase of the long terminal repeat retrotransposon Tf1. Febs J. 2012, 279 (1): 142-153. 10.1111/j.1742-4658.2011.08406.x.View ArticlePubMed
- Mohr S, Ghanem E, Smith W, Sheeter D, Qin Y, King O, Polioudakis D, Iyer VR, Hunicke-Smith S, Swamy S: Thermostable group II intron reverse transcriptase fusion proteins and their use in cDNA synthesis and next-generation RNA sequencing. Rna. 2013, 19 (7): 958-970. 10.1261/rna.039743.113.PubMed CentralView ArticlePubMed
- Konig J, Zarnack K, Rot G, Curk T, Kayikci M, Zupan B, Turner DJ, Luscombe NM, Ule J: iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nat Struct Mol Biol. 2010, 17 (7): 909-915. 10.1038/nsmb.1838.PubMed CentralView ArticlePubMed
- Kivioja T, Vaharautio A, Karlsson K, Bonke M, Enge M, Linnarsson S, Taipale J: Counting absolute numbers of molecules using unique molecular identifiers. Nat Methods. 2012, 9 (1): 72-74.View Article
- Moller S, Krabbenhoft HN, Tille A, Paleino D, Williams A, Wolstencroft K, Goble C, Holland R, Belhachemi D, Plessy C: Community-driven computational biology with Debian Linux. BMC Bioinformatics. 2010, 12 (11): S5-View Article
- Myers G: A fast bit-vector algorithm for approximate string matching based on dynamic programming. J ACM. 1999, 46 (3): 395-415. 10.1145/316542.316550.View Article
- Suzuki H, Forrest AR, van Nimwegen E, Daub CO, Balwierz PJ, Irvine KM, Lassmann T, Ravasi T, Hasegawa Y, de Hoon MJ: The transcriptional network that controls growth arrest and differentiation in a human myeloid leukemia cell line. Nat gene. 2009, 41 (5): 553-562. 10.1038/ng.375.View Article
- Quinlan AR, Hall IM: BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010, 26 (6): 841-842. 10.1093/bioinformatics/btq033.PubMed CentralView ArticlePubMed
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.