SMART amplification combined with cDNA size fractionation in order to obtain large full-length clones
© Wellenreuther et al; licensee BioMed Central Ltd. 2004
Received: 05 March 2004
Accepted: 15 June 2004
Published: 15 June 2004
cDNA libraries are widely used to identify genes and splice variants, and as a physical resource for full-length clones. Conventionally-generated cDNA libraries contain a high percentage of 5'-truncated clones. Current library construction methods that enrich for full-length mRNA are laborious and involve several enzymatic steps performed on mRNA, which renders them sensitive to RNA degradation. The SMART technique for full-length enrichment is robust but limits the cDNA insert size of the resulting library.
We describe a method to construct SMART full-length enriched cDNA libraries with large insert sizes. Sub-libraries were generated from size-fractionated cDNA with an average insert size of up to seven kb. The percentage of full-length clones was calculated for different size ranges from BLAST results of over 12,000 5'ESTs.
The presented technique is suitable for generating full-length enriched cDNA libraries with large average insert sizes in a straightforward and robust way. The representation of full-coding clones is also high for large cDNAs (70%, 4–10 kb) when high-quality starting mRNA is used.
Full-length cDNA clones are indispensable tools for functional genomics [1]. cDNA libraries are widely used to identify genes and splice variants and as a physical resource for full-length clones. Unfortunately, cDNA libraries constructed according to conventional methods [2] contain a high percentage of 5' truncated clones due to the premature stop of reverse transcription (RT) of the template mRNA. This is especially true for large mRNAs and those tending to form secondary structures. In addition, there is a size bias against large fragments inherent in the cloning procedure. For these reasons, large full-length cDNAs are strongly underrepresented in conventional libraries. Several methods have been developed to construct cDNA libraries that are enriched for full-length cDNAs. Most are based on either RNA oligo ligation to the 5' end of mRNA [3, 4], 5' cap affinity selection via eukaryotic initiation factor 4E [5], or 5' cap biotinylation followed by biotin affinity selection [6, 7]. Common to these methods is that they are laborious and contain several enzymatic steps that must be performed on mRNA. Therefore, they are sensitive to quality loss through RNA degradation. Furthermore, they require high amounts of starting mRNA (5–100 μg depending on method).
Results and Discussion
Generation of size-fractionated full-length enriched cDNA
Polymerase error rate is a major concern in PCR-based library construction techniques. Therefore, it is crucial to perform as few PCR cycles as possible, as each duplication increases the number of introduced errors by a factor of two, assuming a constant error rate of the polymerase used. The Expand™ PCR System we used was tested to have an error rate of 8.5 × 10⁻⁶ [11]. Starting with PolyA+ RNA, we could restrict the number of cycles to 12 to 16. Levesque et al., who also combined SMART cDNA amplification with size fractionation, started with total RNA and performed 45 to 47 cycles in total. In contrast to our approach, where amplification follows size fractionation, they performed 33 cycles before and 12–14 cycles after fractionation. In their study, the obtained sub-libraries were not analysed for insert size range; instead, they were screened with three gene-specific probes [12].
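The influence of cycle number on error load can be illustrated with a rough calculation. The sketch below is not part of the published protocol: it assumes that the quoted error rate is per base pair per duplication, that every final molecule was copied once in every cycle (a pessimistic upper bound), and that errors occur independently. The 5 kb insert length and the function name are hypothetical choices for illustration.

```python
def error_free_fraction(error_rate: float, length_bp: int, cycles: int) -> float:
    """Rough estimate of the fraction of molecules carrying no PCR error.

    Assumptions (not from the paper): error_rate is per bp per duplication,
    and each molecule is copied once in every cycle (pessimistic).
    """
    return (1.0 - error_rate) ** (length_bp * cycles)


ERROR_RATE = 8.5e-6  # Expand PCR System error rate quoted in the text
INSERT_BP = 5000     # hypothetical insert length

for cycles in (12, 16, 45, 47):
    frac = error_free_fraction(ERROR_RATE, INSERT_BP, cycles)
    print(f"{cycles:2d} cycles: about {frac:.0%} of {INSERT_BP} bp inserts error-free")
```

Under these simplifying assumptions, the estimated error-free fraction drops steeply between the 12–16 cycles used here and the 45–47 cycles of the compared study, and the effect grows with insert length.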
Insert size of libraries
In conventionally-constructed libraries, large insert clones are rarely found. This is because very long transcripts often get truncated during cDNA synthesis, and because there is a strong size bias against large fragments inherent in the cloning procedure, i.e. ligation and bacterial transformation. In our strategy, PCR-amplified cDNA size fractions are restriction digested and separately cloned into a plasmid vector to obtain size fraction sub-libraries. To analyse the range of insert sizes within these sub-libraries, clone pools of 5000–10,000 clones were grown in semi-solid agar and plasmid restriction digests of the clone pools were performed. Each sub-library almost exclusively contains inserts within the size range of the corresponding cDNA size fraction that was cloned to produce this sub-library (figure 2, panels A and B). In sub-library 1 for example, most inserts are between 6 and 8 kb. Such inserts are rarely found in conventional libraries.
The full-length enriched cDNA sub-libraries generated according to the protocol described here serve as a clone resource for the cDNA sequencing efforts of the German cDNA Consortium (http://www.dkfz-heidelberg.de/mga/groups.asp?siteID=48). Within this project, over 100,000 5'ESTs have been generated. All sequences are submitted to public databases and clones are available through the German Resource Center for Genome Research (http://www.rzpd.de). To determine the full-length cDNA content of our libraries, 5'ESTs were compared by BLAST against human RefSeq sequences according to parameters specified in the Methods section. The total number of hits to known mRNAs was set as 100%, and the percentage of clones containing the 5' end of the hit was calculated. Likewise, full-ORF content was determined by BLAST analysis against the SWISSPROT database.
cDNA size fractionation has been used previously in two studies to enrich cDNA libraries for full-length clones [12, 14]. In both studies, the sub-libraries were not analysed for insert sizes. Consequently, it remains unclear whether the sub-libraries actually contained the expected range of insert sizes. Levesque et al. [12] also combined the SMART technique with cDNA size fractionation, but did not analyse the overall full-length content; instead, they screened the libraries with three gene-specific probes. Draper et al. [14] calculated the percentage of full-coding clones in size-fractionated libraries from BLAST results of 78 hits in total and down to 3 hits per size range. We calculated the percentage of full-coding clones in the libraries generated according to the presented method from BLAST results of over 12,000 hits in total and between 99 and 3363 hits per size range. The high number of hits for a given size range permits a much more reliable calculation of full-length percentages compared to former studies. Furthermore, because of the large insert size of our sub-libraries, large size ranges can be analysed (up to 10 kb), which had not been analysed before in similar studies [8, 9, 14].
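Why the number of hits per size range matters can be made concrete with a confidence interval for a proportion. The sketch below uses the Wilson score interval, which is our choice for illustration and not a statistic reported in the paper. The sample sizes 3, 99 and 3363 are the per-range hit counts mentioned above, whereas the success counts are hypothetical values chosen to give roughly 70% in each case.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# (hypothetical full-length count, number of hits in the size range)
for successes, n in ((2, 3), (69, 99), (2354, 3363)):
    low, high = wilson_interval(successes, n)
    print(f"n = {n:4d}: {successes / n:.0%} full-length, "
          f"approx. 95% interval {low:.0%} to {high:.0%}")
```

With only three hits the interval spans most of the possible range, whereas with several thousand hits it narrows to a few percentage points.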
The method presented is attractive for the construction of full-length enriched cDNA libraries with large average insert sizes for several reasons. First, there is no additional enzymatic step for the enrichment, which saves time. Second, it is easy to use: it avoids the enzymatic steps on mRNA that other full-length enrichment techniques require and that are extremely critical in terms of mRNA degradation and quantity loss. Third, the cDNA sizing protocol presented is very efficient and can be performed with basic laboratory equipment. cDNA libraries constructed according to the method presented also yield high full-length percentages for large cDNAs/ORFs when high-quality starting mRNA is used.
First strand cDNA was synthesized from 1 μg of mRNA with the "SMART cDNA Library Construction Kit" (Clontech) in a 10 μl reaction according to the manufacturer's protocol. In this reaction, a fraction of fully transcribed first strand cDNA molecules, but not truncated cDNAs, is tagged with a short sequence complementary to the SMART oligo. The SMART oligo sequence (AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGG) and the overhang of the oligo(dT) primer (ATTCTAGAGGCCGAGGCGGCCGACATG [dT]30VN) used for first strand synthesis both include a SfiI restriction site. After first strand synthesis, 40 pmol of 5' PCR primer (corresponding to the SMART oligo sequence) was added and first strand cDNA was denatured for 5 min at 95°C. The reaction was cooled to 60°C and second strand reaction mix was added to give a final concentration of 1x PCR reaction buffer (Expand 20 kbPLUS PCR System, Roche), 0.5 mM dNTPs, and 8.3 U/μl Expand 20 kbPLUS enzyme mix (Roche) in a volume of 60 μl. This second strand reaction mixture was incubated for 3 cycles of 15 min at 60°C and 15 min at 68°C. The second strand reaction was phenol-extracted and cDNA was precipitated from the aqueous phase with 1/2 volume of 7.5 M ammonium acetate and 2.5 volumes of 100% ethanol. The washed pellet was dried and suspended in 10 μl of water. As a quality control, 1 μl was electrophoresed on a 1% agarose gel.
Size fractionation of cDNA
PCR amplification of cDNA size fractions
One μl of each cDNA fraction was amplified in a 10 μl reaction containing a final concentration of 1x PCR reaction buffer (Expand 20 kbPLUS PCR System, Roche), 0.5 mM dNTPs, 0.5 pmol/μl forward primer (AAGCAGTGGTATCAACGCAGAGT), 0.5 pmol/μl reverse primer (ATTCTAGAGGCCGAGGCGGCCGACATG), and 8.3 U/μl Expand 20 kbPLUS enzyme mix (Roche). To perform a manual hot start, the reactions were prepared in two master mixes, one containing buffer and enzyme, the other containing dNTPs, primers, and cDNA. The two master mixes were combined at 92°C. After initial denaturation at 92°C for 3 min, 12–16 cycles (depending on size fraction and second strand cDNA quality and intensity) of 92°C for 10 sec and 68°C for 14 min were performed. PCR products were analysed on an agarose gel, and PCR was repeated in a 50 μl volume with 5 μl cDNA and a fine-tuned cycle number (i.e. reduced for intense products and increased for weak signals). Five μl of the 50 μl reaction were analysed on an agarose gel. The remaining reaction was proteinase K digested, phenol extracted, and precipitated.
Cloning and quality control of sub-libraries
The precipitated amplified cDNA was SfiI-digested in a 40 μl volume. The SfiI digest was gel-purified using low-melting agarose and gelase (Epicentre). DNA was suspended in 10 μl water and the concentration was determined using the PicoGreen reagent (Molecular Probes). 20 fmol of cDNA was ligated to 10 fmol SfiI-digested pSPORT1_Sfi vector (a modified pSPORT vector having the part of the MCS between KpnI and HindIII exchanged by the corresponding part of the pTriplEx2 MCS, so that it contains SfiI sites). For quality control, 5,000–10,000 clones were grown in semi-solid agar (SeaPrep agarose, BMA) and collected by centrifugation. Plasmid DNA was extracted from these clone pools, SfiI-digested, and analyzed on an agarose gel. If the quality was satisfactory, 96 single clones were picked and insert analysis was performed as with the clone pools.
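Because the ligation is set up in molar amounts while PicoGreen quantification yields a mass concentration, the two must be converted into each other for every size fraction. The following sketch shows one way to do this; it assumes the common approximation of 650 g/mol per base pair of double-stranded DNA, and the fragment lengths used in the example are hypothetical.

```python
AVG_BP_MASS = 650.0  # g/mol per bp of dsDNA; a widely used approximation


def fmol_to_ng(fmol: float, length_bp: int) -> float:
    """Mass in ng corresponding to a molar amount of dsDNA of a given length."""
    mol = fmol * 1e-15
    grams = mol * length_bp * AVG_BP_MASS
    return grams * 1e9


def ng_to_fmol(ng: float, length_bp: int) -> float:
    """Molar amount in fmol corresponding to a mass of dsDNA of a given length."""
    return ng * 1e-9 / (length_bp * AVG_BP_MASS) * 1e15


# 20 fmol of insert, for hypothetical mean fraction lengths
for length_bp in (1000, 4000, 7000):
    print(f"20 fmol of {length_bp} bp cDNA = {fmol_to_ng(20, length_bp):.0f} ng")
```

The same molar amount corresponds to several times more mass for the largest fractions than for the smallest, which is why the mean size of each fraction has to enter the calculation.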
Examination of full-length clone content
Libraries were arrayed in 384-well plates and clones were randomly sequenced from the 5' end. 5'ESTs longer than 150 bp were compared to public databases using the BLAST algorithm [15, 16] within the Heidelberg Unix Sequence Analysis Resources (HUSAR; http://genome.dkfz-heidelberg.de/) [17].
5'ESTs were compared to a human subset of RefSeq [18] by BLAST (default parameters, except that a word size of 20 bp was used) to calculate the percentage of full-length cDNA clones. The BLAST outputs were further analysed with the following criteria to find the maximum scoring RefSeq entry: minimum HSP length of 50 bp, start of HSP within the first 100 bp of the 5'EST, end of HSP within the last 15% of the 5'EST length, and sequence identity within the HSP of at least 95%. If several HSPs within the same hit fit these criteria, the more upstream match was chosen. A clone was defined as "full length" when the 5' end of the 5'EST was upstream or up to 50 bp downstream of the start of the corresponding RefSeq entry. This last criterion was chosen to take into account the fact that the transcription start site is variable for most genes [19], or even unknown.
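The filtering and classification rules above can be sketched as follows. This is an illustrative reading of the criteria, not the scripts used in the study: the `Hsp` record and function names are hypothetical, coordinates are assumed to be 1-based, and "more upstream" is interpreted as the HSP with the smallest start position on the RefSeq entry.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hsp:
    q_start: int     # start on the 5'EST (query), 1-based
    q_end: int       # end on the 5'EST, 1-based inclusive
    s_start: int     # start on the RefSeq entry (subject), 1-based
    identity: float  # fraction of identical positions within the HSP


def passes_filter(h: Hsp, est_len: int) -> bool:
    """Apply the four HSP criteria listed in the text."""
    return (h.q_end - h.q_start + 1 >= 50   # minimum HSP length of 50 bp
            and h.q_start <= 100            # starts within first 100 bp of EST
            and h.q_end > 0.85 * est_len    # ends within last 15% of EST
            and h.identity >= 0.95)         # at least 95% identity


def is_full_length(hsps: List[Hsp], est_len: int,
                   tolerance: int = 50) -> Optional[bool]:
    """Return True/False for a classifiable clone, None if no HSP qualifies."""
    kept = [h for h in hsps if passes_filter(h, est_len)]
    if not kept:
        return None
    best = min(kept, key=lambda h: h.s_start)  # assumed meaning of "upstream"
    # Project the first base of the EST onto RefSeq coordinates.
    est_5prime_on_ref = best.s_start - (best.q_start - 1)
    return est_5prime_on_ref <= 1 + tolerance
```

For example, an EST of 420 bp whose HSP starts at EST position 1 and RefSeq position 30 would be classified as full length, whereas the same HSP starting at RefSeq position 200 would not.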
To calculate the percentage of full-ORF clones, a BLAST search of the 5'ESTs against the SWISSPROT database [20] was performed with default parameters. HSPs with a length less than 20 amino acids and sequence identity below 75% were filtered out. A clone was counted as full-ORF when the most upstream HSP of the maximum scoring hit contained the first amino acid of the SWISSPROT entry.
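A corresponding sketch for the full-ORF decision is given below. It rests on assumptions that the text leaves open: an HSP is retained only if it meets both thresholds (at least 20 amino acids and at least 75% identity), "most upstream" refers to the position on the SWISSPROT entry, and the record and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProteinHsp:
    s_start: int     # first residue of the SWISSPROT entry covered, 1-based
    length_aa: int   # HSP length in amino acids
    identity: float  # fraction of identical residues within the HSP


def is_full_orf(hsps: List[ProteinHsp]) -> Optional[bool]:
    """Classify a clone from the HSPs of its maximum scoring SWISSPROT hit."""
    # Assumed reading of the filter: keep HSPs meeting both thresholds.
    kept = [h for h in hsps if h.length_aa >= 20 and h.identity >= 0.75]
    if not kept:
        return None
    most_upstream = min(kept, key=lambda h: h.s_start)
    # Full-ORF if the alignment reaches the first amino acid of the entry.
    return most_upstream.s_start == 1


def percent_full_orf(calls: List[Optional[bool]]) -> float:
    """Percentage of full-ORF clones among all classifiable clones."""
    classified = [c for c in calls if c is not None]
    return 100.0 * sum(classified) / len(classified)
```

Clones without any qualifying HSP are excluded from the denominator here, mirroring the convention of setting the total number of hits to 100%.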
- SMART: Switching Mechanism At 5' end of RNA
- UTR: untranslated region
- ORF: open reading frame
- RT: reverse transcription
- HSP: High-scoring Segment Pair
We thank Daniela Heiss and Nina Claudino for excellent technical assistance, clone picking, and management of the libraries, and Patricia McCabe for critical reading of the manuscript.
1. Wiemann S, Mehrle A, Bechtel S, Wellenreuther R, Pepperkok R, Poustka A: cDNAs for functional genomics and proteomics: the German Consortium. C R Biol. 2003, 326: 1003-1009. 10.1016/j.crvi.2003.09.036.
2. Gubler U, Hoffman BJ: A simple and very efficient method for generating cDNA libraries. Gene. 1983, 25: 263-269.
3. Kato S, Sekine S, Oh SW, Kim NS, Umezawa Y, Abe N, Yokoyama-Kobayashi M, Aoki T: Construction of a human full-length cDNA bank. Gene. 1994, 150: 243-250. 10.1016/0378-1119(94)90433-2.
4. Suzuki Y, Yoshitomo-Nakagawa K, Maruyama K, Suyama A, Sugano S: Construction and characterization of a full length-enriched and a 5'-end-enriched cDNA library. Gene. 1997, 200: 149-156. 10.1016/S0378-1119(97)00411-3.
5. Edery I, Chu LL, Sonenberg N, Pelletier J: An efficient strategy to isolate full-length cDNAs based on an mRNA cap retention procedure (CAPture). Mol Cell Biol. 1995, 15: 3363-3371.
6. Carninci P, Hayashizaki Y: High-efficiency full-length cDNA cloning. Methods Enzymol. 1999, 303: 19-44. 10.1016/S0076-6879(99)03004-9.
7. Carninci P, Kvam C, Kitamura A, Ohsumi T, Okazaki Y, Itoh M, Kamiya M, Shibata K, Sasaki N, Izawa M, Muramatsu M, Hayashizaki Y, Schneider C: High-efficiency full-length cDNA cloning by biotinylated CAP trapper. Genomics. 1996, 37: 327-336. 10.1006/geno.1996.0567.
8. Zhu YY, Machleder EM, Chenchik A, Li R, Siebert PD: Reverse transcriptase template switching: a SMART approach for full-length cDNA library construction. Biotechniques. 2001, 30: 892-897.
9. Sugahara Y, Carninci P, Itoh M, Shibata K, Konno H, Endo T, Muramatsu M, Hayashizaki Y: Comparative evaluation of 5'-end-sequence quality of clones in CAP trapper and other full-length-cDNA libraries. Gene. 2001, 263: 93-102. 10.1016/S0378-1119(00)00557-6.
10. Carninci P, Shibata Y, Hayatsu N, Itoh M, Shiraki T, Hirozane T, Watahiki A, Shibata K, Konno H, Muramatsu M, Hayashizaki Y: Balanced-size and long-size cloning of full-length, cap-trapped cDNAs into vectors of the novel lambda-FLC family allows enhanced gene discovery rate and functional analysis. Genomics. 2001, 77: 79-90. 10.1006/geno.2001.6601.
11. Frey B, Suppmann B: Demonstration of the Expand PCR System's Greater Fidelity and Higher Yield with a lacI-based PCR Fidelity Assay. Biochemica. 1995, 2: 8-9.
12. Levesque V, Fayad T, Ndiaye K, Nahe Diouf M, Lussier JG: Size-selection of cDNA libraries for the cloning of cDNAs after suppression subtractive hybridization. Biotechniques. 2003, 35: 72-78.
13. Wiemann S, Weil B, Wellenreuther R, Gassenhuber J, Glassl S, Ansorge W, Bocher M, Blocker H, Bauersachs S, Blum H, Lauber J, Dusterhoft A, Beyer A, Kohrer K, Strack N, Mewes HW, Ottenwalder B, Obermaier B, Tampe J, Heubner D, Wambutt R, Korn B, Klein M, Poustka A: Toward a Catalog of Human Genes and Proteins: Sequencing and Analysis of 500 Novel Complete Protein Coding Human cDNAs. Genome Res. 2001, 11: 422-435. 10.1101/gr.GR1547R.
14. Draper MP, August PR, Connolly T, Packard B, Call KM: Efficient cloning of full-length cDNAs based on cDNA size fractionation. Genomics. 2002, 79: 603-607. 10.1006/geno.2002.6738.
15. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410. 10.1006/jmbi.1990.9999.
16. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
17. Senger M, Flores T, Glatting K, Ernst P, Hotz-Wagenblatt A, Suhai S: W2H: WWW interface to the GCG sequence analysis package. Bioinformatics. 1998, 14: 452-457. 10.1093/bioinformatics/14.5.452.
18. Pruitt KD, Tatusova T, Maglott DR: NCBI Reference Sequence project: update and current status. Nucleic Acids Res. 2003, 31: 34-37. 10.1093/nar/gkg111.
19. Suzuki Y, Taira H, Tsunoda T, Mizushima-Sugano J, Sese J, Hata H, Ota T, Isogai T, Tanaka T, Morishita S, Okubo K, Sakaki Y, Nakamura Y, Suyama A, Sugano S: Diverse transcriptional initiation revealed by fine, large-scale mapping of mRNA start sites. EMBO Rep. 2001, 2: 388-393.
20. Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, Pilbout S, Schneider M: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 2003, 31: 365-370. 10.1093/nar/gkg095.
21. Pesole G, Liuni S, Grillo G, Licciulli F, Mignone F, Gissi C, Saccone C: UTRdb and UTRsite: specialized databases of sequences and functional elements of 5' and 3' untranslated regions of eukaryotic mRNAs. Update 2002. Nucleic Acids Res. 2002, 30: 335-340. 10.1093/nar/30.1.335.
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.