Non-coding antisense transcription detected by conventional and single-stranded cDNA microarray
© Vallon-Christersson et al; licensee BioMed Central Ltd. 2007
Received: 05 March 2007
Accepted: 29 August 2007
Published: 29 August 2007
Recent studies have revealed that many mammalian protein-coding genes also transcribe their complementary strands. This phenomenon raises questions regarding the validity of data obtained from double-stranded cDNA microarrays, since hybridization to both strands may occur. Here, we analyzed experimentally the incidence of antisense transcription in human cells and estimated its influence on protein-coding expression patterns obtained with double-stranded microarrays. To this end, we profiled sense and antisense transcription independently using strand-specific cDNA microarrays.
Up to 88% of expressed protein-coding loci displayed concurrent expression from the complementary strand. Antisense transcription was cell specific and showed a strong tendency to be positively correlated with the expression of the sense counterparts. Although their expression is widespread, detected antisense signals seem to have a limited distorting effect on sense profiles obtained with double-stranded probes.
Antisense transcription in humans can be far more common than previously estimated. However, it has limited influence on expression profiles obtained with conventional cDNA probes. This can be explained by a biological phenomenon and a bias of the technique: a) coordinated variation of sense and antisense expression and b) a bias for sense hybridization to occur with higher efficiency, presumably due to variable exonic overlap between antisense transcripts and the probes.
Non-coding RNAs have recently been reported to be more common and more diverse, and to have more important functions, than previously anticipated [1–3]. Among the most abundant non-coding transcripts is a group called natural antisense transcripts (NATs), which carry regions of perfect complementarity to protein-coding (sense) RNAs [4–7]. In silico studies of available transcript sequence data have found that up to 24% of human protein-coding loci also encode cis-NATs [8, 9]. However, antisense transcripts tend to be poly(A) negative and nuclear localized. If this is true, the abundance of NATs (cis and trans) may be higher still, since nuclear non-polyadenylated transcripts are underrepresented in transcript sequence databases.
The abundance of NATs may have important implications for researchers, not only because of their potential biological function but also because they may influence the interpretation of large experimental data sets. For instance, the cDNA microarray technique has been used in genome-wide expression studies to address basic questions about gene function and in the pursuit of a more precise molecular classification of tumors. In this case, the ability to monitor the expression of thousands of genes simultaneously has allowed the identification of disease-specific subsets of genes useful for improving diagnosis and disease management. The majority of the more than 90,000 microarray expression profiles released through NCBI were obtained with double-stranded cDNA capture probes and are assumed to reflect purely the expression of the sense transcripts used as templates for cDNA synthesis. However, the widespread expression of NATs invalidates this assumption, since double-stranded probes will show the combined expression of both the intended sense target and any NAT with complementary sequence [12, 13]. Still, in nine out of ten cases, signals from double-stranded cDNA probes correlate with those obtained from sense-specific oligonucleotide platforms. Based on these observations, we reasoned that either antisense transcripts are not efficiently detected by conventional cDNA capture probes or important information is hidden behind this paradox.
Therefore, we modeled a typical cDNA microarray tumor-classification analysis and compared the results from conventional double-stranded cDNA capture probes with those from single-stranded cDNA capture probes capable of monitoring the opposite strands of each cDNA independently. We detected a number of antisense signals that far exceeds the number of known antisense transcripts. The detected signals showed a clear cell-specific expression pattern, with a common core group of antisense transcripts expressed in all analyzed materials. Moreover, antisense transcripts displayed a prevalent tendency to be positively correlated with the expression of their corresponding sense counterparts. This supports the idea that a large part of the data obtained from conventional double-stranded cDNA microarrays are in fact compound signals produced by both sense and antisense hybridization. Yet, detection of antisense transcription by conventional double-stranded cDNA microarrays does not strongly distort the relationship between the expression profiles of the analyzed samples compared with those obtained from pure sense signals. This is most likely due to the observed coordinate regulation of sense and antisense transcripts and to a more efficient hybridization of sense strands, because the exon structure of antisense transcripts differs from that of the sense transcripts used for cDNA synthesis.
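The reasoning above can be illustrated with a small numerical sketch. It assumes a deliberately simple additive model, which is not taken from the study: the signal of a double-stranded probe is the sense signal plus the antisense signal scaled by a hypothetical relative hybridization efficiency (`antisense_efficiency`). All intensities are invented for illustration.

```python
import math


def double_stranded_signal(sense, antisense, antisense_efficiency=1.0):
    """Hypothetical additive model of a double-stranded probe signal.

    antisense_efficiency is an assumed factor (0..1) describing how well
    antisense targets hybridize relative to sense targets.
    """
    return sense + antisense_efficiency * antisense


def log2_ratio(sample, reference):
    """Log2 ratio of sample intensity to reference intensity."""
    return math.log2(sample / reference)


# Invented intensities: the sense transcript is twofold up in the sample.
sense_sample, sense_reference = 200.0, 100.0
pure = log2_ratio(sense_sample, sense_reference)

# Case 1: antisense varies coordinately (a fixed fraction of sense).
fraction = 0.5
coordinated = log2_ratio(
    double_stranded_signal(sense_sample, fraction * sense_sample),
    double_stranded_signal(sense_reference, fraction * sense_reference),
)

# Case 2: antisense is constant across channels (not coordinated).
discordant = log2_ratio(
    double_stranded_signal(sense_sample, 50.0),
    double_stranded_signal(sense_reference, 50.0),
)

# Case 3: as case 2, but antisense hybridizes less efficiently.
damped = log2_ratio(
    double_stranded_signal(sense_sample, 50.0, antisense_efficiency=0.2),
    double_stranded_signal(sense_reference, 50.0, antisense_efficiency=0.2),
)

print(pure, coordinated, discordant, damped)
```

Under this assumed model, a coordinately varying antisense signal cancels out of the ratio, and a lower antisense hybridization efficiency reduces the distortion caused by a non-coordinated antisense signal. Both effects point in the direction described in the text, although the real hybridization behavior is certainly more complex.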
Results and discussion
Production of single-stranded microarrays
Strand-specific cDNA microarrays
Antisense detection of N probes hybridized with total RNAs
Next, we modeled an experimental design typically used for tumor classification studies. We performed hybridizations using direct-labeled randomly primed total RNA extracted from eight breast-cancer cell lines and one derived from normal breast epithelium against Universal Human Reference RNA as a common reference following standard protocols (see Methods). To control for even processing into strand-specific probes we printed internal β-lactamase DNA control spots in each block and included β-lactamase spike-in sense and antisense control targets in either samples or reference. Their hybridization signals confirmed that detected N signals reflect hybridization to the N probe over the entire array (Figure 2c).
Table: Hybridization signals detected in sample channel and reference channel. Column headings: Detected N signals; Masked IMAGE ID; Putative antisense signals; repeats, double, or self; % paired N signal.
Relationship between C and N signals
Single-stranded compared to conventional double stranded cDNA arrays
To test this conjecture, we treated the C, N and CN profiles from each cell line as part of an independent experiment. Expression data from each probe set were median-centered separately and their expression profiles were compared by cluster analysis (Figure 5a). All cell lines segregated in a similar manner, independent of the nature of the probe. Moreover, the relationship between the expression patterns detected by C, N and CN capture probes was similar for all nine samples. C and CN consistently segregated close to each other, leaving N probes as outliers. Across all experiments, we observed a smaller distance between C and CN expression profiles than between either of these profiles and N. This could be explained by the fact that N probes are derived from the cDNAs of sense transcripts and are not mirror copies of NATs. Therefore, mature antisense transcripts and N probes would not be perfectly complementary but, assuming that all immature sense and antisense transcripts undergo splicing, would display restricted regions of exon overlap. This situation should favor the hybridization dynamics of C probes, since renaturation kinetics depend directly on the length of complementarity with the immobilized probes.
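The comparison described above can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in the study (which relied on hierarchical clustering in TMeV): each gene is median-centered across samples within its probe set, and profiles are then compared with a Pearson correlation distance. The function names and all toy values are hypothetical; the toy values were chosen to mimic the reported pattern.

```python
from statistics import median


def median_center(values):
    """Subtract the median of one gene's values across samples."""
    m = median(values)
    return [v - m for v in values]


def center_probe_set(matrix):
    """Median-center every gene of one probe set (gene -> values per sample)."""
    return {gene: median_center(vals) for gene, vals in matrix.items()}


def sample_profile(matrix, sample_index):
    """Expression profile of one sample across all genes, in sorted gene order."""
    return [matrix[gene][sample_index] for gene in sorted(matrix)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlation_distance(x, y):
    """Distance used for comparing profiles: 1 - Pearson correlation."""
    return 1.0 - pearson(x, y)


# Invented log2 ratios: four genes (rows) in three cell lines (columns).
toy = {
    "C": {"g1": [1.0, 0.0, -1.0], "g2": [0.5, 1.5, 0.0],
          "g3": [-1.0, 0.0, 2.0], "g4": [0.2, -0.4, 0.6]},
    "CN": {"g1": [0.9, 0.1, -0.8], "g2": [0.4, 1.2, 0.1],
           "g3": [-0.8, 0.1, 1.7], "g4": [0.3, -0.2, 0.5]},
    "N": {"g1": [0.3, 0.2, -0.5], "g2": [0.1, 0.9, -0.2],
          "g3": [0.2, -0.3, 0.8], "g4": [-0.1, 0.0, 0.4]},
}

# Each probe set is centered separately, as described in the text.
centered = {name: center_probe_set(m) for name, m in toy.items()}

# Compare the three probe types for the first cell line.
profile_c = sample_profile(centered["C"], 0)
profile_cn = sample_profile(centered["CN"], 0)
profile_n = sample_profile(centered["N"], 0)
d_c_cn = correlation_distance(profile_c, profile_cn)
d_c_n = correlation_distance(profile_c, profile_n)
print(d_c_cn, d_c_n)
```

Centering each probe set separately removes probe-type-specific offsets before profiles are compared, so the distances reflect the shape of the expression pattern rather than absolute signal levels.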
In spite of stringent signal-to-noise ratio criteria, our strand-specific cDNA microarrays detected a number of antisense transcripts that far exceeds the number of previously annotated antisense genes in humans. Partial validation and the well-defined expression patterns observed suggest that a considerable fraction of these signals might represent bona fide unannotated antisense transcripts (both cis and trans). These NATs are expressed in a cell-specific manner and displayed a strong tendency to follow the expression pattern of their sense counterparts.
Since antisense transcription data are embedded in double-stranded cDNA array experiments, they are expected to affect signals and gene clusters, and would make validation of array results by other means difficult. We analyzed this issue and found that even if antisense transcription is genome-wide, it exerts a restricted influence on the interpretation of conventional cDNA microarray data. Today, these problems can be circumvented in well-annotated genomes because strand-specific expression can be discriminated by the use of oligonucleotide capture probes. However, strand-specific oligo design is currently hampered by the limited availability of antisense sequence data, as shown here. Although reassuring with regard to the overall validity of cDNA microarrays in previous tumour-classification studies, our results emphasize the need for further development of methods that accurately measure strand-specific expression.
Preparation of capture probes
To produce cDNA arrays, 960 full-length cDNA clone inserts from the MGC collection (plates IRAT3, IRAT33, IRAU2, IRAU19, IRAU31, IRAU44, IRAU46, IRAU62, IRAU68 and IRAU71) were amplified three times using different combinations of 5'-amino modified M13 universal sequencing primers (GTTGTAAAACGACGGCCAGTG forward and CACACAGGAAACAGCTATG reverse). One reaction set used 5'-amino C-6 link modified M13 forward primer paired with a non-modified reverse primer. A second reaction used 5'-amino C-6 link modified M13 reverse primer paired with a non-modified forward primer. A third reaction used 5'-amino C-6 link modified primers in both orientations to produce double-stranded probes. Amino-modified M13 forward primers were used to produce sense (coding) strand probes and M13 reverse primers to produce non-coding antisense strand probes in all reactions except those corresponding to plate IRAU3 (cloned into pCMV-SPORT6 with the cloning site in opposite orientation). To produce capture probes for the test array used in Figure 1, a one kilobase (kb) fragment from the β-lactamase gene was amplified using the same procedure but with plasmid DNA as template and primers GGCACCTATCTCAGCGATCT and GCGGAACCCCTATTTGTTTA.
After amplification, all PCR products were precipitated with ethanol and resuspended in phosphate buffer at approximately 100 ng/μl for further deposition on arrays. Triplicates of sense and antisense amino-modified β-lactamase PCR products were also deposited on each of the blocks of the cDNA array to monitor the processing of double-stranded DNA to single-stranded (ssDNA).
Amino-modified PCR products were spotted onto CodeLink activated slides (Amersham) using a MicroGrid II array robot (BioRobotics). Capture probes on slides used in Figure 1 were printed as 10 × 10 replicates. The test slides also included background control spots produced by processing PCR reactions without template (PCR primers with no amplification product).
After printing, slides were coupled in a saturated NaCl chamber overnight at room temperature and blocked in 50 mM ethanolamine, 0.1 M Tris (pH 9) for 30 min at 50°C, followed by washes in distilled water and 4 × SSC/0.1% SDS at 50°C. The deposited dsDNA was subsequently digested in situ with T7 exonuclease 6. Slides were overlaid with 45 μl exonuclease reaction mixture (1 U/μl T7 exonuclease 6 in 1× reaction buffer, New England Biolabs), covered with a coverslip and left to incubate for 30 minutes at 25°C in a CMT Corning hybridization chamber. Following in situ digestion, probes were denatured by immersing the arrays in boiling water for two minutes.
To produce targets for hybridization of the test array shown in Figure 1, the T7 RNA polymerase promoter was incorporated by PCR in the sense or antisense orientation relative to the β-lactamase gene, and the corresponding transcripts were synthesized by in vitro transcription. For the cDNA arrays, total RNAs were extracted from cell lines ZR-75-30, BT-474, SKBR-3, MDA-361, UACC-812, UACC-893, CAMA1, MDA453 and MCF-10 with Trizol (Invitrogen) and purified with Qiaex (Qiagen), and integrity was checked on a Bioanalyzer (Agilent). Targets of in vitro transcribed β-lactamase, total RNA from cell lines, or Human Reference RNA (Stratagene) were direct labeled by random priming (Promega, Pronto). Spike-in controls were included together with the cell line hybridization mixtures. For the spike-in, 1 ng of 5' amino-modified 50-mer DNA oligos with sequence complementarity to either the sense or antisense strand of the β-lactamase fragment was incubated in 0.3 M hydroxylamine at room temperature with monofunctional Cyanine5 or Cyanine3 reactive dyes (Amersham). The labeled long-mers were mixed into each breast cell line target (Cy3 label) or into Human Reference RNA (Cy5 label). A similar labeling method was used for the M13 forward and M13 universal sequencing primers used to validate the single-stranded nature of the array (Figure 2a–b). All hybridizations were performed in 4 × SSC, 0.1% SDS, with 0.5 μg/μl human Cot-1 DNA at 42°C overnight. Washes following hybridization were three times in 4 × SSC at room temperature (RT), twice in 2 × SSC/0.1% SDS at hybridization temperature, once in 0.2 × SSC and finally once in 0.1 × SSC at RT. Slides were dried by centrifugation and scanned.
Identification of potential non-specific cross-hybridization
Sequence information for the MGC clone inserts was retrieved from the MGC home page. Repeat sequences were identified using RepeatMasker (A.F.A. Smit, R. Hubley & P. Green, unpublished data). Clone inserts with repeats for which the product of repeat length and similarity to the consensus sequence [1 – divergence] was greater than 40 were identified as repeat containing. Self-complementarity of strands was identified by aligning the reverse complement of the clone insert sequence against itself (NCBI blast; word size = 7, e-value cut-off = 1000). Sequences with self-complementary matches of 30 base pairs or more (> 79% identity) were identified as self-complementary. Clone insert sequences and their reverse complements for all cDNAs were also aligned against the RefSeq mRNA database to identify sequences for which both the sense and antisense matched the same RefSeq mRNA over at least 30 base pairs with > 80% identity (NCBI blast, word size = 7, e-value cut-off = 1000).
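Two elements of the masking procedure above lend themselves to a short sketch: the repeat criterion (repeat length multiplied by [1 – divergence] greater than 40) and the reverse complement used as the query for the self-complementarity and RefSeq alignments. The function names are hypothetical, divergence is assumed to be expressed as a fraction between 0 and 1, and the alignments themselves (performed with NCBI blast in the study) are not reproduced here.

```python
# Complement table for unambiguous DNA bases (plus N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def repeat_score(repeat_length, divergence):
    """Repeat length weighted by similarity to the consensus.

    divergence is assumed to be a fraction (0.10 means 10% divergence).
    """
    return repeat_length * (1.0 - divergence)


def is_repeat_containing(repeats, threshold=40.0):
    """Flag a clone insert if any of its repeats scores above the threshold.

    repeats is a list of (repeat_length, divergence) pairs, for example
    parsed from RepeatMasker output (parsing is not shown).
    """
    return any(repeat_score(length, div) > threshold for length, div in repeats)


# Invented example: one insert with a well-conserved repeat, one without.
flagged = is_repeat_containing([(50, 0.10), (20, 0.05)])
not_flagged = is_repeat_containing([(50, 0.30)])

# Query strand that would be aligned against the insert itself or RefSeq.
m13_forward = "GTTGTAAAACGACGGCCAGTG"
query = reverse_complement(m13_forward)
print(flagged, not_flagged, query)
```

Weighting repeat length by similarity means that a short, well-conserved repeat and a longer, more divergent one can both exceed the cut-off, which matches the intent of masking probes likely to cross-hybridize.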
Hybridized arrays were scanned using an Axon 4000A scanner (Axon Instruments). Acquired TIFF images were analyzed, and individual spots were flagged as not found, found, or bad, in GenePix Pro 4 (Axon Instruments). The quantified data matrix was saved as a GenePix Results File (gpr) and loaded into a local installation of BioArray Software Environment (BASE). Subsequent pre-processing steps (within-slide normalization, data filtration, and transformations) were performed within BASE. Median foreground pixel intensities for spots were adjusted by subtracting median background pixel intensities. Spots flagged as not found or bad during image analysis, or considered saturated (containing more than 5% saturated pixels in either signal), were removed. Data within arrays were normalized to the median log2 ratio of sample intensity to reference intensity. The median log2 ratio was calculated using spots with both signal intensities above 100 and excluding the 5% highest and 5% lowest log2 ratios. Spots with both signal-to-noise levels below 10 were removed and replicated spots were merged. Hierarchical cluster analysis was performed using TMeV.
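The pre-processing steps above can be sketched in a few lines. This is an illustration under stated assumptions and not the BASE implementation: the spot dictionary layout and field names are hypothetical, signal-to-noise values are taken as given inputs because their definition is not stated here, spots with non-positive background-corrected intensities are skipped so that the log transform is defined, and merging of replicated spots is omitted.

```python
import math
from statistics import median


def trimmed_median_log2_ratio(spots, min_intensity=100.0, trim=0.05):
    """Median log2(sample/reference) over spots with both intensities above
    min_intensity, after dropping the highest and lowest `trim` fraction.
    Raises statistics.StatisticsError if no spot qualifies."""
    ratios = sorted(
        math.log2(s["sample"] / s["reference"])
        for s in spots
        if s["sample"] > min_intensity and s["reference"] > min_intensity
    )
    k = int(len(ratios) * trim)
    return median(ratios[k:len(ratios) - k])


def preprocess(raw_spots, min_snr=10.0):
    """Return {spot id: normalized log2 ratio} for one array."""
    usable = []
    for spot in raw_spots:
        # Remove spots flagged as not found or bad, and saturated spots.
        if spot["flag"] != "found" or spot["saturated"]:
            continue
        # Background-adjust the median foreground intensities.
        sample = spot["sample_fg"] - spot["sample_bg"]
        reference = spot["ref_fg"] - spot["ref_bg"]
        if sample <= 0 or reference <= 0:
            continue  # assumption: undefined log ratios are dropped
        usable.append({**spot, "sample": sample, "reference": reference})

    # Within-array normalization to the trimmed median log2 ratio.
    center = trimmed_median_log2_ratio(usable)

    normalized = {}
    for spot in usable:
        # Remove spots where both channels fall below the SNR threshold.
        if spot["sample_snr"] < min_snr and spot["ref_snr"] < min_snr:
            continue
        ratio = math.log2(spot["sample"] / spot["reference"])
        normalized[spot["id"]] = ratio - center
    return normalized


def make_spot(spot_id, sample_fg, ref_fg, flag="found", snr=50.0):
    """Helper building an invented spot record with background 100."""
    return {"id": spot_id, "flag": flag, "saturated": False,
            "sample_fg": sample_fg, "sample_bg": 100.0,
            "ref_fg": ref_fg, "ref_bg": 100.0,
            "sample_snr": snr, "ref_snr": snr}


raw = [
    make_spot("a", 1100.0, 600.0),
    make_spot("b", 500.0, 500.0),
    make_spot("c", 300.0, 500.0),
    make_spot("d", 900.0, 400.0, flag="bad"),
    make_spot("e", 150.0, 150.0, snr=2.0),
]
result = preprocess(raw)
print(result)
```

Restricting the normalization factor to well-measured spots and trimming the extremes keeps it from being dominated by weak or strongly regulated spots, while the factor is still applied to every spot that passes the flag filter.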
Northern blot hybridization and RT-PCR validation experiments
For northern blots, 25 μg of total Human Reference RNA (Stratagene) was loaded on 1.3% agarose gels containing formaldehyde. RNA Ladder High Range (Fermentas Life Sciences) was used as molecular weight standard. Electrophoresis was run in 1 × MOPS. The separated RNAs were transferred onto Hybond N+ nylon membrane by capillary transfer and subsequently cross-linked under UV light. 32P-radiolabelled strand-specific probes were synthesized by in vitro transcription (Riboprobe System, Promega). Hybridizations were performed in 50% (v/v) formamide, 5× SSPE, 5× Denhardt's, 0.5% SDS, 100 μg/ml boiled salmon sperm DNA at 42°C overnight. The blots were subsequently washed three times in 2 × SSC/0.1% SDS at room temperature and twice for 10 min in 0.1 × SSC/0.1% SDS at 65°C.
For RT-PCR, first strand cDNAs from each cell line were prepared from 500 ng total RNA. In all cases, reverse transcription was primed with a mixture of random hexamers using the Transcriptor First Strand cDNA Synthesis Kit (Roche Applied Sciences). One twentieth of the cDNA synthesis reaction was used as template in each PCR reaction. PCR reactions were primed with BC006233_for 5' TGTTTGTCAGCAAAGATGTGG and BC006233_rev 5' CTGGATGGAGGGGAGAAG. Expand High Fidelity polymerase (Roche Applied Sciences) was used for all reactions. Cycling conditions were: 2 min at 94°C, 20 seconds at 94°C, 30 seconds at 55°C, 2 minutes at 72°C and a final extension of 10 minutes at 72°C. PCR products were resolved by 1% agarose gel electrophoresis and band intensities were measured with a Kodak 1D Image Analysis Software system.
We thank Amilcar Flores for fruitful earlier discussions, Jessica Ahlsiö for skilled technical assistance, and Carsten Peterson, Mattias Höglund and Timothy Hemesath for their comments on the manuscript.
This work was supported by grants from The Royal Physiographic Society in Lund, Erik Philip-Sörensen Foundation, Berta Kamprad Foundation, SWEGENE, American Cancer Society, and the Swedish Cancer Society.
- Storz G: An expanding universe of noncoding RNAs. Science. 2002, 296: 1260-1263. 10.1126/science.1072249.
- Pang KC, Stephen S, Engstrom PG, Tajul-Arifin K, Chen W, Wahlestedt C, Lenhard B, Hayashizaki Y, Mattick JS: RNAdb – a comprehensive mammalian noncoding RNA database. Nucleic Acids Res. 2005, 33: D125-D130. 10.1093/nar/gki089.
- Huttenhofer A, Schattner P, Polacek N: Non-coding RNAs: hope or hype?. Trends Genet. 2005, 21: 289-297. 10.1016/j.tig.2005.03.007.
- Lavorgna G, Dahary D, Lehner B, Sorek R, Sanderson CM, Casari G: In search of antisense. Trends Biochem Sci. 2004, 29: 88-94. 10.1016/j.tibs.2003.12.002.
- Lehner B, Williams G, Campbell RD, Sanderson CM: Antisense transcripts in the human genome. Trends Genet. 2002, 18: 63-65. 10.1016/S0168-9525(02)02598-2.
- Shendure J, Church GM: Computational discovery of sense-antisense transcription in the human and mouse genomes. Genome Biol. 2002, 3: research0044. 10.1186/gb-2002-3-9-research0044.
- Yelin R, Dahary D, Sorek R, Levanon EY, Goldstein O, Shoshan A, Diber A, Biton S, Tamir Y, Khosravi R, et al: Widespread occurrence of antisense transcription in the human genome. Nat Biotech. 2003, 21: 379-386. 10.1038/nbt808.
- Chen J, Sun M, Kent WJ, Huang X, Xie H, Wang W, Zhou G, Shi RZ, Rowley JD: Over 20% of human transcripts might form sense-antisense pairs. Nucleic Acids Res. 2004, 32: 4812-4820. 10.1093/nar/gkh818.
- Engström PG, Suzuki H, Ninomiya N, Akalin A, Sessa L, et al: Complex Loci in Human and Mouse Genomes. PLoS Genetics. 2006, 2: e47. 10.1371/journal.pgen.0020047.
- Kiyosawa H, Mise N, Iwase S, Hayashizaki Y, Abe K: Disclosing hidden transcripts: mouse natural sense-antisense transcripts tend to be poly(A) negative and nuclear localized. Genome Res. 2005, 15: 463-474. 10.1101/gr.3155905.
- Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, et al: Molecular portraits of human breast tumours. Nature. 2000, 406: 747-752. 10.1038/35021093.
- Rhodius V, LaRossa R: Uses and pitfalls of microarrays for studying transcriptional regulation. Curr Opin Microbiol. 2003, 6: 114-119. 10.1016/S1369-5274(03)00034-1.
- Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C, et al: The transcriptional landscape of the mammalian genome. Science. 2005, 309: 1559-1563. 10.1126/science.1112014.
- Larkin JE, Frank BC, Gavras H, Sultana R, Quackenbush J: Independence and reproducibility across microarray platforms. Nat Methods. 2005, 2: 337-344. 10.1038/nmeth757.
- Strausberg RL, Feingold EA, Grouse LH, Derge JG, Klausner RD, Collins FS, Wagner L, Shenmen CM, Schuler GD, Altschul SF, et al: Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. Proc Natl Acad Sci USA. 2002, 99: 16899-16903. 10.1073/pnas.242603899.
- Saal LH, Troein C, Vallon-Christersson J, Gruvberger S, Borg A, Peterson C: BioArray Software Environment (BASE): a platform for comprehensive management and analysis of microarray data. Genome Biol. 2002, 3: SOFTWARE0003. 10.1186/gb-2002-3-8-software0003.
- Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M, Currier T, Thiagarajan M, et al: TM4: a free, open-source system for microarray data management and analysis. Biotechniques. 2003, 34: 374-378.
- Medstrand P, van de Lagemaat LN, Mager DL: Retroelement distributions in the human genome: variations associated with age and proximity to genes. Genome Res. 2002, 12: 1483-1495. 10.1101/gr.388902.
- Pruitt KD, Tatusova T, Maglott DR: NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005, 33: D501-D504. 10.1093/nar/gki025.
- Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H, Helt G, et al: Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science. 2005, 308: 1149-1154. 10.1126/science.1108625.
- Katayama S, Tomaru Y, Kasukawa T, Waki K, Nakanishi M, Nakamura M, Nishida H, Yap CC, Suzuki M, Kawai J, et al: Antisense transcription in the mammalian transcriptome. Science. 2005, 309: 1564-1566. 10.1126/science.1112009.
- Chen J, Sun M, Hurst LD, Carmichael GG, Rowley JD: Genome-wide analysis of coordinate expression and evolution of human cis-encoded sense-antisense transcripts. Trends Genet. 2005, 21: 326-329. 10.1016/j.tig.2005.04.006.
- Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ, et al: Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell. 2004, 116: 499-509. 10.1016/S0092-8674(04)00127-8.
- Stillman BA, Tonkinson JL: Expression microarray hybridization kinetics depend on length of the immobilized DNA but are independent of immobilization substrate. Anal Biochem. 2001, 295: 149-157. 10.1006/abio.2001.5212.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.