Conserved expression of natural antisense transcripts in mammals
© Ling et al.; licensee BioMed Central Ltd. 2013
Received: 5 November 2012
Accepted: 6 March 2013
Published: 12 April 2013
Recent studies have found thousands of natural antisense transcripts originating from the same genomic loci as protein-coding genes but from the opposite strand. It is unclear whether the majority of antisense transcripts are functional or merely transcriptional noise.
Using the Affymetrix Exon array with a modified cDNA synthesis protocol that enables genome-wide detection of antisense transcription, we conducted large-scale expression analysis of antisense transcripts in nine corresponding tissues from human, mouse and rat. We detected thousands of antisense transcripts, some of which show tissue-specific expression that could be subjected to further study for their potential function in the corresponding tissues/organs. The expression patterns of many antisense transcripts are conserved across species, suggesting selective pressure on these transcripts. When compared to protein-coding genes, antisense transcripts show a lesser degree of expression conservation. We also found a positive correlation between the sense and antisense expression across tissues.
Our results suggest that natural antisense transcripts are subjected to selective pressure but to a lesser degree compared to sense transcripts in mammals.
Recent studies suggest that a substantial portion of mammalian genomes is transcribed as non-coding RNA [1–4], including cis-natural antisense transcripts (cis-NATs) [5–8]. Cis-NATs are transcribed from the antisense strand of protein-coding sequences, which may result in post-transcriptional gene silencing. However, the extent to which cis-NATs are biologically functional and actively regulated remains a subject of debate. Some studies have suggested that cis-NATs represent transcriptional noise [11, 12], while others have reported evidence supporting the function of various cis-NATs [13–16], especially in RNA editing, stability, and translation. Growing evidence implicating NATs in medical conditions, such as hypertension and immune disorders, suggests a functional role for cis-NATs. However, it cannot be assumed that cis-NATs are as actively regulated as their sense counterparts.
In a previous study, we developed an Antisense Transcriptome analysis using Exon array (ATE) approach for high-throughput expression analysis of NATs using commercial oligonucleotide DNA microarrays. The Affymetrix Exon array is an inexpensive high-density oligonucleotide microarray that has two unique features: (1) it has multiple probes for each known or predicted exon, and (2) its signals are strand-specific because of the generation and labeling of single-stranded DNA targets. By modifying the recommended cDNA synthesis protocol, we demonstrated that it is possible to label targets in the orientation opposite to that produced by the standard protocol (see Additional file 1: Figure S1). Thus, the cDNAs from known genes can no longer hybridize with these probes. Instead, any true hybridization signal must come from transcripts on the opposite strand, i.e., cis-NATs. Our preliminary microarray data on human Jurkat cells showed that the modified protocol can successfully detect a large number of NATs transcribed from known exonic loci. Although limited to exonic NATs, using the Affymetrix exon arrays with our modified protocol provides a cost-efficient method to study the expression of NATs at ~1 million exonic loci.
The expression patterns of protein-coding genes [23, 24] and orthologous genes that are essential to the organism are evolutionarily conserved. Hence, it can be inferred that orthologous transcripts that demonstrate expression conservation are likely to be biologically functional. Expression divergence of randomly assigned pairs of genes, generated by permutation, has been used as a baseline to approximate neutral evolution of gene expression. If orthologous cis-NATs show more correlated expression patterns than randomly permuted cis-NAT pairs, it would provide evidence that cis-NATs are actively regulated or subject to selective pressure.
In this study, we measured the expression of antisense transcripts across human, mouse, and rat using the ATE procedure. Coupled with expression analysis of sense transcripts in the same samples, this defines a “double stranded” expression profile at the exon level. We report significant differences in expression divergence between antisense orthologous transcripts and permuted pairs. However, the expression divergence of sense transcripts is significantly lower than that of antisense transcripts, suggesting that cis-NATs are subject to selective pressure but to a lesser degree than sense transcripts.
Total RNA samples were purchased from Ambion (Austin, TX, USA). The RNA samples consisted of human colon, mouse embryo, rat embryo, and 9 orthologous tissues from all 3 organisms, namely, brain, heart, kidney, liver, lung, spleen, ovary, testes, and thymus. Sense and antisense expression were measured using the Affymetrix Exon 1.0 ST array according to the manufacturer’s protocol and the previously described modification for measuring antisense expression, respectively. Sense and antisense arrays were normalized separately using the RMA method. The log expression values and exon annotations for the core exon set were extracted using Affymetrix Expression Console. The final log expression values were averaged from two replicates.
Orthologous genes between human, rat, and mouse were identified from NCBI HomoloGene, build 65. The protein accession numbers were converted to gene accession numbers and mapped to the microarray annotations for gene-level orthology. For exon-level orthology, three sets of exon sequences (human, mouse, and rat) were downloaded from the UCSC Genome Browser and used to generate BLAST databases. The exon sequence sets were compared using blastn, and exon pairs with a global sequence identity of more than 80% were considered to be orthologous. Where one exon had a global sequence identity of more than 80% with more than one exon in the same organism, the exon with the highest global sequence identity was considered to be orthologous. In addition, exons with overlapping RefSeq transcripts on both sense and antisense strands were removed.
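The exon-level ortholog selection described above can be sketched as a best-hit filter. This is a minimal illustration only, assuming the global sequence identity for each exon pair has already been computed from the blastn output; the function name `best_orthologous_hits`, the tuple layout, and the exon identifiers are hypothetical and do not come from the original pipeline.

```python
def best_orthologous_hits(hits, min_identity=80.0):
    """Pick one orthologous exon per query exon.

    hits: iterable of (query_exon, subject_exon, percent_identity) tuples,
    where percent_identity is assumed to be a precomputed global identity.
    """
    best = {}
    for query, subject, identity in hits:
        if identity <= min_identity:
            continue  # the text requires identity of MORE than 80%
        # Keep only the subject exon with the highest identity per query.
        if query not in best or identity > best[query][1]:
            best[query] = (subject, identity)
    return {query: subject for query, (subject, _) in best.items()}


# Hypothetical toy hits, for illustration only.
hits = [
    ("hs_exon1", "mm_exonA", 91.2),
    ("hs_exon1", "mm_exonB", 85.0),
    ("hs_exon2", "mm_exonC", 79.9),
]
orthologs = best_orthologous_hits(hits)
```

In this toy input, `hs_exon1` is paired with its highest-identity hit and `hs_exon2` is dropped because its only hit falls below the threshold. The removal of exons with overlapping RefSeq transcripts on both strands is not covered by this sketch.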
A permutation test was used to evaluate whether the expression divergence of orthologous exons is statistically different from random. Expression divergence between 2 probesets is defined as the Euclidean distance between their relative abundance profiles, converted from microarray log expression values, across the 9 orthologous tissues. At least one of the tissues must have an expression value higher than 6.5. Relative abundance is defined as the quotient of the microarray log expression value of the sample and the sum of the log expression values of the 9 tissues in the same set. Student’s t-test was used to compare the expression divergence of the orthologous probesets with that of permuted pairs. Permuted pairs were generated by randomly assigning pairs of probesets from different organisms within the orthologous set. As a result, the numbers of orthologous pairs and permuted pairs are equal.
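The definitions above can be made concrete with a short sketch. It is an illustration under stated assumptions, not the authors' code: all function names are hypothetical, the threshold rule is interpreted as "any tissue in either probeset exceeds 6.5", the permutation is implemented as a shuffle of partners across pairs (which can occasionally leave a pair unchanged), and the t statistic uses the unequal-variance (Welch) form mentioned in the Results.

```python
import math
import random
from statistics import mean, variance


def relative_abundance(log_values):
    """Each tissue's log expression divided by the sum over all tissues."""
    total = sum(log_values)
    return [v / total for v in log_values]


def expression_divergence(profile_a, profile_b):
    """Euclidean distance between two relative-abundance profiles."""
    ra, rb = relative_abundance(profile_a), relative_abundance(profile_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ra, rb)))


def passes_threshold(profile_a, profile_b, threshold=6.5):
    """Assumption: keep a pair if any tissue in either profile exceeds the threshold."""
    return max(max(profile_a), max(profile_b)) > threshold


def permute_pairs(pairs, seed=0):
    """Shuffle partners across pairs, keeping the number of pairs unchanged."""
    rng = random.Random(seed)
    partners = [b for _, b in pairs]
    rng.shuffle(partners)
    return [(a, b) for (a, _), b in zip(pairs, partners)]


def welch_t(x, y):
    """Welch's t statistic for two samples with possibly unequal variances."""
    return (mean(x) - mean(y)) / math.sqrt(
        variance(x) / len(x) + variance(y) / len(y)
    )
```

In use, one would compute `expression_divergence` for every orthologous pair and for every pair returned by `permute_pairs`, then pass the two lists of distances to `welch_t`. A negative statistic would indicate that orthologous pairs diverge less than permuted pairs. Converting the statistic to a p-value requires the t distribution, which this sketch leaves out.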
Tissue-specific probesets were identified based on a previously described method using 3 empirical criteria. Firstly, the log expression value of the highest expressing tissue must be higher than 6.5, which is the threshold for a detection p-value of 0.01 above background. Secondly, the Z-score of the log expression value for the highest expressing tissue must be higher than 2. Finally, the expression level of the highest expressing tissue must be at least one log higher than that of the second highest expressing tissue. Mouse and rat embryo samples were removed from the data set before identifying tissue-specific NAT probesets for the 9 orthologous tissues in rat and mouse.
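The three criteria can be sketched as a single filter. This is a minimal sketch under assumptions the text does not settle: the Z-score is computed across the tissues of one probeset using the sample standard deviation, and "one log higher" is read as a difference of at least 1 in log expression units. The function name `tissue_specific` and the example values are hypothetical.

```python
from statistics import mean, stdev


def tissue_specific(profile, min_expr=6.5, min_z=2.0, min_gap=1.0):
    """Return the tissue a probeset is specific to, or None.

    profile: dict mapping tissue name -> log expression value.
    """
    ranked = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)
    (top_tissue, top), (_, second) = ranked[0], ranked[1]
    values = list(profile.values())
    sd = stdev(values)  # assumption: sample standard deviation
    if sd == 0:
        return None  # flat profile, no tissue stands out
    z_score = (top - mean(values)) / sd
    if top > min_expr and z_score > min_z and top - second >= min_gap:
        return top_tissue
    return None


# Hypothetical profile: high in testes, uniformly low elsewhere.
tissues = ["brain", "heart", "kidney", "liver", "lung",
           "spleen", "ovary", "testes", "thymus"]
example = {t: 4.0 for t in tissues}
example["testes"] = 10.0
result = tissue_specific(example)
```

With 9 tissues and a sample standard deviation, the largest attainable Z-score is 8/3 (about 2.67), so the Z-score cutoff of 2 is demanding and is only met when one tissue clearly dominates the profile.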
Novel NAT probesets were identified from the core Affymetrix exon probesets based on a previously described method using BLASTN (version 2.2.25+). Each probeset was queried against the RefSeq database (downloaded from NCBI on August 9, 2012) and the EST database (downloaded from NCBI on May 15, 2012) for perfect matches. Query sequences without perfect matches in either the RefSeq database or the EST database were considered to be novel. The strand option was set to “minus” to query only the reverse-complement of the query sequence (personal communication, Wayne Matten, NIH).
Strand-specific RT-PCR was used to validate the sense-antisense transcript candidates. Total RNA samples were purchased from Clontech and tested for genomic DNA contamination by direct PCR under cycling conditions of 95°C for 30 seconds, 56°C for 30 seconds, and 72°C for 45 seconds for 38 cycles, followed by visualization in a 2% agarose gel. For RT-PCR, sense and antisense primers were designed for each candidate using Primer3. Strand-specific reverse transcription was performed for each candidate under the following conditions: RNA (100 ng) was reverse-transcribed using M-MLV Reverse Transcriptase (Invitrogen) and gene-specific sense primer (for antisense detection) or antisense primer (for sense detection) (2 pmole) in a 20 μl volume at 65°C for 5 minutes, 37°C for 50 minutes, and 70°C for 15 minutes. 5 μl of cDNA was amplified in 50 μl containing sense primer (10 pmole), antisense primer (10 pmole), Taq polymerase (1.25 U, Promega), and MgCl2 (1.5 mM) in an ABI 9700 cycler under the following cycling conditions: 95°C for 7 minutes; 25 or 38 cycles of 95°C for 30 seconds, 56°C for 30 seconds, and 72°C for 45 seconds; and 72°C for 7 minutes. Products were visualized in 2% agarose gels.
Using the ATE protocol, we analyzed antisense transcription in 10 corresponding tissues for human, mouse and rat. The Affymetrix GeneChip Human Exon ST array includes 1.4 million probesets targeting exonic loci, of which 287,329 “core” probesets are supported by full-length RNAs. Similar arrays for mouse and rat have 1.2 and 1 million probesets, and 231,465 and 92,751 core probesets, respectively. The same RNA samples and microarray platform were used to detect sense gene expression using the standard protocol. Two technical replicates were performed for each biological sample, resulting in a total of 120 hybridizations.
Number of novel and tissue-specific antisense transcripts
Total number of probesets supported by RefSeq annotations (Core probesets)
Antisense transcripts that are not found in RefSeq database
Antisense transcripts that are not found in EST database
Antisense transcripts that are not found in either the RefSeq or the EST database
Antisense transcripts that are expressed in at least one tissue and not found in RefSeq or EST databases (novel antisense transcripts)
Tissue-specific transcripts that are not found in RefSeq or EST databases (novel, tissue-specific transcripts)
Antisense transcripts that are expressed in at least one tissue and not found in RefSeq database (This is used for expression divergence analysis, Figure 5)
To identify antisense transcripts with a tissue-specific pattern of expression, we used empirical criteria similar to a previously described method. Our results suggest that 14,485 (5.0% of core probesets) human antisense exon probesets, 12,673 (5.5% of core probesets) mouse antisense exon probesets and 11,358 (12.2% of core probesets) rat antisense exon probesets are tissue-specific. This may suggest a biological role for these antisense transcripts and warrants further investigation.
We identified an example of a human long noncoding RNA (Additional file 1: Figure S5), annotated in NONCODE, which forms a sense-antisense pair at the 5′-end of Ubiquitin-specific protease 25 (USP25). USP25 is involved in degrading misfolded proteins in the endoplasmic reticulum. This suggests that the expression of USP25 may be modulated by an antisense transcript.
The average expression of antisense transcripts is lower than that of sense transcripts at both the gene and exon level (Additional file 1: Figures S6-S11). This is consistent with a previous report that the hybridization control probes were higher in antisense arrays, suggesting that the actual expression levels of transcripts on the antisense arrays were lower than on sense arrays. The proportion of core exon probesets detected above background (DABG) with a detection p-value less than 0.01 (Additional file 1: Figure S12) was consistent with a previous report. Our results also suggest that probesets with a DABG p-value of less than 0.01 have an intensity of at least 6.5.
Two technical replicates were performed for each tissue and our results show that core probeset intensities of the technical replicates are strongly correlated (0.86 < r < 0.99) in both sense and antisense arrays (Additional file 1: Figures S13-S24), suggesting that the experimental procedure is reproducible. The average intensity values from the two technical replicates were used for analysis.
For further validation of the ATE procedure, we examined pre-designed probes for known antisense transcripts. We searched for extended probesets that are at the same genic location as core probesets but are located on the opposite strand. Some antisense transcripts should be detected by these extended probesets in the sense arrays, as well as by the core probesets in the antisense array. We found a positive correlation (0.50 < r < 0.58) between the expression level of core probesets in the antisense array and the overlapping extended probesets in the sense array (Additional file 1: Figures S25, S26). This is expected, as the extended probesets in the sense array and the core probesets in the antisense array are detecting the same antisense transcripts.
We extended the same analysis to exon-level expression data. Despite using different intensity thresholds to calculate expression divergence (Figure 4, Figure 5B, and Additional file 1: Figure S30), our results show that the expression divergence between orthologous sense exon transcripts is significantly lower than permuted pairs. Our results consistently demonstrate a stronger correlation between expression profiles of orthologous tissues than non-orthologous tissues as shown by the diagonal lines representing orthologous tissues between 2 organisms on the heatmap (Figure 4). As an example, the Pearson’s correlation between mouse and rat kidney is 0.72 while the correlation between mouse kidney and human brain is 0.44 (Additional file 1: Figure S31). Thus, our results validated our experiment and suggested that our analytical procedures are suitable for studying expression divergence in exon transcripts.
The same method for analyzing sense expression divergence at the exon level was used to analyze antisense expression divergence. The expression signal detected by each exon-specific probeset on the antisense array was treated as one antisense transcript. Using BLAST, we identified 7,460, 5,836 and 18,346 orthologous antisense transcripts for human/mouse, human/rat and mouse/rat comparisons, respectively. Student’s t-test with unequal variance between the expression divergence of orthologous antisense transcripts and that of permuted pairs suggests that the expression divergence of orthologous antisense transcripts is significantly lower than that of their respective permuted pairs in all 3 comparisons (p-value < 10⁻⁸; Figure 5C; Additional file 1: Figures S32A, S33A, and S35). Using different intensity thresholds (the full set of orthologous exons without a threshold; intensity of at least one orthologous probeset above 6.5, which is the intensity threshold for a DABG p-value of less than 0.01; or intensity of at least one orthologous probeset above 8) also showed that the expression divergence of orthologous exons for antisense transcripts is significantly lower. At the same time, the Pearson’s correlations of expression between permuted pairs (Additional file 1: Figures S32B and S33B) are close to zero, which is similar to a previous report. However, it has been suggested that Pearson’s correlation and Euclidean distance can produce different results, and a novel randomization procedure has been proposed recently. Using that randomization procedure, we arrive at the same conclusion that the expression divergence of orthologous antisense transcripts is significantly lower than that of their respective permuted pairs regardless of intensity thresholds (Additional file 1: Figure S34). This suggests that our results are not artifacts due to the use of Euclidean distance, different randomization methods or intensity thresholds.
Therefore, our results suggest that the expression of antisense transcripts is under selective pressure and is evolutionarily conserved, which is in agreement with the roles of antisense transcripts proposed by a large number of recent studies across different species (reviewed in [39, 40]).
Our results show that the average difference between the expression divergence of the sense orthologs and their permuted counterparts is larger than the average difference between that of the antisense orthologs and their permuted counterparts (Figures 5B and 5C). This may indicate the extent of selective pressure on antisense transcripts. Permuted pairs approximate purely neutral evolution of gene expression without selective pressure. Hence, the deviation from permuted pairs may approximate the strength of selective pressure on top of a neutral background. Therefore, our results suggest that antisense expression is subject to less selective pressure than sense expression.
Consistent positive correlation between sense and antisense divergence may suggest an over-arching regulatory mechanism for both sense and antisense expression, while the large proportion of uncorrelated sense and antisense expression may suggest a layer of independent regulation. Chromatin structure has been shown to affect the accessibility of RNA polymerase II and other transcription factors to the site of transcription by means of methylation and acetylation [42, 43], thereby playing an important role in regulating gene expression. However, the role of chromatin structure in antisense transcript regulation has only recently been reported: a study using chromatin immunoprecipitation demonstrated a positive correlation between cis-NAT promoter activity, the presence of RNA polymerase II, histone modifications, and the resulting antisense RNA-seq read density, suggesting that chromatin structure may also be involved in cis-NAT transcription.
In summary, we analyzed the expression of antisense transcripts in corresponding tissues across 3 species and found evidence for the conserved expression of these transcripts, similar to what has been observed in protein-coding genes. This supports the idea that expression of antisense transcripts is regulated and subject to selective pressure. Our result is based on a large number of antisense transcripts, supplementing previous studies of the functions of specific antisense transcripts. In addition, the tissue-specific expression pattern of some antisense transcripts might guide future in-depth study of their potential function.
One could argue that the conserved expression of natural antisense transcripts might be a by-product of the regulated and conserved expression of the corresponding protein-coding genes. Chromatin remodelling is one mechanism of gene regulation that makes the DNA sequences accessible to transcriptional protein complexes. It is possible for antisense transcripts to get a “free ride” when this happens in a regulated manner. Further studies are necessary to fully address these possibilities, especially using new technologies like strand-specific RNA-Seq.
The authors would like to thank Administrative and Research Computing at South Dakota State University for providing computational resources. We also thank Dr. Jaejung Kim of the Genomics Core Facility at University of Chicago for conducting the customized DNA microarray experiments.
Microarray data for this work have been deposited in NCBI GEO as GSE41462 (antisense data) and GSE41464 (sense data). The sense and antisense expression data for various normal tissues in human, mouse and rat will also be accessible at the UCSC Genome Browser as a custom track.
This work was supported by the National Institutes of Health (GM083226) to SXG.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.