Uncovering information on expression of natural antisense transcripts in Affymetrix MOE430 datasets
© Oeder et al. 2007
Received: 14 February 2007
Accepted: 28 June 2007
Published: 28 June 2007
The function and significance of the widespread expression of natural antisense transcripts (NATs) is largely unknown. The ability to quantitatively assess changes in NAT expression for many different transcripts in multiple samples would facilitate our understanding of this relatively new class of RNA molecules.
Here, we demonstrate that standard expression analysis Affymetrix MOE430 and HG-U133 GeneChips contain hundreds of probe sets that detect NATs. Probe sets carrying a "Negative Strand Matching Probes" annotation in NetAffx were validated using Ensembl by manual and automated approaches. More than 50 % of the 1,113 probe sets with "Negative Strand Matching Probes" on the MOE430 2.0 GeneChip were confirmed as detecting NATs. Expression of selected antisense transcripts as indicated by Affymetrix data was confirmed using strand-specific RT-PCR. Thus, Affymetrix datasets can be mined to reveal information about the regulated expression of a considerable number of NATs. In a correlation analysis of 179 sense-antisense (SAS) probe set pairs using publicly available data from 1637 MOE430 2.0 GeneChips, a significant number of SAS transcript pairs were found to be positively correlated.
Standard expression analysis Affymetrix GeneChips can be used to measure many different NATs. The large number of samples deposited in microarray databases represents a valuable resource for a quantitative analysis of NAT expression and regulation in different cells, tissues and biological conditions.
Natural antisense transcripts (NATs) are RNA molecules harboring sequences complementary to other transcripts. Expression of endogenous NATs was first observed in viruses and prokaryotes, followed by its detection in eukaryotes. The number of known sense-antisense (SAS) transcripts grew slowly during the 1990s, as more examples were found in studies of the regulation of expression of individual genes. In some of these cases, a regulatory function for antisense RNA in controlling the mRNA or protein levels of the respective sense gene product was described (for reviews see refs. [4, 5]). The availability of large-scale cDNA sequencing then led to the in silico discovery of significant numbers of SAS transcript pairs in different species, including yeast, plants and mammals [6–14]. Most recently, systematic genome-wide detection of transcription by strand-specific tiling oligonucleotide arrays or sequencing (CAGE) substantiated the notion that expression of antisense transcripts is a widespread phenomenon, accounting for up to 50 % of all transcripts [15–17]. To date, only a few studies have analyzed quantitative changes in the expression of larger numbers of NATs under different experimental conditions. The function(s) of NATs and the regulation of their expression in relation to the corresponding sense transcripts are largely unknown [4, 18]. One possible function of NATs is decreasing sense transcript abundance, e.g. by RNAi-mediated degradation. In this case, the levels of sense and antisense transcript would be expected to be negatively correlated. The effects of NATs are not necessarily mediated by RNA-RNA hybridization but can also be brought about by RNA-protein interactions [20, 21].
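The expectation of a negative (or positive) correlation between sense and antisense levels can be checked for each SAS probe set pair across many arrays. The following is a minimal sketch, assuming a Pearson coefficient computed on expression values that are already normalized; the text does not state which coefficient was used, and the function name `pearson` and the toy values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long lists of expression values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

# Toy example (invented values): one sense and one antisense probe set
# measured on the same four arrays.
sense = [1.0, 2.0, 3.0, 4.0]
antisense_up = [2.0, 4.0, 6.0, 8.0]    # rises with the sense signal
antisense_down = [8.0, 6.0, 4.0, 2.0]  # falls as the sense signal rises

r_positive = pearson(sense, antisense_up)
r_negative = pearson(sense, antisense_down)
```

A coefficient near -1 would be consistent with antisense-mediated reduction of the sense transcript, whereas a coefficient near +1 points to co-regulation of the pair.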
Oligonucleotide microarrays such as Affymetrix GeneChips detect labeled cRNA in a strand-specific manner. Standard labeling methods based on incorporation of a T7 binding site with an oligo-dT primer during first strand cDNA synthesis not only selectively amplify polyA mRNA, but also conserve the strand-specificity of the resulting cRNA. This technology therefore in principle allows the measurement of transcripts in an orientation-specific manner. Tiling arrays have indeed been used for a detailed experimental investigation of transcriptional activity on human chromosomes [15, 22, 23]. These studies have greatly advanced our understanding of the extent of genome-wide transcriptional activity and the abundance of NATs and other non-coding RNA transcripts. However, the requirement of multiple arrays to perform a genome-wide tiling array analysis of a single RNA sample makes larger experiments prohibitively expensive, and has therefore to date precluded a comprehensive investigation of NAT expression and regulation in many different experimental conditions. In contrast, expression analysis GeneChips such as MOE430 for murine and U133 for human RNA detect and measure the majority of protein coding mRNAs on one single microarray, making global mRNA expression profiling experiments relatively affordable. Their extensive use in functional genomics studies has resulted in a wealth of data sets deposited in public data repositories. In this paper, we point to the presence of a considerable number of probe sets detecting NATs on Affymetrix GeneChip expression analysis arrays. Probe sets potentially detecting NATs were validated in silico, and microarray results showing expression of NATs were confirmed by strand-specific RT-PCR. The large number of Affymetrix datasets in public domain repositories like Gene Expression Omnibus (GEO) can therefore be mined for the expression and regulation of hundreds of NATs.
For comparison, we manually validated the orientation of a subset of 391 potential NAT-detecting probe sets by performing a Blast search using the probe set target sequence as query in Ensembl. Comparing the results of the automatic and manual classification approaches, we found a high concordance in the category "antisense", where 58/391 were called by the automatic versus 63/391 by the manual method (overlap of 51 probe sets). In the category "overlap", automatic classification gave more hits than manual inspection (120 versus 81 of 391 probe sets). Together, the manual and automatic classification called 653 probe sets on the MOE430 2.0 GeneChip as NAT-detecting probe sets, which are listed in Additional file 1 annotated with the gene symbol and the result of manual and automated validation. This annotation may be used to mine existing microarray data sets for expression and regulation of hundreds of NATs. Based on a rate of 57.5 % of confirmed NAT-detecting probe sets for MOE430 2.0 (640/1113 by the automatic classification), the human U133 Plus 2.0 GeneChip, which has 3141 probe sets with a NetAffx annotation of "Negative Strand Matching Probes", can be expected to contain in the range of 1800 NAT-detecting probe sets.
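The arithmetic behind the U133 Plus 2.0 estimate and the "antisense" concordance can be reproduced in a few lines. This is a small sketch using only the counts quoted above. The variable names are hypothetical, and the Jaccard index is our choice of agreement measure, not one reported in the study. The extrapolation assumes that the confirmation rate observed for MOE430 2.0 carries over unchanged to the human array.

```python
# Counts quoted in the text.
confirmed_moe430 = 640   # probe sets confirmed by automatic classification
annotated_moe430 = 1113  # "Negative Strand Matching Probes" on MOE430 2.0
annotated_u133 = 3141    # same annotation on U133 Plus 2.0

# Confirmation rate on the mouse array, applied to the human array.
confirmation_rate = confirmed_moe430 / annotated_moe430
expected_u133 = confirmation_rate * annotated_u133

# Agreement between the two methods in the category "antisense".
auto_antisense = 58
manual_antisense = 63
shared_antisense = 51
union_antisense = auto_antisense + manual_antisense - shared_antisense
jaccard_antisense = shared_antisense / union_antisense

print(f"confirmation rate: {confirmation_rate:.1%}")
print(f"expected NAT-detecting probe sets on U133 Plus 2.0: {expected_u133:.0f}")
print(f"antisense calls shared by both methods: {shared_antisense}/{union_antisense}")
```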
While these large numbers of NAT-detecting probe sets should facilitate the quantitative study of expression and regulation of antisense transcripts, it has to be pointed out that only a fraction of all NATs can be detected by mining MOE430 or U133 expression analysis data sets. Because of the design of MOE430 microarrays, NAT-detecting probe sets are restricted to the 3' region of mRNAs and will usually not detect intronic non-coding RNAs. Since the labeling protocol used in most MOE430 data sets deposited in public databases is specific for polyadenylated transcripts, non-polyadenylated (polyA-) NATs are likewise not detected by the NAT-detecting probe sets confirmed in our study.
Of the 144 manually identified probe sets detecting NATs on the MOE430 2.0 GeneChip, a substantial fraction appears to be expressed: in a dataset obtained from macrophages stimulated in vitro (unpublished data), 41 % of the "antisense" and 83 % of the "overlap" probe sets had present calls on at least 3 out of 12 arrays. We next tested whether the expression of NATs indicated by these probe sets from the MOE430 2.0 GeneChip could be confirmed using strand-specific RT-PCR as an independent method.
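The present-call criterion (at least 3 of 12 arrays) amounts to a simple filter over detection calls. The sketch below assumes that calls are encoded as "P" (present), "M" (marginal) and "A" (absent) and that marginal calls do not count as present. The function names and the toy data are hypothetical and are not taken from the macrophage dataset.

```python
def is_expressed(calls, min_present=3):
    """True if a probe set has a present call on at least min_present arrays."""
    return sum(1 for call in calls if call == "P") >= min_present

def fraction_expressed(calls_by_probe_set, min_present=3):
    """Fraction of probe sets that pass the present-call criterion."""
    if not calls_by_probe_set:
        raise ValueError("no probe sets given")
    n_expressed = sum(
        1 for calls in calls_by_probe_set.values()
        if is_expressed(calls, min_present)
    )
    return n_expressed / len(calls_by_probe_set)

# Toy data: detection calls for two invented probe sets on 12 arrays.
toy_calls = {
    "probe_set_1": ["P", "P", "P", "A", "A", "A", "A", "A", "A", "A", "A", "A"],
    "probe_set_2": ["P", "M", "M", "A", "A", "A", "A", "A", "A", "A", "A", "P"],
}
toy_fraction = fraction_expressed(toy_calls)
```

Applying such a filter separately to the "antisense" and "overlap" probe sets gives per-category fractions comparable to those quoted above.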
Taken together, we have shown here that Affymetrix MOE430 microarray data sets contain a considerable amount of quantitative information about the expression levels of hundreds of NATs. This information can be uncovered using the NetAffx annotation "Negative Strand Matching Probes" followed by manual or automated in silico validation in Ensembl, as we have done here for the MOE430 2.0 GeneChip, yielding the more than 600 NAT-detecting probe sets provided in Additional file 1. Experimental confirmation by strand-specific RT-PCR showed that most of the NATs indicated by microarray results were indeed expressed, interestingly at different levels between tissues.
Very recently, Werner et al. reported a similar approach to measure NATs using the first version of the Affymetrix U74 mouse expression analysis GeneChips, which contained a large number of probe sets in reverse complementary orientation. They found a present call rate similar to the one we report for NAT-detecting probe sets on MOE430 arrays and could also confirm the expression of many SAS pairs by RT-PCR. While MOE430 and U133 GeneChips harbor a much lower percentage of "Negative Strand Matching Probes" probe sets than the U74 mouse set, the large and rapidly expanding number of publicly accessible datasets provides an unprecedented opportunity to extract hidden information about the expression of SAS transcript pairs as a by-product of gene expression profiling experiments.
The manual and the automated probe set annotations are based on Ensembl version 36 (released December 2005). Ensembl defines the probe set mapping to the genome and probe set association to genes as follows: each probe on the Affymetrix MOE430 2.0 array is mapped directly to the genome sequence. This mapping is not necessarily unique. Each probe set is then associated with one or more Ensembl genes by directly comparing the probe set to the set of cDNAs created from the Ensembl transcripts. Each transcript is associated with only one gene (although one gene may have many transcripts).
By manual inspection, the binding region was checked for the orientation of known Ensembl transcripts relative to the probe set. As the GeneChip microarray method analyzes cRNA, the orientations were evaluated as follows: If there is an annotated transcript in the same orientation and no annotated transcript in the reverse orientation, the probe set detects only sense transcripts (category "sense"). If there is an annotated transcript in the reverse orientation and no annotated transcript in the same orientation, the probe set detects only antisense transcripts (category "antisense"). If the probe set matches a genomic region where two annotated transcripts overlap, it detects transcripts that are sense and antisense (category "overlap").
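The three manual categories follow directly from two yes/no observations per probe set. The sketch below is only an illustration of that decision rule: the function name is hypothetical, and the fourth outcome ("no transcript") for a region without any annotated transcript is an assumption added for completeness, since the manual procedure described above does not name it.

```python
def classify_manual(has_same_orientation, has_reverse_orientation):
    """Assign a manual category from the annotated transcripts at the binding region.

    has_same_orientation: an annotated transcript lies in the same orientation
    as the probe set.
    has_reverse_orientation: an annotated transcript lies in the reverse
    orientation.
    """
    if has_same_orientation and has_reverse_orientation:
        return "overlap"
    if has_same_orientation:
        return "sense"
    if has_reverse_orientation:
        return "antisense"
    return "no transcript"  # assumed label, not part of the manual scheme

# All four combinations of the two observations.
manual_examples = {
    (same, reverse): classify_manual(same, reverse)
    for same in (True, False)
    for reverse in (True, False)
}
```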
In the automated classification, for a probe set to be considered "sense" to a gene, we required that the probe set be associated with only one gene and that the probes from the probe set map to the genome on the same strand, overlapping the start of the gene by at least one base pair. "Antisense" probe sets map to the genome in a region where a gene is annotated on the opposite strand. Additionally, we required that "antisense" probe sets are not associated with any gene and have no mapped probes that overlap a gene on the same strand. Probe sets were classified as "cross-hybridizing" if they were associated with more than one gene, and as "no transcript" if they were not associated with any gene and had no mapping that overlaps a gene by at least one base pair. Probe sets were classified as "overlap" if the gene they are associated with overlaps with a gene on the opposite strand by at least one base pair. This overlap is almost always located in the untranslated regions.
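The automated rules can be summarized as a decision function over a few precomputed properties of each probe set. The following is a rough sketch and not the pipeline used in the study: the class, field and function names are hypothetical, the order in which the rules are tested is an assumption (the text does not say which category wins when several conditions hold), and "unclassified" is an added fallback for cases the rules do not cover.

```python
from dataclasses import dataclass

@dataclass
class ProbeSetMapping:
    """Precomputed properties of one probe set (hypothetical structure)."""
    n_associated_genes: int                  # genes linked via cDNA comparison
    overlaps_gene_same_strand: bool          # probes overlap a gene, same strand
    overlaps_gene_opposite_strand: bool      # probes overlap a gene, opposite strand
    associated_gene_overlaps_opposite: bool  # linked gene overlaps an opposite-strand gene

def classify_automated(m):
    """Apply the rules in an assumed order of precedence."""
    if m.n_associated_genes > 1:
        return "cross-hybridizing"
    if m.n_associated_genes == 1:
        if m.associated_gene_overlaps_opposite:
            return "overlap"
        if m.overlaps_gene_same_strand:
            return "sense"
        return "unclassified"
    # From here on the probe set is not associated with any gene.
    if m.overlaps_gene_opposite_strand and not m.overlaps_gene_same_strand:
        return "antisense"
    if not m.overlaps_gene_opposite_strand and not m.overlaps_gene_same_strand:
        return "no transcript"
    return "unclassified"

# Example: no associated gene, probes lie opposite an annotated gene.
example_category = classify_automated(ProbeSetMapping(0, False, True, False))
```

Testing "cross-hybridizing" first keeps ambiguous probe sets out of the NAT-detecting categories; a different ordering would shift some probe sets between "overlap" and "sense".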
Mouse organs were homogenized in peqGOLD TriFast (Peqlab) with a rotor-stator homogenizer, followed by extraction of RNA according to the manufacturer's instructions. Strand-specific reverse transcription was performed using either the 5' or the 3' primer designed for a specific gene. The reaction was carried out in 10 μl on 100 ng RNA for one hour at 45°C using Superscript II (Invitrogen) and at 55°C using Superscript III. Subsequent PCR was performed with 5 μl of the RT reaction and the other primer of the pair (5' or 3') for 30 cycles for endpoint agarose gel analysis. Real-time qPCR using SYBR Green was done for 40 cycles on an Applied Biosystems 7700 SDS.
NAT: natural antisense transcript
Work at the Microarray and Bioinformatics Core Unit in the Institute of Medical Microbiology, Immunology and Hygiene is funded by NGFN-2 grant FKZ 01GS0402 to RL, H. Wagner and R. Hoffmann. We thank Dr. Anne-Laure Boulesteix for statistical advice.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the Terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.