Non-random retention of protein-coding overlapping genes in Metazoa
- Giulia Soldà†1,
- Mikita Suyama†2,
- Paride Pelucchi3,
- Silvia Boi1,
- Alessandro Guffanti3,
- Ermanno Rizzi3,
- Peer Bork4,
- Maria Luisa Tenchini1 and
- Francesca D Ciccarelli5, 6
© Soldà et al; licensee BioMed Central Ltd. 2008
Received: 29 October 2007
Accepted: 16 April 2008
Published: 16 April 2008
Although the overlap of transcriptional units occurs frequently in eukaryotic genomes, its evolutionary and biological significance remains largely unclear. Here we report a comparative analysis of overlaps between genes coding for well-annotated proteins in five metazoan genomes (human, mouse, zebrafish, fruit fly and worm).
For all analyzed species the observed number of overlapping genes is always lower than expected assuming functional neutrality, suggesting that gene overlap is negatively selected. The comparison to the random distribution also shows that retained overlaps do not exhibit random features: antiparallel overlaps are significantly enriched, while overlaps lying on the same strand and those involving coding sequences are highly underrepresented. We confirm that overlap is mostly species-specific and provide evidence that it frequently originates through the acquisition of terminal, non-coding exons. Finally, we show that overlapping genes tend to be significantly co-expressed in a breast cancer cDNA library obtained by 454 deep sequencing, and that different overlap types display different patterns of reciprocal expression.
Our data suggest that overlap between protein-coding genes is selected against in Metazoa. However, when retained it may be used as a species-specific mechanism for the reciprocal regulation of neighboring genes. The tendency of overlaps to involve non-coding regions of the genes leads to the speculation that the advantages achieved by an overlapping arrangement may be optimized by evolving regulatory non-coding transcripts.
The occurrence of overlapping genes in higher eukaryotes has long been considered a rare event [1, 2], but the completion of genome sequencing efforts and whole-transcriptome analyses have instead revealed that mammalian genomes harbor a high number of overlapping transcriptional units [3–8]. The majority of detected overlaps occurs between genes transcribed from opposite strands of the same genomic locus and often involves non-coding RNAs [6, 9–14]. These antisense transcripts participate in a number of cellular processes, such as genomic imprinting, X chromosome inactivation, alternative splicing, gene silencing and methylation, RNA editing and translation [15–20]. By comparison, very little is known about overlapping genes lying on the same DNA strand, apart from a few cases reported in the literature [21–24]. Overlap is estimated to involve around 10% of protein-coding genes [13, 25], rising to 20%–60% when non-coding RNAs are included [6, 8–10, 12, 14, 26, 27]. Despite their abundance, the origin and evolution of overlapping genes in eukaryotes remain unclear, and different comparative studies have often led to discordant results [6, 12–14, 25]. The inclusion of non-coding RNAs and poorly annotated transcripts in these analyses, together with protein-coding genes, may have contributed to the conflicting results, as protein-coding genes and functional non-coding RNAs evolve differently. To investigate the evolution of gene overlap in Metazoa we decided to use a dataset restricted to well-annotated protein-coding genes. We retrieved overlapping protein-coding genes in five representative species (Homo sapiens, Mus musculus, Danio rerio, Drosophila melanogaster and Caenorhabditis elegans), and compared the observed cases with a random distribution expected in case of functional neutrality. We identified features and conservation of protein-coding overlapping genes, and inferred possible mechanisms responsible for overlap formation.
Finally, to evaluate the possible relationship between overlap and gene expression, we analyzed the expression of our set of overlapping genes in a human breast cancer cDNA library derived by 454 deep sequencing.
Results and Discussion
Non-random retention of protein-coding overlapping genes in Metazoa
Table 1. Overlapping genes in five Metazoa. Columns: Expected OGCs (SD), Observed OG Pairs, Expected OG Pairs (SD), Observed OGs (%), Expected OGs (SD), Expected OGs (%).
We compared the observed data on overlapping genes to a null model that simulates the distribution of expected events in case of neutrality. For each species, we re-assigned random positions to the individual genes within each chromosome and counted the resulting number of overlaps.
In all species the overall number of observed OGCs was significantly lower than randomly expected (Table 1), suggesting selection against the retention of overlap as a general mechanism of gene arrangement. At least two reasons may explain the counter-selection of gene overlap in Metazoa. First, each mutation occurring within the overlapping regions would affect two or more sequences at the same time, and would likely reduce the ability of the involved genes to become optimally adapted. Second, overlap can result in transcriptional [33, 34] or translational interference between overlapping reading frames. Both these reasons help to explain why OGCs formed by several genes, as well as those involving coding sequences, are particularly selected against (see below).
While the number of random OGCs varied according to the different gene density of the analyzed species (Table 1 and Additional file 2), this tendency was not maintained in the observed data. Observed OGCs in human and mouse were around 4–5 times fewer than expected, while they were ~2 times fewer in fly and ~12 times fewer in worm. In agreement with our observation, a remarkable abundance of antisense transcripts in fly and a paucity in worm have recently been reported [12, 14]. The different rates of overlapping genes in fly and worm could be due to species-specific features. The higher proportion of overlapping genes in fly might be partly explained by the high gene density and the extended UTR length (Additional file 1). The low number of OGCs in worm may instead be a consequence of the presence of operons, which involve at least 15% of C. elegans genes. Each operon contains from two to eight genes, which are cotranscribed from the same strand as a polycistronic RNA and trans-spliced. It is conceivable that such a feature places a constraint on the plasticity of the worm genome, disfavoring the retention of specific overlap types, such as antiparallel and partial arrangements. A similar genomic constraint has recently been proposed to explain the paucity of duplicated genes in operons.
In all genomes except zebrafish, OGCs formed by two genes occurred at a frequency significantly higher than expected (Figure 2A). In addition, OGCs in human, mouse, and fly were mostly formed by antiparallel convergent pairs which overlapped only partially, while in zebrafish and more markedly in worm nested overlaps were preferred (Figures 2B and 2C). However, the results in zebrafish should be interpreted with caution, since they are probably affected by the poor coverage of the corresponding gene set. Likewise, the annotation of 5' and 3' untranslated regions appears particularly incomplete in worm (Additional file 1), which may contribute to an underestimation of some overlap classes (i.e. partial overlap, CDS/UTR and UTR/UTR overlaps, Figure 2). In all species overlaps between genes lying on the same strand and those sharing coding regions are strongly selected against (Figures 2C and 2D). Overlap between UTRs is preferentially retained in all organisms, while the overlap between coding regions and introns is common in zebrafish, fly and worm (Figure 2D). The non-random features of observed OGCs suggest that different overlap types are under different selective pressures. The retention of specific overlapping classes might be allowed when it provides selective advantages: in the case of genes on opposite strands the advantage could be represented by antisense regulation. Human, mouse and fly are significantly enriched in overlapping pairs potentially able to form sense/antisense duplexes, which include all antiparallel overlaps sharing exons (H. sapiens 55%, p < 0.001; M. musculus 58%, p < 0.001; D. melanogaster 53%, p < 0.001, chi-squared test). This result suggests that, at least in these species, positive selection might act to preserve antisense regulation. It cannot be excluded, however, that part of the positive effect could be a consequence of the negative selection against parallel and CDS/CDS overlaps.
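The overlap classes discussed above (parallel versus antiparallel, nested versus partial, convergent versus divergent) can be told apart from coordinates and strand alone. The following is a minimal sketch, assuming the usual conventions that convergent pairs overlap at their 3' ends and divergent pairs at their 5' ends; the `Gene` record and the `classify_overlap` function are hypothetical names introduced for illustration and are not taken from the original pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost genomic coordinate (inclusive)
    end: int     # rightmost genomic coordinate (inclusive)
    strand: str  # "+" or "-"


def classify_overlap(a: Gene, b: Gene) -> str:
    """Classify the reciprocal arrangement of two genes on one chromosome."""
    # No shared position: the genes do not overlap at all.
    if a.end < b.start or b.end < a.start:
        return "none"
    # Nested (complete) overlap: one gene lies entirely within the other.
    nested = (a.start <= b.start and b.end <= a.end) or (
        b.start <= a.start and a.end <= b.end
    )
    if a.strand == b.strand:
        return "parallel nested" if nested else "parallel partial"
    if nested:
        return "antiparallel nested"
    # Partial antiparallel overlap: look at the gene that starts first.
    left = a if a.start < b.start else b
    # A "+" gene on the left ends (3') inside the overlap, and so does the
    # "-" gene on the right: the 3' ends overlap (convergent, tail-to-tail).
    # Otherwise the 5' ends overlap (divergent, head-to-head).
    return "convergent" if left.strand == "+" else "divergent"
```

With this convention, a plus-strand gene at 100-500 and a minus-strand gene at 400-900 are classified as convergent, while swapping the two strands gives a divergent pair.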
Poor evolutionary conservation of OGCs in Metazoa
Table 2. Conservation of overlapping genes between human and mouse. Headings: Total human OG pairs, Human-mouse conserved OG pairs, Conservation rate (%).
Parallel OGCs did not show any significant enrichment in the conserved set (Table 2). Since same-strand overlaps are strongly selected against (Table 1), we investigated whether the ones that are conserved are more likely to be functional. Indeed, we found that several parallel OGCs conserved between human and mouse might be functionally related on the basis of the available literature data (Additional file 3).
Although the vast majority of overlaps are not conserved over long evolutionary distances, we found evidence of a few ancient overlaps. Overall, three OGCs were conserved between Ecdysozoa (nematodes and arthropods) and Deuterostomia (vertebrates). Interestingly, the only OGC that is conserved from C. elegans to human was lost in arthropods, while two different OGCs are conserved from D. melanogaster to human. All of these OGCs are formed of two genes with a nested antiparallel arrangement. One of the two clusters conserved in D. melanogaster (Cluster 77, Additional file 2) involves the synapsin (Syn) and an inhibitor of metalloproteinase (Timp) genes. According to the model proposed for the evolution of the Syn-Timp cluster, the locus containing the ancestral nested genes has undergone gene duplications and losses in vertebrates, followed by function partitioning among the resulting paralogs. A comparable succession of events is also compatible with the evolution of the only OGC conserved between vertebrates and worm (Cluster 371, Additional file 2). In this case, the ancestral OGC locus seems to have undergone duplication after the split between Protostomia and Deuterostomia, followed by function partitioning among the resulting paralogs (Additional file 4).
The poor evolutionary conservation of gene overlap in Metazoa suggests that its occurrence is species-specific. Such species-specificity was not due to a recent origin of the overlapping genes, as previously suggested [2, 13, 32]. We found that most overlapping genes in one species had orthologs in the other species, although they did not overlap (Figures 3B and 3C). In addition, 30.2% of human overlapping genes and 25.8% of mouse overlapping genes remained physically adjacent in the compared genome, although the superimposition was lost (see below).
There are examples of functional processes whose poor conservation during evolution is part of their functional role, alternative splicing being the most striking one. Although approximately two-thirds of human genes are alternatively spliced, only 10–20% of them conserve the spliced exons in the orthologous genes in mouse. Hence we propose a species-specific usage of gene overlap, similar to what seems to happen for alternative splicing.
Gene structure modifications associated with overlap formation
Table 3. Gene structure comparison between human and mouse. Columns: OG Pairs Conserved in Hs and Mm (282), Human OG Pairs Adjacent in Mm (226), Mouse OG Pairs Adjacent in Hs (171). Rows: Average Gene Length, Average Exon Number.
The structural analysis of orthologs of human and mouse overlapping genes that remain adjacent but lack the superimposition shows that overlap formation is frequently associated with an increase in gene size and exon number. We therefore suggest that the overlap between adjacent genes may originate by species-specific acquisition of additional, non-coding exons. In agreement with our results, most of the loci analyzed by the ENCODE consortium were found to possess distal 5' non-coding exons which map into neighboring genes and tend to be tissue- or cell-line-specific.
Expression patterns of overlapping gene pairs
The observed rate of co-expression in the whole dataset was 27.6%, while the percentage of discordantly expressed OGs was 42.5%. Taking into account the overall coverage of known genes in our cDNA library, the co-expression rate is about four times higher than expected from the random probability of having any two genes expressed at the same time in the library (7.3%). Therefore, OGs showed a significant tendency to be co-expressed (upper cumulative distribution function, p = 6.7e-102). It should be noted that we obtained significant co-expression even though we removed all sequences mapping to more than one gene in the same cluster (see Methods). This filtering step likely led to an underestimation of the level of co-expression of overlapping genes, but it did not influence the final result. By contrast, the percentage of discordantly expressed genes is not significantly different from random expectation (upper cumulative distribution function, p = 0.043). Previous studies reported higher co-expression rates, ranging from 35.1% to 44.9% [10, 47], with the differences likely due to experimental design (i.e. differences in the starting dataset) and to the number of analyzed tissues.
Considering the different overlapping arrangements, we also observed that co-expression was significantly higher for both convergent (chi-square = 4.69, p = 3.03e-2) and divergent OGs (chi-square = 4.28, p = 3.85e-2), when compared to the frequency of the complete overlaps. In contrast, we observed no statistically significant differences among overlapping arrangements when considering discordantly expressed OGs. Taken together, these results further support the hypothesis that gene overlap might be used to co-ordinate expression of adjacent genes.
Our work shows for the first time that overlap between protein-coding genes, although widespread, is counterselected during metazoan evolution. We also show that overlap retention does not occur randomly, since it preferentially involves gene pairs lying on opposite DNA strands and sharing non-coding regions. The features of retained OGCs suggest a likely role for overlap in the reciprocal regulation of neighboring genes. The evidence that OGs are significantly co-expressed in the breast cancer transcriptome further supports this hypothesis. In addition, the poor conservation of overlap during evolution, and the fact that formation or loss of the overlapping arrangement is related to changes in gene structure, mostly occurring within non-coding regions, point to this as a species-specific mechanism. As non-coding regions generally have fewer constraints on their primary sequence, the tendency to confine the overlap to non-coding regions may achieve co-regulation without forcing two functional protein-coding genes to co-evolve. We might speculate that this tendency would ultimately result in the evolution of overlapping non-coding transcripts optimized for the regulation of their protein-coding partner.
Overlapping gene detection
The RefSeq cDNA sets for five organisms (H. sapiens, M. musculus, D. rerio, D. melanogaster, and C. elegans) were downloaded from the UCSC FTP site (RefSeq v.10, March 2005). We also retrieved mouse cDNAs from the RIKEN database (Fantom 3.0) and the UCSC collection of mouse cDNAs (Mm7 assembly), while for fly and worm we used FlyBase (FlyBase r4.2) and WormBase (WormBase WS140), respectively [50–53].
The genomic position of each sequence was mapped on the corresponding genome by using BLAT (human Build 35; mouse Build 34; zebrafish Zv4; fly Release 4; worm WS120). Pairs of genes whose genomic coordinates partially or totally overlap were extracted and grouped in OGCs. Filters were adopted to avoid (a) splice variants of the same gene, and (b) artifacts due to the position mapping. We considered each pair of cDNAs sharing three or more exons as splice variants of the same gene if more than 20% of their exons overlapped. In the case of cDNAs with two or fewer exons, we considered them as splice variants if at least one residue overlapped at the exon level. For each group of predicted splice variants, only the longest gene was taken as gene representative. Artifacts such as the inclusion of the mRNA poly-A tail in the gene mapping were avoided by excluding all the 3' exons composed of more than 70% of one single nucleotide.
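The extraction of overlapping pairs and their grouping into clusters can be sketched as follows. This is an illustrative reconstruction, not the original code: it assumes closed genomic intervals, treats an OGC as a connected component of the pairwise overlap relation (an assumption, since the grouping rule is not spelled out above), and omits the splice-variant and poly-A filters. The function names `overlapping_pairs` and `cluster` are hypothetical.

```python
from collections import defaultdict


def overlapping_pairs(genes):
    """genes: iterable of (name, chrom, start, end) with inclusive coordinates.

    Returns the list of (name1, name2) pairs that share at least one position
    on the same chromosome, found with a sweep over start-sorted intervals.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, name))
    pairs = []
    for intervals in by_chrom.values():
        intervals.sort()
        active = []  # (end, name) of genes that may still reach later starts
        for start, end, name in intervals:
            active = [(e, n) for e, n in active if e >= start]
            pairs.extend((n, name) for _, n in active)
            active.append((end, name))
    return pairs


def cluster(pairs):
    """Group overlapping pairs into clusters (connected components)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return sorted(sorted(g) for g in groups.values())
```

Under this definition three genes A, B and C in which A overlaps B and B overlaps C form a single cluster of three genes and two pairs, even if A and C do not overlap each other.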
Statistical null model for the overlap formation
For all five species analyzed, the gene positions of the unique gene sets were randomly reassigned within the corresponding chromosomes with no constraints on the type of overlaps, the reciprocal arrangement, or the number of genes per cluster. The analysis was repeated for 10 rounds and the resulting numbers of overlapping genes, overlapping gene pairs, and overlapping gene clusters were counted at each round. The average number was used for comparison with the observed dataset. Features of the OGCs, such as the reciprocal arrangement, the component distribution and the type of overlapping region, were also analyzed.
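The randomization can be illustrated with a short simulation for a single chromosome. The sketch below is a simplified stand-in for the procedure described above: it assumes that each gene keeps its length and receives a uniformly random start position, counts only overlapping pairs (not genes or clusters), and uses hypothetical names (`count_overlapping_pairs`, `simulate_expected_pairs`) and a fixed seed purely for reproducibility.

```python
import random
import statistics


def count_overlapping_pairs(intervals):
    """Count pairs of closed intervals (start, end) sharing at least one position."""
    count = 0
    active_ends = []
    for start, end in sorted(intervals):
        # Keep only earlier intervals that still reach the current start.
        active_ends = [e for e in active_ends if e >= start]
        count += len(active_ends)
        active_ends.append(end)
    return count


def simulate_expected_pairs(gene_lengths, chrom_length, rounds=10, seed=0):
    """Reassign gene positions at random and count overlapping pairs per round.

    Returns (mean, standard deviation, per-round counts).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(rounds):
        placed = []
        for length in gene_lengths:
            # Uniform start such that the gene fits inside the chromosome.
            start = rng.randint(1, chrom_length - length + 1)
            placed.append((start, start + length - 1))
        counts.append(count_overlapping_pairs(placed))
    return statistics.mean(counts), statistics.stdev(counts), counts
```

The mean and standard deviation over the rounds play the role of the expected values and SDs that are compared with the observed counts; a genome-wide version would repeat this per chromosome and sum the results.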
The fraction of overlaps that result in sense/antisense complementarity at the mRNA level was calculated by extracting all overlaps that occur on opposite strands and involve exons of both genes. The statistical significance of the difference between the observed and the random set was assessed by applying a chi-squared test (degree of freedom = 1) to the resulting 2 × 2 contingency matrix.
To test the specificity of the data produced, we performed a manual analysis of the D. rerio dataset (108 OGCs). No obvious false positive due to the methodology could be found. The sensitivity of our method was assessed by benchmarking the derived set against an extensive collection of overlapping genes previously reported. We included 8 independent large-scale screenings of human antisense transcripts/nested genes [9, 13, 27, 29, 30, 55–57] and about 100 experimental studies on specific overlapping gene pairs (Additional files 5 and 6). OGCs reported in the literature with no match in our dataset were checked manually. The lack of coverage was mainly due to the selection criteria (i.e. we deliberately excluded pseudogenes and non-coding RNAs, which were instead included in some large-scale screenings). Only 5 cases were found to be false negatives, giving an estimated sensitivity of 99%.
The orthology relationships between the overlapping genes in the five analyzed species were assessed by using a two-step procedure (Figure 3A). First, for all pairs of species we carried out all-against-all tBLASTx between the corresponding cDNA sets. The best reciprocal hits between two species were assigned as orthologous genes. Second, we derived orthologous overlapping genes by extracting all overlapping genes conserved between each pair of species.
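For a 2 × 2 table the chi-squared statistic and its p-value with one degree of freedom have a closed form, shown in the sketch below. It assumes the plain Pearson statistic without continuity correction, which is consistent with the p-values reported in the Results (a chi-square of 4.69 gives p of about 0.030); the function names are hypothetical.

```python
import math


def p_from_chi2_df1(chi2):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom.

    For df = 1 the statistic is the square of a standard normal variable,
    so P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) on [[a, b], [c, d]].

    Rows could be, for instance, observed versus random overlaps and columns
    antisense-forming versus other overlaps. Returns (chi2, p).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return chi2, p_from_chi2_df1(chi2)
```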
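The two-step procedure can be sketched as follows, assuming that the tBLASTx results have already been reduced to a score per (query, subject) pair. The input format, the use of the highest score as "best hit" and the function names are assumptions made for illustration.

```python
def best_hits(scores):
    """scores: dict mapping (query, subject) to an alignment score.

    Returns a dict mapping each query to its highest-scoring subject.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Step 1: genes of species A and B that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {a: b for a, b in forward.items() if reverse.get(b) == a}


def conserved_overlaps(pairs_a, pairs_b, orthologs):
    """Step 2: overlapping pairs of species A whose orthologs overlap in B."""
    overlapping_in_b = {frozenset(pair) for pair in pairs_b}
    conserved = []
    for g1, g2 in pairs_a:
        if g1 in orthologs and g2 in orthologs:
            if frozenset((orthologs[g1], orthologs[g2])) in overlapping_in_b:
                conserved.append((g1, g2))
    return conserved
```

A gene pair that overlaps in species A but whose orthologs exist without overlapping in species B is simply absent from the output of `conserved_overlaps`, which corresponds to the non-conserved overlaps discussed in the Results.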
Gene structure analysis
We compared the gene structure of the conserved OGCs between human and mouse with human and mouse overlapping genes whose orthologs do not overlap but are adjacent in the genome of the other species. The first set (conserved overlapping genes between human and mouse; the first column in Table 3) was composed of 282 pairs of overlapping genes, while the second (overlapping in human but adjacent in mouse chromosomes; the second column in Table 3), and the third (overlapping in mouse but adjacent in human chromosomes; the third column in Table 3) were composed of 226 and 171 gene pairs, respectively. For each gene, we measured the gene length, defined from the genomic coordinates on the corresponding chromosome, and the exon number, as derived from the BLAT output. Using the Mann-Whitney U-test we compared gene length between the first and the second sets, and between the first and the third sets, to assess the statistical significance of the difference in gene structure.
We also analyzed the type of region (UTR or coding) involved in the overlap for all OGCs in the 3 sets, by counting the number of detectable overlaps after removing the UTRs. In this case, the statistical significance of the difference between the first and the second sets and between the first and the third sets was assessed by applying a chi-squared test (degree of freedom = 1) to the resulting 2 × 2 contingency matrix.
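The Mann-Whitney comparison of two sets of gene lengths can be sketched in pure Python as below. This is a simplified version, assuming a two-sided test with the normal approximation (reasonable for samples of a few hundred genes) and midranks for ties, but without tie or continuity corrections; in practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used. The function name is hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U, p), where U is the smaller of the two U statistics.
    """
    values = list(x) + list(y)
    order = sorted((v, i) for i, v in enumerate(values))
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend over a run of tied values and give them their midrank.
        while end + 1 < len(order) and order[end + 1][0] == order[pos][0]:
            end += 1
        midrank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k][1]] = midrank
        pos = end + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(ranks[:n1])
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p
```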
Analysis of OGC expression in breast cancer
cDNA was obtained from polyadenylated breast cancer RNA (purity 85–90%). cDNA was normalized after reverse transcription to obtain a balanced mix of low and high abundance mRNA, as previously described. Then, 2.1 micrograms of normalized, double-stranded cDNA were converted to a single-stranded library using the 454 protocol. Two independent cDNA libraries were generated with an average length per sequence read of 100 and 200 nt, respectively. A total of 198,658 non-redundant sequence reads, according to the NCBI non-redundant database, were sequenced from each breast cancer cDNA library. The entire library was mapped against the 249,953 sequences of the human "all_mrna" transcript dataset from the UCSC human genome. A total of 37,774 reads corresponding to a specific cDNA and its related isoforms were identified (requiring perfect BLAT matches with 95% of the read covered by the alignment). The reads were then aligned to the human RefSeq cDNA dataset from UCSC (25,922 sequences) requiring perfect coverage. Finally, 9,082 distinct matches were obtained, which were used for the subsequent calculations.
Read-to-gene assignment was performed by BLAST searches of the nucleotide sequences of all OGs against the library. Only reads showing 100% identity with a transcript were used in the analyses. To ensure the 454 sequences were unambiguously matched to the assigned transcript, we removed reads mapped to more than one locus. Since the 454 sequencing process does not involve in-vivo cloning and the cDNA is subjected to nebulization, in the resulting library it is not possible to assign the strand when the two transcripts overlap. Thus, we removed all sequence reads mapping to more than one gene within the same cluster. In total, 36 out of 3701 reads were removed, corresponding to an estimated loss of about 1%, which likely did not create a significant bias.
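The removal of reads that cannot be assigned to a single gene of a cluster amounts to a simple filter, sketched below. The data layout (a set of matched genes per read and a gene-to-cluster lookup) and the function name are assumptions introduced for illustration.

```python
from collections import Counter


def filter_ambiguous_reads(read_hits, gene_to_cluster):
    """Drop reads that match more than one gene of the same cluster.

    read_hits: dict mapping read id to the set of genes matched at 100% identity.
    gene_to_cluster: dict mapping gene name to its cluster (OGC) identifier.
    Returns (kept, removed): kept maps read id to its gene set, removed is the
    list of discarded read ids.
    """
    kept, removed = {}, []
    for read_id, genes in read_hits.items():
        hits_per_cluster = Counter(gene_to_cluster[g] for g in genes)
        if any(n > 1 for n in hits_per_cluster.values()):
            removed.append(read_id)
        else:
            kept[read_id] = set(genes)
    return kept, removed


def percent_removed(n_removed, n_total):
    """Percentage of reads lost to the filter."""
    return 100 * n_removed / n_total
```

With the figures given above, `percent_removed(36, 3701)` is about 0.97, that is, roughly 1% of the reads.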
The statistical significance for the enrichment of co-expression in overlapping gene pairs was evaluated by an upper cumulative distribution function.
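As an illustration of an upper cumulative distribution function, the sketch below computes the upper tail of a binomial distribution, that is, the probability of seeing at least k co-expressed pairs out of n when each pair is co-expressed by chance with probability p. The choice of a binomial model is an assumption, since the underlying distribution is not named above, and the function name is hypothetical. The sum is carried out in log space so that very small p-values do not underflow prematurely.

```python
import math


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    # Factor out the largest term before exponentiating (log-sum-exp).
    largest = max(log_terms)
    return math.exp(largest) * sum(math.exp(t - largest) for t in log_terms)
```

For example, with a hypothetical set of 1000 gene pairs, an observed co-expression rate of 27.6% against a random expectation of 7.3% would be evaluated as `binom_upper_tail(276, 1000, 0.073)`.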
We wish to thank Davide Rambaldi (IEO, Milan) for his help in retrieving the data needed for the simulation of the random distribution. We also thank Raoul Bonnal and Michele Iacono of ITB-CNR for contributing to the generation, sequencing and analysis of the 454 cDNA library sequences. This work was supported by the Start Up grant of AIRC to FDC and by "Borsa di studio per il perfezionamento all'estero" of the University of Milan to GS.
- Boi S, Solda' G, Tenchini ML: Shedding light on the dark side of the genome: overlapping genes in higher eukaryotes. Current Genomics. 2004, 5: 509-524. 10.2174/1389202043349020.View ArticleGoogle Scholar
- Makalowska I, Lin CF, Makalowski W: Overlapping genes in vertebrate genomes. Comput Biol Chem. 2005, 29 (1): 1-12. 10.1016/j.compbiolchem.2004.12.006.PubMedView ArticleGoogle Scholar
- Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007, 447 (7146): 799-816. 10.1038/nature05874.
- Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C: The transcriptional landscape of the mammalian genome. Science. 2005, 309 (5740): 1559-1563. 10.1126/science.1112014.PubMedView ArticleGoogle Scholar
- Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H, Helt G: Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science. 2005, 308 (5725): 1149-1154. 10.1126/science.1108625.PubMedView ArticleGoogle Scholar
- Engstrom PG, Suzuki H, Ninomiya N, Akalin A, Sessa L, Lavorgna G, Brozzi A, Luzi L, Tan SL, Yang L: Complex Loci in human and mouse genomes. PLoS Genet. 2006, 2 (4): e47-10.1371/journal.pgen.0020047.PubMedPubMed CentralView ArticleGoogle Scholar
- Kapranov P, Drenkow J, Cheng J, Long J, Helt G, Dike S, Gingeras TR: Examples of the complex architecture of the human transcriptome revealed by RACE and high-density tiling arrays. Genome Res. 2005, 15 (7): 987-997. 10.1101/gr.3455305.PubMedPubMed CentralView ArticleGoogle Scholar
- Katayama S, Tomaru Y, Kasukawa T, Waki K, Nakanishi M, Nakamura M, Nishida H, Yap CC, Suzuki M, Kawai J: Antisense transcription in the mammalian transcriptome. Science. 2005, 309 (5740): 1564-1566. 10.1126/science.1112009.PubMedView ArticleGoogle Scholar
- Chen J, Sun M, Kent WJ, Huang X, Xie H, Wang W, Zhou G, Shi RZ, Rowley JD: Over 20% of human transcripts might form sense-antisense pairs. Nucleic Acids Res. 2004, 32 (16): 4812-4820. 10.1093/nar/gkh818.PubMedPubMed CentralView ArticleGoogle Scholar
- Galante PA, Vidal DO, de Souza JE, Camargo AA, de Souza SJ: Sense-antisense pairs in mammals: functional/evolutionary considerations. Genome Biol. 2007, 8 (3): R40-10.1186/gb-2007-8-3-r40.PubMedPubMed CentralView ArticleGoogle Scholar
- Lavorgna G, Dahary D, Lehner B, Sorek R, Sanderson CM, Casari G: In search of antisense. Trends Biochem Sci. 2004, 29 (2): 88-94. 10.1016/j.tibs.2003.12.002.PubMedView ArticleGoogle Scholar
- Sun M, Hurst LD, Carmichael GG, Chen J: Evidence for variation in abundance of antisense transcripts between multicellular animals but no relationship between antisense transcriptionand organismic complexity. Genome Res. 2006, 16 (7): 922-933. 10.1101/gr.5210006.PubMedPubMed CentralView ArticleGoogle Scholar
- Veeramachaneni V, Makalowski W, Galdzicki M, Sood R, Makalowska I: Mammalian overlapping genes: the comparative perspective. Genome Res. 2004, 14 (2): 280-286. 10.1101/gr.1590904.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhang Y, Liu XS, Liu QR, Wei L: Genome-wide in silico identification and analysis of cis natural antisense transcripts (cis-NATs) in ten species. Nucleic Acids Res. 2006, 34 (12): 3465-3475. 10.1093/nar/gkl473.PubMedPubMed CentralView ArticleGoogle Scholar
- Lapidot M, Pilpel Y: Genome-wide natural antisense transcription: coupling its regulation to its different regulatory mechanisms. EMBO Rep. 2006, 7 (12): 1216-1222. 10.1038/sj.embor.7400857.PubMedPubMed CentralView ArticleGoogle Scholar
- Li AW, Murphy PR: Expression of alternatively spliced FGF-2 antisense RNA transcripts in the central nervous system: regulation of FGF-2 mRNA translation. Mol Cell Endocrinol. 2000, 170 (1-2): 233-242. 10.1016/S0303-7207(00)00440-8.PubMedView ArticleGoogle Scholar
- Munroe SH, Lazar MA: Inhibition of c-erbA mRNA splicing by a naturally occurring antisense RNA. J Biol Chem. 1991, 266 (33): 22083-22086.PubMedGoogle Scholar
- Peters NT, Rohrbach JA, Zalewski BA, Byrkett CM, Vaughn JC: RNA editing and regulation of Drosophila 4f-rnp expression by sas-10 antisense readthrough mRNA transcripts. Rna. 2003, 9 (6): 698-710. 10.1261/rna.2120703.PubMedPubMed CentralView ArticleGoogle Scholar
- Sleutels F, Zwart R, Barlow DP: The non-coding Air RNA is required for silencing autosomal imprinted genes. Nature. 2002, 415 (6873): 810-813.PubMedView ArticleGoogle Scholar
- Tufarelli C, Stanley JA, Garrick D, Sharpe JA, Ayyub H, Wood WG, Higgs DR: Transcription of antisense RNA leading to gene silencing and methylation as a novel cause of human genetic disease. Nat Genet. 2003, 34 (2): 157-165. 10.1038/ng1157.PubMedView ArticleGoogle Scholar
- Bejanin S, Cervini R, Mallet J, Berrard S: A unique gene organization for two cholinergic markers, choline acetyltransferase and a putative vesicular transporter of acetylcholine. J Biol Chem. 1994, 269 (35): 21944-21947.PubMedGoogle Scholar
- Martianov I, Ramadass A, Serra Barros A, Chow N, Akoulitchev A: Repression of the human dihydrofolate reductase gene by a non-coding interfering transcript. Nature. 2007, 445 (7128): 666-670. 10.1038/nature05519.PubMedView ArticleGoogle Scholar
- Nekrutenko A, Wadhawan S, Goetting-Minesky P, Makova KD: Oscillating evolution of a mammalian locus with overlapping reading frames: an XLalphas/ALEX relay. PLoS Genet. 2005, 1 (2): e18-10.1371/journal.pgen.0010018.PubMedPubMed CentralView ArticleGoogle Scholar
- Prasanth KV, Prasanth SG, Xuan Z, Hearn S, Freier SM, Bennett CF, Zhang MQ, Spector DL: Regulating gene expression through RNA nuclear retention. Cell. 2005, 123 (2): 249-263. 10.1016/j.cell.2005.08.033.PubMedView ArticleGoogle Scholar
- Dahary D, Elroy-Stein O, Sorek R: Naturally occurring antisense: transcriptional leakage or real overlap?. Genome Res. 2005, 15 (3): 364-368. 10.1101/gr.3308405.PubMedPubMed CentralView ArticleGoogle Scholar
- Kiyosawa H, Yamanaka I, Osato N, Kondo S, Hayashizaki Y: Antisense transcripts with FANTOM2 clone set and their implications for gene regulation. Genome Res. 2003, 13 (6B): 1324-1334. 10.1101/gr.982903.PubMedPubMed CentralView ArticleGoogle Scholar
- Yelin R, Dahary D, Sorek R, Levanon EY, Goldstein O, Shoshan A, Diber A, Biton S, Tamir Y, Khosravi R: Widespread occurrence of antisense transcription in the human genome. Nat Biotechnol. 2003, 21 (4): 379-386. 10.1038/nbt808.PubMedView ArticleGoogle Scholar
- Pang KC, Frith MC, Mattick JS: Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of function. Trends Genet. 2006, 22 (1): 1-5. 10.1016/j.tig.2005.10.003.PubMedView ArticleGoogle Scholar
- Lehner B, Williams G, Campbell RD, Sanderson CM: Antisense transcripts in the human genome. Trends Genet. 2002, 18 (2): 63-65. 10.1016/S0168-9525(02)02598-2.PubMedView ArticleGoogle Scholar
- Shendure J, Church GM: Computational discovery of sense-antisense transcription in the human and mouse genomes. Genome Biol. 2002, 3 (9): RESEARCH0044-10.1186/gb-2002-3-9-research0044.PubMedPubMed CentralView ArticleGoogle Scholar
- Misra S, Crosby MA, Mungall CJ, Matthews BB, Campbell KS, Hradecky P, Huang Y, Kaminker JS, Millburn GH, Prochnik SE: Annotation of the Drosophila melanogaster euchromatic genome: a systematic review. Genome Biol. 2002, 3 (12): RESEARCH0083-10.1186/gb-2002-3-12-research0083.PubMedPubMed CentralView ArticleGoogle Scholar
- Keese PK, Gibbs A: Origins of genes: "big bang" or continuous creation?. Proc Natl Acad Sci USA. 1992, 89 (20): 9489-9493. 10.1073/pnas.89.20.9489.
- Osato N, Suzuki Y, Ikeo K, Gojobori T: Transcriptional Interferences in cis Natural Antisense Transcripts of Humans and Mice. Genetics. 2007, 176 (2): 1299-1306. 10.1534/genetics.106.069484.
- Prescott EM, Proudfoot NJ: Transcriptional collision between convergent genes in budding yeast. Proc Natl Acad Sci USA. 2002, 99 (13): 8796-8801. 10.1073/pnas.132270899.
- Yu JS, Kokoska RJ, Khemici V, Steege DA: In-frame overlapping genes: the challenges for regulating gene expression. Mol Microbiol. 2007, 63 (4): 1158-1172. 10.1111/j.1365-2958.2006.05572.x.
- Blumenthal T, Gleason KS: Caenorhabditis elegans operons: form and function. Nat Rev Genet. 2003, 4 (2): 112-120. 10.1038/nrg995.
- Cavalcanti AR, Stover NA, Landweber LF: On the paucity of duplicated genes in Caenorhabditis elegans operons. J Mol Evol. 2006, 62 (6): 765-771. 10.1007/s00239-005-0203-3.
- Numata K, Okada Y, Saito R, Kiyosawa H, Kanai A, Tomita M: Comparative analysis of cis-encoded antisense RNAs in eukaryotes. Gene. 2007, 392 (1–2): 134-141. 10.1016/j.gene.2006.12.005.
- Sun M, Hurst LD, Carmichael GG, Chen J: Evidence for a preferential targeting of 3'-UTRs by cis-encoded natural antisense transcripts. Nucleic Acids Res. 2005, 33 (17): 5533-5543. 10.1093/nar/gki852.
- Yu WP, Brenner S, Venkatesh B: Duplication, degeneration and subfunctionalization of the nested synapsin-Timp genes in Fugu. Trends Genet. 2003, 19 (4): 180-183. 10.1016/S0168-9525(03)00048-9.
- Blencowe BJ: Alternative splicing: new insights from global analyses. Cell. 2006, 126 (1): 37-47. 10.1016/j.cell.2006.06.023.
- Johnson JM, Castle J, Garrett-Engele P, Kan Z, Loerch PM, Armour CD, Santos R, Schadt EE, Stoughton R, Shoemaker DD: Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science. 2003, 302 (5653): 2141-2144. 10.1126/science.1090100.
- Modrek B, Lee CJ: Alternative splicing in the human, mouse and rat genomes is associated with an increased frequency of exon creation and/or loss. Nat Genet. 2003, 34 (2): 177-180. 10.1038/ng1159.
- Sokal RR, Rohlf FJ: Biometry. 1995, New York, USA: W.H. Freeman & Company, 3rd edition.
- Denoeud F, Kapranov P, Ucla C, Frankish A, Castelo R, Drenkow J, Lagarde J, Alioto T, Manzano C, Chrast J: Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Res. 2007, 17 (6): 746-759. 10.1101/gr.5660607.
- Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z: Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005, 437 (7057): 376-380.
- Chen J, Sun M, Hurst LD, Carmichael GG, Rowley JD: Genome-wide analysis of coordinate expression and evolution of human cis-encoded sense-antisense transcripts. Trends Genet. 2005, 21 (6): 326-329. 10.1016/j.tig.2005.04.006.
- Pruitt KD, Tatusova T, Maglott DR: NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005, 33 (Database issue): D501-504.
- The UCSC ftp Web site. [ftp://hgdownload.cse.ucsc.edu/]
- The C. elegans Genome Database. [http://www.wormbase.org/]
- RIKEN Mouse Genome Project database. [http://fantom.gsc.riken.go.jp/]
- The Drosophila melanogaster genome database. [http://flybase.net/]
- UCSC Genome Bioinformatics Site. [http://genome.ucsc.edu/]
- Kent WJ: BLAT–the BLAST-like alignment tool. Genome Res. 2002, 12 (4): 656-664. 10.1101/gr.229202. Article published online before March 2002.
- Quere R, Manchon L, Lejeune M, Clement O, Pierrat F, Bonafoux B, Commes T, Piquemal D, Marti J: Mining SAGE data allows large-scale, sensitive screening of antisense transcript expression. Nucleic Acids Res. 2004, 32 (20): e163. 10.1093/nar/gnh161.
- Scherer SW, Cheung J, MacDonald JR, Osborne LR, Nakabayashi K, Herbrick JA, Carson AR, Parker-Katiraee L, Skaug J, Khaja R: Human chromosome 7: DNA sequence and biology. Science. 2003, 300 (5620): 767-772. 10.1126/science.1083423.
- Yu P, Ma D, Xu M: Nested genes in the human genome. Genomics. 2005, 86 (4): 414-422. 10.1016/j.ygeno.2005.06.008.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.
- Zhulidov PA, Bogdanova EA, Shcheglov AS, Vagner LL, Khaspekov GL, Kozhemyako VB, Matz MV, Meleshkevitch E, Moroz LL, Lukyanov SA: Simple cDNA normalization using kamchatka crab duplex-specific nuclease. Nucleic Acids Res. 2004, 32 (3): e37. 10.1093/nar/gnh031.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.