Genomic mid-range inhomogeneity correlates with an abundance of RNA secondary structures
- Jason M Bechtel†1,
- Thomas Wittenschlaeger†1, 6,
- Trisha Dwyer1, 2,
- Jun Song1, 2, 7,
- Sasi Arunachalam1, 3,
- Sadeesh K Ramakrishnan1, 4,
- Samuel Shepard1, 5 and
- Alexei Fedorov1, 5 (corresponding author)
© Bechtel et al; licensee BioMed Central Ltd. 2008
Received: 21 February 2008
Accepted: 12 June 2008
Published: 12 June 2008
Genomes possess different levels of non-randomness, in particular, an inhomogeneity in their nucleotide composition. Inhomogeneity is manifest from the short-range where neighboring nucleotides influence the choice of base at a site, to the long-range, commonly known as isochores, where a particular base composition can span millions of nucleotides. A separate genomic issue that has yet to be thoroughly elucidated is the role that RNA secondary structure (SS) plays in gene expression.
We present novel data and approaches that show that a mid-range inhomogeneity (~30 to 1000 nt) not only exists in mammalian genomes but is also significantly associated with strong RNA SS. We carried out a whole-genome bioinformatics investigation of local SS in a set of 11,315 non-redundant human pre-mRNA sequences. Four distinct components of these molecules (5'-UTRs, exons, introns and 3'-UTRs) were considered separately, since they differ in overall nucleotide composition, sequence motifs and periodicities. For each pre-mRNA component, the abundance of strong local SS (< -25 kcal/mol) was a factor of two to ten greater than expected under a randomization model. The randomization process preserves the short-range inhomogeneity of the corresponding natural sequences, thus eliminating short-range signals as possible contributors to any observed phenomena.
We demonstrate that the excess of strong local SS in pre-mRNAs is linked to the little explored phenomenon of genomic mid-range inhomogeneity (MRI). MRI is an interdependence between nucleotide choice and base composition over a distance of 20–1000 nt. Additionally, we have created a public computational resource to support further study of genomic MRI.
RNA secondary structures
Secondary structures (SS) are crucial elements for the biosynthesis and/or correct action of non-coding RNAs in mammals and other eukaryotes. Moreover, they are key regulators in the function and turnover of mRNA molecules. SS in pre-mRNAs regulate the splicing process [1–3]. In mature mRNAs, SS located in 5'-untranslated regions (5'-UTRs) signal for translational control [4, 5] and those located in 3'-untranslated regions (3'-UTRs) regulate sub-cellular localization and stability [6–8]. SS located inside protein-coding sequences could play a role in translational speed and stability [9, 10].
Prior studies of the strength of computer-predicted SS in mRNA have had conflicting conclusions [11, 12]. More importantly, these studies did not investigate the abundance of SS and considered only coding sequences. This spurred us to perform a bioinformatics investigation into the abundance of SS throughout mammalian genomes. Our results show that the existence of many energetically-strong SS is associated with the phenomenon of global mid-range inhomogeneity (MRI), manifest as nucleotide compositional relationships at a scale of 20 to 1000 bases throughout the genome. MRI appears as a strong tendency for the clustering of particular bases (e.g. C and G nucleotides, or G and A nucleotides) inside short regions of genomic sequences. This paper provides new approaches and tools to gain insights into this form of genomic inhomogeneity.
It is well established that the particular base (A, G, C, or T) that appears in a given position of a genomic sequence significantly depends upon the nearest bases surrounding its position [13, 14]. Consequently, the frequency (F) of a dinucleotide XY is often not equal to the product of the individual frequencies of nucleotides X and Y (F_XY ≠ F_X*F_Y). The highest interdependence of base frequencies is always observed for adjacent nucleotides. The ratio F_XY/(F_X*F_Y) for adjacent bases X and Y is known as a "genomic signature". Genomic signatures as low as 0.22 (for the CG dinucleotide in mouse) and as high as 1.75 (for the GC dinucleotide in Campylobacter jejuni) have been recorded. The interdependence of base frequencies drops sharply with increasing distance. When the distance between nucleotides X and Y is more than six bases, their occurrence interdependency becomes negligible. Here, we refer to this type of interdependency between nucleotides separated from each other by a few positions as short-range inhomogeneity (SRI).
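As an illustration of the ratio F_XY/(F_X*F_Y) described above, the following is a minimal Python sketch that computes it for every adjacent dinucleotide of one sequence. The function name `dinucleotide_signature` is hypothetical and not part of the published programs. The sketch counts a single strand only, which is a simplifying assumption; published genomic signature calculations may treat both strands symmetrically.

```python
from collections import Counter

BASES = "ACGT"

def dinucleotide_signature(seq):
    """Return {XY: F_XY / (F_X * F_Y)} for adjacent bases X, Y in seq.

    Single-strand counts only (an assumption made for brevity).
    Pairs containing a non-ACGT character are ignored.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    # Overlapping adjacent pairs made only of A, C, G, T.
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    if n_mono == 0 or n_di == 0:
        return {}
    signature = {}
    for x in BASES:
        for y in BASES:
            fx = mono[x] / n_mono
            fy = mono[y] / n_mono
            if fx == 0 or fy == 0:
                continue  # ratio undefined when a base is absent
            signature[x + y] = (di[x + y] / n_di) / (fx * fy)
    return signature
```

A value near 1 means the dinucleotide occurs about as often as its base frequencies predict; values well below or above 1 indicate under- or over-representation.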
Also well recognized are long-range interdependencies in nucleotide frequencies on a scale of up to millions of bases, known as genomic isochores. It has been shown that isochores can be generally categorized according to their level of G+C content. Isochores defined by G+C content correspond to many other genomic phenomena. GC-rich isochores replicate later in S-phase, contain higher concentrations of genes, and have genes with shorter introns and untranslated regions. Moreover, GC-rich isochores tend to have an "open" chromatin structure and thus have higher rates of transcription. Higher G+C content isochores also experience higher recombination rates, perhaps lending support to the notion that higher recombination rates led to the creation of isochores through biased gene conversion. While the evolution and maintenance of isochores is subject to debate, their presence is indeed evidence of existing interdependencies in nucleotide composition on the scale of tens of thousands to millions of nucleotides. We will refer to this form of non-randomness in genomic nucleotide composition as long-range inhomogeneity.
We call the compositional non-randomness between the two extremes described above mid-range inhomogeneity, or MRI. MRI has yet to be thoroughly investigated. The only well-known manifestation of mid-range inhomogeneity is CpG islands. Most attempts to define CpG islands set hard requirements for region size (at least 200 or 500 bases long), G+C content (> 50% or 55%), and CpG observed/expected ratio (> 0.6 or 0.65) [19, 20, respectively]. CpG islands are found near 60% of human genes, including all housekeeping genes and about half of the tissue-specific genes. Here we demonstrate that MRI can be observed for regions from 30–1000 bp and is significant not only for G+C content but for other nucleotide pairings (A+G and G+T) as well as for the individual bases.
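The three CpG island requirements quoted above (region size, G+C content, and CpG observed/expected ratio) can be checked for a candidate region with a short Python sketch. The function names are hypothetical. The observed/expected ratio is computed here as (number of CpG × length) / (number of C × number of G), a commonly used form that is assumed rather than taken from this article. The defaults correspond to the first set of thresholds in the text (200 bases, 50%, 0.6).

```python
def cpg_stats(seq):
    """Return (length, G+C fraction, CpG observed/expected ratio)."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    expected = c * g / n if n else 0.0
    ratio = cpg / expected if expected else 0.0
    return n, gc_fraction, ratio

def meets_cpg_island_criteria(seq, min_len=200, min_gc=0.50, min_ratio=0.60):
    """Apply the hard thresholds to one candidate region."""
    n, gc_fraction, ratio = cpg_stats(seq)
    return n >= min_len and gc_fraction > min_gc and ratio > min_ratio
```

Passing `min_len=500, min_gc=0.55, min_ratio=0.65` applies the stricter second set of thresholds mentioned in the text.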
Analysis of strong local SS within pre-mRNAs
Percentage of GC-composition in different regions of pre-mRNA for diverse animal species.
Analysis of strong local SS in randomized sequences
To evaluate the abundance of local SS, one must compare their prevalence in naturally occurring mRNAs with their levels in reference sequences having no selection for SS. In most research, reference sequences are randomly generated to have nucleotide compositions approximating those of the naturally occurring mRNAs. In order to properly compare local SS in natural and randomized sequences, one needs to preserve short-range inhomogeneity (SRI), as discussed previously by Workman and Krogh.
Excerpt from oligonucleotide frequency table for 5'-UTRs of 11,315 human genes and two SRI-generated counterparts (Random 1 SRI-generated and Random 2 SRI-generated). The entire dataset is presented in Additional file 1.
Protein-coding sequences (CDS) contain a profound 3 nt periodicity and other non-randomness associated with translational properties. All of this information would be lost in SRI-generated sequences. To overcome this problem we created CDS-generator, a public resource for the randomization of protein-coding sequences. CDS-generator changes only the variable nucleotides in the third codon position, which do not change the coded amino acids. Additionally, CDS-generator maintains the codon and dicodon biases of a given set of natural coding sequences. Thus, randomization by CDS-generator is much weaker than randomization by SRI-generator since it retains > 70% sequence identity between the natural and random sequences. On the other hand, maintaining a considerable level of sequence identity is useful because it preserves the major periodicity characteristics of the source coding sequences. Figure 2E demonstrates that natural coding sequences have twice the number of strong local SS as randomized sequences obtained by CDS-generator. The chi-square test confirms that the difference is statistically significant (p < 10^-200).
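The idea of changing only synonymous third codon positions can be sketched in Python as follows. This is a simplified illustration, not the CDS-generator algorithm: it preserves the encoded amino acids and, in expectation, the codon usage of the input, but it makes no attempt to maintain dicodon bias. The function name `randomize_third_positions` is hypothetical, the standard genetic code is assumed, and the input is assumed to be upper-case DNA in frame.

```python
import random
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def randomize_third_positions(cds, rng=None):
    """Resample each codon among synonymous codons sharing its first two bases.

    Alternatives are weighted by their usage in the input sequence.
    Codons with non-ACGT characters and any trailing partial codon are
    copied unchanged.
    """
    rng = rng or random.Random()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    usage = Counter(c for c in codons if c in CODE)
    out = []
    for codon in codons:
        if codon not in CODE:
            out.append(codon)
            continue
        # Synonymous codons that differ only at the third position and
        # occur in the input (the original codon always qualifies).
        alts = [
            codon[:2] + b
            for b in BASES
            if CODE[codon[:2] + b] == CODE[codon] and usage[codon[:2] + b] > 0
        ]
        out.append(rng.choices(alts, weights=[usage[a] for a in alts])[0])
    return "".join(out) + cds[usable:]
```

Because the first two bases of every codon are untouched, the output keeps the 3 nt periodicity and a high level of identity with the input, which is the property the paragraph above emphasizes.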
Mid-range inhomogeneity in natural genomic sequences
Association of MRI with the over-abundance of strong local SS
Finally, we created a program named MRI-generator for obtaining random sequences having the same oligonucleotide composition and also the same MRI pattern in GC-composition as a specified set of natural sequences. This program works by producing an excessively long SRI-generated sequence and then discarding segments with intermediate GC-content to obtain the desired pattern of GC-rich and GC-poor regions. Thus, the output sequence from MRI-generator has a Genomic-MRI pattern of GC-rich and GC-poor regions very similar to that of the natural sequence.
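The discarding step can be illustrated with a deliberately crude Python sketch. It cuts an over-long input into non-overlapping windows and keeps only those whose GC fraction is at or beyond one of two thresholds. The function name, window size, and threshold values are all hypothetical, and the real MRI-generator presumably does more than this (for example, matching the proportions of rich and poor regions in the natural sequence); only the filtering idea is shown.

```python
def gc_fraction(segment):
    """Fraction of G and C characters in a non-empty segment."""
    return (segment.count("G") + segment.count("C")) / len(segment)

def keep_extreme_windows(long_seq, window=50, low=0.35, high=0.65, target_len=None):
    """Drop windows with intermediate GC content from an over-long sequence.

    Windows are non-overlapping. Retained windows are concatenated in
    their original order, and the result is truncated to target_len
    if one is given.
    """
    long_seq = long_seq.upper()
    kept = []
    total = 0
    for start in range(0, len(long_seq) - window + 1, window):
        segment = long_seq[start:start + window]
        gc = gc_fraction(segment)
        if gc <= low or gc >= high:
            kept.append(segment)
            total += window
            if target_len is not None and total >= target_len:
                break
    result = "".join(kept)
    return result if target_len is None else result[:target_len]
```

Note that joining retained windows creates new oligonucleotides at the junctions, so the oligonucleotide composition of the output should be checked against the source afterwards.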
Comparison of natural sequences with their MRI-generated counterparts for each genomic sequence category (5'-UTRs, 3'-UTRs, introns, intergenic regions, and CDS) shows that they each have approximately the same number (5–10% difference) of strong local SS, as illustrated in Figure 2F for 3'-UTRs. This finding supports the conclusion that the abundance of strong SS in all parts of the mammalian genome (mRNA, introns, intergenic regions) is associated with the MRI of these sequences.
DNA repetitive elements and genomic MRI
Even though human interspersed repeats do not show an excess of strong SS as discussed above, they do influence the patterns of MRI [see Additional file 4]. The figure in Additional file 4 illustrates the MRI patterns of the extra-large first intron of the DMD gene (see Figure 4) after masking its repetitive elements by RepeatMasker. Unsurprisingly, the number of MRI regions in the masked sequence is a fraction of those in its non-masked counterpart. The masked sequence contains 41% N's instead of A, G, C, or T bases. The current version of MRI-analyzer skips a window containing any non-A, G, C, or T character. For a proper comparison of MRI patterns in a masked sequence, one should compare it to the SRI-generated random sequence based on the masked sequence. Such random sequences contain the same number of N's at exactly the same positions as the natural masked sequence. The figure in Additional file 4 demonstrates that the masked sequence of the first DMD intron has 3 to 12 times the number of MRI peaks compared to its random counterpart. This particular example with the DMD intron presents an AT-rich sequence (67% of A+T), which is typical for extra-large introns. Accordingly, we set the upper threshold for GC-composition to 60% in studying this sequence. Under such conditions, we observe GC-rich MRI peaks overlapping various portions of Alu-repeats. This overlapping of MRI regions with repetitive elements seems to depend on the threshold used and the G+C-composition of the region under analysis.
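A window scan of the kind described here, which skips any window containing a non-A, G, C, or T character and flags windows that cross a content threshold, might look like the following Python sketch. It is not the MRI-analyzer code. The function name, window size, step, and lower threshold are hypothetical; only the 60% upper threshold is taken from the text.

```python
def scan_windows(seq, window=50, step=50, low=0.35, high=0.60, bases="GC"):
    """Return (start, fraction, flag) for each fully unmasked window.

    flag is "rich" when the fraction of `bases` is >= high, "poor" when
    it is <= low, and "" otherwise. Windows containing any character
    other than A, C, G, T are skipped entirely.
    """
    seq = seq.upper()
    records = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        if any(ch not in "ACGT" for ch in w):
            continue  # masked or ambiguous window
        fraction = sum(w.count(b) for b in bases) / window
        if fraction >= high:
            flag = "rich"
        elif fraction <= low:
            flag = "poor"
        else:
            flag = ""
        records.append((start, fraction, flag))
    return records
```

Setting `bases="AG"` or `bases="GT"` scans for the other nucleotide pairings discussed in this article, and a single letter scans for an individual base.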
We have demonstrated an association between MRI in GC composition and the abundance of strong SS in genomic sequences. There are at least two possible interpretations of these results. First, one can argue that MRI causes the abundance of strong SS. The second possibility is that selection for strong SS was the reason for the appearance of MRI. Both views have merit and we thus include a discussion of the supporting evidence.
Central to this discussion is the observation that MRI exists not only in mRNA sequences but also in introns and intergenic regions. If selection were limited to transcripts or to mature mRNAs, there would be no way for evolution to directly drive the creation of SS in non-transcribed regions. This would leave MRI in GC composition as a potential mediator of strong SS enrichment. However, some experimental evidence suggests that much more of the genome is transcribed than previously thought. It has also been suggested for some time that SS play a role in the initiation of recombination. This theory predicts positive selection for SS throughout genomes and especially within introns and intergenic regions [31–33]. Moreover, studies of coding sequences in mammals have found that synonymous substitutions tend to increase the strength of SS and regulate mRNA stability [34–36]. Thus, SS could have emerged first due to selection for DNA hairpins to facilitate homologous recombination and for stable mRNA SS signals, yielding MRI in GC content as a by-product. On the other hand, MRI is also observed for AG- and GT-content as well as for the individual nucleotides (see Figures 6 and 7), so it is also possible that selection for MRI is a fundamental force driving genome organization and composition.
It is of special interest to investigate possible biological roles for MRI in the structural and functional organization of mammalian genomes. To address this important issue, we have studied 3.3 million point mutations occurring over the last 10 million years in humans and over 3.9 million SNPs in the MRI-regions and outside them. These results will be detailed in our next publication (under preparation). Based on the preliminary results of these investigations, we can state that MRI patterns are formed by a combination of processes. Some patterns (e.g. A+T-rich regions) are like cellular automata, based on non-selection biases in nucleotide changes at genomic regions with specific base compositions, while other patterns are formed by a strong fixation bias (presumably positive selection of functional regions) that preserve particular base enrichments in corresponding regions (e.g. G+C-, purine-, and pyrimidine-rich). These forces drive mid-range non-randomness, shaping the human genome and potentially imparting additional layers of organizational complexity.
Indeed, an important feature of the human genome is that its vast array of genes is differentially expressed in hundreds of different cell types and subtypes. Moreover, at different stages of development and in response to diverse extracellular stimuli, gene expression must be finely tuned. To perform the enormous task of creating a human body composed of trillions of cells, the genome must contain a vast number of signals for gene regulation, the majority of which have yet to be discovered. We hypothesize that MRI could represent a novel class of genomic signals, based on overall composition and clustering of nucleotides rather than particular sequence motifs. To facilitate the testing of this hypothesis, we created a free, public Internet resource called "Genomic MRI" that allows one to run all programs described here without any programming knowledge. Additionally, all of these programs are freely available for downloading and off-line usage, primarily for computational biologists.
The programs SRI-analyzer, SRI-generator, MRI-analyzer, MRI-generator, and CDS-generator are available via our website. A link to the current location of the website will be maintained at our departmental project site.
Sequence randomization algorithm (SRI-generator)
There are several possible approaches for randomizing nucleotide sequences while maintaining their N-mer oligonucleotide frequency composition. The simplest approach would be to randomly choose N-mer oligonucleotides based on their frequency composition and tile them one after the other. However, this approach does not necessarily preserve the frequencies of shorter oligonucleotides observed in natural sequences. For example, the random concatenation of N-mers as tiles artificially introduces a dinucleotide composition bias at the border of two adjacent oligonucleotide tiles, producing an overrepresentation of CpG dinucleotides and the like that does not match the SRI of the natural sequences. Therefore, we chose a different approach, which generates a randomized sequence one nucleotide at a time moving in a 5' to 3' direction.
We generate randomized sequences in the following manner. First we choose the largest oligonucleotide size (N) that is sufficiently sampled. In practice, this means avoiding sizes for which some of the oligonucleotides are never encountered in the input sequence (i.e. occur with zero frequency). Throughout our study we used 4-mer oligonucleotides (N = 4) because they were consistently well sampled across all of our input sequences, including a single large intron in Figure 2C. The starting oligonucleotide is chosen at random, abiding by the frequency table for oligonucleotides of the chosen size (N). Next, we observe the last (N-1) bases of our sequence, and append a base to the 3' end, following the N-mer oligonucleotide frequencies. For example, if N = 4 and GTC were the last three bases in the growing random sequence, the frequencies of GTCA, GTCT, GTCC, and GTCG would be used in randomly adding the next base. For instance, suppose these four oligomers have relative frequencies of 0.5, 0.1, 0.2, and 0.2, respectively. Then the random number generator will append 'A' with a probability of 0.5, 'T' with a probability of 0.1, 'C' with a probability of 0.2, and 'G' with a probability of 0.2. This final step is then repeated until the randomized sequence reaches the length of the input sequence. In contrast to the tiling method, our approach preserves the frequencies of short oligonucleotides in addition to preserving the N-mer frequency composition.
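The procedure just described amounts to sampling from a Markov chain of order N-1 estimated from the input. The following Python sketch follows those steps under stated assumptions; it is not the SRI-generator source. The names `kmer_counts` and `sri_generate` are hypothetical, and the handling of a dead end (a prefix never followed by any base in the input) by appending a freshly drawn N-mer is an assumption, since the text does not say how that case is treated.

```python
import random
from collections import Counter, defaultdict

def kmer_counts(seq, n):
    """Count overlapping N-mers composed only of A, C, G, T."""
    allowed = set("ACGT")
    return Counter(
        seq[i:i + n]
        for i in range(len(seq) - n + 1)
        if set(seq[i:i + n]) <= allowed
    )

def sri_generate(seq, n=4, rng=None):
    """Generate a random sequence of len(seq) from the N-mer frequencies of seq."""
    if n < 2:
        raise ValueError("n must be at least 2")
    rng = rng or random.Random()
    seq = seq.upper()
    counts = kmer_counts(seq, n)
    if not counts:
        raise ValueError("input contains no complete N-mer")
    kmers = list(counts)
    weights = [counts[k] for k in kmers]
    # Group N-mer counts by their (N-1)-base prefix.
    by_prefix = defaultdict(dict)
    for kmer, count in counts.items():
        by_prefix[kmer[:-1]][kmer[-1]] = count
    # Starting oligonucleotide, drawn according to the N-mer frequency table.
    out = rng.choices(kmers, weights=weights)[0]
    while len(out) < len(seq):
        options = by_prefix.get(out[-(n - 1):])
        if not options:
            # Dead end (assumption): continue from a newly drawn N-mer.
            out += rng.choices(kmers, weights=weights)[0]
            continue
        bases = list(options)
        out += rng.choices(bases, weights=[options[b] for b in bases])[0]
    return out[:len(seq)]
```

With the example from the text, a growing sequence ending in GTC would have `options` equal to the counts of GTCA, GTCT, GTCC, and GTCG, so 'A' is appended with probability 0.5 when those counts are in the ratio 5:1:2:2.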
Finally, we made our SRI-generator work properly with sequences that have masked repetitive elements (where all sequences of DNA repeats are replaced by N's by the RepeatMasker program). Any non-A, T, C, or G bases are copied from the source sequence over the output sequence. The random sequences thus contain the same number of N's (or other non-A, T, C, or G bases) in the same positions as in the natural sequences provided as input.
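The copying of masked positions can be expressed in a few lines of Python. This is a sketch under the assumption that a randomized sequence of the same length has already been produced; the function name `overlay_non_acgt` is hypothetical and is not part of the published programs.

```python
def overlay_non_acgt(source, randomized):
    """Copy every non-A, C, G, T character of source onto randomized.

    Both strings must have the same length. Positions holding A, C, G,
    or T in the source keep the randomized base.
    """
    if len(source) != len(randomized):
        raise ValueError("sequences must have the same length")
    return "".join(
        s if s.upper() not in "ACGT" else r
        for s, r in zip(source, randomized)
    )
```

For example, `overlay_non_acgt("ACNNGT", "TTTTTT")` yields `"TTNNTT"`, so the N's sit at the same positions as in the masked input.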
Several sophisticated algorithms are already available for the randomization of coding sequences [37, 38]. However, here we used our own randomization approach developed by AF in 2001 while working on a context-dependent codon bias project in the Walter Gilbert lab. We stayed with our program because we are familiar with the peculiarities of this type of randomization. In addition, our approach gets the dicodon distribution of randomized sequences very close to that of the natural CDS.
1) We observe a gradual diminution of the difference between real and randomized sequences when using progressively larger oligonucleotides with the randomized sequence generation programs (SRI-generator and MRI-generator). The difference is not considerable, but it is noticeable. Therefore, we recommend the use of longer oligonucleotides in the construction of randomized sequences – to maximize the retention of short-range inhomogeneity – as long as the rarest oligonucleotide in the corresponding frequency table occurs at least ten times. We use tetramer frequency tables throughout the manuscript for the sake of consistency and since they can safely be used for analyses of individual loci having as little as 100 kb.
WARNING: In MRI-generator it is easy to shift the nucleotide content level of generated sequences by using thresholds that do not balance the number of content-rich and content-poor regions. One must experiment with the thresholds and use SRI-analyzer to confirm that the content of the MRI-generated sequence approximates that of the source sequence.
2) The graphical output provided with the online version of MRI-analyzer serves only as a quick visual aid. The true output is represented by large tab-delimited files containing a record for each window in the analysis. Each record contains flags indicating a content-rich or content-poor window and, for those records where one of the thresholds has been crossed, the corresponding sequence.
3) All programs are written in Perl and may be freely downloaded from the website. They are licensed under version 3 of the GNU General Public License (GPL).
4) The RNALfold program from version 1.6.1 of the Vienna RNA package was utilized locally on our computers with default parameters.
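The recommendation in note 1, to use the longest oligonucleotide size whose rarest member still occurs at least ten times, can be checked for a given input with a short Python sketch. The function name `largest_well_sampled_n` and the `max_n` cap are hypothetical; the threshold of ten comes from the note.

```python
from collections import Counter
from itertools import product

def largest_well_sampled_n(seq, min_count=10, max_n=8):
    """Return the largest N (up to max_n) for which every one of the 4**N
    possible oligonucleotides occurs at least min_count times in seq.

    Returns 0 if even single bases are under-sampled.
    """
    seq = seq.upper()
    best = 0
    for n in range(1, max_n + 1):
        counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
        # Oligonucleotides absent from the input count as zero.
        rarest = min(counts["".join(p)] for p in product("ACGT", repeat=n))
        if rarest < min_count:
            break
        best = n
    return best
```

Windows containing N or other ambiguous characters are counted under their own keys and therefore never contribute to any of the 4**N valid oligonucleotides.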
Source for gene sample set
Our sample of 11,315 non-redundant human genes (with < 50% sequence identity to one another) was obtained from the human Exon-Intron Database, release 35p1. Samples of intergenic regions were obtained from GenBank human genome files, build 36, based on the records from the Feature Tables. We used only plus strands for calculations because plus and minus strands differ only by random fluctuations in the non-coding regions of mammalian genomes. Also, plus and minus strands have the same G+C and A+T compositions. All these samples are available from our departmental project site.
This project is supported by NSF Career award MCB-0643542. We thank Peter Bazeley, University of Toledo, for his computational support and discussion of our algorithms.
- Buratti E, Baralle FE: Influence of RNA secondary structure on the pre-mRNA splicing process. Mol Cell Biol. 2004, 24: 10505-10514. 10.1128/MCB.24.24.10505-10514.2004.
- Antequera F: Structure, function and evolution of CpG island promoters. Cellular and Molecular Life Sciences. 2003, 60: 1647-1658. 10.1007/s00018-003-3088-6.
- Marashi SA, Eslahchi C, Pezeshk H, Sadeghi M: Impact of RNA structure on the prediction of donor and acceptor splice sites. BMC Bioinformatics. 2006, 7: 297. 10.1186/1471-2105-7-297.
- Kozak M: Regulation of translation via mRNA structure in prokaryotes and eukaryotes. Gene. 2005, 361: 13-37. 10.1016/j.gene.2005.06.037.
- Pickering BM, Willis AE: The implications of structured 5' untranslated regions on translation and disease. Semin Cell Dev Biol. 2005, 16: 39-47. 10.1016/j.semcdb.2004.11.006.
- Chabanon H, Mickleburgh I, Hesketh J: Zipcodes and postage stamps: mRNA localisation signals and their trans-acting binding proteins. Brief Funct Genomic Proteomic. 2004, 3: 240-256. 10.1093/bfgp/3.3.240.
- Chen JM, Ferec C, Cooper DN: A systematic analysis of disease-associated variants in the 3' regulatory regions of human protein-coding genes II: the importance of mRNA secondary structure in assessing the functionality of 3' UTR variants. Hum Genet. 2006, 120: 301-33. 10.1007/s00439-006-0218-x.
- Svoboda P, Di Cara A: Hairpin RNA: a secondary structure of primary importance. Cell Mol Life Sci. 2006, 63: 901-8. 10.1007/s00018-005-5558-5.
- Meyer IM, Miklos I: Statistical evidence for conserved, local secondary structure in the coding regions of eukaryotic mRNAs and pre-mRNAs. Nucleic Acids Res. 2005, 33: 6338-6348. 10.1093/nar/gki923.
- Shabalina SA, Ogurtsov AY, Spiridonov NA: A periodic pattern of mRNA secondary structure created by the genetic code. Nucleic Acids Res. 2006, 34: 2428-2437. 10.1093/nar/gkl287.
- Seffens W, Digby D: mRNAs have greater negative folding free energies than shuffled or codon choice randomized sequences. Nucleic Acids Res. 1999, 27: 1578-1584. 10.1093/nar/27.7.1578.
- Workman C, Krogh A: No evidence that mRNAs have lower folding free energies than random sequences with the same dinucleotide distribution. Nucleic Acids Res. 1999, 27: 4816-4822. 10.1093/nar/27.24.4816.
- Karlin S, Campbell AM, Mrázek J: Comparative DNA analysis across diverse genomes. Annu Rev Genet. 1998, 32: 185-225. 10.1146/annurev.genet.32.1.185.
- Karlin S, Burge C: Dinucleotide relative abundance extremes: a genomic signature. Trends Genet. 1995, 11: 283-290. 10.1016/S0168-9525(00)89076-9.
- Campbell A, Mrázek J, Karlin S: Genome signature comparisons among prokaryote, plasmid, and mitochondrial DNA. PNAS. 1999, 96: 9184-9189. 10.1073/pnas.96.16.9184.
- Bernardi G: The Vertebrate Genome: Isochores and Evolution. Mol Biol Evol. 1993, 10: 186-204.
- Bernardi G: The neoselectionist theory of genome evolution. PNAS. 2007, 104: 8385-8390. 10.1073/pnas.0701652104.
- Duret L, Eyre-Walker A, Galtier N: A new perspective on isochore evolution. Gene. 2006, 385: 71-74. 10.1016/j.gene.2006.04.030.
- Gardiner-Garden M, Frommer M: CpG islands in vertebrate genomes. J Mol Biol. 1987, 196: 261-282. 10.1016/0022-2836(87)90689-9.
- Takai D, Jones PA: The CpG island searcher: a new WWW resource. In Silico Biol. 2003, 3 (3): 235-240.
- Hackenberg M, Previti C, Luque-Escamilla PL, Carpena P, Martínez-Aroza J, Oliver JL: CpGcluster: a distance-based algorithm for CpG-island detection. BMC Bioinformatics. 2006, 7: 446. 10.1186/1471-2105-7-446.
- Mathews DH: Predicting a set of minimal free energy RNA secondary structures common to two sequences. Bioinformatics. 2005, 21: 2246-2253. 10.1093/bioinformatics/bti349.
- Hofacker IL: Vienna RNA secondary structure server. Nucl Acids Res. 2003, 31: 3429-3431. 10.1093/nar/gkg599.
- Kishore S, Stamm S: The snoRNA HBII-52 regulates alternative splicing of the serotonin receptor 2C. Science. 2006, 311: 230-232. 10.1126/science.1118265.
- Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ: miRBase: microRNA sequences, targets and gene nomenclature. Nucl Acids Res. 2006, 34: D140-D144. 10.1093/nar/gkj112.
- All described programs are freely available via our web site: Genomic Mid-Range Inhomogeneity. [http://hsc.utoledo.edu/depts/bioinfo/gmri/]
- Karlin S, Mrázek J: What drives codon choices in human genes?. J Mol Biol. 1996, 262: 459-472. 10.1006/jmbi.1996.0528.
- Smit AFA, Hubley R, Green P: RepeatMasker Open-3.0. 2004, [http://www.repeatmasker.org]
- Fedorov A, Saxonov S, Gilbert W: Regularities of context-dependent codon bias in eukaryotic genes. Nucleic Acids Res. 2002, 30: 1192-1197. 10.1093/nar/30.5.1192.
- Kapranov P, Willingham AT, Gingeras TR: Genome-wide transcription and the implications for genomic organization. Nat Rev Genet. 2007, 8: 413-423. 10.1038/nrg2083.
- Forsdyke DR: A stem-loop kissing model for the initiation of recombination and the origin of introns. Mol Biol Evol. 1995, 12: 949-958.
- Forsdyke DR: Stem-loop potential in MHC genes: A new way of evaluating positive Darwinian selection?. Immunogenetics. 1996, 43: 182-189.
- Forsdyke DR: An alternative way of thinking about stem-loops in DNA. A case study of the human GOS2 gene. J Theor Biol. 1998, 192: 489-504. 10.1006/jtbi.1998.0674.
- Chamary JV, Hurst LD: Evidence for selection on synonymous mutations affecting stability of mRNA secondary structure in mammals. Genome Biol. 2005, 6: R75. 10.1186/gb-2005-6-9-r75.
- Duan JB, Wainright MS, Comeron JM, Saitou N, Sanders AR, Gelernter J, Gejman PV: Synonymous mutations in the human dopamine receptor D2 (DRD2) affect mRNA stability and synthesis of the receptor. Hum Mol Genet. 2003, 12: 205-216. 10.1093/hmg/ddg055.
- Nackley AG, Shabalina SA, Tchivileva IE, Satterfield K, Korchynskyi O, Makarov SS, Maixner W, Diatchenko L: Human catechol-O-methyltransferase haplotypes modulate protein expression by altering mRNA secondary structure. Science. 2006, 314: 1930-1933. 10.1126/science.1131262.
- Katz L, Burge CB: Widespread selection for local RNA secondary structure in coding regions of bacterial genes. Genome Res. 2003, 13: 2042-2051. 10.1101/gr.1257503.
- Down T, Leong B, Hubbard TJ: A machine learning strategy to identify candidate binding sites in human protein-coding sequence. BMC Bioinformatics. 2006, 7: 419. 10.1186/1471-2105-7-419.
- Shepelev V, Fedorov A: Advances in the Exon-Intron Database (EID). Briefings in Bioinformatics. 2006, 7: 178-85. 10.1093/bib/bbl003.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.