Co-occurrence of transcription and translation gene regulatory features underlies coordinated mRNA and protein synthesis
© Tamarkin-Ben-Harush et al.; licensee BioMed Central Ltd. 2014
Received: 10 April 2014
Accepted: 14 August 2014
Published: 19 August 2014
Variability in protein levels is generated through intricate control of the different gene decoding phases. Presently little is known about the links between the various gene expression stages. Here we investigated the relationship between transcription and translation regulatory properties encoded in mammalian genes.
We found that the TATA-box, a core promoter element known to enhance transcriptional output, is associated not only with higher mRNA levels but also with positive translation regulatory features and elevated translation efficiency. Further investigation revealed a general association between transcription and translation regulatory trends. Specifically, translation inhibitory features, such as the presence of an upstream AUG (uAUG) and increased lengths of the 5′UTR, the coding sequence and the 3′UTR, are strongly associated with lower translation as well as a lower transcriptional rate.
Our findings reveal that co-occurrence of several gene-encoded transcription and translation regulatory features with the same trend substantially contributes to the final mRNA and protein expression levels and enables their coordination.
Keywords: TATA-box, TATA-less, Transcription, Translation, uAUG, 5′UTR, 3′UTR, uORF
Expression of protein-encoding genes in eukaryotes is an intricate process that includes several distinct steps: transcription, mRNA processing and mRNA translation. Each of these stages is controlled by cis-regulatory elements present in the DNA and the mRNA. Transcription is governed by two major types of DNA elements, enhancers and core promoters. Enhancer elements serve as binding sites for transcription regulatory factors and can function independently of their position. Core promoter elements, such as the TATA-box and Initiator, are situated around the transcription start site (TSS) and are the sites on which the basal transcription machinery assembles. As such, these elements have a central role in determining promoter strength [1–3].
Cis-regulatory elements present in the mRNA are central to the control of protein synthesis. Specifically, the nucleotide sequence surrounding the initiating AUG [4, 5], the presence of one or more AUGs upstream of the main ORF (uAUGs) [4, 6–8], the lengths of the 5′ and 3′ untranslated regions (UTRs) and the occurrence of stem–loop structures in the 5′UTR [9–14] all influence the rate of protein synthesis. Previous genomic and functional studies suggest that uAUGs act to reduce translation of the downstream ORF, either of specific genes or globally [4, 6–8]. uAUGs are highly prevalent in eukaryotic mRNAs, occurring in almost half of protein-coding genes [6, 15].
Here we investigated the relationship between various regulatory features of transcription and translation encoded in mammalian genes using bioinformatics and functional analyses. Our findings revealed remarkable coupling of several regulatory features that act in the same direction which substantially contribute to mRNA and protein levels and facilitate their coordinated expression.
The highly transcribed TATA-box genes have lower frequency of uAUG
TATA-box genes lacking uAUG are associated with positive translation regulatory features and higher translation
Considering that TATA genes are associated with positive translational features, we would expect these highly transcribed genes to be efficiently translated. Therefore, the two datasets were used to compare the translation levels of TATA and TATA-less genes, containing or lacking uAUG. We observed that among the uAUG-less genes, the TATA set showed significantly higher ribosomal density levels than the TATA-less set, both in MEFs and in mESCs (Figure 4B). Among the uAUG genes, no significant difference was seen in MEFs, whereas in mESCs the ribosomal density of the TATA set was higher (Figure 4C). Together, the analyses of regulatory features and translational activities support the notion that regulatory traits in transcription and translation evolved to act in a similar trend.
Co-occurrence of translation and transcription regulatory trends
To examine further the relationship between mRNA levels and translational features observed for mouse genes, we similarly analyzed human gene expression data that was downloaded from the gene expression atlas SymAtlas v1.2.3. This database contains expression data of thousands of human genes from 79 tissues and cell types. We determined the average expression of each gene in all tissues, setting a threshold of 200, a value that is above background. Then we determined the distribution of the average expression of each gene in uAUG-less and uAUG sets using boxplots. Here again it appears that human uAUG-less genes tend to have significantly higher levels of mRNA than uAUG genes (Figure 5B). This is particularly highlighted in the upper 50% of the gene population that is distributed more towards the higher expression levels, both in the human and the mouse data (Figure 5A and B). A similar pattern is observed when the maximal, rather than the average expression of genes is analyzed (Additional file 1: Figure S2). The number of tissues in which each gene is expressed was also determined in the two gene sets, and we found that uAUG-less genes tend to be expressed in more tissues than uAUG genes (Figure 5C).
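The per-gene summaries described above (average expression, maximal expression, and number of tissues in which a gene is expressed) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name and the dictionary keys are hypothetical, and it assumes that the threshold of 200 is applied both to the per-gene average and to individual tissue values, which the text does not state explicitly.

```python
from statistics import mean

def summarize_expression(values, threshold=200):
    """Summarize one gene's expression values across tissues.

    `values` holds one expression value per tissue. The threshold of 200
    comes from the text; applying it per tissue as well as to the average
    is an assumption made for this sketch.
    """
    average = mean(values)
    return {
        "average": average,
        "maximum": max(values),
        # Tissues in which the gene is counted as expressed (assumed rule).
        "tissues_expressed": sum(v >= threshold for v in values),
        # Whether the gene's average clears the background threshold.
        "above_background": average >= threshold,
    }

# Toy values for a single gene measured in three tissues (not real data).
summary = summarize_expression([100, 300, 500])
```

Comparing the resulting `average` values between uAUG-less and uAUG gene sets would then be a matter of grouping genes by class before plotting or testing.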
The analysis shown in Figure 5A and B is derived from steady-state mRNA level data. To examine whether the transcription process directly contributes to the differences in mRNA levels between uAUG-less and uAUG genes, we retrieved RNA-seq data from a global Nuclear Run-On experiment (GRO-seq), which directly measures the level of ongoing transcription for all genes. The same general trend was found, as uAUG-less genes display higher levels of transcription than those with uAUG (Figure 5D). These findings support the idea that uAUG genes are less efficiently transcribed than uAUG-less genes. It has been previously suggested that uAUG is associated with a shorter mRNA half-life; it can therefore be presumed that the combined effects of lower transcription and elevated decay rates are responsible for the marked difference in the steady-state mRNA levels (Figure 5A and B). We also analyzed the transcription efficiency of TATA and TATA-less genes divided into uAUG-less and uAUG subsets. The results revealed the expected differences between TATA and TATA-less genes, but remarkably this difference is much less dramatic among the uAUG genes (Figure 5E).
An important parameter that is known to influence transcription efficiency is the gene length [20, 30]. Upon analysis we found substantial differences in gene length between uAUG-less and uAUG genes in human and in mouse (Figure 5F), the median gene length in uAUG-less genes being almost half of that in uAUG genes. Exon count analysis showed the same trend (Additional file 1: Figure S3).
Next, expression data from multiple tissues for 6804 human genes were divided into the top 25% and bottom 75% of expressed genes. We determined the percentage of uAUG-bearing genes in the two groups and found that the top 25% gene set has a lower proportion of uAUG genes than the bottom 75% set (Additional file 1: Figure S4A). The prevalence of uAUG in the top 10% of expressed genes is even lower. A similar trend was observed with the GRO-seq data (data not shown). To examine whether it is just the presence of the uAUG that is associated with the reduced mRNA levels, we compared the translational regulatory features of the top 25% and bottom 75% expressed genes (at the mRNA level) within the same class, either uAUG-less or uAUG (Additional file 1: Figure S4B-E). While no significant difference is observed in the 5′UTR length between the high and the low expressing genes, clear and marked differences are seen in the lengths of the 3′UTR, the CDS and the overall gene size, these features being much shorter in the higher expressing set in both the uAUG-less and uAUG groups. These findings clearly show that while uAUG and the 5′UTR length are important, they are insufficient to account for the association between translational and transcriptional features, reinforcing that the co-occurrence of other features also contributes to final expression levels.
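The split into top 25% and bottom 75% expressed genes, followed by the uAUG prevalence in each group, can be sketched as below. The function name, the `(expression, has_uaug)` tuple layout and the toy values are assumptions for illustration only; how ties at the cutoff were handled in the original analysis is not stated.

```python
def uaug_percentage_by_expression(genes, top_fraction=0.25):
    """Return (% uAUG genes in the top group, % uAUG genes in the rest).

    `genes` is a list of (expression, has_uaug) tuples. Genes are ranked
    by expression, highest first, and the top `top_fraction` is split off.
    """
    ranked = sorted(genes, key=lambda gene: gene[0], reverse=True)
    n_top = int(len(ranked) * top_fraction)
    top, rest = ranked[:n_top], ranked[n_top:]

    def percent_uaug(group):
        if not group:
            return None  # undefined for an empty group
        return 100 * sum(1 for _, has_uaug in group if has_uaug) / len(group)

    return percent_uaug(top), percent_uaug(rest)

# Toy data (not from the study): eight genes, two of which form the top 25%.
toy_genes = [
    (1000, False), (800, False), (600, True), (400, True),
    (300, False), (200, True), (150, False), (100, False),
]
top_pct, rest_pct = uaug_percentage_by_expression(toy_genes)
```

The same function can be called with `top_fraction=0.10` to look at the top 10% of expressed genes.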
Discussion and conclusions
The link between transcriptional and translational features is not limited to the TATA gene class, as analysis of the structural organization and functional features of mammalian genes bearing or lacking uAUG revealed a close association with mRNA levels and transcriptional rate, as well as with other translation regulatory traits that display the same trend. The negative correlation between mRNA abundance and uAUG has been previously noted [23, 27], but this was mostly attributed to the reduction in steady-state mRNA levels and mRNA half-life. The analysis of the GRO-seq data, which reflects only the mRNA synthesis rate, revealed that the lower mRNA level associated with uAUG results from a combination of lower synthesis and lower stability.
Several recent studies uncovered the associations between features present in the mRNA and protein abundance in yeast and mammalian cells (for example see [27, 28, 33]). These studies also reported that mRNA features, such as lengths of the 5′UTR, 3′UTR and CDS as well as uORF are associated with each other as we also find here. Our analysis further extends these findings by demonstrating that the translational activity of each mRNA is correlated with mRNA abundance. Furthermore, we show that 5′UTR, 3′UTR and CDS as well as uORF are also associated with genomic features that influence the rate of mRNA synthesis, in particular promoter features (TATA vs. TATA-less) and gene length.
A major biological implication resulting from our findings is the ability of the eukaryotic cell to synchronize, to some extent, the transcription and translation rates, through various regulatory features operating in the same direction. The gene ontology analysis (Additional file 1: Table S1) which revealed enrichment of functional categories associated with uAUG and uAUG-less genes, provides examples for coordination of expression level with biological activity. For instance, transcription factors are known to be transcribed at low levels and indeed these factors have higher prevalence of uAUG. On the other hand structural components such as nucleosomal proteins that are highly expressed at the mRNA and protein levels tend to lack uAUG.
In summary, the analysis of transcription and translation data reported here revealed significant association between mRNA levels that reflect transcriptional activity and decay, and translation efficiency.
Selection of genes and analyzing their features
Classification of genes according to their function was done using the gene-annotation enrichment analysis (http://david.abcc.ncifcrf.gov/).
Identification of TATA-box bearing genes
The nucleotide sequences from −40 to −15 relative to the UCSC TSS were retrieved. Using the ‘pattern matching’ tool on the Regulatory Sequence Analysis Tools (RSAT) site (http://rsat.ulb.ac.be/), we searched this region for the TATA-box sequence TATAWAG (allowing zero to one mismatch). The RefSeq output was converted to official gene names to remove duplicates.
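The motif search described above can be approximated with a simple sliding-window scan. This is a sketch of the general idea only and does not reproduce the RSAT tool: the function names are hypothetical, the input is assumed to be the already extracted −40 to −15 sequence, and W is interpreted with its standard IUPAC meaning (A or T).

```python
# IUPAC codes needed for the TATAWAG motif (W = A or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT"}

def count_mismatches(window, motif):
    """Count positions where the window base is not allowed by the motif code."""
    return sum(base not in IUPAC[code] for base, code in zip(window, motif))

def find_tata(region, motif="TATAWAG", max_mismatches=1):
    """Return (offset, matched sequence, mismatches) for every motif hit.

    `region` is the promoter sequence to scan; `offset` is 0-based
    within that sequence.
    """
    region = region.upper()
    hits = []
    for start in range(len(region) - len(motif) + 1):
        window = region[start:start + len(motif)]
        mismatches = count_mismatches(window, motif)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits

# Toy sequence (not a real promoter) containing one exact match.
hits = find_tata("GCGCGCTATAAAGGCGCGC")
```

A gene would then be classified as TATA-box bearing if `find_tata` returns at least one hit for its upstream region.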
Gene expression data was analyzed as previously described. The GRO-seq transcription data was retrieved from Core et al. [29]. To avoid bias generated by promoter-proximal pausing, the reads in the first 1 kb were excluded. The number of reads divided by the gene length (minus 1 kb) reflected the transcription level of each gene. A ratio below 5 reads/kb was considered background. For mouse mRNA levels, the total RNA-seq data derived from the ribosomal profiling studies [25, 26] were used.
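The read-density calculation described above amounts to the following arithmetic. This is a minimal sketch: the function name is hypothetical, and it assumes that `read_count` already excludes reads mapping to the first 1 kb of the gene.

```python
def transcription_level(read_count, gene_length_bp, skip_bp=1000, background=5.0):
    """Return gene-body read density in reads/kb, or None if unusable.

    `read_count` is assumed to exclude reads from the first `skip_bp`
    bases. None is returned when the gene is too short to have a gene
    body after skipping, or when the density falls below `background`.
    """
    body_kb = (gene_length_bp - skip_bp) / 1000
    if body_kb <= 0:
        return None  # gene no longer than the skipped promoter-proximal region
    density = read_count / body_kb
    if density < background:
        return None  # below 5 reads/kb is treated as background
    return density

# Toy numbers: 90 reads over a 10 kb gene give 90 / 9 kb of gene body.
level = transcription_level(90, 10_000)
```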
Statistical analyses of gene features
The distributions of the gene features are skewed; therefore, non-parametric procedures were used to compare the groups' features: the Mann–Whitney U-test for two samples and the Kruskal–Wallis test for more than two samples. Differences in uAUG prevalence among the groups were analyzed by the chi-square test, and Spearman’s rank correlation analysis between MEF translation efficiencies and the lengths of mRNA features was performed using the STATISTICA® software.
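For readers who want to see what two of these rank-based statistics compute, the following standard-library sketch derives the Mann–Whitney U statistic and Spearman's rank correlation coefficient from average ranks. It is not the STATISTICA procedure used in the study, it does not compute p-values, and the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(sample_a, sample_b):
    """Return the U statistic of sample_a (rank sum minus its minimum)."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a = len(sample_a)
    return sum(ranks[:n_a]) - n_a * (n_a + 1) / 2

def spearman_rho(x, y):
    """Return Spearman's coefficient: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5
```

A negative `spearman_rho` between translation efficiency and, say, 3′UTR length would correspond to the trend reported here, in which longer features accompany lower translation.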
CAGE: Cap Analysis of Gene Expression
GRO-seq: Global Nuclear Run-On experiment
MEFs: Mouse embryonic fibroblasts
mESCs: Mouse embryonic stem cells
RSAT: Regulatory Sequence Analysis Tools
TSS: Transcription start site
We are grateful to Eliezer Dikstein for his assistance in programming, Dr. Sandra Moshonov for critical reading and editing of the manuscript, Tali Shalit (INCPM Unit, WIS) for processing the Gro-seq data, Nicholas T. Ingolia (UCSF) and Carson C. Thoreen (Whitehead Institute for Biomedical Research) for providing the RNA-seq data of the ribosomal profiling. This work was supported by grants from the Minerva Foundation (#711124) and the Israel Science Foundation (#1168/13). R.D. is the incumbent of the Ruth and Leonard Simon Chair of Cancer Research.
1. Dikstein R: The unexpected traits associated with core promoter elements. Transcription. 2011, 2 (5): 201-206. 10.4161/trns.2.5.17271.
2. Juven-Gershon T, Hsu JY, Kadonaga JT: Perspectives on the RNA polymerase II core promoter. Biochem Soc Trans. 2006, 34 (Pt 6): 1047-1050.
3. Smale ST, Kadonaga JT: The RNA polymerase II core promoter. Annu Rev Biochem. 2003, 72: 449-479. 10.1146/annurev.biochem.72.121801.161520.
4. Kozak M: Selection of initiation sites by eucaryotic ribosomes: effect of inserting AUG triplets upstream from the coding sequence for preproinsulin. Nucleic Acids Res. 1984, 12 (9): 3873-3893. 10.1093/nar/12.9.3873.
5. Kozak M: Initiation of translation in prokaryotes and eukaryotes. Gene. 1999, 234 (2): 187-208. 10.1016/S0378-1119(99)00210-3.
6. Calvo SE, Pagliarini DJ, Mootha VK: Upstream open reading frames cause widespread reduction of protein expression and are polymorphic among humans. Proc Natl Acad Sci U S A. 2009, 106 (18): 7507-7512. 10.1073/pnas.0810916106.
7. Meijer HA, Thomas AA: Control of eukaryotic protein synthesis by upstream open reading frames in the 5′-untranslated region of an mRNA. Biochem J. 2002, 367 (Pt 1): 1-11.
8. Morris DR, Geballe AP: Upstream open reading frames as regulators of mRNA translation. Mol Cell Biol. 2000, 20 (23): 8635-8642. 10.1128/MCB.20.23.8635-8642.2000.
9. Kozak M: Influences of mRNA secondary structure on initiation by eukaryotic ribosomes. Proc Natl Acad Sci U S A. 1986, 83 (9): 2850-2854. 10.1073/pnas.83.9.2850.
10. Kozak M: Structural features in eukaryotic mRNAs that modulate the initiation of translation. J Biol Chem. 1991, 266 (30): 19867-19870.
11. Kozak M: A short leader sequence impairs the fidelity of initiation by eukaryotic ribosomes. Gene Expr. 1991, 1 (2): 111-115.
12. Kozak M: Effects of long 5′ leader sequences on initiation by eukaryotic ribosomes in vitro. Gene Expr. 1991, 1 (2): 117-125.
13. Kuersten S, Goodwin EB: The power of the 3′ UTR: translational control and development. Nat Rev Genet. 2003, 4 (8): 626-637. 10.1038/nrg1125.
14. Mazumder B, Seshadri V, Fox PL: Translational control by the 3′-UTR: the ends specify the means. Trends Biochem Sci. 2003, 28 (2): 91-98. 10.1016/S0968-0004(03)00002-1.
15. Wethmar K, Barbosa-Silva A, Andrade-Navarro MA, Leutz A: uORFdb--a comprehensive literature database on eukaryotic uORF biology. Nucleic Acids Res. 2014, 42: 60-67.
16. Amir-Zilberstein L, Ainbinder E, Toube L, Yamaguchi Y, Handa H, Dikstein R: Differential regulation of NF-kappaB by elongation factors is determined by core promoter type. Mol Cell Biol. 2007, 27 (14): 5246-5259. 10.1128/MCB.00586-07.
17. Blake WJ, Balazsi G, Kohanski MA, Isaacs FJ, Murphy KF, Kuang Y, Cantor CR, Walt DR, Collins JJ: Phenotypic consequences of promoter-mediated transcriptional noise. Mol Cell. 2006, 24 (6): 853-865. 10.1016/j.molcel.2006.11.003.
18. Hoopes BC, LeBlanc JF, Hawley DK: Contributions of the TATA box sequence to rate-limiting steps in transcription initiation by RNA polymerase II. J Mol Biol. 1998, 277 (5): 1015-1031. 10.1006/jmbi.1998.1651.
19. Marbach-Bar N, Ben-Noon A, Ashkenazi S, Harush AT, Avnit-Sagi T, Walker MD, Dikstein R: Disparity between microRNA levels and promoter strength is associated with initiation rate and Pol II pausing. Nat Commun. 2013, 4: 2118.
20. Moshonov S, Elfakess R, Golan-Mashiach M, Sinvani H, Dikstein R: Links between core promoter and basic gene features influence gene expression. BMC Genomics. 2008, 9 (1): 92. 10.1186/1471-2164-9-92.
21. Wobbe CR, Struhl K: Yeast and human TATA-binding proteins have nearly identical DNA sequence requirements for transcription in vitro. Mol Cell Biol. 1990, 10 (8): 3859-3867.
22. Iacono M, Mignone F, Pesole G: uAUG and uORFs in human and rodent 5′untranslated mRNAs. Gene. 2005, 349: 97-105.
23. Matsui M, Yachie N, Okada Y, Saito R, Tomita M: Bioinformatic analysis of post-transcriptional regulation by uORF in human and mouse. FEBS Lett. 2007, 581 (22): 4184-4188. 10.1016/j.febslet.2007.07.057.
24. Mignone F, Gissi C, Liuni S, Pesole G: Untranslated regions of mRNAs. Genome Biol. 2002, 3 (3): REVIEWS0004.
25. Thoreen CC, Chantranupong L, Keys HR, Wang T, Gray NS, Sabatini DM: A unifying model for mTORC1-mediated regulation of mRNA translation. Nature. 2012, 485 (7396): 109-113. 10.1038/nature11083.
26. Ingolia NT, Lareau LF, Weissman JS: Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell. 2011, 147 (4): 789-802. 10.1016/j.cell.2011.10.002.
27. Vogel C, Abreu Rde S, Ko D, Le SY, Shapiro BA, Burns SC, Sandhu D, Boutz DR, Marcotte EM, Penalva LO: Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line. Mol Syst Biol. 2010, 6: 400.
28. Schwanhausser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M: Global quantification of mammalian gene expression control. Nature. 2011, 473 (7347): 337-342. 10.1038/nature10098.
29. Core LJ, Waterfall JJ, Lis JT: Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science. 2008, 322 (5909): 1845-1848. 10.1126/science.1162228.
30. Swinburne IA, Miguez DG, Landgraf D, Silver PA: Intron length increases oscillatory periods of gene expression in animal cells. Genes Dev. 2008, 22 (17): 2342-2346. 10.1101/gad.1696108.
31. Morachis JM, Murawsky CM, Emerson BM: Regulation of the p53 transcriptional response by structurally diverse core promoters. Genes Dev. 2010, 24 (2): 135-147. 10.1101/gad.1856710.
32. Yean D, Gralla J: Transcription reinitiation rate: a special role for the TATA box. Mol Cell Biol. 1997, 17 (7): 3809-3816.
33. Zur H, Tuller T: Transcript features alone enable accurate prediction and understanding of gene expression in S. cerevisiae. BMC Bioinformatics. 2013, 14 (Suppl 15): S1. 10.1186/1471-2105-14-S15-S1.
34. Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, Taylor J: Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol / edited by Frederick M Ausubel [et al]. 2010, Chapter 19 (Unit 19 10): 11-21.
35. Goecks J, Nekrutenko A, Taylor J, Galaxy Team: Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010, 11 (8): R86. 10.1186/gb-2010-11-8-r86.
36. Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, Miller W, Kent WJ, Nekrutenko A: Galaxy: a platform for interactive large-scale genome analysis. Genome Res. 2005, 15 (10): 1451-1455. 10.1101/gr.4086505.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.