Upstream sequence elements direct post-transcriptional regulation of gene expression under stress conditions in yeast
© Lawless et al; licensee BioMed Central Ltd. 2009
Received: 18 July 2008
Accepted: 07 January 2009
Published: 07 January 2009
The control of gene expression in eukaryotic cells occurs both transcriptionally and post-transcriptionally. Although many genes are now known to be regulated at the translational level, in general, the mechanisms are poorly understood. We have previously presented polysomal gradient and array-based evidence that translational control is widespread in a significant number of genes when yeast cells are exposed to a range of stresses. Here we have re-examined these gene sets, considering the role of UTR sequences in the translational responses of these genes using recent large-scale datasets which define 5' and 3' transcriptional ends for many yeast genes. In particular, we highlight the potential role of 5' UTRs and upstream open reading frames (uORFs).
We show a highly significant enrichment in specific GO functional classes for genes that are translationally up- and down-regulated under given stresses (e.g. carbohydrate metabolism is up-regulated under amino acid starvation). Cross-referencing these data with the stress response data we show that translationally upregulated genes have longer 5' UTRs, consistent with their role in translational regulation. In the first genome-wide study of uORFs in a set of mapped 5' UTRs, we show that uORFs are rare, being statistically under-represented in UTR sequences. However, they have distinct compositional biases consistent with their putative role in translational control and are more common in genes which are apparently translationally up-regulated.
These results demonstrate a central regulatory role for UTR sequences, and 5' UTRs in particular, highlighting the significant role of uORFs in post-transcriptional control in yeast. Yeast uORFs are more highly conserved than has been suggested, lending further weight to their significance as functional elements involved in gene regulation. It also suggests a more complex and novel mechanism of control, whereby uORFs permit genes to escape from a more general attenuation of translation under conditions of stress. However, since uORFs are relatively rare (only ~13% of yeast genes have them) there remain many unanswered questions as to how UTR elements can direct translational control of many hundreds of genes under stress.
Much of the focus of post-genome science is now switching from simply cataloguing genes and gene products to the greater challenge of understanding how they interact and regulate one another. The regulation of gene expression underpins the ability of cells to adapt to environments, deal with stresses, and progress through reproductive and cell cycle changes. Similarly, the increased complexity with which higher eukaryotes can control and manage the levels of their gene products has been suggested to underlie organismal complexity, a compelling argument when the small difference between the numbers of genes in humans and the fly is considered. Yet a detailed understanding of how even the simplest cell controls its gene complement remains elusive, though great strides are being made in the field of systems biology. Much of this research has exploited advances in array-based technologies to characterise how gene expression levels change during the cell cycle or in response to stresses or environmental changes [1, 2], allowing transcription factors to be mapped to the genes they regulate [3, 4]. However, these approaches only deliver regulatory information at the transcriptional level. The production, maturation and export of mRNA from the nucleus precede its translation into protein, but translation can also be regulated by a variety of post-transcriptional control mechanisms [5–14], and the measured correlations between transcriptome and proteome levels are imperfect [e.g. ]. Most studies observe positive but imperfect correlations, ranging from 0.2 to 0.8, suggesting that additional control at the mRNA or protein level must be present. Indeed, it has been suggested that as little as 20–40% of the control of gene expression can be attributed to mRNA levels [12, 16].
Many of these post-transcriptional control processes are governed by the sequence and structure of the mature mRNA molecule, which determine the regulatory molecules (usually proteins) that can interact with the transcript. Eukaryotic mRNA molecules have a tripartite structure, with a 5' untranslated region (UTR) preceding the principal open reading frame (ORF), which is followed by a 3' UTR. UTRs are critical to the post-transcriptional regulation of gene expression, modulating mRNA export from the nucleus and affecting translational efficiency, subcellular localization and mRNA stability. Furthermore, mutations in UTRs can lead to serious pathology, demonstrating their importance to proper cellular function.
Previous work from our laboratories has examined the effects that different stresses have on both the transcriptional and translational regulation of genes in the model organism yeast [19, 20]. Gene arrays were used to monitor changes in transcript abundance for RNA populations from both stressed and control cells, and were combined with polysomal gradient analyses to investigate the effects of different stress conditions on the yeast genome. The first study demonstrated a general down-regulation of translation and protein synthesis in which, depending on the stress, different subsets of mRNAs resisted this general trend and were regulated translationally. A second yeast study considered oxidative stress, which also elicits complex translational reprogramming that differs even between concentrations of hydrogen peroxide. These data provide excellent test sets of genes in which to search for translational regulatory signals that could single out these genes for specific targeting at the translational level. Until recently, however, the true transcriptional start sites (TSSs) of many yeast genes were unknown, making it difficult to quantify the effects of UTRs on post-transcriptional regulation. Several recent large-scale studies using 5' SAGE and G-capping cDNAs have now defined TSSs for 2,231 and 3,599 yeast genes, respectively. In addition, tiling array studies have characterised complete transcripts at both the 5' and 3' ends for over a third of the yeast genome.
Here we examine the properties of these characterised yeast UTRs in the context of the stress-response microarray data sets [19, 20], looking for UTR properties (length, uORF content and conservation) that determine differential regulation. In particular, we examine upstream open reading frames (uORFs), which can affect the level of gene expression via a number of potential post-transcriptional mechanisms [5, 6, 24–26]. These include mRNA degradation via the nonsense-mediated decay (NMD) pathway, possible activities of the small peptides encoded by the uORFs themselves, and mechanisms relating to the scanning of the transcript by the ribosome. One such mechanism is "leaky scanning", where a proportion of the scanning complexes bypass the uORF AUG and continue scanning the transcript to a downstream start codon. In this case the uORF AUG acts as a "decoy" for the standard AUG start, so that it acts as a negative regulator of gene expression, at least for some fraction of ribosomes. Similarly, the context of an ORF or uORF stop codon can exert general effects on translation by modulating the ability of the ribosome to reinitiate after termination and translate downstream ORFs. This latter case has been studied in great detail for the GCN4 5' UTR, where several uORFs combine to produce complex regulatory effects under stress, chiefly involving differential behaviour towards uORFs 1 and 4. Similarly, the uORFs in YAP1 and YAP2 have been studied in detail, revealing different potential mechanisms for destabilising mRNA after termination of the scanning ribosome. As already mentioned, uORFs have also been implicated in the NMD pathway, by which aberrant transcripts are removed, and it has been suggested that uORFs may be translated and trigger NMD in as many as 35% of cases.
Moreover, several studies have examined uORF conservation and functional significance in yeast [6, 27, 28], and more recently in flies, concluding that only a limited number of uORFs other than those of GCN4 appear to be conserved and functionally operative.
Our results provide further evidence that not only does yeast exert a general, large scale control over its genes at the translational level, but that under stress this effect is controlled by UTR sequence elements. This is the first evidence that suggests UTR elements are specifically responsible for translational control of an experimentally-determined gene set which is differentially expressed under stress. Although rare, we propose that uORFs play a role in this regulation for a significant number of genes, not just restricted to a very limited subset (e.g. GCN4, YAP1/2). We show that uORF stop codon readthrough is likely and propose that uORF-mediated changes to reinitiation competency is a common mechanism to regulate gene expression under stress, consistent with that proposed for GCN4. Nevertheless, although uORFs provide some explanations, translational control is apparently both widespread and complex, and other mechanisms must be responsible for the post-transcriptional control of the majority of yeast genes which exploit it.
Results and Discussion
Defining 5' and 3' UTRs
Identification of uORFs
It should also be noted that there is a correlation between 5' UTR length and the number of uORFs present (Pearson correlation r = 0.746, p < 0.0001). Despite this, a sizeable fraction of 5' UTR sequences have no uORFs at all, including some longer than 400 nt, for which at least three uORFs would be expected by chance (see Additional File 2). We examined these genes for a common function or role (e.g. shared Gene Ontology terms), but no obvious patterns emerged. Nevertheless, there appears to be a subset of yeast genes with long 5' UTRs lacking uORFs, suggesting that evolution has selected against uORFs, as one might expect since they generally act as negative regulators of translation of the main ORF.
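The expectation of several uORFs by chance in a long 5' UTR can be illustrated with a small Monte Carlo sketch. It assumes equiprobable, independent bases (real yeast 5' UTRs are AT-rich, so this is a simplification) and a simplified uORF definition: an ATG followed by an in-frame stop within the sequence, counting one uORF per stop codon. The function names are hypothetical, and this is not the procedure behind Additional File 2.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}


def count_uorfs(seq):
    """Count ATG-initiated ORFs that end at an in-frame stop within seq.

    ATGs sharing a stop codon are counted once (one uORF per stop).
    """
    stop_positions = set()
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG":
            continue
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOPS:
                stop_positions.add(j)
                break
    return len(stop_positions)


def expected_uorfs(length, trials=1000, seed=0):
    """Mean uORF count over random sequences of the given length,
    assuming equiprobable and independent bases."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        total += count_uorfs(seq)
    return total / trials


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, round(expected_uorfs(n), 2))
```

A composition-matched null (for example, shuffles of the real UTRs) would be a more faithful comparison than uniform random sequence.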
Features of uORFs
Coding Frame Bias and uORF length
As a quality control step, we analysed the reading frame of each uORF relative to that of the true ORF to test for any frame bias (see Additional File 3). None was found, suggesting that there is no apparent effect due to misannotation of true start sites or a similar artefact.
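A frame-bias check of this kind can be sketched as a chi-square goodness-of-fit test of uORF start frames against a uniform expectation. This is an illustration under assumptions (0-based transcript coordinates, frame 0 meaning in frame with the main ORF, a uniform null); the analysis in Additional File 3 may have been done differently. With three frames the test has two degrees of freedom, for which the chi-square survival function is exactly exp(-x/2), so no statistics library is needed.

```python
import math
from collections import Counter


def relative_frame(uorf_start, orf_start):
    """Frame of a uORF start codon relative to the main ORF start codon.

    Both positions are 0-based transcript coordinates; 0 means in frame.
    """
    return (orf_start - uorf_start) % 3


def frame_bias(pairs):
    """Chi-square test of uORF frames against a uniform expectation.

    pairs: iterable of (uorf_start, orf_start) tuples.
    Returns (observed counts for frames 0, 1, 2; chi-square; p-value).
    """
    counts = Counter(relative_frame(u, o) for u, o in pairs)
    observed = [counts.get(frame, 0) for frame in range(3)]
    total = sum(observed)
    if total == 0:
        raise ValueError("no uORFs supplied")
    expected = total / 3
    chi2 = sum((obs - expected) ** 2 / expected for obs in observed)
    # For 2 degrees of freedom, P(X > chi2) = exp(-chi2 / 2) exactly.
    p_value = math.exp(-chi2 / 2)
    return observed, chi2, p_value
```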
It has been reported that uORFs longer than 35 codons (105 nt) reduce the reinitiation competence of ribosomes to zero. A total of 151 of the identified uORFs (~10%) exceed 35 codons, although these occur in the 5' UTRs of only 115 yeast genes (see Additional File 4). Our results suggest that such long uORFs do not necessarily prevent translation of the main ORF, since otherwise these genes could not be translated; their uORF contexts may therefore support ribosomal progression and reinitiation.
uORF visibility and translational efficiency
The AUG-CAI(r) considers only the initiation step. Although initiation is deemed rate-limiting for translation [12, 32], more general translational efficiency can also be measured by the adaptation of all of a uORF's codons to the organism's tRNA pool, using the tRNA adaptation index (tAI). This metric has shown impressive power in distinguishing phenotypic adaptations across yeast species and is an excellent measure of more general translational competence. We calculated this value for each uORF and each principal ORF and, as expected, principal ORFs are generally more highly adapted than uORFs, with mean tAI scores of 0.40 and 0.32, respectively. Again, we calculated log2 ratios of the tAI scores for uORFs and ORFs, shown in Figure 4B. This again shows that principal ORFs appear better adapted, with the distribution shifted towards negative scores. However, a small number of uORFs have very high tAI values: 64 (4%) uORFs have tAI > 0.6. It should be noted, though, that these are generally very short uORFs, which can skew the tAI calculation compared with long ORFs, and that, again, no systematic functional bias was observed in this subset (data not shown). Nevertheless, it has been suggested that shorter uORFs are more functionally significant, and one possible route is translation into bioactive peptides, given their high adaptation to the yeast tRNA set.
Table: Comparison of stop codon context compositional biases in ORFs and uORFs. Over- and under-represented 6-mers downstream of stop codons, with their observed frequencies in ORFs and in all uORFs and the log odds ratio (uORF/ORF).
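The log odds column can be sketched as follows. The sketch assumes that each 6-mer is the six nucleotides immediately 3' of the stop codon, that frequencies are smoothed with a pseudocount over all 4,096 possible 6-mers, and that the log is base 2; none of these choices is specified in the text, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

ALL_HEXAMERS = ["".join(p) for p in product("ACGT", repeat=6)]
HEXAMER_SET = set(ALL_HEXAMERS)


def hexamer_after_stop(seq, stop_start):
    """Return the 6 nt immediately downstream of a stop codon that starts
    at 0-based index stop_start, or None if fewer than 6 nt remain."""
    hexamer = seq[stop_start + 3:stop_start + 9].upper()
    return hexamer if len(hexamer) == 6 else None


def hexamer_frequencies(hexamers, pseudocount=1.0):
    """Pseudocount-smoothed frequency of every possible 6-mer."""
    counts = Counter(h for h in hexamers if h in HEXAMER_SET)
    total = sum(counts.values()) + pseudocount * len(ALL_HEXAMERS)
    return {h: (counts[h] + pseudocount) / total for h in ALL_HEXAMERS}


def log_odds(orf_hexamers, uorf_hexamers, pseudocount=1.0):
    """log2(uORF frequency / ORF frequency) for every 6-mer."""
    f_orf = hexamer_frequencies(orf_hexamers, pseudocount)
    f_uorf = hexamer_frequencies(uorf_hexamers, pseudocount)
    return {h: math.log2(f_uorf[h] / f_orf[h]) for h in ALL_HEXAMERS}
```

Sorting the returned dictionary by value, e.g. `sorted(scores, key=scores.get)`, lists the most under-represented 6-mers first and the most over-represented last.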
Analysis of stress response datasets
After examining the uORF dataset for features and properties that correlate with the TSS-mapped UTRs, we cross-referenced these data with gene sets known to exhibit differential expression in response to different cellular stresses, particularly at the translational level. The aim was to explain translational changes in gene regulation in terms of the TSS-mapped UTR set, and uORFs in particular. These data covered four stress response datasets: amino acid starvation, butanol addition, 0.2 mM H2O2 addition and 2 mM H2O2 addition. For each case, changes in both transcriptional and translational expression were characterised using standard array-based technology coupled with polysomal gradient analysis [19, 20]. In each stress condition the full complement of yeast genes was initially considered, although not all genes passed the quality control tests of the array analyses [19, 20]. Cross-referencing these sets with the TSS datasets yielded 3,770 genes for amino acid starvation and butanol addition, and 3,860 genes for the H2O2 additions. The reduction in the number of mapped TSS genes is a result of minor annotation differences between the published TSS datasets and the Affymetrix GeneChips™.
The TSS-filtered stress-response datasets were ranked according to the change in translational state. For amino acid starvation and butanol addition this was characterised as the log ratio of the differential polysomal (P) and monosomal (M) expression observed from control (C) to stress (S) conditions (PS/MS:PC/MC). This characterises the translational "shift" in expression under stress by following the movement of each gene between polysomal and monosomal states. It should be stressed that these experiments characterise a relative change in translational control, since for stresses such as amino acid starvation there is a general downshift in translation. Because these ratios measure relative changes, the values are not directly comparable between experiments; instead we selected gene lists representing the "extremes" of relative up- and down-regulation. Where possible, the top and bottom 100 genes with log ratios above 0.9 and below -0.9, respectively, were used for subsequent analysis; this was possible for all but the butanol addition experiment, where only 54 genes passed this filter (54 up and 54 down). In the subsequent analyses we refer to gene sets as "up" regulated (log ratio > 0.9) and "down" regulated (log ratio < -0.9), although strictly these values characterise differential relative expression states in the stress condition compared with the control state.
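The ranking and selection step can be sketched as below. The inputs (per-gene polysomal and monosomal signals under stress and control), the log base (2 here) and the tie handling are assumptions, since the text does not specify them; the function names are hypothetical.

```python
import math


def translational_shift(ps, ms, pc, mc, base=2):
    """Log of (PS/MS) / (PC/MC).

    Positive values mean a relative shift towards the polysomal
    (more highly translated) fraction under stress.
    """
    return math.log((ps / ms) / (pc / mc), base)


def extreme_gene_sets(signals, threshold=0.9, n=100, base=2):
    """Select up to n 'up' genes (shift > threshold) and up to n 'down'
    genes (shift < -threshold) from signals[gene] = (PS, MS, PC, MC)."""
    scores = {g: translational_shift(*v, base=base) for g, v in signals.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)  # most 'up' first
    up = [g for g in ranked if scores[g] > threshold][:n]
    down = [g for g in reversed(ranked) if scores[g] < -threshold][:n]
    return up, down
```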
Our data suggest that a significant proportion of the yeast proteome is, in part, regulated at the translational level, particularly under stress conditions. Given this observation, we wished to examine the transcript sequences of the genes involved to try to explain the differential regulation in terms of known control mechanisms, namely uORF-based attenuation of translation, known functional 5' UTR motifs, and the propensity of 5' UTRs to form secondary structures, all of which have been proposed as general translational control mechanisms [5, 6, 44]. Although other post-transcriptional control mechanisms exist, the availability of transcriptional data defining comprehensive 5' UTR data sets, as well as 3' UTR data, makes these attractive to study. Additionally, 5' UTRs are expected to play a major role since they are recognised and scanned by the translational machinery.
5'UTR length and uORF analysis of stress response genes
Table: T-test results for UTR length differences in all four stress conditions. Comparisons (Up v Down, Up v All, Down v All) are given for 5' UTRs and for 3' UTRs, including the 0.2 mM and 2 mM peroxide conditions.
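The kind of comparison summarised in this table can be sketched with Welch's unequal-variance t statistic. Whether the original analysis used Welch's or Student's t-test, and on raw or transformed lengths, is not stated, so treat this as an assumption. The sketch uses only the Python standard library and returns the statistic and the Welch-Satterthwaite degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic together with a p-value.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples (e.g. 5' UTR lengths of 'up' genes vs all genes)."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two values")
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```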
These data provide further strong evidence that UTRs, and uORFs in particular, are involved in mediating translational control of yeast genes, particularly under stress conditions. Also, and perhaps surprisingly, they allow many genes to overcome the general down-regulation of translation observed under stress. This mechanism has been observed under stress before, for GCN4, and our data suggest that it may be more widespread in translational responses to stress.
Conservation of uORFs
To investigate uORF presence in UTRs further, we considered whether uORFs were conserved across closely related species, namely between Saccharomyces cerevisiae and six related yeast species. Previous studies have already examined the conservation of uORFs in yeast, concluding that in general only a limited number are conserved, and a smaller subset still are deemed of functional importance [27, 28]. However, these studies predated the recent TSS datasets and did not cross-reference the data with differential translational control under stress conditions. Nevertheless, both highlight the difficulties involved in defining conservation for these small genetic elements. Should uORFs be absolutely conserved in UTRs to exert their effects (in position and/or composition), or is the mere presence of one or more uORFs anywhere within the 5' UTR sufficient? No single simple measure accounts for all these considerations. Here we have attempted to consider both, first calculating a direct measure of conservation at each position in the S. cerevisiae UTR using the phastCons program. This approach formally accounts for the relative evolutionary distances between the member genomes and yields a single value for each aligned position in a multiple sequence alignment. The phastCons score ranges from 0 to 1, with 1 indicating absolute conservation. Secondly, we calculated a Z-score for each uORF as the number of standard deviations by which its conservation differs from the mean score of similarly sized windows in the same UTR.
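The window-based Z-score can be sketched as follows, assuming per-nucleotide phastCons scores for the 5' UTR as a list of floats and a uORF lying entirely within the scored region. Whether the uORF's own window is included in the background, and whether a population or sample standard deviation was used, are not specified, so both choices here are assumptions; the function name is hypothetical.

```python
from statistics import mean, pstdev


def uorf_conservation_z(scores, start, end):
    """Z-score of a uORF's mean phastCons score against all windows of the
    same width in its 5' UTR.

    scores: per-nucleotide phastCons scores along the 5' UTR.
    start, end: 0-based, end-exclusive uORF coordinates within the UTR.
    """
    if not 0 <= start < end <= len(scores):
        raise ValueError("uORF must lie within the scored UTR")
    width = end - start
    uorf_mean = mean(scores[start:end])
    # Background: every window of the same width (this includes the uORF's own)
    windows = [mean(scores[i:i + width]) for i in range(len(scores) - width + 1)]
    sd = pstdev(windows)
    if sd == 0:
        return 0.0  # flat background: no evidence of extra conservation
    return (uorf_mean - mean(windows)) / sd
```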
We examined in more detail a dataset of 61 "ultra-conserved" uORFs that had a phastCons score > 0.99 and were completely conserved in at least one other species (see Additional File 7 for a complete list). To examine whether cross-species conservation was in part an artefact of the high phastCons score, we estimated the average level of conservation observed for randomly selected short UTR sequences with high phastCons scores, comparing each with the aligned sequence in the other species. This was done by selecting subsequences of the pattern C1-Xn-C2, where codons C1 and C2 are separated by n codons (n > 0). The average phastCons score for these subsequences was calculated, as well as the number of species in which the pattern was "conserved". Since these test sequences are not uORFs, "conservation" is defined by analogy with a small ORF: the first codon C1 must be completely conserved (analogous to a start codon), the C2 codon must code for the same amino acid (analogous to a conserved stop signal), and C1 and C2 must be in frame in the aligned species. The average number of "conserved" species was then calculated for phastCons scores in the range 0.99 to 1.
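The in-frame conservation rule for a C1-Xn-C2 pattern can be sketched for one pairwise alignment row, assuming gapped, equal-length aligned strings with '-' for gaps and codon positions given in ungapped S. cerevisiae coordinates. The helper names are hypothetical, and for simplicity the sketch ignores insertions in the other species that fall inside a codon.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; '*' marks stop codons.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def _codon(row, cols):
    """Codon read from alignment columns, or None if incomplete or gapped."""
    codon = "".join(row[c] for c in cols)
    return codon if len(codon) == 3 and "-" not in codon else None


def conserved_in_frame(ref_aln, other_aln, c1, c2):
    """Check the C1-Xn-C2 conservation rule in one aligned species.

    c1, c2: 0-based start positions of codons C1 and C2 in the ungapped
    reference (S. cerevisiae) sequence.
    """
    ref_aln, other_aln = ref_aln.upper(), other_aln.upper()
    # Alignment column of each ungapped reference nucleotide
    cols = [i for i, ch in enumerate(ref_aln) if ch != "-"]
    ref_c1 = _codon(ref_aln, cols[c1:c1 + 3])
    ref_c2 = _codon(ref_aln, cols[c2:c2 + 3])
    oth_c1 = _codon(other_aln, cols[c1:c1 + 3])
    oth_c2 = _codon(other_aln, cols[c2:c2 + 3])
    if None in (ref_c1, ref_c2, oth_c1, oth_c2):
        return False
    if oth_c1 != ref_c1:  # C1 completely conserved (like a start codon)
        return False
    aa_ref, aa_oth = CODON_TABLE.get(ref_c2), CODON_TABLE.get(oth_c2)
    if aa_ref is None or aa_ref != aa_oth:  # same amino acid (or stop)
        return False
    # In frame: residues of the other species between the two codon starts
    # must be a multiple of three.
    between = other_aln[cols[c1]:cols[c2]]
    return sum(ch != "-" for ch in between) % 3 == 0
```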
We also examined the correspondence between these 61 conserved uORFs and up/down regulation in our stress response gene sets. Unfortunately, only 4 of the corresponding genes appeared in the stress response data sets, so it is difficult to draw any general conclusions. However, one notable gene is the mitochondrial alcohol dehydrogenase ADH3 (YMR083W), which has a highly conserved uORF at position -281 relative to the start codon and is down-regulated at the lower (0.2 mM) peroxide concentration.
In this study we have examined the role of upstream sequence elements in yeast UTRs, considering whether they play a concerted role in the regulation of gene expression under stress conditions. Specifically, we used available data sets of full-length mapped yeast transcripts to define a superset of mapped transcriptional start sites (TSSs) representing over 70% of yeast protein-coding genes. This is the most comprehensive survey of yeast uORFs using known 5' UTR sequences, and the first time these data have been considered in light of known translationally regulated genes under stress conditions [17, 18]. The analysis shows that yeast uORFs are statistically under-represented in 5' UTRs and that UTRs have generally evolved to select against uORFs, suggesting that those that are tolerated may play some specific role in translation. This agrees with previous studies, either on specific genes [13, 14] or more generally [6, 11, 27–29], which have suggested functional roles for uORFs. In addition to their relative scarcity, uORFs are also less "efficient" in terms of their start codon context (AUG-CAI index), their overall codon bias (tAI) and, importantly, their preferred stop codon contexts. The decreased translational adaptation of uORFs has also been noted for those associated with NMD and is consistent with their scarcity in UTRs and generic functional role. uORFs would be expected to reduce the translational efficiency of the principal ORF, blocking the ribosome from progressing to the true ORF or even promoting termination and detachment. Interestingly, however, they tend to avoid strong stop signals that would promote termination, instead allowing the ribosomal machinery to reinitiate and continue scanning. One possibility is that uORFs permit ribosomes to stall and "wait", a general mechanism that has been suggested to facilitate a fast response once stresses are removed and normal translation can continue.
Regarding differential regulation at the translational level, a significant trend is observed across most of the stresses examined here: genes that undergo relative translational up-regulation under stress have longer UTRs. At face value this observation seems self-evident, since any gene that can be regulated at the translational level must have a mechanism to support this, and this should be via some motif or element contained within either the 5' or 3' UTR. However, here we demonstrate this for a variety of stresses for the first time and, importantly, show that the trend is statistically significant. It offers a simple approach to selecting genes that are more likely to be translationally regulated on the basis of UTR size and content. Interestingly, the recent tiling array study of the yeast genome also defined 5' UTRs for a subset of genes, and its authors also noted trends with 5' UTR length. They observed anecdotally that genes with shorter 5' UTRs were generally "housekeeping" genes involved in processes such as rRNA metabolism, RNA processing and ribosome biogenesis. Our polysomal array data support this: these Gene Ontology categories are translationally down-regulated under stress and lack the longer 5' UTRs that might allow them to escape this. Equally, we show that genes with longer 5' UTRs are translationally up-regulated and include those involved in processes such as transport and localisation, as reported by David and colleagues. This provides further evidence for a relationship between mRNA transcript length and gene function, as proposed by Hurowitz and Brown.
Pinpointing the precise nature of the elements conferring translational regulatory properties is more challenging. Our data suggest that uORFs play a significant role in mediating gene expression during stress responses, as they are over-represented in translationally up-regulated genes, particularly under 0.2 mM peroxide stress. However, this trend is not as striking as the UTR length correlation, and must in part be a consequence of it, since longer UTRs are more likely to contain a uORF.
It should also be noted that the over-representation in up-regulated genes is difficult to reconcile with the "standard" uORF mechanism, in which uORFs are generally expected to down-regulate gene expression at the translational level. This suggests that they are acting in a novel way, that the complex GCN4-type mechanism is more widespread, or that UTR elements other than uORFs are responsible. Regardless, given their relative scarcity, it is clear that there is still much to learn about UTR and uORF function.
Other authors have focused on conservation as a strong predictor of functional significance [27, 28, 46, 47]. Although early studies suggested that uORFs are generally not conserved, this is far from a straightforward calculation. A single uORF may not be conserved in exact sequence, length or position relative to either the transcriptional or translational start, yet might still fulfil its functional role. In this study we add the constraint of a known transcriptional start site and consider two complementary conservation metrics: the phastCons score and a Z-score statistic measuring conservation relative to the local UTR background. We have also examined whether each uORF is directly conserved in close fungal relatives. Although uORFs are generally not conserved, many are more conserved than expected by chance within their respective UTR sequences, confirming and extending previous studies [27, 28]. Using strict criteria, we find 61 uORFs (from 43 genes) with high conservation across related fungal species (see Additional File 7), extending the previously reported data sets to 365 genes.
It has been reported that secondary structure in 5' UTRs mediates translational control of gene expression on a genomic scale in yeast [5, 44]. We re-examined this result using the true TSS mapping for the 4,149 5' UTR sequences and the program Randfold. In broad agreement with Ringnér and Krogh, the vast majority of 5' UTRs do not appear to be strongly folded; only 20 5' UTRs were found to have low MFE values with an associated p-value < 0.005 (see Additional File 8). However, we do not see the same general trend between calculated 5' UTR folding energies and translation rates. Clearly, the use of the true TSS has a marked effect on 5' UTR folding energy, and this raises the possibility that the previously reported trend might instead reflect the true size of the UTRs. Put simply, shorter 5' UTRs facilitate faster translation. Nevertheless, secondary structure may be stronger, not weaker, in the true 5' UTRs, and it also appears to be a significant regulatory mechanism for a small number of genes.
In summary, the results presented here provide convincing evidence that 5' UTR sequence has a major role in the regulation of gene expression, particularly under stress conditions and at the translational level. This effect appears to be widespread, affecting large numbers of different yeast genes under different conditions. Yeast has evolved a variety of mechanisms to effect these changes, including upstream open reading frames, which are over-represented in translationally up-regulated genes.
All S. cerevisiae open reading frame (ORF) chromosomal co-ordinate information was obtained from the SGD website via: ftp://genome-ftp.stanford.edu/pub/yeast/data_download/chromosomal_feature/ and all S. cerevisiae sequences were obtained from ftp://genome-ftp.stanford.edu/pub/yeast/sequence/genomic_sequence/orf_dna/
Transcription start and end sites
Transcription start site (TSS) information for 5' UTRs was obtained from published studies, from Zhang and Dietrich and from Miura and colleagues via supplementary data http://www.pnas.org/cgi/content/full/0605645103/DC1. The two gene sets were merged by ORF name, selecting a single TSS for each ORF. For each ORF with multiple mapped TSSs, the occurrence of each TSS was counted. If the modal TSS for an ORF had a count greater than two, it was taken as the representative site for that gene; otherwise the TSS giving the longest 5' UTR (the most distal from the ATG start codon) was selected. The 5' UTR sequences were then taken from the ATG start codon up to the chosen TSS. A complete list of uORFs and 5' UTRs is given in Additional File 9. In addition, the 3' UTR ends were taken for a high-quality 2,044-gene subset of the yeast genome determined via tiling arrays.
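The TSS selection rule can be sketched as below, assuming each TSS is expressed as its distance (nt) upstream of the ATG, pooled across both datasets. How ties between equally frequent TSSs were broken is not stated; `Counter.most_common` keeps the first one encountered, which is an assumption here. Function names are hypothetical.

```python
from collections import Counter


def representative_tss(tss_distances):
    """Choose one TSS per ORF.

    tss_distances: TSS positions as distances (nt) upstream of the ATG,
    pooled from all datasets. A modal TSS seen more than twice wins;
    otherwise the most distal TSS (longest 5' UTR) is used.
    """
    if not tss_distances:
        raise ValueError("no TSS supplied")
    modal_tss, modal_count = Counter(tss_distances).most_common(1)[0]
    if modal_count > 2:
        return modal_tss
    return max(tss_distances)


def utr_from_tss(upstream_seq, tss_distance):
    """5' UTR from the chosen TSS up to (not including) the ATG.

    upstream_seq: genomic sequence ending immediately before the ATG.
    """
    if not 0 <= tss_distance <= len(upstream_seq):
        raise ValueError("TSS lies outside the supplied upstream sequence")
    return upstream_seq[len(upstream_seq) - tss_distance:]
```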
Stress response datasets
Microarray data for the stress response gene sets were obtained from ArrayExpress http://www.ebi.ac.uk/microarray-as/aer/index.html using accession numbers E-MEXP-323 (amino acid starvation), E-MEXP-324 (butanol addition) and E-MEXP-526 (0.2 mM and 2 mM H2O2 additions). Translational up- and down-regulation under stress for each gene was calculated as previously described [19, 20], using log ratios of shifts between polysomal (P, highly translated) and monosomal (M, weakly translated) components to monitor changes in translational control between stress (S) and control (C) experiments. In the H2O2 stress experiments, PS/MS:PC/MC was not used as the sole measure of translational state, because oxidative stress inhibits ribosomal transit. In addition, the sum of the monosomal and polysomal fractions under stress was compared with that under control conditions (PS+MS:PC+MC). For the peroxide experiments, up-regulated genes were defined as those satisfying both criteria, with PS/MS:PC/MC and PS+MS:PC+MC both greater than 1. Similarly, down-regulated genes had PS/MS:PC/MC and PS+MS:PC+MC both less than 1. This serves to highlight genes that, when "up-regulated", were able to overcome both the translational initiation block and the inhibition of ribosomal transit.
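The two-criterion rule for the peroxide data can be written directly from the ratios defined above. This sketch assumes normalised signals for each fraction are already available (the text does not say how the arrays were normalised), and the function name is hypothetical.

```python
def classify_peroxide(ps, ms, pc, mc):
    """Return 'up', 'down' or None for one gene in the H2O2 experiments.

    ps, ms: polysomal and monosomal signals under stress.
    pc, mc: polysomal and monosomal signals under control conditions.
    """
    shift = (ps / ms) / (pc / mc)  # PS/MS : PC/MC
    total = (ps + ms) / (pc + mc)  # PS+MS : PC+MC
    if shift > 1 and total > 1:
        return "up"
    if shift < 1 and total < 1:
        return "down"
    return None  # the two criteria disagree, or a ratio equals 1
```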
AUG-CAI(r) and tAI indices
A previously published method was implemented, taking look-up values from a position-specific weight matrix calculated for positions (-6, -5, -4, -3, -2, -1, 4, 5, 6) of the AUG initiation context of 63 highly expressed genes. The AUG-CAI is then calculated for each candidate AUG context as the geometric mean of these look-up values (multiplying the 9 weights together and taking the 9th root). This value is then normalised by dividing by the maximum theoretical value, to give an AUG-CAI(r) value between 0 and 1.
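A minimal sketch of the AUG-CAI(r) calculation, assuming a weight table indexed by context position and base (with the A of the AUG at +1) and strictly positive weights. The toy weights in the usage lines are hypothetical and are not the matrix derived from the 63 highly expressed genes.

```python
import math

# Context positions relative to the AUG, with the A of AUG at +1.
POSITIONS = (-6, -5, -4, -3, -2, -1, 4, 5, 6)


def context_bases(seq, atg):
    """Bases at the nine context positions around an AUG at 0-based index atg."""
    if atg < 6 or atg + 6 > len(seq):
        raise ValueError("AUG too close to a sequence end for the full context")
    # Negative positions lie before the A; +4..+6 follow the G of AUG.
    return [seq[atg + p] if p < 0 else seq[atg + p - 1] for p in POSITIONS]


def aug_cai_r(seq, atg, weights):
    """Geometric mean of the nine context weights, divided by the maximum
    achievable geometric mean, giving a value between 0 and 1.

    weights: {position: {base: weight > 0}}.
    """
    bases = context_bases(seq, atg)
    log_sum = sum(math.log(weights[p][b]) for p, b in zip(POSITIONS, bases))
    cai = math.exp(log_sum / len(POSITIONS))
    max_log_sum = sum(math.log(max(weights[p].values())) for p in POSITIONS)
    max_cai = math.exp(max_log_sum / len(POSITIONS))
    return cai / max_cai


if __name__ == "__main__":
    # Hypothetical toy weights: A preferred at every position.
    toy = {p: {"A": 1.0, "C": 0.5, "G": 0.5, "T": 0.5} for p in POSITIONS}
    print(aug_cai_r("AAAAAAATGAAA", 6, toy))  # all-A context
    print(aug_cai_r("CCCCCCATGCCC", 6, toy))  # all-C context
```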
The tRNA adaptation index (tAI) implements a similar method, originally developed by dos Reis and colleagues [33, 34] (available from http://people.cryst.bbk.ac.uk/~fdosr01/tAI/). It estimates codon weights from the gene copy numbers of the tRNA isoacceptors for a given codon, essentially reflecting the relative ease with which a gene can be translated given the availability of the necessary tRNAs. A look-up table of codon adaptation weights was calculated for S. cerevisiae and subsequently used to calculate the geometric mean for a given open reading frame, as previously described.
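Given a codon weight table, the tAI of a sequence and the uORF/ORF log2 ratio plotted in Figure 4B reduce to a few lines. This sketch assumes the weights have already been derived from tRNA gene copy numbers following dos Reis and colleagues (that derivation is not reproduced), skips only stop codons, and uses hypothetical toy weights in the tests; whether the original implementation also excluded other codons (for example ATG) is not stated here.

```python
import math

STOPS = {"TAA", "TAG", "TGA"}


def tai(seq, weights):
    """Geometric mean of codon weights over the sense codons of seq.

    seq: coding sequence read in frame from its first nucleotide.
    weights: {codon: relative adaptiveness > 0}.
    """
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3].upper() for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c not in STOPS]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


def tai_log2_ratio(uorf_seq, orf_seq, weights):
    """log2(tAI of uORF / tAI of main ORF); negative if the ORF is better adapted."""
    return math.log2(tai(uorf_seq, weights) / tai(orf_seq, weights))
```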
Gene Ontology over-representation
Over-representation statistics of Gene Ontology categories within gene-sets were performed using GOStat  at http://gostat.wehi.edu.au/ using yeast SGD gene names of the respective gene sets as input.
Defining upstream open reading frames
Upstream open reading frames (uORFs) were identified in 5' UTRs and up to 100 nucleotides into the true open reading frame (ORF). We defined uORFs from the relative nucleotide positions and coding frames of all start and stop codons with respect to the ATG start of the true ORF. All in-frame uORFs were collected from this set, taking the longest of any that shared the same stop codon position. Identified uORFs were discarded if they were less than three codons long (including the stop codon) or lay within 20 nt of the 5' end, since the latter are not considered to affect translation.
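The uORF definition above can be sketched as a scan of the 5' UTR plus the first 100 nt of the main ORF. Several details are interpretations rather than documented choices: start codons must lie in the UTR, ATGs within 20 nt of the 5' end are skipped before uORFs sharing a stop codon are merged, and only ATG starts are considered. Function and parameter names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr, orf, extend=100, min_codons=3, cap_margin=20):
    """Return uORFs as (start, end) pairs, 0-based and end-exclusive,
    on the sequence utr + orf[:extend].

    For uORFs that share a stop codon, only the longest (most upstream ATG)
    is kept. uORFs shorter than min_codons (stop codon included) or starting
    within cap_margin nt of the 5' end are discarded.
    """
    seq = (utr + orf[:extend]).upper()
    start_for_stop = {}
    for i in range(cap_margin, len(utr)):
        if seq[i:i + 3] != "ATG":
            continue
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOPS:
                # i increases, so setdefault keeps the most upstream ATG
                start_for_stop.setdefault(j, i)
                break
    uorfs = []
    for stop, start in start_for_stop.items():
        end = stop + 3
        if (end - start) // 3 >= min_codons:
            uorfs.append((start, end))
    return sorted(uorfs)
```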
Secondary structure within 5'UTR sequences was predicted using the Randfold method . All 5'UTR sequences were submitted to Randfold using the dinucleotide shuffling method and 100 L randomizations, where L is the length of the 5'UTR.
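The significance step of a Randfold-style analysis can be sketched as an empirical p-value. This sketch assumes the estimator p = (R + 1) / (N + 1), where R is the number of shuffled sequences whose minimum free energy (MFE) is lower than or equal to that of the native sequence and N is the number of shuffles; the RNA folding and dinucleotide shuffling that produce the MFE values are not reproduced here, and the energies in the usage line are hypothetical.

```python
def randfold_pvalue(native_mfe, shuffled_mfes):
    """Empirical p-value: (R + 1) / (N + 1), where R counts shuffled sequences
    whose MFE is lower than or equal to the native MFE."""
    if not shuffled_mfes:
        raise ValueError("need at least one randomization")
    r = sum(1 for e in shuffled_mfes if e <= native_mfe)
    return (r + 1) / (len(shuffled_mfes) + 1)


# Hypothetical MFEs (kcal/mol) for one native 5'UTR and three shuffles.
print(randfold_pvalue(-10.0, [-5.0, -12.0, -3.0]))  # prints 0.5
```

The +1 terms keep the estimate strictly above zero, so the smallest p-value achievable is 1 / (N + 1); more randomizations are needed to resolve smaller p-values.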
Conservation of individual base positions in each UTR was calculated with the PhastCons algorithm, using related fungal genomic sequences aligned to S. cerevisiae. Alignment blocks and phastCons scores for S. cerevisiae genomic alignments against S. paradoxus, S. mikatae, S. kudriavzevii, S. bayanus, S. castellii and S. kluyveri were obtained using the UCSC Table Browser , and individual phastCons scores were extracted for each position. Because the alignments covering the UTR region were incomplete for some S. cerevisiae genes, phastCons scores could be obtained for only 3,877 of the 4,149 5'UTRs. Average phastCons scores were calculated for every uORF in these 3,877 5'UTRs, across all positions from start codon to stop codon inclusive. An average "background" phastCons score was also calculated for each uORF from the 5'UTR sequence containing it, and the uORF scores were converted to Z-scores. This provides an additional measure of whether a uORF is conserved over and above the general level of conservation in its UTR. In addition, multiZ genomic alignment fragments were obtained from the UCSC browser for 90 highly conserved uORFs with mean phastCons scores >0.99, and the aligned sequences of the other species were analysed for in-frame start and stop codons corresponding to those of the S. cerevisiae uORFs.
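Both conservation measures can be sketched as below. The Z-score definition is an assumption, since the text does not give the formula: here it is (uORF mean - UTR mean) / UTR standard deviation, using per-base phastCons scores over the whole containing 5'UTR. The alignment check is likewise a hypothetical interpretation: it maps ungapped S. cerevisiae coordinates to alignment columns and asks whether the other species has an ATG and a stop codon at those columns with a length that preserves the frame; it does not look for premature internal stops. Function names and the toy alignment are illustrative only.

```python
import statistics

STOP_CODONS = {"TAA", "TAG", "TGA"}


def uorf_conservation(utr_scores, start, end):
    """Mean phastCons score over a uORF and its Z-score against the UTR background.

    utr_scores holds one phastCons score per UTR position; start/end are
    0-based, end-exclusive uORF coordinates within that list.
    """
    mean_uorf = statistics.fmean(utr_scores[start:end])
    bg_mean = statistics.fmean(utr_scores)
    bg_sd = statistics.pstdev(utr_scores)
    z = (mean_uorf - bg_mean) / bg_sd if bg_sd > 0 else float("nan")
    return mean_uorf, z


def uorf_conserved_in(ref_aln, other_aln, start, end):
    """Check one aligned species for an ATG and a stop codon at the aligned
    start/stop positions of an S. cerevisiae uORF, with the frame preserved."""
    cols = [c for c, base in enumerate(ref_aln) if base != "-"]  # ungapped ref pos -> column
    start_cols, stop_cols = cols[start:start + 3], cols[end - 3:end]
    start_codon = "".join(other_aln[c] for c in start_cols).upper()
    stop_codon = "".join(other_aln[c] for c in stop_cols).upper()
    span = other_aln[start_cols[0]:stop_cols[-1] + 1]
    in_frame = (len(span) - span.count("-")) % 3 == 0  # gap-free length is whole codons
    return start_codon == "ATG" and stop_codon in STOP_CODONS and in_frame


# Hypothetical alignment: the other species carries a 3-nt insertion, frame intact.
print(uorf_conserved_in("CCATG---AAATAGCC", "CCATGGGGAAATAGCC", 2, 11))  # prints True
```

Working from ungapped reference coordinates means the same uORF positions can be reused for every species in the alignment block.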
The authors wish to acknowledge the BBSRC for a studentship (BBSSA200410904) to CL, and thank John McCarthy for comments and discussions.
- Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D, Brown PO: Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell. 2000, 11: 4241-4257.
- Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998, 95: 14863-14868. 10.1073/pnas.95.25.14863.
- Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E, et al: Genome-wide location and function of DNA binding proteins. Science. 2000, 290: 2306-2309. 10.1126/science.290.5500.2306.
- Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J, Jennings EG, Zeitlinger J, Pokholok DK, Kellis M, Rolfe PA, Takusagawa KT, Lander ES, Gifford DK, Fraenkel E, Young RA: Transcriptional regulatory code of a eukaryotic genome. Nature. 2004, 431: 99-104. 10.1038/nature02800.
- McCarthy JEG: Posttranscriptional control of gene expression in yeast. Microbiol Mol Biol Rev. 1998, 62: 1492-
- Vilela C, McCarthy JEG: Regulation of fungal gene expression via short open reading frames in the mRNA 5' untranslated region. Mol Microbiol. 2003, 49: 859-867. 10.1046/j.1365-2958.2003.03622.x.
- Mignone F, Gissi C, Liuni S, Pesole G: Untranslated regions of mRNAs. Genome Biol. 2002, 3: reviews0004. 10.1186/gb-2002-3-3-reviews0004.
- Arava Y, Wang Y, Storey JD, Liu CL, Brown PO, Herschlag D: Genome-wide analysis of mRNA translation profiles in Saccharomyces cerevisiae. Proc Natl Acad Sci USA. 2003, 100: 3889-3894. 10.1073/pnas.0635171100.
- Morris DR, Geballe AP: Upstream open reading frames as regulators of mRNA translation. Mol Cell Biol. 2000, 20: 8635-8642. 10.1128/MCB.20.23.8635-8642.2000.
- Beyer A, Hollunder J, Nasheuer HP, Wilhelm T: Post-transcriptional expression regulation in the yeast Saccharomyces cerevisiae on a genomic scale. Mol Cell Proteomics. 2004, 3: 1083-1092. 10.1074/mcp.M400099-MCP200.
- Sachs MS, Geballe AP: Downstream control of upstream open reading frames. Genes Dev. 2006, 20: 915-921. 10.1101/gad.1427006.
- Brockmann R, Beyer A, Heinisch JJ, Wilhelm T: Posttranscriptional expression regulation: what determines translation rates? PLoS Comput Biol. 2007, 3: e57. 10.1371/journal.pcbi.0030057.
- Gaba A, Jacobson A, Sachs MS: Ribosome occupancy of the yeast CPA1 upstream open reading frame termination codon modulates nonsense-mediated mRNA decay. Mol Cell. 2005, 20: 449-460. 10.1016/j.molcel.2005.09.019.
- Hinnebusch AG: Translational regulation of yeast GCN4 – A window on factors that control initiator-tRNA binding to the ribosome. J Biol Chem. 1997, 272: 21661-21664. 10.1074/jbc.272.35.21661.
- Castrillo JI, Zeef LA, Hoyle DC, Zhang N, Hayes A, Gardner DC, Cornell MJ, Petty J, Hakes L, Wardleworth L, Rash B, Brown M, Dunn WB, Broadhurst D, O'Donoghue K, Hester SS, Dunkley TP, Hart SR, Swainston N, Li P, Gaskell SJ, Paton NW, Lilley KS, Kell DB, Oliver SG: Growth control of the eukaryote cell: a systems biology study in yeast. J Biol. 2007, 6: 4. 10.1186/jbiol54.
- Tian Q, Stepaniants SB, Mao M, Weng L, Fetham MC, Doyle MJ, Yi EC, Dai H, Thorsson V, Eng J, Goodlett D, Berger JP, Gunter B, Linseley PS, Stoughton RB, Aebersold R, Collins SJ, Hanlon WA, Hood LE: Integrated genomic and proteomic analyses of gene expression in Mammalian cells. 2004, 3: 960-9.
- Moore MJ: From birth to death: The complex lives of eukaryotic mRNAs. Science. 2005, 309: 1514-1518. 10.1126/science.1111443.
- Cazzola M, Skoda RC: Translational pathophysiology: a novel molecular mechanism of human disease. Blood. 2000, 95: 3280-3288.
- Smirnova JB, Selley JN, Sanchez-Cabo F, Carroll K, Eddy AA, McCarthy JE, Hubbard SJ, Pavitt GD, Grant CM, Ashe MP: Global gene expression profiling reveals widespread yet distinctive translational responses to different eukaryotic translation initiation factor 2B-targeting stress pathways. Mol Cell Biol. 2005, 25 (21): 9340-9349. 10.1128/MCB.25.21.9340-9349.2005.
- Shenton D, Smirnova JB, Selley JN, Carroll K, Hubbard SJ, Pavitt GD, Ashe MP, Grant CM: Global translational responses to oxidative stress impact upon multiple levels of protein synthesis. J Biol Chem. 2006, 281 (39): 29011-29021. 10.1074/jbc.M601545200.
- Zhang ZH, Dietrich FS: Mapping of transcription start sites in Saccharomyces cerevisiae using 5' SAGE. Nucleic Acids Res. 2005, 33: 2838-2851. 10.1093/nar/gki583.
- Miura F, Kawaguchi N, Sese J, Toyoda A, Hattori M, Morishita S, Ito T: A large-scale full-length cDNA analysis to explore the budding yeast transcriptome. Proc Natl Acad Sci USA. 2006, 103: 17846-17851. 10.1073/pnas.0605645103.
- David L, Huber W, Granovskaia M, Toedling J, Palm CJ, Bofkin L, Jones T, Davis RW, Steinmetz LM: A high-resolution map of transcription in the yeast genome. Proc Natl Acad Sci USA. 2006, 103: 5320-5325. 10.1073/pnas.0601091103.
- Gaba A, Wang Z, Krishnamoorthy T, Hinnebusch AG, Sachs MS: Physical evidence for distinct mechanisms of translational control by upstream open reading frames. EMBO J. 2001, 20: 6453-6463. 10.1093/emboj/20.22.6453.
- Crowe ML, Wang XQ, Rothnagel JA: Evidence for conservation and selection of upstream open reading frames suggests probable encoding of bioactive peptides. BMC Genomics. 7: 16. 10.1186/1471-2164-7-16.
- Guan Q, Zheng W, Tang S, Liu X, Zinkel RA, Tsui KW, Yandell BS, Culbertson MR: Impact of nonsense-mediated mRNA decay on the global expression profile of budding yeast. PLoS Genet. 2 (11): e203. 10.1371/journal.pgen.0020203.
- Zhang ZH, Dietrich FS: Identification and characterization of upstream open reading frames (uORF) in the 5' untranslated regions (UTR) of genes in Saccharomyces cerevisiae. Curr Genet. 2005, 48: 77-87. 10.1007/s00294-005-0001-x.
- Cvijović M, Dalevi D, Bilsland E, Kemp GJ, Sunnerhagen P: Identification of putative regulatory upstream ORFs in the yeast genome using heuristics and evolutionary conservation. BMC Bioinformatics. 2007, 8: 295. 10.1186/1471-2105-8-295.
- Hayden CA, Bosco G: Comparative genomic analyses of novel conserved peptide upstream open reading frames in Drosophila melanogaster and other dipteran species. BMC Genomics. 2008, 9: 61. 10.1186/1471-2164-9-61.
- Cherry JM, Adler C, Ball C, Chervitz SA, Dwight SS, Hester ET, Jia Y, Juvik G, Roe T, Schroeder M, Weng S, Botstein D: SGD: Saccharomyces Genome Database. Nucleic Acids Res. 1998, 26: 73-9. 10.1093/nar/26.1.73.
- Rajkowitsch L, Vilela C, Berthelot K, Ramirez CV, McCarthy JE: Reinitiation and recycling are distinct processes occurring downstream of translation termination in yeast. J Mol Biol. 2004, 335: 71-85. 10.1016/j.jmb.2003.10.049.
- Arava Y, Boas FE, Brown PO, Herschlag D: Dissecting eukaryotic translation and its control by ribosome density mapping. Nucleic Acids Res. 2005, 33: 2421-2432. 10.1093/nar/gki331.
- dos Reis M, Savva R, Wernisch L: Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res. 2004, 32: 5036-5044. 10.1093/nar/gkh834.
- Man O, Pilpel Y: Differential translation efficiency of orthologous genes is involved in phenotypic divergence of yeast species. Nat Genet. 2007, 39: 415-421. 10.1038/ng1967.
- Namy O, Hatin I, Rousset JP: Impact of the six nucleotides downstream of the stop codon on translation termination. EMBO Rep. 2001, 2: 787-793. 10.1093/embo-reports/kve176.
- Williams I, Richardson J, Starkey A, Stanfield I: Genome-wide prediction of stop codon readthrough during translation in the yeast Saccharomyces cerevisiae. Nucleic Acids Res. 2004, 32: 6605-6616. 10.1093/nar/gkh1004.
- Grant CM, Hinnebusch AG: Effect of sequence context at stop codons on efficiency of reinitiation in GCN4 translational control. Mol Cell Biol. 1994, 14: 606-618.
- Beissbarth T, Speed TP: GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics. 2004, 20: 1464-1465. 10.1093/bioinformatics/bth088.
- Rivals I, Personnaz L, Taing L, Potier M: Enrichment or depletion of a GO category within a class of genes: which test? Bioinformatics. 2007, 23 (4): 401-407. 10.1093/bioinformatics/btl633.
- Kuhn KM, DeRisi JL, Brown PO, Sarnow P: Global and specific translational regulation in the genomic response of Saccharomyces cerevisiae to a rapid transfer from a fermentable to a nonfermentable carbon source. Mol Cell Biol. 2001, 21: 916-927. 10.1128/MCB.21.3.916-927.2001.
- Preiss T, Baron-Benhamou J, Ansorge W, Hentze MW: Homodirectional changes in transcriptome composition and mRNA translation induced by rapamycin and heat shock. Nat Struct Biol. 2003, 10: 1039-1047. 10.1038/nsb1015.
- Serikawa KA, Xu XL, MacKay VL, Law GL, Zong Q, Zhao LP, Bumgarner R, Morris DR: The transcriptome and its translation during recovery from cell cycle arrest in Saccharomyces cerevisiae. Mol Cell Proteomics. 2003, 2: 191-204. 10.1074/mcp.D200002-MCP200.
- MacKay VL, Li X, Flory MR, Turcott E, Law GL, Serikawa KA, Xu XL, Lee H, Goodlett DR, Aebersold R, Zhao PL, Morris DR: Gene expression analyzed by high-resolution state array analysis and quantitative proteomics. Mol Cell Proteomics. 2004, 3: 478-489. 10.1074/mcp.M300129-MCP200.
- Ringner M, Krogh M: Folding free energies of 5'-UTRs impact post-transcriptional regulation on a genomic scale in yeast. PLoS Comput Biol. 2005, 1: 585-592. 10.1371/journal.pcbi.0010072.
- Hurowitz EH, Brown PO: Genome-wide analysis of mRNA lengths in Saccharomyces cerevisiae. Genome Biol. 2003, 5: R2. 10.1186/gb-2003-5-1-r2.
- Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM, Wilson RK, Gibbs RA, Kent WJ, Miller W, Haussler D: Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005, 15: 1034-50. 10.1101/gr.3715005.
- Galagan J, Henn M, Ma L-J, Cuomo C, Birren B: Genomics of the fungal kingdom: Insights into eukaryote biology. Genome Res. 2005, 15: 1620-1631. 10.1101/gr.3767105.
- Bonnet E, Wuyts J, Rouze P, Van de Peer Y: Evidence that microRNA precursors, unlike other non-coding RNAs, have lower folding free energies than random sequences. Bioinformatics. 2004, 20 (17): 2911-2917. 10.1093/bioinformatics/bth374.
- Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ: The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 2004, 32: D493-6. 10.1093/nar/gkh103.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.