- Research article
- Open Access
Conserved upstream open reading frames in higher plants
BMC Genomics volume 9, Article number: 361 (2008)
Upstream open reading frames (uORFs) can down-regulate the translation of the main open reading frame (mORF) through two broad mechanisms: ribosomal stalling and reducing reinitiation efficiency. In distantly related plants, such as rice and Arabidopsis, conserved uORFs have been found to be rare, occurring at approximately 100 loci in these transcriptomes. It is unclear how prevalent conserved uORFs are in closely related plants.
We used a homology-based approach to identify conserved uORFs in five cereals (monocots) that could potentially regulate translation. Our approach used a modified reciprocal best hit method to identify putative orthologous sequences that were then analysed by a comparative R-nomics program called uORFSCAN to find conserved uORFs.
This research identified new genes that may be controlled at the level of translation by conserved uORFs. We report that conserved uORFs are rare (<150 loci contain them) in cereal transcriptomes, are generally short (less than 100 nt), highly conserved (50% median amino acid sequence similarity), and position independent in their 5'-UTRs, and that their start codon context and usage of rare codons for translation do not appear to be important.
RNA-omics, or more simply R-nomics, is the large-scale study of RNA structure and function. One of the major challenges faced by R-nomics is to understand the regulatory mechanisms of complex signals found in the untranslated regions (UTRs) of messenger RNAs. In particular, the control signals found in the 5'-UTR of some eukaryotic mRNAs play a crucial role in translational control that can result in rapid changes to the proteome. These post-transcriptionally regulated mRNAs frequently encode important regulatory proteins (e.g., proto-oncogenes, growth factors, and transcription factors) that need to be strongly or precisely regulated for normal cellular activity. In other cases, control signals in the 5'-UTR provide continuous regulation of essential mRNAs by providing an alternative route for translation when cap-dependent translation is compromised (e.g., under stress conditions).
Translational control signals are often found in long 5'-UTRs (>100 nt), which can contain either a single control signal or multiple control signals that function independently or in a coordinated fashion [8–10]. One important translational control signal found in both prokaryotes and eukaryotes is the upstream open reading frame (uORF), a small open reading frame located upstream of the main coding region.
Two types of functional upstream open reading frames have been described that have a demonstrated activity either in-vitro or in-vivo: a) uORFs encoding bioactive peptides [12–15] that either act on translation or have biological roles other than reducing the translation of the main ORF, and therefore can be described as sequence-dependent, and b) sequence-independent uORFs. A sequence-dependent uORF encodes a small peptide, and some of these uORF-encoded peptides have been shown to directly affect translation via either ribosomal stalling during translation of the uORF or termination of translation by inhibiting the peptidyl transferase activity of the ribosome and thus peptide bond formation [16, 17]. For sequence-independent uORFs, the uORF-encoded peptide is not important for translational control, but other factors like uORF recognition, length, stop codon environment, and the downstream intercistronic sequence (length and structure) can affect reinitiation efficiency at the downstream ORF [18, 19]. Sequence-independent uORFs can also indirectly affect translation by allowing ribosomes to bypass inhibitory stem structures or activate dormant internal ribosome entry sites (IRES) via conformational changes induced by the translation of the uORF. These distinct mechanisms of translational control have been shown to be important through in-vitro genetic (mutational analyses) and biochemical (toe-printing) assays.
There are two known pathways where uORFs can influence mRNA stability. Studies in yeast have indicated that both sequence-dependent and sequence-independent uORFs can cause mRNA destabilisation by the nonsense-mediated mRNA decay pathway. Mutations in the mRNA 5'-UTR that insert an uORF trigger the nonsense-mediated decay pathway and lead to decapping of the mRNA. Alternatively, mRNA destabilisation can occur via the termination dependent decay pathway. In this pathway, the 40S ribosomal units are released from the mRNA due to features such as stop codon environment (e.g., GC rich) or a short intercistronic sequence containing a secondary structure. Release of the 40S ribosomal units prevents reinitiation of translation downstream of the uORF, and the mRNA becomes susceptible to decay. The mechanisms underlying both uORF nonsense-mediated decay and post-termination mediated decay remain unclear.
Identifying uORFs involved in regulation of gene expression remains a challenge [16, 23, 24]. Recently it has been estimated that it would take 20 man-months to find a single functional uORF by random selection and testing of yeast mRNAs. To overcome this problem, Selpi et al. used an artificial intelligence approach called inductive logic programming to identify likely functional uORFs. The approach used rules based on background knowledge of uORFs in yeast mRNAs and as such may not be applicable to other organisms such as plants.
Another approach for identifying sequence-independent uORFs was recently described. Kochetov et al. selected human mRNAs with a specific sequence organisation (i.e., uORF overlapping the main ORF) that could facilitate reinitiation at downstream start codons. If the downstream start codons were nested in-frame with the main ORF then potentially N-terminally truncated variants of the main protein could be produced via reinitiation. Kochetov et al. reported that 297 out of 754 mRNAs (39% of the sub-sample) contained this specific sequence organisation with an average intercistronic spacer of 66 ± 77 nt, which provides sufficient space for reinitiation. This novel approach highlights another way in which uORFs can be functional via the generation of novel protein isoforms.
The number of characterised uORFs in plants is apparently fewer than 100 (0.3%) based on a PubMed search, and about four have been identified in cereals. They include the uORFs of the S-adenosylmethionine decarboxylase gene (AdoMetDC) in both monocots and dicots [9, 27, 28]; the rice myb7 gene; and transcription factors such as maize Opaque-2, maize R, and maize Lc. Also, uORFs have been found in dicot plant genes that include AtB2/AtbZIP11, ABI3, and CpbZIP2; and in the auxin response factor genes ETT and MP. The proportion of characterised uORFs in plants (<0.3%) is much lower than the estimated proportion of genes that contain uORFs, which can vary from 11% to 60%.
One strategy for identifying functional uORFs in plants is to use a comparative approach [12, 13, 37]. There are extensive assembled EST datasets for five important cereal crops and Arabidopsis. The cereals include rice (Oryza sativa L.), wheat (Triticum aestivum), barley (Hordeum vulgare), maize (Zea mays), and sorghum (Sorghum bicolor). Rice is the best characterised of these cereals with a sequenced genome and a cDNA database containing 32,000 clones that were enriched for 5' full-length sequences. Cereals such as wheat are unlikely to be fully sequenced in the near future because of their large genome size. Wheat has a hexaploid genome of 16000 Mb that is 37 times the size of the rice genome (430 Mb), and 5.5 times the size of the human genome (2900 Mb). A comparative approach is likely to identify sequence dependent uORFs where the encoded peptide of the uORF is involved in regulation of gene expression.
In this study, we used comparative R-nomics to identify conserved uORF motifs in cereals and Arabidopsis. We constructed a bioinformatics pipeline called uORFSCAN that performs a comparative analysis on the important agronomic crops rice, wheat, barley, maize, and sorghum; and the well studied dicot plant Arabidopsis. To account for the variable quality of assembled EST data, we have used orthologous sequence clustering, iterative sequence analysis, and manual curation. Our comparative method is easily transferable to uORF identification in other species.
KOME (Knowledge-based Oryza Molecular biological Encyclopedia) full-length rice cDNA sequences were obtained from ftp://cdna01.dna.affrc.go.jp/pub/data/CURRENT/INE_FULL_SEQUENCE_DB.zip. This file is dated Tuesday, 24 January 2006, and contains 32,127 full-length cDNA clones (originally 28,469). The TIGR plant gene indices database ftp://ftp.tigr.org/pub/data/tgi/ was used to obtain tentative contigs (TCs) from wheat (release 10.0, Jan 05, 580155 ESTs, 44954 TCs), barley (release 9.0, Sept 04, 370546 ESTs, 23176 TCs), maize (release 17.0, Nov 06, 695811 ESTs, 56687 TCs), and sorghum (release 8.0, Nov 05, 187282 ESTs, 20029 TCs). Data cleaning was performed on the TIGR dataset to select for sequences that are designated as tentative contigs (identifiers prefixed with "TC"), thereby excluding all singletons. All data files were imported and managed using Microsoft Access 2003. We also re-ran the analysis using the TIGR Plant Transcript Assemblies (last updated on October 17th, 2006) for wheat (840871 ESTs), barley (456410 ESTs), maize (1084701 ESTs), and sorghum (203575 ESTs) on the uORFSCAN pipeline, but did not find any additional conserved upstream open reading frames (uORFs).
The reciprocal best hit method (rbh) was adapted to account for alternative splice forms that are present in the KOME dataset and that would otherwise give many false negatives. The problem with alternative splice forms is that a shorter form will never have the highest score in the reverse BLAST, because a longer alternative splice form will always be listed higher on the hit list due to the way BLAST ranks hits (according to score and e-value). To account for alternative splice forms, we examined not only the top hit but also similar hits (percent identity to top hit: Δ -5%, similar length to top hit: +/- 20%) for symmetry with the top hit in the forward BLAST. If there was symmetry between the forward and reverse BLAST searches then we considered the reciprocal pair to be orthologous. General parameters for similarity searches were: program, tblastx; expect threshold, 1.0e-50; scoring matrix, BLOSUM62; gap costs, existence 11 and extension 1; filter and masking, off. Only sequence alignments with at least 70% sequence coverage were considered further. Similarity searches were performed at The South Australian Partnership for Advanced Computing (SAPAC) http://www.sapac.edu.au/.
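The symmetry test described above can be sketched in a few lines of Python. This is only an illustration of one reading of the method, not the code used in the pipeline: the `Hit` record, the function names, and the exact interpretation of the thresholds (identity no more than 5 percentage points below the top hit, length within 20% of the top hit's length) are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    subject: str      # identifier of the matched sequence
    identity: float   # percent identity of the alignment
    length: int       # length of the subject sequence in nucleotides

def near_top_hits(hits, max_identity_drop=5.0, length_tolerance=0.20):
    """Return the top hit plus any hits that resemble it (possible splice forms).

    `hits` is assumed to be sorted best-first, as in a BLAST hit list.
    """
    if not hits:
        return []
    top = hits[0]
    keep = [top]
    for h in hits[1:]:
        close_identity = h.identity >= top.identity - max_identity_drop
        close_length = abs(h.length - top.length) <= length_tolerance * top.length
        if close_identity and close_length:
            keep.append(h)
    return keep

def is_reciprocal_pair(query, forward_hits, reverse_hits_by_subject):
    """True if the forward top hit points back at `query`, either as its own
    top hit or as one of the hits that closely resemble that top hit."""
    if not forward_hits:
        return False
    top_subject = forward_hits[0].subject
    reverse_hits = reverse_hits_by_subject.get(top_subject, [])
    return any(h.subject == query for h in near_top_hits(reverse_hits))
```

With a strict reciprocal best hit test only the first entry of the reverse hit list would be accepted; here a shorter splice form ranked just below a longer one is still treated as symmetric.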
Verification of main ORF
The rice cDNA sequences containing conserved uORFs were used in a blastn search against the NCBI non-redundant database to identify uORFs predicted from ribosomal RNA genes, chloroplastic genes, and mitochondrial genes. These genes do not represent coding genes derived from the nuclear genome, and therefore have been removed from this study. Also, the main open reading frames predicted by uORFSCAN were used to search (blastn) the coding sequence (CDS) annotations from the TIGR rice pseudomolecules database http://www.tigr.org/tdb/e2k1/osa1/data_download.shtml. Alignments not starting from the beginning of the CDS were regarded as suspicious. As additional verification, the rice main open reading frame predictions were also compared with protein data from The UniProt Knowledgebase (UniProtKB) http://www.ebi.uniprot.org/database/download.shtml. Translations of the rice cDNA sequences in the same frame as the predicted main open reading frame, starting from the 5'-untranslated region to the end of the main open reading frame, were used to search (blastp) against UniProtKB. Alignments not beginning from the start of the protein sequence were discarded if they also did not have TIGR CDS support.
Statistical analysis of codon usage
The p-values were calculated according to the following formulas:
The probability to observe the number of times each codon was present in the uORFs (nobs) that was less than or equal to the expected (nav) by chance alone is:
The probability to observe the number of times each codon was present in the uORFs (nobs) that was greater than or equal to the expected (nav) by chance alone is:
nobs = The observed number of times a codon was present in the uORFs.
nav = The average number of times a codon was present in the uORFs based on the frequency of this codon in the mORF and the sample size (the observed number of codons for the set of codons for an amino acid in the uORFs).
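As a rough illustration of how two such tail probabilities can be computed from nobs and nav, the sketch below assumes that codon counts follow a Poisson distribution with mean nav. That distributional choice is an assumption made for this example only (a binomial model over the sample size would be an equally plausible reading of the definitions), and all function names are hypothetical.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson model (assumed here)."""
    if mean == 0:
        return 1.0 if k == 0 else 0.0
    # Work in log space so that large counts do not overflow.
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def expected_count(morf_frequency, sample_size):
    """n_av: codon frequency in the mORF times the number of uORF codons
    observed for the same amino acid."""
    return morf_frequency * sample_size

def p_at_most(n_obs, n_av):
    """P(X <= n_obs): chance of seeing the observed count or fewer."""
    return sum(poisson_pmf(k, n_av) for k in range(n_obs + 1))

def p_at_least(n_obs, n_av):
    """P(X >= n_obs): chance of seeing the observed count or more."""
    return 1.0 - sum(poisson_pmf(k, n_av) for k in range(n_obs))
```

For a codon that makes up a quarter of the mORF codons for its amino acid and a uORF sample of 40 such codons, `expected_count(0.25, 40)` gives the mean that is passed to either tail function together with the observed count.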
The uORFSCAN pipeline for discovering uORFs
The uORFSCAN pipeline used rice full-length cDNAs and wheat, barley, maize, and sorghum assembled EST data for comparative analysis (Figure 1). In the first step of the pipeline, we identified rice genes that had orthologues in wheat, barley, maize, and sorghum. The use of orthologous sequences allowed us to more accurately predict the main coding region and define the 5'-UTR that is necessary to identify conserved uORFs.
The reciprocal best hit (rbh) method was used to find true orthologues by a process of eliminating paralogues [43, 44]. The principle of rbh is that a pair of sequences are orthologues if they are each other's best hit. We modified the rbh method to find orthologues of rice genes in wheat, barley, maize, and sorghum such that it allowed us to keep alternative splice forms while at the same time eliminating paralogues. Alternative splice forms of a gene were distinguished by changes in gene length while still maintaining high sequence identity. We found that the modified reciprocal best hit method eliminated 70–75% of paralogue sequences. For example, in the one-directional BLAST against the barley assembled EST database 19,655 sequences were identified; however, this number was reduced to 5,115 (26%) sequences when the reciprocal best hit method was used (Figure 1, Step 1).
Only 1723 of the rice genes had conserved orthologues in the other four cereals (wheat, barley, maize, and sorghum), most likely because none of the assembled EST datasets contained the entire transcriptome. To account for missing or erroneous sequences, we grouped the orthologues into three datasets for 5'-UTR analysis (Figure 1, Step 2). The datasets included rice genes that had orthologues in four other cereals (5 out of 5 dataset), in three other cereals (4 out of 5 dataset), and in two other cereals (3 out of 5 dataset).
For Step 3 (Figure 1), we developed a program called uORFSCAN (see Additional file 9) to find conserved uORFs. uORFSCAN takes as input a FASTA file containing the rice cDNA sequence and its orthologues, and identifies for each of these sequences all the possible open reading frames (ORFs). In the first iteration, the longest conserved ORF was designated as the main coding region. However, the longest ORF is not always the main coding region when there are other ORFs of similar length. Therefore, a comparative approach was used to identify the main coding region (Figure 1, Step 3.1). This involved finding the longest ORF that was present in all orthologous sequences, and then iteratively reducing the number of orthologous sequences, one at a time, to determine if a longer conserved set of ORFs could be found, and finally terminating when there was no improvement. The longest ORF in at least three out of five cereals was considered the main coding region. In Step 3.2 (Figure 1), uORFSCAN attempts to align rice uORFs with similar length orthologous uORFs (+/- 5%) at the protein level using ClustalW (see Additional file 8). Finally, uORFSCAN analysed each alignment file to determine the average conservation of the uORFs, and grouped the alignments based on the number of conserved orthologous uORFs found. For example, using the 4 out of 5 dataset generated the 4 out of 4 and the 3 out of 4 datasets (Figure 1). We report uORFs from orthologous genes that shared sequence similarity because of our interest in finding functional uORFs.
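The single-sequence part of this step (enumerate every ORF, take the longest as a first guess at the main coding region, and treat ORFs that start upstream of it as candidate uORFs) can be sketched as below. This is a simplified illustration and not the uORFSCAN source: the function names are hypothetical, the comparative refinement across orthologues is omitted, and the input is assumed to be a DNA string in the sense orientation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq):
    """Return (start, end) for every ATG...stop ORF on the forward strand.

    Coordinates are 0-based, `end` is exclusive and includes the stop codon,
    so the shortest possible ORF (ATG followed by a stop) is 6 nt long.
    """
    seq = seq.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3))
                break
    return orfs

def split_main_and_upstream(seq):
    """First-pass guess: longest ORF is the main ORF, earlier starts are uORFs."""
    orfs = find_orfs(seq)
    if not orfs:
        return None, []
    main = max(orfs, key=lambda orf: orf[1] - orf[0])
    uorfs = [orf for orf in orfs if orf[0] < main[0]]
    return main, uorfs
```

As the text notes, the longest ORF in a single sequence is not always the true coding region, which is why the pipeline goes on to compare ORFs across the orthologous sequences before defining the 5'-UTR.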
The final step (Figure 1, Step 4) was manual curation to verify the predicted rice main coding region of each gene by comparing it with the genome annotation and other protein data. This was necessary, as uORFSCAN is expected to be sensitive to inaccurate (e.g., frame-shifts) and/or incomplete sequence data. For example, rice full length sequences can be incomplete because of failure of the 5' capping method. If the coding region is truncated, an internal methionine can be selected as the start codon, so that the derived 5'-UTR is actually coding sequence, which is often highly conserved and can lead to false positive predictions.
Conserved upstream open reading frames appear to be rare
The uORFSCAN pipeline identified nine cDNAs containing uORFs that were conserved in the five out of five (5/5) dataset containing orthologous sequences in all 5 cereals (Table 1). Three of these cDNAs encoded multiple uORFs, one of the cDNAs being AdoMetDC, which has previously been reported to contain two uORFs. We manually curated all nine cDNAs and showed that they were all reliable based on our validation criteria (Table 2), which included the removal of the uORFs predicted from ribosomal RNA genes (data not shown). The cDNAs included the multiple uORFs in S-adenosylmethionine decarboxylase cDNA, alkaline phytoceramidase cDNA, and calcineurin B-like (CBL)-interacting protein kinase cDNA; and a single conserved uORF in a cDNA encoding an oxidoreductase protein, ribosomal protein S6 kinase, trehalose-6-phosphate phosphatase, ubiquitin-fold protein, F9L1.29 protein, and an ankyrin-3 protein.
To account for variable quality in assembled EST data, we also looked for cases where the uORFs were conserved in only a subset of the orthologues, for example in four out of five (4/5) or three out of five (3/5) orthologues, giving the 4/5, 3/5, 4/4, 3/4, and 3/3 result sets (see Additional files 1, 2, 3, 4, 5; Figure 1, Step 3.2). In brief, the 4/5 result set contains 16 rice genes with a total of 20 conserved uORFs in orthologous cereal genes, the 3/5 result set contains 44 rice genes with a total of 79 conserved uORFs in orthologous cereal genes, the 4/4 result set contains 16 rice genes with a total of 23 conserved uORFs in orthologous genes, the 3/4 result set contains 113 rice genes with a total of 129 conserved uORFs in orthologous genes, and finally the 3/3 result set contains 65 genes with a total of 93 conserved uORFs in orthologous genes.
In order to identify sequence dependent uORFs, we extended our search for cereal uORFs that might also be conserved in the dicot Arabidopsis by using the rice cDNAs that contained conserved uORFs in four other cereals (5/5 result set) and the Arabidopsis TAIR7 cDNA dataset (see Methods). The uORFSCAN pipeline identified 13 rice cDNAs containing uORFs that were conserved in Arabidopsis (Table 3). Four of these cDNAs encoded multiple uORFs. Of the 13 cDNAs with uORFs, only 11 were verified as reliable based on manual curation (Table 4) that removed the uORFs predicted from a cDNA encoding a helicase. Manual curation of the helicase cDNA revealed that the genome and protein annotation for the coding region extended further upstream than predicted by uORFSCAN, highlighting the limitations of using assembled EST data, where frame-shift errors were the likely reason for the false positive prediction. The reliable predictions included the multiple uORFs found in a cDNA encoding ww domain containing protein, trehalose-6-phosphate phosphatase, GAMYB-binding protein, and ankyrin-3. The latter three cDNAs contained a combination of uORFs that were conserved between the cereals (rice and at least two other cereals) and Arabidopsis, and uORFs conserved between rice and Arabidopsis (Table 3). uORFSCAN also identified seven rice cDNAs containing a single uORF that was conserved in Arabidopsis and in almost all cases (except the cDNA encoding an auxilin-like protein) the cereals as well (Table 3). They included the uORFs found in a cDNA encoding phosphatase 2a protein, homeodomain containing protein, S-Adenosylmethionine decarboxylase, auxilin-like protein, CBL-interacting protein kinase, protein kinase ATN1, and a hypothetical protein.
Position and occupation of uORFs in 5'-UTRs
Studies have shown that the position of an uORF within its 5'-UTR, which determines the pre-orf and intercistronic distances, can have profound effects on its function [18, 22]. When we examined the positions of cereal uORFs within their 5'-UTRs we found that there was no positional preference, with the exception that they were not positioned too closely to the start of their individual 5'-UTR and coding region (Figure 2). For example, all of the uORFs conserved in five orthologous cereals (5/5 result set) and in Arabidopsis were positioned at least 20 nucleotides from the start of their 5'-UTR, which is thought to be the minimum number of nucleotides required for a functional uORF. We noticed that the intercistronic distances for these uORFs were generally shorter than the pre-orf distance. We also found seven uORFs, which included the functional small AdoMetDC uORF, that occupied greater than 20% of their individual 5'-UTR.
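The three quantities discussed in this section can be computed directly from uORF and main ORF coordinates. The sketch below assumes 0-based coordinates counted from the first nucleotide of the transcript, an exclusive uORF end that includes the stop codon, and a uORF that lies entirely within the 5'-UTR; the function name and the coordinate convention are illustrative choices rather than part of the pipeline.

```python
def uorf_layout(uorf_start, uorf_end, main_start):
    """Describe where a uORF sits in its 5'-UTR.

    uorf_start : index of the A of the uORF start codon
    uorf_end   : index just past the uORF stop codon
    main_start : index of the A of the main ORF start codon,
                 which equals the length of the 5'-UTR
    """
    pre_orf_distance = uorf_start                  # 5' end of UTR to uORF start
    intercistronic_distance = main_start - uorf_end  # uORF stop to main start
    occupancy = (uorf_end - uorf_start) / main_start  # fraction of UTR covered
    return pre_orf_distance, intercistronic_distance, occupancy
```

For example, a 30 nt uORF starting 30 nt into a 100 nt 5'-UTR has a pre-orf distance of 30 nt, an intercistronic distance of 40 nt, and occupies 30% of the UTR.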
The length distribution of uORFs
Since earlier reports showed that plant uORFs can vary in length from 6 to 156 nucleotides [7, 9, 29–31, 46], we examined the length distribution of the cereal uORFs. The distribution has two peaks, one between 1 and 40 nucleotides and one between 81 and 120 nucleotides (Figure 3). The uORFs found in the first peak are tiny, with 9 (out of 14) uORFs having a length of nine nucleotides. Some of these tiny uORFs could be artefactual as a result of point mutations that insert an in-frame start and/or stop codon in the 5'-UTR. If these artefactual uORFs were removed then the uORF length distribution would move towards a normal distribution. Seventy-six percent of the uORFs in the length distribution are shorter than 100 nucleotides, and 48% are shorter than 40 nucleotides. The shortest conserved uORF found in four independent cDNAs was nine nucleotides, even though the cut-off length used by uORFSCAN to identify uORFs was six nucleotides (a start and a stop codon). The nine-nucleotide uORFs included the 5' tiny uORF found in the S-adenosylmethionine decarboxylase cDNA and three new uORFs: two found in a cDNA encoding alkaline phytoceramidase and one in a cDNA encoding oxidoreductase (Table 1). Two long conserved uORFs (>181 nucleotides) were found in cDNAs encoding protein kinases: one uORF in a cDNA encoding a CBL-interacting protein kinase and another in a cDNA encoding a ribosomal protein S6 kinase.
Sequence conservation in uORFs
The level of amino acid sequence conservation in cereal uORFs was generally high, as expected, based on our approach of reporting similar length orthologous uORFs that shared sequence similarity. For example, in the 5 out of 5 dataset the median value is 50% sequence similarity. When the two main datasets were included (uORFs conserved in all five cereals and uORFs conserved between rice and Arabidopsis), the median value is 36% sequence similarity. The uORFs conserved between the cereals (rice and at least two others) and Arabidopsis (median value of 36% sequence similarity) generally had a higher amino acid sequence similarity than those uORFs conserved between rice and Arabidopsis (median value of 28% sequence similarity). Given that the uORFs from orthologous genes were selected to be within a given length interval for alignment purposes, the high amino acid sequence similarity may suggest that these uORFs have a functional role (e.g., ribosomal stalling) that is mediated by the encoded uORF peptide.
Start codon context and codon usage of uORFs
The presence of uORFs does not mean that they will be translated. The sequence context of some plant uORFs has been shown to be sub-optimal for efficient initiation [47, 48]. We therefore examined the sequence context of our cereal uORF AUG codons to see if there was any sequence conservation that may aid ribosome initiation. We found no informative positions in the uORF consensus sequence context (see Additional file 6 – Tran_FigureS1.eps), because the observed number of positions that showed sequence conservation was not greater than expected by chance alone. However, when the context of the AUGs demarcating the conserved uORFs was compared with the context of the AUG at the main ORF, it was evident that the main ORF generally had a better sequence context, denoted by a purine in the -3 position and a guanine in the +4 position (Table 5).
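The two context positions mentioned here can be checked with a small helper. The sketch assumes the usual numbering in which the A of the AUG is +1 and the base immediately upstream is -1 (there is no position 0); the function name and return format are hypothetical and serve only to illustrate the check.

```python
def start_context(seq, atg_index):
    """Report whether a start codon has a purine at -3 and a G at +4.

    seq       : transcript sequence (DNA or RNA letters)
    atg_index : 0-based index of the A of the start codon
    """
    seq = seq.upper().replace("U", "T")
    # Position -3 is three bases upstream of the A; +4 follows the G of ATG.
    minus3 = seq[atg_index - 3] if atg_index >= 3 else None
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else None
    return {
        "purine_at_minus3": minus3 in ("A", "G"),
        "g_at_plus4": plus4 == "G",
    }
```

A start codon too close to the 5' end to have a -3 position is simply reported as lacking the purine, which is a simplification made for the example.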
Recent work showed that ribosome stalling could occur at rare codons [49–53]. We therefore examined the codon usage of the identified uORFs to determine if they contained an increased number of rare codons. We showed that the frequencies of some codons had a p-value less than 0.05 in the rice uORF codon usage compared to the rice main coding region, based on a significant deviation of observed from expected numbers of uORF codons (Equations 1 and 2); however, the number of codons that had these p-values was not greater than expected by chance (see Additional file 7 – Tran_FigureS2.eps).
Discussion and conclusion
Conserved uORFs are rare
This study provides a method to identify conserved uORFs from large assembled EST datasets. We developed a pipeline that used a modified reciprocal best hit method to identify putative orthologous sequences that were then analysed by a comparative R-nomics program called uORFSCAN to find conserved uORFs. We showed that this pipeline was successful in identifying 29 rice uORFs that are conserved at the amino acid level (median value of 36% sequence similarity) in wheat, barley, maize, sorghum, and in some cases (33%) Arabidopsis.
The number of conserved uORFs that share sequence similarity in the transcriptome of cereals appears to be low. This is consistent with reports of conserved uORFs in distantly related plants (i.e., rice and Arabidopsis) and in Drosophila melanogaster. One explanation is that genes controlled at the level of translation by uORFs have low levels of transcription and therefore are under-represented in cDNA and assembled EST databases. Another explanation for the low numbers of conserved cereal uORFs is that the uORFs have evolved in both length and sequence such that they no longer share sequence similarity across minor taxonomic groups (i.e., within the cereals) (Tables 2 and 4). Furthermore, if the codon usage of cereal uORFs rather than the uORF-encoded peptide were a major controlling mechanism then amino acid sequence may not be conserved.
Cereal uORFs conserved in Arabidopsis
It has been shown that the amino acid sequence of uORFs in monocot and dicot plants can be similar. Sequence similarity was observed at the amino acid level across the major taxonomic groups (e.g., Arabidopsis and rice) (Table 3). We identified 11 rice genes that contained uORFs conserved in Arabidopsis, of which nine were also conserved in additional cereals (at least two others). For example, a rice cDNA encoding Ankyrin-3 contains an uORF that is conserved in the cereals and Arabidopsis, but it contains a nested uORF that appears to be conserved only in rice and Arabidopsis. Therefore, it is likely that after the split between the two major groups of angiosperms (monocots and dicots), the rice gene has gained an additional in-frame and internal start codon that is not present in the other cereals, making a nested uORF that is shorter by 33 nucleotides. It would be of interest to determine if the nested uORF is still functional.
Conservation of uORF sequence within the cereals might simply reflect a relatively recent ancestor, rather than conservation of function; therefore, it is difficult to predict whether these uORFs are likely to be sequence dependent or sequence independent uORFs [18, 19]. However, conservation of uORFs across both monocots and dicots suggests that these uORFs act in a sequence dependent manner. Indeed, six rice uORFs (out of 15, excluding nested uORFs, Table 3) that were conserved in Arabidopsis had a notable amino acid composition that was rich in serine or arginine (at least 20%). It has been suggested that uORF peptides that are rich in serine could either promote or inhibit ribosomal stalling through their phosphorylation [12, 54], while arginine rich motifs can be involved in RNA binding. Interestingly, of these six rice uORFs two (AK101100 and AK067412) are found in genes involved in phosphorylation, a function that appears to be over-represented in this dataset (Table 3). We hypothesize that the main protein of these genes could have dual functions: a primary function as a trans-acting factor in an unknown signalling cascade, and a secondary function as a regulator of mORF expression, whereby the mORF protein phosphorylates the serine-rich uORF peptides, resulting in a conformational change that allows the uORF peptides to bind and stall ribosomes.
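A composition check of the kind described above (at least 20% serine or arginine) can be expressed as follows. The function names are hypothetical, the peptide is assumed to be given in one-letter amino acid code without the stop symbol, and counting the initiator methionine in the denominator is an assumption of this sketch rather than a detail taken from the analysis.

```python
def residue_fraction(peptide, residue):
    """Fraction of positions in `peptide` occupied by `residue`."""
    peptide = peptide.upper()
    if not peptide:
        return 0.0
    return peptide.count(residue.upper()) / len(peptide)

def is_ser_or_arg_rich(peptide, threshold=0.20):
    """True if serine or arginine alone reaches the threshold fraction."""
    return (residue_fraction(peptide, "S") >= threshold
            or residue_fraction(peptide, "R") >= threshold)
```

Applied to each uORF-encoded peptide in a result set, such a helper would flag candidates for the serine-rich or arginine-rich class discussed here.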
There are uORFs previously identified in Arabidopsis that were not identified in this study. For example, the Arabidopsis auxin response factor (ARF) genes ETTIN (ETT) and MONOPTEROS (MP) contain uORFs and, while orthologues of these genes were found in the rice, sorghum and wheat assembled EST datasets, the uORFs showed no sequence similarity (by ClustalW) and were of different lengths (data not shown). Similarly, uORFs found in the Arabidopsis genes AtMHX and AtNMT1, encoding a tonoplast transporter and a phosphoethanolamine N-methyltransferase respectively, were not identified because the uORFs were not conserved in rice and at least two other cereals. Finally, the gene containing the uORF in Arabidopsis sac51, encoding a bHLH-type transcription factor, could not be identified in our rice dataset as we could not identify a clear orthologue. Therefore, it will be of interest to monitor new rice full-length cDNAs and high quality sequences for cereals as they become available to see if more conserved uORFs can be found.
Recently, a pair-wise comparative approach was used to identify conserved uORFs within homology groups that also included paralogs and ohnologs (homologous genes arising by whole-genome duplication) using rice and Arabidopsis full-length cDNAs. Compared to the 11 genes we identified, Hayden and Jorgensen reported that 19 genes contained conserved uORFs between rice and Arabidopsis. Interestingly, only four genes (S-Adenosylmethionine decarboxylase, Trehalose-6-phosphate phosphatase, Auxilin-like protein, and Ankyrin-3) were in common, highlighting the benefits of complementary search methods. For example, we used the modified reciprocal best hit method to find putative orthologues. It is likely that some of the homologue groups identified by Hayden and Jorgensen may not be true orthologues. For example, the members of homologue group 12 identified by Hayden and Jorgensen were not reciprocal best hit pairs according to our analysis, and therefore are likely to be paralogues. Our approach is deliberately conservative, eliminating paralogues, to maximise the finding of all conserved uORFs independent of their length.
One possible criticism of our approach is that we have included uORFs as short as 9 nt. However, there are two independent reports that showed that the tiny uORF of SAMDC is functional [27, 59], although there is controversy regarding the type of effect and conditions under which the tiny uORF of SAMDC exerts its effect on downstream translation. Therefore, there is insufficient data to conclude one way or the other, and as such we have elected to be conservative. This has allowed us to find several genes (e.g. protein phosphatase 2a, a protein containing a ww domain, and GAMYB-binding protein) that were not found by Hayden and Jorgensen's 'uORF-Finder' program because it only detected conserved uORFs greater than 63 codons.
Better quality assembled EST data is needed
One unavoidable limitation of using incomplete assembled EST data for orthology determination is that orthologues could be falsely assigned in situations where sequences have multiple protein domains. This will increase the number of putative orthologues identified prior to the prediction of uORFs, which is not necessarily harmful as these predictions are manually curated. However, to minimise this problem, we used a sequence coverage cutoff of at least 70% of any of the protein sequences in the alignment (see Methods). We also grouped the orthologues into several datasets representing the number of orthologues that could be found for each gene. For example, the datasets included rice genes that had orthologues in four other cereals (5 out of 5 dataset), in three other cereals (4 out of 5 dataset), and in two other cereals (3 out of 5 dataset). This grouping of orthologues will also help minimise the effects of missing, incomplete, or erroneous assembled EST data.
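The coverage cutoff and the grouping into datasets can be sketched as follows. This is an illustration under stated assumptions and not the pipeline used here: coverage is taken as aligned length divided by protein length, the cutoff is treated as met when at least one sequence in the alignment reaches 70%, and the gene and species labels are invented.

```python
from collections import defaultdict


def passes_coverage(aligned_len, protein_lengths, cutoff=0.70):
    """True if the alignment covers at least `cutoff` of any one protein (assumed reading)."""
    return any(aligned_len / length >= cutoff for length in protein_lengths)


def group_by_cluster_size(orthologues, n_species=5, min_size=3):
    """Group rice genes under 'k out of n' labels by how many cereals are represented."""
    datasets = defaultdict(list)
    for rice_gene, others in orthologues.items():
        size = 1 + len(set(others))  # rice itself plus distinct other cereals
        if size >= min_size:
            datasets[f"{size} out of {n_species}"].append(rice_gene)
    return dict(datasets)


# Hypothetical rice genes mapped to the other cereals with a putative orthologue.
orthologues = {
    "riceA": ["sp1", "sp2", "sp3", "sp4"],
    "riceB": ["sp1", "sp4"],
    "riceC": ["sp2"],
}
datasets = group_by_cluster_size(orthologues)
print(datasets)
```

Here riceC falls below the minimum cluster size of three and is excluded, which mirrors the trade-off between sensitivity and comparative power discussed in this section.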
There are reports of conserved uORFs with high sequence similarity between monocots and dicots that were not found by our pipeline, due either to a lack of sequence conservation or to limitations in the assembled ESTs currently available. For example, the uORF found in the basic region leucine zipper (bZIP)-type transcription factor AtB2/AtbZIP11 was found to be conserved in rice and barley, but not in the other cereals included in this study because the sequences are not represented in the other datasets. Current limitations include incomplete data (i.e., not all sequences are represented) and poor quality sequence data, leading to frame-shifts and incorrect prediction of uORFs. Therefore, it would be possible to obtain higher numbers of conserved uORFs if the cluster size were relaxed to two out of five, but this approach would reduce the power of comparative R-nomics, and would require significant manual curation.
Sequence-dependent and sequence-independent uORFs
The cereal uORFs identified here are likely to encode bioactive peptides as selection has occurred at the peptide level. Those cereal uORFs that showed sequence conservation at the amino acid level with Arabidopsis are likely to be classified as sequence-dependent, as the encoded uORF peptide has remained conserved across the angiosperms, suggesting the peptide is directly involved in translational control or has some other biological activity [12–15]. Some identified uORFs were conserved only within the cereals, indicating a relatively recent origin or selective loss of the uORFs in Arabidopsis. We cannot rule out the possibility that some conserved cereal uORFs could also act in a sequence-independent manner, as a recent paper reported a conserved uORF in human and mouse ribosomal protein S6 kinase genes (a finding matched by our analysis in cereals, Table 1), and suggested that the uORF translational control of the main ORF was through reinitiation. Experiments are needed to confirm the biological activity of the uORF in the ribosomal protein S6 kinase gene.
The sequence context surrounding a uORF start codon (ignoring secondary structure) does not appear to play a major role in the recognition of the uORF and the initiation of translation there. We hypothesize that this sub-optimal uORF sequence context (compared to the optimal Kozak consensus sequence for the main coding region) would allow for leaky scanning [48, 60] of the uORF, and preferential initiation at the downstream main coding region. An optimal uORF sequence context would provide rigid control in the translational regulation of the main coding region, as initiation would predominantly start at the uORF, resulting in reduced availability of initiation factors, such as eIF2, for re-initiation at the downstream main open reading frame.
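The notion of an optimal versus sub-optimal start codon context can be illustrated with a small sketch. It applies the commonly used simplification of the Kozak rule, in which a purine at position -3 and a G at position +4 (with the A of the start codon as +1) mark a strong context. It is not the WebLogo-based analysis performed in this study, plant consensus sequences may differ from the vertebrate one, and the function name and example sequences are hypothetical.

```python
def kozak_strength(seq, atg_index):
    """Classify an ATG's context from the -3 and +4 positions (A of ATG = +1)."""
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    # Positions outside the sequence are treated as non-matching.
    minus3 = seq[atg_index - 3] if atg_index >= 3 else ""
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else ""
    matches = (minus3 in ("A", "G")) + (plus4 == "G")
    return ("weak", "adequate", "strong")[matches]


print(kozak_strength("GCCACCATGG", 6))  # purine at -3 and G at +4
print(kozak_strength("TTTTTTATGC", 6))  # neither position matches
```

Under the leaky scanning hypothesis described above, a uORF start codon in a weak context would be bypassed more often, leaving more ribosomes available to initiate at the main coding region.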
Sequence-independent uORFs allow for low-level translation of the downstream main coding region. Low-level translation is possible, as sequence-independent uORFs do not cause ribosomal stalling as seen in sequence-dependent uORFs. The regulatory mechanism of the sequence-independent uORF involves other factors (uORF recognition, length, stop codon environment, and the downstream intercistronic sequence) that influence reinitiation efficiency [18, 19], and more recently leaky scanning, to regulate downstream translation. We analysed the codon usage of conserved uORFs and found no preferential usage of rare codons in the uORFs. Therefore, it is unlikely that uORF codon usage contributes to low-level translation in the way reported for certain rare codons in Xenopus laevis and Escherichia coli, which can reduce translation.
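A codon usage comparison of this kind starts from relative codon frequencies. The sketch below shows one simple way to compute them. It is not the tool used for the analysis reported here, and the function name and toy sequences are hypothetical. Sequences are assumed to be in frame, and any trailing partial codon is ignored.

```python
from collections import Counter


def codon_frequencies(orfs):
    """Relative frequency of each codon across a set of in-frame ORF sequences."""
    counts = Counter()
    for orf in orfs:
        orf = orf.upper()
        usable = len(orf) - len(orf) % 3  # drop any trailing partial codon
        for i in range(0, usable, 3):
            counts[orf[i:i + 3]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


# Invented toy uORF sequences: seven codons in total.
uorf_freq = codon_frequencies(["ATGGCTTAG", "ATGGCGGCTTGA"])
print(uorf_freq["GCT"])
```

Frequencies computed this way for uORFs and for main coding regions could then be compared codon by codon, with an appropriate statistical test, to ask whether rare codons are over-represented.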
In conclusion, this study showed that the uORFSCAN pipeline is a useful tool for identifying conserved uORFs in closely related species. This pipeline has allowed us to identify 29 conserved uORFs in cereals. Possibly more conserved uORFs will be identified once the cDNA and assembled EST datasets become more comprehensive. These conserved rice uORFs will be useful for future functional analyses that should provide some perspective into downstream translational regulation by uORFs.
5'-UTR: five prime untranslated region
uORF: upstream open reading frame
EST: expressed sequence tag
RBH: reciprocal best hit
Clote P: RNALOSS: a web server for RNA locally optimal secondary structures. Nucleic Acids Res. 2005, 33 (Web Server issue): W600-4. 10.1093/nar/gki382.
Le Roch KG, Johnson JR, Florens L, Zhou Y, Santrosyan A, Grainger M, Yan SF, Williamson KC, Holder AA, Carucci DJ, Yates JR, Winzeler EA: Global analysis of transcript and protein levels across the Plasmodium falciparum life cycle. Genome Res. 2004, 14 (11): 2308-2318. 10.1101/gr.2523904.
Mignone F, Gissi C, Liuni S, Pesole G: Untranslated regions of mRNAs. Genome Biol. 2002, 3 (3): reviews0004.1–reviews0004.10. 10.1186/gb-2002-3-3-reviews0004.
Holcik M, Sonenberg N, Korneluk RG: Internal ribosome initiation of translation and the control of cell death. Trends Genet. 2000, 16: 469-473. 10.1016/S0168-9525(00)02106-5.
Kozak M: An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs. Nucleic Acids Res. 1987, 15 (20): 8125-8148. 10.1093/nar/15.20.8125.
Raney A, Baron AC, Mize GJ, Law GL, Morris DR: In vitro translation of the upstream open reading frame in the mammalian mRNA encoding S-adenosylmethionine decarboxylase. J Biol Chem. 2000, 275 (32): 24444-24450. 10.1074/jbc.M003364200.
Wang L, Wessler SR: Role of mRNA secondary structure in translational repression of the maize transcriptional activator Lc(1,2). Plant Physiol. 2001, 125 (3): 1380-1387. 10.1104/pp.125.3.1380.
Yaman I, Fernandez J, Liu H, Caprara M, Komar AA, Koromilas AE, Zhou L, Snider MD, Scheuner D, Kaufman RJ, Hatzoglou M: The zipper model of translational control: a small upstream ORF is the switch that controls structural remodeling of an mRNA leader. Cell. 2003, 113 (4): 519-531. 10.1016/S0092-8674(03)00345-3.
Franceschetti M, Hanfrey C, Scaramagli S, Torrigiani P, Bagni N, Burtin D, Michael AJ: Characterization of monocot and dicot plant S-adenosyl-l-methionine decarboxylase gene families including identification in the mRNA of a highly conserved pair of upstream overlapping open reading frames. Biochem J. 2001, 353 (Pt 2): 403-409. 10.1042/0264-6021:3530403.
Jin X, Turcott E, Englehardt S, Mize GJ, Morris DR: The two upstream open reading frames of oncogene mdm2 have different translational regulatory properties. J Biol Chem. 2003, 278 (28): 25716-25721. 10.1074/jbc.M300316200.
Lovett PS, Rogers EJ: Ribosome regulation by the nascent peptide. Microbiol Rev. 1996, 60 (2): 366-385.
Hayden CA, Jorgensen RA: Identification of novel conserved peptide uORF homology groups in Arabidopsis and rice reveals ancient eukaryotic origin of select groups and preferential association with transcription factor-encoding genes. BMC Biol. 2007, 5: 32. 10.1186/1741-7007-5-32.
Crowe ML, Wang XQ, Rothnagel JA: Evidence for conservation and selection of upstream open reading frames suggests probable encoding of bioactive peptides. BMC Genomics. 2006, 7: 16. 10.1186/1471-2164-7-16.
Hayden CA, Bosco G: Comparative genomic analysis of novel conserved peptide upstream open reading frames in Drosophila melanogaster and other dipteran species. BMC Genomics. 2008, 9: 61. 10.1186/1471-2164-9-61.
Iacono M, Mignone F, Pesole G: uAUG and uORFs in human and rodent 5'untranslated mRNAs. Gene. 2005, 349: 97-105. 10.1016/j.gene.2004.11.041.
Gaba A, Wang Z, Krishnamoorthy T, Hinnebusch AG, Sachs MS: Physical evidence for distinct mechanisms of translational control by upstream open reading frames. EMBO J. 2001, 20 (22): 6453-6463. 10.1093/emboj/20.22.6453.
Luo Z, Sachs MS: Role of an upstream open reading frame in mediating arginine-specific translational control in Neurospora crassa. J Bacteriol. 1996, 178 (8): 2172-2177.
Vilela C, McCarthy JE: Regulation of fungal gene expression via short open reading frames in the mRNA 5'untranslated region. Mol Microbiol. 2003, 49 (4): 859-867. 10.1046/j.1365-2958.2003.03622.x.
Meijer HA, Thomas AA: Control of eukaryotic protein synthesis by upstream open reading frames in the 5'-untranslated region of an mRNA. Biochem J. 2002, 367 (Pt 1): 1-11. 10.1042/BJ20011706.
Hemmings-Mieszczak M, Hohn T, Preiss T: Termination and peptide release at the upstream open reading frame are required for downstream translation on synthetic shunt-competent mRNA leaders. Mol Cell Biol. 2000, 20 (17): 6212-6223. 10.1128/MCB.20.17.6212-6223.2000.
Ruiz-Echevarria MJ, Peltz SW: The RNA binding protein Pub1 modulates the stability of transcripts containing upstream open reading frames. Cell. 2000, 101 (7): 741-751. 10.1016/S0092-8674(00)80886-7.
Vilela C, Ramirez CV, Linz B, Rodrigues-Pousada C, McCarthy JE: Post-termination ribosome interactions with the 5'UTR modulate yeast mRNA stability. EMBO J. 1999, 18 (11): 3139-3152. 10.1093/emboj/18.11.3139.
Wu C, Amrani N, Jacobson A, Sachs MS: The use of fungal in vitro systems for studying translational regulation. Methods Enzymol. 2007, 429: 203-225. 10.1016/S0076-6879(07)29010-X.
Spevak CC, Park EH, Geballe AP, Pelletier J, Sachs MS: her-2 upstream open reading frame effects on the use of downstream initiation codons. Biochem Biophys Res Commun. 2006, 350 (4): 834-841. 10.1016/j.bbrc.2006.09.128.
Selpi, Bryant CH, Kemp GJL, Cvijovic M: A first step towards learning which uORFs regulate gene expression. J Int Bioinformatics. 2006, 3 (2): 31-44.
Kochetov AV, Ahmad S, Ivanisenko V, Volkova OA, Kolchanov NA, Sarai A: uORFs, reinitiation and alternative translation start sites in human mRNAs. FEBS Lett. 2008, 582 (9): 1293-1297. 10.1016/j.febslet.2008.03.014.
Hu WW, Gong H, Pua EC: The pivotal roles of the plant S-adenosylmethionine decarboxylase 5' untranslated leader sequence in regulation of gene expression at the transcriptional and posttranscriptional levels. Plant Physiol. 2005, 138 (1): 276-286. 10.1104/pp.104.056770.
Tassoni A, Franceschetti M, Tasco G, Casadio R, Bagni N: Cloning, functional identification and structural modelling of Vitis vinifera S-adenosylmethionine decarboxylase. J Plant Physiol. 2007, 164 (9): 1208-1219. 10.1016/j.jplph.2006.07.009.
Locatelli F, Magnani E, Vighi C, Lanzanova C, Coraggio I: Inhibitory effect of myb7 uORF on downstream gene expression in homologous (rice) and heterologous (tobacco) systems. Plant Mol Biol. 2002, 48 (3): 309-318. 10.1023/A:1013340004348.
Lohmer S, Maddaloni M, Motto M, Salamini F, Thompson RD: Translation of the mRNA of the maize transcriptional activator Opaque-2 is inhibited by upstream open reading frames present in the leader sequence. Plant Cell. 1993, 5 (1): 65-73. 10.1105/tpc.5.1.65.
Wang L, Wessler SR: Inefficient reinitiation is responsible for upstream open reading frame-mediated translational repression of the maize R gene. Plant Cell. 1998, 10 (10): 1733-1746. 10.1105/tpc.10.10.1733.
Wiese A, Elzinga N, Wobbes B, Smeekens S: A conserved upstream open reading frame mediates sucrose-induced repression of translation. Plant Cell. 2004, 16 (7): 1717-1729. 10.1105/tpc.019349.
Ng DW, Chandrasekharan MB, Hall TC: The 5' UTR negatively regulates quantitative and spatial expression from the ABI3 promoter. Plant Mol Biol. 2004, 54 (1): 25-38. 10.1023/B:PLAN.0000028767.06820.34.
Ditzer A, Bartels D: Identification of a dehydration and ABA-responsive promoter regulon and isolation of corresponding DNA binding proteins for the group 4 LEA gene CpC2 from C. plantagineum. Plant Mol Biol. 2006, 61 (4-5): 643-663. 10.1007/s11103-006-0038-3.
Nishimura T, Wada T, Yamamoto KT, Okada K: The Arabidopsis STV1 protein, responsible for translation reinitiation, is required for auxin-mediated gynoecium patterning. Plant Cell. 2005, 17 (11): 2940-2953. 10.1105/tpc.105.036533.
Pesole G, Gissi C, Grillo G, Licciulli F, Liuni S, Saccone C: Analysis of oligonucleotide AUG start codon context in eukaryotic mRNAs. Gene. 2000, 261 (1): 85-91. 10.1016/S0378-1119(00)00471-6.
Pavesi G, Zambelli F, Pesole G: WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences. BMC Bioinformatics. 2007, 8: 46. 10.1186/1471-2105-8-46.
Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, Cao M, Liu J, Sun J, Tang J, Chen Y, Huang X, Lin W, Ye C, Tong W, Cong L, Geng J, Han Y, Li L, Li W, Hu G, Huang X, Li W, Li J, Liu Z, Li L, Liu J, Qi Q, Liu J, Li L, Li T, Wang X, Lu H, Wu T, Zhu M, Ni P, Han H, Dong W, Ren X, Feng X, Cui P, Li X, Wang H, Xu X, Zhai W, Xu Z, Zhang J, He S, Zhang J, Xu J, Zhang K, Zheng X, Dong J, Zeng W, Tao L, Ye J, Tan J, Ren X, Chen X, He J, Liu D, Tian W, Tian C, Xia H, Bao Q, Li G, Gao H, Cao T, Wang J, Zhao W, Li P, Chen W, Wang X, Zhang Y, Hu J, Wang J, Liu S, Yang J, Zhang G, Xiong Y, Li Z, Mao L, Zhou C, Zhu Z, Chen R, Hao B, Zheng W, Chen S, Guo W, Li G, Liu S, Tao M, Wang J, Zhu L, Yuan L, Yang H: A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science. 2002, 296 (5565): 79-92. 10.1126/science.1068037.
Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K, Kishimoto N, Yazaki J, Ishikawa M, Yamada H, Ooka H, Hotta I, Kojima K, Namiki T, Ohneda E, Yahagi W, Suzuki K, Li CJ, Ohtsuki K, Shishiki T, Otomo Y, Murakami K, Iida Y, Sugano S, Fujimura T, Suzuki Y, Tsunoda Y, Kurosaki T, Kodama T, Masuda H, Kobayashi M, Xie Q, Lu M, Narikawa R, Sugiyama A, Mizuno K, Yokomizo S, Niikura J, Ikeda R, Ishibiki J, Kawamata M, Yoshimura A, Miura J, Kusumegi T, Oka M, Ryu R, Ueda M, Matsubara K, Kawai J, Carninci P, Adachi J, Aizawa K, Arakawa T, Fukuda S, Hara A, Hashizume W, Hayatsu N, Imotani K, Ishii Y, Itoh M, Kagawa I, Kondo S, Konno H, Miyazaki A, Osato N, Ota Y, Saito R, Sasaki D, Sato K, Shibata K, Shinagawa A, Shiraki T, Yoshino M, Hayashizaki Y, Yasunishi A: Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science. 2003, 301 (5631): 376-379. 10.1126/science.1081288.
Cenci A, Chantret N, Kong X, Gu Y, Anderson OD, Fahima T, Distelfeld A, Dubcovsky J: Construction and characterization of a half million clone BAC library of durum wheat (Triticum turgidum ssp. durum). Theor Appl Genet. 2003, 107 (5): 931-939. 10.1007/s00122-003-1331-z.
Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, Abu-Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi K, Deng Z, Di Francesco V, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins ME, Ji RR, Ke Z, Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV, Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg S, Shao W, Shue B, Sun J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R, Xiao C, Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao Q, Zheng L, Zhong F, Zhong W, Zhu S, Zhao S, Gilbert D, Baumhueter S, Spier G, Carter C, Cravchik A, Woodage T, Ali F, An H, Awe A, Baldwin D, Baden H, Barnstead M, Barrow I, Beeson K, Busam D, Carver A, Center A, Cheng ML, Curry L, Danaher S, Davenport L, Desilets R, Dietz S, Dodson K, Doup L, Ferriera S, Garg N, Gluecksmann A, Hart B, Haynes J, Haynes C, Heiner C, Hladun S, Hostin D, Houck J, Howland T, Ibegwam C, Johnson J, Kalush F, Kline L, Koduru S, Love A, Mann F, May D, McCawley S, McIntosh T, McMullen I, Moy M, Moy L, Murphy B, Nelson K, Pfannkoch C, Pratts E, Puri V, Qureshi H, Reardon M, Rodriguez R, Rogers YH, Romblad D, Ruhfel B, Scott R, Sitter C, Smallwood M, Stewart E, Strong R, Suh E, Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J, Williams S, Williams M, Windsor S, Winn-Deen E, Wolfe K, Zaveri J, Zaveri K, Abril JF, Guigo R, Campbell MJ, Sjolander KV, Karlak B, Kejariwal A, Mi H, Lazareva B, Hatton T, Narechania A, Diemer K, Muruganujan A, Guo N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz R, Walenz B, Yooseph S, Allen D, Basu A, Baxendale J, Blick L, Caminha M, Carnes-Stine J, Caulk P, Chiang YH, Coyne M, Dahlke C, Mays A, Dombroski M, Donnelly M, Ely D, Esparham S, Fosler C, Gire H, Glanowski S, Glasser K, Glodek A, Gorokhov M, Graham K, Gropman B, Harris M, Heil J, Henderson S, Hoover J, Jennings D, Jordan C, Jordan J, Kasha J, Kagan L, Kraft C, Levitsky A, Lewis M, Liu X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S, Newman M, Nguyen T, Nguyen N, Nodell M, Pan S, Peck J, Peterson M, Rowe W, Sanders R, Scott J, Simpson M, Smith T, Sprague A, Stockwell T, Turner R, Venter E, Wang M, Wen M, Wu D, Wu M, Xia A, Zandieh A, Zhu X: The sequence of the human genome. Science. 2001, 291 (5507): 1304-1351. 10.1126/science.1058040.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.
Bork P, Dandekar T, Diaz-Lazcoz Y, Eisenhaber F, Huynen M, Yuan Y: Predicting function: from genes to genomes and back. J Mol Biol. 1998, 283 (4): 707-725. 10.1006/jmbi.1998.2144.
Tatusov RL, Koonin EV, Lipman DJ: A genomic perspective on protein families. Science. 1997, 278 (5338): 631-637. 10.1126/science.278.5338.631.
Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22 (22): 4673-4680. 10.1093/nar/22.22.4673.
Hanfrey C, Franceschetti M, Mayer MJ, Illingworth C, Michael AJ: Abrogation of upstream open reading frame-mediated translational control of a plant S-adenosylmethionine decarboxylase results in polyamine disruption and growth perturbations. J Biol Chem. 2002, 277 (46): 44131-44139. 10.1074/jbc.M206161200.
Joshi CP, Zhou H, Huang X, Chiang VL: Context sequences of translation initiation codon in plants. Plant Mol Biol. 1997, 35 (6): 993-1001. 10.1023/A:1005816823636.
Wang XQ, Rothnagel JA: 5'-untranslated regions with multiple upstream AUG codons can support low-level translation via leaky scanning and reinitiation. Nucleic Acids Res. 2004, 32 (4): 1382-1391. 10.1093/nar/gkh305.
Fernandez J, Yaman I, Huang C, Liu H, Lopez AB, Komar AA, Caprara MG, Merrick WC, Snider MD, Kaufman RJ, Lamers WH, Hatzoglou M: Ribosome stalling regulates IRES-mediated translation in eukaryotes, a parallel to prokaryotic attenuation. Mol Cell. 2005, 17 (3): 405-416. 10.1016/j.molcel.2004.12.024.
Meijer HA, Thomas AA: Ribosomes stalling on uORF1 in the Xenopus Cx41 5' UTR inhibit downstream translation initiation. Nucleic Acids Res. 2003, 31 (12): 3174-3184. 10.1093/nar/gkg429.
Chumpolkulwong N, Sakamoto K, Hayashi A, Iraha F, Shinya N, Matsuda N, Kiga D, Urushibata A, Shirouzu M, Oki K, Kigawa T, Yokoyama S: Translation of 'rare' codons in a cell-free protein synthesis system from Escherichia coli. J Struct Funct Genomics. 2006, 7 (1): 31-36. 10.1007/s10969-006-9007-y.
Shu P, Dai H, Gao W, Goldman E: Inhibition of translation by consecutive rare leucine codons in E. coli: absence of effect of varying mRNA stability. Gene Expr. 2006, 13 (2): 97-106. 10.3727/000000006783991881.
Col B, Oltean S, Banerjee R: Translational regulation of human methionine synthase by upstream open reading frames. Biochim Biophys Acta. 2007, 1769 (9-10): 532-540.
Wang X, Proud CG: A novel mechanism for the control of translation initiation by amino acids, mediated by phosphorylation of eukaryotic initiation factor eIF2B. Mol Cell Biol. 2008, 28 (5): 1429-1442. 10.1128/MCB.01512-07.
Bayer TS, Booth LN, Knudsen SM, Ellington AD: Arginine-rich motifs present multiple interfaces for specific binding by RNA. RNA. 2005, 11 (12): 1848-1857. 10.1261/rna.2167605.
David-Assael O, Saul H, Saul V, Mizrachy-Dagri T, Berezin I, Brook E, Shaul O: Expression of AtMHX, an Arabidopsis vacuolar metal transporter, is repressed by the 5' untranslated region of its gene. J Exp Bot. 2005, 56 (413): 1039-1047. 10.1093/jxb/eri097.
Tabuchi T, Okada T, Azuma T, Nanmori T, Yasuda T: Posttranscriptional regulation by the upstream open reading frame of the phosphoethanolamine N-methyltransferase gene. Biosci Biotechnol Biochem. 2006, 70 (9): 2330-2334. 10.1271/bbb.60309.
Imai A, Hanzawa Y, Komura M, Yamamoto KT, Komeda Y, Takahashi T: The dwarf phenotype of the Arabidopsis acl5 mutant is suppressed by a mutation in an upstream ORF of a bHLH gene. Development. 2006, 133 (18): 3575-3585. 10.1242/dev.02535.
Hanfrey C, Elliott KA, Franceschetti M, Mayer MJ, Illingworth C, Michael AJ: A dual upstream open reading frame-based autoregulatory circuit controlling polyamine-responsive translation. J Biol Chem. 2005, 280 (47): 39229-39237. 10.1074/jbc.M509340200.
Smith E, Meyerrose TE, Kohler T, Namdar-Attar M, Bab N, Lahat O, Noh T, Li J, Karaman MW, Hacia JG, Chen TT, Nolta JA, Muller R, Bab I, Frenkel B: Leaky ribosomal scanning in mammalian genomes: significance of histone H4 alternative translation in vivo. Nucleic Acids Res. 2005, 33 (4): 1298-1308. 10.1093/nar/gki248.
Child SJ, Miller MK, Geballe AP: Translational control by an upstream open reading frame in the HER-2/neu transcript. J Biol Chem. 1999, 274 (34): 24335-24341. 10.1074/jbc.274.34.24335.
Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res. 2004, 14 (6): 1188-1190. 10.1101/gr.849004.
The authors thank Julian Schwerdt for high performance computing support at the South Australian Partnership for Advanced Computing (SAPAC), Andreas Schreiber for statistical support, and Rodney Davies for helpful discussions. This work was supported by the Australian Centre for Plant Functional Genomics (ACPFG) funded by Grains Research and Development Corporation (GRDC), Australian Research Council (ARC), the University of Adelaide, and the Government of South Australia. Michael Tran is the recipient of an ACPFG Postgraduate Scholarship.
MT conducted the research, analysed the data and drafted the manuscript. UB and CJS designed the research, participated in the study design, coordinated the study and contributed to the final manuscript. All authors read and approved the final manuscript.
Electronic supplementary material
Additional file 6: 'The pattern of nucleotide sequence conservation calculated for the decanucleotide surrounding the uORF AUG triplet using WebLogo'. The overall height of each stack indicates the nucleotide sequence conservation at that position (measured in bits), whereas the height of nucleotide symbols (A, T, G, C) within the stack reflects the relative frequency of the corresponding nucleotide at that position. (B) Positions showing detectable nucleotide sequence conservation were magnified. (EPS 8 MB)
Additional file 7: 'Relative frequencies of codons showing significant deviation in codon usage between rice uORFs and rice main coding regions'. Rice uORF codon usage calculated from http://www.bioinformatics.org/sms/codon_usage.html. (EPS 400 KB)
Additional file 8: TRAN_TableS6. 'ClustalW alignment of uORFs identified by uORFSCAN in 5 out of 5 cereals and in Arabidopsis'. (DOC 63 KB)
Cite this article
Tran, M.K., Schultz, C.J. & Baumann, U. Conserved upstream open reading frames in higher plants. BMC Genomics 9, 361 (2008). https://doi.org/10.1186/1471-2164-9-361