- Research article
- Open Access
Evolution of alternative and constitutive regions of mammalian 5'UTRs
© Resch et al; licensee BioMed Central Ltd. 2009
- Received: 17 January 2009
- Accepted: 16 April 2009
- Published: 16 April 2009
Alternative splicing (AS) in protein-coding sequences has emerged as an important mechanism of regulation and diversification of animal gene function. By contrast, the extent and roles of alternative events including AS and alternative transcription initiation (ATI) within the 5'-untranslated regions (5'UTRs) of mammalian genes are not well characterized.
We evaluated the abundance, conservation and evolution of putative regulatory control elements, namely, upstream start codons (uAUGs) and open reading frames (uORFs), in the 5'UTRs of human and mouse genes impacted by alternative events. For genes with alternative 5'UTRs, the fraction of alternative sequences (those present in a subset of the transcripts) is much greater than that in the corresponding coding sequence, conceivably, because 5'UTRs are not bound by constraints on protein structure that limit AS in coding regions. Alternative regions of mammalian 5'UTRs evolve faster and are subject to a weaker purifying selection than constitutive portions. This relatively weak selection results in over-abundance of uAUGs and uORFs in the alternative regions of 5'UTRs compared to constitutive regions. Nevertheless, even in alternative regions, uORFs evolve under a stronger selection than the rest of the sequences, indicating that some of the uORFs are conserved regulatory elements; some of the non-conserved uORFs could be involved in species-specific regulation.
The findings on the evolution and selection in alternative and constitutive regions presented here are consistent with the hypothesis that alternative events, namely, AS and ATI, in 5'UTRs of mammalian genes are likely to contribute to the regulation of translation.
- Alternative Splice
- Alternative Event
- Alternative Region
- Potential Regulatory Element
- Alternative Transcription Initiation
Alternative splicing (AS) has emerged as a major mechanism for regulating gene expression and function in animals, particularly, in mammals. Large-scale studies based on mapping of expressed sequence data on genomic sequences have produced estimates of as many as 30–60% of human genes undergoing alternative splicing [1–5]. The impact of alternative splicing on protein function has been studied in great detail and is generally recognized as a source of protein diversity that expands the repertoire of protein function [6–8].
In contrast, little is known about the prevalence and impact of alternative events, such as AS and alternative transcription initiation (ATI), in 5'-untranslated regions (5'UTRs). AS and ATI are the primary sources of 5'UTR transcript diversity, and several reports have conjectured that these mechanisms might play an important role in orchestrating complex regulatory mechanisms within the 5'UTRs [9–13]. Estimates of the number of genes with alternative 5'UTRs vary from 12% to 22%, while estimates of alternative promoter usage range from 10% to 18%. Anecdotally, studies have shown that alternative events are responsible for 5'UTR transcript diversity in mammals, but to our knowledge, there have been no detailed, genome-wide studies aimed at elucidating the functional role of transcript diversity in mammalian 5'UTRs.
The bias toward studying AS in the coding regions versus 5'UTR probably reflects two obvious sources of complications in the analysis of transcript diversity in UTRs. First, it is easier to assess the functional impact of AS in protein-coding regions because elimination or disruption of known protein domains resulting from AS is readily interpretable [6–8]. Second, boundaries of the coding sequence typically are identified with relative ease because protein sequences are defined by their start and stop codons whereas precise delineation of the 5'UTR often is problematic.
Translational efficiency of eukaryotic mRNAs depends on the presence of regulatory elements within the 5'UTR. It has been shown that the occurrence of initiation codons and open reading frames upstream of the authentic start codon (uAUGs and uORFs, respectively) can affect the translation of mRNA into protein. The presence of uAUGs and uORFs in mammalian 5'UTRs is typically associated with translational repression [10, 17], but cases of increased translation efficiency also have been reported. A complementary computational study has demonstrated substantial conservation of uAUGs and uORFs in 5'UTRs of mammalian mRNAs, suggesting that at least some of these elements are functionally important. These experimental and computational findings raise the possibility that 5'UTR diversity has the potential to produce mRNA isoforms that differ with respect to their uAUG and uORF content, which could be an important facet of the regulation of translation.
Experimental evidence shows that 5'UTR transcript diversity is achieved during transcription, via ATI, and after transcription, via AS. In some instances, both mechanisms are employed. For example, AS in the 5'UTR of human axin2, a negative regulator of Wnt/β-catenin signaling, generates three isoforms with different arrangements of uAUGs and uORFs, resulting in a set of 5'UTRs that each confer different mRNA stabilities and translational efficiencies upon the respective isoforms. Similarly, an alternatively spliced exon located in the 5'UTR of neuronal nitric-oxide synthase (nNOS) has been shown to introduce a translational control element that inhibits translation of the mRNA. The diversity and complexity of the mu-opioid receptor gene expression is achieved by a combination of alternative splicing and alternative promoter usage. Translational repression of the mouse mu-opioid receptor expression using uORFs and leaky scanning has been recently reported.
The significance of the ATI and AS mechanisms in generating 5'UTR transcript diversity lies in their ability to alter the 5'UTR landscape by rearranging both the number and type of translational control elements included in each transcript. Slight differences in the arrangement of translational control elements between isoforms can lead to major changes in regulatory effects on translation. For instance, translational regulation of the multidrug resistance-associated protein 2 (Mrp2) is mediated by ATI. The 5'UTR of Mrp2 contains four different transcription start sites and three uORFs, and experimental data show that uORF3 has a much stronger inhibitory effect on translation than uORF1 and uORF2. A combination of ATI and AS in Dicer, a ribonuclease that mediates RNA interference at the transcriptional and post-transcriptional levels, appears to regulate translational efficiency as well, resulting in long and short transcript variants. Both variants encode uAUGs (nine in the long form and five in the short form), and although both forms show decreased levels of translation, the longer form appears to exhibit greater inhibitory effects, probably because of the increased number of uAUGs. Likewise, Tie2, an endothelium-specific receptor tyrosine kinase required for blood vessel maturation, contains multiple transcription start sites and encodes five uORFs. Apparently, the greater the number of uORFs contained in the 5'UTR, the greater the inhibitory effects on translation, suggesting a cumulative effect on the overall level of translational efficiency.
In addition, it has been shown that both evolutionarily conserved and non-conserved regulatory control elements exhibit inhibitory effects, suggesting that non-conserved control elements may regulate at a species-specific level. For example, human oncogene mdm2 contains two uORFs, both conserved in mouse. Although the inhibitory effect of uORF1 exceeds that of uORF2, both uORFs are required for maximum inhibition of translation. In contrast, the human stress response transcription factor ATF5 contains two uORFs that are conserved in mouse, as well as three non-conserved species-specific uAUGs. The alternative transcript encoding the conserved uORFs exhibits a stronger inhibitory effect than the transcript encoding the non-conserved uAUGs, but the effect of the non-conserved elements is non-negligible as well [23, 24].
We performed a genome-wide comparative study of 5'UTR sequences in primates and rodents with the principal goal of understanding how alternative events impact regulation of translation. To this end, we compared the abundance, conservation and evolution of translational control elements within alternative and constitutive regions of 5'UTR. Alternative nucleotides were not classified according to type of alternative event, because the focus of the study was to examine the broad impact of transcript diversity on mammalian 5'UTRs. Accordingly, we reasoned that the regulatory effects of uAUGs and uORFs located in alternative regions of 5'UTRs should be the same regardless of whether the existing transcript diversity for the given gene is generated via ATI, AS, or both. We find that, although alternative regions of mammalian 5'UTRs evolve faster and are subject to a weaker purifying selection than constitutive regions, they possess extensive potential for translation regulation.
5'UTR statistics for human and mouse
A genome-wide comparative analysis of 5'UTRs in human and mouse was carried out to investigate the prevalence, conservation, and evolution of putative translational control elements within alternative and constitutive regions of mammalian 5'UTRs. Starting with high quality annotation from the ASD Database, we restricted our analysis of alternative events to those that involved exclusively 5'UTR sequences, and limited the pool of alternative transcript data to reliably identified, high quality isoforms with mRNA evidence and known protein-coding sequences (see Methods for details). A total of 7735 human isoforms (with 5'UTR sequences) were used to identify 2915 human genes with alternative 5'UTRs, and 2165 mouse isoforms (also with 5'UTR sequences) were similarly used to identify a set of 909 mouse genes with alternative 5'UTRs (hereinafter ALT_5'UTR sets). These stringent criteria likely yield a relatively small subset of the complete set of genes with alternative 5'UTRs, but were chosen to eliminate potentially unreliable splice predictions and alignment artifacts from biasing the results. Our ALT_5'UTR subsets represent 12% of human genes and 3.4% of mouse genes. Previous reports suggest that 10–22% of human genes [5, 14, 15] and 19–20% of mouse genes [25, 26] contain alternative 5'UTRs. Our estimates for mouse are lower than those previously reported [25, 26]; most likely, these low values reflect the fact that there is less high quality annotation for alternative events in 5' UTRs available for mouse than there is for human. Our estimates, which are based on mRNA evidence, probably, represent the lower bounds for the number of mammalian genes with ALT_5'UTRs.
5'UTR statistics for ALT and nonALT gene sets
Genes with uORFs
Higher prevalence of alternative sequences in 5'UTRs compared to coding regions
Number of alternative and constitutive nucleotides in 5'UTR and CDS
Excess of potential regulatory elements in alternative regions of 5'UTR
Given that the fraction of alternative sequence is greater in the 5'UTRs than in the coding region among genes with alternative 5'UTR events, it is natural to ask how often 5'UTRs from this set harbor putative control elements that might be involved in translation regulation. We examined the organization of putative regulatory motifs within ALT and CONSTIT regions of 5'UTR by searching for potential regulatory elements including uAUGs and uORFs.
Frequency of uAUGs and uORFs in alternative and constitutive regions (table rows: total uAUGs mapped to genomic; uAUGs mapped to ALT; uAUGs mapped to CONSTIT; uAUGs included in evolutionary conservation analysis in Human-Macaque and Mouse-Rat orthologs; total uORFs mapped to genomic; uORFs mapped to ALT; uORFs mapped to CONSTIT; uORFs fully contained within ALT; uORFs fully contained within CONSTIT; uORFs included in evolutionary conservation analysis in Human-Macaque and Mouse-Rat orthologs)
Observed and expected frequency of AUG triplets and shuffled triplets
The efficiency of translation initiation depends on the arrangement of nucleotides surrounding the translational start codon, so the AUG context is thought to be an important regulatory factor (reviewed in ). The nucleotide contexts of uAUGs located in ALT and CONSTIT regions were evaluated using previously published methods , and no significant differences were detected in either human or mouse. The majority of uAUGs in ALT and CONSTIT regions exhibited weak contexts (Additional file 2), consistent with previous results [19, 28] and with the above conclusion on selection against AUGs in 5'UTRs, in that uAUGs are unlikely to efficiently initiate translation.
Next, we estimated the size and abundance of uORFs in ALT and CONSTIT regions of 5'UTR in order to assess the role of uORFs as potential control elements. The uORFs were identified in the 5'UTR sequences of transcripts, mapped to the genomic sequence, and then classified as ALT or CONSTIT using the same approach that was applied to the mapping of uAUGs. To be included in the analysis, a uORF had to be fully contained in either the ALT or the CONSTIT regions of the corresponding 5'UTR (uORFs that spanned the boundaries between ALT and CONSTIT regions were removed from the final data set). The great majority of uORFs (81% in human and 79% in mouse) are indeed fully contained within ALT or CONSTIT regions, indicating that restricting the analysis to these ORFs would not significantly bias the results.
We further addressed the question of whether uORFs are more prevalent in 5'UTRs of genes that contain alternative regions. Comparison of uORF abundance between genes in the ALT and nonALT control sets indicates that ~53% of human and ~51% of mouse ALT genes contain uORFs, whereas only ~44% of human and ~42% of mouse nonALT control genes contain uORFs, suggesting that uORFs are indeed more common in 5'UTRs containing alternative regions (Table 1). Comparison of uORF densities (the number of uORFs per gene) between genes from the ALT and nonALT control sets showed substantially greater densities in the ALT sets in both human (P = 1.46 × 10^-11; Student's t-test) and mouse (P = 0.0004; Student's t-test) (Additional file 1). Genes from the ALT set contain an average of 3.1 and 2.7 uORFs per gene in human and mouse, respectively, whereas genes from the nonALT control set contain an average of 2.4 uORFs per gene in human and 2.2 in mouse. A comparison of uORF length distributions between the ALT and nonALT sets showed that uORFs from the nonALT control set are significantly longer than uORFs located in constitutive regions of ALT genes, but shorter than uORFs located in alternative regions of ALT genes (Table 1 and Additional file 1). Taken together, these observations show that genes with alternative 5'UTRs are more likely to encode uORFs than genes without such regions, and that uORFs are more abundant in alternative than in constitutive regions. The results obtained for uORFs are fully consistent with those for the uAUGs and suggest that selection against translation initiation upstream of the authentic AUG is weaker in alternative regions of mammalian 5'UTRs than in constitutive regions. From a complementary perspective, alternative regions contain a greater concentration of potential control elements that could regulate translation.
We examined the nucleotide composition of human and mouse 5'UTRs and estimated their GC-contents at 60% and 59%, respectively (Additional file 2), in good agreement with previous reports [27, 28]. No significant differences in nucleotide composition or GC-content were detected between ALT and CONSTIT regions of the 5'UTR. Thus, the excess of uORFs in alternative versus constitutive regions of 5'UTRs is not caused by differences in the nucleotide composition of these regions.
Conservation of putative control elements in constitutive and alternative regions of mammalian 5'UTRs
Conservation of uAUGs and uORFs in alternative and constitutive regions
Analysis of uORF conservation yielded similar results, i.e., 71.8% of CONSTIT uORFs and 60.5% of ALT uORFs were conserved between human and macaque (P = 6.5 × 10^-9; Fisher's exact test), and the corresponding values for mouse-rat were 56.6% and 41.5% (P = 0.001; Fisher's exact test) (Table 5). With regard to the substantially lower conservation of uORFs in alternative regions, it is necessary to point out that, although evolutionary conservation is a strong indicator of the functional relevance of the corresponding element, it is not a strict requirement. In particular, both conserved and non-conserved control elements have been implicated in translational repression [22, 23, 37]; obviously, non-conserved elements are more likely to exert species-specific regulation. For instance, the 5'UTR of the mu-opioid receptor gene contains non-conserved control elements that inhibit translation in a species-specific fashion. Similarly, species-specific patterns of 5'UTR alternative exon usage have been linked to species-specific patterns of tissue expression [39–41].
A recent analysis of uORF conservation in four Cryptococcus species has led to the estimate that approximately one-third of the uORFs are conserved owing to their importance for the regulation of translational efficiency. Substantial conservation of uORFs has also been observed in 5'UTRs of Saccharomyces and plants, findings that are compatible with the possibility that many uORFs are associated with biological functions. Neafsey and Galagan report that the majority of conserved uORFs in Cryptococcus do not exhibit codon usage bias or conservation at the amino acid level, effectively ruling out the possibility that a significant fraction of the uORFs encode functional peptides. We performed a similar analysis and found no evidence of codon usage bias among the human and mouse uORF sequences included in this study. We repeated this analysis after partitioning the uORFs according to ALT and CONSTIT in human, and found that, in general, profiles of relative codon frequencies are similar between all codons in ALT and CONSTIT uORFs (Additional file 1). Thus, it appears likely that, to the extent that they are functional, most of the uORFs are control elements, although a minority might encode biologically relevant peptides. Indeed, a recent proteomic analysis resulted in the identification of 54 proteins, less than 100 amino acids in length each, that are suspected of being translated from uORFs located in the 5'UTRs of human mRNAs.
Fast evolution of alternative regions in mammalian 5'UTRs
Substitution rates for alternative and constitutive regions of 5'UTR
0.045 ± 0.001
0.052 ± 0.001
0.047 ± 0.001
0.054 ± 0.001
≥ 30 bp
0.046 ± 0.001
0.051 ± 0.002
0.040 ± 0.001
0.046 ± 0.002
0.042 ± 0.001
0.051 ± 0.002
≥ 30 bp
0.041 ± 0.002
0.048 ± 0.003
K5 (uORFs excluded)
0.092 ± 0.002
0.111 ± 0.002
0.099 ± 0.002
0.098 ± 0.001
≥ 30 bp
0.094 ± 0.004
0.114 ± 0.005
0.088 ± 0.003
0.103 ± 0.004
0.093 ± 0.002
0.100 ± 0.001
≥ 30 bp
0.091 ± 0.004
0.108 ± 0.007
The present analysis indicates that ALT regions of mammalian 5'UTRs evolve faster than CONSTIT regions, in all likelihood, owing to relaxed purifying selection. There is no consensus regarding trends in evolutionary rates between ALT and CONSTIT regions of coding sequence (CDS). Nevertheless, most comparisons indicate a higher non-synonymous substitution rate (Ka) in alternative exons [47, 48], but a significantly lower synonymous substitution rate (Ks), resulting in a much higher Ka/Ks ratio than in constitutive exons. The increased Ka values in alternative regions, at least in part, seem to be due to positive selection at the protein sequence level. The cause of the low Ks values in ALT regions is unclear, but might have to do with more stringent requirements for RNA secondary structure in AS [49, 51]. Here we did not observe this paradoxical relationship between Ka and Ks in ALT and CONSTIT uORFs in mammalian 5'UTRs (even after removing short uORFs from the calculation), but instead found a consistent pattern for all measured rates (Table 6). The likely explanation for this apparent difference between the evolution of alternative regions in 5'UTRs and CDS is that for uORFs, purifying selection at the amino acid level is weak at best and there is no positive selection; in contrast, selection for RNA secondary structure would almost equally affect Ka and Ks.
Tight regulation of expression in genes with alternative 5'UTRs
We characterized the biological roles of genes that contain alternative regions in 5'UTRs, by examining patterns of gene expression and functional annotation for genes and transcripts within the ALT_5'UTR sets. Gene expression patterns for ALT_5'UTRs were evaluated using Atlas2 microarray and expressed sequence tag (EST) data. Gene Ontology annotation was used to classify ALT_5'UTRs according to biological process, molecular function and cellular localization patterns.
Average probe expression levels were calculated using probe data for the human genes in the analyzed set. Atlas2 expression data were separated into two groups: the ALT_5'UTR set (2097 probes) and the nonALT CONTROL set (8060 probes). We calculated average probe expression levels across the 79 tissue types in human and found that transcripts from the ALT_5'UTR set were expressed at significantly lower levels, on average, than transcripts from the nonALT CONTROL set (P = 7.3 × 10^-13; Student's t-test). Furthermore, when sets of 1611 ALT and nonALT transcripts with size-matched 5'UTRs were compared, the statistically significant difference remained (P = 5.0 × 10^-5; Student's t-test). This observation indicates that the difference in expression levels cannot be explained simply by the greater average length of the 5'UTRs in the ALT_5'UTR set.
Average probe expression levels were also calculated using EST abundance as a measure of expression. The number of gene-specific EST sequences in the EST databases gives a reasonably accurate approximation of relative gene expression. Alignments of transcript sequences from ALT_5'UTR and nonALT CONTROL sets with ESTs from the human normal tissue GenBank EST libraries were selected for analysis using thresholds given in the Methods section. Gene expression levels based on the analysis of the EST database were calculated for 57 different human tissues. Significant differences between the EST data for ALT_5'UTR and nonALT CONTROL sets were demonstrated with a Monte Carlo approximation of Fisher's exact test (P < 10^-5); compatible results were obtained with sets of 5733 transcripts with length-matched 5'UTRs (P < 10^-3). Thus, the analysis of EST abundance data confirmed that transcripts from the ALT_5'UTR set, which contain alternative regions in 5'UTRs, are expressed at lower levels, on average, than transcripts from the nonALT CONTROL set, and the difference cannot be explained solely by the length differences between the 5'UTRs.
Functional classification of human genes with alternative 5'UTRs
response to stimulus
G-protein coupled receptor protein signaling pathway
rhodopsin-like receptor activity
integral to membrane
nucleic acid binding
proteinaceous extracellular matrix
regulation of apoptosis
Qualitatively, the two groups of observations, those on the lower expression level of genes containing alternative regions in 5'UTRs and those on the tight regulation of the corresponding genes, appear congruent and compatible with the plausible hypothesis that these genes are subject to especially strong down-regulation at both levels, transcription and translation, with the control elements in the alternative regions involved in the latter. This hypothesis is strengthened by results from a recent study which show that uORF-containing transcripts, on average, are expressed at lower levels and have shorter half-lives than transcripts without uORFs. Thus, the effects of translational repression in uORF-containing transcripts might, in part, be achieved via an RNA decay mechanism.
All findings presented here seem to be consistent with the hypothesis that alternative events, namely, AS and ATI, in 5'UTRs of mammalian genes contribute to the regulation of translation. At least within the set of genes that was conservatively defined to include genes with reliably demonstrated alternative events within 5'UTRs, the fraction of the 5'UTRs that is involved in an alternative event is much greater than that in the corresponding coding regions. In retrospect, this finding might not be particularly unexpected considering that 5'UTRs are not bound by constraints on protein structure and function, which limit the number of alternative nucleotides that are admissible in coding regions. The ratio of alternative-to-constitutive nucleotides is much higher in 5'UTRs than in coding regions, suggesting the possibility that alternative regions play a major role in regulating translation of the respective genes. With regard to this hypothesis, the results of the present analysis are somewhat ambiguous. The alternative regions of mammalian 5'UTRs contain a greater density of potential control elements (uAUGs and uORFs) than constitutive regions, but these elements are less conserved than those in constitutive regions. Thus, the fraction of conserved control elements among the uORFs contained within the alternative regions of mammalian 5'UTRs is lower than that in constitutive regions. Nevertheless, the uORF sequences even in alternative regions are, on average, subject to a stronger selection than the rest of the sequence. Furthermore, there is anecdotal experimental evidence that implicates non-conserved uORFs in translation regulation. Therefore, such variable elements provide considerable potential for regulation that, however, can be adequately explored only by direct experimentation.
The genes containing alternative regions in the 5'UTRs are expressed at relatively low levels and typically belong to functional categories such as transcription factors, receptors, and other signaling pathway molecules, whose expression is strongly and tightly controlled. Together with the observations on the patterns of evolution of potential regulatory elements, these trends indicate that AS and ATI are probably important mechanisms of translation regulation in mammals.
The 5'UTR data set
Alternative splicing data were obtained from the ASD Database for human (Ensembl v36.35i) and mouse (Ensembl v37.34e). Those ASD isoforms that were generated on the basis of computational predictions and/or EST evidence alone were discarded, in an effort to prevent potentially unreliable splice predictions and alignment artifacts from biasing the results. In order to be included in the analysis, isoforms had to meet the following criteria: 1) isoforms must have supporting mRNA evidence, and 2) start and stop codon positions for coding sequences (CDS) must be known. Start codon positions were used to identify the subset of isoforms that included 5'UTR|CDS boundaries. The 7735 human transcripts selected by these criteria were used to identify 2915 human genes on the basis of the ASD information, and 2165 mouse transcripts were similarly used to identify 909 mouse genes. This filtering method underestimates the number of genes with alternative regions in 5'UTRs but yields a high quality dataset based upon the alignment of full-length mRNAs with protein and start codon annotations. We refer to these sets of human and mouse genes as the ALT_5'UTR sets, because the 5'UTRs of these genes contain alternative (ALT) and constitutive (CONSTIT) nucleotides. By definition, alternative nucleotides are present in a fraction of the transcripts of the respective genes, whereas constitutive nucleotides are present in all of the gene's transcripts.
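The ALT/CONSTIT definition given above can be illustrated with a minimal Python sketch. It assumes a hypothetical input layout (a dictionary mapping isoform IDs to lists of 1-based, inclusive exon intervals restricted to the 5'UTR) and is not the pipeline used in this study; the function name `classify_nucleotides` is introduced here for illustration only.

```python
from collections import Counter

def classify_nucleotides(isoform_exons):
    """Label every genomic position covered by at least one isoform.

    isoform_exons: dict mapping an isoform ID to a list of (start, end)
    exon intervals, 1-based and inclusive (hypothetical input layout).
    A position present in all isoforms is CONSTIT; a position present
    in only a subset of the isoforms is ALT.
    """
    n_isoforms = len(isoform_exons)
    coverage = Counter()
    for exons in isoform_exons.values():
        covered = set()
        for start, end in exons:
            covered.update(range(start, end + 1))
        # Count each position at most once per isoform.
        coverage.update(covered)
    return {
        pos: "CONSTIT" if count == n_isoforms else "ALT"
        for pos, count in coverage.items()
    }

# Toy gene with two isoforms; positions 1-5 occur in only one of them.
labels = classify_nucleotides({"iso1": [(1, 10)], "iso2": [(6, 10)]})
```

In this toy case, positions 1 to 5 are labeled ALT and positions 6 to 10 are labeled CONSTIT.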
Determination of the boundaries of 5'UTR and CDS
The 5'UTR and CDS boundaries for genes in the ALT_5'UTR set were determined by combining start and stop codon annotations from all isoforms that mapped to a gene. The 5'UTR boundaries for each gene were determined as follows: the upstream 5'UTR boundary was selected by choosing the most upstream isoform position; the downstream 5'UTR boundary was selected by choosing the most upstream start codon position from the set of isoforms that map to the gene (if more than one authentic start codon exists among the set of isoforms, the most upstream start codon position is chosen) (Figure 1). These conservative criteria ensure that protein-coding regions are excluded from the analysis of 5'UTR sequence variability. The exclusion of fragmentary sequence data is especially important in identifying the upstream 5'UTR boundary, as short non-coding RNA sequences are often positioned immediately upstream of the 5'-end of coding genes.
The CDS boundaries were determined as follows: the upstream CDS boundary was selected by choosing the most upstream start codon position from the set of isoforms; the downstream CDS boundary was selected by choosing the most upstream stop codon position from the set of isoforms that map to the gene (if more than one authentic stop codon exists, the most upstream stop codon position is chosen) (Figure 1).
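The boundary rules described above can be sketched as follows. This is an illustrative example only: it assumes plus-strand genes (so that "most upstream" corresponds to the smallest coordinate), codon positions given as the coordinate of the first nucleotide of the codon, and hypothetical field names (`tx_start`, `start_codon`, `stop_codon`).

```python
def utr5_and_cds_boundaries(isoforms):
    """Combine per-isoform annotations into gene-level boundaries.

    isoforms: list of dicts with hypothetical keys 'tx_start',
    'start_codon' and 'stop_codon' (plus-strand coordinates; codon
    positions refer to the first nucleotide of the codon).
    """
    # Upstream 5'UTR boundary: most upstream isoform position.
    tx_start = min(iso["tx_start"] for iso in isoforms)
    # Downstream 5'UTR boundary and upstream CDS boundary:
    # most upstream start codon among the isoforms.
    start_codon = min(iso["start_codon"] for iso in isoforms)
    # Downstream CDS boundary: most upstream stop codon.
    stop_codon = min(iso["stop_codon"] for iso in isoforms)
    return {
        "utr5": (tx_start, start_codon - 1),
        "cds": (start_codon, stop_codon + 2),  # include the stop codon
    }

bounds = utr5_and_cds_boundaries([
    {"tx_start": 100, "start_codon": 250, "stop_codon": 900},
    {"tx_start": 120, "start_codon": 230, "stop_codon": 950},
])
```

For minus-strand genes the same logic would apply with `max` in place of `min` and the interval arithmetic mirrored.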
Detection of potential regulatory elements in mammalian 5'UTRs
Starting with 5'UTR leader sequences from the isoforms in the ALT_5'UTR set, we calculated the uAUG and uORF abundance in ALT and CONSTIT regions of these 5'UTR sequences. The uAUGs were identified by searching for AUG triplets within the 5'UTRs. The uAUGs were then mapped to the genomic sequence, where redundant elements (uAUGs from different isoforms that mapped to the same genomic coordinates) were removed from the final set. The uORFs were also detected in the 5'UTRs and similarly mapped to the genomic sequence. The uORFs were identified using the EMBOSS program getorf, with the following criteria: 1) uORFs must be at least 6 nucleotides in length, 2) a uORF must start with methionine and end with one of the three stop codons, and 3) uORFs are identified in all reading frames.
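The search criteria listed above can be illustrated with a small, self-contained Python sketch. It is not a reimplementation of getorf and may differ from it in detail: here every ATG with an in-frame stop codon inside the 5'UTR is reported as a separate uORF, the minimum length of 6 nucleotides is assumed to include the stop codon, and sequences are assumed to be given in the DNA alphabet. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_uaugs(utr):
    """Return 0-based positions of every ATG triplet in a 5'UTR."""
    utr = utr.upper()
    return [i for i in range(len(utr) - 2) if utr[i:i + 3] == "ATG"]

def find_uorfs(utr, min_len=6):
    """Return half-open (start, end) intervals of uORFs.

    A uORF starts at an ATG and ends at the first in-frame stop codon
    located within the UTR; scanning from every ATG covers all three
    reading frames. The interval includes the stop codon.
    """
    utr = utr.upper()
    uorfs = []
    for start in find_uaugs(utr):
        for pos in range(start + 3, len(utr) - 2, 3):
            if utr[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                if end - start >= min_len:
                    uorfs.append((start, end))
                break  # stop at the first in-frame stop codon
    return uorfs

utr = "CCATGAAATAGGGATGTGACC"
uaugs = find_uaugs(utr)   # [2, 13]
uorfs = find_uorfs(utr)   # [(2, 11), (13, 19)]
```

An ATG without an in-frame stop codon before the end of the 5'UTR yields a uAUG but no uORF in this sketch.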
uAUGs and uORFs were classified as ALT or CONSTIT based upon their overlap with ALT and CONSTIT regions of the corresponding genomic sequence. Only uAUGs and uORFs that are fully contained within ALT and CONSTIT regions of 5'UTRs were included in this study (i.e., uAUGs and uORFs that overlapped between constitutive and alternative regions were excluded).
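A minimal sketch of this containment rule is given below. It assumes that ALT and CONSTIT regions are supplied as merged, non-overlapping half-open intervals on a single coordinate system and that each element maps to one contiguous interval; elements split across several genomic blocks would require additional handling. The function name is hypothetical.

```python
def classify_element(element, alt_regions, constit_regions):
    """Classify a uAUG or uORF by full containment.

    element: (start, end) half-open interval.
    alt_regions, constit_regions: lists of merged, non-overlapping
    (start, end) half-open intervals.
    Returns 'ALT', 'CONSTIT', or None if the element is not fully
    contained in a region of either class (and is therefore excluded).
    """
    def contained(interval, regions):
        return any(s <= interval[0] and interval[1] <= e for s, e in regions)

    if contained(element, alt_regions):
        return "ALT"
    if contained(element, constit_regions):
        return "CONSTIT"
    return None

alt = [(0, 50)]
constit = [(50, 120)]
calls = [classify_element(e, alt, constit)
         for e in [(10, 25), (60, 90), (45, 60)]]
```

The third toy element crosses the ALT/CONSTIT boundary at position 50 and is therefore left unclassified.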
Identification of orthologous genes
Human-macaque (human version NCBI36; macaque version MMUL1) and mouse-rat (mouse version NCBI36; rat version RGSC3.4) orthologous gene pairs were downloaded from Ensembl using the BioMart data mining tool. 5'UTR alignments were generated using the OWEN alignment tool with the following parameters: a P-value < 0.001 for each hit was required, and 5'UTRs were required to be bound at the 3' ends by exons that align across > 80% of length. For the beginning of the CDS, alignment of the nucleotide sequences was guided by the amino acid sequence alignment. Putative ortholog alignments were cleaned using previously reported thresholds. In cases where greater than 40% of the gaps or unannotated regions of orthologous sequences did not align, the orthologs were removed from the final set. For example, 5'UTRs of many macaque genes are not properly annotated (contain 'NNNNN'), making it difficult to identify the upstream 5'UTR boundary. The human-macaque alignments with uncertainties of this type were discarded. In all, ~2800 human-macaque and ~900 mouse-rat whole gene alignments were generated.
Comparison of substitution rates in coding and non-coding DNA
Synonymous (Ks) and non-synonymous (Ka) substitution rates for the uORFs located in the 5'UTR were calculated using the Pamilo-Bianchi-Li method [64, 65], which takes into account transition and transversion rates. Rates of divergence were calculated separately for uORFs located in constitutive and alternative regions of 5'UTRs. We calculated Ka and Ks separately for all uORFs and for the subset of uORFs ≥ 30 nucleotides in length, to ensure that short uORFs did not bias the results. Substitution rates for the 5'-untranslated regions with uORFs (K5 uORFs) and without uORFs (K5 uORFs excluded) were calculated along the 5'UTRs for each gene using Kimura's two-parameter model.
Gene expression analysis
GNF Atlas2 expression data were downloaded from the UCSC genome browser (gnfAtlas2 table) to study gene expression patterns in human (Mar. 2004 assembly) and mouse (Aug. 2005 assembly) [67, 68]. GenBank mRNA accession IDs were used to map isoform sequences to probe data. Probe data were partitioned into two groups: the ALT_5'UTR sets, and the nonALT CONTROL sets, which contained the full complement of probe data. The nonALT CONTROL set consists of transcripts whose 5'UTRs have not been shown to undergo alternative events such as AS or ATI. Atlas2 expression data are classified into 79 distinct tissue types in human and 61 tissue types in mouse, and duplicate intensity values for each tissue type are included in the raw data. Replicates were combined by calculating the average intensity value for each tissue type. Average expression levels were calculated separately for individual probes and individual tissue types. Average probe expression levels were calculated by summing a probe's intensity values across all tissue types and dividing by the total number of tissues. Average tissue expression levels were calculated by summing a tissue's intensity values over a set of probes and dividing by the total number of probes in the dataset.
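The three averaging steps can be sketched as follows. The nested-dictionary layout, the function names and the probe and tissue labels are assumptions for illustration; the sketch also assumes every probe has values for every tissue.

```python
from statistics import mean


def combine_replicates(raw):
    """raw: {probe: {tissue: [replicate intensities]}} -> {probe: {tissue: mean}}."""
    return {probe: {tissue: mean(values) for tissue, values in tissues.items()}
            for probe, tissues in raw.items()}


def average_per_probe(combined):
    """Mean intensity of each probe across all of its tissues."""
    return {probe: mean(tissues.values()) for probe, tissues in combined.items()}


def average_per_tissue(combined, probes):
    """Mean intensity of each tissue across the given set of probes."""
    probes = list(probes)
    tissues = combined[probes[0]].keys()
    return {tissue: mean(combined[p][tissue] for p in probes)
            for tissue in tissues}


# Made-up intensities for two hypothetical probes and two tissues.
raw = {
    "probe_1": {"liver": [10, 20], "brain": [30, 50]},
    "probe_2": {"liver": [5, 15], "brain": [0, 20]},
}
combined = combine_replicates(raw)
per_probe = average_per_probe(combined)
per_tissue = average_per_tissue(combined, ["probe_1", "probe_2"])
```

Passing the probe identifiers of the ALT_5'UTR set or of the nonALT CONTROL set to `average_per_tissue` would give the per-tissue averages for each group separately.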
Gene expression levels based on EST abundance
Gene expression levels were also evaluated by tallying the numbers of gene-specific EST sequences in the databases. Transcript sequences from the ALT_5'UTR set and the nonALT control set were aligned with ESTs from the human normal tissue GenBank EST libraries (~8 million ESTs, release 071808) using the BLASTN program. EST hits with more than 95% identity that spanned more than 80% of the EST sequence length were accepted as matches. Gene expression levels based on EST abundance calculated for 57 normal human tissues were used for statistical analysis. Similar tissue-specific preferences were considered for both the ALT_5'UTR and nonALT control sets in the final classification and statistical analysis. A Monte Carlo approximation of Fisher's exact test implemented in the COLLAPSE program was used to assess the significance of the differences between the EST data for ALT_5'UTR and nonALT control sets.
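The acceptance rule for EST hits can be sketched as a small filter followed by a tally. The tuple layout of a hit, the function names and the example identifiers are hypothetical, and the sketch assumes identity and aligned length have already been parsed from the BLASTN output.

```python
from collections import Counter


def is_est_match(identity_pct, aligned_len, est_len,
                 min_identity=95.0, min_coverage=0.80):
    """Accept a hit with identity > 95% covering > 80% of the EST length."""
    return identity_pct > min_identity and aligned_len / est_len > min_coverage


def count_est_matches(hits):
    """Tally accepted EST hits per (transcript, tissue).

    hits: iterable of (transcript_id, tissue, identity_pct, aligned_len, est_len).
    """
    counts = Counter()
    for transcript_id, tissue, identity_pct, aligned_len, est_len in hits:
        if is_est_match(identity_pct, aligned_len, est_len):
            counts[(transcript_id, tissue)] += 1
    return counts


# Made-up hits for illustration only.
hits = [
    ("tx1", "liver", 99.0, 480, 500),   # accepted
    ("tx1", "liver", 96.5, 410, 500),   # accepted
    ("tx1", "brain", 99.0, 300, 500),   # rejected: covers only 60% of the EST
    ("tx2", "brain", 94.0, 500, 500),   # rejected: identity too low
]
est_counts = count_est_matches(hits)
```

The resulting per-tissue counts are the kind of tallies that would then be compared between the ALT_5'UTR and nonALT control sets.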
Gene Ontology Annotation
Functional annotation for human and mouse was downloaded from the Gene Ontology (GO) database. Starting with a total of 16,468 annotated human genes, GO annotations were mapped to 89% (2608) of the genes in the ALT_5'UTR set. With 17,480 annotated mouse genes, GO annotations were mapped to 94% (861) of the genes in our ALT_5'UTR set. Keyword frequencies were tabulated for the ALT and ALL sets, and normalized by the total numbers in each set. P-values were calculated using the χ² test.
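As an illustration of the final step, a χ² test on a 2×2 table of keyword counts can be sketched in a few lines of Python. The table layout (keyword present or absent, in each of two gene sets), the absence of a continuity correction and the example counts are all assumptions; the source does not specify how the test was set up.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 d.f., no continuity correction).

    Table layout (an assumption for this sketch):
                     with keyword   without keyword
        set 1             a               b
        set 2             c               d
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Made-up counts: 20 of 100 genes in one set and 40 of 100 in the other
# carry a given GO keyword.
stat, p_value = chi_square_2x2(20, 80, 40, 60)
```

Normalizing counts by set size, as described above, corresponds to comparing the two row proportions, which is what this statistic tests.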
The authors' research is supported by the DHHS (NIH, National Library of Medicine) intramural funds.
- Brett D, Hanke J, Lehmann G, Haase S, Delbruck S, Krueger S, Reich J, Bork P: EST comparison indicates 38% of human mRNAs contain possible alternative splice forms. FEBS Lett. 2000, 474 (1): 83-86. 10.1016/S0014-5793(00)01581-7.
- Mironov AA, Fickett JW, Gelfand MS: Frequent alternative splicing of human genes. Genome Res. 1999, 9 (12): 1288-1293. 10.1101/gr.9.12.1288.
- Croft L, Schandorff S, Clark F, Burrage K, Arctander P, Mattick JS: ISIS, the intron information system, reveals the high frequency of alternative splicing in the human genome. Nat Genet. 2000, 24 (4): 340-341. 10.1038/74153.
- Kan Z, Rouchka EC, Gish WR, States DJ: Gene structure prediction and alternative splicing analysis using genomically aligned ESTs. Genome Res. 2001, 11 (5): 889-900. 10.1101/gr.155001.
- Modrek B, Resch A, Grasso C, Lee C: Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res. 2001, 29 (13): 2850-2859. 10.1093/nar/29.13.2850.
- Black DL: Protein diversity from alternative splicing: a challenge for bioinformatics and post-genome biology. Cell. 2000, 103 (3): 367-370. 10.1016/S0092-8674(00)00128-8.
- Kriventseva EV, Koch I, Apweiler R, Vingron M, Bork P, Gelfand MS, Sunyaev S: Increase of functional diversity by alternative splicing. Trends Genet. 2003, 19 (3): 124-128. 10.1016/S0168-9525(03)00023-4.
- Resch A, Xing Y, Modrek B, Gorlick M, Riley R, Lee C: Assessing the impact of alternative splicing on domain interactions in the human proteome. J Proteome Res. 2004, 3 (1): 76-83. 10.1021/pr034064v.
- Irvin-Wilson CV, Chaudhuri G: Alternative initiation and splicing in dicer gene expression in human breast cells. Breast Cancer Res. 2005, 7 (4): R563-569. 10.1186/bcr1043.
- Ji H, Zhang Y, Zheng W, Wu Z, Lee S, Sandberg K: Translational regulation of angiotensin type 1a receptor expression and signaling by upstream AUGs in the 5' leader sequence. J Biol Chem. 2004, 279 (44): 45322-45328. 10.1074/jbc.M407261200.
- Pan YX: Diversity and complexity of the mu opioid receptor gene: alternative pre-mRNA splicing and promoters. DNA Cell Biol. 2005, 24 (11): 736-750. 10.1089/dna.2005.24.736.
- Park EH, Lee JM, Pelletier J: The Tie2 5' untranslated region is inhibitory to 5' end-mediated translation initiation. FEBS Lett. 2006, 580 (5): 1309-1319. 10.1016/j.febslet.2006.01.049.
- Zhang Y, Li W, Vore M: Translational regulation of rat multidrug resistance-associated protein 2 expression is mediated by upstream open reading frames in the 5' untranslated region. Mol Pharmacol. 2007, 71 (1): 377-383. 10.1124/mol.106.029793.
- Nagasaki H, Arita M, Nishizawa T, Suwa M, Gotoh O: Automated classification of alternative splicing and transcriptional initiation and construction of visual database of classified patterns. Bioinformatics. 2006, 22 (10): 1211-1216. 10.1093/bioinformatics/btl067.
- Zhang T, Haws P, Wu Q: Multiple variable first exons: a mechanism for cell- and tissue-specific gene regulation. Genome Res. 2004, 14 (1): 79-89. 10.1101/gr.1225204.
- Trinklein ND, Aldred SJ, Saldanha AJ, Myers RM: Identification and functional analysis of human transcriptional promoters. Genome Res. 2003, 13 (2): 308-312. 10.1101/gr.794803.
- Song KY, Hwang CK, Kim CS, Choi HS, Law PY, Wei LN, Loh HH: Translational repression of mouse mu opioid receptor expression via leaky scanning. Nucleic Acids Res. 2007, 35 (5): 1501-1513. 10.1093/nar/gkm034.
- Reynolds K, Zimmer AM, Zimmer A: Regulation of RAR beta 2 mRNA expression: evidence for an inhibitory peptide encoded in the 5'-untranslated region. J Cell Biol. 1996, 134 (4): 827-835. 10.1083/jcb.134.4.827.
- Churbanov A, Rogozin IB, Babenko VN, Ali H, Koonin EV: Evolutionary conservation suggests a regulatory function of AUG triplets in 5'-UTRs of eukaryotic genes. Nucleic Acids Res. 2005, 33 (17): 5512-5520. 10.1093/nar/gki847.
- Hughes TA, Brady HJ: Expression of axin2 is regulated by the alternative 5'-untranslated regions of its mRNA. J Biol Chem. 2005, 280 (9): 8581-8588. 10.1074/jbc.M410806200.
- Newton DC, Bevan SC, Choi S, Robb GB, Millar A, Wang Y, Marsden PA: Translational regulation of human neuronal nitric-oxide synthase by an alternatively spliced 5'-untranslated region leader exon. J Biol Chem. 2003, 278 (1): 636-644. 10.1074/jbc.M209988200.
- Jin X, Turcott E, Englehardt S, Mize GJ, Morris DR: The two upstream open reading frames of oncogene mdm2 have different translational regulatory properties. J Biol Chem. 2003, 278 (28): 25716-25721. 10.1074/jbc.M300316200.
- Watatani Y, Ichikawa K, Nakanishi N, Fujimoto M, Takeda H, Kimura N, Hirose H, Takahashi S, Takahashi Y: Stress-induced translation of ATF5 mRNA is regulated by the 5'-untranslated region. J Biol Chem. 2008, 283 (5): 2543-2553. 10.1074/jbc.M707781200.
- Matveeva OV, Shabalina SA: Intermolecular mRNA-rRNA hybridization and the distribution of potential interaction regions in murine 18S rRNA. Nucleic Acids Res. 1993, 21 (4): 1007-1011. 10.1093/nar/21.4.1007.
- Zavolan M, van Nimwegen E, Gaasterland T: Splice variation in mouse full-length cDNAs identified by mapping to the mouse genome. Genome Res. 2002, 12 (9): 1377-1385. 10.1101/gr.191702.
- Nagasaki H, Arita M, Nishizawa T, Suwa M, Gotoh O: Species-specific variation of alternative splicing and transcriptional initiation in six eukaryotes. Gene. 2005, 364: 53-62. 10.1016/j.gene.2005.07.027.
- Pesole G, Mignone F, Gissi C, Grillo G, Licciulli F, Liuni S: Structural and functional features of eukaryotic mRNA untranslated regions. Gene. 2001, 276 (1–2): 73-81. 10.1016/S0378-1119(01)00674-6.
- Rogozin IB, Kochetov AV, Kondrashov FA, Koonin EV, Milanesi L: Presence of ATG triplets in 5' untranslated regions of eukaryotic cDNAs correlates with a 'weak' context of the start codon. Bioinformatics. 2001, 17 (10): 890-900. 10.1093/bioinformatics/17.10.890.
- Lynch M, Scofield DG, Hong X: The evolution of transcription-initiation sites. Mol Biol Evol. 2005, 22 (4): 1137-1146. 10.1093/molbev/msi100.
- Davuluri RV, Suzuki Y, Sugano S, Zhang MQ: CART classification of human 5' UTR sequences. Genome Res. 2000, 10 (11): 1807-1816. 10.1101/gr.GR-1460R.
- Pesole G, Gissi C, Grillo G, Licciulli F, Liuni S, Saccone C: Analysis of oligonucleotide AUG start codon context in eukariotic mRNAs. Gene. 2000, 261 (1): 85-91. 10.1016/S0378-1119(00)00471-6.
- Suzuki Y, Ishihara D, Sasaki M, Nakagawa H, Hata H, Tsunoda T, Watanabe M, Komatsu T, Ota T, Isogai T, et al: Statistical analysis of the 5' untranslated region of human mRNA using "Oligo-Capped" cDNA libraries. Genomics. 2000, 64 (3): 286-297. 10.1006/geno.2000.6076.
- Iacono M, Mignone F, Pesole G: uAUG and uORFs in human and rodent 5'untranslated mRNAs. Gene. 2005, 349: 97-105. 10.1016/j.gene.2004.11.041.
- Anant S, Mukhopadhyay D, Hirano K, Brasitus TA, Davidson NO: Apobec-1 transcription in rat colon cancer: decreased apobec-1 protein production through alterations in polysome distribution and mRNA translation associated with upstream AUGs. Biochim Biophys Acta. 2002, 1575 (1–3): 54-62.
- Araud T, Genolet R, Jaquier-Gubler P, Curran J: Alternatively spliced isoforms of the human elk-1 mRNA within the 5' UTR: implications for ELK-1 expression. Nucleic Acids Res. 2007, 35 (14): 4649-4663. 10.1093/nar/gkm482.
- Kozak M: Pushing the limits of the scanning mechanism for initiation of translation. Gene. 2002, 299 (1–2): 1-34. 10.1016/S0378-1119(02)01056-9.
- Child SJ, Miller MK, Geballe AP: Translational control by an upstream open reading frame in the HER-2/neu transcript. J Biol Chem. 1999, 274 (34): 24335-24341. 10.1074/jbc.274.34.24335.
- Shabalina SA, Zaykin DV, Gris P, Ogurtsov AY, Gauthier J, Shibata K, Tchivileva IE, Belfer I, Mishra B, Kiselycznyk C, et al: Expansion of the human mu-opioid receptor gene architecture: novel functional variants. Hum Mol Genet. 2009, 18 (6): 1037-1051. 10.1093/hmg/ddn439.
- Kleene KC: Patterns, mechanisms, and functions of translation regulation in mammalian spermatogenic cells. Cytogenet Genome Res. 2003, 103 (3–4): 217-224.
- Osada N, Hirata M, Tanuma R, Kusuda J, Hida M, Suzuki Y, Sugano S, Gojobori T, Shen CK, Wu CI, et al: Substitution rate and structural divergence of 5'UTR evolution: comparative analysis between human and cynomolgus monkey cDNAs. Mol Biol Evol. 2005, 22 (10): 1976-1982. 10.1093/molbev/msi187.
- Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB: Alternative isoform regulation in human tissue transcriptomes. Nature. 2008, 456 (7221): 470-476. 10.1038/nature07509.
- Neafsey DE, Galagan JE: Dual modes of natural selection on upstream open reading frames. Mol Biol Evol. 2007, 24 (8): 1744-1751. 10.1093/molbev/msm093.
- Cvijovic M, Dalevi D, Bilsland E, Kemp GJ, Sunnerhagen P: Identification of putative regulatory upstream ORFs in the yeast genome using heuristics and evolutionary conservation. BMC Bioinformatics. 2007, 8: 295. 10.1186/1471-2105-8-295.
- Hayden CA, Jorgensen RA: Identification of novel conserved peptide uORF homology groups in Arabidopsis and rice reveals ancient eukaryotic origin of select groups and preferential association with transcription factor-encoding genes. BMC Biol. 2007, 5: 32. 10.1186/1741-7007-5-32.
- Oyama M, Kozuka-Hata H, Suzuki Y, Semba K, Yamamoto T, Sugano S: Diversity of translation start sites may define increased complexity of the human short ORFeome. Mol Cell Proteomics. 2007, 6 (6): 1000-1006. 10.1074/mcp.M600297-MCP200.
- Artamonova II, Gelfand MS: Comparative genomics and evolution of alternative splicing: the pessimists' science. Chem Rev. 2007, 107 (8): 3407-3430. 10.1021/cr068304c.
- Ermakova EO, Nurtdinov RN, Gelfand MS: Fast rate of evolution in alternatively spliced coding regions of mammalian genes. BMC Genomics. 2006, 7: 84. 10.1186/1471-2164-7-84.
- Chen FC, Wang SS, Chen CJ, Li WH, Chuang TJ: Alternatively and constitutively spliced exons are subject to different evolutionary forces. Mol Biol Evol. 2006, 23 (3): 675-682. 10.1093/molbev/msj081.
- Xing Y, Lee C: Evidence of functional selection pressure for alternative splicing events that accelerate evolution of protein subsequences. Proc Natl Acad Sci USA. 2005, 102 (38): 13526-13531. 10.1073/pnas.0501213102.
- Ramensky VE, Nurtdinov RN, Neverov AD, Mironov AA, Gelfand MS: Positive selection in alternatively spliced exons of human genes. Am J Hum Genet. 2008, 83 (1): 94-98. 10.1016/j.ajhg.2008.05.017.
- Xing Y, Lee C: Alternative splicing and RNA selection pressure – evolutionary consequences for eukaryotic genomes. Nat Rev Genet. 2006, 7 (7): 499-509. 10.1038/nrg1896.
- Castillo-Davis CI, Mekhedov SL, Hartl DL, Koonin EV, Kondrashov FA: Selection for short introns in highly expressed genes. Nat Genet. 2002, 31 (4): 415-418.
- Kozak M: An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs. Nucleic Acids Res. 1987, 15 (20): 8125-8148. 10.1093/nar/15.20.8125.
- Geballe AP, Morris DR: Initiation codons within 5'-leaders of mRNAs as regulators of translation. Trends Biochem Sci. 1994, 19 (4): 159-164. 10.1016/0968-0004(94)90277-1.
- Matsui M, Yachie N, Okada Y, Saito R, Tomita M: Bioinformatic analysis of post-transcriptional regulation by uORF in human and mouse. FEBS Lett. 2007, 581 (22): 4184-4188. 10.1016/j.febslet.2007.07.057.
- Thanaraj TA, Stamm S, Clark F, Riethoven JJ, Le Texier V, Muilu J: ASD: the Alternative Splicing Database. Nucleic Acids Res. 2004, 32 (Database issue): D64-69. 10.1093/nar/gkh030.
- Kapranov P, Cheng J, Dike S, Nix DA, Duttagupta R, Willingham AT, Stadler PF, Hertel J, Hackermuller J, Hofacker IL, et al: RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science. 2007, 316 (5830): 1484-1488. 10.1126/science.1138341.
- Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000, 16 (6): 276-277. 10.1016/S0168-9525(00)02024-2.
- Kasprzyk A, Keefe D, Smedley D, London D, Spooner W, Melsopp C, Hammond M, Rocca-Serra P, Cox T, Birney E: EnsMart: a generic system for fast and flexible access to biological data. Genome Res. 2004, 14 (1): 160-169. 10.1101/gr.1645104.
- Ogurtsov AY, Roytberg MA, Shabalina SA, Kondrashov AS: OWEN: aligning long collinear regions of genomes. Bioinformatics. 2002, 18 (12): 1703-1704. 10.1093/bioinformatics/18.12.1703.
- Kondrashov AS, Shabalina SA: Classification of common conserved sequences in mammalian intergenic regions. Hum Mol Genet. 2002, 11 (6): 669-674. 10.1093/hmg/11.6.669.
- Shabalina SA, Ogurtsov AY, Lipman DJ, Kondrashov AS: Patterns in interspecies similarity correlate with nucleotide composition in mammalian 3'UTRs. Nucleic Acids Res. 2003, 31 (18): 5433-5439. 10.1093/nar/gkg751.
- Shabalina SA, Ogurtsov AY, Rogozin IB, Koonin EV, Lipman DJ: Comparative analysis of orthologous eukaryotic mRNAs: potential hidden functional signals. Nucleic Acids Res. 2004, 32 (5): 1774-1782. 10.1093/nar/gkh313.
- Li WH: Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J Mol Evol. 1993, 36 (1): 96-99. 10.1007/BF02407308.
- Pamilo P, Bianchi NO: Evolution of the Zfx and Zfy genes: rates and interdependence between the genes. Mol Biol Evol. 1993, 10 (2): 271-281.
- Kimura M: A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol. 1980, 16 (2): 111-120. 10.1007/BF01731581.
- Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, et al: A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA. 2004, 101 (16): 6062-6067. 10.1073/pnas.0400782101.
- Karolchik D, Baertsch R, Diekhans M, Furey TS, Hinrichs A, Lu YT, Roskin KM, Schwartz M, Sugnet CW, Thomas DJ, et al: The UCSC Genome Browser Database. Nucleic Acids Res. 2003, 31 (1): 51-54. 10.1093/nar/gkg129.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
- Ogurtsov AY, Marino-Ramirez L, Johnson GR, Landsman D, Shabalina SA, Spiridonov NA: Expression patterns of protein kinases correlate with gene architecture and evolutionary rates. PLoS ONE. 2008, 3 (10): e3599. 10.1371/journal.pone.0003599.
- Khromov-Borisov NN, Rogozin IB, Pegas Henriques JA, de Serres FJ: Similarity pattern analysis in mutational distributions. Mutat Res. 1999, 430 (1): 55-74.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.