Codon usage patterns in Chinese bayberry (Myrica rubra) based on RNA-Seq data
© Feng et al.; licensee BioMed Central Ltd. 2013
Received: 23 July 2013
Accepted: 21 October 2013
Published: 25 October 2013
Codon usage analysis has been a classical topic for decades and is significant for studies of evolution, mRNA translation, new gene discovery, and related areas. Codon usage varies among members of the plant kingdom, which makes species-specific study necessary, yet this work has mostly been limited to model organisms. Recently, the development of deep sequencing, especially RNA-Seq, has made it possible to carry out such studies in non-model species.
RNA-Seq data of Chinese bayberry were analyzed to investigate the bias of codon usage and codon pairs. High frequency codons (AGG, GCU, AAG and GAU), as well as low frequency ones (NCG and NUA codons) were identified, and 397 high frequency codon pairs were observed. Meanwhile, 26 preferred and 141 avoided neighboring codon pairs were also identified, which showed more significant bias than the same pairs with one or more intervening codons. Codon patterns were also analyzed at the plant kingdom, organism and gene levels. Changes during plant evolution were evident using RSCU (relative synonymous codon usage), which was even more significant than GC3s (GC content of 3rd synonymous codons). Nine GO categories were differentially and independently influenced by CAI (codon adaptation index) or GC3s, especially in the 'Molecular function' category. Within a gene, the average CAI increased from 0.720 to 0.785 in the first 50 codons, and then more slowly thereafter. Furthermore, the preferred as well as avoided codons at the position just following the start codon AUG were identified and discussed in relation to the key positions in Kozak sequences.
A comprehensive codon usage table and a number of high-frequency codon pairs were established. Bias in codon usage as well as in neighboring codon pairs was observed, and the significance of this in avoiding DNA mutation, increasing protein production and regulating protein synthesis rate was proposed. Codon usage patterns at three levels were revealed, and their significance for plant evolution analysis, gene function classification, and protein translation start site prediction was discussed. This work promotes the study of codon biology, and provides some reference for analysis and comprehensive application of RNA-Seq data from other non-model species.
Keywords: RNA-Seq; Myrica rubra; Chinese bayberry; Codon usage; Codon pairs; Plant evolution; Gene ontology classification; Translation rate; Gene discovery
Triplet codons are central to all biological kingdoms, acting as basic coding units or indispensable recognition components in mRNAs to either code for a particular amino acid or cause initiation or termination of a protein chain. Most amino acids are encoded by multiple synonymous codons, ranging from two to six; the exceptions are Met and Trp. Although synonymous mutations are silent in protein sequences according to the central dogma, synonymous codon bias exists widely within and between genomes. As a result, the study of codon usage patterns is beneficial for a better understanding of molecular biology and evolution, mRNA translation, design of transgenes, new gene discovery, and other biological applications, and has been investigated over several decades [3–6].
The foundation of codon biology is based on the study of full length ORF (open reading frame) sequences from a range of species such as Caenorhabditis, Drosophila, Arabidopsis, Populus, apple, kiwifruit, and melon, which have been obtained mainly from EST technology in recent decades. To date, with the rapid development of deep sequencing technology, large amounts of sequence data have been generated through genome sequencing or RNA-Seq, providing data for a new focus on codon usage patterns [1, 2, 4, 6]. The extensive sequence research in plants has mainly focused on model plant genomes or millions of ESTs, from plants such as Arabidopsis [3, 11, 12], rice [12–15], Populus [16–19] and citrus. Similar studies in non-model plants have been neglected, despite the existence of sequence data assembled from RNA-Seq. Further research and analysis in this area can aid the understanding and breeding of crops.
Chinese bayberry (Myrica rubra Sieb. and Zucc.) is an economically important subtropical fruit crop native to Asian countries. The fruit is popular throughout China and overseas for its appealing color, distinctive flavor and various bioactive compounds [22, 23]. Physiological studies on this plant have been carried out extensively during the last ten years [24–26], and recent research at the molecular level has also been initiated [27–30], especially on the spatio-temporal expression, transcriptional regulation and functional verification of genes related to anthocyanin [31–34]. However, there is no report on codon usage patterns in Chinese bayberry.
The RNA-Seq project on Chinese bayberry (Accession: PRJNA77861) has been completed, and the data (Accession: SRX176533) were made public in our previous study. In the present work, bayberry codon usage was calculated from full length sequences assembled by RNA-Seq. Furthermore, related patterns were revealed mainly through RSCU (relative synonymous codon usage) and CAI (codon adaptation index) at three levels, i.e., across different groups in the plant kingdom, different genes in Chinese bayberry, and different positions in the genes. These analyses will help us to understand the patterns in Chinese bayberry, to improve the research on codon usage in plant biology, and to explore the potential for applying deep sequencing, especially in non-model plants.
Results and discussion
Codon usage in Chinese bayberry
Codon usage analysis in Chinese bayberry was based on 1,066 full-length ORF sequences after layers of filtering of 31,665 mRNAs, which were assembled from our previous RNA-Seq data. The overall codon usage table was created from 354,551 codons, with each codon, excepting stop codons, represented at least 2,216 times (Additional file 1). This amount of data is larger than those used in studies of Populus, apple and kiwifruit.
The overall GC content of the 354,551 codons in the study is 0.477, but it varies among codon positions: it is highest at GC1 (GC content of 1st nucleotide in codon, 0.536), lowest at GC2 (GC content of 2nd nucleotide in codon, 0.411), and intermediate at GC3 (GC content of 3rd nucleotide in codon, 0.484), which is consistent with observations in other plants, such as citrus, apple, woodland strawberry and Arabidopsis thaliana (Additional file 2). The GC3s content (GC content of 3rd synonymous codons) of Chinese bayberry is 0.447, slightly lower than the GC3 content, because the Met and Trp codons (AUG and UGG, both with G as the 3rd nucleotide) are included in the calculation of GC3 but not GC3s. The GC3s content of Chinese bayberry is similar to the range observed in Eudicotyledons (Additional file 2).
RSCU values were calculated for all 64 codons. AGG and GCU, encoding Arg and Ala, had the highest values (1.67 and 1.53, respectively). AAG and GAU, encoding Lys and Asp, were used more frequently than their synonymous alternatives (63.0% and 61.5% of the codons for the corresponding amino acid, respectively). These four codons have been named high-frequency codons in our study (Figure 1).
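The position-specific GC contents described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline (which used codonW, Excel and in-house PERL scripts); the function name `gc_by_position` and the toy sequence are hypothetical, and the sketch simply counts every codon it is given.

```python
def gc_by_position(orfs):
    """Return (GC1, GC2, GC3): the G/C fraction at each codon position."""
    gc = [0, 0, 0]
    total = 0
    for seq in orfs:
        seq = seq.upper()
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            total += 1
            for pos in range(3):
                if codon[pos] in "GC":
                    gc[pos] += 1
    if total == 0:
        raise ValueError("no complete codons found")
    return tuple(count / total for count in gc)


# Toy example (made-up sequence): codons AUG, GCU, AAG, UGA.
gc1, gc2, gc3 = gc_by_position(["AUGGCUAAGUGA"])
```

GC3s would additionally restrict the third-position count to codons that have synonymous alternatives, which is why it excludes AUG and UGG.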
Four NCG codons in Chinese bayberry had quite low RSCU (0.44, 0.51, 0.54 and 0.61), which is beneficial for avoiding possible mutation caused by DNA methylation. Methylated cytosine (C) in the CG dinucleotide is more easily deaminated into thymine (T), and the G in the 3rd codon position is wobbly, so species with a high level of DNA methylation tend to avoid NCG codons [7, 10]. The low RSCU of NCG codons indicates that Chinese bayberry may be a species with a relatively high methylation level, which is supported by the NCG:NCC ratio. This index has been widely used to estimate CpG suppression, and to reflect the methylation level in mRNA coding sequences, especially in Eudicotyledons, such as Populus, apple, kiwifruit and melon. Species with a low methylation level have a relatively higher NCG:NCC ratio, such as Arabidopsis thaliana (0.921) and A. lyrata (0.93), whereas those with a high methylation level have a relatively lower value, such as grape (0.414) and Populus (0.463), and species with intermediate methylation have intermediate values, such as apple (0.639) and tomato (0.634) (Additional file 2). Chinese bayberry has a relatively low NCG:NCC ratio (0.552), which suggests that it is an organism with a relatively high methylation level. Moreover, a recent report showed that methylation changes have a regulatory effect on tomato ripening, so methylation may also play an important regulatory role in Chinese bayberry.
Four NUA codons also have low RSCU (0.5, 0.51, 0.52, and 0.66) (Figure 1), which was also observed in other plant species. This phenomenon can be explained by the hypothesis that reducing UA may increase protein production via inhibition of mRNA degradation.
As to stop codons, UGA was the most frequently used, with an RSCU of 1.39, UAG was the least used, with an RSCU of 0.67, and UAA was intermediate, with an RSCU of 0.94, which coincided with the overall rules discerned for plants.
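As a rough illustration of the NCG:NCC index discussed above, the following Python sketch tallies codons and divides the pooled count of the four NCG codons by that of the four NCC codons. The helper names `count_codons` and `ncg_ncc_ratio` are hypothetical, and the example sequence is made up; the published values come from the authors' own tools.

```python
from collections import Counter


def count_codons(orfs):
    """Tally in-frame codons over a collection of ORF sequences."""
    counts = Counter()
    for seq in orfs:
        seq = seq.upper().replace("T", "U")  # accept DNA or RNA alphabets
        for i in range(0, len(seq) - len(seq) % 3, 3):
            counts[seq[i:i + 3]] += 1
    return counts


def ncg_ncc_ratio(counts):
    """Ratio of pooled NCG codon counts to pooled NCC codon counts."""
    ncg = sum(counts[n + "CG"] for n in "ACGU")
    ncc = sum(counts[n + "CC"] for n in "ACGU")
    return ncg / ncc if ncc else float("nan")


# Toy example (made-up ORF): one NCG codon and four NCC codons.
toy_counts = count_codons(["AUGGCGGCCGCCCCCACCUAA"])
toy_ratio = ncg_ncc_ratio(toy_counts)
```

A ratio well below 1 indicates that NCG codons are used less often than NCC codons, the pattern interpreted in the text as CpG suppression.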
Codon pairs in Chinese bayberry
Moreover, among 141 avoided neighboring codon pairs, 88 pairs (62.4%) showed a pattern where the former codon ended with C and the latter codon started with G (Additional file 3), which may relate to a relatively higher methylation level of bayberry DNA, as mentioned above. On the other hand, 37 pairs (26.2%) had UA at the junction (Additional file 3), the avoidance of which may also increase the rate of protein production. These two types were also underrepresented compared to others in overall neighboring codon pairs (Figure 2J), as were the mononucleotide repeats GGGGGG, CCCCCC and UUUUUU. The list of avoided codon pairs in Chinese bayberry is a little different from that of other species studied [39, 40] and could play an important role in the design of exogenous transgenes, especially for Ser, Arg, Leu, Phe and Gly codons.
It is widely reported that changing synonymous codons can significantly influence translational efficiency [2, 41, 42] and, indeed, a recent case in tomato showed that codon optimization of the MIR gene could enhance its expression. Thus, the large-scale identification of high-frequency codons in Chinese bayberry, especially the high-frequency codon pairs (Additional file 4), could be used as a reference in the design of exogenous transgenes. Moreover, codon optimization based on the frequency of codon pairs, rather than just high frequency codons, may further promote translational efficiency.
Codon usage patterns across the plant kingdom
In recent years, many plant genome sequencing projects have been completed and JGI (DOE Joint Genome Institute, version 9), the biggest integrated genome data platform, has a uniform analysis and storage format. In this study, the annotation data of 26 plants, consisting of 5 Chlorophytes (Algae), 1 Bryophyte, 1 Pteridophyte, 5 Monocotyledons and 14 Eudicotyledons, were downloaded and used for codon analysis.
For each plant genome, thousands of full-length ORFs (5,986 to 64,902) and millions of synonymous codons (2,587,991 to 27,829,277) were obtained, and the corresponding GC1, GC2 and GC3 contents were calculated (Additional file 2). In all 27 species (including Chinese bayberry), GC1 content was much larger than GC2 content, with the difference ranging from 0.096 (Medicago truncatula) to 0.155 (Micromonas pusilla RCC299). GC3 content was a little higher than GC1 content in the Gymnosperm, Monocotyledons and Chlorophyte species in this study, while GC3 content was similar to GC2 content in Physcomitrella patens and 15 Eudicotyledons. This indicated that some pressure existed to select G/C in position 1 and T/A in position 2, with wide variation in position 3.
Of course, this evaluation indicator needs further development, because Physcomitrella patens (Bryophytes) and Selaginella moellendorffii (Pteridophytes) have GC3s values similar to those of Linum usitatissimum (Eudicotyledons) and Brachypodium distachyon (Monocotyledons), respectively. However, it is possible that the clustering will improve when genome sequencing data for more species of Bryophytes and Pteridophytes are available and can be applied in this analysis (Additional file 2, Figure 5A).
Codon usage patterns across Chinese bayberry transcripts
Each Chinese bayberry full length ORF sequence was analyzed to discover the patterns of codon usage within a single RNA. Based on RSCU for 59 synonymous codons among 1,066 ORF sequences, correspondence analysis of codon usage (RSCU) and sequences was performed using PCA (Additional file 5). The 30 A/U-ending codons and 29 G/C-ending codons were separated into two groups with respect to the first two axes. Meanwhile, the 1,066 ORF sequences with different GC3s content could also be separated mainly along the first axis. This result indicates some correlation between codon usage and GC3s among Chinese bayberry ORF sequences, and similar results have also been reported in other plants, such as rice. However, the percentage of contribution of the axes is somewhat low, and may be improved if the genome sequences of Chinese bayberry become available in future.
Moreover, CAI, another important index of codon usage bias, was introduced to estimate synonymous codon usage bias for each ORF sequence, and a strong negative correlation (r = -0.567, p = 1.03E-91) between CAI and GC3s was observed (Figure 6B).
Codon usage patterns across different positions in each gene
The average CAI of the codon following the start codon in all the 1,066 ORFs was 0.720, much smaller than at other positions (Figure 8A). In further analysis, the nucleotides (or codons) following the start codon or non-start codon AUG were compared to understand the significance of this phenomenon (Figure 8C - F). It was found that 'G' is the preferred nucleotide following the start codon AUG (Figure 8E). This 'G', together with 'A/G' just preceding the start codon AUG, are the key positions in the Kozak sequence for identification of the translation start site (Figure 8G), while a 'C' following the start codon AUG tends to be less favored (Figure 8E), as our data confirm, with 15 of 16 CNN codons at this position under-represented compared to other positions. However, CCG occurred at 2.5 fold its expected frequency. In addition, the three other NCG codons are also over-represented (1.63, 4.03 and 11.79 fold) following the start codon AUG. On the whole, the NCG:NCC ratio at this position is 2.23, four fold the overall ratio, suggesting that this site may be one of the most important for methylation regulation in this species. Meanwhile, 8 preferred and 16 avoided codons were observed for the codon following the start codon AUG (Figure 8C). CCG, one of the 8 preferred codons, as mentioned above, has the lowest expected frequency among codons encoding Pro but was over-abundant, while the other three codons encoding Pro occurred less frequently than expected. Similarly, two out of six Ser codons with low expected frequency were over-represented as well, while two out of three Ile codons with high expected frequency were under-represented. At the amino acid level, it was found that Ala was the most preferred amino acid following the initiating Met, while many other amino acids, such as Cys, Ile, Leu, His, Met and Trp, tended to be less favored (Additional file 3). In contrast, no bias was observed in the codons following internal AUG codons (Figure 8D).
This finding could be further used for construction of an algorithm to predict the TSS (translation start site) of a gene, and therefore could aid gene characterization in Chinese bayberry, and provide a reference for other species.
A comprehensive codon usage table for Chinese bayberry was established and the numbers of high-frequency codon pairs were analyzed. Underrepresentation of NCG and NUA codons was observed, which may have the function of avoiding the mutation caused by DNA methylation and increasing protein production, respectively. Prominent bias in neighboring codon pairs was also found, indicating possible mechanistic significance in regulating protein synthesis rate. Codon usage patterns were comprehensively analyzed at plant kingdom, organism, and gene levels. It was found that RSCU is strongly related to plant evolution, and is even more significant than GC3s. At the species level, nine GO categories were differentially and independently influenced by CAI (codon adaptation index) or GC3s. Within a gene, CAI increased from the beginning to the end of an ORF, especially during the first 50 codons, which may be beneficial for translation efficiency. The codons following the start codon AUG have the lowest CAI and greatest bias, which is related to the translation start recognition motif of the Kozak sequence (A/G NNAUG G). This feature may play an important role in prediction of the translation start site and discovery of new genes (or transcripts). These findings established knowledge of the codon patterns of Chinese bayberry, and provided additional information for the study of codon biology and a reference for comprehensive analysis and application of RNA-Seq data to other non-model species.
Sequence data collection, filtering and mining
The dataset comprises two main parts. The first is the RNA-Seq data of Chinese bayberry, which were downloaded from the NCBI SRA (Sequence Read Archive) database (http://www.ncbi.nlm.nih.gov/Traces/sra/, Accession No.: SRX176533) and then assembled and annotated as in our previous work. The second is the protein-coding sequences (*_cds.fa.gz and *_protein.fa.gz) from 26 published plant genomes, which were downloaded from JGI (ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0) on Jan 16th, 2013.
Full length coding sequences were identified as those beginning with an AUG start codon and ending with a UAA, UAG or UGA stop codon. From these, low quality sequences, i.e., sequences no longer than 300 bp or those having an internal stop codon, were excluded. Additional filtering steps were then used to remove low quality sequences mined from bayberry RNA-Seq. Sequences containing uncertain nucleotides, those encoding low abundance genes (RPKM, reads per kb per million reads, lower than 10), and those obviously incomplete or too long (less than 95% or over 105% of the length of the top hit homologous sequence from other plants using BLASTx with an e-value cutoff of 1e-5) were excluded. All the above procedures were performed with Microsoft Excel 2010 and some PERL scripts written in-house.
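The filtering rules above can be expressed as a pair of predicates. The sketch below is a hedged reading of those rules in Python rather than the authors' Excel and PERL workflow; the function names, the requirement that the length be a multiple of three, and the `hit_len_ratio` argument (ORF length divided by the length of the top BLASTx hit, however that is measured) are assumptions made for illustration.

```python
STOPS = {"UAA", "UAG", "UGA"}


def is_full_length_orf(seq, min_len=300):
    """AUG ... stop, longer than min_len nt, no ambiguity, no internal stop."""
    seq = seq.upper().replace("T", "U")
    # Assumption: a usable ORF has a length that is a multiple of three.
    if len(seq) <= min_len or len(seq) % 3 != 0:
        return False
    if set(seq) - set("ACGU"):  # uncertain nucleotides such as N
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "AUG" or codons[-1] not in STOPS:
        return False
    return not any(c in STOPS for c in codons[1:-1])


def passes_bayberry_filters(seq, rpkm, hit_len_ratio, min_rpkm=10.0):
    """Extra RNA-Seq filters: expression level and length versus top hit."""
    return (
        is_full_length_orf(seq)
        and rpkm >= min_rpkm
        and 0.95 <= hit_len_ratio <= 1.05
    )


# Toy example (made-up sequence of 306 nt).
toy_orf = "AUG" + "GCU" * 100 + "UAA"
keep = passes_bayberry_filters(toy_orf, rpkm=12.0, hit_len_ratio=1.0)
```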
The first 50 codons (excluding the start codon) and the last 50 codons (excluding the stop codon) in each gene obtained via filtering were named as cod_1 to cod_50, and cod_-1 to cod_-50, respectively. Codons with the same name were mixed, stored in Fasta format, and then the new rearranged sequences were structured to represent the situation of codon distribution at different positions. All the above procedures were performed using Microsoft Excel 2010 and PERL scripts written in-house.
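One possible way to build the position-wise pooled sequences described above is sketched below in Python. The function name `pool_codons_by_position` is hypothetical, and the sketch returns the pooled sequences as a dictionary rather than writing Fasta files; the toy ORFs are made up and much shorter than real ones.

```python
def pool_codons_by_position(orfs, window=50):
    """Pool codons across ORFs by position: cod_1.. from the 5' end, cod_-1.. from the 3' end."""
    pools = {}
    for seq in orfs:
        codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
        body = codons[1:-1]  # drop the start codon and the stop codon
        for k in range(1, min(window, len(body)) + 1):
            pools.setdefault(f"cod_{k}", []).append(body[k - 1])
            pools.setdefault(f"cod_-{k}", []).append(body[-k])
    # Each pooled sequence concatenates the codons found at that position.
    return {name: "".join(codons) for name, codons in pools.items()}


# Toy example with two short made-up ORFs and a window of 2 codons.
pooled = pool_codons_by_position(["AUGGCUAAGGAUUGA", "AUGCCGAAAGGGUAA"], window=2)
```

Each pooled sequence can then be fed to the same codon usage tools as an ordinary ORF, which is how a per-position index such as CAI can be obtained.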
After filtering and rearranging, sequences were used to calculate the basic indices of codon usage, such as the nucleotide composition at the 3rd codon position, the codon number, RSCU, and ENC, using codonW 1.4.2 (http://codonw.sourceforge.net). RSCU is calculated according to the formula described in Sharp and Li. Codons with RSCU over 1.0 occur at high frequency, and the larger the number the more significant the bias, while numbers below 1.0 indicate the opposite.
ENC is calculated according to the formula described in Wright, and it is a measure of the unevenness of use of codons for all the 20 amino acids across ORFs, with a value between 20 and 61. The value is 20 when only one codon is used for each amino acid, while it is 61 when all the codons are equally used for each amino acid.
CAI is calculated using the formula described in Sharp and Li, and is widely applied in estimating codon usage bias, with a value between 0 and 1. The larger the value the greater the degree of positive selection, while the lower the value the greater the degree of negative bias.
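For readers who want to see the RSCU definition in executable form, the following Python sketch computes it from a table of codon counts: each count is divided by the mean count of its synonymous family. The study itself used codonW; the names `CODON_TO_AA` and `rscu` are hypothetical, and the counts in the example are toy numbers chosen so that a two-codon family with a 63% share gives 1.26.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def rscu(counts):
    """RSCU = observed count / mean count over the codon's synonymous family."""
    families = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid (or stop) not observed at all
        mean = total / len(codons)
        for c in codons:
            values[c] = counts.get(c, 0) / mean
    return values


# Toy counts: Lys with a 63:37 split, Ala with a 4:2:1:1 split.
toy = rscu({"AAG": 63, "AAA": 37, "GCU": 4, "GCC": 2, "GCA": 1, "GCG": 1})
```

Because stop codons form their own three-member family in this table, the three stop-codon RSCU values sum to 3, in line with the values reported for UGA, UAG and UAA.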
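For reference, the formula in Wright's paper is usually written as below; the notation is an assumption of this note rather than something stated in the present article. For an amino acid with k synonymous codons, n is the number of codons observed for it and p_i is the relative frequency of its i-th synonymous codon; the bars denote averages of F over the amino acids in each degeneracy class (9 two-fold, 1 three-fold, 5 four-fold and 3 six-fold).

```latex
F = \frac{n \sum_{i=1}^{k} p_i^{2} - 1}{n - 1},
\qquad
\mathrm{ENC} = 2 + \frac{9}{\bar{F}_{2}} + \frac{1}{\bar{F}_{3}} + \frac{5}{\bar{F}_{4}} + \frac{3}{\bar{F}_{6}}
```

With a single codon used per amino acid every F equals 1 and ENC = 2 + 9 + 1 + 5 + 3 = 20; with uniform usage F equals 1/k in each class and ENC = 2 + 18 + 3 + 20 + 18 = 61, matching the two limits given above.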
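A compact sketch of the CAI calculation follows: each codon receives a relative adaptiveness w (its count in a reference set divided by the count of the most used synonymous codon), and CAI is the geometric mean of w over the codons of a gene. Which reference set the authors supplied to their software is not stated here, so the reference counts below are toy values; the function names are hypothetical, and skipping Met, Trp, stop codons and codons absent from the reference is a simplifying assumption.

```python
import math
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def relative_adaptiveness(ref_counts):
    """w = codon count / count of the most frequent synonymous codon."""
    best = {}
    for codon, n in ref_counts.items():
        aa = CODON_TO_AA[codon]
        best[aa] = max(best.get(aa, 0), n)
    return {c: n / best[CODON_TO_AA[c]] for c, n in ref_counts.items() if n > 0}


def cai(seq, weights):
    """Geometric mean of w over the informative codons of one ORF."""
    logs = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        aa = CODON_TO_AA.get(codon)
        # Skip Met, Trp, stops, and codons without a reference weight.
        if aa in (None, "*", "M", "W") or codon not in weights:
            continue
        logs.append(math.log(weights[codon]))
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Toy reference counts for the Lys and Asp families only.
w = relative_adaptiveness({"AAG": 60, "AAA": 40, "GAU": 60, "GAC": 40})
score = cai("AUGAAGGACUAA", w)
```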
Identification of high-frequency codons (or codon pairs)
From the calculation of RSCU of all the full length protein-coding sequences, codons with RSCU over 1.5, or those accounting for more than 60% of the synonymous codons for the corresponding amino acid, were selected and defined as high-frequency codons [19, 52]. The concept of high-frequency codon pairs is as follows. For two neighboring amino acids, codon X encodes the first amino acid, and candidate codons Y1 to Yi encode the second amino acid, where i is the number of synonymous codons for the second amino acid. If the occurrence of the neighboring codon pair XYj (1 ≤ j ≤ i) is more than 1.5 fold the average occurrence of the pairs XY1 to XYi, or accounts for over 60% of the total occurrence of the pairs XY1 to XYi, calculated from full length ORFs excluding the first and stop codons, then XYj is called a high-frequency codon pair. Identification of the high-frequency codon pairs was performed using PERL scripts written in-house and the software Cytoscape (version 3.0.1, http://www.cytoscape.org/).
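The two thresholds above apply in the same way to single codons and to codon pairs, since in both cases a count is compared with the average and the total over a set of synonymous alternatives. A minimal Python sketch under that reading is shown below; the function name `high_frequency` is hypothetical, "over 1.5 fold" is interpreted as more than 1.5 times the family average, and all counts are toy numbers.

```python
def high_frequency(family_counts, fold=1.5, share=0.6):
    """Return the members whose count exceeds fold x the family mean or share of the family total."""
    total = sum(family_counts.values())
    if total == 0:
        return []
    mean = total / len(family_counts)
    return [
        name
        for name, count in family_counts.items()
        if count / mean > fold or count / total > share
    ]


# Single codons: toy counts for the two Lys codons (63% versus 37%).
lys_hits = high_frequency({"AAG": 63, "AAA": 37})

# Codon pairs: toy counts of a fixed first codon X = GCU followed by each Arg codon.
pair_hits = high_frequency(
    {"GCU-AGG": 167, "GCU-AGA": 133, "GCU-CGU": 75,
     "GCU-CGC": 75, "GCU-CGA": 75, "GCU-CGG": 75}
)
```

In the first call AAG qualifies through the 60% rule even though its ratio to the mean is only 1.26; in the second, the AGG pair qualifies through the 1.5 fold rule with a ratio of 1.67.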
Identification of preferred and avoided codons (or codon pairs)
The expected frequency is the ratio of the total occurrence of a certain codon to the total occurrence of all 61 codons (excluding the stop codons and the AUG when serving as the start codon), calculated from the ORFs (excluding the first and the last codons). The observed frequency is the ratio of the actual occurrence of a certain codon in a certain position of all ORFs to the total occurrence of all 61 codons in that position. The frequencies of the codons following the start codon AUG, and of those following non-start internal AUG codons, were also calculated. Differences with p-values less than 0.01 were considered statistically significant, and the ratio of observed frequency to expected frequency (log2), with a ratio cutoff of ±1 (2 fold changes), was the standard used to identify the preferred or avoided codons. The p-value was calculated following the formula described in Audic and Claverie via our previous PERL program.
The expected frequency of each of the 3721 (61*61) codon pairs (both the neighboring codon pairs and those separated by several intervening codons) is the product of the corresponding expected frequencies of the two codons. The observed frequency of a codon pair is the ratio of the occurrence of that pair to the occurrence of all 3721 codon pairs, calculated from full length ORFs excluding the first and stop codons, as mentioned above. The parameters for screening preferred and avoided codon pairs were the same as described above. All procedures were performed using Microsoft Excel 2010 and PERL scripts written in-house.
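The observed versus expected comparison for codon pairs can be sketched as follows. This Python example covers only the log2 ratio and the ±1 cutoff; the significance test of Audic and Claverie is deliberately left out, so it is not a full reproduction of the screening. The names `pair_log2_ratios` and `classify`, and the `gap` parameter for intervening codons, are hypothetical.

```python
import math
from collections import Counter


def pair_log2_ratios(orfs, gap=0):
    """log2(observed / expected) for codon pairs separated by `gap` codons."""
    codon_counts = Counter()
    pair_counts = Counter()
    for seq in orfs:
        codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
        body = codons[1:-1]  # exclude the first codon and the stop codon
        codon_counts.update(body)
        pair_counts.update(zip(body, body[gap + 1:]))
    n_codons = sum(codon_counts.values())
    n_pairs = sum(pair_counts.values())
    ratios = {}
    for (first, second), n in pair_counts.items():
        # Expected pair frequency = product of the two single-codon frequencies.
        expected = (codon_counts[first] / n_codons) * (codon_counts[second] / n_codons)
        observed = n / n_pairs
        ratios[(first, second)] = math.log2(observed / expected)
    return ratios


def classify(log2_ratio, cutoff=1.0):
    """Apply the 2 fold cutoff (p-value filtering is not included here)."""
    if log2_ratio >= cutoff:
        return "preferred"
    if log2_ratio <= -cutoff:
        return "avoided"
    return "neutral"


# Toy ORF (made-up): start codon, four GCU-AAG repeats, stop codon.
toy_ratios = pair_log2_ratios(["AUG" + "GCUAAG" * 4 + "UAA"])
label = classify(toy_ratios[("GCU", "AAG")])
```

Calling the same function with `gap=1`, `gap=2` and so on gives the ratios for pairs with intervening codons, which is the comparison used in the text to show that neighboring pairs carry the strongest bias.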
Cluster and PCA analysis
The RSCU values of the 59 synonymous codons from Chinese bayberry and 26 other plants were clustered by complete linkage clustering with Euclidean distance using MeV v4.8.1 (http://sourceforge.net/projects/mev-tm4/files/mev-tm4/) software. PCA of these 27 plants was performed based on the RSCU of the 59 synonymous codons; the variance contribution ratio, accumulated variance contribution ratio and quality of representation (cos2α, where α is the angle between the spot vector and the axial vector) were calculated using MATLAB (version 7.0) and plotted with OriginLab Origin (version 8.0, Microcal Software Inc., Northampton, MA, USA). The RSCU of the 59 synonymous codons from all the bayberry full length ORFs (59 spots) was reduced from 1,066 dimensions (1,066 ORFs) to two principal components by PCA, while for the transposed matrix, all the full length ORFs with RSCU values of the 59 synonymous codons (1,066 spots) were reduced from 59 dimensions (59 codons) to two principal components, using the same procedure.
Gene ontology annotation
Gene Ontology annotation of full length ORFs was performed using Blast2GO (http://www.blast2go.com), and GO classifications were compared among different groups according to CAI or GC3s variation using WEGO (http://wego.genomics.org.cn/cgi-bin/wego/index.pl).
Identification of 13 consensus nucleotides (N9AUGN)
The 13 nucleotides (including 9 nucleotides before the start codon, 3 nucleotides of the start codon and 1 nucleotide following the start codon) of each mRNA were picked out via PERL scripts written in-house, where N represents an unknown nucleotide. The consensus motif was created using WEBLOGO (http://weblogo.berkeley.edu).
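Extracting the 13-nucleotide window around a start codon is straightforward once the position of the AUG is known. The Python sketch below is illustrative and not the in-house PERL script; the function name `start_context` is hypothetical, and padding with N when the 5' UTR is shorter than 9 nucleotides is an assumption about how unknown positions were handled.

```python
def start_context(mrna, start_index, upstream=9):
    """Return the window N9-AUG-N around the start codon at start_index (0-based)."""
    seq = mrna.upper().replace("T", "U")
    if seq[start_index:start_index + 3] != "AUG":
        raise ValueError("start_index does not point at an AUG")
    # Assumption: positions missing from the transcript are padded with N.
    left = seq[max(start_index - upstream, 0):start_index].rjust(upstream, "N")
    right = seq[start_index:start_index + 4].ljust(4, "N")
    return left + right


# Toy transcripts (made-up): a long 5' UTR and a short one.
full_window = start_context("GGCCAAGCCACCAUGGCU", 12)
padded_window = start_context("ACCAUGG", 3)
```

The resulting fixed-length windows can be stacked, one per transcript, and passed to a sequence logo tool to obtain the consensus motif.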
The correlation analysis between CAI and GC3s, ENC and GC3s, ENC and CAI, or between CAI and codon position, was performed and drawn by OriginLab Origin (version 8.0, Microcal Software Inc., Northampton, MA, USA) software. Meanwhile, the corresponding parameters, such as linear regression equation, r value, p value, were calculated using MATLAB (version 7.0). Other simple statistical analyses were performed mainly in Microsoft Excel 2010.
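The correlation coefficients reported in this study were obtained with MATLAB; for readers without it, a Pearson r can be computed with a few lines of plain Python, as sketched below. The function name `pearson_r` and the example values are hypothetical, and no p-value is computed here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy values (made-up): per-ORF GC3s and CAI moving in opposite directions.
gc3s_values = [0.30, 0.40, 0.50, 0.60]
cai_values = [0.82, 0.78, 0.75, 0.70]
r = pearson_r(gc3s_values, cai_values)
```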
CAI: Codon adaptation index
ENC: Effective number of codons
GC1: GC content of 1st nucleotide in codon
GC2: GC content of 2nd nucleotide in codon
GC3: GC content of 3rd nucleotide in codon
GC3s: GC content at 3rd nucleotide of synonymous codon
JGI: DOE joint genome institute
ORF: Open reading frame
PCA: Principal component analysis
RPKM: Reads per kb per million reads
RSCU: Relative synonymous codon usage
TSS: Translation start site.
We would like to thank Prof. Don Grierson from the University of Nottingham (UK) for his kind discussion, suggestions, and efforts in language editing. We are also grateful to Dr. Zhang-jun Fei from Cornell University (USA) for his valuable suggestions on data analysis. This research was supported by the National High Technology Research and Development Program of China (2013AA102606), the Special Scientific Research Fund of Agricultural Public Welfare Profession of China (201203089–2), the Program of International Science and Technology Cooperation (2011DFB31580), the Science and Technology Project of Zhejiang Province (2012C12904-3), the 111 project, and the Fundamental Research Funds for the Central Universities.
- Novoa EM, Pavon-Eternod M, Pan T, Ribas de Pouplana L: A role for tRNA modifications in genome structure and codon usage. Cell. 2012, 149 (1): 202-213. 10.1016/j.cell.2012.01.050.
- Plotkin JB, Kudla G: Synonymous but not the same: the causes and consequences of codon bias. Nat Rev Genet. 2011, 12 (1): 32-42. 10.1038/nrg2899.
- Duret L, Mouchiroud D: Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc Natl Acad Sci U S A. 1999, 96 (8): 4482-4487. 10.1073/pnas.96.8.4482.
- Hershberg R, Petrov DA: Selection on codon bias. Annu Rev Genet. 2008, 42: 287-299. 10.1146/annurev.genet.42.110807.091442.
- Murray EE, Lotzer J, Eberle M: Codon usage in plant genes. Nucleic Acids Res. 1989, 17 (2): 477-498. 10.1093/nar/17.2.477.
- Shabalina SA, Spiridonov NA, Kashina A: Sounds of silence: synonymous nucleotides as a key to biological regulation and complexity. Nucleic Acids Res. 2013, 41 (4): 2073-2094. 10.1093/nar/gks1205.
- Sterky F, Bhalerao RR, Unneberg P, Segerman B, Nilsson P, Brunner AM, Charbonnel-Campaa L, Lindvall JJ, Tandre K, Strauss SH: A Populus EST resource for plant functional genomics. Proc Natl Acad Sci U S A. 2004, 101 (38): 13951-13956. 10.1073/pnas.0401641101.
- Newcomb RD, Crowhurst RN, Gleave AP, Rikkerink EH, Allan AC, Beuning LL, Bowen JH, Gera E, Jamieson KR, Janssen BJ: Analyses of expressed sequence tags from apple. Plant Physiol. 2006, 141 (1): 147-166. 10.1104/pp.105.076208.
- Crowhurst RN, Gleave AP, MacRae EA, Ampomah-Dwamena C, Atkinson RG, Beuning LL, Bulley SM, Chagne D, Marsh KB, Matich AJ: Analysis of expressed sequence tags from Actinidia: applications of a cross species EST database for gene discovery in the areas of flavor, health, color and ripening. BMC Genomics. 2008, 9: 351. 10.1186/1471-2164-9-351.
- Gonzalez-Ibeas D, Blanca J, Roig C, Gonzalez-To M, Pico B, Truniger V, Gomez P, Deleu W, Cano-Delgado A, Arus P: MELOGEN: an EST database for melon functional genomics. BMC Genomics. 2007, 8: 306. 10.1186/1471-2164-8-306.
- Qiu S, Zeng K, Slotte T, Wright S, Charlesworth D: Reduced efficacy of natural selection on codon usage bias in selfing Arabidopsis and Capsella species. Genome Biol Evol. 2011, 3: 868-880. 10.1093/gbe/evr085.
- Mukhopadhyay P, Basak S, Ghosh TC: Differential selective constraints shaping codon usage pattern of housekeeping and tissue-specific homologous genes of rice and arabidopsis. DNA Res. 2008, 15 (6): 347-356. 10.1093/dnares/dsn023.
- Wang HC, Hickey DA: Rapid divergence of codon usage patterns within the rice genome. BMC Evol Biol. 2007, 7 (Suppl 1): S6. 10.1186/1471-2148-7-S1-S6.
- Liu QP: Mutational bias and translational selection shaping the codon usage pattern of tissue-specific genes in rice. PLoS One. 2012, 7 (10): e48295. 10.1371/journal.pone.0048295.
- Mukhopadhyay P, Basak S, Ghosh TC: Nature of selective constraints on synonymous codon usage of rice differs in GC-poor and GC-rich genes. Gene. 2007, 400 (1–2): 71-81.
- Ingvarsson PK: Gene expression and protein length influence codon usage and rates of sequence evolution in Populus tremula. Mol Biol Evol. 2007, 24 (3): 836-844.
- Ingvarsson PK: Molecular evolution of synonymous codon usage in Populus. BMC Evol Biol. 2008, 8: 307. 10.1186/1471-2148-8-307.
- Ingvarsson PK: Natural selection on synonymous and nonsynonymous mutations shapes patterns of polymorphism in Populus tremula. Mol Biol Evol. 2010, 27 (3): 650-660. 10.1093/molbev/msp255.
- Zhou M, Tong CF, Shi JS: Analysis of codon usage between different poplar species. J Genet Genomics. 2007, 34 (6): 555-561. 10.1016/S1673-8527(07)60061-7.
- Ahmad T, Sablok G, Tatarinova TV, Xu Q, Deng XX, Guo WW: Evaluation of codon biology in Citrus and Poncirus trifoliata based on genomic features and frame corrected expressed sequence tags. DNA Res. 2013, 20 (2): 135-150. 10.1093/dnares/dss039.
- Chen KS, Xu CJ, Zhang B, Ferguson IB: Red bayberry: botany and horticulture. Hortic Rev. 2004, 30: 83-114.
- Zhang B, Kang MX, Xie QP, Xu B, Sun CD, Chen KS, Wu YL: Anthocyanins from Chinese bayberry extract protect beta cells from oxidative stress-mediated injury via HO-1 upregulation. J Agric Food Chem. 2011, 59 (2): 537-545. 10.1021/jf1035405.
- Sun CD, Zhang B, Zhang JK, Xu CJ, Wu YL, Li X, Chen KS: Cyanidin-3-glucoside-rich extract from Chinese bayberry fruit protects pancreatic beta cells and ameliorates hyperglycemia in streptozotocin-induced diabetic mice. J Med Food. 2012, 15 (3): 288-298. 10.1089/jmf.2011.1806.
- Zhang WS, Chen KS, Zhang B, Sun CD, Cai C, Zhou CH, Xu WP, Zhang WQ, Ferguson IB: Postharvest responses of Chinese bayberry fruit. Postharvest Biology and Technology. 2005, 37 (3): 241-251. 10.1016/j.postharvbio.2005.05.005.
- Zhang WS, Li X, Wang XX, Wang GY, Zheng JT, Abeysinghe DC, Ferguson IB, Chen KS: Ethanol vapour treatment alleviates postharvest decay and maintains fruit quality in Chinese bayberry. Postharvest Biology and Technology. 2007, 46 (2): 195-198. 10.1016/j.postharvbio.2007.05.001.
- Zhang WS, Li X, Zheng JT, Wang GY, Sun CD, Ferguson I, Chen KS: Bioactive components and antioxidant capacity of Chinese bayberry (Myrica rubra Sieb. and Zucc.) fruit in relation to fruit maturity and postharvest storage. Eur Food Res Technol. 2008, 227 (4): 1091-1097. 10.1007/s00217-008-0824-z.
- Feng C, Chen M, Xu CJ, Bai L, Yin XR, Li X, Allan AC, Ferguson IB, Chen KS: Transcriptomic analysis of Chinese bayberry (Myrica rubra) fruit development and ripening using RNA-Seq. BMC Genomics. 2012, 13: 19. 10.1186/1471-2164-13-19.
- Zhang SY, Li X, Feng C, Zhu CQ, Grierson D, Xu CJ, Chen KS: Development and characterization of 109 polymorphic EST-SSRs derived from the Chinese bayberry (Myrica rubra, Myricaceae) transcriptome. Am J Bot. 2012, 99 (12): e501-507. 10.3732/ajb.1200156.
- Zhu CQ, Feng C, Li X, Xu CJ, Sun CD, Chen KS: Analysis of expressed sequence tags from Chinese bayberry fruit (Myrica rubra Sieb. and Zucc.) at different ripening stages and their association with fruit quality development. Int J Mol Sci. 2013, 14 (2): 3110-3123. 10.3390/ijms14023110.
- Jiao Y, Jia HM, Li XW, Chai ML, Jia HJ, Chen Z, Wang GY, Chai CY, van de Weg E, Gao ZS: Development of simple sequence repeat (SSR) markers from a genome survey of Chinese bayberry (Myrica rubra). BMC Genomics. 2012, 13: 201. 10.1186/1471-2164-13-201.
- Niu SS, Xu CJ, Zhang WS, Zhang B, Li X, Lin-Wang K, Ferguson IB, Allan AC, Chen KS: Coordinated regulation of anthocyanin biosynthesis in Chinese bayberry (Myrica rubra) fruit by a R2R3 MYB transcription factor. Planta. 2010, 231 (4): 887-899. 10.1007/s00425-009-1095-z.
- Huang YJ, Song S, Allan AC, Liu XF, Yin XR, Xu CJ, Chen KS: Differential activation of anthocyanin biosynthesis in Arabidopsis and tobacco over-expressing an R2R3 MYB from Chinese bayberry. Plant Cell Tiss Organ Cult. 2013, 113 (3): 491-499. 10.1007/s11240-013-0291-5.View ArticleGoogle Scholar
- Liu XF, Feng C, Zhang MM, Yin XR, Xu CJ, Chen KS: The MrWD40-1 gene of Chinese bayberry (Myrica rubra) interacts with MYB and bHLH to enhance anthocyanin accumulation. Plant Mol Biol Rep. 2013, doi: 10.1007/s11105-013-0621-0Google Scholar
- Liu XF, Yin XR, Allan AC, Lin-Wang K, Shi YN, Huang YJ, Ferguson IB, Xu CJ, Chen KS: The role of MrbHLH1 and MrMYB1 in regulating anthocyanin biosynthetic genes in tobacco and Chinese bayberry (Myrica rubra) during anthocyanin biosynthesis. Plant Cell Tiss Org. 2013, doi: 10.1007/s11240-013-0361-8Google Scholar
- Zhong S, Fei Z, Chen YR, Zheng Y, Huang M, Vrebalov J, McQuinn R, Gapper N, Liu B, Xiang J: Single-base resolution methylomes of tomato fruit development reveal epigenome modifications associated with ripening. Nat Biotechnol. 2013, 31 (2): 154-159. 10.1038/nbt.2462.View ArticlePubMedGoogle Scholar
- Al-Saif M, Khabar KS: UU/UA dinucleotide frequency reduction in coding regions results in increased mRNA stability and protein expression. Mol Ther. 2012, 20 (5): 954-959. 10.1038/mt.2012.29.PubMed CentralView ArticlePubMedGoogle Scholar
- Sun JC, Chen M, Xu JL, Luo JH: Relationships among stop codon usage bias, its context, isochores, and gene expression level in various eukaryotes. J Mol Evol. 2005, 61 (4): 437-444. 10.1007/s00239-004-0277-3.View ArticlePubMedGoogle Scholar
- Irwin B, Heck JD, Hatfield GW: Codon pair utilization biases influence translational elongation step times. J Biol Chem. 1995, 270 (39): 22801-22806. 10.1074/jbc.270.39.22801.View ArticlePubMedGoogle Scholar
- Tats A, Tenson T, Remm M: Preferred and avoided codon pairs in three domains of life. BMC Genomics. 2008, 9: 463-10.1186/1471-2164-9-463.PubMed CentralView ArticlePubMedGoogle Scholar
- Moura G, Pinheiro M, Silva R, Miranda I, Afreixo V, Dias G, Freitas A, Oliveira JL, Santos MA: Comparative context analysis of codon pairs on an ORFeome scale. Genome Biol. 2005, 6 (3): R28-10.1186/gb-2005-6-3-r28.PubMed CentralView ArticlePubMedGoogle Scholar
- Shao ZQ, Zhang YM, Feng XY, Wang B, Chen JQ: Synonymous codon ordering: a subtle but prevalent strategy of bacteria to improve translational efficiency. PLoS One. 2012, 7 (3): e33547-10.1371/journal.pone.0033547.PubMed CentralView ArticlePubMedGoogle Scholar
- Qian W, Yang JR, Pearson NM, Maclean C, Zhang J: Balanced codon usage optimizes eukaryotic translational efficiency. PLoS Genet. 2012, 8 (3): e1002603-10.1371/journal.pgen.1002603.PubMed CentralView ArticlePubMedGoogle Scholar
- Hiwasa-Tanase K, Nyarubona M, Hirai T, Kato K, Ichikawa T, Ezura H: High-level accumulation of recombinant miraculin protein in transgenic tomatoes expressing a synthetic miraculin gene with optimized codon usage terminated by the native miraculin terminator. Plant Cell Rep. 2011, 30 (1): 113-124. 10.1007/s00299-010-0949-y.View ArticlePubMedGoogle Scholar
- Lynch DB, Logue ME, Butler G, Wolfe KH: Chromosomal G + C content evolution in yeasts: systematic interspecies differences, and GC-poor troughs at centromeres. Genome Biol Evol. 2010, 2: 572-583. 10.1093/gbe/evq042.PubMed CentralView ArticlePubMedGoogle Scholar
- Eyre-Walker A, Hurst LD: The evolution of isochores. Nat Rev Genet. 2001, 2 (7): 549-555. 10.1038/35080577.View ArticlePubMedGoogle Scholar
- Duret L, Galtier N: Biased gene conversion and the evolution of mammalian genomic landscapes. Annu Rev Genomics Hum Genet. 2009, 10: 285-311. 10.1146/annurev-genom-082908-150001.View ArticlePubMedGoogle Scholar
- Varenne S, Buc J, Lloubes R, Lazdunski C: Translation is a non-uniform process. Effect of tRNA availability on the rate of elongation of nascent polypeptide chains. J Mol Biol. 1984, 180 (3): 549-576. 10.1016/0022-2836(84)90027-5.View ArticlePubMedGoogle Scholar
- Kozak M: The scanning model for translation: an update. J Cell Biol. 1989, 108 (2): 229-241. 10.1083/jcb.108.2.229.View ArticlePubMedGoogle Scholar
- Sharp PM, Li WH: An evolutionary perspective on synonymous codon usage in unicellular organisms. J Mol Evol. 1986, 24 (1–2): 28-38.View ArticlePubMedGoogle Scholar
- Wright F: The ’effective number of codons’ used in a gene. Gene. 1990, 87: 23-29. 10.1016/0378-1119(90)90491-9.View ArticlePubMedGoogle Scholar
- Sharp PM, Li WH: The codon Adaptation Index-a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 1987, 15 (3): 1281-1295. 10.1093/nar/15.3.1281.PubMed CentralView ArticlePubMedGoogle Scholar
- Lin T, Ni ZH, Shen MS, Chen L: High-frequency codon analysis and its application in codon analysis of tobacco. J Xiamen Univ. 2002, 41 (5): 551-554.Google Scholar
- Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T: Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics. 2011, 27 (3): 431-432. 10.1093/bioinformatics/btq675.PubMed CentralView ArticlePubMedGoogle Scholar
- Audic S, Claverie JM: The significance of digital gene expression profiles. Genome Res. 1997, 7 (10): 986-995.PubMedGoogle Scholar
- Saeed AI, Bhagabati NK, Braisted JC, Liang W, Sharov V, Howe EA, Li J, Thiagarajan M, White JA, Quackenbush J: TM4 microarray software suite. DNA Microarrays, Part B: Databases and Statistics. Edited by: Kimmel A, Oluver B. 2006, San Diego: Elsevier Academic Press Inc, 134-193. vol. 411View ArticleGoogle Scholar
- Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21 (18): 3674-3676. 10.1093/bioinformatics/bti610.View ArticlePubMedGoogle Scholar
- Ye J, Fang L, Zheng HK, Zhang Y, Chen J, Zhang ZJ, Wang J, Li ST, Li RQ, Bolund L: WEGO: a web tool for plotting GO annotations. Nucleic Acids Res. 2006, 34 (Web Server issue): W293-297.PubMed CentralView ArticlePubMedGoogle Scholar
- Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res. 2004, 14 (6): 1188-1190. 10.1101/gr.849004.PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.