Gender-specific selection on codon usage in plant genomes
BMC Genomics volume 8, Article number: 169 (2007)
Currently, there is little data available regarding the role of gender-specific gene expression on synonymous codon usage (translational selection) in most organisms, and particularly plants. Using gender-specific EST libraries (with > 4000 ESTs) from Zea mays and Triticum aestivum, we assessed whether gender-specific gene expression per se and gender-specific gene expression level are associated with selection on codon usage.
We found clear evidence of a greater bias in codon usage for genes expressed in female than in male organs and gametes, based on the variation in GC content at third codon positions and the frequency of species-preferred codons. This finding holds true for both highly and for lowly expressed genes. In addition, we found that highly expressed genes have greater codon bias than lowly expressed genes for both female- and male-specific genes. Moreover, in both species, genes with female-specific expression show a greater usage of species-specific preferred codons for each of the 18 amino acids having synonymous codons. A supplemental analysis of Brassica napus suggests that bias in codon usage could also be higher in genes expressed in male gametophytic tissues than in heterogeneous (flower) tissues.
This study reports gender-specific bias in codon usage in plants. The findings reported here, based on the analysis of 1 497 876 codons, are not caused either by differences in the biological functions of the genes or by differences in protein lengths, nor are they likely attributable to mutational bias. The data are best explained by gender-specific translational selection. Plausible explanations for these findings and the relevance to these and other organisms are discussed.
Although tissue-specific gene expression has been associated with bias in codon usage in certain multicellular organisms including humans, Drosophila melanogaster, and Arabidopsis thaliana [1–5], very little data currently exists for most organisms, particularly regarding the role of gender-specific tissues and gametes. One of the few studies addressing the effect of gender, in Drosophila, suggests that genes having a higher ratio of female to male expression have a greater bias in codon usage [4, 6]. A largely unstudied biological system where gender-specific gene expression could significantly alter codon usage is plants. Recent findings, in A. thaliana, have indicated that male gametes pass on a greater number of induced harmful mutations to their offspring, suggesting that mutations are subject to less selection in male tissues/gametes than in female tissues/gametes . Such findings at the population level (short-term), suggest that gender could also impact the selective processes that alter molecular evolution in plants, including the usage of synonymous codons. Given that gender-specific selective pressures on codon usage could alter gene evolution and structure, and thereby influence population genetics, disease, and/or reproductive biology, and given the general lack of data to date, further investigation is warranted. Here, we focus on the evaluation of gender-specific codon usage in plants.
Nonrandom use of synonymous codons is a prevalent phenomenon observed in a diverse range of organisms [1, 8–17]. A bias in codon usage occurs when synonymous codons are not all used at the same frequency in coding DNA [14, 18, 19]. Such bias in codon usage could result from mutational pressure, as indicated by a positive correlation between the nucleotide content of third codon positions and adjacent introns [20–23], or from selective pressure. Selective pressure has been supported by two findings. Firstly, greater levels of bias in codon usage are generally associated with a greater frequency of "preferred" codons (those used most frequently in the most biased genes) , a trend that corresponds to the abundance and/or gene number of tRNA in bacteria, yeast, C. elegans, Drosophila, Arabidopsis and other organisms [1, 8, 9, 12–15, 25–27]. Secondly, bias in codon usage has been well correlated to the level of gene expression, with the greatest bias occurring in highly expressed genes [8, 10, 12, 28, 29]. Each of these findings suggests that the use of preferred codons confers fitness benefits that enhance translational efficiency, a phenomenon particularly advantageous for the highly expressed genes [8, 12]. In this regard, gene expression level is an essential component of understanding gender-specific influences on codon usage.
The main challenges for comparing male and female codon usage relative to gene expression for plant species, where the availability of genomic DNA sequences is often limited, are obtaining sufficient coding DNA data to assess codon usage in those tissues and determining the level of gene expression. EST datasets provide an effective solution to both issues. In particular, EST data have proven to be an effective means of quantifying gene expression in a range of tissues as the extent of redundancy in ESTs reflects the abundance of mRNA in the tissue or cells from which the library was obtained [8, 10, 12, 30–32]. In addition, the increased availability of EST data, the long sequence length (200 to 800 bp) and the level of accuracy of the sequence data (from efficient sequence techniques and base-calling software, and the general use of only high quality reads) [33–35] makes it possible to study codon usage directly from EST sequences, even before the corresponding genomic sequences are available [36, 37]. In this regard, the recent availability of male- and female-specific EST libraries in plants provide an effective resource to better understand selection on codon usage and how it may be influenced by gender.
In the present study, the main goal was to assess whether gender-specific gene expression per se and gender-specific gene expression level are correlated with codon usage in Zea mays and Triticum aestivum. As a supplemental analysis, we compared the bias in codon usage for genes expressed only in gametophytic tissue (male germline cells, microspores) and in flower tissue (composed of the somatic and reproductive tissues) at both high and low expression levels in Brassica napus. Given that gene function and protein length have previously been found to influence codon usage [8, 10], we also evaluated the role of those parameters within our analysis.
In order to compare codon usage relative to gender-specific expression, we collected data from sperm and egg EST libraries for Z. mays and anther and ovary libraries for T. aestivum (Table 1). Microspore and flower libraries were obtained for B. napus. In brief, we obtained gender/tissue-specific datasets in the following manner: 1) clustering and assembly of ESTs from each library using CAP3, 2) identification of unigenes (from now forward referred to as "genes") having translation products that matched known or hypothetical proteins in A. thaliana (using BLASTX [39–41]), 3) extraction of genes with tissue-specific expression by comparisons between the two compared tissues for each species (using MEGABLAST), and 4) determination of the expression level per gene (based on the number of ESTs, see Methods). The six tissue-specific sequence datasets obtained are: Z. mays sperm-specific genes (N = 955), Z. mays egg-specific genes (N = 946), T. aestivum anther-specific genes (N = 3326), T. aestivum ovary-specific genes (N = 1489), B. napus microspore-specific genes (N = 1675) and B. napus flower-specific genes (N = 3181). Note that gender-specific genes represent those that are specific to a particular tissue or gamete (e.g., sperm) when compared to only one other tissue or gamete (e.g., egg) and not relative to all tissues from the plant. Thus, these gene sets are larger than one would have found if the ESTs had been compared to all tissues of a plant. Subsequently, the GC content at third nucleotide positions (GC3) and the frequency of preferred codons (Fpr), each of which has been shown to be an effective indicator of bias in codon usage [1, 8, 31], were determined for every gene from each of the tissue-specific datasets. Bias in codon usage was quantified using a single EST sequence to represent each gene (i.e., genes are represented by the longest EST in the contig or a singleton EST, see Methods).
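The two per-gene indicators named above, GC3 and Fpr, can be illustrated with a minimal Python sketch. It assumes an in-frame coding sequence containing at least one complete codon, and it counts every codon in the denominator; the exact definitions used in the study are given in its Methods and may differ (for example, by excluding amino acids without synonymous codons). The function names, the toy sequence and the toy preferred-codon set are hypothetical and are not data from the study.

```python
def codons(seq):
    """Split an in-frame coding sequence into complete, uppercase codons."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def gc3(seq):
    """Fraction of codons whose third position is G or C."""
    cs = codons(seq)
    return sum(c[2] in "GC" for c in cs) / len(cs)


def fpr(seq, preferred):
    """Fraction of codons that belong to the supplied preferred-codon set."""
    cs = codons(seq)
    return sum(c in preferred for c in cs) / len(cs)


# Hypothetical toy input (not from the study): codons GCC GCT AAG AAA.
example = "GCCGCTAAGAAA"
toy_preferred = {"GCC", "AAG"}  # illustrative set only

print(gc3(example))                 # 0.5 (GCC and AAG end in G or C)
print(fpr(example, toy_preferred))  # 0.5 (GCC and AAG are in the toy set)
```

In practice the preferred-codon set would be the species-specific list from the literature, and the input would be the longest EST per contig (or the singleton EST) in its correct reading frame.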
The data show, as described in detail below, that genes specific to female tissues and gametes have a greater bias in codon usage in both Z. mays and T. aestivum than genes expressed in male tissues and gametes. As well, male microspores have a greater bias than the heterogeneous tissues of the flower in B. napus.
The GC content at third nucleotide positions and the frequency of preferred codons were each statistically significantly higher for genes expressed specifically in eggs as compared to sperm in Z. mays and for genes expressed in ovary as compared to anther in T. aestivum (Figure 1). This result was statistically significant for genes expressed at both high and at low levels (high > 5 ESTs per 10 000; low ≤ 5 ESTs per 10 000) as well as across all genes. Statistically significantly higher values for GC3 were detected in microspore-specific genes as compared to the flower-specific genes in B. napus for genes expressed at low levels, but not for highly expressed genes (only GC3 was determined for B. napus as preferred codons have not been described yet). Within the male-specific and the female-specific genes, GC3 and Fpr were statistically significantly greater for the highly expressed as opposed to the lowly expressed genes in both Z. mays and T. aestivum and for the male-specific as compared to flower-specific genes in B. napus. Notably, lowly expressed female-specific genes have a statistically significantly higher GC3 and Fpr than the highly expressed male-specific genes in Z. mays and no difference was detected between these two groups for T. aestivum. Most of the statistically significant comparisons (of 35 pairwise comparisons in total) remained significant after Bonferroni correction except for some contrasts among genes for high versus low expression within a gender (i.e., male-specific tissue/gametes in Z. mays and T. aestivum and within the flower in B. napus) and a single between-gender comparison for Fpr at high expression levels in T. aestivum (GC3 for this comparison remained statistically significant). All other between-gender comparisons remained statistically significant.
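The expression classes and the multiple-testing correction described above can be sketched as follows. The cut-off of 5 ESTs per 10 000 and the count of 35 pairwise comparisons come from the text. The function names, the example library size and the nominal significance level of 0.05 are assumptions made for illustration only.

```python
def ests_per_10k(n_ests_in_gene, library_size):
    """Normalise a gene's EST count to ESTs per 10 000 ESTs in the library."""
    return 10_000 * n_ests_in_gene / library_size


def expression_class(n_ests_in_gene, library_size, threshold=5.0):
    """'high' if the gene exceeds the threshold (ESTs per 10 000), else 'low'."""
    rate = ests_per_10k(n_ests_in_gene, library_size)
    return "high" if rate > threshold else "low"


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical library of 4000 ESTs (an assumed size for illustration).
print(expression_class(2, 4000))  # low  (exactly 5 per 10 000)
print(expression_class(3, 4000))  # high (7.5 per 10 000)

# 35 pairwise comparisons as stated in the text; alpha = 0.05 is assumed.
print(bonferroni_alpha(0.05, 35))
```

A gene at exactly 5 ESTs per 10 000 falls in the low class, matching the "low ≤ 5" boundary stated above.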
The relative synonymous codon usage (RSCU) represents the observed frequency of a codon divided by the expected frequency (i.e., if all synonymous codons were used equally). Values different from 1 thus indicate the presence of bias. Analysis of RSCU for genes with male-specific and female-specific expression (concatenated across all genes) indicated that the bias in codon usage towards species-specific preferred codons was consistently higher in the female than in male tissues/gametes for Z. mays and T. aestivum (Table 2). In particular, 26 of 27 of the species-specific preferred codons for 18 amino acids in Z. mays (i.e., 18 have synonymous codons, Table 2) were more frequent in female-specific than in male-specific genes. This represents a higher usage of at least one species-specific preferred codon for every amino acid with synonymous codons (noting that some amino acids have more than one preferred codon). For T. aestivum, female-specific genes had a greater usage of all 23 of the species-specific preferred codons (for 22 of 23 comparisons, the difference in RSCU was greater than 0.1). In addition, hierarchical clustering was conducted using Pearson correlation coefficients between RSCU values for each combination of species and gender-specific tissues/gametes for Z. mays and T. aestivum. The results indicate that these groups cluster by gender rather than by species, consistent with gender being a major parameter in shaping codon usage (Figure 2). The RSCU data also show that B. napus demonstrates a preference towards GC-ending codons in male-specific as compared to flower-specific genes (Table 2). The entire dataset across all species consisted of 1 497 876 codons.
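The RSCU definition given above (observed codon count divided by the count expected if all synonymous codons for that amino acid were used equally) can be written as a short Python sketch. Only two synonymous families from the standard genetic code are included to keep the example small. The function name, the `families` mapping and the toy codon list are hypothetical and are not taken from the study.

```python
from collections import Counter


def rscu(codon_list, synonymous_families):
    """Return {codon: RSCU} for every codon in families that are observed.

    RSCU = observed count / (family total / number of codons in family).
    Families with no observed codons are skipped to avoid dividing by zero.
    """
    counts = Counter(codon_list)
    values = {}
    for family in synonymous_families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        expected = total / len(family)
        for c in family:
            values[c] = counts[c] / expected
    return values


# Two families from the standard genetic code (alanine and lysine).
families = {
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
    "Lys": ["AAA", "AAG"],
}

# Hypothetical concatenated codons (not data from the study).
toy = ["GCC", "GCC", "GCC", "GCT", "AAG", "AAA"]
result = rscu(toy, families)
print(result["GCC"])  # 3.0: used three times as often as expected
print(result["AAG"])  # 1.0: no bias between the two lysine codons
```

Comparing the RSCU of each preferred codon between two gene sets, as in Table 2, then amounts to calling the function once per concatenated set and subtracting the values codon by codon.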
Given that bias in codon usage has been inversely associated with protein length in certain eukaryotic organisms [e.g., ], we assessed whether it played a role in the observed gender-specific bias described above. For this, we identified the protein length for each gene under study (number of amino acids) in the Arabidopsis thaliana protein sequence database (i.e., genes previously identified as a match, see above). We used protein lengths from Arabidopsis because our DNA sequence data were derived from ESTs (that are partial sequences, not containing gene or protein length information) and because, unlike well-studied model organisms, the complete and/or annotated genomic DNA or protein sequences are not yet available for most of the genes in these plants. Protein lengths tend to be highly conserved among eukaryotes. The data indicate that the mean protein length was greater for male-specific genes than female-specific genes in Z. mays and T. aestivum and for flower-specific genes as compared to male-specific genes in B. napus (mean protein length (± Standard Error): Z. mays egg 409.1 (± 13.2), Z. mays sperm 493.2 (± 13.3), T. aestivum ovary 426.3 (± 8.1), T. aestivum anther 571.3 (± 6.7), B. napus microspore 370 (± 6.7), B. napus flower 441.3 (± 5.6)). The higher mean lengths generally resulted from the presence of relatively few genes that encoded very long proteins (between 1000 and 5000 amino acids). The number of genes encoding proteins with more than 1000 amino acids relative to the total number of genes are: Z. mays egg 38/946 = 4.0%, Z. mays sperm 80/955 = 8.3%, T. aestivum ovary 69/1489 = 4.6%, T. aestivum anther 368/3326 = 11.1%, B. napus microspore 51/1675 = 3.0%, B. napus flower 176/3181 = 5.5%. We thus determined RSCU values for genes encoding proteins of similar lengths, in order to assess whether this protein length variation was related to our findings of gender-specific biases in codon usage.
For this, male-specific and female-specific genes from each species under study (and B. napus flower-specific genes) were classified as being of either short (≤200 amino acids), medium (>200 and ≤400 amino acids) or long length (>400 amino acids).
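The three protein length categories can be expressed as a small helper. This is a sketch only: the function name is hypothetical, and the boundaries follow the categories as read here (short up to 200 amino acids, medium above 200 and up to 400, long above 400).

```python
def length_category(n_amino_acids):
    """Assign a protein length (in amino acids) to one of three categories."""
    if n_amino_acids <= 200:
        return "short"
    if n_amino_acids <= 400:
        return "medium"
    return "long"


# Boundary values, to make the inclusive and exclusive edges explicit.
for n in (200, 201, 400, 401):
    print(n, length_category(n))
```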
Comparisons were conducted for RSCU values relative to gender for Z. mays and T. aestivum (and male- versus flower-specific genes in B. napus) for the concatenated EST sequences (longest EST per contig) within each protein length category. The results show that female-specific genes consistently have higher values for RSCU for the preferred codons (as indicated by "+" sign in each of the columns) within each of these three protein length categories for Z. mays and T. aestivum (Table 3) [see Additional data file 1; Tables 1 to 3], indicating that protein length variation does not explain the gender-specific bias observed for specific codons described in Table 2 (see Table 3). The bias in codon usage for male-specific versus flower-specific genes in B. napus was most evident for genes encoding long proteins.
Given that the codons showing bias generally ended in G or C for each of the species examined (Table 2), we also compared the GC3 content between each of the two tissue/gamete types per species for each of the protein length categories (Table 4). The results of pairwise comparisons show that the GC3 values are statistically significantly higher for female-specific than male-specific genes in Z. mays and T. aestivum and for male-specific than flower-specific genes in B. napus within each of the protein length categories, consistent with a gender-bias on codon usage. Examination of only those genes encoding very long proteins (equal to or more than 1000 amino acids) showed similar trends for each of these species. Notably, GC3 values were inversely correlated with protein length within each of the six species and tissue-specific datasets (i.e., for the male-specific, female-specific and for flower-specific datasets, Table 4) [see also Additional data file 1; Table 4], consistent with the relationship between codon usage and protein length reported in other species .
Biological function has been proposed as a potential factor altering certain molecular evolutionary processes ; therefore we examined the gene profiles for each of the contrasting tissues for each species under study. The profile of biological functions of genes expressed only in the female tissues/gametes and male tissues/gametes were nearly identical for Z. mays and for T. aestivum (Figure 3). Similarly, the biological functions for genes specifically expressed in male microspores and those expressed in flowers were strongly associated in B. napus.
Gender differences in codon usage
The statistically significant higher GC content at third codon positions and the greater frequency of preferred codons for genes expressed specifically in eggs as compared to sperm in Z. mays and for genes expressed in the ovary as compared to anther in T. aestivum, provide strong evidence that there is a greater bias in codon usage for genes expressed in female tissues than in male tissues and/or gametes (Figure 1). These findings were similar when genes were classified as either high expressing or low expressing, suggesting that gender has a substantial impact on synonymous codon use. We can infer that these differences are likely due to selective pressure because the bias is associated with gene expression (tissue-specific gene expression or high versus low expression level, Figure 1). Furthermore, the data indicate that the gender effect cannot be attributed to variation in protein lengths or to gene function (Table 3, Table 4, Figure 3). Overall, these results, across a broad range of genes, provide evidence that codon usage is altered by gender-specific pressures in plants.
Genes expressed in eggs have a higher relative synonymous codon usage value than those expressed in sperm for 26 of the 27 previously identified preferred codons for Z. mays. This indicates, remarkably, that selective pressure in eggs specifically acts to enhance the frequency of preferred codons for each of the 18 amino acids that have synonymous codons (i.e., egg-specific genes have a greater frequency of at least one of the preferred codons per amino acid, Table 2). Similar findings for T. aestivum, showing that all 23 of the preferred codons for this species are enhanced in genes expressed in the ovary as compared to the anther, also demonstrate that a selective pressure inherent to these female organs and gametes is acting to enhance the incidence of preferred codons across all synonymous codon groups. In addition, the fact that the female bias was detected for each gene length category (i.e., among short genes, among medium length genes, and among long genes, Tables 3 and 4) and that the gender-specific gene expression was the major determinant of hierarchical clustering (relative to RSCU values), rather than species (Figure 2), supports the notion that the codon usage bias demonstrated here is greatly influenced by gender-specific factors. It is notable for T. aestivum that in three cases the RSCU was greater for female- than male-specific genes (bold values, no asterisk, Table 2) for G- or C-ending codons that had not been previously identified as preferred, but had been described as preferred in Z. mays and other plant species. The overall results presented here, showing greater use of preferred codons in genes expressed in female organs and gametes, suggest that these codons are probably also preferred in T. aestivum, at least for genes expressed in the reproductive tissues and gametes.
The greater bias in codon usage among genes expressed in female organs and gametes as compared to male organs and gametes, reflecting an increased propensity for translational selection, could be caused by several factors. In particular, it is possible that protein products of genes expressed in female organs and gametes experience a more diverse biochemical environment than their male counterparts, a phenomenon that could lead to greater selective constraint on proteins [45, 46], and thus, on their translation. It is also possible that mutations at third codon positions in genes expressed in female organs and gametes may on average have greater effects on fitness, as has been proposed for genes expressed across a broad array of tissues (higher selection coefficients) [8, 9]. This could occur, for example, if translational inefficiency in female organs and gametes alters the cellular energy resources or interferes with essential biological processes in a manner not prevalent in male regions. Mutations affecting female regions could also have greater fitness effects because of the general uncertainty in the pollination process, which makes it highly advantageous for each ovary, ovule and/or egg to be fully functional (thereby mutations in female regions may affect fitness overall more than for anthers, pollen or sperm)  and because maternal traits can have a much greater impact on seed production (seed number, size, and dispersal) and survival (and thus, on overall fitness) . Another possible explanation for the observed results is that there are differences in gene function between female and male organs and gametes, a theory that has been proposed as a potential factor altering amino acid substitution rates . As shown in Figure 3, however, this is not the likely explanation in this study, as there is remarkable similarity in the biological functions represented by the male-specific and female-specific genes in both Z. mays and T. aestivum. 
Nonetheless, subtle differences in gene function (e.g., specific genes that influence codon usage) or other, unidentified, functional differences between the male and female tissues/gametes could play a role . An additional potential contributing factor worth consideration is that genes that have greater breadth of expression throughout the entire plant (that can have greater bias in codon usage) , are coincidentally also more commonly expressed in female organs and gametes than in males. Although this possibility cannot be definitively excluded, it seems unlikely given the similarity between the functional profiles of female- and male-specific genes. Altogether, it seems that the best explanations are differences in the amount of selective pressure for effective translation due to different cellular environments and/or a greater impact of mutations on female tissues and gametes. Further studies will nonetheless be needed to ascertain the mechanisms underlying the greater bias in codon usage in female organs and gametes in these plant species.
The relationship between gender-specific gene expression and codon usage in Z. mays and T. aestivum is consistent with the very limited data currently available for other organisms. It has been shown in humans, for example, that genes expressed in ovaries have likely been under slightly greater selective constraint than those expressed in testes for codon usage following the divergence of humans and mice. The trend notably corresponds to the generally high rates of protein evolution (and thus reduced selective pressure) reported in genes involved in spermatogenesis in primates. In Drosophila, it has been found that the relative expression of genes in females versus males (the female : male ratio of gene expression) is well correlated to bias in codon usage. In addition, in Arabidopsis, previous findings have indicated that more induced harmful mutations are passed to progeny by the sperm than by the eggs, consistent with the relatively lower selective pressure on mutations in male than in female tissues and gametes. The present results extend these findings to include gender-specific selection on codon usage. Each of these gender-specific trends, in humans, Drosophila, and Arabidopsis, is consistent with the findings we report here, and suggests that the higher bias in codon usage for genes expressed in female tissues could be inherent to a range of organisms. Further studies will be needed to better understand the full range of organisms for which gender-specific gene expression is associated with a bias in codon usage.
Gene expression level
Gene expression level has been shown to be positively correlated with bias in codon usage in many organisms [1, 8–15, 24–27, 31]. Selection is the best explanation for this finding because higher levels of gene expression lead to greater opportunity for selection to alter codon usage [8, 10, 12] and because mutational bias has only rarely been associated with gene expression level (in certain microorganisms) [8, 50, 51]. In Drosophila, a positive relationship between gene expression and bias in codon usage has been reported for female tissues, but a relatively weak negative correlation was detected for male tissues . Our findings of greater values for GC3 and Fpr for highly expressed genes than for lowly expressed genes for both male-specific and female-specific genes from Z. mays and T. aestivum suggests that gene expression level is positively correlated to bias in codon usage for genes expressed in male and in female regions for these plants.
It is notable nonetheless that the differences in the bias in codon usage between highly and lowly expressed genes were not as marked in male as in female tissues and gametes, as evidenced by the fact that the Bonferroni correction excluded the statistical significance of this comparison for both Z. mays and T. aestivum. In fact, the lowly expressed female-specific genes had statistically significantly higher bias in codon usage than the highly expressed male-specific genes in Z. mays and no difference was detected between these two groups for T. aestivum (Figure 1). It thus seems that female tissues/gametes maintain substantial selective pressure on codon usage even for genes with reduced expression, in a manner not characteristic of male tissues/gametes.
Selection and gender-bias
The greater bias in codon usage for female-specific than for male-specific genes is currently best explained by selection, for the following reasons. The male-specific and female-specific gene sets examined here were determined based on calculations that these genes were solely or primarily expressed in one tissue type and not in the other (i.e., the lack of ESTs in the contrasting tissue indicates that the mRNA was very rare or absent). Thus, the observed effects are associated with gender-specific gene expression (gene expression allows opportunity for selection, and is not usually associated with mutational bias [8, 10, 12]). In addition, our data indicate that gene function and protein length variation between male and female tissues/gametes do not explain the observed bias in codon usage between the gender-specific gene sets. Gene expression level differences cannot be implicated because the gender bias was detected for genes expressed at similar levels (high versus high expression and low versus low expression, regardless of protein length, Figure 1) [see Additional data file 1, Table 5]. Notably, because we examined ESTs for the present analysis, which rarely contain introns, and studied plant species where annotated genomic DNA (containing the introns) is not yet available (NCBI, personal communication), we do not include an analysis of the GC (or AT) content of introns versus third codon positions in our genes, an approach sometimes used to exclude mutational bias [8, 12, 24, 42, 45, 52]. Nevertheless, these trends, taken in their entirety, suggest that the bias in codon usage associated with gender-specific expression is best explained by differential selective pressure on genes expressed in male-specific tissues/gametes versus female-specific tissues/gametes.
Male-specific versus flower-specific genes in Brassica napus
Although the differences in bias in codon usage between the microspore and flower in B. napus were generally lower in magnitude than the previous between-gender comparisons for Z. mays and T. aestivum (Figure 1, Table 2), the data overall indicate that genes expressed in the two B. napus tissues have specific patterns of codon usage. Specifically, the higher GC content in B. napus microspore-specific genes than in flower-specific genes suggests that the male portion of the flower may be under more selective pressure for codon usage than the flower as a whole. In particular, given that the flower and flower bud EST library should represent genes from the male, female and vegetative (somatic) tissue, one can infer that the combined vegetative and female tissue is under less selective pressure than the microspore. Given that the vegetative tissue usually represents the greatest fraction of the flower tissue (petals, sepals), it could, in turn, be inferred that the somatic region is likely under less selective pressure for codon usage than the microspore (with no inference regarding the pressure in female tissues). The fact that the GC content at third codon positions of the genes specifically expressed in the microspores varied markedly among synonymous codon groups (Table 2), and was found to be positively correlated to gene expression level (Figure 1), further supports the notion that translational selection is enhanced in the microspore component of the flower. Moreover, from examination of Table 2, it is evident that for six of seven comparisons where the differences in RSCU between tissues were greater than 0.1, the microspore had enhanced usage of G- or C-ending codons (which were notably also the preferred codons in Z. mays and T. aestivum), a trend consistent with greater selective pressure.
Notably, analysis of RSCU values relative to protein length suggests that differences between male-specific and flower-specific genes are greatest for genes encoding longer proteins (>400 amino acids, Table 3) in the B. napus tissue comparisons as these genes have substantially greater usage of G or C ending codons in this category ("+" signs in Table 3). This effect could be partially caused by the greater percentage of genes encoding very long proteins in the flower-specific dataset or by greater male-specific effects on codon usage for genes encoding longer proteins. Nonetheless, all three of the gene length categories demonstrate higher GC3 values for male-specific genes (Table 4). One possible interpretation of all of these findings in B. napus, when combined with the data from Z. mays and T. aestivum, is that the translational selection increases in the following order: flower-specific (heterogeneous) genes, male-specific genes, female-specific genes. Because these analyses are in different species, however, further evaluation of this possible relationship will be needed. Altogether, the totality of the findings here suggest that genes expressed in reproductive tissues may be under greater translational level selection than those expressed in vegetative (somatic) tissues, a factor consistent with the key role of reproductive success in fitness.
Protein length and gene expression
The analysis of protein lengths indicates that genes encoding shorter proteins tend to generally have greater bias in codon usage, as indicated by GC3, for the species examined here (Table 4) [see also Additional data file 1; Table 4]. This is consistent with the trends reported in other organisms to date such as Arabidopsis, Drosophila, C. elegans and yeast [e.g., [8, 54]]. We also found marked evidence that the gene expression level in the species studied here is inversely correlated to protein length for each of the six datasets examined (across all genes per dataset), a result consistent with trends reported in humans, Drosophila and Populus tremula [2, 55, 56]. In particular, the Pearson correlation coefficients were: Z. mays male (R = -0.135, P = 2.6 × 10⁻⁵), Z. mays female (R = -0.080, P = 0.010), T. aestivum male (R = -0.049, P = 4.3 × 10⁻³), T. aestivum female (R = -0.18, P = 1.1 × 10⁻¹²), B. napus male (R = -0.149, P = 1.9 × 10⁻⁹), and B. napus flower (R = -0.081, P = 4.6 × 10⁻⁶). This suggests that the tendency of shorter genes to have greater bias in codon usage (Table 4), at least for the genes examined here, may be due to greater levels of gene expression and an associated selective pressure. Notably, in a complementary analysis to Figure 1, we found that the gender bias in codon usage was evident among highly and among lowly expressed genes within each of the three different protein length categories (short, medium, and long) [see Additional data file 1; Table 5].
Thus, the gender-specific biases in Z. mays and T. aestivum (and flower-specific differences in B. napus) at high and low levels of gene expression observed in Figure 1 cannot be explained by differences in protein lengths. In addition to the inverse association between protein length and bias in codon usage, it is also evident from Table 4 that the three species examined here tend to have different values for GC3, with values decreasing from T. aestivum to Z. mays to B. napus. Altogether, it is evident from our entire analysis that the gender-specific effects on codon usage can be detected across a range of protein lengths, gene expression levels, and plant species, indicating that gender-specific factors play a significant role in genome evolution.
It should be noted that the B. napus material used for the microspore cDNA library was grown at low temperatures (10°C/5°C), which could potentially alter some fraction of the gene expression in the microspore if stress-mediation genes were enhanced. However, we found that less than 1% of the total genes in B. napus microspores were stress-related (data not shown). Another issue worth considering is the implication of previous findings of gender-specific mutation rates in plants, a trend that was based on the detection of higher evolutionary rates at silent sites, including third codon positions, in male gametes. Higher mutation rates in sperm, however, should act to enhance per-generation mutation rates across the entire genome, including those genes expressed in females and in males, and thus should not impact the observed bias in codon usage.
Nonetheless, it should be noted that the differential male/female inheritance of organelles (and the underlying mechanisms; pre- or post-zygotic) could influence whether these genes are expressed in male or female tissues/gametes, and potentially contribute to the codon usage for organellar genes and their substitution rates [57, 58]. It is also notable that the abundance of tRNAs for the preferred codons could be greater in female than in male tissues/gametes and contribute to the gender differences in codon usage [3, 4]. This seems relatively unlikely, given that the abundance of tRNAs would have to be higher in female tissues/gametes for every single preferred codon and that such differences have been shown not to explain the gender-specific codon usage in Drosophila. It should also be noted that the reproducibility of the results for GC3 and Fpr observed in this study is consistent with the notion that GC3 content alone could be an effective indicator of codon bias in some species.
This study reports findings of female-specific bias in codon usage in plants. The remarkable consistency of the increased GC content of third codon positions and the increased frequency of preferred codons for both Z. mays and T. aestivum, even across different gene expression levels and protein lengths, combined with the enhanced usage of species-specific preferred codons for each of the 18 amino acids having synonymous codons, strongly indicates that gender plays a key role in codon usage. The findings in B. napus suggest that the tissues of the reproductive system, including both male and female organs, have a greater impact on codon usage than somatic regions. Overall, it is apparent that gender needs to be taken into account in furthering our understanding of translational selection. Further study will be needed to ascertain whether this is a generalized phenomenon, inherent to other organisms, as it could play a key role in DNA and protein sequence changes relevant to epidemiology, population genetics and molecular evolution.
Sperm and egg EST libraries from Z. mays and anther and ovary libraries for T. aestivum were extracted from GenBank using Entrez Nucleotide, available at the National Center for Biotechnology Information (Table 1). We chose these data because of the availability of large gender-specific EST libraries (>4000 ESTs) in GenBank. When more than one library was available, we chose the one most likely to reflect gamete expression (e.g., an ovary library was selected over a pistil library in T. aestivum). In addition to these libraries, we also collected B. napus sequences from an in-house cDNA library representing isolated late-uninucleate and early-binucleate microspores (male germline cells) and from a publicly available flower library (Table 1). The EST datasets used represent those that were available to us as of March 2006.
Expression profiles and preparation of sequence data
Each of the EST sequences from each of the six libraries was compared against the A. thaliana protein sequence database using BLASTX [39–41]. Only EST sequences having an e-value of less than 1 × 10⁻⁷ to known or hypothetical proteins of A. thaliana were kept for further analysis. Notably, this process automatically excludes all rRNAs from the analysis as they would not be in the protein database. Using these datasets, we clustered and assembled the ESTs for each library into contigs and singletons using the software program CAP3 (Table 1). The expression profile for each gene was determined from the number of ESTs per contig, and this value was 1 for singleton ESTs. For each gene in each of the six datasets, we standardized the expression level by dividing these values by the total number of ESTs in the original and complete EST library (Table 1), an approach that has been previously demonstrated to be an effective measure of expression level [8, 12, 31]. These values were multiplied by 10 000 (to obtain ESTs per 10 000) and the expression level for each gene was categorized as either high (>5 ESTs per 10 000) or low (≤5 ESTs per 10 000). Although many of the putative unigenes used here have not yet been definitively described as genes for those species, we nonetheless refer to them as "genes" here and in the text.
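The standardization and the high/low cut-off described above amount to a short calculation. The following is a minimal sketch; the function names and the example library size of 5000 ESTs are assumptions made for illustration (the actual library sizes are those in Table 1).

```python
def ests_per_10000(ests_in_contig, library_size):
    """Standardize an EST count to ESTs per 10 000 ESTs in the full library."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    if ests_in_contig < 1:
        raise ValueError("a gene is represented by at least one EST")
    return ests_in_contig * 10_000 / library_size


def expression_category(ests_in_contig, library_size, threshold=5.0):
    """Return 'high' for more than `threshold` ESTs per 10 000, else 'low'."""
    level = ests_per_10000(ests_in_contig, library_size)
    return "high" if level > threshold else "low"


# Hypothetical library of 5000 ESTs (illustrative only).
singleton_level = ests_per_10000(1, 5000)      # a singleton counts as 1 EST
contig_category = expression_category(3, 5000)  # a contig assembled from 3 ESTs
```

With these assumed numbers, a singleton corresponds to 2 ESTs per 10 000 (low), whereas a contig of three ESTs corresponds to 6 ESTs per 10 000 (high). A gene at exactly 5 ESTs per 10 000 falls in the low category, matching the ≤5 boundary given in the text.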
We identified genes that were expressed only in the male library and only in the female library (or flower for B. napus) as follows. Beginning with the male EST library from each species, the longest EST sequence per contig was identified and chosen as the representative for that gene. Each singleton represents its own gene. Each of these ESTs was then submitted to MEGABLAST as a query against the original and redundant female-specific EST dataset (or flower library for B. napus, Table 1). The original female EST dataset was used in order to be conservative in the identification of male-specific ESTs. ESTs having more than 95% similarity were considered a match, which represents a level of similarity rigorous enough to distinguish among genes in conserved gene families. The genes not having matches were categorized as male-specific. The process was then repeated for the female set of genes (or flower-specific for B. napus), after removing the sequences that were identified as matches to the male EST library. Specifically, the longest EST per contig and each singleton was queried against the original and redundant male-specific EST library in MEGABLAST. Genes that did not have matches were considered female-specific. The final datasets were: Z. mays sperm-specific genes (N = 955), Z. mays egg-specific genes (N = 946), T. aestivum anther-specific genes (N = 3326), T. aestivum ovary-specific genes (N = 1489), B. napus microspore-specific genes (N = 1675) and B. napus flower-specific genes (N = 1637). Genes with high expression (>5 ESTs per 10 000) represented approximately one quarter (or less) of each of these datasets. Notably, these gender-specific genes represent those that are specific to a particular tissue or gamete (e.g., sperm) when compared to only one other tissue or gamete (e.g., egg), and thus the tissue/gamete-specific genes identified here for each species are more numerous than one would observe if these had been compared to all the available libraries for that species (as is the standard analysis for ESTs). As well, the description of a gene as tissue-specific (e.g., female-specific) does not necessarily indicate that the gene is not expressed in the opposing tissue (e.g., male-specific) but rather that there were no ESTs present in the publicly available EST dataset used for comparison. It should be noted that although the genomic DNA sequences were available for a small portion of the ESTs for each species, we only used EST sequences for this entire analysis for consistency.
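The filtering step described above can be sketched as follows. The sketch assumes that the MEGABLAST hits are available as tab-delimited lines in the common BLAST tabular layout (query ID in the first column, percent identity in the third) and that "similarity" is read as percent identity; both are assumptions, as are the function name `specific_genes` and the toy identifiers. No alignment-length criterion is applied because none is stated in the text.

```python
def specific_genes(representatives, hit_lines, min_identity=95.0):
    """Return representative ESTs with no hit above `min_identity` percent.

    representatives: IDs of the longest EST per contig plus all singletons.
    hit_lines: tab-delimited hit lines (query ID, subject ID, % identity, ...).
    """
    matched = set()
    for line in hit_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        query_id = fields[0]
        percent_identity = float(fields[2])
        if percent_identity > min_identity:
            matched.add(query_id)
    # Genes without any qualifying match are treated as tissue-specific.
    return [gene for gene in representatives if gene not in matched]


# Toy example with hypothetical identifiers (not from the study's data).
male_representatives = ["male_contig1", "male_contig2", "male_singleton7"]
hits_against_female_library = [
    "male_contig1\tfemale_est42\t98.50\t310",
    "male_contig2\tfemale_est13\t91.20\t280",
]
male_specific = specific_genes(male_representatives, hits_against_female_library)
```

In this toy case `male_contig1` is removed because it has a hit above 95%, while `male_contig2` (best hit 91.2%) and `male_singleton7` (no hit) are retained as male-specific. The same function would be applied in the reverse direction for the female (or flower) set.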
The open reading frame for each singleton and an EST representing every contig (longest EST per contig) was identified for every gene from each of the six reduced-sized gender-specific sequence datasets (see above) using alignments from BLASTX against the A. thaliana protein database. The BLASTX amino acid based algorithm provides alignments of the six-frame translated EST relative to the protein database and thereby is more sensitive to elements of functionality and homology than DNA alignments and accurately reveals reading frames [41, 59]. Using the amino acid alignments between the translated ESTs and the A. thaliana homologues, we identified and extracted that portion of each EST sequence representing the reading frame (these generally did not include the start or termination codons). Most edited sequences were between 200 and 700 bp in length. Gaps in the alignments were rare, but when identified these regions were excluded from the EST, as were any occasional missense codons (resulting from the less than 1% sequencing errors in most large scale EST sequencing projects; in-house ESTs had PHRED scores of greater than 20, representing less than a 1% error). All DNA sequence editing was conducted using BioEdit.
The GC content at third codon positions (GC3) and the frequency of preferred codons (Fpr) for each gene were determined using CodonW. GC3 content has been shown to be well-correlated with the degree of biased codon usage for A. thaliana and other plant species [1, 31]. For the determination of Fpr, we used the preferred codons (sometimes called favoured codons) for Z. mays and T. aestivum previously identified by Kawabe and Miyashita (2003). Preferred codons are those that are most frequently used in the most highly-biased genes (as compared to lowly-biased genes) per degenerate codon group, and have been well-correlated to the optimal codons for many species (those codons used in the most highly expressed genes). Fpr was not determined for B. napus as the preferred codon data have not been described yet (and cannot be determined here, as the same data would then have to be used to determine both the preferred codons and Fpr).
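To make the two measures concrete, the following is a minimal sketch of how GC3 and Fpr could be computed from an in-frame coding sequence. It is not CodonW's implementation, and CodonW's own indices may differ in detail (for example in how non-degenerate codons are handled). Here GC3 is taken over all complete codons, Fpr is taken over codons of amino acids with synonymous codons (Met, Trp and stop codons are excluded), and the preferred-codon set is a toy set assumed for illustration, not the set of Kawabe and Miyashita (2003).

```python
NON_DEGENERATE = {"ATG", "TGG"}       # Met and Trp have a single codon each
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(orf):
    """Split an in-frame sequence into complete, unambiguous codons."""
    orf = orf.upper().replace("U", "T")
    usable = len(orf) - len(orf) % 3  # drop a trailing partial codon
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    return [c for c in codons if set(c) <= set("ACGT")]


def gc3(orf):
    """Fraction of codons with G or C at the third position."""
    codons = split_codons(orf)
    if not codons:
        raise ValueError("no complete codons in sequence")
    return sum(c[2] in "GC" for c in codons) / len(codons)


def fpr(orf, preferred):
    """Fraction of preferred codons among codons of degenerate amino acids."""
    codons = [c for c in split_codons(orf)
              if c not in NON_DEGENERATE and c not in STOP_CODONS]
    if not codons:
        raise ValueError("no degenerate codons in sequence")
    return sum(c in preferred for c in codons) / len(codons)


# Toy reading frame and toy preferred-codon set (both hypothetical).
toy_orf = "GCCCTGAAGGCTATGTTC"          # GCC CTG AAG GCT ATG TTC
toy_preferred = {"GCC", "CTG", "AAG", "TTC"}
toy_gc3 = gc3(toy_orf)
toy_fpr = fpr(toy_orf, toy_preferred)
```

For the toy sequence, five of the six codons end in G or C, and four of the five codons belonging to degenerate amino acids are in the toy preferred set.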
A series of pairwise comparisons was conducted with respect to each combination of gender and gene expression level using GC3 and Fpr (including the flower for B. napus). Additional pairwise contrasts were conducted to assess the impact of gene expression within each gender-specific dataset for each species (and the flower in B. napus), for a total of 35 contrasts. Tests were conducted using the non-parametric Mann-Whitney Rank Sum test (as normality was not detected in some contrasts; t-tests nonetheless yielded similar results). Statistical significance required P < 0.05. A Bonferroni correction was applied across all contrasts. All statistical analyses were conducted in SigmaPlot 10.0 and SigmaStat 3.5 for Windows (Systat© software 2006). In addition to these pairwise tests, we also determined the relative synonymous codon usage (RSCU) for the concatenated reading frames for the entire dataset of male-specific and of female-specific (and the flower for B. napus) genes for each species using CodonW. See main text for the description of gender-specific RSCU, GC3, and gene expression relative to protein length. Hierarchical clustering was conducted based on the Pearson correlation coefficients between RSCU values for each combination of species and gender-specific tissues/gametes for Z. mays and T. aestivum (Systat, 2004).
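RSCU is commonly defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch of that definition applied to a concatenated reading frame, assuming the standard genetic code; it is an illustration rather than CodonW's implementation, and the function name `rscu` and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(sequence):
    """RSCU per codon for amino acids with two or more synonymous codons."""
    seq = sequence.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    # Group sense codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if len(family) < 2 or total == 0:
            continue  # Met/Trp, or amino acid absent from the sequence
        expected = total / len(family)  # count under equal usage
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


# Toy concatenated reading frame (hypothetical): four alanine codons.
toy_rscu = rscu("GCCGCCGCTGCA")
```

In the toy case, alanine occurs four times across four synonymous codons, so the expected count per codon is 1; GCC (observed twice) has an RSCU of 2, GCT and GCA have 1, and GCG has 0. Values above 1 indicate codons used more often than expected under equal usage.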
As a means to assess whether the profile of gene functions conducted by male-specific and female-specific genes differed for Z. mays and for T. aestivum, each of these gender-specific sequence datasets (open reading frames) was submitted to MIPS. A similar analysis was conducted for male-specific and flower-specific genes in B. napus. The gene functions were determined by comparison to proteins characterized in the A. thaliana database (MATDB) with annotation from The Arabidopsis Information Resource (TAIR) and as implemented by the software Classification Superviewer.
Chiapello H, Lisacek F, Caboche M, Henaut A: Codon usage and gene function are related in sequences of Arabidopsis thaliana. Gene. 1998, 209: GC1-GC38.
Urrutia AO, Hurst LD: Codon usage bias covaries with expression breadth and rate of synonymous evolution in humans, but this is not evidence for selection. Genetics. 2001, 159: 1191-1199.
Plotkin JB, Robins H, Levine AJ: Tissue-specific codon usage and the expression of human genes. Proc Natl Acad Sci USA. 2004, 101: 12588-12591.
Hambuch TM, Parsch J: Patterns of synonymous codon usage in Drosophila melanogaster genes with sex-biased expression. Genetics. 2005, 170: 1691-1700.
Kotlar D, Lavner Y: The action of selection on codon bias in humans is related to frequency, complexity and chronology of amino acids. BMC Genomics. 2006, 7: 67-
Zhang Z, Hambuch TM, Parsch J: Molecular evolution of sex-biased genes in Drosophila. Mol Biol Evol. 2004, 21: 2130-2139.
Whittle CA, Johnston MO: Male-biased transmission of deleterious mutations to the progeny in Arabidopsis thaliana. Proc Natl Acad Sci USA. 2003, 100: 4055-4059.
Duret L, Mouchiroud D: Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc Natl Acad Sci USA. 1999, 96: 4482-4487.
Duret L: tRNA gene number and codon usage in the C. elegans genome are coadapted for optimal translation of highly expressed genes. Trends Genet. 2000, 16: 287-289.
Akashi H: Gene expression and molecular evolution. Curr Opin Genet Dev. 2001, 11: 660-666.
Akashi H: Translational selection and yeast proteome evolution. Genetics. 2003, 164: 1291-1303.
Wright SI, Yau CB, Looseley M, Meyers BC: Effects of gene expression on molecular evolution in Arabidopsis thaliana and Arabidopsis lyrata. Mol Biol Evol. 2004, 21: 1719-1726.
Gouy M, Gautier C: Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 1982, 10: 7055-7074.
Sharp PM, Li W-H: An evolutionary perspective on synonymous codon usage in unicellular organisms. J Mol Evol. 1986, 24: 28-38.
Stenico M, Lloyd T, Sharp PM: Codon usage in Caenorhabditis elegans: delineation of translational selection and mutational biases. Nucleic Acids Res. 1994, 22: 2437-2446.
Ermolaeva MD: Synonymous codon usage in bacteria. Curr Issues Mol Biol. 2001, 3: 91-97.
Chen SL, Lee W, Hottes AK, Shapiro L, McAdams HH: Codon usage between genomes is constrained by genome-wide mutational processes. Proc Natl Acad Sci USA. 2004, 101: 3480-3485.
Miyata T, Hayashida H: Extraordinarily high evolutionary rate of pseudogenes: evidence for the presence of selective pressure against changes between synonymous codons. Proc Natl Acad Sci. 1981, 78: 5739-5743.
Sharp PM, Li WH: The codon adaptation index – a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 1987, 15: 1281-1295.
Osawa S, Ohama T, Yamao F, Muto A, Jukes TH, Ozeki H, Umesono K: Directional mutation pressure and transfer RNA in choice of the third nucleotide of synonymous two-codon sets. Proc Natl Acad Sci USA. 1988, 85: 1124-1128.
Sueoka N: Directional mutation pressure and neutral molecular evolution. Proc Natl Acad Sci USA. 1988, 85: 2653-2657.
Kano A, Andachi Y, Ohama T, Osawa S: Novel anticodon composition of transfer RNAs in Micrococcus luteus, a bacterium with a high genomic G + C content: Correlation with codon usage. J Mol Biol. 1991, 221: 387-401.
Sharp PM, Averof M, Lloyd AT, Matassi G, Peden JF: DNA sequence evolution: The sounds of silence. Philos Trans R Soc Lond B Biol Sci. 1995, 349: 241-247.
Wright F: The "effective number of codons" used in a gene. Gene. 1990, 87: 23-29.
Ikemura T: Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol. 1985, 2: 13-34.
Kanaya S, Yamada Y, Kinouchi M, Kudo Y, Ikemura T: Codon usage and tRNA genes in eukaryotes: correlation of codon usage diversity with translation efficiency and with CG-dinucleotide usage as assessed by multivariate analysis. J Mol Evol. 2002, 53: 290-298.
Moriyama EN, Powell JR: Codon usage bias and tRNA abundance in Drosophila. J Mol Evol. 1997, 45: 514-523.
Grantham R, Gautier C, Gouy M, Jacobzone M, Mercier R: Codon catalog usage is a genome strategy modulated for gene expressivity. Nucleic Acids Res. 1981, 9: r43-r49.
Murray EE, Lotzer J, Eberle M: Codon usage in plant genes. Nucleic Acids Res. 1989, 17: 477-498.
Duret L, Mouchiroud D: Determinants of substitution rates in mammalian genes: expression patterns affects selection intensity but not mutation rate. Mol Biol Evol. 2000, 17: 68-74.
Wright SI, Lauga B, Charlesworth D: Rates and patterns of molecular evolution in inbred and outbred Arabidopsis. Mol Biol Evol. 2002, 19: 1407-1420.
Subramanian S, Kumar S: Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome. Genetics. 2004, 168: 373-381.
Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using PHRED. I Accuracy Assessment. Genome Res. 1998, 8: 175-185.
Staden R, Beal KF, Bonfield JK: The Staden Package, 1998. Bioinformatics Methods and Protocols. Edited by: Misener S, Krawetz SA. 1999, Totowa, NJ: The Humana Press, 115-130.
Adzhubei AA, Laerdahl JK, Vlasova AV: preAssemble: A tool for automatic sequencer trace data processing. BMC Bioinformatics. 2006, 17: 17-22.
Tiffin P, Hahn MW: Coding sequence divergence between two closely related plant species: Arabidopsis thaliana and Brassica rapa ssp. pekinensis. J Mol Evol. 2002, 54: 746-753.
Mitreva M, Wendl MC, Martin J, Wylie T, Yin Y, Larson A, Parkinson J, Waterston RH, McCarter JP: Codon usage patterns in Nematoda: analysis based on over 25 million codons in thirty-two species. Genome Biol. 2006, 7: R75-
Huang X, Madan A: CAP3: A DNA sequence assembly program. Genome Res. 1999, 9: 868-877.
Altschul SF, Gish W, Miller W, Myers WE, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402.
National Center for Biotechnology Information. [http://www.ncbi.nlm.nih.gov]
Kawabe A, Miyashita N: Patterns of codon usage bias in three dicot and four monocot plant species. Genes Genet Syst. 2003, 78: 343-352.
Systat 11 Statistics, Vol. I. 2004, Systat Software, Inc., Richmond, CA
Wang D, Hsieh M, Li W-H: A general tendency for conservation of protein lengths across eukaryotic Kingdoms. Mol Biol Evol. 2004, 22: 142-147.
Maside X, Lee AW, Charlesworth B: Selection on codon usage in Drosophila americana. Curr Biol. 2004, 14: 150-154.
Hastings KE: Strong evolutionary conservation of broadly expressed protein isoforms in the troponin I gene family and other vertebrate gene families. J Mol Evol. 1996, 42: 631-640.
Charlesworth D: Why do plants produce so many more ovules than seeds?. Nature. 1989, 338: 21-22.
Mazer SJ, Snow AA, Stanton ML: Fertilization dynamics and parental effects upon fruit development in Raphanus raphanistrum: consequences for seed size variation. Am J Bot. 1986, 73: 500-511.
Wyckoff G, Wang W, Wu CI: Rapid evolution of male reproductive genes in the descent of man. Nature. 2000, 403: 304-309.
Datta A, Jinks-Robertson S: Association of increased spontaneous mutation rates with high levels of transcription in yeast. Science. 1995, 268: 1616-1619.
Beletskii A, Bhagwat AS: Transcription-induced mutations: increase in C to T mutations in the nontranscribed strand during transcription in Escherichia coli. Proc Natl Acad Sci USA. 1996, 93: 13919-13924.
Kliman RM, Hey J: The effects of mutation and natural selection on codon bias in the genes of Drosophila. Genetics. 1994, 137: 1049-1056.
Polowick PL, Sawhney VK: A scanning electron microscopic study on the initiation and development of floral organs of Brassica napus cv. Westar. Am J Bot. 1986, 73: 254-263.
Moriyama EN, Powell JR: Gene length and codon usage bias in Drosophila melanogaster, Saccharomyces cerevisiae and Escherichia coli. Nucleic Acids Res. 1998, 26: 3188-3193.
Lemos B, Bettencourt BR, Meiklejohn CD, Hartl DL: Evolution of proteins and gene expression levels are coupled in Drosophila and are independently associated with mRNA abundance, protein length, and number of protein-protein interactions. Mol Biol Evol. 2005, 22: 1345-1354.
Ingvarsson PK: Gene expression and protein length influence codon usage and rates of sequence evolution in Populus tremula. Mol Biol Evol. 2007, 24: 836-844.
Whittle CA, Johnston MO: Male-driven evolution of mitochondrial and chloroplastidial DNA sequences in plants. Mol Biol Evol. 2002, 19: 938-949.
Birky CW: Uniparental inheritance of mitochondrial and chloroplast genes: mechanisms and evolution. Proc Natl Acad Sci USA. 1995, 92: 11331-11338.
Gish W, States DJ: Identification of protein coding regions by database similarity search. Nat Genet. 1993, 3: 266-272.
Hall T: BioEdit: A user-friendly biological sequence alignment program for Windows 95/98/NT. Nucleic Acids Symp Ser. 1999, 41: 95-98.
Peden J: Analysis of codon usage. PhD Thesis. 1999, Department of Genetics, University of Nottingham, UK
Mewes HW, Frishman D, Guldener U, Mannhaupt G, Mayer K, Mokrejs M, Morgenstern B, Munsterkotter M, Rudd S, Weil B: MIPS: A database for genomes and protein sequences. Nucleic Acids Res. 2002, 30: 31-34.
Provart N: Classification Superviewer. 2006, [http://bbc.botany.utoronto.ca]
Engel ML, Chaboud A, Dumas C, McCormick S: Sperm cells of Zea mays have a complex complement of mRNAs. Plant J. 2003, 34: 697-707.
Yang H, Kaur N, Kiriakopolos S, McCormick S: EST generation and analyses towards identifying female gametophyte-specific genes in Zea mays L. Planta. 2006, 224: 1004-1014.
The authors thank three anonymous reviewers for their valuable comments on the manuscript. We also thank the researchers listed in Table 1 for making their EST data publicly available. This work was funded by the National Research Council Canada – Genomics and Health Initiative III. This paper is National Research Council Canada publication No. 48420.
CAW conceived the project, conducted the majority of the bioinformatics and comparative analysis, and prepared the manuscript. MRM conducted laboratory work for the B. napus microspore EST library and participated in the analysis and the editing of the manuscript. JEK is the research group supervisor and contributed to the conception of the study, managed the data from the B. napus microspore library, and was involved in the data analysis and the editing of the manuscript. All authors read and approved the final manuscript.
Cite this article
Whittle, CA., Malik, M.R. & Krochko, J.E. Gender-specific selection on codon usage in plant genomes. BMC Genomics 8, 169 (2007). https://doi.org/10.1186/1471-2164-8-169