Gender differences in codon usage
The statistically significantly higher GC content at third codon positions and the greater frequency of preferred codons for genes expressed specifically in eggs as compared to sperm in Z. mays, and for genes expressed in the ovary as compared to the anther in T. aestivum, provide strong evidence that there is a greater bias in codon usage for genes expressed in female tissues than in male tissues and/or gametes (Figure 1). These findings were similar when genes were classified as either highly expressed or lowly expressed, suggesting that gender has a substantial impact on synonymous codon use. We can infer that these differences are likely due to selective pressure because the bias is associated with gene expression (tissue-specific gene expression or high versus low expression level, Figure 1). Furthermore, the data indicate that the gender effect cannot be attributed to variation in protein lengths or to gene function (Table 3, Table 4, Figure 3). Overall, these results, across a broad range of genes, provide evidence that codon usage is altered by gender-specific pressures in plants.
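As an illustration of the two measures discussed above, the following minimal Python sketch computes GC3 (the fraction of codons ending in G or C) and a simple frequency of preferred codons (Fpr) for a single coding sequence. The function names, the choice to exclude ATG, TGG and stop codons from the Fpr denominator, and the example preferred-codon set are assumptions made for illustration; they are not taken from the methods of this study.

```python
# Codons with no synonymous alternative (Met, Trp) and stop codons are
# excluded from the Fpr denominator in this sketch (an assumption).
NON_SYNONYMOUS_OR_STOP = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def split_codons(cds):
    """Split an in-frame coding sequence into complete codons."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def gc3(cds):
    """Fraction of codons whose third position is G or C."""
    codons = split_codons(cds)
    if not codons:
        raise ValueError("sequence contains no complete codon")
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


def fpr(cds, preferred):
    """Fraction of synonymous-family codons that are in the preferred set."""
    codons = [c for c in split_codons(cds) if c not in NON_SYNONYMOUS_OR_STOP]
    if not codons:
        raise ValueError("no codons from synonymous families")
    return sum(c in preferred for c in codons) / len(codons)


# Hypothetical toy sequence: ATG GCC AAA
print(gc3("ATGGCCAAA"))
print(fpr("ATGGCCAAA", {"GCC"}))
```

In practice these values would be computed per gene and then compared between the female-specific and male-specific gene sets.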
Genes expressed in eggs have a higher relative synonymous codon usage (RSCU) value than those expressed in sperm for 26 of the 27 previously identified preferred codons for Z. mays. Remarkably, this indicates that selective pressure in eggs specifically acts to enhance the frequency of preferred codons for each of the 18 amino acids that have synonymous codons (i.e., egg-specific genes have a greater frequency of at least one of the preferred codons per amino acid, Table 2). Similar findings for T. aestivum, showing that all 23 of the preferred codons for this species are enhanced in genes expressed in the ovary as compared to the anther, also demonstrate that a selective pressure inherent to these female organs and gametes is acting to enhance the incidence of preferred codons across all synonymous codon groups. In addition, the fact that the female bias was detected for each gene length category (i.e., among short genes, among medium length genes, and among long genes, Tables 3 and 4) and that gender-specific gene expression, rather than species, was the major determinant of hierarchical clustering of RSCU values (Figure 2), supports the notion that the codon usage bias demonstrated here is greatly influenced by gender-specific factors. It is notable for T. aestivum that in three cases the RSCU was greater for female- than male-specific genes (bold values, no asterisk, Table 2) for G or C ending codons that had not been previously identified as preferred, but had been described as preferred in Z. mays and other plant species [42]. The overall results presented here, showing greater use of preferred codons in genes expressed in female organs and gametes, suggest that these codons are probably also preferred in T. aestivum, at least for genes expressed in the reproductive tissues and gametes.
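The RSCU comparisons above rest on the standard definition of RSCU: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following Python sketch shows that calculation for a pooled set of coding sequences. It assumes the standard genetic code and in-frame sequences; the function name and the toy input are hypothetical and do not reproduce the pipeline used in this study.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(sequences):
    """Return {codon: RSCU} for amino acids with two or more codons."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Group sense codons into synonymous families.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for aa, codons in families.items():
        if len(codons) < 2:  # Met and Trp have no synonymous codons
            continue
        total = sum(counts[c] for c in codons)
        if total == 0:  # amino acid absent from the data set
            continue
        for c in codons:
            # observed count / (family total / family size)
            values[c] = counts[c] * len(codons) / total
    return values


# Hypothetical toy input: three GCC and one GCT (all alanine).
toy = rscu(["GCCGCCGCCGCT"])
print(toy["GCC"], toy["GCT"], toy["GCA"])
```

An RSCU above 1 indicates a codon used more often than expected under equal usage; comparing the values for the same codon between female-specific and male-specific gene sets gives the kind of contrast summarized in Table 2.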
The greater bias in codon usage among genes expressed in female organs and gametes as compared to male organs and gametes, reflecting an increased propensity for translational selection, could be caused by several factors. In particular, it is possible that protein products of genes expressed in female organs and gametes experience a more diverse biochemical environment than their male counterparts, a phenomenon that could lead to greater selective constraint on proteins [45, 46], and thus, on their translation. It is also possible that mutations at third codon positions in genes expressed in female organs and gametes may on average have greater effects on fitness (higher selection coefficients), as has been proposed for genes expressed across a broad array of tissues [8, 9]. This could occur, for example, if translational inefficiency in female organs and gametes alters the cellular energy resources or interferes with essential biological processes in a manner not prevalent in male regions. Mutations affecting female regions could also have greater fitness effects because of the general uncertainty in the pollination process, which makes it highly advantageous for each ovary, ovule and/or egg to be fully functional (so that mutations in female regions may affect overall fitness more than mutations in anthers, pollen or sperm) [47], and because maternal traits can have a much greater impact on seed production (seed number, size, and dispersal) and survival (and thus, on overall fitness) [48]. Another possible explanation for the observed results is that there are differences in gene function between female and male organs and gametes, a theory that has been proposed as a potential factor altering amino acid substitution rates [10]. As shown in Figure 3, however, this is not the likely explanation in this study, as there is remarkable similarity in the biological functions represented by the male-specific and female-specific genes in both Z. mays and T. aestivum.
Nonetheless, subtle differences in gene function (e.g., specific genes that influence codon usage) or other, unidentified, functional differences between the male and female tissues/gametes could play a role [10]. An additional potential contributing factor worth considering is that genes with greater breadth of expression throughout the entire plant (which can have greater bias in codon usage) [1] are coincidentally also more commonly expressed in female organs and gametes than in male ones. Although this possibility cannot be definitively excluded, it seems unlikely given the similarity between the functional profiles of female- and male-specific genes. Altogether, it seems that the best explanations are differences in the amount of selective pressure for effective translation due to different cellular environments and/or a greater impact of mutations on female tissues and gametes. Further studies will nonetheless be needed to ascertain the mechanisms underlying the greater bias in codon usage in female organs and gametes in these plant species.
The relationship between gender-specific gene expression and codon usage in Z. mays and T. aestivum is consistent with the very limited data currently available for other organisms. It has been shown in humans, for example, that genes expressed in ovaries have likely been under slightly greater selective constraint for codon usage than genes expressed in testes following the divergence of humans and mice [3]. The trend notably corresponds to the generally high rates of protein evolution (and thus reduced selective pressure) reported in genes involved in spermatogenesis in primates [49]. In Drosophila, it has been found that the relative expression of genes in females versus males (the female : male ratio of gene expression) is well correlated with bias in codon usage [4]. In addition, in Arabidopsis, previous findings have indicated that more induced harmful mutations are passed to progeny by the sperm than by the eggs, consistent with the relatively lower selective pressure on mutations in male than in female tissues and gametes [7]. The present results extend these findings to include gender-specific selection on codon usage. Each of these gender-specific trends, in humans, Drosophila, and Arabidopsis, is consistent with the findings we report here, and together they suggest that the higher bias in codon usage for genes expressed in female tissues could be inherent to a range of organisms. Further studies will be needed to better understand the full range of organisms for which gender-specific gene expression is associated with a bias in codon usage.
Gene expression level
Gene expression level has been shown to be positively correlated with bias in codon usage in many organisms [1, 8–15, 24–27, 31]. Selection is the best explanation for this finding because higher levels of gene expression lead to greater opportunity for selection to alter codon usage [8, 10, 12] and because mutational bias has only rarely been associated with gene expression level (in certain microorganisms) [8, 50, 51]. In Drosophila, a positive relationship between gene expression and bias in codon usage has been reported for female tissues, but a relatively weak negative correlation was detected for male tissues [4]. Our findings of greater values for GC3 and Fpr for highly expressed genes than for lowly expressed genes, for both male-specific and female-specific genes from Z. mays and T. aestivum, suggest that gene expression level is positively correlated with bias in codon usage for genes expressed in male and in female regions for these plants.
It is notable nonetheless that the differences in the bias in codon usage between highly and lowly expressed genes were not as marked in male as in female tissues and gametes, as evidenced by the fact that the Bonferroni correction excluded the statistical significance of this comparison for both Z. mays and T. aestivum. In fact, the lowly expressed female-specific genes had statistically significantly higher bias in codon usage than the highly expressed male-specific genes in Z. mays, and no difference was detected between these two groups for T. aestivum (Figure 1). It thus seems that female tissues/gametes maintain substantial selective pressure on codon usage even for genes with reduced expression, in a manner not characteristic of male tissues/gametes.
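To make the role of the Bonferroni correction concrete, the short Python sketch below applies it to a family of pairwise comparisons: each P value is compared against the significance level divided by the number of tests. The function name, the significance level of 0.05 and the P values are hypothetical and are not the values obtained in this study.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which tests remain significant after Bonferroni correction."""
    if not p_values:
        raise ValueError("at least one P value is required")
    threshold = alpha / len(p_values)  # per-test significance level
    return [p < threshold for p in p_values]


# Hypothetical P values for four pairwise comparisons.
p_values = [0.03, 0.001, 0.2, 0.012]
print(bonferroni_significant(p_values))
```

Here a comparison with P = 0.03 would be significant at the uncorrected level but not after correction, which is the situation described above for the high versus low expression contrast in male tissues.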
Selection and gender-bias
The greater bias in codon usage for female-specific than for male-specific genes is currently best explained by selection. The reasoning is as follows. The male-specific and female-specific gene sets examined here were determined based on calculations that these genes were solely or primarily expressed in one tissue type and not in the other (i.e., the lack of ESTs in the contrasting tissue indicates that the mRNA was very rare or absent). Thus, the observed effects are associated with gender-specific gene expression (gene expression allows opportunity for selection, and is not usually associated with mutational bias [8, 10, 12]). In addition, our data indicate that gene function and protein length variation between male and female tissues/gametes do not explain the observed bias in codon usage between the gender-specific gene sets. Gene expression level differences cannot be implicated because the gender bias was detected for genes expressed at similar levels (high versus high expression and low versus low expression, regardless of protein length, Figure 1) [see Additional data file 1, Table 5]. Notably, because we examined ESTs for the present analysis, which rarely contain introns, and studied plant species where annotated genomic DNA (containing the introns) is not yet available (NCBI, personal communication), we do not include an analysis of the GC (or AT) content of introns versus third codon positions in our genes, an approach sometimes used to exclude mutational bias [8, 12, 24, 42, 45, 52]. Nevertheless, these trends, taken in their entirety, suggest that the bias in codon usage associated with gender-specific expression is best explained by differential selective pressure on genes expressed in male-specific tissues/gametes versus female-specific tissues/gametes.
Male-specific versus flower-specific genes in Brassica napus
Although the differences in bias in codon usage between the microspore and flower in B. napus were generally lower in magnitude than the preceding between-gender comparisons for Z. mays and T. aestivum (Figure 1, Table 2), the data overall indicate that genes expressed in the two B. napus tissues have specific patterns of codon usage. Specifically, the higher GC content in B. napus microspore-specific genes than in flower-specific genes suggests that the male portion of the flower may be under more selective pressure for codon usage than the flower as a whole. In particular, given that the flower and flower bud EST library should represent genes from the male, female and vegetative (somatic) tissue, one can infer that the combined vegetative and female tissue is under less selective pressure than the microspore. Given that the vegetative tissue usually represents the greatest fraction of the flower tissue (petals, sepals) [53], it could, in turn, be inferred that the somatic region is likely under less selective pressure for codon usage than the microspore (with no inference regarding the pressure in female tissues). The fact that the GC content at third codon positions of the genes specifically expressed in the microspores varied markedly among synonymous codon groups (Table 2), and was found to be positively correlated with gene expression level (Figure 1), further supports the notion that translational selection is enhanced in the microspore component of the flower. Moreover, from examination of Table 2, it is evident that for six of seven comparisons where the differences in RSCU between tissues were greater than 0.1, the microspore had enhanced usage of G or C ending codons (which were notably also the preferred codons in Z. mays and T. aestivum), a trend consistent with greater selective pressure.
Notably, analysis of RSCU values relative to protein length suggests that differences between male-specific and flower-specific genes in the B. napus tissue comparisons are greatest for genes encoding longer proteins (>400 amino acids, Table 3), as the male-specific genes in this category have substantially greater usage of G or C ending codons ("+" signs in Table 3). This effect could be partially caused by the greater percentage of genes encoding very long proteins in the flower-specific dataset or by greater male-specific effects on codon usage for genes encoding longer proteins. Nonetheless, all three of the gene length categories demonstrate higher GC3 values for male-specific genes (Table 4). One possible interpretation of all of these findings in B. napus, when combined with the data from Z. mays and T. aestivum, is that translational selection increases in the following order: flower-specific (heterogeneous) genes, male-specific genes, female-specific genes. Because these analyses are in different species, however, further evaluation of this possible relationship will be needed. Altogether, the findings here suggest that genes expressed in reproductive tissues may be under greater translational level selection than those expressed in vegetative (somatic) tissues, a factor consistent with the key role of reproductive success in fitness.
Protein length and gene expression
The analysis of protein lengths indicates that genes encoding shorter proteins generally tend to have greater bias in codon usage, as indicated by GC3, for the species examined here (Table 4) [see also Additional data file 1; Table 4]. This is consistent with the trends reported to date in other organisms such as Arabidopsis, Drosophila, C. elegans and yeast [e.g., 8, 54]. We also found marked evidence that the gene expression level in the species studied here is inversely correlated with protein length for each of the six datasets examined (across all genes per dataset), a result consistent with trends reported in humans, Drosophila and Populus tremula [2, 55, 56]. In particular, the Pearson correlation coefficients were: Z. mays male (R = -0.135, P = 2.6 × 10^-5), Z. mays female (R = -0.080, P = 0.010), T. aestivum male (R = -0.049, P = 4.3 × 10^-3), T. aestivum female (R = -0.18, P = 1.1 × 10^-12), B. napus male (R = -0.149, P = 1.9 × 10^-9), and B. napus flower (R = -0.081, P = 4.6 × 10^-6). This suggests that the tendency of shorter genes to have greater bias in codon usage (Table 4), at least for the genes examined here, may be due to greater levels of gene expression and an associated selective pressure [54]. Notably, in a complementary analysis to Figure 1, we found that the gender bias in codon usage was evident among highly and among lowly expressed genes within each of the three different protein length categories (short, medium, and long) [see Additional data file 1; Table 5].
Thus, the gender-specific biases in Z. mays and T. aestivum (and flower-specific differences in B. napus) at high and low levels of gene expression observed in Figure 1 cannot be explained by differences in protein lengths. In addition to the inverse association between protein length and bias in codon usage, it is also evident from Table 4 that the three species examined here tend to have different values for GC3, with values decreasing from T. aestivum to Z. mays to B. napus. Altogether, it is evident from our entire analysis that the gender-specific effects on codon usage can be detected across a range of protein lengths, gene expression levels, and for different plant species, thereby demonstrating that gender-specific factors play a significant role in genome evolution.
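The inverse relationship between protein length and gene expression level reported in this section is summarized by the Pearson correlation coefficient. A minimal Python sketch of that calculation is given below. The function name and the toy data (protein lengths paired with an expression measure such as EST counts) are hypothetical, and the sketch does not compute the associated P values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: protein length (amino acids) and expression measure.
protein_length = [120, 250, 310, 480, 650]
expression = [14, 9, 10, 4, 3]
print(pearson_r(protein_length, expression) < 0)
```

A negative coefficient, as in this toy example, corresponds to the pattern described above in which genes encoding shorter proteins tend to be more highly expressed.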
Notable issues
It should be noted that the B. napus material used for the microspore cDNA library was grown at low temperatures (10°C/5°C) that could potentially alter some fraction of the gene expression in the microspore if stress-mediation genes were enhanced. In fact, we found less than 1% of the total genes in B. napus microspores were stress-related (data not shown). Another issue worth considering is the implication of previous findings of gender-specific mutation rates in plants, a trend that was based on the detection of higher evolutionary rates at silent sites, including third codon positions, in male gametes [57]. Higher mutation rates in sperm, however, should act to enhance per generational mutation rates across the entire genome, including those genes expressed in females and in males, and thus, not impact the observed bias in codon usage.
Nonetheless, it should be noted that the differential male/female inheritance of organelles (and the underlying mechanisms; pre- or post-zygotic) could influence whether these genes are expressed in male or female tissues/gametes, and potentially contribute to the codon usage for organellar genes and their substitution rates [57, 58]. It is also notable that the abundance of tRNAs for the preferred codons could be greater in female than in male tissues/gametes and contribute to the gender differences in codon usage [3, 4]. This seems relatively unlikely given that the abundance of tRNAs would have to be higher in female tissues/gametes for every single preferred codon and that such differences have been shown not to explain the gender-specific codon usage in Drosophila [4]. It should also be noted that the reproducibility of the results for GC3 and Fpr observed in this study is consistent with the notion that GC3 content alone could be an effective indicator of codon bias in some species [31].