Post-genomic studies now consider the multiple biological dimensions of genes or groups of genes [10, 24]. The results of such studies are becoming increasingly informative, but conducting the research is also becoming increasingly complex. Numerous genomic and functional properties of genes are either direct determinants or mere correlates of their evolutionary dynamics, and these properties may be correlated with one another, as shown comprehensively in, e.g., [1, 3, 4]. The insight that the evolution of genes is affected by an array of potentially interacting (or correlated) properties implies that any study focusing on a particular biological dimension also needs to consider an array of other factors.
In essence, genome biologists face the challenge of determining the relative influences of numerous biological facets of a gene in order to understand its most important features. Simply identifying previously overlooked features that merit investigation should be considered an achievement of the post-genome era; even if these features emerge as weak effectors, they may provide insight into the biology of individual genes, groups of genes, or the genome as a whole. Moreover, quantifying the effects of previously overlooked features is important for gauging the bias they may introduce into studies that ignore them.
The list of biological properties that should be considered in such studies is not standardized, even though it has long been good practice to consider, for example, X- versus autosomal linkage and recombination rates. Other commonly considered properties include gene duplication and sex-specific expression. The recently published compilation of properties affecting the evolution of genes in Drosophila provides guidance for such multidimensional studies. However, the absence of any discussion of gene constellations prompted us to examine whether this property also deserves consideration, whether the study of gene constellations would add to our understanding of the complex biology of genes, and, conversely, whether studies that ignore this property would be biased.
Relative abundances of gene constellations
Our classification scheme, which considered transcriptional territories but minimized their spatial extent, suggests that the stereotypical single-gene architecture should be considered the exception, rather than the rule, in the Drosophila melanogaster genome. Our statement about the rarity of the stereotypical gene is conservative: if larger distances were applied to account for transcriptional territories, the number of solitary genes would be reduced further (Figure 2B). However, as stated, this conclusion depends on accepting transcriptional territories as a biological reality. It therefore needs to be interpreted in light of the relevance of transcriptional territories to any particular study. If transcriptional territories were ignored as a genomic feature, or deemed irrelevant in a particular study context, the proportion of solitary genes would be as high as ~50% (Figure 2B).
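To make the dependence on distance concrete, the Python sketch below counts genes that have no other gene within a distance d. It assumes genes are given as hypothetical (start, end) intervals on a single chromosome and uses a deliberately simplified criterion ("no other gene within d bp") that ignores strand, overlap type, and the group definitions of our classification scheme. The function name `count_solitary` is illustrative only.

```python
from typing import List, Tuple

# Hypothetical gene representation: (start, end), 0-based, end exclusive,
# all on one chromosome. Strand and overlap type are ignored in this sketch.
Gene = Tuple[int, int]


def count_solitary(genes: List[Gene], d: int) -> int:
    """Count genes with no other gene closer than d bp (simplified criterion).

    Overlapping genes have a negative gap and are never solitary for d >= 0.
    O(n^2) pairwise check, which is fine for illustration but not genome scale.
    """
    solitary = 0
    for i, (start, end) in enumerate(genes):
        isolated = True
        for j, (other_start, other_end) in enumerate(genes):
            if i == j:
                continue
            # Gap between two intervals: >= 0 if disjoint, < 0 if they overlap.
            gap = max(other_start - end, start - other_end)
            if gap < d:
                isolated = False
                break
        if isolated:
            solitary += 1
    return solitary
```

Because enlarging d can only add neighbors, the count returned is non-increasing in d, which is the sense in which applying larger distances for transcriptional territories reduces the number of solitary genes further.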
Our finding that the stereotypical solitary gene is rare (Tables 1 and 2) may be relevant to the design of molecular evolution and population genetic studies, and it raises the possibility that previous studies ignoring this genomic property were biased. Conceivably, researchers would preferentially pick solitary genes for analysis, because overlapping genes intuitively seem a poor choice. Chromatin-clustering genes would also be less favored, either to avoid the effect of correlated expression patterns of co-clustered genes [12, 13] or to space genes along the chromosome so as to avoid correlations in recombination rates. Such a study design would likely yield a collection of stereotypical solitary genes, which, according to our results, can differ from other genes, e.g., in codon usage bias and Ka/Ks. Such a collection may therefore not be representative of the genome as a whole.
We consider the classification and resulting enumeration of 5PP, EE, and 5PI-EI genes unlikely to be contentious, except that the distance used to define promoter/enhancer-overlapping genes could be varied (reduced to <1 kb). The chromatin-clustering groups COS and CSS are defined from previous results describing transcriptional territories and are therefore subject to additional uncertainties. These uncertainties concern the relevance of transcriptional territories to any particular study context. Moreover, the number of genes that co-cluster in transcriptional territories varies with the distances used to classify them.
A second concern is the distinction between 5PP and EE genes. A gene may overlap at its 5' end with the 5' region of another gene on the opposite strand (i.e., qualify for Group 5PP; Figure 1B, top), while its coding region also overlaps with the coding region of another gene (i.e., qualify for Group EE; Figure 1B, middle). Because such genes are assigned to Group 5PP, this conflict in classification results in an overestimate of the number of 5PP genes and an underestimate of EE genes, and might partly explain the similarity of 5PP and EE genes in our analyses. Overall, under our priority scheme, the risk of overestimating the number of genes in each group decreases in the order 5PP, EE, 5PI-EI, COS, and CSS.
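The effect of a priority scheme on group counts can be sketched as a first-match assignment. This Python sketch assumes, as the order of overestimation risk suggests, that groups are checked in the order 5PP, EE, 5PI-EI, COS, CSS and that a gene meeting none of the criteria is labelled "solitary". The input (the set of groups whose criteria a gene meets) is hypothetical; the actual overlap and distance criteria are not reproduced here.

```python
from collections import Counter
from typing import Iterable, List, Set

# Assumed priority order, highest first (inferred from the text, not a spec).
PRIORITY = ("5PP", "EE", "5PI-EI", "COS", "CSS")


def assign_constellation(qualifying: Iterable[str]) -> str:
    """Return the highest-priority group a gene qualifies for, else 'solitary'.

    `qualifying` is a hypothetical set of group labels whose criteria the gene
    meets; computing that set is outside the scope of this sketch.
    """
    labels = set(qualifying)
    unknown = labels - set(PRIORITY)
    if unknown:
        raise ValueError(f"unknown group labels: {sorted(unknown)}")
    for group in PRIORITY:
        if group in labels:
            return group
    return "solitary"


def tally(genes: List[Set[str]]) -> Counter:
    """Count genes per assigned group after applying the priority rule."""
    return Counter(assign_constellation(q) for q in genes)
```

A gene qualifying for both 5PP and EE is counted only under 5PP, so any such ambiguous genes inflate the 5PP count and deflate the EE count, which is the direction of error described above.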
With these caveats in mind, we suggest that a random sample of genes is likely to consist mostly of genes that deviate from the stereotypical gene architecture. This could affect studies of codon usage bias and substitution rates if the data used are enriched in or depleted of genes belonging to particular constellations. For example, enriching a data set with overlapping genes of types 5PP and EE would bias estimates of codon usage bias upward and estimates of substitution rates (Ka/Ks) downward. Thus, although the scheme used to classify genes could be varied, we suggest that our classification provides a reasonable framework for illustrating that gene constellations can affect the functional and evolutionary properties of genes, and that this property therefore deserves attention.
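The direction of such a bias follows from the fact that the expected mean of a sample is a composition-weighted average of the group means. The Python sketch below makes that explicit; the group means and proportions it takes are hypothetical inputs supplied by the user, not estimates from our data, and the function names are illustrative.

```python
from typing import Dict


def composition_weighted_mean(group_means: Dict[str, float],
                              composition: Dict[str, float]) -> float:
    """Expected mean of a sample with the given group proportions.

    composition maps group -> weight (weights need not sum to 1).
    """
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition must have a positive total weight")
    return sum(group_means[g] * w for g, w in composition.items()) / total


def composition_bias(group_means: Dict[str, float],
                     sample_comp: Dict[str, float],
                     genome_comp: Dict[str, float]) -> float:
    """Sample-expected mean minus genome-expected mean.

    Negative when the sample over-represents groups whose mean lies below the
    genome-wide average (e.g. a Ka/Ks-lowering constellation), positive otherwise.
    """
    return (composition_weighted_mean(group_means, sample_comp)
            - composition_weighted_mean(group_means, genome_comp))
```

The sign of `composition_bias` depends only on which groups are over-represented and whether their means lie above or below the genome-wide average, not on the sample size.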
Quantification of the constellation effect
Studies continue to reveal genomic and functional properties of genes that correlate with their evolutionary dynamics. Given the increasing number of potentially important biological facets of genes, there is a need to quantify their relative effects on the evolutionary dynamics of genes, in order to identify the properties most relevant to genome evolution and those most likely to confound evolutionary analyses if ignored.
From the results of our multivariate analyses, which were not exhaustive, we deduce that the known effect of recombination rate on codon usage bias was about five times more pronounced than the effect of gene constellation (0.041/0.008, in terms of the statistical effect coefficient, R; cf. Table 3). In addition, in the multivariate analysis, the effects of X- versus autosomal linkage and of gene function on codon usage bias were more than an order of magnitude more pronounced than the effect of gene constellation (Table 3). Similarly, for Ka/Ks, we estimated that gene function was at least ten times more important than gene constellation in the multivariate analysis (Table 3). We consider these contrasts between the relative effects of genomic properties that are broadly accepted as important to the evolution of genes and the new property examined here, 'gene constellation', to be reasonably informative, because the analyses were based on large numbers of genes representing a range of recombination rates (Figures 5, 6, 7, and 8).
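For reference, the "about five times" figure is simply the ratio of the two effect coefficients cited from Table 3 (the subscript labels are our shorthand):

\[ \frac{R_{\mathrm{recombination}}}{R_{\mathrm{constellation}}} = \frac{0.041}{0.008} \approx 5.1 \]

By the same measure, an effect "more than an order of magnitude" larger than that of gene constellation corresponds to a ratio greater than 10.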
In analyses of genetic polymorphism data, the effect of gene constellation was absent or subtle (Tables 2 and 3). This could reflect limited data availability, a truly weak or nonexistent effect, or both. Analyses of genetic polymorphism data require great care and statistical power to distinguish the generally subtle effects of, for example, recombination and demography. Whether future polymorphism analyses based on more comprehensive samples of genes will uncover subtle effects of gene constellation therefore remains to be seen; we would expect any such effects to be at least one to two orders of magnitude smaller than the effect of recombination rates.
Although the list of potentially interesting evolutionary properties examined was not exhaustive, and our study should be considered a first-pass analysis of this feature, we showed that gene constellation may influence some of the functional properties of genes examined (e.g., correlation of expression). However, the effects on evolutionary properties that we were able to detect were generally weak (e.g., codon usage bias, Ka/Ks) or, as we had expected, much weaker (polymorphism data). We did not correct for multiple testing because our aim was to identify the properties most likely to be relevant in this study context; accordingly, we placed more emphasis on the relative quantification of effects than on formal significance.
Implications for hypothesis testing
We examined whether gene constellation is a genomic property that should be considered more routinely in molecular evolutionary and population genetic studies. Results of multivariate analyses were consistent with a significant influence of gene constellation on a subset of the evolutionary properties studied. However, the effect was weak when expressed in terms of the relative contribution of model features during multivariate model fitting (Table 3). Thus, the practical consequences of ignoring the confounding effects of gene constellation are questionable.
We showed that two data sets were biased in their representation of gene constellations (Table 4). This compositional bias is expected to inflate Ka/Ks estimates and hence to increase the risk of a type I error. This is of concern because the bias is in the direction of the alternative hypothesis, which in this context posits that male reproductive genes tend to evolve at accelerated rates [2, 33].
Results of the multivariate analysis showed that gene constellation was a relevant factor in the study of male reproductive genes. However, the effect did not confound the main conclusion of that study, because a gene's biological function as a male accessory gland protein (Acp) was two orders of magnitude more important (in terms of the statistical effect) than gene constellation. The authors' examination of Ka/Ks values suggested that such genes evolve rapidly (cf. ); however, unbalanced sampling of gene constellation groups appears to have biased their Ka/Ks estimates upward, as predicted from the positive effect of the model feature gene constellation (Table 5). Thus, while the conclusion that male reproductive genes evolve at Ka/Ks above the genomic average remains valid, we predict that the rates reported for Acp genes are somewhat inflated.
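One way to gauge how much unbalanced sampling might inflate a reported mean Ka/Ks is post-stratification: estimate the mean within each constellation group, then reweight the group means by genome-wide group frequencies. This is a standard survey-sampling technique swapped in here for illustration; it is not the multivariate approach used in our analyses, and the function name, group labels, and input values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def poststratified_mean(observations: Iterable[Tuple[str, float]],
                        genome_comp: Dict[str, float]) -> float:
    """Reweight per-group sample means to a reference (genome-wide) composition.

    observations: (constellation_group, value) pairs, e.g. per-gene Ka/Ks.
    genome_comp:  reference weight of each group (need not sum to 1).
    Raises ValueError if a group with positive reference weight has no
    observations, since its mean cannot be estimated from the sample.
    """
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for group, value in observations:
        sums[group] += value
        counts[group] += 1

    weights = {g: w for g, w in genome_comp.items() if w > 0}
    if not weights:
        raise ValueError("genome_comp must contain a positive weight")
    missing = [g for g in weights if counts.get(g, 0) == 0]
    if missing:
        raise ValueError(f"no observations for groups: {missing}")

    total = sum(weights.values())
    return sum((sums[g] / counts[g]) * w for g, w in weights.items()) / total
```

Comparing the plain sample mean with the post-stratified mean indicates how much of a difference could be attributable to group composition alone; the comparison is only meaningful if every constellation group is represented in the sample.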