GRASP [Genomic Resource Access for Stoichioproteomics]: comparative explorations of the atomic content of 12 Drosophila proteomes
© Gilbert et al.; licensee BioMed Central Ltd. 2013
Received: 14 June 2012
Accepted: 5 June 2013
Published: 4 September 2013
“Stoichioproteomics” relates the elemental composition of proteins and proteomes to variation in the physiological and ecological environment. To help harness and explore the wealth of hypotheses made possible under this framework, we introduce GRASP (http://www.graspdb.net), a public bioinformatic knowledgebase containing information on the frequencies of 20 amino acids and atomic composition of their side chains. GRASP integrates comparative protein composition data with annotation data from multiple public databases. Currently, GRASP includes information on proteins of 12 sequenced Drosophila (fruit fly) proteomes, which will be expanded to include increasingly diverse organisms over time. In this paper we illustrate the potential of GRASP for testing stoichioproteomic hypotheses by conducting an exploratory investigation into the composition of 12 Drosophila proteomes, testing the prediction that protein atomic content is associated with species ecology and with protein expression levels.
Elements varied predictably along multivariate axes. Species were broadly similar, with the D. willistoni proteome a clear outlier. As expected, individual protein atomic content within proteomes was influenced by protein function and amino acid biochemistry. Evolution in elemental composition across the phylogeny followed less predictable patterns, but was associated with broad ecological variation in diet. Using expression data available for D. melanogaster, we found evidence consistent with selection for efficient usage of elements within the proteome: as expected, nitrogen content was reduced in highly expressed proteins in most tissues, most strongly in the gut, where nutrients are assimilated, and least strongly in the germline.
The patterns identified here using GRASP provide a foundation on which to base future research into the evolution of atomic composition in Drosophila and other taxa.
Understanding the basis of biological diversity requires integration of ecological and evolutionary information. One exciting emerging picture is that ecological variation in the availability of key elements can have evolutionary consequences even at the primary protein sequence level [1–4], a perspective known as “stoichioproteomics” (reviewed in ). Indeed, various studies, both older and newer, have detected selection for efficiency of usage in key limiting elements in amino acid side chains, both in the sequences of individual proteins [1, 6] and across entire proteomes [7–10].
To begin to explore this potentially vast source of variation within and among species, it is necessary to have reliable and comparable sequence datasets for multiple taxa. This problem applies most strikingly in multicellular eukaryotes. Although several studies have explored stoichioproteomics in prokaryotes (e.g., [7, 8, 11–15]) and eukaryotes [2, 9–11], prokaryotic species have more often been the subject of comparative analysis than eukaryotes. Comparative analyses of molecular-scale variations in the elemental compositions of proteins among plant and animal species are currently very scarce (see e.g. [2, 9]). One reason for this may be that, in these taxa, such analyses are more difficult owing to the more complicated relationships between gene, transcript, and protein (for example, through alternative splicing), which blurs the definition of “homology” and makes meaningful comparisons among proteomes more difficult to achieve. In such species, answering even simple questions about atomic composition can quickly become a daunting task that requires merging several large datasets from different research groups using multiple sequence identity codes.
To begin to address this problem, we present GRASP (Genomic Resource Access for Stoichioproteomics, URL: http://www.graspdb.net), a public web resource focused on providing a centralized and standardized resource for analyzing the elemental composition of whole eukaryotic proteomes. GRASP is intended, first and foremost, to encourage and enable researchers to conduct their own comparative stoichioproteomic analyses. Second, it is intended to simplify and greatly facilitate these analyses for eukaryotes, by providing a common, standardized repository of protein-by-protein information with easy ways to search, match, extract, and analyse composition data from groups of homologous proteins and splice variants across multiple species with sequenced genomes. Third, we seek to facilitate testing of biological hypotheses by linking protein data to other publicly available sources of biological information using standard naming conventions. GRASP does not provide new data; rather, the advance GRASP represents is one of convenience and streamlining of analyses that would otherwise be laborious, in a manner analogous to repositories of biological data such as FishBase (http://www.fishbase.org), the Tree of Life (http://www.tolweb.org) and the Global Biodiversity Information Facility (http://www.gbif.org).
In its current form, GRASP includes information on the atomic composition of proteins of all twelve fully-sequenced species of Drosophila: D. ananassae, D. erecta, D. grimshawi, D. melanogaster, D. mojavensis, D. persimilis, D. pseudoobscura, D. sechellia, D. simulans, D. virilis, D. willistoni and D. yakuba. Information on multiple splice variants is currently available for D. melanogaster. In the future, we plan to expand the database to include a diversity of multicellular and unicellular eukaryotes.
Exploring Drosophila stoichioproteomics
Combined with an almost unparalleled understanding of the biology of Drosophila from many decades of intensive research (see  and references therein), these 12 sequenced genomes have already been used to make inferences about species relationships and speciation, patterns of genome organization, e.g. , the evolution and function of gene sequences, e.g., , and rates of evolution, e.g., . However, the potential of this clade for studying variation in atomic composition has yet to be investigated.
To illustrate the potential that GRASP represents for researchers interested in testing biological and ecological hypotheses using stoichioproteomic data, and the kinds of analyses that are facilitated by GRASP, we present here the first exploratory analysis and overview of proteomic variation in atomic composition among the 12 sequenced Drosophila species. We specifically illustrate the potential of this resource by conducting preliminary tests of two stoichioproteomic hypotheses.
Stoichioproteomics derives a number of specific hypotheses from a single core precept: that limitation in an element leads to purifying selection in order to reduce the usage of amino acids needing that element in protein sequences or their expression. Limitation could occur at any of several levels. At the long-term ecological level, limitation would result in predictable changes in protein stoichiometry among species (see ). Alternatively, at the short-term ecological level, limitation would result in predictable changes in protein stoichiometry among or even within individuals (see ). Limitation may operate even at the intracellular level, whereby temporary nutrient limitation within cells due to the demands of protein expression results in predictable changes in composition among proteins with predictable expression profiles such as nutrient assimilation proteins , differentially expressed protein variants , or according to transient expression profiles . We examined two sets of predictions arising from this general elemental limitation hypothesis: first, ecological differences should lead to predictable protein stoichiometry among species, and second, highly-expressed proteins should have sequences conservative in key nutrients, specifically N.
Table 1. Predictions about associations between genomic and ecological traits and the evolution of protein atomic content
Expected or potential associations with protein stoichiometry
Genome size (130 – 364 Mb); Intron percent (19.6–24.0%) 
Larger proteomes (as indicated by larger genomes) require more intensive translation activity, and so should contain proportionally less of a limiting element (C, N or S), owing to selection for efficiency of element usage. This may be confined to the proteins involved in translation, which are overexpressed when intensive translation is required. Similarly, higher percentages of protein–coding DNA (i.e. lower intron percent) should select for higher efficiency of usage owing to a proportionally greater effect on the phenotype.
Male and female body size (thorax width; males, 0.64–1.78 mm; females, 0.80–1.89 mm); Sexual dimorphism (female thorax – male thorax; 0.00–0.18 mm); Development time (10–27d); Male and female specific growth rate (development time/thorax width) 
Contrasting predictions arise from the growth rate hypothesis . First, smaller organisms have faster specific growth rates than larger organisms and therefore require proportionally more transcription activity. Thus they should require more nutrient conservation in proteins, particularly the proteins of transcription and translation that are overexpressed when increased protein synthesis is required. Conversely, owing to higher rates of protein synthesis, smaller organisms have a lower protein:nucleic acid ratio, the N:P ratio of their tissues is accordingly lower, and they are predicted to be more easily P limited rather than N limited. This predicts that smaller organisms should be under weaker selection to conserve nitrogen in their protein sequences.
Ovariole number (16–43) 
With limited nutritional choices, organisms often prioritize allocation to fertility over lifespan ; this may impose selection for nutrient conservation in key proteins. Therefore evolutionary increases in ovariole number should result in nutrient depletion across proteomes, or differentially in the proteins of oogenesis.
It is advantageous to maintain a stoichiometric balance close to that of one’s food (reviewed in ). Generalist flies are more likely to be able to adjust the nutritional balance of ingested food to their own nutrient demands , whereas specialist flies are more likely to evolve a body stoichiometry that corresponds with that of their resources . Therefore the evolution of feeding specialization may involve evolution of distinctive protein stoichiometry, across proteomes or in the proteins of nutrient assimilation and digestion.
Second, we combined the data in GRASP with a public database of protein expression (FlyAtlas, http://www.flyatlas.org) in different Drosophila tissues to test for a negative association between protein expression and N content. Using tissue-specific expression allowed us to assess the predicted relationship not only in a general context but also in tissues where this relationship would be expected to apply strongly or not at all, respectively. First, the insect midgut is the site of nutrient uptake and assimilation; enzymes that function to assimilate nutrients have evolved to contain less of the element they assimilate , leading to the prediction that we should observe a stronger relationship between expression and N content in the midgut. Conversely, in the testes of D. melanogaster, evidence suggests that protein synthesis is greatly reduced, which should reduce the requirement for N conservation. Protein expression during spermatogenesis in Drosophila occurs in a unique way: transcription occurs only in early meiotic divisions, which peak during the pupal stage. Post-meiosis, there is almost no transcription in Drosophila spermatids; instead, protein synthesis is achieved by retention of mRNA transcripts for relatively long periods of time . Translation also appears to be reduced—12 ribosomal proteins are down-regulated in adult testes while none is up-regulated . In a global expression study, transcription and translation proteins were not among those differentially expressed in testes, unlike in ovaries . Thus, in contrast to other tissues, an adult testis would have no particular requirement for N conservation in its proteins, because proteins are being synthesized at a much lower rate.
Results and discussion
The multivariate analyses we present here incorporate (1) elemental content, following previous authors, e.g. [1, 2, 7, 10], (2) DNA GC content, and (3) several basic properties of amino acid sequences (protein length, proportions of hydrophobic, polar, positive, negative, and aromatic residues). We restricted our analyses to the subset of proteins that have orthologs in all 12 available Drosophila species (n = 4934). Future authors may wish to base more detailed analyses upon individual amino acid contents and raw numbers of constituent elements, or on the composition of proteins lost and gained during the evolution of this clade.
Figure 1 shows pairwise plots of variable loadings for the first eight principal component axes, which collectively explained 89% of the variance. Most of the co-linearity stemmed from inherent properties of protein sequences. Aside from fundamental associations such as those between N and O content and charge density (described by PC1), and between C content and aromaticity (described by PC2), Figure 1b also shows, for example, that S content was negatively associated with protein length. Although this partly reflects the effect of a constant initial methionine residue, PC1 and PC2 loadings did not appreciably change after excluding this initial residue from all proteins (data not shown) so this may stem from the tendency of smaller proteins to be stabilized by disulphide links while longer proteins tend to have salt bridges . Also reflecting previous findings, DNA GC content was correlated negatively with protein C content [14, 15] and also with O content .
Most species showed only small differences on all PC axes (Figure 2). D. willistoni was an outlier on several axes, notably PC2, PC3 and PC7, owing to its proteome’s comparatively high O content (median 0.496 atoms/residue) and its genome’s well-documented low GC content (median for our dataset 46.5%; see ). D. willistoni is not exceptional among eukaryotes in either GC or O content, falling roughly centrally among the eukaryotes plotted in Vieira-Silva & Rocha's Figure 2 (in , p. 1935), but it was a clear outlier within the clade studied here.
Overall, protein functional categories differed in elemental content and sequence properties largely in line with expectations from the biochemistry of each protein category (Figure 3). For example, transcription factors and nucleic acid binding proteins had very low values on PC1, indicating high N and O content, high charge density and hydrophilicity; nucleic acid binding proteins in particular had high N content, reflecting the requirement for positive charge associated with binding to negatively charged DNA. In contrast, receptors and transporters had high values of PC1 indicating very low N and O, low charge density, and high hydrophobicity, consistent with the high proportion of hydrophobic groups required to function within a plasma membrane.
Patterns in elemental content across the whole phylogeny
Why we should observe this pattern remains an open question. It seems likely that selection acting on DNA GC content drives the observed difference: PC2, PC3 and PC4 all carry heavy loadings for GC content, a fundamental property of DNA, and in the PCA, standing variation among species tended to fall along a common line roughly parallel with variation in GC content in all cases (Figure 2). While it is beyond the scope of this study to speculate on causal relationships between genomic GC content and protein properties, which are currently unclear (for discussion, see  with respect to O content and  with respect to C content), changes in GC content in D. willistoni have been shown to correlate with changes in amino acid transition rates . If evolutionary changes in GC content indirectly drive evolution in amino acid composition and protein properties such as O content, this change may be sufficient to account for the observed differences in PC2, PC3 and PC4 between D. willistoni, D. pseudoobscura, D. persimilis and their congeners.
However, evolutionary patterns were sometimes different from standing variation within proteomes. Evolution in N content and % positive charge followed patterns different from those seen in static variation. Evolutionary changes in N and % positive charge were independent of O content and % negative charge both on EPC1 and (to a lesser extent) on EPC3 (Figure 5). In contrast, within proteomes, these two variables were positively related to O content and % negative charge on PC1 (collectively describing charge density) and negatively related on PC3 (collectively describing a positive–negative continuum; see Figure 1).
Testing hypotheses across the phylogeny
[Table: number of significant relationships for each trait (e.g. specific development time); values not recovered]
Ecological selection pressures evident at the proteomic level have been detected previously using comparative analyses across whole kingdoms (see e.g. [2, 7, 11]); the relatively few substantial findings we report here may also reflect a relatively short divergence period (compared to divergence among kingdoms), or that differences in the ecologies of Drosophila are not substantial or consistent enough to generate the selection pressures we predicted – although major differences in body composition reflect those seen among the flies’ respective substrates , these differences may not ramify into the proteins. Given the scope of the proteomic datasets, our overview-style analysis was also necessarily very broad and coarse-grained. More detailed research into the atomic content of specific proteins or protein groups using GRASP may be better able to reveal effects of nutritional limitation upon protein atomic content among Drosophila species.
Protein expression levels in D. melanogaster
Highly expressed proteins (i.e. proteins that impose substantial nutrient demands upon a cell) should theoretically evolve to be nutrient poor [2, 6] and, conversely, nutrient-rich proteins should be down-regulated in times of low nutrient availability [13, 20]. To test this hypothesis, and to illustrate the ease with which the information in GRASP can be integrated with other publicly available resources, we asked how atomic composition, specifically N content, was related to protein expression (FlyAtlas, http://www.flyatlas.org) across different tissue types in D. melanogaster.
Bragg & Wagner  outlined two hypotheses to account for how nutrient conservation in highly expressed proteins might come about. First, relief of nutrient limitation might arise mainly from changes in expression, with nutrient-rich proteins down-regulated and nutrient-poor proteins up-regulated. This scenario predicts a proteome-wide negative correlation between expression levels and content of the limiting nutrient. Second, specifically up-regulated proteins may have evolved to be nutrient-poor, resulting in a negative expression-nutrient content relationship only in up-regulated proteins [3, 20].
Expression was bimodal in all tissues, the lower distribution corresponding to low- or rarely-expressed genes (see e.g. ). To test among the three alternatives (the two predictions outlined above, plus a null hypothesis of no negative association between nutrient content and expression), we conducted analyses for each tissue separately. Specifically, we conducted piecewise regression, allowing us to separate the low expression and high expression clusters at the most likely point (corresponding to a log2 abundance of 5.5; see Methods).
Table 3. Linear and piecewise regression statistics for models of N content against expression level (log2 transcript abundance) in 27 Drosophila melanogaster tissues, in descending order of the estimated slope of the relationship in the high-expression cluster (expression > 5.5)
Columns: tissue; linear vs. piecewise; estimate (< 5.5); SE (< 5.5); estimate (> 5.5); SE (> 5.5).
Tissues listed include: larval Malpighian tubule, adult Malpighian tubule, cultured S2 cells, larval fat body, whole adult fly, larval salivary gland, adult accessory gland, adult salivary gland, adult fat body.
This indicates that, specifically in the highly expressed proteins of all tissues except the germline, increased expression was associated with conservation of N in protein sequences. In the testes, up-regulated proteins were actually higher in N; this was the only tissue for which this was the case. The most steeply negative expression/N relationships were seen in the midguts of adults and larvae. In these tissues, doubling expression (i.e. increasing by one log2 unit) was associated with approx. 0.01 fewer N atoms per amino acid residue. The next-steepest relationships were also all in gut-related tissues (hindgut and Malpighian tubules; Table 3).
One clear interpretation of these patterns is that high levels of protein expression place a high demand for N upon somatic cells, creating a selection pressure for conservation of N in the most highly expressed proteins . Thus, our results support the hypothesis that specifically up-regulated proteins have evolved to be nutrient-poor [3, 20], in keeping with the idea that proteins evolve to reflect material costs of their production [1, 4]. Among eukaryotes, this specific expression/N content relationship has so far only been identified in plants [e.g., 2, 9] and is weaker or absent in animals, possibly owing to relaxed selection for efficiency of N usage in heterotrophs . Proteins involved in nutrient assimilation show strong evolutionary conservation of the element they assimilate , so we would expect a priori to see the steepest relationships between N content and expression at the sites of N assimilation, such as the gut. Accordingly, midgut tissues, the main site of nutrient uptake, showed the steepest relationships of all tissues, followed by all other gut tissues in both larvae and adults (Table 3). In contrast, sites where protein synthesis is arrested or reduced, such as the testes, are not expected to show such a pattern. Unlike the testes, the ovary grows during adult life , but we still found a relatively shallow relationship between PC1 and expression in ovaries, suggesting they may also be under reduced selection for N conservation. As a potential hypothesis for future study, conservation of N in eggs may impair offspring performance, constraining egg proteins to be nutritionally expensive. Consistent with this, dietary protein deficiency differentially affects female fertility rather than lifespan in Drosophila.
Brain and CNS tissues, while actively growing and differentiating in larvae and adults , also showed comparatively shallow N content/expression relationships (Table 3); we hypothesize that, because the CNS is highly charge-sensitive, the intrinsic correlation between N content and protein charge may reduce the scope for N conservation in nervous tissues. However, the apparently shallow expression/N content relationship in these tissues remains an open question.
Interestingly, Elser et al.  also used D. melanogaster as a reference model organism for heterotrophs, and used it as a baseline in the comparison with autotrophs. They found that N content in Drosophila followed a U-shaped curve that actually increased with expression intensity in the most highly expressed proteins (their Figure 1b). Examination of their figure reveals that this trend is influenced by two outliers (possibly ribosomal proteins, a group with unusually high N [10, 13]); excluding these two outliers, the remainder of the points in their figure agree with our data because the non-outlier data in  follow a weakly negative trend. This result accords with the authors’ main conclusion, because this negative trend is indeed shallower than in the plants they analysed, lending weight to the idea that N conservation is indeed relaxed in animals.
Comparing the extent of N conservation we observed in D. melanogaster with the results obtained by Elser et al. , our results indicate that it is important to consider tissue-specific expression levels. For example, in the tissues with the strongest N conservation, the larval and adult midgut, the most highly expressed proteins were approx. 0.05 N atoms poorer per residue than in the least highly expressed. By contrast, in the testes there was no such pattern. These results suggest that selection for nutrient conservation in proteins may be mediated by tissue-specific expression, a possibility that requires further research.
Of course, it is difficult to be entirely confident that stoichioproteomic patterns are not a result of systematic selection on biochemical properties of amino acids or of underlying DNA rather than elemental content per se. As an alternative hypothesis, the most highly expressed proteins may require a lower charge density to allow unbinding from the machinery of translation at a fast enough rate to maintain high expression, which would explain their lower N and O content, although this requirement would most likely be of much lower importance than requirements of protein function. Future authors may wish to take preliminary steps towards distinguishing between these two hypotheses by conducting analyses of protein composition and expression while controlling for charge density.
We have provided a mainly descriptive account of broad-scale variation in the atomic content of Drosophila proteins across the 12 fully sequenced Drosophila species, to which GRASP provides ready access, alongside preliminary tests of some core stoichioproteomic hypotheses. Further detailed research using GRASP will provide deeper insights into the evolution of atomic composition within and among species. Subsequent releases of GRASP will be augmented with similar information on other organisms across the phylogeny, as well as with additional information about other characteristics, including known developmental regulators, life span, feeding habits, and other ecological information, resulting in a powerful bioinformatics knowledgebase for the framing and testing of stoichioproteomic hypotheses.
We found that atomic content in Drosophila was at least partially a function of DNA GC content and amino acid biochemistry, and was also predictable based upon relative amounts of other constituent elements. On top of this, however, proteins carried signatures of conservation of limiting nutrients: N content was reduced in the most highly expressed proteins in most somatic tissues, but not in testes where nutrient conservation is unnecessary. However, the predictable patterns in elemental composition that we detected within proteomes were not plainly evident in broad-scale comparisons across species, indicating a potential role for lineage-specific evolutionary changes; this phylogenetic variation can provide a testing ground for future researchers wishing to use GRASP to look into the evolution of atomic composition.
Protein atomic content can be seen as a passive emergent property of selection acting on the phenotype via a protein’s structure, but may also be a source of selection pressure in itself, through its effect on organism nutrient demand. Here we have identified patterns in atomic content ranging from associations with basic properties of DNA to evolutionary associations with ecological species differences that may represent signatures of selection for nutrient conservation. We hope that the stoichioproteomic trends we have identified here will provide multiple working hypotheses for future research to investigate in detail using these 12 Drosophila species and beyond. GRASP will provide a convenient springboard for such studies.
GRASP is organized around a central interface whereby users select the species they wish to query and then the category of proteins whose data they wish to extract. A range of data is included on the website to enable direct tests of hypotheses, as well as providing links to outside sources of information. We have added categorical data mapping to the Gene Ontology (http://www.geneontology.org) on protein family, biological process, molecular function, and pathway, derived from FlyBase (http://www.flybase.org), Panther (http://www.pantherdb.org), and Uniprot (http://www.pir.uniprot.org). GRASP also includes the amino acid sequence itself, along with its length, plus the underlying coding DNA sequence and information about its GC content, a property that directly affects the amino acid sequence . The current sequence data in GRASP are derived from FlyBase version FB_2007_3, October 2007. When a gene gives rise to more than one protein product, each protein product is indicated with a different suffix (i.e., PA, PB, etc.). Aggregations grouped by biological process, molecular function, protein pathway and family can be selected and output to the browser or via downloadable spreadsheets and comma-separated value lists for use in other statistical software. In addition, users can create their own aggregations of proteins derived from the selected species, automatically generating a downloadable spreadsheet of aggregated amino acid and elemental counts.
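As an illustration of the kind of quantity GRASP tabulates, the following minimal sketch computes side-chain atom counts per residue and coding-sequence GC content from raw sequences. The side-chain counts are standard amino acid chemistry (backbone atoms and hydrogens excluded); the function names are hypothetical, and GRASP's own counting conventions may differ in detail.

```python
from collections import Counter

# Side-chain atom counts (C, N, O, S) for the 20 standard amino acids,
# excluding backbone atoms and hydrogens.
SIDE_CHAIN_ATOMS = {
    "A": (1, 0, 0, 0), "R": (4, 3, 0, 0), "N": (2, 1, 1, 0), "D": (2, 0, 2, 0),
    "C": (1, 0, 0, 1), "Q": (3, 1, 1, 0), "E": (3, 0, 2, 0), "G": (0, 0, 0, 0),
    "H": (4, 2, 0, 0), "I": (4, 0, 0, 0), "L": (4, 0, 0, 0), "K": (4, 1, 0, 0),
    "M": (3, 0, 0, 1), "F": (7, 0, 0, 0), "P": (3, 0, 0, 0), "S": (1, 0, 1, 0),
    "T": (2, 0, 1, 0), "W": (9, 1, 0, 0), "Y": (7, 0, 1, 0), "V": (3, 0, 0, 0),
}
ELEMENTS = ("C", "N", "O", "S")


def atoms_per_residue(protein_seq):
    """Mean number of side-chain C, N, O and S atoms per residue.

    Hypothetical helper: characters other than the 20 standard
    one-letter codes are ignored.
    """
    counts = Counter(protein_seq.upper())
    n_residues = sum(counts[aa] for aa in SIDE_CHAIN_ATOMS)
    if n_residues == 0:
        raise ValueError("sequence contains no standard amino acids")
    totals = [0, 0, 0, 0]
    for aa, atoms in SIDE_CHAIN_ATOMS.items():
        for i, n_atoms in enumerate(atoms):
            totals[i] += counts[aa] * n_atoms
    return {el: total / n_residues for el, total in zip(ELEMENTS, totals)}


def gc_fraction(dna_seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = dna_seq.upper()
    n_bases = sum(seq.count(base) for base in "ACGT")
    if n_bases == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / n_bases


# Toy inputs, not taken from GRASP.
example_content = atoms_per_residue("MKR")
example_gc = gc_fraction("ATGC")
```

For the toy tripeptide, the N content is (0 + 1 + 3) / 3 side-chain atoms per residue, since methionine, lysine and arginine carry 0, 1 and 3 side-chain N atoms respectively.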
Exploratory analysis of Drosophila proteomes
After downloading the information from GRASP, all data analyses were carried out in R 2.13.0  using various packages as cited below. Unless stated otherwise, only proteins with orthologs in all 12 species were analysed.
First, we used principal component analysis to characterize multivariate relationships between C, O, N and S content, DNA GC content, protein length, and the proportions of hydrophobic, polar, positive, negative and aromatic residues, respectively, in the entire dataset. We then asked whether proteins from the 12 different species, and different functional categories, occupied distinct regions in multivariate space using MANOVA with the first 8 principal components as a multivariate response.
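The principal component step can be sketched as follows. This is not the code used for the analyses reported here (which were run in R); it is a minimal NumPy illustration that assumes variables are standardized before decomposition, and it uses synthetic data with only four variables rather than the full set described above.

```python
import numpy as np


def pca(data):
    """PCA of standardized variables via singular value decomposition.

    Returns the proportion of variance explained by each axis, the
    variable loadings (one column per axis), and the per-row scores.
    """
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    _, singular_values, vt = np.linalg.svd(z, full_matrices=False)
    eigenvalues = singular_values ** 2 / (z.shape[0] - 1)
    explained = eigenvalues / eigenvalues.sum()
    loadings = vt.T
    scores = z @ loadings
    return explained, loadings, scores


# Synthetic illustration only: N and O content rise, and C content falls,
# with a shared latent "charge" variable; S content varies independently.
rng = np.random.default_rng(0)
n_proteins = 500
charge = rng.normal(size=n_proteins)
data = np.column_stack([
    0.35 + 0.05 * charge + 0.01 * rng.normal(size=n_proteins),  # N per residue
    0.45 + 0.05 * charge + 0.01 * rng.normal(size=n_proteins),  # O per residue
    3.00 - 0.20 * charge + 0.10 * rng.normal(size=n_proteins),  # C per residue
    0.04 + 0.01 * rng.normal(size=n_proteins),                  # S per residue
])
explained, loadings, scores = pca(data)
```

The columns of `loadings` correspond to the variable loadings plotted for each axis, and `scores` gives the per-protein coordinates that could then serve as a multivariate response.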
Species- and clade-specific divergence in elemental content
First, we looked at species divergence in the PC axes identified above by reconstructing the ancestral states for each protein across the phylogeny on each axis in the PCA using maximum likelihood reconstruction in the ace() function of the ape package in R . We used this information to calculate the estimated divergence for each protein in each species since its most recent common ancestor with a sibling species. Lineage- and clade-specific evolution of atomic content could therefore be isolated from patterns shared among species.
Evolutionary patterns in elemental content and ecology
We used the method of phylogenetically independent contrasts (PIC, ) to calculate independent contrasts in all considered variables. To look at multivariate evolutionary change we then calculated principal components in these contrasts, following . Evolutionary associations among the variables were assessed using the variable loadings of the principal axes.
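For readers unfamiliar with independent contrasts, the standard algorithm can be sketched on a toy tree as below. The tree encoding, tip names and trait values are hypothetical, and the sketch assumes a fully bifurcating tree with positive branch lengths; it is not the implementation used for the analyses in this paper.

```python
import math


def independent_contrasts(node, trait):
    """Standardized phylogenetically independent contrasts.

    A tip is (name, branch_length); an internal node is
    (left_child, right_child, branch_length). Returns the estimated
    node value, its adjusted branch length, and the list of contrasts
    accumulated in the subtree.
    """
    if len(node) == 2:
        name, length = node
        return trait[name], length, []
    left, right, length = node
    x1, v1, contrasts_left = independent_contrasts(left, trait)
    x2, v2, contrasts_right = independent_contrasts(right, trait)
    # Difference between daughters, scaled by its expected standard deviation.
    contrast = (x1 - x2) / math.sqrt(v1 + v2)
    # Node value: average of the daughters weighted by inverse branch length.
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen the branch below the node to reflect estimation uncertainty.
    adjusted_length = length + v1 * v2 / (v1 + v2)
    return value, adjusted_length, contrasts_left + contrasts_right + [contrast]


# Toy tree ((A:1, B:1):1, C:2) with made-up trait values.
tree = ((("A", 1.0), ("B", 1.0), 1.0), ("C", 2.0), 0.0)
trait = {"A": 1.0, "B": 3.0, "C": 6.0}
root_value, _, contrasts = independent_contrasts(tree, trait)
```

A tree with n tips yields n - 1 contrasts; principal components can then be computed on the contrasts of several variables rather than on the raw species values.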
Testing hypotheses across the phylogeny
To test ecological and genomic hypotheses relating to stoichioproteomics (Table 1), we asked whether any of the species-level ecological or genomic traits listed in Table 1 on its own was related systematically to evolutionary patterns in atomic composition and protein properties. We used phylogenetic generalized least squares (PGLM), using the CAIC package  to model phylogenetic changes in the principal component axes identified above for “standing variation” against changes in the trait of interest (i.e. for each ecological trait, 4934 analyses each of n = 12), asking whether fitted lines systematically departed from zero. Under a null hypothesis we would expect 1% of 4934, or 49, analyses to be significant at the 0.01 level; we used 200 or approx. 4 times this number as an arbitrary but conservative threshold for significance. Note that PGLM differs from the method of PIC which we used to calculate the EPC axes: where PIC calculates a new dataset of phylogenetically independent contrasts, PGLM instead uses raw species values as the response variable, and incorporates phylogenetic information into the error term of the model. Thus, we performed these analyses on the "standing variation" PC axes (rather than the EPC axes). For ecological variables that were frequently associated with protein composition (intron percent, ovariole number, specific development time and diet breadth) we asked whether associations were consistently positive or negative in particular protein categories using χ2 tests; for each variable, Table 1 outlines hypotheses relating to specific subsets of proteins that might be expected to show elemental conservation in their sequences.
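The arithmetic behind the threshold can be illustrated as follows. The sketch computes the expected number of nominally significant tests and, under the added assumption that the 4934 per-protein tests are independent (an assumption unlikely to hold exactly, which is one reason to treat 200 as an arbitrary cut-off), the binomial probability of observing at least 200 significant tests by chance.

```python
import math


def log_binom_pmf(k, n, p):
    """Natural log of the binomial probability mass at k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_upper_tail(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p), summed in log space."""
    logs = [log_binom_pmf(k, n, p) for k in range(k_min, n + 1)]
    largest = max(logs)
    return math.exp(largest) * sum(math.exp(value - largest) for value in logs)


n_tests = 4934     # proteins with orthologs in all 12 species
alpha = 0.01       # per-test significance level
threshold = 200    # number of significant tests treated as noteworthy

expected_by_chance = n_tests * alpha
ratio = threshold / expected_by_chance
tail_probability = binom_upper_tail(threshold, n_tests, alpha)
```

Here `expected_by_chance` is 49.34 and `ratio` is close to 4, matching the figures quoted above; under independence the tail probability is vanishingly small, so any shortfall in conservatism would come from correlation among proteins rather than from the threshold itself.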
Protein expression in D. melanogaster
Detailed information on protein expression in D. melanogaster has recently become available in the FlyAtlas database (http://www.flyatlas.org). We used FlyAtlas to analyse protein elemental content with respect to protein expression in various tissues of D. melanogaster (see Table 3 for tissues). Nutrient conservation in proteins is expected to appear as a negative relationship between protein nutrient content and protein expression level (see [2, 3, 13]). If N conservation is brought about by wholesale adjustment of expression levels on the basis of N content, we would expect to see such a negative relationship across all proteins. On the other hand, if proteins that are constrained to be highly expressed have evolved to be low in N, we should see this negative relationship only in highly expressed proteins.
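For readers who want a concrete handle on “N content”, the sketch below counts nitrogen atoms in amino acid side chains and averages them over a sequence. The side-chain counts are standard biochemistry (arginine 3, histidine 2, lysine, asparagine, glutamine and tryptophan 1 each); the function name and the choice of a per-residue average are assumptions for illustration and may not match the exact measure stored in GRASP.

```python
# Nitrogen atoms per amino acid side chain (one-letter codes);
# residues not listed here have no side-chain nitrogen.
SIDE_CHAIN_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}

def side_chain_n_per_residue(sequence):
    """Mean number of side-chain N atoms per residue in a protein sequence.

    Non-letter characters (e.g. '*' or '-') are ignored; ambiguous codes
    such as 'X' are counted as residues contributing zero N, which is a
    simplification made for this sketch.
    """
    residues = [aa for aa in sequence.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    return sum(SIDE_CHAIN_N.get(aa, 0) for aa in residues) / len(residues)

# Made-up peptide: M (0), R (3), H (2), W (1) -> 6 N atoms over 4 residues
example = side_chain_n_per_residue("MRHW")
```

A value of this kind, computed per protein, is the sort of quantity that could then be related to expression level in each tissue.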
To distinguish between these two hypotheses, we fitted piecewise regression models to the data for each tissue, breaking the bimodal distribution at a point corresponding to a log2 abundance of 5.5 (determined by comparing AIC values of piecewise regressions using different breakpoints; data not shown).
In tissues where N conservation is expected to be weak or non-existent, however, we would expect a negative relationship in neither down- nor up-regulated proteins. Thus, we predicted that the slope of any relationship between expression and N content would be shallower for the testes than for any other tissue.
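The piecewise approach can be sketched as follows. This is a simplified Python illustration under stated assumptions: each segment is given its own independent straight line (the fitted models may instead have been constrained to join at the breakpoint), AIC is computed in its Gaussian least-squares form up to an additive constant, and the function names and candidate breakpoints are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least-squares line; returns (slope, residual sum of squares)."""
    slope, intercept = np.polyfit(x, y, 1)
    rss = float(np.sum((y - (slope * x + intercept)) ** 2))
    return slope, rss

def piecewise_fit(expr, n_content, breakpoint):
    """Fit separate lines below and above a breakpoint in log2 expression.

    Returns (slope_low, slope_high, aic).
    """
    expr = np.asarray(expr, dtype=float)
    n_content = np.asarray(n_content, dtype=float)
    low = expr < breakpoint
    slope_low, rss_low = fit_line(expr[low], n_content[low])
    slope_high, rss_high = fit_line(expr[~low], n_content[~low])
    n = expr.size
    k = 5  # two slopes, two intercepts, one residual variance
    aic = n * np.log((rss_low + rss_high) / n) + 2 * k
    return slope_low, slope_high, float(aic)

def best_breakpoint(expr, n_content, candidates):
    """Return the candidate breakpoint with the lowest AIC."""
    return min(candidates, key=lambda b: piecewise_fit(expr, n_content, b)[2])
```

Under the second hypothesis above, one would expect `slope_low` to be near zero and `slope_high` to be negative; for a tissue with weak N conservation, both slopes would be expected to lie close to zero.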
This work was funded by NSF grant (DBI 0548366) to WFF, JJE and SK and NIH grant (HG002096-12) to SK. The authors would like to thank B. van Emden and R.R. Tyagi for technical assistance with GRASP and J. Bragg and F.S. Gilbert for useful discussions and comments on the manuscript.
- Baudouin-Cornu P, Surdin-Kerjan Y, Marlière P, Thomas D: Molecular evolution of protein atomic composition. Science. 2001, 293 (5528): 297-300. 10.1126/science.1061052.
- Elser JJ, Fagan WF, Subramanian S, Kumar S: Signatures of ecological resource availability in the animal and plant proteomes. Mol Biol Evol. 2006, 23 (10): 1946-1951. 10.1093/molbev/msl068.
- Bragg JG, Wagner A: Protein carbon content evolves in response to carbon availability and may influence the fate of duplicated genes. Proc R Soc B. 2007, 274: 1063-1070. 10.1098/rspb.2006.0290.
- Bragg JG, Wagner A: Protein material costs: single atoms can make an evolutionary difference. Trends Genet. 2009, 25 (1): 5-8. 10.1016/j.tig.2008.10.007.
- Elser JJ, Acquisti C, Kumar S: Stoichiogenomics: the evolutionary ecology of macromolecular elemental composition. Trends Ecol Evol. 2011, 26 (1): 38-44. 10.1016/j.tree.2010.10.006.
- Mazel D, Marliere P: Adaptive eradication of methionine and cysteine from cyanobacterial light-harvesting proteins. Nature. 1989, 341 (6239): 245-248. 10.1038/341245a0.
- Bragg JG, Thomas D, Baudouin-Cornu P: Variation among species in proteomic sulphur content is related to environmental conditions. Proc R Soc B. 2006, 273 (1591): 1293-1300. 10.1098/rspb.2005.3441.
- Vieira-Silva S, Rocha EPC: An assessment of the impacts of molecular oxygen on the evolution of proteomes. Mol Biol Evol. 2008, 25 (9): 1931-1942. 10.1093/molbev/msn142.
- Acquisti C, Elser JJ, Kumar S: Ecological nitrogen limitation shapes the DNA composition of plant genomes. Mol Biol Evol. 2009, 26: 953-956. 10.1093/molbev/msp038.
- Acquisti C, Kumar S, Elser JJ: From elements to biological processes: signatures of nitrogen limitation in the elemental composition of the catabolic apparatus. Proc R Soc B. 2009, 276: 2605-2610. 10.1098/rspb.2008.1960.
- Acquisti C, Kleffe J, Collins S: Oxygen content of transmembrane proteins over macroevolutionary time scales. Nature. 2007, 445: 47-52. 10.1038/nature05450.
- Zeldovich KB, Berezovsky IN, Shakhnovich EI: Protein and DNA sequence determinants of thermophilic adaptation. PLoS Comput Biol. 2007, 3 (1): e5. 10.1371/journal.pcbi.0030005.
- Gilbert JDJ, Fagan WF: Contrasting mechanisms of proteomic nitrogen thrift in Prochlorococcus. Mol Ecol. 2011, 20: 92-104. 10.1111/j.1365-294X.2010.04914.x.
- Bragg JG, Hyder CL: Nitrogen versus carbon use in prokaryotic genomes and proteomes. Proc R Soc B. 2004, 271 (Suppl 5): PC374-PC377.
- Baudouin-Cornu P, Schuerer K, Marlière P, Thomas D: Intimate evolution of proteins. Proteome atomic content correlates with genome base composition. J Biol Chem. 2004, 279 (7): 5421-5428.
- Markow TA, O’Grady PM: Drosophila biology in the genomic age. Genetics. 2007, 177 (3): 1269-1276. 10.1534/genetics.107.074112.
- Drosophila 12 Genomes Consortium: Evolution of genes and genomes on the Drosophila phylogeny. Nature. 2007, 450: 203-218. 10.1038/nature06341.
- Stage DE, Eickbush TH: Sequence variation within the rRNA gene loci of 12 Drosophila species. Genome Res. 2007, 17 (12): 1888-1897. 10.1101/gr.6376807.
- Larracuente AM, Sackton TB, Greenberg AJ, Wong A, Singh ND, Sturgill D, Zhang Y, Oliver B, Clark AG: Evolution of protein-coding genes in Drosophila. Trends Genet. 2008, 24 (3): 114-123. 10.1016/j.tig.2007.12.001.
- Fauchon M, Lagniel G, Aude J-C, Lombardia L, Soularue P, Petat C, Marguerie G, Sentenac A, Werner M, Labarre J: Sulfur sparing in the yeast proteome in response to sulfur demand. Mol Cell. 2002, 9 (4): 713-723. 10.1016/S1097-2765(02)00500-2.
- Elser JJ, Acharya K, Kyle M, Cotner J, Makino W, Markow T, Watts T, Hobbie S, Fagan W, Schade J, Hood J, Sterner RW: Growth rate–stoichiometry couplings in diverse biota. Ecol Lett. 2003, 6 (10): 936-943. 10.1046/j.1461-0248.2003.00518.x.
- Lee KP, Simpson SJ, Clissold FJ, Brooks R, Ballard JWO, Taylor PW, Soran N, Raubenheimer D: Lifespan and reproduction in Drosophila: new insights from nutritional geometry. Proc Natl Acad Sci USA. 2008, 105 (7): 2498-2503. 10.1073/pnas.0710787105.
- Sterner RW, Elser JJ: Ecological stoichiometry: the biology of elements from molecules to the biosphere. 2002, USA: Princeton University Press.
- Raubenheimer D, Simpson SJ: Integrative models of nutrient balancing: application to insects and vertebrates. Nutr Res Rev. 1997, 10: 151-179. 10.1079/NRR19970009.
- Jaenike J, Markow TA: Comparative elemental stoichiometry of ecologically diverse Drosophila. Funct Ecol. 2003, 17 (1): 115-120. 10.1046/j.1365-2435.2003.00701.x.
- Chintapalli VR, Wang J, Dow JA: Using FlyAtlas to identify better Drosophila melanogaster models of human disease. Nature Genet. 2007, 39 (6): 715-720. 10.1038/ng2049.
- White-Cooper H: Studying how flies make sperm—investigating gene function in Drosophila testes. Mol Cell Endocrinol. 2009, 306 (1–2): 66-74.
- Mikhaylova L, Nguyen K, Nurminsky DI: Analysis of the Drosophila melanogaster testes transcriptome reveals coordinate regulation of paralogous genes. Genetics. 2008, 179: 305-315. 10.1534/genetics.107.080267.
- Parisi M, Nuttall R, Edwards P, Minor J, Naiman D, Lü J, Doctolero M, Vainer M, Chan C, Malley J, Eastman S, Oliver B: A survey of ovary-, testis-, and soma-biased gene expression in Drosophila melanogaster adults. Genome Biol. 2004, 5: R40. 10.1186/gb-2004-5-6-r40.
- Bastolla U, Demetrius L: Stability constraints and protein evolution: the role of chain length, composition and disulfide bonds. Protein Eng Des Sel. 2005, 18 (9): 405-415. 10.1093/protein/gzi045.
- Vicario S, Moriyama EN, Powell JR: Codon usage in twelve species of Drosophila. BMC Evol Biol. 2007, 7: 226. 10.1186/1471-2148-7-226.
- Albu M, Min XJ, Golding GB, Hickey D: Nucleotide substitution bias within the genus Drosophila affects the pattern of proteome evolution. Genome Biol Evol. 2009, 1: 288-293.
- Clobert J, Garland T, Barbault R: The evolution of demographic tactics in lizards: a test of some hypotheses concerning life history evolution. J Evol Biol. 1998, 11: 329-364.
- Markow TA, Raphael B, Dobberfuhl D, Breitmeyer CM, Elser JJ, Pfeiler E: Elemental stoichiometry of Drosophila and their hosts. Funct Ecol. 1999, 13: 78-84. 10.1046/j.1365-2435.1999.00285.x.
- Meiklejohn CD, Presgraves DC: Little evidence for demasculinization of the Drosophila X chromosome among genes expressed in the male germline. Genome Biol Evol. 2012, 10.1093/gbe/evs077.
- Cooper KW: Normal spermatogenesis in Drosophila. Biology of Drosophila. Edited by: Demerec M. 1950, New York: Wiley, 1-56.
- Truman JW, Bate M: Spatial and temporal patterns of neurogenesis in the central nervous system of Drosophila melanogaster. Dev Biol. 1988, 125: 145-157. 10.1016/0012-1606(88)90067-X.
- R Development Core Team: A language and environment for statistical computing. 2011, Vienna, Austria: R Foundation for Statistical Computing, ISBN 3-900051-07-0, http://www.R-project.org/
- Felsenstein J: Phylogenies and the comparative method. Am Nat. 1985, 125 (1): 1-15. 10.1086/284325.
- Martins EP, Hansen TF: Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. Am Nat. 1997, 149 (4): 646-667. 10.1086/286013.
- Pagel M: Inferring the historical patterns of biological evolution. Nature. 1999, 401 (6756): 877-884. 10.1038/44766.
- Paradis E, Claude J, Strimmer K: APE: analyses of phylogenetics and evolution in R language. Bioinformatics. 2004, 20: 289-290. 10.1093/bioinformatics/btg412.
- Freckleton RP: The seven deadly sins of comparative analysis. J Evol Biol. 2009, 22: 1367-1375. 10.1111/j.1420-9101.2009.01757.x.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.