The cytochrome P450 (CYP) gene superfamily in Daphnia pulex
© Baldwin et al. 2009
Received: 13 February 2008
Accepted: 21 April 2009
Published: 21 April 2009
Cytochrome P450s (CYPs) in animals fall into two categories: those that synthesize or metabolize endogenous molecules and those that interact with exogenous chemicals from the diet or the environment. The latter form a critical component of detoxification systems.
Data mining and manual curation of the Daphnia pulex genome identified 75 functional CYP genes and three CYP pseudogenes. These CYPs belong to 4 clans, 13 families, and 19 subfamilies. The CYP2, CYP3, CYP4, and mitochondrial clans are the same four clans found in other sequenced protostome genomes. Comparison of the CYPs from D. pulex with the CYPs from insects, vertebrates and sea anemone (Nematostella vectensis) shows that the CYP2 clan and, to a lesser degree, the CYP4 clan have expanded in Daphnia pulex, whereas the CYP3 clan has expanded in insects. However, the expansion of the Daphnia CYP2 clan is not as great as the expansion observed in deuterostomes and the nematode C. elegans. Mapping of CYP tandem repeat regions demonstrated the unusual expansion of the CYP370 family of the CYP2 clan. The CYP370s are similar to the CYP15s and CYP303s that occur as solo genes in insects, but the CYP370s constitute ~20% of all the CYP genes in Daphnia pulex. Lastly, our phylogenetic comparisons provide new insights into the potential origins of otherwise mysterious CYPs such as CYP46 and CYP19 (aromatase).
Overall, the cladoceran, D. pulex has a wide range of CYPs with the same clans as insects and nematodes, but with distinct changes in the size and composition of each clan.
The cytochrome P450s (CYPs) have widespread and diverse functions in animals. CYPs in families 1–4 are critical and often inducible components of the phase I detoxification systems of vertebrates, invertebrates, and plants [1–4]. They are also important in lipid metabolism, including fatty acids, retinoids, eicosanoids, steroids, vitamin D and bile acids, and some are regulated in a sexually dimorphic fashion through endocrine mechanisms [5–8]. In addition, the CYP26 family contributes to maintenance of sharp retinoic acid boundaries between rhombomeres in the developing mammalian embryo and in the eye [9–11]. In insects, CYPs are essential to the function of sensory organs such as antennae, where they may be involved in odorant clearance.
CYPs are important in the metabolism of and tolerance to anthropogenic chemicals and plant allelochemicals [14, 15]. Studies of inducibility, constitutive overexpression, and quantitative trait loci (QTL) mapping have demonstrated that CYPs confer tolerance or resistance to environmental chemicals [13, 16, 17]. Furthermore, it has been hypothesized that pesticide sensitivity in honeybee is associated with reduced CYP numbers. In addition, hormone structure is dependent on CYPs, and therefore small differences in the hormones utilized (e.g. juvenile hormones, methyl farnesoate) may be dependent on the CYPs available or the timing of expression.
Daphnia pulex, commonly known as the waterflea, is the first crustacean to have its genome sequenced. D. pulex has a genome of approximately 200 Mb with 31,000 genes http://wfleabase.org. Daphnid species are globally distributed zooplankton, in the order Branchiopoda, suborder Cladocera, and there are several daphnid species, including Daphnia magna, Daphnia pulicaria, Daphnia pulex, and Ceriodaphnia dubia. Daphnids are commonly studied zooplankton because of their importance to aquatic ecosystems, ability to contend with environmental challenges, amenability to culture, short life-cycle, and parthenogenic reproduction. Furthermore, daphnids are commonly used in several toxicity tests for multiple applications. Studies on the Daphnia pulex genome and specifically the CYPs may provide important insight on how genome and gene expression alterations promote individual and population-level fitness following environmental change.
It has become clear in the genomic era that previous estimates of the number of CYPs in daphnids such as D. magna were gross underestimates, and that most metazoans have a large number of CYP genes. For example, humans have 57, mice have 102, Strongylocentrotus purpuratus (sea urchin) has 120, Anopheles gambiae (mosquito) has 106, and Drosophila melanogaster has 83 functional CYPs (Additional file 1). In continuation of this P450 research, we are working with the Daphnia Genomics Consortium to assemble, annotate, and compare the CYPs in the Daphnia pulex genome with those of several other species. We have also assigned names to all of the CYPs and CYP pseudogenes based on previously published rules, using phylogenetic trees and amino acid sequence identity to determine clan, family and subfamily membership. Clans represent the deepest diverging gene clades in the CYP nomenclature. There are ten P450 clans among animals, but only four are present in the protostomes: CYP2, CYP3, CYP4 and mitochondrial. The general web repository for P450 nomenclature and sequence data is http://drnelson.utmem.edu/cytochromeP450.html. Overall, this work provides a continuation of earlier projects to comprehensively annotate CYPs and determine the putative role of Daphnia pulex CYPs in sensory adaptation, sensitivity to toxicants, adaptation to environmental challenges, and as biomarkers of chemical or endocrine stimulation.
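As a rough illustration of how amino acid identity can feed into family and subfamily assignment, the sketch below computes percent identity between two pre-aligned sequences and applies cutoffs. The 40% (family) and 55% (subfamily) values are the commonly cited rules of thumb in P450 nomenclature; they are not stated in this article and are treated here as assumptions, and in practice tree placement is weighed alongside identity. The function names and toy sequences are hypothetical.

```python
# Assumed rule-of-thumb cutoffs; not taken from this article.
FAMILY_CUTOFF = 40.0
SUBFAMILY_CUTOFF = 55.0


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def classify(identity: float) -> str:
    """Map a percent identity to a tentative nomenclature level."""
    if identity > SUBFAMILY_CUTOFF:
        return "same subfamily"
    if identity > FAMILY_CUTOFF:
        return "same family"
    return "different family"


# Toy aligned fragments (hypothetical, not real CYP sequences).
identity = percent_identity("MKT-LL", "MKSALL")
print(identity, classify(identity))
```

Identity alone can mislead for short or partial sequences, which is one reason the naming rules combine it with phylogenetic position.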
Manual annotation and curation of the CYPs in the Daphnia pulex v1.1 draft genome sequence assembly (September, 2006) produced 73 full length P450s and three pseudogenes. In addition, there are two ESTs in the CYP4 clan that were not found in the D. pulex genome of the chosen parthenogenic individual, but partial sequences (Cyp4C32, Cyp4AN1) were observed previously in specific D. pulex ecotypes. Overall, this number is similar to other CYP genomes (Additional file 1), but slightly lower than the typical invertebrate or insect with the exception of honeybee.
Bayesian inference and maximum parsimony yielded nearly identical topologies, so we have presented the tree in which branch lengths are proportional to the amount of change occurring in each lineage (Fig. 2). This tree is also available in a detailed, expandable, readable pdf document as supplementary material (Additional file 2). The Bayesian tree includes posterior probabilities at nodes which reflect the proportion of trees sampled during the search that included each particular node. The sea anemone genome has not been manually curated and therefore the data are subject to potential assembly errors until validated. However, the sea anemone provides an ancient anthozoan class cnidarian, and more importantly a diploblast, to the analysis as the only non-triploblastic species in our phylogenetic tree. The honeybee and fruitfly provide a protostome insect CYP genome for comparison, while the human CYPs provide a deuterostome for comparison to the Branchiopod crustacean, D. pulex.
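The meaning of a posterior probability at a node can be made concrete with a small sketch: it is simply the fraction of sampled trees that contain the clade. The representation below (each sampled tree reduced to a set of clades, each clade a frozenset of taxon labels) and the labels A to D are hypothetical simplifications, not the data structures used by MrBayes.

```python
from typing import FrozenSet, Iterable, List, Set

Clade = FrozenSet[str]


def clade_support(sampled_trees: List[Set[Clade]], clade: Iterable[str]) -> float:
    """Fraction of sampled trees that contain the given clade."""
    if not sampled_trees:
        raise ValueError("no trees were sampled")
    target = frozenset(clade)
    hits = sum(target in tree for tree in sampled_trees)
    return hits / len(sampled_trees)


# Four toy post-burn-in samples over hypothetical taxa A-D.
samples = [
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "C"}), frozenset({"B", "D"})},
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
]
print(clade_support(samples, ["A", "B"]))  # clade present in 3 of 4 samples
```

A value of 1.00 therefore means the clade appeared in every retained sample, while a value such as 0.54 means it appeared in only slightly more than half of them.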
In general, the tree shows six distinct monophyletic clades, all but one supported by posterior probabilities of 1.00. These include the mitochondrial, CYP2, CYP3, and CYP4 clans, and two deep branches that do not include any arthropod CYPs. The non-arthropod lineages are nearly deuterostome exclusive and would be if not for the two anemone CYPs that show their closest relationships to CYP26. These anemone CYPs found nested within a vertebrate lineage indicate that a CYP26-like ancestor may have existed in cnidarians but was lost in protostomes. Similar inferences concerning CYP loss in protostomes have already been made about CYP51, CYP20 and possibly CYP7. The history of CYP20 in protostomes may be complex since an ortholog has been detected in the annelid leech Haementeria depressa (CN807321). In addition, CYP19 (aromatase) clusters with the CYP2 and mitochondrial clans, and CYP46 and related sea anemone CYPs are part of a sister clade to the CYP4 clan (Additional file 2).
Overall, the tree demonstrates that the four major clans found in insects (mitochondrial, CYP2, 3, 4) encompass all of the CYPs in D. pulex. An Excel table of each of the CYP genes and pseudogenes found in the Daphnia pulex genome, their nucleotide and amino acid sequences, and links to their scaffold position is available as supplementary material (Additional file 3).
The mitochondrial clan of D. pulex contains six members in five families and five subfamilies. Three of the members are highly conserved Halloween genes involved in ecdysone synthesis [19, 23]. Specifically, disembodied (Cyp302A1; dib), shade (Cyp314A1, shd), and shadow (Cyp315A1; sad) are all mitochondrial CYPs involved in the last three steps of 20-hydroxyecdysone (20-HE) synthesis. Cyp314A1 is the CYP required for the conversion of ecdysone to its active form, 20-HE [19, 23]. The other three CYPs are divided into two new families CYP362 (Cyp362A1, Cyp362A2) and Cyp363A1, indicating that these CYPs are not as well conserved and probably have taken on new roles. A tree of the mitochondrial CYP clan is available as supplementary material (Additional file 4).
The other three CYP2-clan families in D. pulex (Cyp18, 364, 370) are divided into four subfamilies and contain 19 genes. Interestingly, a sister-group in the CYP2 clan contains no arthropod CYPs. This group primarily contains anemone CYPs, but also contains CYP17, CYP21, and the CYP1 family members inducible by chlorinated hydrocarbons in vertebrates (Fig. 3; also available as Additional file 5). The Cyp370 family is the largest of the CYP2-clan families, containing 15 members with 13 members in the 370A subfamily and 2 members in the 370B subfamily (Cyp370B1,2). There is also one pseudogene in the Cyp370A subfamily, Cyp370A3P.
The Daphnia CYP370 family is greatly expanded relative to the single-gene CYP15 and CYP303 families of insects, which it most closely resembles. Cyp15A1 is a regio- and stereo-specific epoxidase critical in the formation of juvenile hormone III (JH III) from methyl farnesoate in the corpora allata of the cockroach. However, methyl farnesoate, a juvenile hormone precursor, is considered the major terpenoid hormone in crustaceans; therefore, a methyl farnesoate epoxidase is unnecessary and it is unlikely that the CYP370A and CYP370B subfamily members specifically perform this function. The role of the Cyp370 family in Daphnia is currently unknown. Cyp18 and Cyp364 are both close relatives of Cyp306 (phantom), suggesting potential involvement in ecdysone synthesis or catabolism (Fig. 3). The Cyp364 family is a new family that contains three genes (Cyp364A1,2,3). Cyp18, which is also found in insects, is induced by 20-HE in Drosophila.
The CYP3 clan consists of numerous CYPs involved in detoxification of xenobiotics and endobiotics [30–33]. Some CYP3 clan members are inducible by hormones such as progesterone and ecdysone, and are responsible for the metabolism and elimination of steroid hormones in vertebrates [20, 35, 36]. Although the posterior probability at the base of the CYP3 clan is low (0.54), this only reflects the uncertainty of the position of the first two Drosophila lineages at the base of the CYP3 clan. The CYP3 clade is strongly supported when Drosophila is not included in the analysis (1.00). In Daphnia pulex, the CYP3 clan contains 12 genes and one pseudogene, arranged into two new families (Cyp360, Cyp361) and three subfamilies. Eleven of these thirteen genes are in the Cyp360A subfamily, leaving just Cyp361A1 and Cyp361B1 outside this subfamily in the Cyp3 clan of D. pulex. The closest relatives of the Cyp360 family in the tree are the Cyp6 and Cyp9 family members of insects involved in endobiotic and xenobiotic metabolism and detoxification. Similarly, the closest relatives of the two Cyp361 family members are the anemone CYP3-like group, and the human CYP3A and CYP5A subfamily members involved in detoxification and thromboxane A2 biosynthesis. In general, the CYP3 clan in insect species has more CYPs than the Daphnia pulex CYP3 clan. Most CYP families in insects have only a few members, but the CYP6AS subfamily has 18 members, 37.5% of the honeybee P450s, and there are 35 members of the CYP3 clan distributed between seven families making up 42% of the D. melanogaster P450s. The CYP3 clan in insects, and in particular the CYP6AS subfamily in honeybee, was recruited for major gene expansion, as were the CYP360 and CYP370 families in D. pulex and, to a lesser degree, the CYP4C subfamily. A tree of the CYP3 clan is available as an additional file (Additional file 6).
The CYP4 clan, which is the sister-group to a clade containing the CYP3 clan plus one of the two "non-arthropod" clans, consists of 38 members all in the same family (Cyp4) and arranged into five subfamilies (Cyp4C, Cyp4AN, Cyp4AP, Cyp4BX, Cyp4BY) with 4–10 members in each subfamily (Additional file 7). There is also a pseudogene in the Cyp4C subfamily. Two members of the Cyp4 family were not observed in the Daphnia pulex v1.1 draft genome sequence assembly (September, 2006), but were cloned by degenerate PCR in a previous study in which nine CYP4 members were partially cloned. The two absent CYPs, Cyp4C32 (95% identical to 4C34v1; 89% to 4C34v2 from the Daphnia pulex genome) and Cyp4AN1 (92% identical to 4AN2v1; 96% to 4AN2v2 from the Daphnia pulex genome), are available on GenBank (BQ703381 and BQ703379, respectively). The D. pulex genome sequence coverage was 8.7×; therefore, our inability to find these two CYPs may be due to the known gaps in the genome assembly. It is also possible that these two genes are absent from the Daphnia Genomics Consortium's chosen parthenogenic D. pulex, or "chosen one", due to strain differences.
The Cyp4 members are considered the least studied of the CYP clans in insects and are involved in fatty acid metabolism, including inflammatory arachidonic acid metabolites, and xenobiotic metabolism in mammals. Some CYP4 members may be involved in sensory perception in insects as they are found in the antenna. A Cyp4c member is also involved in the biosynthesis of juvenile hormone, and another is inducible by hypertrehalosemic hormone, a key hormone in arthropod carbohydrate metabolism. Furthermore, some Cyp4 members are down-regulated by ecdysteroids, indicating that Cyp4 members may play a key role in sensory and hormonal functions in D. pulex.
Resistance to pesticides has been associated primarily with CYP3 clan members, but also with some CYP4 and mitochondrial clan members [17, 30, 41, 42, 43, 44]. Several Cyp3 clan members, such as Cyp6g1 and Cyp6a5, are associated with DDT and pyrethroid resistance, respectively [41, 44]. Cyp4D10 and other CYP4 members in Drosophila are inducible by plant alkaloids and may be important in plant host interactions. In D. pulex, differential expression of two Cyp4 genes is associated with resistance to tannic acid and leaf litter. Cyp4C32 expression is much higher in ecotype 1 and Cyp4AP1 shows much higher levels of transcription in ecotype 2, which is exposed to high amounts of leaf litter and polyphenols, and in turn resistant to toxic leaf litters. Interestingly, the CYPs that show differential expression based on ecotype and leaf litter exposure are the two CYPs that are not found within the D. pulex v1.1 genome sequence assembly. The complete sequence of these two CYPs is not available.
The CYP4 clan is also slightly expanded in D. pulex relative to the insects. There are 38 CYP4 members that encompass 49% of the CYPs in the genome. The CYP4 clan varies from 8.6–42% of the CYP genome in sequenced insects. Excluding the honeybee, which has only 4 CYP4 members, the CYP4 members of the remaining insects make up 30.7–42% of their total CYPs. The relative increases in CYP2 and CYP4 members in D. pulex leave a relative reduction in the CYP3 clan compared to insects. Only 16.7% (13/78) of the members of the D. pulex CYP genome are CYP3 clan members, whereas 38–61% (28–76) of the insect CYPs are CYP3 clan members.
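The clan percentages quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes per-clan counts that include pseudogenes (6 mitochondrial, 21 CYP2, 13 CYP3, 38 CYP4), which is inferred from the 13/78 denominator and the 21 CYP2 clan genes mentioned in the tandem repeat discussion rather than stated as a single table in the text.

```python
# Per-clan counts for D. pulex; assumed to include the three pseudogenes
# so that the total matches the 78 used in the "13/78" figure.
clan_counts = {
    "mitochondrial": 6,
    "CYP2": 21,
    "CYP3": 13,
    "CYP4": 38,
}

total = sum(clan_counts.values())

# Percentage of the CYP complement contributed by each clan, to one decimal.
clan_percent = {
    clan: round(100.0 * count / total, 1)
    for clan, count in clan_counts.items()
}

print(total)
for clan, pct in clan_percent.items():
    print(f"{clan}: {pct}%")
```

Under these assumed counts the CYP4 share comes out at 48.7%, which the text rounds to 49%, and the CYP3 share matches the quoted 16.7%.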
There are several CYP2-clan members similar in structure to other ecdysone metabolizers (CYP18, CYP364 members). However, the formation of juvenile hormone III from methyl farnesoate by CYP15A1 is unnecessary in crustaceans and D. pulex in turn lacks the CYP15A1 gene. In addition, D. pulex lacks the CYP303 subfamily members with unknown but putative external sensory development function. Nevertheless, the CYP370 family, which is phylogenetically related to the CYP15 and CYP303 families, has expanded dramatically in D. pulex, and this family lacks a specific enzyme with a known function. Based on our current knowledge of CYPs, the expansion of the CYP370 family is probably necessary for responses to environmental stressors such as toxicants, and/or other growth or behavioral stimulators such as plant alkaloid toxins.
The CYP2 clan is highly expanded relative to the insects, but only 9 of the 21 (43%) CYP2 clan genes are members of a tandem repeat region. Initially, we thought this indicated that tandem repeat regions had little to do with the expansion of the CYP2 clan in D. pulex. However, all nine of the tandemly repeated CYP2 clan members are in the CYP370 family. It is interesting that the rest of the CYP2 clan members are not found in tandem repeat regions, and it is tempting to speculate that most D. pulex CYP2 clan members, just as many of the mitochondrial CYPs, have highly specific functions such as ecdysone biosynthesis. Needs for P450s are often met by expansion via tandem duplication leading to gene clusters. This suggests that the CYP370 family expanded while under selective pressure. Which genes expand may be independent of clan or family membership and depend on the substrate specificity required to cope with a new xenobiotic stress. The C. elegans genome has almost half of its P450s in the CYP2 clan, yet it has only one mitochondrial clan member, CYP44. Insects have expanded the CYP3 clan into the large CYP6 and CYP9 families and several spinoff families. Deuterostomes have expanded CYP2 extensively, while Trichoplax adhaerens has only one CYP2 clan member (e_gw1.8.275.1|Triad1 at JGI).
Scaffold 4 is especially rich in tandem repeat regions. It contains 26 of the 44 (59%) CYP genes located in tandem repeats, and 19 of these genes are CYP4 family members. Scaffold 4 contains tandem repeats for the CYP4AN, 4AP, 4BX, and 4BY subfamilies. Interestingly, a Cyp4BY subfamily member (Cyp4BY5) is located in the middle of the Cyp4AN subfamily tandem repeat region. This is unusual as most of the genes adjacent to each other in a tandem repeat belong to the same subfamily. As the genes diversify, a single gene cluster may contain multiple subfamilies, as in the CYP2ABFGST and the CYP4ABXZ gene clusters in mammals. An error could have been made in the annotation of this CYP; however, re-examination of the Cyp4BY5 gene provided no evidence of this, and this CYP firmly fits in the Cyp4BY subfamily based on identity and phylogenetic status (Fig. 2).
The cladoceran, D. pulex has a wide range of CYPs with the same clans as insects, but with distinct changes in the population of each clan. Of note is the expansion of the newly discovered CYP370 family. Elucidation of the function of the different CYP families and subfamilies will no doubt provide important insight into the ability of D. pulex to respond to environmental changes, predator-prey relationships, hormonal changes, plant allelochemicals, and sensory stimuli. Ultimately, the understanding of the evolution of animals will require deciphering the history of the P450s and the Daphnia genome may contribute to unraveling molecular aspects of the ecology and physiology of this commonly used crustacean species.
The Joint Genome Institute (JGI) http://www.jgi.doe.gov/Daphnia/ and wFleaBase http://wfleabase.org generated gene models for protein-coding genes in Daphnia pulex using multiple algorithms. The models were constructed and filtered in order to reduce redundancy based on homology and EST support, which in turn produced the Dappu v1.1 gene builds (July, 2007). The gene models were directly compared to other genomes such as human, mouse, Drosophila, zebrafish (Danio rerio), Xenopus, and bovine (Bos taurus) on the Daphnia genome portal. We searched for cytochrome P450 (CYP) gene models using KEGG and KOG pathways in addition to the Advanced Search Options on the JGI Daphnia pulex portal, and then manually curated each of the CYPs with the help of the gene models and genome comparisons detailed above. Our manual annotations included corrections to several models based on knowledge of intron-exon boundaries in related genes and BLAST searches as described previously, followed by assignment of gene names based on the homology of Daphnia CYPs to other species using defined nomenclature and naming rules for complete genes and pseudogenes [21, 46].
The different CYP clans and families from Daphnia pulex were compared to CYP genomes from human (Homo sapiens), honeybee (Apis mellifera), fruitfly (Drosophila melanogaster), silkmoth (Bombyx mori), purple sea urchin (Strongylocentrotus purpuratus), and pufferfish (Fugu rubripes) [22, 46, 47, 48, 49]. In addition, phylogenetic comparisons of the different full length Daphnia pulex CYP genes were performed with full length human (Homo sapiens), fruitfly (Drosophila melanogaster), and honeybee (Apis mellifera) CYPs, and select starlet sea anemone (Nematostella vectensis) CYPs. Honeybee, fruitfly, and human CYPs have been manually curated, but the anemone CYPs are only available through GenBank, http://www.stellabase.org, or the Nelson cytochrome P450 webpage, and have not been curated or officially named yet. The anemone genome was chosen as an ancient diploblast for comparison to protostome (honeybee, fruitfly, Daphnia) and deuterostome (human) CYP genomes.
To construct phylogenetic trees, all of the Daphnia, human, fruitfly, honeybee, and anemone CYP amino acid sequences were first aligned using default parameters in ClustalX. Trees were constructed using two methods. First, we used maximum parsimony as implemented in PAUP 4.0b10, a method that minimizes the number of evolutionary events but does not use an explicit substitution model. The parsimony tree was based on a heuristic search with 10 random addition sequence replications and tree-bisection-reconnection branch swapping, with gaps treated as missing data. A 50% majority-rule tree was computed for all equally parsimonious trees.
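To make "minimizes the number of evolutionary events" concrete, the following sketch scores a fixed tree for a single alignment column using Fitch's small-parsimony counting, a standard textbook procedure. It is only an illustration: the text does not say which scoring routine PAUP applies internally, and the heuristic tree search itself is not reproduced here. The nested-tuple tree encoding and the taxon labels a to d are hypothetical.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for a subtree.

    `tree` is either a leaf name (str) or a pair (left, right) of subtrees;
    `states` maps each leaf name to its residue at one alignment column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: one substitution is required on this split.
    return left_set | right_set, left_cost + right_cost + 1


# One toy column over four hypothetical taxa.
column = {"a": "L", "b": "L", "c": "V", "d": "V"}
tree_1 = (("a", "b"), ("c", "d"))
tree_2 = (("a", "c"), ("b", "d"))

print(fitch(tree_1, column)[1])  # changes required on tree_1
print(fitch(tree_2, column)[1])  # changes required on tree_2
```

Summing such per-column scores over the whole alignment gives the tree length that a parsimony search tries to minimize; here tree_1 needs fewer changes than tree_2 and would be preferred.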
Next, we constructed several trees using Bayesian inference, a probabilistic model-based method of phylogeny reconstruction that is similar to maximum likelihood but which has substantially reduced computation time. Bayesian trees were constructed with MrBayes version 3.1.2 on a computing cluster provided by the Computational Biology Service Unit of Cornell University http://cbsuapps.tc.cornell.edu/mrbayes.aspx. We built trees using the "mixed-model" approach in which the Markov chain Monte Carlo sampler explores nine different fixed-rate amino acid substitution models implemented in MrBayes. We used 4 chains with runs of 2.5 million generations with chains sampled every 100 generations and with a burn-in of 10,000 trees; the WAG model was selected as the best fitting substitution model by MrBayes. Due to the difficulty in choosing an outgroup for such a diverse and ancient gene family, we elected to present phylogenies with a midpoint rooting in which the root is placed halfway between the two most divergent sequences. Midpoint rooting will accurately determine the root of a phylogeny provided that rates of substitution do not vary across the tree. To examine this assumption of a molecular "clock", we repeated the Bayesian analyses with the added constraint of constant rates of amino acid substitution and rooted the resulting trees with a midpoint rooting. The "clock" constrained trees yielded the same overall topology and root position as the unconstrained analyses, indicating that our application of a midpoint rooting is justified.
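Midpoint rooting as described above can be sketched in a few lines: find the two most distant tips, then walk along the path between them until half of that distance has been covered. The adjacency-dictionary tree encoding, the function names, and the three-taxon example with internal node X are hypothetical and are not drawn from the software used in the study.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, Dict[str, float]]  # node -> {neighbor: branch length}


def farthest(tree: Tree, start: str) -> Tuple[str, float, List[str]]:
    """Return the node farthest from `start`, its distance, and the path to it."""
    best = (start, 0.0, [start])
    stack = [(start, 0.0, [start])]
    while stack:
        node, dist, path = stack.pop()
        if dist > best[1]:
            best = (node, dist, path)
        for nbr, length in tree[node].items():
            # In a tree, skipping the node we just came from prevents revisits.
            if len(path) < 2 or nbr != path[-2]:
                stack.append((nbr, dist + length, path + [nbr]))
    return best


def midpoint(tree: Tree) -> Tuple[str, str, float]:
    """Return (u, v, offset): the root lies on branch u-v, `offset` from u."""
    leaf = next(n for n, nbrs in tree.items() if len(nbrs) == 1)
    end_a, _, _ = farthest(tree, leaf)
    _, longest, path = farthest(tree, end_a)
    half = longest / 2.0
    walked = 0.0
    for u, v in zip(path, path[1:]):
        length = tree[u][v]
        if walked + length >= half:
            return u, v, half - walked
        walked += length
    raise ValueError("tree has no branches")


# Toy unrooted tree: tips A, B, C joined at internal node X.
toy = {
    "A": {"X": 1.0},
    "B": {"X": 3.0},
    "C": {"X": 2.0},
    "X": {"A": 1.0, "B": 3.0, "C": 2.0},
}
print(midpoint(toy))
```

In the toy tree the most divergent pair is B and C (total path length 5.0), so the root falls on the B to X branch, 2.5 units from B. If B had simply evolved faster than the other tips, this placement would be wrong, which is why the clock-constrained reanalysis is a useful check.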
Automated genome annotation software that relies on alignment with gapping, from BLAST and BLAT through GeneWise, Exonerate and similar tools, may have problems identifying tandem genes and pseudogenes in areas with highly identical exons. Software may skip over one exon and link to its nearly identical downstream gene model in tandem repeat regions. The Daphnia pulex genome appears rich in tandem genes, and CYPs are often found in tandem repeat regions. Tandemgenes, or 'Tandy', software http://eugenes.org/gmod/tandy/ was used to address this problem, and potential CYPs found in tandem repeat areas were provided by Don Gilbert, Biology, Indiana University http://wfleabase.org/genome-summaries/gene-duplicates/tdpages/Cytochrome_P450_15.html. Areas potentially containing tandem repeats were carefully mined and manually curated using the newly generated models, and maps were made of each of the tandem repeat regions.
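Once gene models have been curated, grouping them into candidate tandem arrays is conceptually simple. The sketch below clusters genes that sit next to each other on the same scaffold within a chosen distance. It is not the Tandy algorithm, whose internals are not described here; the `Gene` record, the 10,000 bp `max_gap` default, and the gene names and coordinates are all hypothetical.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    scaffold: str
    start: int
    end: int


def tandem_clusters(genes: List[Gene], max_gap: int = 10_000) -> List[List[str]]:
    """Group neighbouring same-scaffold genes separated by at most `max_gap` bp."""
    clusters: List[List[str]] = []
    current: List[Gene] = []
    for gene in sorted(genes, key=lambda g: (g.scaffold, g.start)):
        adjacent = (
            bool(current)
            and gene.scaffold == current[-1].scaffold
            and gene.start - current[-1].end <= max_gap
        )
        if adjacent:
            current.append(gene)
        else:
            # Close the previous run; singletons are not tandem arrays.
            if len(current) > 1:
                clusters.append([g.name for g in current])
            current = [gene]
    if len(current) > 1:
        clusters.append([g.name for g in current])
    return clusters


# Placeholder genes with made-up coordinates.
example = [
    Gene("g1", "scaffold_a", 100, 2_000),
    Gene("g2", "scaffold_a", 3_000, 5_000),
    Gene("g3", "scaffold_a", 50_000, 52_000),
    Gene("g4", "scaffold_b", 100, 900),
]
print(tandem_clusters(example))
```

A distance-only rule like this would also group unrelated neighbours, so a real analysis would additionally require that clustered genes belong to the same gene family.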
The sequencing and portions of the analyses were performed at the DOE Joint Genome Institute under the auspices of the U.S. Department of Energy's Office of Science, Biological and Environmental Research Program, and by the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405-Eng-48, Lawrence Berkeley National Laboratory under Contract No. DE-AC02-05CH11231, Los Alamos National Laboratory under Contract No. W-7405-ENG-36 and in collaboration with the Daphnia Genomics Consortium (DGC) http://daphnia.cgb.indiana.edu. Additional analyses were performed by wFleaBase, developed at the Genome Informatics Lab of Indiana University with support to Don Gilbert from the National Science Foundation and the National Institutes of Health. Coordination infrastructure for the DGC is provided by The Center for Genomics and Bioinformatics at Indiana University, which is supported in part by the METACyt Initiative of Indiana University, funded in part through a major grant from the Lilly Endowment, Inc. Our work benefits from, and contributes to the Daphnia Genomics Consortium. This is Technical Contribution No. 5608 of the Clemson University Experiment Station, supported by the CSREES/USDA (project number SC-1700342).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.