Transcriptomic and phylogenetic analysis of Culex pipiens quinquefasciatus for three detoxification gene families

Background The genomes of three major mosquito vectors of human diseases, Anopheles gambiae, Aedes aegypti, and Culex pipiens quinquefasciatus, have been previously sequenced. C. p. quinquefasciatus has the largest number of predicted protein-coding genes, which partially results from the expansion of three detoxification gene families: cytochrome P450 monooxygenases (P450), glutathione S-transferases (GST), and carboxyl/cholinesterases (CCE). However, unlike An. gambiae and Ae. aegypti, which have large amounts of gene expression data, C. p. quinquefasciatus has limited transcriptomic resources. Knowledge of complete gene expression information is very important for the exploration of the functions of genes involved in specific biological processes. In the present study, the three detoxification gene families of C. p. quinquefasciatus were analyzed for phylogenetic classification and compared with those of three other dipteran insects. Gene expression during various developmental stages and the differential expression responsible for parathion resistance were profiled using the digital gene expression (DGE) technique. Results A total of 302 detoxification genes were found in C. p. quinquefasciatus, including 71 CCE, 196 P450, and 35 cytosolic GST genes. Compared with three other dipteran species, gene expansion in Culex mainly occurred in the CCE and P450 families, where the genes of α-esterases, juvenile hormone esterases, and CYP325 of the CYP4 subfamily showed the most pronounced expansion on the genome. For the five DGE libraries, 3.5-3.8 million raw tags were generated and mapped to 13314 reference genes. Among 302 detoxification genes, 225 (75%) were detected for expression in at least one DGE library. One fourth of the CCE and P450 genes were detected uniquely in one stage, indicating potential developmentally regulated expression. A total of 1511 genes showed different expression levels between a parathion-resistant and a susceptible strain. Fifteen detoxification genes, including 2 CCEs, 6 GSTs, and 7 P450s, were expressed at higher levels in the resistant strain. Conclusions The results of the present study provide new insights into the functions and evolution of three detoxification gene families in mosquitoes and comprehensive transcriptomic resources for C. p. quinquefasciatus, which will facilitate the elucidation of molecular mechanisms underlying the different biological characteristics of the three major mosquito vectors.


Background
Mosquitoes are the most important vectors of human diseases. The Culex pipiens complex has a broad geographic distribution and is the vector of the West Nile virus and the Wuchereria bancrofti nematode, which causes filariasis. Over the last several decades, chemical insecticides have been intensively applied to control disease transmission. However, such control is undermined seriously by the increased insecticide resistance of vector mosquitoes. Three gene families are implicated in insecticide metabolism in mosquitoes: cytochrome P450 monooxygenases (P450s) are responsible for pyrethroid resistance [1], glutathione S-transferases (GSTs) are responsible for DDT resistance [2], and carboxyl/cholinesterases (CCEs) are responsible for organophosphate and carbamate resistance [3]. Many insect species show rapid expansion and diversification of detoxification genes, as disclosed by their sequenced genomes. The expansion or restriction of detoxification genes likely helps insects adapt to their particular ecological niches and enable them to survive natural and man-made insecticide selection.
The genomes of three major taxonomic mosquitoes, including Anopheles gambiae, Aedes aegypti, and Culex pipiens quinquefasciatus, have been analyzed and released to the public [4][5][6]. Of the three, Ae. aegypti has the largest genome size (1376 Mb), while C. p. quinquefasciatus has the largest number of predicted proteincoding genes (18883), which is 22% larger than that of Ae. aegypti and 52% larger than that of An. gambiae. The extra number of protein-coding genes partially results from the expansion of its three detoxification gene families. However, unlike An. gambiae and Ae. aegypti, which have large amounts of gene expression data, such as various expressed sequence tag libraries and transcriptomes, C. p. quinquefasciatus has limited gene expression resources, with only several salivary gland transcriptomes currently reported [7,8]. Knowledge of complete gene expression information is very important for the exploration of the functions of genes involved in specific biological processes and for the discovery of new candidate genes.
In the present study, the three detoxification gene families of C. p. quinquefasciatus were subjected to phylogenetic analysis and compared with those of three other dipteran insects. The CCE and P450 families were found to undergo large gene expansion. Digital gene expression tag profiling (DGE) technology was used to perform a deep transcriptome analysis of C. p. quinquefasciatus during development and in response to organophosphate insecticide selection. The gene expression profiles obtained provide an invaluable resource for the identification of genes involved in the development and insecticide resistance of C. p. quinquefasciatus.
Results and discussion C. p. quinquefasciatus detoxification gene families When 1e -10 was used in the HMMER searches, 79, 203, and 17 candidate genes of CCEs, P450s, and GSTs were identified in the C. p. quinquefasciatus genome, respectively. After verified by community annotations, only 71 CCEs and 196 P450s were confirmed. The eight false positives for CCEs were lipases or conserved hypothetical proteins and the seven false positives for P450s were groucho protein, 25-hydroxyvitamin D-1 alpha hydroxylase, or conserved hypothetical proteins. When the search stringency was lessened to 2e -2 , 35 cytosolic GSTs were identified and supported by community annotations. Thus, a total of 302 detoxification genes were found in C. p. quinquefasciatus, including 71 CCE, 196 P450, and 35 GST genes, representing the widest gene expansion among the dipteran insect genomes sequenced thus far (Table 1). Gene expansion was mainly observed in the CCE and P450 families compared with three other dipteran species.
Seventy-one CCE sequences were detected in the C. p. quinquefasciatus genome, which was approximately 39% and 29% gene-expanded compared with An. gambiae and Ae. aegypti, respectively, and 2-fold the number of CCEs found in D. melanogaster (Table 1, Additional file 1). The new functional assignment proposed by Oakeshott et al. [9] was used to designate the clades in the CCE phylogeny ( Figure 1). A total of 11 clades, representing dietary/detoxification, hormone/semiochemical processing, and neuro/developmental functions, were obtained. The numbers of C. p. quinquefasciatus CCEs in the three functional classes were 30, 26, and 15, respectively ( Table 1).
The number of CCEs in the neuro/developmental class was relatively conserved among the four dipteran insects. Similar conservation occurs in hymenopteran (Nasonia vitripennis, A. mellifera) and coleopteran (Tribolium castaneum) genomes [9], which reflects the relatively ancient origins of this class, where all members are catalytically inactive except for the acetylcholinesterases.
CCEs in dietary/detoxification and hormone/semiochemical processing classes expanded on the C. p. quinquefasciatus genome compared with the three other dipteran insects. Expansion mainly occurred in αesterases (30 genes in clade B) and juvenile hormone esterases (22 genes in clades F and G). For α-esterases Culex showed rapid radiation in two clusters, which contained six and seven members, respectively, and Aedes displayed an obvious expansion in one cluster with five α-esterases ( Figure 2). The α-esterases are thought to be involved in the development of metabolic resistance to insecticides; some examples include αE7 of Lucilia cuprina [10] and esterase A (CpipJ_CPIJ013918) and esterase B (CpipJ_CPIJ013917) of C. p. quinquefasciatus [3]. Unlike the rapid radiation of other αesterases, the esterase A and esterase B were well conserved: secure 1:1:1 orthologs were found across Culex, Anopheles and Aedes ( Figure 2). Twenty-eight other α-esterases of the species are probably involved in the metabolism of endotoxins or naturally occurring dietary constituents. The αand β-based nomenclatures are applied extensively to esterase isozymes in Drosophila according to their preferential hydrolysis of isomeric artificial substracts, αand β-naphthyl acetate, respectively [11]. The nomenclatures in themselves represent no broad biological distinctions. In the culicines, two esterase genes involved in organophosphate resistance are commonly designated esterase A and esterase B based on their ability to use preferentially αor β-naphthyl acetate in the presence of equal quantities of both substrates [3,12].
The large expansion of juvenile hormone esterases (JHEs) on the C. p. quinquefasciatus genome is interesting: 22 compared with 3 to 13 in three other dipteran genomes and 2 in the hymenopteran and coleopteran genomes [9]. When compared among the three    mosquito species, the expansion in the Culex happened in three clusters, which included five, three, and four JHEs, respectively, and two JHEs were specific for the Culex (Figure 3). Among 22 JHEs, 10 have typical GQSAG nucleophilic elbow motifs around the catalytic site and 6 have varied motifs, such as GH (W/N/Y) SAG. Empirical functional data are required to determine the identity of juvenile hormone esterases.
The secreted β-esterases (clade E) are comparatively conserved among the three mosquito species: from 2 to 4 β-esterase genes were found in mosquito genomes, largely different from the expansion (11 β-esterase genes) found in N. vitripennis genome [9]. Some members of the β-esterases have well-described functions in other insects, such as E4 and FE4 esterases, which confer OP insecticide resistance in Myzus persicae [13], and the antennal Apo1PDE esterase of the silkworm Antheraea polyphemus, which degrades sex pheromones [14]. The functions of the β-esterases in Culex need to be further investigated.
Compared with the P450 genes of D. melanogaster and An. gambiae, those of C. p. quinquefasciatus expanded by 130% and 85%, respectively, and were slightly more than the number of those in Ae. aegypti (Table 1). Expansion was most pronounced in the CYP4 clade, where, among 83 CYP4 genes, 46 belonged to CYP325 ( Figure 4). Some CYP325s were conserved among the three mosquito species, such as CYP325E, CYP325K, and CYP325G; some only expanded in the Culex and Aedes genomes, such as CYP325X, CYP325Y; even Aedes and Anopheles had their specific CYP325s ( Figure 5). However, Culex is not evolved speciesspecific large CYP325 gene expansion. The physiological function of the CYP325 clade in insects remains unclear, except that a CYP325A3 gene was found to be overexpressed in a permethrin-resistant strain of An. gambiae [15]. But this gene does not have clear orthologs in Culex or Aedes ( Figure 5), indicating that the resistance mechanism of CYP325A3 overexpression may be limited to Anopheles. Other large clades of CYP4 genes included CYP4H, CYP4D, CYP4J, and CYP4C ( Figure 4), members of which are involved in DDT and pyrethroid insecticide resistance in mosquitoes [15,16]. The P450 genes were also expanded in Ae. aegypti compared to An. gambiae, but this expansion was most prominent in the CYP9 of CYP3 family: 37 CYP9 genes in Ae. aegypti contrasting to just 9 in An. gambiae [15].
While the total number of CYP3 members (including CYP6 and CYP9) in C. p. quinquefasciatus was similar to that in Ae. aegypti, C. p. quinquefasciatus had more CYP6 members and few CYP9 members than Ae. aegypti [17]. Sixty-three CYP6 genes of C. p. quinquefasciatus were mainly distributed in the CYP6BY, CYP6N, CYP6M, CYP6AG, and CYP6Z groups ( Figure 6A), and 25 CYP9 genes were mainly distributed in the CYP9J and CYP9M groups ( Figure 6B). Members of CYP6 have been implicated in resistance to a broad range of insecticides (e.g., OPs, pyrethroids, DDT, and neonicotinoids) in many insects [15,[18][19][20], while less evidence implicates CYP9s in the detoxification of insecticides. In 11 mitochondrial P450s, 7 genes belonged to CYP12F ( Figure 6C). Members of the CYP12 clade are involved in DDT resistance in An. gambiae [15] and D. melanogaster [21]. Several large clusters of P450 genes were found in the C. p. quinquefasciatus genome, such as a cluster of 13 CYP9 genes on supercontig 278 and a cluster of 12 CYP6 genes on supercontig 869 (Additional file 2).
C. p. quinquefasciatus had 35 cytosolic GST genes belonging to the Delta, Epsilon, Omega, Sigma, and Theta clades ( Table 1). The majority of the GSTs were represented by the Delta and Epsilon clades, which are insect-specific clades and contain the majority of the GSTs associated with detoxification in insecticides [22]. Many members of the Delta and Epsilon clades expanded locally in the C. p. quinquefasciatus genome, such as 12 Delta GSTs in supercontig 36 and 10 Epsilon GSTs in supercontig 1224 (Additional file 3). Delta 1, 6, and 7, and Epsilon 2, 4, and 8 showed 1:1:1 orthologies across the three mosquito genomes while Delta 11 and Epsilon 3 clades expanded on the Culex genome ( Figure 7). Furthermore, Culex had specific Delta and Epsilon clades, which did not have clear orthologs in other two mosquito species (Figure 7). Unlike hymenopteran insects, where the Sigma class of GSTs expands and is thought to play an important role in protection against oxidative stress [9], only one Sigma GST gene was located in the genomes of dipteran insects. The Omega, Theta, and Zeta classes of GSTs are ubiquitously distributed in nature, but no Zeta GST has been identified in C. p. quinquefasciatus.
Why does C. p. quinquefasciatus have such an abundance of detoxification genes compared to other insect species? Several biological characteristics of mosquitoes may provide clues. The aquatic breeding sites of larvae and pupae contain numerous microorganisms, phenolic products of plant degradation, and pesticides. Adults feed on plant nectars and mammalian blood, which contain some harmful substances, such as heme and plant toxins. As viral pathogen vectors, mosquitoes have to deal with the generation of toxic endogenous compounds and reactive oxygen species during the immune response. But these cannot account for the gene expansion in C. p. quinquefasciatus compared to Anopheles and Aedes species. Perhaps its more polluted larval habitat and more diverse geographic range have exerted a greater selective pressure on C. p. quinquefasciatus so as to produce a larger repertoire of detoxification enzymes.  The percentage of bootstrap confidence values greater than 70% (1000 replicates) is shown at the nodes.

DGE library sequencing and mapping to genome
Using the DGE technique, which measures absolute, rather than relative, gene expression levels, the transcriptome was analyzed during the development of C. p. quinquefasciatus. For the five DGE libraries, 3.5 million to 3.8 million raw tags were generated ( Table 2). The ratio of low quality reads was lower than 0.1% in all libraries except in that of the adult SG (Additional file 4). The number of distinct clean tags ranged from 113095 to 156922 ( Table 2). The distribution of the total tags and distinct tags over different tag abundance showed similar patterns among the five libraries, indicating the normality of the DGE data ( Figure 8).
Highly expressed tags with copy numbers larger than 100 dominated in the distribution of the total clean tags, while tags with low expression and copy numbers smaller than 5 occupied the majority of the distinct clean tags (Figure 8). Pearson correlations between development stages ranged from 0.76 to 0.95, indicating uneven transcriptome divergence during mosquito development or the existence of lowly expressed genes not detected, while the Pearson correlation between the two third instar larva libraries of the SG and S-lab strains was relatively high (0.98), reflecting the reproducibility of DGE sequencing (Additional file 5). Among 20306 reference genes in VectorBase, a total of 13314 (65.6%) reference genes were mapped by unambiguous tags combining the five DGE libraries. Sequencing saturation analysis showed that the increase in the identified gene number nearly stopped when the number of reads reached 3 million (Additional file 6).
GO and KEGG pathway classification of the genes expressed in C. p. quinquefasciatus GO and KEGG pathway assignments were performed on the expressed genes to classify their functions and dissect the molecular events behind the expressed genes from the five DGE libraries. Of the 13314 genes, 2391 genes could be categorized into 48 GO function groups, among which binding, catalytic activity, metabolic process, cellular process, and cell part or cell were predominant categories. In contrast, few genes were classified into groups for antioxidant activity, rhythmic process, pigmentation, cell wall organization, and carbon utilization (Figure 9). A total of 1629 genes were mapped to 125 KEGG pathways. The most-represented pathways were metabolism (578 genes), microbial metabolism in diverse environments (101 genes), RNA transport (100 genes), spliceosome (95 genes), and protein processing in endoplasmic reticulum (95 genes).

Life-stage specific detected genes
Comparing the four DGE libraries of the SG strain, 2666 genes were detected in only one library, and their functions and involvement largely diverged (Additional file 7).        Among 302 detoxification genes, 225 (75%) were detected for expression in at least one DGE library (Table 1). About 25% of the genes that were not detected in any library were either not constitutively expressed in the life stages or were possibly untranscribed pseudogenes. Among expressed detoxification genes, 30% of the CCE genes, 25% of the P450 genes, and 72% of the GST genes were expressed in all life stages, indicating that they play a general housekeeping or detoxification function; some detoxification genes were developmentalstage specific (Table 3). Around 28% of the CCE genes were detected in only one stage, and two α-esterases, CpipJ CPIJ018232 and CpipJ CPIJ004752, were expressed at extremely high levels in larvae and pupae, respectively. Two insecticide-resistance-responsible esterases, CpipJ_ CPIJ013917 for encoding esterase B and CpipJ CPIJ013918 for encoding esterase A, were expressed in all life stages, and their expression levels in larvae were higher than in adults. Only two GST genes showed stage-specific expression. Of the P450 genes, around 25% were detected in only one stage and most of them belonged to CYP4. Most strikingly, of 46 CYP325 genes, representing the widest expansion within the P450 family, 27 genes (59%) were detectable for expression and nearly half of the expressed genes (12 CYP325) were developmentally regulated. The larva and adult are feeding stages while the egg and pupae are non-feeding. Hence enzymes expressed specifically or highly in larva or adult are important in digestion and detoxification of dietary component whereas those in egg or pupae are vital for detoxification of metamorphosis byproducts or synthesis of specific hormones. For gene expansion clusters of detoxification genes, the expression profiles were different among the members. For example, among the six members of one expanded α-esterase cluster (Figure 2), CpipJ_CPIJ007825 was detected for expression in pupae and adults while the other five members were not detected in any stage. For the seven members of another expanded α-esterase cluster ( Figure 2), CpipJ_CPIJ016025, CpipJ_CPIJ005694, and CpipJ_CPIJ008749 were not detected for expression; CpipJ_CPIJ000049 and CpipJ_CPIJ000051 were expressed in pupae and adults, CpipJ_CPIJ000050 in larvae and pupae, and CpipJ_CPIJ016026 only in eggs. Similar phenomenon was observed in the three expansion clusters of juvenile hormone esterases ( Figure 3). For the five members of cluster, only CpipJ_CPIJ013027 was detected for expression and in eggs. No expression was found in the three members of cluster. For the four members of cluster, three were detected for expression: CpipJ_ CPIJ016681 and CpipJ_CPIJ016682 in adults, and CpipJ_CPIJ017763 in larvae and adults. The different expression patterns of these duplicated detoxification genes are probably indicative of their subfunctionalization or retrogression as pseudogenes.

Differentially expressed genes between parathion resistant and susceptible larvae
When the third instar DGE library of the parathionresistant strain SG was compared with the same stage in the susceptible strain S-lab, a total of 1511 genes showed different expression levels, among which 619 genes had up-regulated expression levels in the SG strain (Additional file 8). The most prominent GO functions of these up-regulated genes were endopeptidase or serinetype peptidase activity, such as genes encoding trypsin, chymotrypsin, mast cell protease 2, urokinase-type plasminogen activator, and elastase. However, not all of the differentially expressed genes are responsible for parathion resistance because comparison strains were not selected from the same panmictic population such that genetic background differences could be ruled out. A total of 15 detoxification genes were expressed at higher levels in the SG strain, including 2 CCEs, 6 GSTs, and 7 P450s (Table 4). The expression of the known esterase B gene, CpipJ CPIJ013917, increased 16-fold in the resistant strain, while the esterase A gene, CpipJ CPIJ013918, did not show differential expression between the two strains. Three Epsilon GSTs had  prominent involvement in resistance, as previously reported in DDT-and OP-resistant mosquitoes and houseflies [22,23]. Seven P450 genes with elevated expression in the resistant strain belonged to either the CYP9 or CYP6 groups, which are usually implicated in pyrethroid resistance in mosquitoes and other species [18]. P450 monooxygenases are relatively less commonly involved than carboxylesterases in resistance to OP insecticides, although evidence shows that CYP6A1 from Musca domestica and CYP6A2 from Drosophila are capable of metabolizing diazinon [24,25]. Unless direct evidence demonstrates the detoxification or sequestration of insecticide compounds by P450s, the conclusion that the 7 up-regulated P450 genes are involved in OP resistance is too early to draw. Most of the 15 detoxification genes were expressed in all life stages, while 6 were expressed only in certain stages, especially Theta GST (CpipJ CPIJ014051), CYP9J34 (CpipJ CPIJ010546), and CYP6F1 (CpipJ CPIJ010858), which were not found in adults. The different responses of detoxification genes may account for the different resistance levels between the larval and adult stage in some circumstances.

Conclusion
C. p. quinquefasciatus is an important vector that transmits human diseases different from those by An. gambiae and Ae. aegypti. The lack of transcriptomic data available for this species has hampered characterization of the molecular mechanisms underlying the different biological characters of the three major mosquito vectors. The five DGE libraries described in the present study represent a dramatic expansion of the existing transcriptomic sequence available for C. p. quinquefasciatus. This expansion will facilitate the investigation of the fundamental biology of C. p. quinquefasciatus and its pathogenic interactions. In addition, the results of the present study provide new insights into the functions and evolution of the three detoxification gene families of mosquitoes. A larger number of detoxification genes were identified on the genome of C. p. quinquefasciatus compared with three other dipteran insect genomes, representing the widest gene expansion sequenced thus far. Comparative genomic analysis suggested that gene expansion mainly occurs in α-esterases, juvenile hormone esterases, and P450 CYP325. Some detoxification genes were expressed in all developmental stages, while some were developmentally regulated. The expression profiles were different among the members of gene expansion clusters, probably indicative of their subfunctionalization or retrogression as pseudogenes. Fifteen detoxification genes showed the potential to take part in the parathion resistance of Culex, including unexpected P450 genes.

Mosquito strains
Mosquito strains of C. p. quinquefasciatus used included S-lab, which was OP-susceptible and reared at the laboratory without any contact with insecticides for many years [26] and Shengui (SG), a field population collected in Foshan, Guangdong Province, in 2007 and constantly treated with parathion at the laboratory. The parathionresistance of SG was 115-fold that of S-lab before use in the DGE analysis. The mosquitoes were maintained at 26°C ± 1°C and a long-day photoperiod (14 h light/10 h darkness cycle). Fifty egg rafts, forty third instar larvae, forty pupae, and forty adults (twenty females and twenty males) of SG and forty third instar larvae of S-lab were collected and frozen at −80°C for further analysis.

Identification and phylogenetic classification of detoxification genes
Sequences encoding GSTs, P450s, and CCEs were identified from the protein set of the C. p. quinquefasciatus whole genome sequencing database at the Broad Institute (http://www.broadinstitute.org/annotation/genome/ culex_pipiens) using the HMMER program (http://hmmer. janelia.org/) with the protein domains for CCEs (PF00135), GSTs (PF00043 and PF02798), and P450s (PF00067) as described in the Pfam database. A significance value of 1e -10 was used in the searches for CCEs and P450 and 2e -2 for GSTs. Community annotations and VectorBase were referred to verify the searches [27]. Those candidate genes not supported by the community annotations as CCEs, P450 or GST were not accounted. P450s were named by the P450 nomenclature committee (http:// drnelson.uthsc.edu/CytochromeP450.html). Known detoxification genes from An. gambiae, D. melanogaster [28], and Ae. aegypti [17] were used as references for the phylogenetic classification of the detoxification genes from C. p. quinquefasciatus. Protein sequences were aligned with ClustalW2 at EMBL-EBI (http://www.ebi. ac.uk/Tools/msa/clustalw2). Unrooted distance neighborjoining trees showing the phylogeny of detoxification gene families were constructed using the pairwise deletion and p-distance functions of Mega 4.0 software. Bootstrap analysis (1000 replicates) was applied to evaluate the internal support of the tree topology.

Pipeline of DGE
Six micrograms of total RNA from each of the above five mosquito samples were isolated using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. Tag library preparation was performed with an Illumina Gene Expression Sample Prep Kit. The raw data (tag sequences and counts) were deposited in the NCBI Sequence Read Archive (SRA) database under submission number SRA049959.