Genomic analysis of carboxyl/cholinesterase genes in the silkworm Bombyx mori

Background: Carboxyl/cholinesterases (CCEs) have pivotal roles in dietary detoxification, pheromone or hormone degradation and neurodevelopment. The recent completion of genome projects in various insect species has led to the identification of multiple CCEs with unknown functions. Here, we analyzed the phylogeny, expression and genomic distribution of 69 putative CCEs in the silkworm, Bombyx mori (Lepidoptera: Bombycidae).
Results: A phylogenetic tree of CCEs in B. mori and other lepidopteran species was constructed. The expression pattern of each B. mori CCE was also investigated by a search of an expressed sequence tag (EST) database, and the relationship between phylogeny and expression was analyzed. A large number of B. mori CCEs were identified from a midgut EST library. CCEs expressed in the midgut formed a cluster in the phylogenetic tree that included not only B. mori genes but also those of other lepidopteran species. The silkworm, and possibly also other lepidopteran species, has a large number of CCEs, and this might be a consequence of the large cluster of midgut CCEs. Investigation of intron-exon organization in B. mori CCEs revealed that intron positions and splice site phases were strongly conserved. Several B. mori CCEs, including juvenile hormone esterase, not only clustered in the phylogenetic tree but were also closely located on silkworm chromosomes. Among the many CCEs, we investigated the phylogeny and microsynteny of neuroligins in detail. Interestingly, the evolution of these genes appeared not to be conserved between B. mori and other insect orders.
Conclusions: We analyzed 69 putative CCEs from B. mori. Comparison of these CCEs with other lepidopteran CCEs indicated that they had conserved expression and function in this insect order. The analyses showed that CCEs were unevenly distributed across the genome of B. mori and suggested that neuroligins may have an evolutionary history distinct from that in other insect orders. It is possible that such an uneven genomic distribution and a unique neuroligin evolution are shared with other lepidopteran insects. Our genomic analysis has provided novel information on the CCEs of the silkworm, which will be of value to understanding the biology, physiology and evolution of insect CCEs.


Background
The carboxyl/cholinesterase (CCE) superfamily is comprised of functionally diverse proteins that hydrolyze carboxylic esters to their component alcohols and acids. CCEs fall into three functional groups: dietary detoxification, hormone and pheromone degradation, and neurodevelopment [1,2].
The dietary detoxification group of CCEs includes esterases that are responsible for the metabolism of a broad range of substrates including xenobiotics in the diet and insecticides. There is evidence that the acquisition of insecticide resistance can arise either by mutations in CCE amino acid sequences that change the activity of the esterase or by amplification of CCE genes in this group [1]. Such phenomena have been observed in many insect species including flies, mosquitoes and aphids [1], and there might be common mechanisms for the acquisition of insecticide resistance in these species based on their CCEs. The hormone and pheromone degrading group includes juvenile hormone esterases (JHEs), pheromone degrading esterases (PDEs) and others. JHEs act to degrade juvenile hormone (JH), a sesquiterpenoid insect hormone that plays important roles in the regulation of a number of physiological processes [3-5]. The active functioning of JHE in the final instar larva is essential for normal larval-pupal metamorphosis [6].
PDEs are expressed in the adult male antenna and have a role in the degradation of sex pheromones produced by the female [7,8]. The degradation of the sex pheromone is believed to be essential to enable the male to accurately follow a pheromone trail. The third, neurodevelopmental group includes acetylcholinesterases (AChEs), neuroligins, neurotactins, gliotactins and others. AChEs are the only CCEs of this group that are catalytically active and they function in neurotransmission [9]. With the exceptions of Drosophila melanogaster and other higher Diptera, insects have two AChE genes that show a clear 1:1 orthologous relationship between species [1]. Neuroligins are known to be involved in the cell-cell interactions of synapses [10]. The functions of neuroligins are well characterized in the human, mouse and rat [11,12], while recent studies in the honeybee, Apis mellifera, examined the splicing and expression of insect neuroligins [13] or revealed the genetic and functional conservation of neuroligins between vertebrates and invertebrates [14]. Not only neuroligins but also other CCEs in this group are catalytically inactive, as are some CCEs outside of the neurodevelopmental group, such as glutactins and β-esterases [1,15].
Recently, genome analyses have proceeded very rapidly in a wide range of species including insects. Insects were found to have multiple CCE genes, many of which have unknown function [1,2,16-19]. Determination of the functions of these genes based on sequence and homology information is infeasible. As members of the CCE superfamily have been found in organisms ranging from prokaryotes to vertebrates, it is clear that elucidation of the roles of the genes in this family will have a wider biological relevance beyond entomology. With regard to genomic analyses, sequencing of the genome of the silkworm Bombyx mori has now been completed and released to public databases [20]. The silkworm is a useful model for lepidopteran insects, and comparative analyses between lepidopteran species can be made using the silkworm genomic information as a base. Moreover, the large body size of the silkworm has been exploited to establish multiple tissue-specific expressed sequence tag (EST) libraries [21,22]. Integration of genomic analysis and EST expression analysis should enable a more comprehensive understanding of the functions and evolution of many genes.
In this study, we used silkworm genomic information to analyze the phylogeny of lepidopteran CCEs. Based on a recent analysis of CCEs in the silkworm and Helicoverpa armigera, another species belonging to the Lepidoptera [23], we constructed a phylogenetic tree that included several novel lepidopteran CCEs. To gain further insight into the phylogeny of CCEs, we compared the expression patterns of each CCE by a search of an EST database. A large number of B. mori CCEs were identified in a midgut EST library and, interestingly, these were clustered in the phylogenetic tree. CCEs of other lepidopteran species that were positioned close to the cluster of B. mori midgut CCEs were also expressed in the midgut, suggesting that their functions are conserved between species. Additionally, we performed a comparative analysis of the intron-exon structure of B. mori CCE genes and determined their chromosomal locations. These analyses highlighted the unique phylogenetic character of B. mori neuroligins. Overall, our study has produced novel information on the CCEs of the silkworm and other lepidopteran insects, which will be of value to understanding the biology, physiology and evolution of insect CCEs.

B. mori CCEs
A recent study identified 70 putative CCEs in B. mori [23]. Our present study is largely in accordance with that work, with the following minor exceptions. In our analysis, BmCCE001d and 001e were treated as a single gene because they differ only slightly in amino acid sequence, and a search of KAIKObase identified only one genomic locus corresponding to them [20]; this is also the case for BmCCE024a and 024b. On the other hand, BGIBMGA002185 was included among the putative CCEs because our phylogenetic analysis placed this gene in the same cluster as BmCCE030d with a bootstrap value of more than 50% (Figure 1). Using the nomenclature system proposed by Teese et al. [23], this CCE was designated BmCCE030e (Figure 1). In total, we focused on 69 B. mori CCEs in this study.

Construction of the phylogenetic tree of lepidopteran CCEs
A phylogenetic tree of lepidopteran CCEs is shown in Figure 1. This tree contains CCEs of B. mori, H. armigera and several other lepidopteran species (see Figure 1); the CCEs of Spodoptera littoralis, Heliothis virescens and Manduca sexta have only recently been identified [24,25]. Comparison of the relationship between B. mori and other lepidopteran CCEs revealed that, among the 69 B. mori CCEs, 21 appeared to have a 1:1 orthologous relationship with CCEs of other lepidopteran species, while the others did not (Figure 1).
Although Teese et al. [23] proposed 33 major clades for insect CCEs, the phylogenetic tree produced here after inclusion of additional CCEs suggested that several of these clades could be merged. The integration of clades 001 and 002 as clade 001*, clades 003 and 023 as clade 003*, clades 012 and 013 as clade 012*, clades 028 and 029 as clade 028*, and clades 007-011 and 033 as clade 007* was supported with a bootstrap value of greater than 50% (Figure 1).
A large cluster containing clades 001*, 004-007* and others appears to be Lepidoptera-specific (Additional file 1). These clades contain more than 30 B. mori CCEs (Figure 1, Additional file 1). CCE clusters specific to non-lepidopteran orders have also been identified [2,18,19]; however, none of these clusters contains as many CCEs as the Lepidoptera-specific cluster does for B. mori. This suggests that the abundance of CCEs in B. mori (and possibly in other lepidopteran species) is related to the existence of this large Lepidoptera-specific cluster.

EST clone analysis of B. mori CCEs
To further investigate the functions of B. mori CCEs and the relationships between CCE phylogeny and expression profile, we searched a silkworm EST database to identify the tissues in which each CCE was expressed. In total, the search found 354 EST clones with homology to CCEs in libraries from larval or pupal tissues; these clones corresponded to 47 CCE genes (Tables 1 and 2). A summary of the expression patterns of the CCEs is given in Figure 1. Recently, we described the developmental expression profiles of several CCE genes [26]. The EST database search here showed good agreement with the results of this earlier analysis of developmental expression patterns. Thus, for example, multiple clones of JHE were found in the fat body library (Figure 1), consistent with the high level of expression of jhe in the fat body of the final instar stage larva [26,27]. Similarly, EST clones of CCE011a/b (CCE-4A/B) were present in a range of tissues (Figure 1) and were previously shown to be expressed in these tissues [26]. Such consistency was also obtained for CCE014a (CCE-5AL/AS) and CCE014b (CCE-5BL/BS) (Figure 1, [26]), further supporting the validity of our EST expression analysis.
The largest group of EST clones was identified in the larval midgut library: 104 of the total 354 clones came from this library, and they corresponded to 23 CCEs (Figure 1, Table 2). The majority of the midgut CCEs belonged to Lepidoptera-specific phylogenetic clades (Figure 1, Additional file 1), suggesting that the large number of silkworm CCEs (and possibly of CCEs in other lepidopteran species) might be a consequence of this large number of midgut CCEs. Overall, however, B. mori had slightly fewer midgut CCEs than H. armigera [23]. This might reflect differences in the feeding behavior of the two species: B. mori is monophagous, while H. armigera is polyphagous. In addition to the midgut, the analysis of the EST cDNA libraries showed expression of CCEs in the corpora allata, silk gland, ovary, brain, pheromone gland, wing, fat body, hemocytes and testis (Table 2). In the D. melanogaster species subgroup, it is known that a CCE expressed in the male ejaculatory duct is transferred to the female via the semen during mating and that this CCE stimulates egg-laying behavior and inhibits receptivity to remating in the female [28]. It is possible that B. mori CCEs expressed in the testis have similar functions, although the precise expression pattern might be different. However, in most cases, the functions of CCEs in each tissue are unknown.

Relationship between CCE expression profile and phylogeny
We sought to determine if there was any relationship between CCE phylogeny and patterns of expression in tissues. Many of the CCEs in clade 001* were confirmed to be expressed in the midgut (Figure 1). Although the CCEs of S. littoralis in this clade were derived from an antennal EST library [24], it is possible that they are also expressed in the larval midgut. CCEs of subclade 001 are considered to be catalytically active, and one of their possible roles is the detoxification of noxious substances in the diet. By contrast, CCEs of subclade 002 lack the catalytic serine residue and are presumed to be inactive, although they might bind to substrates in the midgut. Expression of catalytically inactive CCEs of clade 021 was also found in the midgut (Figure 1). Many of the B. mori CCEs in clade 006 were expressed in the midgut (Figure 1). Likewise, CCEs of clade 006 from several other insect species are also expressed in the midgut (Figure 1, [23,25,29]). On the basis of these results, we named clade 006 "larval midgut esterases of unknown function", a designation different from that used by Teese et al. [23]. It should be noted that BmCCE006c and 006d are mainly expressed in the silk gland, suggesting that novel CCEs closely related to these silk gland proteins might be identified in other lepidopteran species in the future. As no clone of BmCCE006n was found in the midgut library, and the other CCEs of subclade 006n originated from the antenna, we tentatively excluded this subclade from "larval midgut esterases of unknown function" (Figure 1).
In contrast to the CCEs described above, those in clade 007* were derived from various tissues (Figure 1). Subclades 008 and 010 included CCEs from antenna [24,30,31]. Currently, it is not known whether BmCCE008a and BmCCE010a are expressed in the antenna; nevertheless, it is still possible that subclades 008 and 010 form an antennal CCE cluster. By contrast, BmCCE011a/b are expressed in various organs (see above). Thus, CCEs in this cluster might have a universal function rather than a tissue-specific role. BmCCE011a and 011b have been shown to be alternative splicing products of the same gene and to share a 62 amino acid sequence at their N-termini [26]. Interestingly, SlCXE8 and SlCXE18 also have a common 62 amino acid sequence at their N-termini, indicating that such alternative splicing might be conserved among lepidopteran species.
Among the CCEs of clade 014, BmCCE014a and 014b are also splicing variants of the same gene [26]. BmCCE014a is expressed strongly in the midgut and Malpighian tubules, and this gene showed strong activity for degrading 1-naphthyl acetate (1-NA), a general esterase substrate [26]. Interestingly, the H. armigera homologue, HaCCE014a, is also expressed in the midgut and also has the ability to degrade 1-NA [23], suggesting that not only expression but also function of CCEs in this clade is conserved between species.
Four B. mori CCEs are located in clade 016 (Figure 1); none were confirmed to be expressed in the midgut. This outcome is consistent with a previous analysis of the expression profile of BmCCE016c (CCE-1) and BmCCE016d (CCE-2) [26]. Other insect species, however, have homologous CCEs that are expressed in the midgut (Figure 1). Thus, the expression patterns of CCEs in this cluster might not be conserved among species.
CCEs of clades 018, 024 and 026 appear to be expressed ubiquitously (Figure 1), suggesting they might have universal roles, in a similar manner to CCEs of subclade 011. One exception is Antheraea polyphemus PDE of clade 026, which is specifically expressed in the adult male antenna [7]. In contrast, the B. mori homologue, BmCCE026a, is expressed in various tissues (Figure 1). This may reflect functional differences between these CCEs, possibly related to species differences with respect to usage of sex pheromones. The sex pheromones of A. polyphemus are ester compounds while those of B. mori are a mixture of an alcohol and an aldehyde. However, S. littoralis is also known to use ester compounds as sex pheromones, but SlCXE13, the putative counterpart to A. polyphemus PDE, surprisingly shows ubiquitous expression [24]. One possible explanation is that the A. polyphemus PDE has a specialized function in the degradation of the sex pheromone, while SlCXE13 has functions in addition to pheromone degradation.

Intron-exon organization
Next, we investigated the intron-exon organization of B. mori CCEs. In total, 240 introns were identified in the B. mori CCEs. Four CCEs were intronless, and the remainder had one to thirteen introns each (Figure 2). The average intron size was 1372 nucleotides. The longest intron was present in BmCCE027b and comprised 13962 nucleotides located between exons 2 and 3. BmCCE020c, BmCCE020d and BmCCE025a contained the shortest introns of 68 nucleotides. Such intron size variations are similarly observed in B. mori glutathione-S-transferases (GSTs) [32]. The intron size distribution in B. mori CCEs is shown in Figure 3. The lengths of the introns showed an approximately even distribution.
We mapped the positions of introns in B. mori CCEs using the multiple sequence alignment (Figure 2). There was a clear and strong conservation of intron positions among the CCEs, as was also observed for B. mori GSTs (Figure 2, [32]). We also classified the splice sites into three phases according to their positions in the codons: phase 0 for a splice site lying between two codons, phase 1 for a splice site lying one base inside a codon in the 3' direction, and phase 2 for a splice site lying two bases inside the codon in the 3' direction. We then examined the distribution of these three splice site phases and found that not only the position of the intron but also the splice site phase was strongly conserved (Figure 2). The most conserved intron was a phase 2 intron at position 1368; this was present in 45 CCEs (Figure 2, arrowhead). A phase 0 intron at position 229 or 230 was also present in 20 CCEs, respectively (Figure 2, arrow). Fifty-seven B. mori CCEs contained one or both of these introns (Figure 2), indicating that these arose at an early stage of CCE evolution. In addition to these two introns, others were also conserved (Figure 2). Such a clade-specific strong conservation of intron phase and position was also observed for B. mori GSTs [32]. Interestingly, CCEs of clades 024-026 and 030 had a phase 1 intron at positions 792 and 861 (Figure 2, green brackets), despite their distant locations in the phylogenetic tree (Figures 1 and 2). As described below, these two introns were also conserved in the neuroligins of D. melanogaster and A. mellifera. In total, we found 21 intron positions that are conserved in more than 2 B. mori CCEs.
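The three-phase classification described above can be expressed as a remainder modulo 3. The following is a minimal sketch, not the procedure used in this study; the function names and the exon-length input are hypothetical, and positions are counted in coding nucleotides of a single gene rather than in alignment coordinates.

```python
def intron_phase(coding_bases_before_intron: int) -> int:
    """Return the splice-site phase (0, 1 or 2) of an intron.

    coding_bases_before_intron: number of coding nucleotides, counted from
    the first base of the start codon, that lie upstream of the intron.
    Phase 0: the intron lies between two codons.
    Phase 1: the intron lies after the first base of a codon.
    Phase 2: the intron lies after the second base of a codon.
    """
    if coding_bases_before_intron < 0:
        raise ValueError("count of coding bases must be non-negative")
    return coding_bases_before_intron % 3


def intron_phases(exon_lengths):
    """Phases of the introns separating consecutive coding exons.

    exon_lengths: coding lengths of the exons in 5'->3' order
    (hypothetical input format chosen for illustration).
    """
    phases = []
    total = 0
    for length in exon_lengths[:-1]:  # there is no intron after the last exon
        total += length
        phases.append(intron_phase(total))
    return phases
```

For example, a gene whose first two coding exons are 90 and 100 nucleotides long would have a phase 0 intron followed by a phase 1 intron.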

Chromosomal locations of CCEs in the silkworm
Examination of the chromosomal locations of silkworm CCEs showed that these were distributed unevenly across the genome (Figure 4). A more detailed representation of the genomic structure of the clusters on chromosomes 25 and 23 is shown in Figure 5. Six CCEs on chromosome 25 are in the same orientation, while four CCEs on chromosome 23 vary in orientation (Figures 4 and 5). This clustered distribution pattern has also been observed for silkworm GSTs [32]. The chromosomal clusters of CCEs of clades 016 and 020 indicate that they arose through a recent duplication. In addition to these clades, other CCEs showed clustering on the silkworm genome and, in many cases, also showed clustering in the phylogenetic tree (Figures 1, 2 and 4). Recent studies found evidence of conserved microsynteny in Lepidoptera [33,34]. It is possible that a similar phenomenon occurs with regard to CCE chromosomal clusters in other lepidopteran insects.
Genomic clustering of CCEs has also been observed for non-lepidopteran insects such as D. melanogaster and Nasonia vitripennis [1,18]. In D. melanogaster a large CCE cluster has been identified on chromosome 3R [1]; however, neither B. mori nor N. vitripennis has such a large CCE cluster (Figure 4, [18]). On the other hand, there are several differences between the chromosomal locations of CCEs in B. mori and N. vitripennis. CCE clusters in N. vitripennis tend to be localized around centromeric regions [18], whereas in B. mori, the clusters were frequently observed close to the telomeric regions (Figure 4). Another difference is that while each of the three functional classes of CCEs is clustered in the chromosomes of N. vitripennis [18], no such functional clustering was observed in B. mori CCEs (Figure 4).
We also analyzed the relationship between the chromosomal location of B. mori CCEs and the tissues in which they were expressed. In some cases, adjacently located CCEs were expressed in the same tissue; for example, CCE006g, 006h and 006j were expressed in the midgut, and CCE006c and 006d were expressed in the silk gland (Figures 1 and 4). This might indicate that these CCEs arose via a recent duplication event. However, CCEs located adjacently are not always expressed in the same tissue; such CCEs are probably regulated by independent enhancers/promoters that have distinct activities, despite their close chromosomal locations. Disagreement between chromosomal location and expression pattern has also been reported for silkworm cuticular protein genes [35]. Such CCEs might have distinct functions in the silkworm.

Analysis of phylogeny and microsynteny in neuroligin genes
Finally, we investigated the phylogeny and chromosomal locations of neuroligins, the genes in clade 030 of the phylogenetic tree, in more detail (Figure 1). Every insect genome examined to date contains multiple neuroligin-like sequences, and phylogenetic analyses have indicated that these sequences are highly conserved [2,13]. Moreover, it was also reported that vertebrate and invertebrate neuroligins are conserved genetically and functionally [14].
Our analysis of the B. mori genome identified five putative neuroligins, CCE030a-e (Figures 1 and 2). We constructed another phylogenetic tree using the sequence data for the five B. mori neuroligins, four D. melanogaster neuroligins, five A. gambiae neuroligins and five A. mellifera neuroligins (Figure 6A). Each of the other species had at most one CCE in each neuroligin subcluster, while B. mori had two CCEs in the Nlg-4 subcluster (Figure 6A), the first evidence of a neuroligin duplication event in the Insecta. Although a CCE corresponding to Nlg-1 could not be identified in B. mori, BGIBMGA002170, which showed very weak homology to other insect Nlg-1s, is a candidate homologue. BGIBMGA002170 was not located in the same clade as other neuroligins in the phylogenetic tree (data not shown); however, our microsynteny analysis supported the interpretation of Nlg-1 homology (see below).
Figure caption (fragment): CCE019a, 024c and 029a are located on a scaffold whose chromosomal location is unknown and are not shown in the figure.
The chromosomal locations of B. mori neuroligins showed unique features compared to other insects (Figure 6B). In A. mellifera, A. gambiae and D. melanogaster, all neuroligins except for Nlg-2 are located on the same chromosome [2]. By contrast, B. mori Nlg-3 and Nlg-4 are located on a different chromosome from Nlg-5; Nlg-3 and the two Nlg-4 genes are on chromosome 15, while Nlg-5 is on chromosome 4 (Figures 4 and 6B). Moreover, Nlg-2, which is the only neuroligin on its chromosome in D. melanogaster, A. gambiae and A. mellifera, is on the same chromosome as Nlg-5 in B. mori (Figure 6B). BGIBMGA002170 is located between Nlg-3 and the two Nlg-4 genes (data not shown). In D. melanogaster, A. gambiae and A. mellifera, Nlg-1 is located between Nlg-3 and Nlg-4 [2], supporting the interpretation that BGIBMGA002170 is an Nlg-1 homologue. In light of these results, we propose that the following events have occurred in the evolution of the Lepidoptera: (1) duplication of the Nlg-4 gene, (2) separation of the chromosomal segments containing Nlg-3 to Nlg-5, and (3) fusion of the chromosomal segments containing Nlg-2 and Nlg-5. More genomic information will be necessary to verify this hypothesis. We also compared the intron positions between neuroligins of B. mori, D. melanogaster and A. mellifera, and found that most intron positions, including the common intron in CCEs of clades 024-026 (Figure 2, green brackets), were conserved.

Conclusions
We analyzed the genomic distribution, phylogeny and EST expression of 69 B. mori CCEs. Many B. mori CCEs were expressed in the midgut, and such midgut expression was conserved in CCEs of other lepidopteran insects located in the same phylogenetic clusters. The abundance of CCEs in the silkworm (or Lepidoptera), compared to species of other insect orders, is a possible consequence of the high number of these midgut CCEs. Intron positions and splice site phases were strongly conserved among B. mori CCEs, and the CCE genes were distributed unevenly across the genome. Among the CCEs of B. mori, neuroligins show evidence of having evolved uniquely compared to those of other insects. Our genomic analysis has provided novel information on the CCEs of the silkworm, which will be of value to understanding the biology, physiology and evolution of insect CCEs.

Database analysis
CCE sequences were retrieved from NCBI [36]. EST clones of B. mori CCEs were searched using tBLASTN in NCBI [36], KAIKObase [37], SilkBase [38] and a private library. Introns were identified by comparison of amino acid sequences with DNA sequences, and the canonical GT/AG rule was used to specify the exon-intron junction position [39]. The chromosomal locations of the genes were determined from KAIKObase [37].
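The canonical GT/AG rule mentioned above states that an intron begins with GT and ends with AG on the sense strand. As a minimal sketch of how candidate exon-intron junctions could be checked against this rule, assuming exon coordinates have already been obtained from the protein-to-genome comparison (the function name and coordinate convention are hypothetical and are not taken from the tools cited here):

```python
def introns_follow_gt_ag(genomic: str, exons):
    """Check each intron implied by `exons` against the canonical GT/AG rule.

    genomic: genomic DNA of the gene, given as the sense strand.
    exons: list of (start, end) pairs, 0-based and half-open, in 5'->3' order
           (hypothetical coordinate convention chosen for illustration).
    Returns a list of (intron_start, intron_end, is_canonical) tuples.
    """
    seq = genomic.upper()
    results = []
    # Each intron spans from the end of one exon to the start of the next.
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron = seq[prev_end:next_start]
        canonical = (
            len(intron) >= 4
            and intron.startswith("GT")
            and intron.endswith("AG")
        )
        results.append((prev_end, next_start, canonical))
    return results
```

A junction that fails the check suggests that the exon boundary should be shifted by a few bases, or that the intron uses a non-canonical splice site.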

Phylogenetic analysis of CCE
ClustalW software was used to perform a multiple sequence alignment prior to the phylogenetic analysis. MEGA 4.0 [40] was used to construct the phylogenetic tree using the Minimum Evolution method with the JTT matrix. To evaluate branch strength in the phylogenetic tree, a bootstrap analysis of 500 replicates was performed.
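For readers unfamiliar with bootstrap analysis, the resampling step can be sketched as follows. This is an illustrative outline only and does not reproduce the MEGA implementation; the function names and the dictionary-based alignment format are assumptions. Each replicate is built by drawing alignment columns with replacement, a tree is then inferred from each replicate (not shown here), and the support for a clade is the percentage of replicate trees that recover it.

```python
import random


def bootstrap_alignments(alignment, n_replicates=500, seed=0):
    """Yield pseudo-replicate alignments by resampling columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence; all sequences
               must have equal length (hypothetical input format).
    """
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    rng = random.Random(seed)
    for _ in range(n_replicates):
        # The same sampled columns are used for every sequence, so that
        # homologous positions stay aligned within a replicate.
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        yield {
            name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()
        }


def bootstrap_support(n_replicates_with_clade, n_replicates):
    """Percentage of replicate trees that recover a given clade."""
    return 100.0 * n_replicates_with_clade / n_replicates
```

Under this definition, a clade recovered in more than 250 of 500 replicate trees has a bootstrap value of greater than 50%, the threshold used for the clade mergers described in this paper.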

Authors' contributions
TT carried out all of the analysis described in this paper and wrote the manuscript. TS completed the manuscript. All authors read and approved the final manuscript.