Expressed sequence tags from larval gut of the European corn borer (Ostrinia nubilalis): Exploring candidate genes potentially involved in Bacillus thuringiensis toxicity and resistance

Background Lepidoptera represents more than 160,000 insect species, which include some of the most devastating pests of crops, forests, and stored products. However, the genomic information on lepidopteran insects is very limited. Only a few studies have focused on developing expressed sequence tag (EST) libraries from the guts of lepidopteran larvae. Knowledge of the genes that are expressed in the insect gut is crucial for understanding the basic physiology of food digestion and insect interactions with Bacillus thuringiensis (Bt) toxins, and for discovering new targets for novel toxins for use in pest management. This study analyzed the ESTs generated from the larval gut of the European corn borer (ECB, Ostrinia nubilalis), one of the most destructive pests of corn in North America and the western world. Our goals were to establish an ECB larval gut-specific EST database as a genomic resource for future research and to explore candidate genes potentially involved in insect-Bt interactions and Bt resistance in ECB. Results We constructed two cDNA libraries from the guts of fifth-instar larvae of ECB and sequenced a total of 15,000 ESTs from these libraries. A total of 12,519 ESTs (83.4%) appeared to be of high quality, with an average length of 656 bp. These ESTs represented 2,895 unique sequences, including 1,738 singletons and 1,157 contigs. Among the unique sequences, 62.7% encoded putative proteins that shared significant sequence similarities (E-value ≤ 10⁻³) with the sequences available in GenBank. Our EST analysis revealed 52 candidate genes that potentially have roles in Bt toxicity and resistance. These genes encode 18 trypsin-like proteases, 18 chymotrypsin-like proteases, 13 aminopeptidases, 2 alkaline phosphatases and 1 cadherin-like protein.
Comparisons of expression profiles of 41 selected candidate genes between Cry1Ab-susceptible and resistant strains of ECB by RT-PCR showed apparently decreased expression of 2 trypsin-like protease genes, 2 chymotrypsin-like protease genes, and 1 aminopeptidase gene in the resistant strain as compared with the susceptible strain. In contrast, the expression of 3 trypsin-like and 3 chymotrypsin-like protease genes, 2 aminopeptidase genes, and 2 alkaline phosphatase genes was increased in the resistant strain. Such differential expression of the candidate genes may suggest their involvement in Cry1Ab resistance. Indeed, certain trypsin-like and chymotrypsin-like proteases have previously been found to activate or degrade Bt protoxins and toxins, whereas several aminopeptidases, cadherin-like proteins and alkaline phosphatases have been demonstrated to serve as Bt receptor proteins in other insect species. Conclusion We developed a relatively large EST database consisting of 12,519 high-quality sequences from a total of 15,000 cDNAs from the larval gut of ECB. To our knowledge, this database represents the largest gut-specific EST database from a lepidopteran pest. Our work provides a foundation for future research to develop an ECB gut-specific DNA microarray, which can be used to analyze the global changes of gene expression in response to Bt protoxins/toxins and the genetic difference(s) between Bt-resistant and susceptible strains. Furthermore, we identified 52 candidate genes that may be involved in Bt toxicity and resistance. Differential expression of 15 out of the 41 selected candidate genes examined by RT-PCR, including 5 genes with apparently decreased expression and 10 with increased expression in the Cry1Ab-resistant strain, may help us identify the candidate genes involved in Bt resistance and provide us with new insights into the mechanism of Cry1Ab resistance in ECB.

It has long been recognized that the insect gut is an important target for developing new strategies for insect pest management. Until now, however, only a few studies have focused on the development of gut-specific EST libraries of lepidopterans as a tool to identify candidate genes involved in the toxicity of insecticides and the development of insecticide resistance. Gut-specific EST libraries have been reported for the light brown apple moth (Epiphyas postvittana) (6,416 ESTs) [24], the bertha armyworm (Mamestra configurata) (30 serine protease-related sequences) [25], and the European corn borer (ECB, Ostrinia nubilalis) (1,745 ESTs) [26].
ECB is one of the most destructive pests of corn and can cause as much as $1 billion of economic loss annually in the United States alone [27,28]. ECB also represents a complex of stalk borers, such as the southwestern corn borer (Diatraea grandiosella) and the sugarcane borer (Diatraea saccharalis). These stalk borers share similar ecosystems and cause similar damage to corn plants. Although ECB has been successfully managed using transgenic Bt corn hybrids (plants that express insecticidal toxins of Bacillus thuringiensis or Bt), there are increasing concerns about the potential development of Bt resistance in ECB because of the widespread use of Bt corn [28,29]. Indeed, several ECB colonies have developed resistance to Bt toxins under laboratory selection conditions [30,31].
The main target for Bt toxins is the insect midgut, where Bt protoxins are activated by gut proteases to produce activated Bt toxins. The activated toxins then bind to specific receptor(s) to confer toxicity [32]. This means that insect resistance to Bt toxins could be conferred by protease-mediated and receptor-mediated mechanisms [33-37]. Because Bt toxins and insect gut interactions are determined by many gene products in the insect gut, including many proteins/enzymes involved in Bt protoxin activation, toxin binding to receptors and toxin degradation, any change in these systems has the potential to affect a particular Bt's specificity and efficacy, and could lead to Bt resistance in insects.
Our goals are to develop a gut-specific EST database from ECB larvae and explore candidate genes that are potentially involved in insect-Bt interactions and Bt resistance. In this paper, we report the analysis and annotations of 15,000 ESTs derived from the gut of ECB larvae. We discuss the putative identities of the ESTs and their potential biological and molecular functions, and present comparative analyses of our ESTs with sequences from other insects. This work provides the opportunity for developing an ECB gut-specific microarray that can be used to study insect-Bt interactions and the genetic basis of Bt resistance in ECB. Furthermore, we revealed 52 candidate genes that could be involved in Bt toxicity and resistance. Among the 41 selected candidate genes examined by RT-PCR, we found 5 genes with apparently decreased expression and 10 with increased expression in the Cry1Ab-resistant strain of ECB as compared with the susceptible strain. Differential expression of these genes in a Cry1Ab-resistant strain may suggest their possible involvement in Cry1Ab resistance, and may therefore provide us with new insights into the mechanism of Cry1Ab resistance in ECB. This study may serve as a model for studying Bt resistance mechanisms and for developing bio-pesticides for all closely related corn stalk borers.

Development and analysis of the ECB gut ESTs
We first used the pPCR-XL-TOPO plasmid vector to prepare a cDNA library using total RNA purified from the whole guts of fifth-instar larvae of ECB. After sequencing a total of 1,152 cDNA clones, we found that the cDNA inserts in the vector were not sufficiently long (average length: 441 bp). Therefore, we used the lambda Uni-ZAP RX vector to prepare a second cDNA library using mRNA purified from the guts of fifth-instar larvae of ECB. This library provided us with much longer cDNA inserts (average length: 674 bp). Because of the significantly improved quality of the ESTs generated from the lambda library, we used the lambda library for further EST sequencing. Among the 15,000 random cDNA clones sequenced, <8% were from the plasmid library whereas >92% were from the lambda library (Table 1).
Our analysis of the 15,000 sequences resulted in 13,066 readable sequences (i.e., 87.1% success rate). These sequences were first trimmed to remove vector sequences and then filtered to exclude sequences of <100 bp. Further analysis, using the RepeatMasker and Organelle Masker programs [38], removed an additional 547 sequences. Thus, the total number of high-quality sequences obtained was 12,519 (83.4%) with an average length of 656 bp (Table 1). These high-quality sequences have been deposited in the EST database (dbEST) at the National Center for Biotechnology Information (NCBI) with GenBank accession numbers from GH987145 to GH999663. Redundancy and assembly analyses of the high-quality sequences using Sequencher software (Gene Codes Corp., Ann Arbor, MI, USA) resulted in 2,895 unique ESTs, including 1,157 contiguous sequences (contigs) that consist of 2 or more sequences, and 1,738 singletons that represent single sequences. The majority of the contigs were assembled from 10 or fewer ESTs (Figure 1A). On average, however, each contig was assembled from 10.1 sequences due to a few highly redundant ESTs. Putative identities of the unique sequences were determined by searching the nonredundant database in GenBank using BLASTx. Among the 2,895 unique sequences, 1,816 (62.7%) showed significant matches at E-values of ≤ 10⁻³, whereas the remainder did not.
Table 1 footnotes: (a) The poor quality sequences were discarded and were not included in the analysis. (b) The numbers of contigs and singletons were based on the analysis of all the ESTs sequenced from the two libraries.
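The counts quoted in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are ours rather than part of the original analysis, and the numbers are simply those stated in the text.

```python
# Counts quoted in the text; variable names are illustrative, not from the paper.
total_sequenced = 15_000
readable = 13_066
removed_by_masking = 547      # removed after repeat and organelle masking
contigs = 1_157
singletons = 1_738
with_blastx_hit = 1_816

# Derived quantities.
high_quality = readable - removed_by_masking
unique_sequences = contigs + singletons


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


print("high-quality sequences:", high_quality)
print("unique sequences:", unique_sequences)
print("readable (%):", percent(readable, total_sequenced))
print("unique sequences with BLASTx hit (%):", percent(with_blastx_hit, unique_sequences))
```

Keeping the raw counts and the derived values separate in this way makes it easy to see which figures are measured and which follow from them.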

Transcript abundance
The abundance of transcripts for a particular gene of an organism can be estimated from the corresponding EST abundance in a cDNA library [39]. The most abundant ESTs in our cDNA libraries were those encoding trypsin-like proteases and chymotrypsin-like proteases […]
To identify the secretory proteins, putative protein sequences were examined for potential secretion signal peptides using SignalP software [43]. A total of 439 (15.2%) putative proteins were predicted to contain signal peptides (Figure 2B). Among the putative secretory proteins, 298 sequences (67.9%) had matches with known proteins in the NR protein database, whereas 141 putative secretory proteins (32.1%) were unique, sharing no significant sequence similarity with any known protein. This information is valuable since secretory proteins […]
Figure caption: Open reading frame (ORF), secretory protein, and BLASTx results

Comparative analyses of ECB gut ESTs
The development of EST databases has been recognized as a rapid method of sampling an organism's transcriptome and is complementary to a whole genome-sequencing project [46]. Indeed, a large number of ESTs have been generated from other model organisms. The 2,895 contigs and singletons obtained from the larval gut of ECB were compared with the sequences from other organisms. The first hit (highest score) of each sequence in the NR database was used to determine the most similar organism. The largest number of first-hit sequences (390; 13.5%) was from B. mori (Figure 3). This can be explained by the fact that the genome of B. mori has been sequenced and partially annotated, and that both ECB and B. mori are lepidopterans. The second largest number of first-hit sequences (290; 10.0%) was from T. castaneum, followed by Ae. aegypti (109; 3.8%), Culex pipiens (91; 3.1%), and A. gambiae (81; 2.8%). Only 2.5% of the sequences (72) were found to be most similar to predicted protein sequences from O. nubilalis. This is simply due to the very small number of ECB sequences currently available in the NCBI database.
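The first-hit tally described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used for Figure 3: the function name, the tuple layout, and the toy records are all assumptions made for the example.

```python
from collections import Counter


def tally_first_hits(hits):
    """Count, per organism, how many queries have their best hit in that organism.

    `hits` is an iterable of (query_id, organism, bit_score) tuples; this layout
    is a hypothetical one chosen for the example. For each query only the hit
    with the highest score is kept.
    """
    best = {}
    for query_id, organism, score in hits:
        if query_id not in best or score > best[query_id][1]:
            best[query_id] = (organism, score)
    return Counter(organism for organism, _ in best.values())


# Toy records, invented for illustration only.
example_hits = [
    ("contig1", "Bombyx mori", 310.0),
    ("contig1", "Tribolium castaneum", 250.0),
    ("contig2", "Tribolium castaneum", 120.0),
    ("singleton7", "Bombyx mori", 95.5),
]
counts = tally_first_hits(example_hits)
for organism, n in counts.most_common():
    print(organism, n)
```

Dividing each count by the number of unique sequences then gives the percentages of the kind reported in the text.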
In order to compare our ECB gut ESTs with the 1,745 ECB ESTs that are already available in the NCBI database, we performed BLASTN searches. Among our 2,895 contigs and singletons, 1,279 (44.2%) had significant matches at a cutoff E-value of ≤ 10⁻³, whereas 1,616 (55.8%) did not show any significant matches in the NCBI database using BLASTN search. We compared our ECB ESTs with the ECB ESTs available in the NCBI dbEST database. We found 475 sequences (16.4%) that had significant matches with E-values less than E-150 (Figure 4A) […] (Figure 4B).

Gene ontology
Blast2GO software was used to obtain the gene ontology (GO) terms for the unique sequences by comparing them through the Gene Ontology Consortium [47]. Among the 2,895 contigs and singletons, 1,815 showed BLAST hits at E-value ≤ 10⁻³, and 1,119 of these 1,815 ESTs were mapped. A total of 120 mapped ESTs showed both GO terms and Enzyme Commission (EC) numbers. Figure 5 shows the EST functional categories, where the ECB unique ESTs were assigned to putative biological processes, molecular functions, and cellular components. Within the biological process category, 24.0% belong to cellular processes, followed by 17.0% metabolic processes, 11.0% developmental processes, 11.0% multi-cellular processes, and 8.0% each for biological regulation and localization. In the molecular function category, the maximum GO terms (40.0%) are included in catalytic activity, followed by binding (31.0%), transporter activity (10.0%), and 5.0% each for enzyme regulation activity and structural molecular activity (9.0%). In the cellular components category, cell part, cell, and organelle had 27.0%, 24.0%, and 18.0% of the GO terms, respectively. They were followed by organelle part (13.0%), macromolecular complex (11.0%), envelope (4.0%), and membrane-enclosed lumen (3.0%).
Figure 3 Similarity of ECB gut-specific ESTs with other insects. The first hit sequence (highest score) was used to determine the most similar organism.

Identification of ESTs potentially relevant to the Bt toxicity and resistance
The mode of Bt action in insects includes the ingestion of Bt protoxins, solubilization of Bt protoxins in the insect gut, proteolytic activation of protoxins, binding of toxins to Bt receptors, membrane integration, pore formation, cell lysis, and insect death [48]. According to this mode of action, a target insect could potentially develop resistance to Bt protoxins or toxins via one or more changes in the Bt-receptor interaction pathway. Indeed, the two most commonly identified Bt resistance mechanisms are protease-mediated and receptor-mediated resistance [49]. Our analysis of ESTs derived from the larval gut of ECB revealed a number of genes that are potentially involved in Bt toxicity and resistance (Table 3). Specifically, we identified 18 ESTs putatively encoding trypsin-like proteases and 18 ESTs putatively encoding chymotrypsin-like proteases, with E-values ranging from 2e-26 to 3e-137 and from 3e-27 to 3e-149, respectively. Changes in the proteolytic activity of digestive enzymes can alter the toxicity of Bt protoxins or toxins through effects on crystal solubilization and/or activation of protoxins, as well as degradation of activated toxin [33,50-56]. A previous study from our lab has shown that Bt resistance in a Dipel-resistant strain of ECB was primarily associated with reduced trypsin-like protease activity [35,40]. These trypsin-like proteases were also revealed in our EST analysis. Thus, our analysis of the ESTs generated from the guts of ECB larvae revealed many more candidate genes that deserve further analysis for their roles in Bt toxicity and resistance in ECB.
Our EST analysis also revealed 13 ESTs putatively encoding aminopeptidases (E-values from 1e-64 to 1e-116), 1 encoding a cadherin-like protein (E-value 1e-35), and 2 encoding alkaline phosphatases (E-values from 1e-115 to 1e-131). Aminopeptidase N, cadherin-like proteins, and alkaline phosphatases have been found to serve as Bt toxin binding receptors in other insect species [57-59]. To verify the function of aminopeptidase N as a receptor for Bt Cry1Ac toxin in Spodoptera litura, RNAi technology was used to reduce the expression of aminopeptidase N. This resulted in a significant reduction in the susceptibility of the insect to Cry1Ac toxin [60]. Gahan et al. [61] showed that in a resistant strain (YHD2) of Heliothis virescens, there was a disruption of a cadherin-superfamily gene by a retrotransposon-mediated insertion that resulted in high levels of resistance to the Bt toxin Cry1Ac. Fernandez et al. [62] also reported that a GPI (glycosylphosphatidylinositol)-anchored ALP (alkaline phosphatase) was an important receptor molecule involved in Cry11Aa interactions with midgut cells and toxicity to Ae. aegypti larvae. These studies demonstrate that aminopeptidases, cadherin-like proteins, and alkaline phosphatases can serve as Bt toxin receptors involved in Bt toxicity and resistance. Thus, identification of these candidate Bt receptor genes in this study will allow us to further examine whether receptor-mediated resistance is involved in Bt resistance in ECB.

Comparison of expression profiles between Cry1Ab-susceptible and resistant strains of ECB
We performed RT-PCR to compare the expression patterns of the candidate genes relevant to Bt toxicity and resistance between Cry1Ab-susceptible and resistant strains of ECB. Among 41 genes selected from the 52 candidate genes, which included 15 that putatively code for trypsin-like serine proteases, 13 for chymotrypsin-like serine proteases, 10 for aminopeptidases, 2 for alkaline phosphatases, and 1 for a cadherin-like protein, we found apparently decreased expression of 2 trypsin-like protease genes, 2 chymotrypsin-like protease genes, and 1 aminopeptidase gene in the resistant strain as compared with the susceptible strain (Figure 6). Among these genes, the transcripts of 2 trypsin-like protease genes (contig [0907] and ECB-30-C08) were virtually absent in the resistant strain. In contrast, we found increased expression of 3 trypsin-like and 3 chymotrypsin-like protease genes, 2 aminopeptidase genes, and 2 alkaline phosphatase genes in the resistant strain. Although RT-PCR is not quantitative, reproducible results of such differential expression patterns for these candidate genes in the Cry1Ab-susceptible and resistant strains of ECB may imply their potential roles in conferring or contributing to Cry1Ab resistance, as well as genetic differences between the susceptible and resistant strains of ECB. Indeed, certain trypsin-like and chymotrypsin-like proteases have previously been found to activate or degrade Bt protoxins and toxins, whereas several aminopeptidases, cadherin-like proteins and alkaline phosphatases have been demonstrated to serve as Bt receptor proteins in other insect species. Thus, our results may help identify the candidate genes involved in Cry1Ab resistance and provide us with new insights into the mechanism of Cry1Ab resistance in ECB. Nevertheless, further research will be needed to confirm their involvement and to elucidate their roles in Cry1Ab resistance in ECB.
Distribution of the ECB gut-specific unique ESTs annotated at GO level 2

Conclusion
Our study resulted in a gut-specific EST database containing 12,519 high-quality ESTs from a total of 15,000 ESTs sequenced in an agriculturally important lepidopteran pest. To our knowledge, this database represents the largest gut-specific EST database from a lepidopteran pest. Our analysis using ORF predictor software showed that approximately 11.2% of the protein coding genes in our database may be specific to ECB, as these sequences have an ORF of at least 450 bp but did not have significant matches with known sequences in the NCBI database. We have also identified 52 candidate genes that are relevant to Bt toxicity and resistance. These genes encode trypsin-like proteases, chymotrypsin-like proteases, aminopeptidases, a cadherin-like protein, and alkaline phosphatases. Furthermore, we showed differential expression of 15 out of the 41 representative candidate genes that were examined by RT-PCR, including 5 genes with apparently decreased expression and 10 with increased expression in the Cry1Ab-resistant strain. These results may help us further narrow down the candidate genes possibly involved in Cry1Ab resistance, and provide us with new insights into the mechanism of Bt resistance in general in ECB.
We are in the process of developing a microarray using our unique ESTs together with the ECB gut-specific sequences that are already available in GenBank. The microarray technology will help us analyze the global change of gene expression in response to Bt protoxins/toxins. It will also allow us to analyze any genetic differences between Bt-resistant and Bt-susceptible strains of ECB. Our genomic information on ECB could also serve as a valuable resource for identifying critical/vulnerable genes from the gut of ECB that would make useful physiological targets for new toxins that could be developed for use in pest management.

Insect rearing and dissection
The KS-SC Bt-susceptible ECB colony was used for generating EST libraries. This colony originated from egg masses collected from cornfields near St. John, Kansas, in 1995. The colony has been reared since then on artificial diets in the laboratory at Kansas State University according to Huang et al. [63]. The resistant ECB strain originated from a field collection of 126 diapausing larvae obtained from non-Bt hybrids in Kandiyohi Co., MN in 2001. The resistant strain was initiated from 14 larvae that survived exposure to a diagnostic Cry1Ab concentration used to identify potential changes in susceptibility to Cry1Ab [64,65]. To minimize inbreeding or founder effects, the resistant insects were backcrossed twice with the susceptible strain, which originated from the same collection. Because the resistance was incompletely recessive and involved multiple factors [65], the F1 progeny were randomly mated to obtain recombination of resistance factors in the F2 progeny to allow selection of resistant genotypes. The insects were then subjected to selection at a Cry1Ab concentration corresponding to two- to threefold the LC50 for the F1 progeny (150 ng/cm2) [66]. This selection event was designed to eliminate all the susceptible homozygotes and most of the heterozygotes. The resistant survivors from this selection event were then subjected to a second cycle of backcrossing, random mating, and selection. After six generations, the Cry1Ab concentration used in selections was gradually increased to achieve 750 ng/cm2 at generation F10, a concentration that kills virtually all F1 progeny. At generation F17, the resistance to Cry1Ab in the re-selected strain was in excess of 800-fold. The guts were dissected from fifth-instar larvae in DEPC (diethylpyrocarbonate)-treated distilled water and were stored in TRI reagent™ (Molecular Research, Inc., Cincinnati, OH) at -80°C until used.

cDNA library construction and sequencing
Total RNA was isolated from the whole guts of ECB larvae using TRI reagent™. The plasmid library was constructed using Creator SMART™ cDNA library construction kit from Clontech (Palo Alto, CA) following the manufacturer's protocols with one modification; instead of using the original phage vector, PCR fragments were cloned directly into a pPCR-XL-TOPO plasmid using a TOPO TA

EST analyses and annotations
The DNA sequences were preprocessed using the online software EGassembler [38]. Specifically, the sequence cleaning process was employed to trim the vector and adaptor sequences from the ESTs. The RepeatMasker process was used to mask the interspersed repeats and low-complexity regions of the sequences using the Drosophila Repbase repeat library. The sequences were further masked by vector masking against NCBI's vector library and organelle masking against the mitochondrial library. The pre-processed ESTs were then assembled using Sequencher software (Gene Codes Corp., Ann Arbor, MI). The ORF regions of the assembled ESTs were identified using the ORF predictor software [67], and secretory proteins were identified by looking for signal peptide sequences using SignalP software [43]. Gene ontology (GO) annotation was derived using Blast2GO software (http://www.blast2go.de/) [68].
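To illustrate what identifying an ORF region involves, the sketch below scans all six reading frames of a nucleotide sequence for the longest stretch that runs from an ATG to an in-frame stop codon. It is a simplified stand-in and not the algorithm of the ORF predictor software [67]; in particular, it ignores partial ORFs that lack a start or stop codon, which are common in ESTs. All function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf_length(seq):
    """Length in bp of the longest ATG-to-stop ORF over six frames.

    The stop codon is included in the length. Returns 0 if no complete
    ORF (start and in-frame stop) is found.
    """
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first start codon in this frame
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)
                    start = None                   # look for the next ORF
    return best


print(longest_orf_length("CCATGAAATGACC"))
```

A length returned by such a function could then be compared against a minimum ORF size chosen by the analyst.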

Comparative analysis of ESTs
The ECB unique ESTs were comparatively analyzed for their sequence similarities against other organisms. The organism associated with the EST showing the highest BLAST score in GenBank databases was selected. The ECB gut ESTs were also compared with sequences from the silkworm and ECB that are currently available in the database by using BLASTN with a cutoff E-value of 10⁻³.
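Applying an E-value cutoff to BLAST results can be sketched as below. The example assumes the hits were exported in BLAST+ tabular format (`-outfmt 6`), in which the first column is the query ID and the eleventh is the E-value; the text does not state which output format was used, so this is an assumption, and the function name and sample lines are invented for illustration.

```python
def queries_with_significant_hit(tabular_lines, max_evalue=1e-3):
    """Return the set of query IDs with at least one hit at or below the cutoff.

    Assumes BLAST+ tabular output (-outfmt 6): tab-separated columns with the
    query ID in column 1 and the E-value in column 11.
    """
    significant = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            significant.add(query_id)
    return significant


# Invented example records in the 12-column tabular layout.
example = [
    "contig1\tsubjA\t98.5\t400\t6\t0\t1\t400\t1\t400\t1e-150\t720",
    "contig2\tsubjB\t75.0\t60\t15\t0\t1\t60\t10\t69\t0.5\t30.1",
    "singleton3\tsubjC\t88.0\t120\t14\t0\t1\t120\t5\t124\t3e-27\t115",
]
hits = queries_with_significant_hit(example)
print(sorted(hits))
```

Queries absent from the returned set are those counted as having no significant match at the chosen cutoff.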

Expression profiling by RT-PCR
Forty-one out of the 52 candidate genes were selected for comparing their apparent gene expression profiles between the Cry1Ab-susceptible and resistant strains of ECB by using RT-PCR. These genes were selected solely based on their representations among different gene groups from our EST analysis. After total RNA was isolated from four midguts dissected from one-day-old fifth-instar larvae of each strain (Cry1Ab-susceptible and resistant […] Three micrograms of total RNA was used for synthesis of first-strand cDNA using the SuperScript® III First-Strand Synthesis System (Invitrogen, Carlsbad, CA). cDNA prepared from total RNA was used as a template for RT-PCR. A minimum of two biological replications was used for all the PCR primer pairs. For all trypsin-like (except for ECB-30_C08) and chymotrypsin-like serine protease, alkaline phosphatase, and RPS3 genes, 25 PCR cycles were used, whereas for aminopeptidase and cadherin-like protein genes, 27 PCR cycles were used. For one trypsin-like serine protease gene (ECB-30_C08), however, 33 PCR cycles were used, as the expression of this gene was not visible on agarose gels when fewer cycles were used. Each PCR was performed for the above-mentioned number of cycles, each consisting of 94°C for 30 s, 55°C for 60 s, and 72°C for 60 s. The sequences of forward and reverse PCR primers, and the expected size of the PCR product for each of the 41 candidate genes, are provided in Additional file 1.
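The cycling conditions described above can be written down as a small lookup, which makes the per-group and per-gene cycle numbers explicit. This is only a bookkeeping sketch: the group labels, dictionary names, and helper functions are ours, and the time calculation sums the programmed hold times only, ignoring ramping and any initial or final steps, which the text does not describe.

```python
# Thermal profile quoted in the text: (temperature in deg C, hold time in seconds).
CYCLE_STEPS = [(94, 30), (55, 60), (72, 60)]

# Cycle numbers quoted in the text, keyed by illustrative group labels.
CYCLES_BY_GROUP = {
    "trypsin-like": 25,
    "chymotrypsin-like": 25,
    "alkaline phosphatase": 25,
    "RPS3": 25,
    "aminopeptidase": 27,
    "cadherin-like": 27,
}

# Per-gene exception quoted in the text.
CYCLES_BY_GENE = {"ECB-30_C08": 33}


def cycles_for(gene_id, group):
    """Cycle number for a gene: a per-gene exception wins over the group default."""
    return CYCLES_BY_GENE.get(gene_id, CYCLES_BY_GROUP[group])


def hold_time_seconds(n_cycles):
    """Total programmed hold time for n_cycles (ramping not included)."""
    return n_cycles * sum(seconds for _, seconds in CYCLE_STEPS)


print(cycles_for("ECB-30_C08", "trypsin-like"), hold_time_seconds(25))
```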

Authors' contributions
CK conducted the major part of this study including experimental design, construction of the cDNA libraries, EST analysis, RT-PCR analysis, and manuscript preparation. YCZ participated in experimental design, EST sequencing and preliminary analysis of EST data. MSC assisted in the development of the project, the establishment of the collaboration in EST sequencing, and manuscript preparation. LLB participated in experimental design, maintenance of the insect culture, and manuscript preparation. RAH participated in the development of the project and experimental design. JY assisted in EST sequencing and analysis. BDS and ALBC contributed materials and participated in data analysis and manuscript preparation. SM participated in experimental design and manuscript preparation. KYZ coordinated the project and participated in experimental design, EST analysis, and manuscript preparation. All authors read and approved the final manuscript.