Identification of differential expression genes associated with host selection and adaptation between two sibling insect species by transcriptional profile analysis

Background Cotton bollworm (Helicoverpa armigera) and oriental tobacco budworm (Helicoverpa assulta) are noctuid sibling species. Under artificial manipulation, they can mate and produce fertile offspring. As serious agricultural insect pests, cotton bollworms are euryphagous insects, but oriental tobacco budworms are oligophagous insects. To identify the differentially expressed genes that affect host recognition and host adaptation between the two species, we constructed digital gene expression tag profiles for four developmental stages of the two species. High-throughput sequencing results indicated that we have got more than 23 million 17nt clean tags from both species, respectively. The number of unique clean tags was nearly same in both species (approximately 357,000). Results According to the gene annotation results, we identified 83 and 68 olfaction related transcripts from H. armigera and H. assulta, respectively. At the same time, 1137 and 1138 transcripts of digestion enzymes were identified from the two species. Among the olfaction related transcripts, more odorant binding protein and G protein-coupled receptor were identified in H. armigera than in H. assulta. Among the digestion enzymes, there are more detoxification enzyme, e.g. P450, carboxypeptidase and ATPase in H. assulta than in H. armigera. These differences partially explain that because of the narrow host plant range of H. assulta, more detoxification enzymes would help them increase the food detoxification and utilization efficiency. Conclusions This study supplied some differentially expressed genes affecting host selection and adaptation between the two sibling species. These genes will be useful information for studying on the evolution of host plant selection. It also provides some important target genes for insect species-specific control by RNAi technology.


Background
Cotton bollworm (Helicoverpa armigera, Hübner) and oriental tobacco budworm (Helicoverpa assulta, Guenée) are two sibling noctuid species of Lepidoptera. They are distributed in almost same region from 50°S to 50°N and from 45°S to 45°N, respectively. H. armigera are slightly broader than oriental tobacco budworms [1,2]. In the field, similar external morphology makes them easily confused.
Interestingly, these two species can mate under artificial manipulation and produce offspring. However, when female H. armigera mated with male H. assulta, the first filial generations are all males [3]. This result further confirms that they are two distinct species [4]. Under natural conditions, because of differences in sex pheromone composition, the two species seldom mate. Their sex pheromones comprise cis-11-hexadecenal (Z11-16: Ald) and cis-9-hexadecenal (Z9-16: Ald), but the compositions are reversed in the two species. The ratio of these two components is 97:3 in H. armigera, but it is 7:93 in H. assulta [5][6][7].
In addition, the host ranges of these two species are significantly different. H. armigera is a euryphagous insect whose host range includes 40 families of over 200 different plants. However, H. assulta is an oligophagous insect, they are mainly feeding on the plants of the Solanaceae, for example, tobacco and hot pepper [1,8,9]. Although each species have their own preferred host plants, both of them love feed on tobacco and hot pepper [4]. These similarity and difference may be depend on the host plant selection by adult, or depend on the food digestion or detoxification enzymes from larvae. Host plant selection is a complicated and continuous process. The color, odor and shape of the plants will affect the insects choice on host plants, among which odors are a critically important element for the lifestyle and reproduction of an insect species [10]. Accordingly, insects with different feeding habits possess their own specific odor identification and odor-binding proteins [11]. At the same time, enzymes for food digestion and detoxification are also important factors for insect growth and development. These enzymes probably effect on the survival of insects, consequently affecting on the host range of an insect species. Therefore, identifying enzymes related to insect development and feeding habits will benefit to research on insect host range and on insect pest control.
The two sibling species are non-model insects. Their genome sequences are not available till now. To identify differentially expressed genes from H. armigera and H. assulta, digital gene expression tag (DGE-tag) profile libraries were constructed and sequenced using high throughput second-generation sequencing technology [12,13]. A DGE-tag profile is according to the theory and method of SAGE (serial analysis of gene expression) combined with high-throughput sequencing technology [14]. In this study, eight DGE-tag libraries were constructed and sequenced for four developmental stages (embryo, larva, pupa and adult) of the two species. Differentially expressed transcripts or genes between H. armigera and H. assulta were analyzed by bioinformatics. Most growth and development related genes have similar expression modes. The differentially expressed genes are mainly focus on olfactory-related genes and enzymes for food detoxification or digestion. Therefore, the two sibling species represent a good model for host plant selection and adaptability. These results also provide valuable data for insect pest control.

Results and discussion
The main identifying characteristics of H. armigera and H. assulta The two insect species in genus Helicoverpa, Cotton bollworm (H. armigera) and oriental tobacco budworm (H. assulta), are important insect pests of crops in China. They have similar external morphology. Figure 1 shows some taxonomic characteristics to distinguish these two species. Their eggs, larvae and pupae look like nearly same ( Figure 1A-C). Only under the microscope, according to some taxonomic characteristics they can be distinguished ( Figure 1E-G). The adult is the easiest stage to be distinguished by entomologist, because there are some special speckles and stripes on the wings ( Figure 1D, H).
To show the relationship of H. armigera and H. assulta, multiple sequence alignments of spanning the 18S rRNA across 26 species from 23 orders (Additional file 1, Additional file 2: Figure S1A) and the expansion segment of the COI gene across 20 species of lepidopteran moths (Additional file 3, Additional file 2: Figure S1B) were constructed and supplied as Additional files. The results provide clues about the evolutionary origin of the phytophagous Noctuidae. The sister group of H. assulta and H. armigera is clustered on a clade. Some hypotheses on the sister group relatedness based on morphology are concordant with our molecular results.

Sequencing of DGE-tag libraries and unique tag annotation
DGE-tag profile libraries were constructed from total RNA of H. armigera and H. assulta for four development stages (embryo, larva, pupa and adult). The summary sequence results are shown in Table 1. Low frequency tags were discounted under the assumption that many could have arisen through sequencing errors such as base substitution, deletion or addition at a single position [14]. Therefore, after eliminating low quality tags (containing Ns), copy numbers less than two and adaptor sequences, the remaining reads were called clean tags, of which more than 50% were singletons (tags with count equal to 1), which is typically observed in SAGE experiments [15]. We obtained approximately 23 million 17nt clean tags from both insect species. Their total unique clean tag (Uni-tag) numbers were also similar at approximately 357,000 (Table 1). Unique tag-to-gene assignments were conducted for the four development stages of H. armigera and H. assulta using SOAPdenovo program just permitting 1 bp mismatch [16]. On average, more than 75% of the uni-tags of H. armigera were mapped on transcripts; however, only 64.5% uni-tags of H. assulta mapped on transcripts. The total numbers of transcripts or genes were 268,145 and 230,591 for H. armigera and H. assulta, respectively, among which the annotated transcripts or genes were 88,857 and 75,157, respectively ( Table 1). The Illumina short-reads sequence of H. armigera and H. assulta were submitted to NCBI Sequence Read Archive under the accession number of SRR628282 and SRR620569, respectively.
Although we obtained similar amounts of total unique clean tags from H. armigera and H. assulta, the numbers of uni-tags obtained from the different developmental stages in each insect were quite different (Table 1, Figure 2A).
In theory, each uni-tag should be derived from one transcript [14]. According to this theory, the embryo stage has the highest number of transcripts compared with any other stages in the two species (97,646 and 93672 in H. armigera and H. assulta, respectively). Unexpectedly, the larval stage of H. armigera has the lowest number of transcripts, 80,673 (Table 1, Figure 2A). The gene annotation result indicated that 75.14% and 64.52% uni-tags of H. armigera and H. assulta corresponded to EST sequences in the H. armigera transcriptome library. The number of identified genes in H. armigera is higher than in H. assulta; however, the number of identified genes has no significantly difference at each developmental stage of the same species (Table 1, Figure 2B).

Global analysis of differentially expressed genes between the two species
The unique clean tags provide transcripts information for one species. Using a Venn diagram, developmental stage-specific transcripts and coexpressed transcripts between two to four developmental stages were shown between of H. armigera and H. assulta ( Figure 3A, B). The analysis results revealed that the minimum numbers of coexpressed transcripts existed between the larval and adult stage in both H. armigera and H. assulta (3155 and 3630, respectively). This indicated that the biggest differences exist between these two stages among the four developmental stages. However, in H. armigera, the embryo and adult stage are probably "the closest neighbors" and have the most amount of 6703 coexpressed transcripts or genes. In H. assulta, the largest number of coexpressed transcripts was 10052 between the embryo and larvae stage. The uni-tag annotation results were also analyzed for differential expression and coexpressed transcripts or genes using a Venn diagram ( Figure 3C, D).
The copy number of each unique tag provides quantitative information for the abundance of the transcripts or genes detected by the tags. Using the tag copy number, we can roughly estimate the expression level of each transcripts or gene. The dynamics of gene expression can be reflected by up-or down-regulation among the four development stages by pairwise comparisons ( Figure 3E, F). Overall, the changes in gene expression levels between two developmental stages in H. assulta are more extreme than in H. armigera.

Coexpressed transcripts or genes between two species
Comparing the unique clean tags between H. armigera and H. assulta, approximately 30% transcripts or genes are coexpressed in each developmental stage ( Figure 4A, red region of overlap). This reflects the real situation of coexpression transcripts in these two insects. Because there are no whole genome annotation information for the two insect species, most of the differentially or coexpressed genes are unknown proteins, hypothetical proteins, enzymes or cytoskeletal proteins, which are annotated according to the H. armigera transcriptome results. In terms of annotated genes, about 67% to 85% genes in H. armigera and H. assulta are coexpression during the four developmental stages ( Figure 4B, red region of overlap). GO analysis also confirmed that only in the larval stage there were more functional genes in H. assulta than in H. armigera (Additional file 4).

Development and host range related transcripts or genes between two insect species
To explain why the two species have these similarities and differences, we focused on comparing the transcripts or genes that are related to growth and development, food digestion or detoxification enzymes, and host plant recognition. We identified 246 and 240 growth and development related transcripts, 1137 and 1138 transcripts for food digestion or detoxification related enzymes, and 83 and 68 olfaction related transcripts from H. armigera and H. assulta, respectively. The relative expression levels (by tag copy numbers) for these transcripts in each developmental stage of the two species and the annotation results are listed in Additional files 5, 6 and 7. In summary, the amounts of each type of transcript are similar between the two species ( Figure 5A-C). The biggest difference in the number is odorant binding proteins (OBPs). There are 42 OBP transcripts in H. armigera, but only 31 in H. assulta ( Figure 5B).

Expression patterns of possible host-range-related transcripts or genes
The expression patterns of transcripts can be divided into two categories by tag copy number among the four developmental stages between two species ( Table 2). The first type is the similar expression pattern. For example, in Table 2, the Trypsin-1 gene just express at the embryo stage in the two species (Table 2, line 1). The Cytochrome C oxidase polypeptide III gene has nearly same high expression levels at all four development stages ( Table 2, line 2). More information is shown in Additional files 5, 6, and 7. Genes and transcripts with similar expression patterns were not further analyzed in this study. The other expression pattern is showing a significant difference between two species. For example, the OBP3 gene is a carrier of odor molecules, which can protect odor molecules from enzymatic degradation [17,18]. Thus, OBP3 is likely to be an important gene in host plant selection. In this study, the expression pattern of OBP3 was significantly different among the four development stages between the two insect species (Table 2, line 5 D3). These kinds of transcripts or genes are probably the main reasons underlying the differences between the two species. The expressions of these types of transcripts or genes were confirmed by reverse transcription polymerase chain reaction (RT-PCR) ( Figure 6). Most of expression patterns are nearly consistent with the digital expression tag copy number. These transcripts or genes should be further studied in host selection and adaptation.

Conclusions
In this paper, we systematically analyzed the differences and similarities between the two sibling insect species, H. armigera and H. assulta. These characteristics make the two sibling species fitting for research on host range, evolution and pest control. By comparing the tag copy number of each developmental stage, we identified some differentially expressed transcripts or genes that are probably associated with host plant recognition and food digestion or detoxification. These genes provide important clues for further study. SAGE is a method of large-scale gene expression analysis [19]. It is an 'open' system that permits the relative expression levels of almost all transcripts in an organism. DGE-tag profiles are a development of this technology using second-generation high throughput sequencing technology [20,21]. The DGE-tag profile results indicated that even between two sibling species, only 30% of transcripts are coexpressed in each developmental stage. The annotation results indicated that 67-85% genes are coexpressed in each developmental stage between the two species ( Figure 4). These results further confirm that they are distinct two species at the genome level.
The unique clean tags number can provide quantitative information for the number of transcripts detected by tags. Using traditional SAGE technology, 50,000 to 100,000 tags could be collected, which represent 20,000 to 40,000 unique tags [15,19]. In this study, the highthroughput approach was adopted to implement the tag sequencing protocol on the Illumina platform [13]. Using this technology, we obtained more than 23 million clean tags from H. armigera and H. assulta, respectively. These data far more exceed the saturation requirements of sequencing [15,22]. The total unique clean tags were 356,842 and 357,414 for each species, which probably represent the total transcripts in the whole life cycle of the two insects. A total of 80,673 to 97,646 transcripts were identified for each developmental stage (Table 1, unique clean tags). These results indicated that the data sets were suitable for analyzing the differential expression of transcripts or genes.
In this study, we found that embryo stage expressed the most transcripts or genes in both species. Unexpectedly, the larval stage expressed the lowest numbers of transcripts in H. armigera ( Figure 3A). Because the total clean tag number far more exceeded the saturation requirements, the difference is unlikely to be caused by sequencing bias. The most significant difference between the two insects is in their host ranges, which should be reflected at the larval stage.
is the biggest difference among all the transcripts types. This is probably the main reason why these two sibling species have different host ranges. We also identified two and four OR-related transcripts or genes from H. armigera and H. assulta, respectively. These transcripts should be further studied to increase our understanding of the host range difference between the two species. In addition to host selection, the other important aspect is host adaptation. During feeding, insect will inevitably swallow some poisonous secondary metabolites from plants. Therefore, insects have to develop an adaptation mechanism involving a series of detoxification enzymes [27,28]. These detoxification enzymes include cytochrome P450-dependent monooxygenases (P450s), glutathione-S-transferases and carboxylesterases (COEs) [27,[29][30][31][32]. In this study, we found that there are more transcripts or genes for P450s, COEs and ATPases in H. assulta than in H. armigera ( Figure 5C). GO analysis also confirmed that only in the larval stage there are more functional genes in H. assulta than in H. armigera (Additional file 4). Therefore, we suspected that because oriental tobacco budworm has a narrower host range, more detoxification enzymes would help them increase food detoxification and utilization efficiency. This should be further investigated in a future study.
The two species are also important agricultural insect pests. Considering all the similarities and differences between the two sibling species, we think they are a good model insect-pair for developing species-selective RNAi technology. Many studies have shown that RNAi is feasible technology in insect pest control [33][34][35][36][37]. The appeal of RNAi technology in pest control is that it is possible to design the pesticide to target only a single species or a group of related species, with minimal threat to other organisms [38]. To this end, it is necessary to identify species-specific target genes. The present study represents an effective strategy for identifying differentially expressed genes from related species that do not have genome sequences. Using DGE-tag profile technology, we identified many differentially expression transcripts or genes from the two sibling insect species. These genes not only provide clues for host range difference studies, but may also represent important targets for speciesspecific control by RNAi technology. adults (male and female separately)). The samples were immediately frozen in liquid nitrogen and stored at −80°C before RNA extraction.

RNA isolation
Total RNA was isolated using a Qiagen RNA Extraction kit according to the manufacturer's instructions. The RNA was treated with RNase-free DNase I for 30 min at 37°C (New England BioLabs) to remove residual DNA. Equivalent amounts of the 34 samples were merged into four pools of embryo, larva, pupa and adult. mRNA was isolated from DNA-free total RNA using a Dynabeads mRNA Purification Kit (Invitrogen).

cDNA synthesis
Before cDNA synthesis, 5 μg total RNA was treated with RQ1 RNase-free DNase (Promega), according to the manufacturer's instructions, to ensure no DNA contamination. cDNA synthesis was then carried out with the purified RNA using the SuperScript III First-Strand Synthesis System (Invitrogen), following the manufacturer's instructions. The RT reaction was performed using Mastercycler Gradient (Eppendorf ). Briefly, 1 μg RNA, 50 μM oligo dT (20) and 10 mM dNTP mix were added together and incubated at 65°C for 5 min. The samples were then placed on ice for at least 1 min. After that, 2 μl 10 × RT buffer, 1 μl 25 mM MgCl2, 2 μl 0.1 M DTT, 40 U RNaseOUT and 200 U SuperScript III were added and incubation carried at 50°C for 50 min. The RT reaction was terminated by incubating at 85°C for 5 min and the residual RNA was removed by incubating at 37°C for 20 min with the addition of 1 μl RNaseH. The cDNA was stored at −20°C.

Sequence tag preparation, sequencing and DGE-tag annotation
Sequence tags were prepared with Illumina's Digital Gene Expression Tag Profiling Kit, according to the manufacturer's protocol. A schematic overview of the procedure can be found in reference [39]. We extracted 6 μg total RNA, use Oligo(dT) magnetic beads adsorption to purify mRNA, and then use Oligo(dT) as primer to synthesize the first and second-strand cDNA. The 5' ends of tags can be generated by two types of Endonuclease: NlaIII or DpnII. Usually, the bead-bound cDNA is subsequently digested with restriction enzyme NlaIII, which recognizes and cuts off the CATG sites. The fragments apart from the 3' cDNA fragments connected to Oligo(dT) beads are washed away and the Illumina adaptor 1 is ligated to the sticky 5' end of the digested bead-bound cDNA fragments. The junction of Illumina adaptor 1 and CATG site is the recognition site of MmeI, which is a type of Endonuclease with separated recognition sites and digestion sites. It cuts at 17 bp downstream of the CATG site, producing tags with adaptor 1. After removing 3' fragments with magnetic beads precipitation, Illumina adaptor 2 is ligated to the 3' ends of tags, acquiring tags with different adaptors of both ends to form a tag library. After 15 cycles of linear PCR amplification, 95 bp fragments are purified by 6% TBE PAGE Gel electrophoresis.
After denaturation, the single-chain molecules are fixed onto the Illumina Sequencing Chip (flowcell). Each molecule grows into a single-molecule cluster sequencing template through Situ amplification. Then add in four types of nucleotides which are labeled by four colors, and perform sequencing with the method of sequencing by synthesis (SBS). Each tunnel will generate millions of raw reads with sequencing length of 35 bp. Image analysis and basecalling were performed using the Illumina Pipeline, where sequence tags were obtained after purity filtering. This was followed by sorting and counting the unique tags.
We filtered out low quality tags (containing Ns), copy number below 2 and adaptor sequences. Ultimately, ≈6 million clean sequence DGE-tags for each developmental stages of embryo, larva, pupa and adult were obtained. The DGE tags, which consist of the CATG restriction enzyme digested site and an additional 17 bp from each transcript, were de novo assembled using SOAPdenovo program just permitting 1 bp mismatch [16]. All of the tags were compared with the reference database of H. armigera cDNA library [40][41][42] and other insect nucleotide sequences (Bombyx mori, Heliothis virescens, Spodoptera exigua, Prodenia litura and Manduca sexta) from NCBI. The number of tags mapped on a transcript was used as a measure of the abundance of this transcript. The DGE-tag expression level was calculated by the RPKM (Reads Per kb per Million reads) method [43]. Functional annotation by Gene Orthology (GO, http://www.geneontology. org/) was run on Blast2go (http://www.blast2go.org/) [44]. The COG and KEGG pathway annotation were performed using Blastall online program against the Cluster of Orthologous Groups of proteins (COG, www.ncbi.nlm. gov/COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG, www.genome.jp/kegg) databases, respectively. Briefly, sequences were searched against GenBank non-redundant database (Nr) with BLASTx algorithm [44]. The blast results were mapped to gene ontology terms and annotation was carried out using default annotation parameters in the Blast2Go software suit [44][45][46]. For further functional annotation, the KEGG mapping was carried out in Blast2Go.