Immunization, sequencing assembly, and gene identification
Most immunity studies on lepidopteran insects use dead microbes, aiming at deciphering biochemical mechanisms. In contrast, dipteran insects such as Drosophila and mosquitoes are often injected with live pathogens, which eventually kill the hosts. In order to decipher host-pathogen interactions and develop better pest control methods, we performed septic injury on day 2 fifth instar larvae using two entomopathogens, B. bassiana, and E. cloacae, which elicit an inefficient immune response compared with the challenge by dead pathogens. The immunity-related genes based on their expression profiles in fat body or hemocytes were obtained through comparison of the immune challenged transcriptomes. These two entomopathogenic agents had distinct morphological characteristics on plates and under light microscope (in Additional file 1: Figure S1). E. cloacae injected larvae turned light red in color at 12 h after infection and were almost completely liquefied before death by 24 h (Figure 1A), as compared with the light brown color in larvae that were injected with the phosphate buffered saline (PBS). Fungal infection on the other hand was slower, with almost 85% survival rate at 24 h post injection as compared to the 15% of that of bacteria (Figure 1B). B. bassiana injected larvae turned dark brown and were finally covered with white conidia and hyphae at 72 h post infection, leading to the death of the insect. These results suggested a distinct pathogenicity of B. bassiana and E. cloacae toward the cotton bollworm.
In D. melanogaster and mosquitoes, immunity-related genes regulated by IMD pathway showed an acute expression pattern involved in the anti-bacterial responses, whereas Toll pathway displayed sustained activation of immunity-related genes against fungal infection [5,18]. To gain detailed information about the H. armigera transcriptome, the hemocytes and fat body samples were collected from E. cloacae infected larvae at 6 h and B. bassiana infected larvae at 48 h, respectively. Six cDNA libraries were constructed from the RNA samples extracted from control fat body (FB_Mock), fungus-induced fat body (FB_Bb), bacterium-induced fat body (FB_Ec), control hemocytes (HC_Mock), fungus-induced hemocytes (HC_Bb), and bacterium-induced hemocytes (HC_Ec) (in Additional file 1: Figure S2). These libraries were sequenced on an Illumina Hiseq 2000 system. After removal of adaptor sequences and low-quality reads (Q < 20), these libraries yielded 72.6, 79.2, 65.6, 80.1, 46.7, and 80.4 million high-quality reads with 7.54, 7.70, 5.90, 7.53, 8.25, and 7.45 Gb data, respectively (in Additional file 2: Table S1). All the reads were de novo assembled into 57,588, 61,452, 55,394, 60,873, 42,073, and 58,262 contigs with average lengths of 1,102, 1,139, 1,034, 1,128, 995, and 1,188 nt, respectively [19,20]. Pooled clean reads from the six libraries and three others (E. cloacae, B. bassiana, and PBS injected 2nd instar larvae), totally yielded 150,606 contigs (average size: 1,124 nt) (in Additional file 2: Table S1).
To predict functions, all the obtained unigenes were annotated according to BLASTX searches against the non-redundant sequence database resulting 60,458 (43.2%) of them displaying homology to the known proteins (E-value < 1e-5) (in Additional file 1: Figure S3) [21]. For species distribution, nearly 17,000 (11.2%) annotated unigenes were homologous to B. mori, followed by Danaus plexippus (6.6%). Only fewer transcripts were matched to the ones from H. armigera (0.6%), probably due to the unavailability of its genome sequences (in Additional file 1: Figure S3). Additionally, 713 transcripts with high similarity to non-coding RNAs by homology search were obtained (in Additional file 3: Table S2) [22]. After removing redundant sequences and those shorter than 150 nt, we retained the unigenes with significant BLAST hits and generated a non-redundant dataset containing 37,694 sequences (average length: 1,711 nt) (in Additional file 2: Table S1). Length distribution of the H. armigera transcripts indicates that this final dataset has a high proportion of long transcripts (>2000 nt) and increases the chances of full length gene predictions. Based on the unigene information (e.g. gene ID, length, putative function) (in Additional file 4: Table S3), the immunity-related genes were identified and filtered out for further analysis. The detailed workflow from immune challenge to immunity-related gene identification is provided in a flowchart (in Additional file 1: Figure S2).
Identification and functional classification of differentially expressed genes in response to B. bassiana and E. cloacae infections
To gain insights into the tissue-specific transcriptional changes in H. armigera larvae infected by B. bassiana and E. cloacae, we performed pairwise comparisons between libraries to identify the differentially expressed transcripts (DETs) (in Additional file 1: Figure S4A) [23]. Relative to the control, transcripts with greater than 2-fold change and p value less than 0.001 were considered as differentially expressed. The complete list of DETs with FPKM (fragments per kilobase of transcript per million) values (in Additional file 5: Table S4) revealed that more unigenes exhibited remarkable changes in mRNA levels in fat body (530 up- and 415 down-regulated) than in hemocytes (235 up- and 320 down-regulated) 6 h after E. cloacae injection (in Additional file 1: Figure S4B) indicating a tissue-specific affect. More pronounced changes occurred to the larvae 48 h after B. bassiana injection. Compared to the control group, there were 2,014 (1,469 up- and 545 down-regulated) transcripts that were significantly changed in the fat body elicited by B. bassiana, while more transcripts (496 up- and 2081 down-regulated) suppressed in hemocytes were observed.
In order to compare gene expression levels in hemocytes and fat body in response to the pathogen invasion, we performed hierarchical clustering of DETs and identified five discrete clusters showing expression trends relevant to the infection. The 487 and 2,015 genes in clusters 1 and 2 had the highest and the lowest RNA levels in hemocytes from B. bassiana infected larvae, respectively, among all the experimental conditions (Figure 1C and in Additional file 5: Table S4). Clusters 3 (1,280 genes) and 4 (370 genes) showed the most significant mRNA level increase in fat body after the fungal and bacterial infection, respectively. Furthermore, cluster 5 represented gene cohorts significantly suppressed in hemocytes and fat body after the infections. Consistent with the information revealed by hierarchical clustering, a Venn diagram analysis indicated that 1,251 transcripts were exclusively regulated in the fungus challenged fat body (1,109 up- and 142 down-regulated), while 1,857 other transcripts were specifically regulated in the fungus challenged hemocytes (290 up- and 1,567 down-regulated) (Figure 1D). In comparison, fewer gene transcripts (222 + 67 + 101 + 15) were exclusively regulated in the two tissues after the bacterial infection. Considering that the gene repertoire of fat body had small overlap with that of hemocytes (Figure 1B), we speculate that tissue type instead of the pathogen is more important in regulating the gene transcription under the same experimental condition. Besides, the data showed consistent up-regulation of 12 genes in both fat body and hemocytes after the fungal and bacterial infection and most of them were immunity-related.
To analyze the potential functions of all identified DETs and the corresponding pathways involved, the Gene Ontology (GO) enrichment and KEGG analysis were further performed for DETs [24,25]. Functional classification of all unigenes was determined by GO assignment. In the gene repertoire of fat body, more DETs from the E. cloacae elicited transcriptome were assigned binding and catalytic activities in the category of molecular function and cellular and metabolic processes in biological process (Figure 2A). In contrast, more DETs from B. bassiana elicited hemocyte transcriptome were enriched in the following GO terms: cell, cell part, organelle, and organelle part in cellular component, binding in molecular function, and biological regulation and cellular process in biological process. We also used KEGG classification to analyze putative functions of the B. bassiana and E. cloacae induced DETs, which were categorized into thirteen functional groups based on the biological processes and molecular functions [26]. There was a remarkable tissue related disparity among the major functional gene groups between the fungi and bacteria induced transcriptomes (Figure 2B). Except in E. cloacae elicited hemocytes, a large proportion of up-regulated gene repertoire encoded enzymes involved in carbohydrate, energy, transport and catabolism, and immunity-related processes. Other notable functional gene groups were xenobiotics, glycan biosynthesis and metabolism, nucleotide metabolism, folding, sorting and degradation. In particular, such gene transcripts formed the largest group that were down-regulated in hemocytes after B. bassiana infection, followed by genes involved in replication and repair. We performed the principal component analysis (PCA) to compare changes in the transcriptomes and found clear variation between fat body and hemocyte samples (Figure 2C). Another variation exists between pathogen treated and control groups, whereas Ec or Bb treated samples are mostly similar. It appears that the difference between the tissues was much more significant than that between different pathogens.
Taken together, our results clearly indicated that global gene expression profiles varied among different tissues and treatments (Figures 1 and 2). While tissue is the major determinant, other factors also have minor but significant impacts on the transcriptome. The distinct GO and KEGG distribution of up- and down-regulated gene cohorts showed similar trends. Detailed expression profiling of the immunity-related genes was still required for further investigation of tissue-specific immune responses against B. bassiana and E. cloacae infection.
For in-depth analysis of the interactions between H. armigera and pathogens, we annotated the insect defense genes and examined their phylogenetic relationships. By sequencing the tissues that were enriched in the immune-related transcripts at sufficient depth, followed by the BLAST search [27], we were able to detect 233 immunity-related genes (Figure 2D, in Additional file 6: Table S5). This number was significantly higher when compared with other insects from holometabolous orders, such as A. mellifera, but similar to that in B. mori. This result indicated that H. armigera, well adapted to various plant hosts and environment, has a sizable repertoire of genes related to the cellular and humoral immune responses against the wounding and infection.
Comparative analysis of genes for immune signal recognition
When pathogens infect insects, multiple PRRs may recognize conserved determinants (e.g. lipopolysaccharide, peptidoglycan, lipoteichoic acid, β-1,3-glucan) on the pathogen surface to trigger host defense reactions [28]. While some PRRs are constitutively expressed as surveillance molecules at certain levels in hemolymph, others are synthesized in response to the entry of microorganisms. Upon binding to their target molecules, PRRs undergo a conformational change required for recruiting plasma factors and hemocytes that eliminate the invading pathogens [8], finally leading to PPO activation and synthesis of immune effectors (e.g. AMPs) via immune signaling pathways [8]. The H. armigera immunotranscriptome has at least 41 PRR transcripts, including 9 PGRPs, 5 βGRPs, 24 CTLs, and 3 galectins (in Additional file 6: Table S5, and in Additional file 7: Table S6).
PGRPs play a pivotal role in activating insect innate immunity by binding to Lys- and diaminopimelate-type peptidoglycans in bacterial cell walls [7]. The members of this protein family are characterized by the presence of a conserved C-terminal PGRP domain (approximately 165 aa), which is homologous to bacteriophage and bacterial type 2 amidases [29]. PGRPs can be further sub-grouped into short (S) and long (L) forms based on their length and the presence of transmembrane domain. There are 13, 7, 12, and 3 PGRPs in the genomes of D. melanogaster, A. gambiae, B mori and A. mellifera, respectively. We have identified nine putative PGRP transcripts in H. armigera transcriptome and have named them as HaPGRP-SA1, −SA2, −SB1, −SB2, −SD, −LA, −LB, −LC, −LD, in accordance with PGRP nomenclature from other insects (Figure 3, in Additional file 6: Table S5) [17]. The mRNA abundances of H. armigera PGRP-SA1, −SB1, and -SD increased in response to the bacterial infection, and their roles in bacteria agglutination and growth inhibition have been characterized as well [30]. Multiple sequence alignment suggested that PGRP-SB1, −SB2, −SD contains all the five active site residues (H, Y, H, T, C) essential for amidase activity, indicating that they can not only bind but also degrade peptidoglycan (in Additional file 1: Figure S5). Based on their homology to BmPGRP-S1 and S5, a putative role of HaPGRP-SA1, −SB1 and SB2 in regulation of PPO activation can be predicted [31,32]. All the S form HaPGRPs have an N-terminal signal peptide and are expected to be secreted outside the cell (Figure 3). On the other hand, HaPGRP-LA, −LB, −LC, −LD contains a transmembrane region, potentially serving as peptidoglycan receptors triggering immune signaling pathways. There existed no PGRP-LE in H. armigera with similarity to that of A. gambiae, B. mori, and A. mellifera, while PGRP-LD has not been identified in B. mori (Figure 3B).
βGRPs belong to another family of PRRs, which have a β-1, 3-glucanase-like domain close to the carboxyl terminus [28]. They could bind to β-1, 3-glucan (a fungal cell wall component) but lack the hydrolase activity, due to amino acid substitutions in the catalytic center. Other members of the family are referred to as GNBPs (Gram-negative bacteria-binding proteins). It has been reported that the mRNA level of B. mori GNBP was significantly elevated in response to bacterial infection [33]. In our transcriptome, five βGRP transcripts are identified and designated as H. armigera βGRP1 through 4, all of which may be secreted into hemolymph except βGRP3 (Figure 4, in Additional file 6: Table S5). The phylogenetic analysis of insect βGRP gene family revealed segregation into two major evolutionary groups – one retaining the key residues for glucanase activity (Group B) and the other lacking such residues (Group A) [34]. H. armigera and B. mori βGRP1, 2a, and 4 share a 1:1 orthology, while H. armigera βGRP-3 is orthologous to M. sexta MBP [34]. βGRP1-3 belong to a lepidopteran specific clade probably arising as a result of gene duplication (Figure 4). Based on the studies in M. sexta and orthologous relationships, we suggest that H. armigera βGRP1 and 2a bind to β-1,3-glucan to trigger a serine protease cascade for PPO activation. The close relationships among H. armigera βGRP1, 2a, 2b, 3 and D. melanogaster GNBP3 also indicate that the two β-1,3-glucanase-related proteins could be involved in antifungal immunity of H. armigera.
C-type lectins (CTLs) are a sub-group of carbohydrate binding lectin proteins that require the presence of calcium for its association with various types of sugar moieties like mannose and galactose. The protein family shows a considerable amount of diversity in terms of structure and function and can occur in membrane bound or soluble forms. Several insect CTLs have been implicated to play a role in host defense functioning as PRRs and eliciting multiple immune responses including PPO activation, hemocyte-mediated nodule formation, encapsulation and opsonization finally resulting in microbial clearance [35,36]. Previous studies have cloned and characterized eight H. armigera CTLs, of which CTL1 has been characterized to play a role in bacterial agglutination [37,38]. Another member CTL7, apart from having a function similar to CTL1, can also enhance cellular encapsulation and melanization [15,16]. Our in-depth immunotranscriptomic studies have identified a total of twenty-four CTLs (except CTL1 and 8), 18 of which are novel transcripts, much more than the PGRP and βGRP family size, thus significantly expanding our knowledge about CTL gene family in this devastating insect pest. In terms of domain structure (Figure 5A, in Additional file 6: Table S5), thirteen (CTL2-7, 11–14, 16, 25, and 26) had two tandem carbohydrate recognition domains (CRDs), while eleven (CTL15, 17–24) possessed only one. In addition to one CRD domain, CTL9 and 10 had three sushi and eleven sushi domains, respectively. Thirteen CTLs formed an independent clade in the phylogenetic tree, suggesting a possible gene expansion of this family in H. armigera (Figure 5). The phylogenetic analysis indicates that CTLs containing two CRDs are in the same group, suggesting a lepidopteran specific family expansion. H. armigera CTL2, 6, and 25 are orthologous to M. sexta immulectin-2, which agglutinated E. coli and might anchor the PPO activation reaction in the vicinity of microbial cells [39,40]. Along with this group, HaCTL7, 11, 26 were clustered together with MsIML-3 and MsIML4 in a bigger clade. HaCTL21, 22 and 23 formed three distinct clades along with the respective CTLs from A. mellifera, T. castaneum, D. melanogaster, A. aegypti, B. mori, and Acyrthosiphon pisum. These three clades are in turn part of a single branch in the phylogenetic tree which is distant from other major branches, and are characterized as galactose binding CTLs (CTLGAs) with QPD signature sequence. However, we could not find H. armigera homologue of A. gambiae CTL4 and CTLMA2, which prevented Plasmodium from melanization [41]. Our findings imply that H. armigera CTLs are much more diversified than other insect groups and may arise from gene duplications catering to specific pathogen interactions. Apart from CTLs, another class of lectins-Galectins that were found in diverse groups of insects and known to bind to β-galactoside were also essential for pathogen recognition [42]. H. armigera immunotranscriptome revealed three galectins, displaying 1 to 1 orthologous relationship with B. mori galectins (Figure 5B).
Some thioester-containing proteins (TEPs), belonging to C3/α-2 macroglobulin superfamily, harbor an intrachain β-cysteinyl-γ-glutamyl thioester bond, which is used for self- or non-self-identification [43]. α2-macroglobulin is a broad-spectrum protease inhibitor using a trapping mechanism through its thioester linkage. During the immune responses, complement factors C3, C4, and C5 bind to self/non-self surfaces via the same bond. In mosquitos, TEP family presents extensive gene duplication, with 15 genes in A. gambiae, 8 in A. aegypti, and 10 in C. pipiens [44]. In A. gambiae, TEP1 played the key role in blocking Plasmodium oocyst proliferation in the midgut [45]. Helicoverpa, however, has a relatively small TEP family with only four members which are homologous to B. mori TEPs (Additional file 1: Figure S6, in Additional file 6: Table S5). Since H. armigera TEP4 contains the signature motif GCGEQ, it is probably involved in immune surveillance against pathogens or parasites.
Based on sequence homology search, we have also identified other H. armigera PRRs encoded by 10 scavenger receptors class B (ScR-Bs), 1 hemocytin, 1 hemolin, 1 DSCAM, 1 Draper, 1 RhoL, 1 Rap1, 1 enabled, 1 Fascin, 1 Scar, 1 Dizzy, 1 TM9SF4, 1 Integrin, 1 Noduler, and 4 Eater, which are involved in phagocytosis, encapsulation, and various other cellular immune responses (in Additional file 6: Table S5) [46,47]. The phylogenetic analysis of the ScR-Bs (in Additional file 1: Figure S7) demonstrated that H. armigera and B. mori ScR-Bs were most likely to form the orthologous pairs. The H. armigera ScR-B9 might function in phagocytosis of the apoptotic bodies, since it is orthologous to D. melanogaster Croquemort (CG4280). The presence of a large number of PRR genes in the immunotranscriptome of H. armigera, indicated their involvement in the immune response to the infection of B. bassiana and E. cloacae.
Genes involved in melanization and extracellular signal modulation
Upon PRR binding, the invading microbes may lead to the PPO activation. As key enzymes in the melanization reaction, the copper-containing phenoloxidases are activated from its inactive zymogens by specific proteolytic cleavage [28]. Active phenoloxidases catalyze the conversion of monophenols to o-diphenols and further oxidation of o-diphenols to quinones, which then polymerize to form melanin that entraps and kills microbial pathogens and parasites [8,48]. Except for mosquitoes, most insects had one to three PPOs [49]. Similar to other lepidopteran species, two PPO transcripts were identified in the H. armigera transcriptome. The H. armigera PPO1 and PPO2 share 80.3% and 79% of identity in amino acid sequence to the B. mori PPO1 and PPO2 (in Additional file 6: Table S5), respectively. The phylogenetic analysis establishes the orthologous groups of HaPPO1-BmPPO1-MsPPO1 and HaPPO2-BmPPO2-MsPPO2 (in Additional file 1: Figure S8).
Serine proteases (SPs) and their non-catalytic homologs (SPHs) constitute one of the largest protein family in insects that are functionally involved in various physiological processes such as digestion, development, hemolymph clotting, and immune responses [50]. Hemolymph SP cascades, comprised of multiple clip-domain SPs (cSPs), are essential for proteolytic activation of PPO and the processing of Toll pathway ligand, spätzle, which will be discussed in detail in the next section. In total, we have identified 46 SPs and 7 SPHs in H. armigera transcriptome (in Additional file 8: Table S7). Among them, 41 have complete sequences containing entire open reading frames (ORFs). Apart from these cSPs, five other SPs (SP41-44, 47) were shown to contain the structural modules important for protein-protein interactions, such as LDLA, Sushi, and SRCR. Detailed analysis of SPs revealed the orthology among H. armigera SP42, M. sexta HP14 and D. melanogaster modular SP. The respective SPs from M. sexta and D. melanogaster apparently are the first key enzymes triggering PPO activation and Spätzle processing [51,52]. Thus we postulated a similarly important role for H. armigera SP42, which needs further experimental verification.
While each cSP harbors one or two regulatory modules at its amino terminus, clip-domain SPHs (cSPHs), although lacking catalytic serine residue, act as cofactors for cSP mediated reactions. In Helicoverpa transcriptome, 10 cSPs (cSP1-10) and 2 cSPHs (cSPH11-12) were identified (in Additional file 6: Table S5). This number is more or less similar to B. mori (18) but much lower than that of D. melanogaster (57) [31,50], partly due to the lack of the corresponding genome information. Multiple sequence alignment of the ten cSP catalytic domains suggests that they all have a trypsin-like specificity, based on residues predicted to form the primary substrate binding site (DGG) (in Additional file 1: Figure S9, in Additional file 8: Table S7). H. armigera cSPs and cSPHs are divided into four subfamilies based on their evolutionary relatedness (Figure 6A, in Additional file 6: Table S5). With similar SP subfamilies in other insects as well, these four groups of SP-related genes might represent lineages derived from ancient evolutionary events [53,54]. CLIPA is composed of cSPHs only. H. armigera cSPH11 of CLIPA is orthologous to T. molitor PPAF and Holotrichia diomphalia PPAF2 and based on this orthology a putative function of cSPH11 as a cofactor of PPO can be predicted [55,56]. Similarly, in CLIPB subfamily, cSP6 and cSP8 are orthologous to M. sexta PAP3 and PAP1 with 64.3% and 69.7% identity in amino acid sequences, respectively and may function as PPO activating protease [57,58]. Structurally, cSP6 harbors dual clip domains at its N-terminus, while cSP8 only has one. cSP8 and MsPAP1 were clustered together into a group which is close to but distinct from a clade comprising of DmMP2, AaIMP1, AaIMP2 and AgCLIPB9 in phylogenetic analysis. On the other hand, cSP6 was clustered together with MsPAP3 and BmPPAE into a clade distant and distinct from the above two clades. Likewise, H. armigera cSP7, belonging to yet another independent clade within CLIPB subfamily, is orthologous to MsHP8, BmBAEE and TmSPE, suggesting its involvement in spätzle processing and Toll pathway activation [12,59]. CLIPC includes four cSPs, of which cSP4 forms an independent branch as that of M. sexta HP6 and D. melanogaster Persephone. Since M. sexta HP6 and HP8 constituted a branch leading to the cleavage of Spätzle, H. armigera cSP4 and cSP7 are postulated to achieve the same function.
The PPO activation is delicately regulated by serpins. Serpins are a superfamily of proteins, which have around 350–400 amino acid residues and the conserved tertiary structure comprising nine α-helices and three β-sheets [60]. Most of them inhibit SP-mediated processes by forming covalent complexes with their cognate SPs. The P1 residue at an exposed reactive site loop (RSL) near the carboxyl terminus determines its inhibitory selectivity. Sequence similarity search of H. armigera unigenes yields 22 serpin transcripts, most of which have putative orthologs in B. mori (Figure 6B, in Additional file 6: Table S5). Of these, sixteen serpins have a predicted signal peptide for secretion, while the rest (Serpin2, 4, 10, 12, 16, 21) are assumed to be cytosolic (in Additional file 6: Table S5). H. armigera Serpin1-6 are highly similar to their M. sexta and B. mori orthologs, suggestive of their essential roles in regulating innate immunity [28]. One interesting feature of M. sexta and B. mori Serpin1 and 2 is the mutual exclusive use of alternative exons to generate RSL variants on the same framework. While initial analysis suggests the same mechanism in H. armigera (data not shown), a detailed study of Serpin1 and 2 gene sequences will reveal the scope of this phenomenon. 1:1 orthologous relationship between H. armigera, B. mori, and M. sexta Serpins3, 4 and 5 allowed us to make predictions about the putative functions of these HaSerpins as summarized below [61,62]. D. melanogaster Spn27A, M. sexta serpin-3 and B. mori Serpin3 are negative regulators of the melanization cascade. In the phylogenetic tree, H. armigera Serpin3 was located in the same clade with them and also contained the conserved P2-P2′ sequence (NK-FG) (Figure 6, in Additional file 1: Figure S10). Serpin3 may negatively regulate melanization and the activation of Spätzle, while Serpin4 and 5 might inhibit hemolymph cSPs regulating immune response in general (Figure 6, in Additional file 1: Figure S10). We predict their proteolytic cleavage site based on the multiple sequence alignment of the hypervariable RSLs of the H. armigera serpins (in Additional file 1: Figure S10). Serpin10 and 14 lack a conserved carboxyl-terminus and may not fold properly with a long insert separating the partial serpin domain. Serpin1, 3 ~ 6, 8, 9, 13, 21, and 22 (with Arg or Lys located at the predicted P1 position) may inhibit trypsin-like enzymes, while serpin-2, 7, 11, 12, and 17 ~ 20 (with Phe, Tyr, or Leu located at the putative P1 site) are anticipated to inhibit chymotrypsin-like SPs. Only three serpins, Serpin12, 15, and 20, contain Val and Met at the P1 position to control elastase-like enzymes. These predicted inhibitory activities and selectivity features surely need further experimental data to validate.
The immune signaling pathways and effector gene families
After recognizing the surface molecules of invading pathogens, recruited hemolymph immune factors would trigger signal transduction pathways and induce the production of effector molecules. Similar to melanization, the Toll pathway is also involved in the antifungal and antibacterial immunity. Cleavage activated Spätzle by Drosophila SPE (orthologous to M. sexta HP8 and H. armigera cSP7) acts as the ligand for this pathway, binding to Toll receptors and triggering its dimerization between leucine-rich repeats [6,7,63]. Six spätzle (Spz) transcripts are present in the H. armigera transcriptome (Figure 7A, in Additional file 6: Table S5). The phylogenetic tree indicates H. armigera Spz2 ~ 4 are 1:1 orthologs of D. melanogaster Spz2 ~ 5. H. armigera Spz1 contained 9 Cys residues at the same position to, perhaps form one interchain and four intrachain disulfide bonds. H. armigera cSP7, a trypsin-like enzyme, may cleave after R119 to activate the Spätzle-1. Orthologous to B. mori and M. sexta Spz1s with 41.9% and 53.5% identity, respectively and are grouped together in a lepidoptera specific branch in the phylogenetic analysis [64]. Toll like receptors have also undergone significant diversification, with a total of 11 members in H. armigera transcriptome (Figure 7B, in Additional file 6: Table S5). They contain the entire ORFs encoding extracellular Leu-rich regions, transmembrane region, and cytoplasmic TIR domains (Figure 7C). The phylogenetic analysis supported the following orthologous relationship: Dm9-Ag9-Ha9-Bm9. Like D. melanogaster Toll9 [65], H. armigera Toll-9, may participate in the induced synthesis of antifungal proteins. H. armigera Toll-6 and Toll-7 may also play roles in immune signaling, as they both are in the same clade along with D. melanogaster Toll1 and Toll7. Although the Toll pathway ligands and receptors have experienced major family expansions, the corresponding intracellular components, such as MyD88, Tollip, Tube, Pelle, Pellino and Cactus, are mostly conserved in all the studied insects including H. armigera (in Additional file 6: Table S5). Our results suggested that these components formed a multimeric protein complex in H. armigera, which further phosphorylated HaCactus resulting in the production of AMPs. The immune molecules involved in the anti-fungal and -bacteria pathways were conserved in the H. armigera (Figure 8).
The insect IMD pathway, analogous to the mammalian TNF signaling pathway, is critical for combating Gram-negative bacteria [7]. IMD protein itself is similar to mammalian TNF receptor as both contain a homologous death domain. Activation of IMD leads to the proteolytic cleavage of a Rel family transcription activator, Relish. The released active domain of Relish is then translocated into the nucleus to exert its function [6]. The IMD pathway components, including IMD, Dredd, FADD, TAK1, Tab2, IKK-β, IKK-γ and Relish were all identified in the H. armigera transcriptome strongly suggesting the conservation of IMD mediated immunity in cotton bollworm (in Additional file 6: Table S5). Additionally, two members of IAP-2 were also identified. In D. melanogaster, IAP-2 acts as a modulator of IMD pathway aiding the nuclear localization of Relish [66].Besides Toll and IMD pathways, JNK and JAK/STAT pathways are also implicated to play a role in insect immunity. Activated by TAK1 kinase, JNK pathways function through transcription factor Relish resulting in production of AMPs in both A. gambiae and D. melanogaster [67]. One copy of JNK, Hem, Jun and Fos genes of the JNK pathway were all identified in H. armigera transcriptome (in Additional file 6: Table S5). In Drosophila, JAK/STAT pathway controls the transcription of some immune molecules including TEPs, promotes phagocytosis and participates in antiviral response [68]. Based on homology to D. melanogaster JAK/STAT pathway members, cytokine receptor Domeless, Hopscotch, and STAT were present as single copy identified in H. armigera, indicating the conservation of this pathway (in Additional file 6: Table S5). The negative regulators of JAK/STAT pathway, SOCS and PIAS, were found in the transcriptome as well. Taken together, all these four pathways-Toll, IMD, JNK and JAK/STAT, coordinates in relaying the ‘danger’ signal ultimately stimulating the production of immune effector molecules (Figure 8).
Most AMPs are low molecular weight (<30 kDa) proteins forming amphiphilic α-helix, hairpin-like β-sheet, or structures stabilized by disulfide bonds [69]. Induction of AMPs in fat body and hemocytes is thoroughly studied in lepidopteran insects including M. sexta and B. mori [28,31]. Analysis of the H. armigera transcriptome revealed unigenes homologous to AMPs, including 4 lysozymes, 5 cecropins, 4 moricin, 3 gloverins, 1 defensin, 1 gallerimycin, and 1 lebocin (in Additional file 6: Table S5). The number of annotated AMPs is much lower than that of B. mori [31] and M. sexta [32], probably due to the lack of genome information. While cecropins and defensins are found in various insects, moricins and gloverins are only reported to be present in Lepidoptera. Furthermore, four putative lysozyme transcripts are present in H. armigera transcriptome (in Additional file 1: Figure S11). Lysozymes are 14–16 kDa enzymes that hydrolyze peptidoglycans in bacterial cell wall [70]. All four H. armigera lysozymes belong to the c-type lysozymes with four disulfide bridges, and lysozymes 1–3 contain the conserved active site including the Asp and Glu residues. Reactive oxygen and nitrogen species (ROS and RNS) played an important role in killing microbes and parasites [71]. However, the overproduction of free radicals can be toxic to the host cell, therefore a tight regulation of ROS/RNS concentration is required. Superoxide dismutase (SOD), glutathione S-transferase (GST), or peroxiredoxin converts ROS/RNS to less toxic product. Here, we have annotated some of genes, including 5 GSTs, 3 SODs, and 6 peroxiredoxin transcripts (in Additional file 6: Table S5), indicating that free radicals produced by ROS and RNS are likely to be components of the defense system in H. armigera. Further experiments are needed to verify suggested role of the immune signaling pathways and effectors in H. armigera as described above.
Expression profile of immunity-related genes in the fat body after fungal and bacterial infections
To verify the FPKM value changes between control and treatment groups, we performed quantitative real-time PCR analysis of the expression profiles of 18 genes in fat body and hemocytes from B. bassiana-, E cloacae- and buffer-injected H. armigera larvae. The results, consistent with the deep RNA-seq data, provide an overview of tissue-specific gene expression patterns of H. armigera larvae in response to B. bassiana infection (Figure 9, and in Additional file 9: Table S8). This suggests the expression data of other genes from the RNA-seq data analysis are also reliable.
Humoral defense proteins are mainly produced and secreted by fat body, an insect tissue analogous to liver and adipose tissue of mammals in terms of immunity and metabolism. For PRR genes, RNA-seq based transcriptome analysis indicated that short PGRPs (PGRP-SA1, -SA2, -SB1, -SB2, and -SD) were induced in fat body after the fungal challenge. Similarly, transcription of ScR-B1, B4, B8, TEP2, TEP3, CTL9 ~ 11, 18, 20 ~ 24 and galectins is up-regulated. This change is not limited to PRRs, some signal transducers and regulators are also induced. Cluster analysis of expression patterns revealed that cSP5, 8 ~ 10, cSPH11, 12, Serpin6, 14, 20, 21 mRNAs became much more abundant in fat body after the fungal challenge whereas higher levels of cSP2 ~ 4 and cSP6 transcripts were detected in hemocytes. The pronounced up-regulation of Toll pathway genes (e.g. Spz2, Pelle, Toll-8, Toll-9, Toll-14) occurred in fat body elicited by B. bassiana, suggesting their involvement in antifungal immunity. The gloverin1, lysozyme2, and lysozyme4 mRNA levels increased most dramatically to defend the fungal infection. Notably, most of these genes did not change sharply at transcript level in the other experiment conditions, indicating that fat body plays a major role in the antifungal response.
Bacterial challenge also resulted in the activation of a broad spectrum of immunity-related genes. These include PRRs such as CTLs (CTL2, 12, 14, 16, 17, and 26), signaling modulators (cSP3, Serpin1, 4, and 9), and effector molecules (gloverin1, gloverin2, Lebocin, and Moricin4). The mRNA abundances of FADD, Fos, JNK significantly increased in fat body elicited by E. cloacae, as evidenced by 12 genes in cluster 4, which were significantly up-regulated (fold change > 2, P < 0.001) in response to such bacterial infection. This number was apparently lower than that elicited by fungi, which is 24 (cluster 3, Figure 1C). Although more immunity-related genes were elicited by fungi (67) than bacteria (42), there is an overlap between them, including up-regulated cSP8, lysozyme3, gloverin1 and hemolin, and down-regulated βGRP3 (in Additional file 1: Figure S12).
Expression profile of immunity-related genes in hemocytes in response to fungal and bacterial challenges
Hemocyte mediated defense reactions such as phagocytosis and encapsulation also depend on the recognition of foreign pathogens to trigger downstream immune signaling pathways. The expression profiles of immunity-related genes in hemocytes elicited by the pathogens were highly different from those in fat body. To better understand the complexity of responses of hemocytes against pathogens, the expression profile of immunity-related genes elicited by fungi and bacteria was compared. Gene cohorts of cluster 1 include 21 such genes induced in B. bassiana induced hemocytes, including βGRP1, βGRP2, hemolin, cSP2, cSP6, Serpin2, Serpin5, lysozyme1, lysozyme3 and so on. Immune activation by E. cloacae is also noticeable as demonstrated by induction of PRRs (PGRP-SA2, TEP1) and immune modulators (MyD88, Toll-5, Cactus, Toll-12, Serpin3, 11, 21).
Thirty-three immunity-related genes in cluster 2 were significantly suppressed in B. bassiana elicited hemocytes. RNA-seq based transcriptome analysis indicated that long PGRPs (PGRP-LA, -LB, -LC, -LD) were strongly suppressed by B. bassiana but induced by E. cloacae. Quantitative real-time PCR also revealed that the βGRP1 mRNA level dramatically increased in hemocytes elicited by B. bassiana. In contrast, CTL mRNA levels did not change significantly in hemocytes. The transcript levels of PPOs were high in hemocytes, but significantly decreased in response to fungal challenge. Likewise, components of the IMD pathway (FADD, IAP2a, 2b, IKKβ, IKKγ, Dredd, Relish, UBc13, TAK1, IMD, Tab2), JNK and JAK-STAT pathways (Domeless, PIAS, Hopscotch, STAT, Fos, JNK, Hem, SOCS) were all dramatically suppressed in hemocytes in response to the fungal infection but moderately induced by bacterial infection, as validated by the quantitative real-time PCR assay of Domeless, JAK, and STAT transcripts. This indicated that signal transduction in hemocytes responds differently to bacterial and fungal infection. More immunity-related genes were regulated by fungi (60) than bacteria (18) in hemocytes, and most of them tended to be suppressed (in Additional file 1: Figure S12).
Although fat body and hemocytes both are important organs involved in the immune response, our transcriptome analysis revealed that more immunity-related genes are induced in fat body (clusters 3 and 4) than hemocytes (cluster 1). The expression of immune recognition and modulation molecules was mainly controlled in fat body, while genes involved in signal transduction, such as JAK-STAT and IMD, were regulated more drastically in hemocytes (Figure 9). In Toll pathway, hemocytes regulated Toll-1, Toll-5, Toll-12, Tollip, Cactus, and MyD88, but more other genes were regulatedin fat body. Additionally, quantitative real-time PCR indicated that the transcription of putative PGRP-SA1, PGRP-LB, Toll-14 and Spz2 genes were elevated in fat body upon B. bassiana infection, while the mRNA levels of defensin, moricin and gloverin1 were up-regulated in hemocytes (Figure 9), which is consistent with the RNA-seq results.