A transcriptomic insight into the infective juvenile stage of the insect parasitic nematode, Heterorhabditis indica

Background Nematodes are the most numerous animals in the soil. Insect parasitic nematodes of the genus Heterorhabditis are capable of selectively seeking, infecting and killing their insect-hosts in the soil. The infective juvenile (IJ) stage of the Heterorhabditis nematodes is analogous to Caenorhabditis elegans dauer juvenile stage, which remains in ‘arrested development’ till it finds and infects a new insect-host in the soil. H. indica is the most prevalent species of Heterorhabditis in India. To understand the genes and molecular processes that govern the biology of the IJ stage, and to create a resource to facilitate functional genomics and genetic exploration, we sequenced the transcriptome of H. indica IJs. Results The de-novo sequence assembly using Velvet-Oases pipeline resulted in 13,593 unique transcripts at N50 of 1,371 bp, of which 53 % were annotated by blastx. H. indica transcripts showed higher orthology with parasitic nematodes as compared to free living nematodes. In-silico expression analysis showed 30 % of transcripts expressing with ≥100 FPKM value. All the four canonical dauer formation pathways like cGMP-PKG, insulin, dafachronic acid and TGF-β were active in the IJ stage. Several other signaling pathways were highly represented in the transcriptome. Twenty-four orthologs of C. elegans RNAi pathway effector genes were discovered in H. indica, including nrde-3 that is reported for the first time in any of the parasitic nematodes. An ortholog of C. elegans tol-1 was also identified. Further, 272 kinases belonging to 137 groups, and several previously unidentified members of important gene classes were identified. Conclusions We generated high-quality transcriptome sequence data from H. indica IJs for the first time. The transcripts showed high similarity with the parasitic nematodes, M. hapla, and A. suum as opposed to C. elegans, a species to which H. indica is more closely related. The high representation of transcripts from several signaling pathways in the IJs indicates that despite being a developmentally arrested stage; IJs are a hotbed of signaling and are actively interacting with their environment. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-2510-z) contains supplementary material, which is available to authorized users.


Conclusions:
We generated high-quality transcriptome sequence data from H. indica IJs for the first time. The transcripts showed high similarity with the parasitic nematodes, M. hapla, and A. suum as opposed to C. elegans, a species to which H. indica is more closely related. The high representation of transcripts from several signaling pathways in the IJs indicates that despite being a developmentally arrested stage; IJs are a hotbed of signaling and are actively interacting with their environment.

Background
Nematodes are the most abundant metazoans on earth and show remarkable diversity in their ecological and feeding habits [1]. Although notorious as parasites and pathogens of humans, animals, and plants, the majority of nematodes are beneficial to us as they recycle nutrients in soils and oceans [1,2]. Another beneficial nematode group known as entomopathogenic nematodes (EPNs) encompass two genera, Steinernema, and Heterorhabditis. These EPNs symbiotically associate with gram-negative gammaproteobacteria, Xenorhabdus, and Photorhabdus, respectively [3]. Because of their ability to kill insects rapidly and amenability to mass production, they are widely used for the biological control of the insect pests of crops [4][5][6]. The EPNs are models to study animal-microbe symbiosis [7][8][9][10], nematode parasitism [11] and ecology [12,13].
The infective juvenile (IJ) stage of the Heterorhabditis spp. is a developmentally arrested stage analogous to the dauer stage of the C. elegans [14], and infective L3 stage of many animal parasitic nematodes [15]. IJs are the only EPN stage found in nature outside the insect-host, and are capable of surviving tough environmental conditions in the soil for long periods of time. The nematodes in the IJ stage do not feed or grow until they find a new insect-host, and they possess a remarkable ability to actively search, follow and infect their insect-host in the soil environment [16,17]. IJs are known to show different kinds of parasitic behaviors [18]. They can be desiccated to quiescence or frozen in liquid nitrogen [19,20], and then be revived back to life. Thus, there is a possibility to extend the lifespan/delay life cycle. Because of this remarkable environmental toughness of the IJs, all the EPN formulations, presently available in the market, are based on this stage. An extensive body of research exists on the genes, pathways, and processes involved in aging in the free-living nematode, C. elegans [21][22][23]. A similar understanding of genes that increase the lifespan in EPNs would be directly beneficial in extending the shelf-life of EPN IJs, and IJ based formulations to improve their use as a pest control product [24][25][26].
Genomic tools and technologies have allowed the researchers to uncover the amazing biology of nematodes [27][28][29]. The genome of the EPN, Heterorhabditis bacteriophora TTO1-M31e strain has been sequenced [30] and is available in the public domain. Additionally, the expressed sequence tags (ESTs) of H. bacteriophora GPS-11 strain [31,32] and transcriptome of the adult stage of H. bacteriophora TTO1-M31e were published earlier [33]. Large amount of information is available on molecular biology of the dauer/developmentally arrested L2 and L3 stages of various nematodes, such as free-living C. elegans and C. briggsae [34][35][36][37][38], insect-associated Pristionchus [39,40], animal parasitic Strongyloides stercoralis [41] and Ostertagia ostertagi [42] and many plant parasitic nematodes [43][44][45][46][47][48]. However, such information is completely lacking for IJ stage of EPNs. Scanty information available on the Heterorhabditis IJ 'recovery' is not adequate to decipher the various molecular and physiological pathways specific to these IJs [33,49]. Additionally, it is suggested that genes expressed in survival or dispersal stages in nematodes, such as dauer, and EPN IJs, are more likely to be novel, compared with the genes expressed in adult or larval stages [29].
H. indica was the first species of this genus recorded from India [50]. Since then, various surveys showed that H. indica is the most predominant species of Heterorhabditid nematode in India and is found in almost all the geographical parts of the country. Therefore, H. indica is naturally suitable for incorporation in insect biological control programs in India. In the present study, the transcriptomic analysis of the IJ stage of H. indica was carried out to understand the molecular processes and pathways active at this stage, and to create a resource for further functional genomics and genetic investigations.

Transcriptome sequencing and assembly
The mRNA sequencing of IJ stage of H. indica using the Illumina GAIIx platform yielded about 51.2 million reads of 100 base read-lengths generating 64x coverage. After quality filtering, 42.3 million high-quality reads totalling 4.2 gigabases of data were obtained. The de-novo sequence assembly was carried out by Velvet at different k-mer lengths (51-93 with step size of 4) with minimum contig length of 200. The optimal assembly was attained at k-mer 83 which resulted in 18,710 contigs with 909 bp N50 ( Table 1). Merging of transcripts from 71 to 83 k-mer range by Oases resulted in 23,827 transcripts with 1,292 bp N50 size. Removing duplicates by cd-hitest, and filtering out < 300 bp transcripts resulted in 13,593 unique transcripts with N50 of 1,371 bp ( Table 1). Total of 13,592 proteins were predicted by ORFPredictor [51] which were then used for downstream analysis.

Characterization of H. indica transcripts
The blastx analysis of H. indica transcripts resulted in annotation of 7,246 transcripts (Additional file 1: Table S1a), of which 6,320 hits matched to animal  Table S1b).
Comparison of the transcripts with complete genomes of other closely related rhabditid nematodes through reciprocal blast approach showed 3,364 orthologs of C. elegans, 3,103 of C. briggsae, 3,171 of C. remanei, 2,164 of P. pacificus and 346 of H. bacteriophora (Fig. 2a). However, higher numbers of orthologs were identified when the transcripts were compared to the animal parasitic nematodes-9,685 orthologs in A. suum, 6,819 in Strongyloides ratti while other parasites like Meloidogyne hapla, M. incognita, B. malayi and Trichinella spiralis ranked in between these two nematodes (Fig. 2b).
The transcripts were analysed to identify the key metabolic pathways and processes of which 4,738 proteins were mapped to various pathways ( Table 3). The 60 most represented pathways included signaling pathways like PI3K-Akt, MAPK, Rap1, Ras, insulin, FoxO, AMPK, Fig. 1 a Distribution of the top 10 nematode species with most homologs to Heterorhabditis indica. The distribution was calculated using best blastx hits. b. Venn diagram of H. indica transcripts matching H. bacteriophora proteins in a standalone blast cAMP, Wnt, Hippo, chemokine, neurotrophin, sphingolipid, oxytocin, thyroid hormone, cGMP-PKG, and signaling pathways regulating pluripotency of stem cells ( Table 3). Transcripts that were mapped to all the pathways in H. indica IJs are represented in Fig. 4.
The transcripts were also analyzed using the EuKaryotic Orthologous Groups (KOG) and Protein K(c)lusters (PRK) databases. The results of the analysis are presented in Additional file 1: Table S1. The KOG analysis is a eukaryote-specific version of the Clusters of Orthologous Groups (COG) tool for identifying ortholog and paralog proteins. Broadly, 1,519 transcripts were classified to signal transduction (KOG function ID-T), 985 to transcription (KOG function ID-K), 747 to translation, ribosomal structure and biogenesis (KOG function ID-J), 566 to RNA processing and modification (KOG function ID-A), 85 to defence mechanisms (KOG function ID-V) amongst other KOG classes (Additional file 1: Table S1). A total of 3,594 transcripts were annotated using PRK database (Additional file 1: Table S1).

Transcriptome quantitation and enrichment of significant biological categories and KEGG pathways
To get an estimate of transcript abundance, in silico quantitation of transcripts was done by mapping the reads from individual libraries to the non-redundant set of 13,593 transcripts using TopHat, and transcript abundance were calculated using Cufflinks. The FPKM (Fragments Per Kilobase of transcript per Million mapped reads) values for all the transcripts are given in Additional file 3: Table S2. The highly abundant transcripts were searched against KOG and PRK databases to identify their functions. We identified 202 transcripts showing ≥1000 FPKM, and 4,124 transcripts with ≥100 FPKM (Additional file 3: Table S2). The KOG analysis predicted functions for 76 proteins with ≥ 1,000 FPKM values, of which three most abundant protein classes were translation, ribosomal structure and biogenesis (KOG function ID-J), post translational modification, protein turnover, chaperones (KOG function ID-O) and intracellular trafficking, secretion, and vesicular transport (KOG function ID-U) ( Table 4, Additional file 3: Table S2). In the 2,345 proteins with ≥ 100 FPKM values (Additional file 3: Table S2), other predominant protein functional classes that showed up in 2,345 proteins with ≥ 100 FPKM values were signal transduction (KOG function ID-T), energy production and conversion (KOG function ID-C), RNA processing and modification (KOG function ID-A), and transcription (KOG function ID-K). The PRK database analysis showed a similar result (Additional file 3: Table S2).
Metabolic pathway analysis was done using KEGG Automatic Annotation Server against C. elegans, C. briggsae, B. malayi, Loa loa and Trichinella spiralis pathways. The analysis of KEGG pathways represented by the abundant transcripts revealed that, among others, at FPKM ≥ 1,000, the various signaling pathways like PI3K-Akt, Hippo, HIF-signaling pathway, Rap, MAPK, calcium, sphingolipid, cGMP-PKG, insulin signaling pathway were represented by at least one or more protein (Additional file 4: Table S3). However, at FPKM ≥ 100, in addition to the above pathways, several other signaling pathways like FoxO, cAMP, Ras, sphingolipid, epithelial cell, AMPK, TGF-ß were detected (Additional file 4: Table S3).

The kinome of H. indica IJs
The kinome analysis was done to identify the protein kinases important in signal transduction in all the above mentioned signaling pathways that regulate metabolism, cell cycle, growth and development, and responses to environmental stimuli. As against 438 kinases reported from C. elegans [52], we detected 272 in H. indica IJ transcriptome at stringent blastp parameters of at least 40 % sequence identity and 50 % query coverage  (Table 5). These 438 (C. elegans) kinases were classified into 187 groups, and we found that 137 kinase groups were common between C. elegans and H. indica, whereas, 50 kinase groups were not found in H. indica. The details of kinase groups common between C. elegans, and H. indica are given in Table 5, and kinases that could not be discovered in H. indica but present in C. elegans are listed in Additional file 5: Table S4.

The secretome of H. indica IJs
A total of 2,374 secreted proteins were predicted (Additional file 6: Table S5a). The important proteins found in the analysis were related to neuropeptide signaling, for example, 2 each of GPCR-Family 2 like and GPCR rhodopsin-like including GPCR rhodopsin-like 7TM, and GPCR Family 3 C-terminal domains. Several hydrolases were identified, including 33 hydrolases belonging to small GTPases, glycoside hydrolases, transthyretin/ hydroxyisourate hydrolase, alpha/beta hydrolase and epoxide hydrolase. The secretome showed the presence of a large contingent of peptidases that have a known role in degrading insect tissues. We could identify 38 peptidases belonging to different classes, such as metallopeptidases, trypsin-like cysteine/serine peptidases, cysteine peptidases, peptidase S1 (serine endopeptidases), S1A, S8, S10, S24, S26, S28, S53, S54, M10, M13, M14, M28, M12, M41. Some of these peptidases like carboxypeptidase possess regulatory domains. A search of the MEROPS database [53] for identification of putative peptidases (proteases, proteinases, and proteolytic enzymes) identified 64 known peptidases of the different parasitic and free-living nematodes (Additional file 7: Table S5b). Five transcription factors including STAT, p53, TFIID were also identified. Several genes involved in signaling, such as 13 members of protein kinases were present in the secreted contingent, including serine threonine, tyrosine, and thiamine phosphate kinase. Similarly, 12 members of phosphatases were found. Lastly, the transcripts showed the presence of several known stress response genes such as glutathione peroxidases, heat shock protein 70 and heat shock protein 90.

Repeat elements in H. indica transcriptome
The transcriptome data was used to analyze the repeat elements because no information is available for repeat elements in this species. Transcript sequences were examined for the presence of repeat elements using Repeat Masker v-4.0.5 program. Approximately 1.4 % of the total transcripts were found to be encoded by different repetitive elements, of which 1.21 % belonged to simple repeats, and 0.29 % were low complexity repeats (Additional file 8: Table S6a). A total of 31 retroelements were found in the transcripts, with four long interspersed repeat elements (LINEs), although no short interspersed repeat elements (SINEs) were found. Among retroelements, 27 long terminal repeats (LTR) were found which was higher than non-LTR elements. Also, 15 DNA transposons of different classes, 103 small RNA, and three satellites were found (Additional file 8: Table S6a). Using MISA to identify short sequence repeats (SSRs) revealed 2,968 sequences showing the presence of 3,635 SSRs. Out of the 2,968 sequences, 465 sequences contained more than one SSRs and 209 SSRs were present in compound formation (Additional file 9: Table S6b). Mononucleotide repeats (46.6 %), and trinucleotide repeats (46.05 %) represented the largest fraction of SSRs, followed by di-nucleotide repeats (6.3 %). The number of tetra-(32), penta-(5) and hexa-(1) nucleotide repeats were below 0.1 % (Additional file 9: Table S6b).

RNAi pathway genes and other gene classes in H. indica IJs
C. elegans genome encodes 77 RNAi pathway effector genes, which is the most number of RNAi pathway effector genes discovered in any nematode [54]. We could identify 24 RNAi pathway effector genes in the present transcriptome (Table 7). Different RNAi effector genes identified were six genes encoding for small RNA biosynthetic proteins, four genes for dsRNA uptake, spreading and siRNA amplification, three for Argonautes, two each for RNA-induced silencing complex genes (RISC) and RNAi inhibitors, and seven for nuclear RNAi effectors (Table 6, Additional file 10: Table S7). The presence of nrde-3 in H. indica (percent identity, 30.27; query Additionally, the H. indica transcriptome was analysed for presence of members of functionally important gene classes like neuropeptides (FMRFamide-related peptides (flp), non-insulin, non-FMRFamide-related neuropeptidelike proteins (nlp), uncoordinate (unc), dauer formation (daf), fatty acid and retinol binding protein (far), nuclear hormone receptor (nhr), C-type lectin domain containing proteins (lec), lysozymes (lys) and lethal (let) gene classes at two stringency levels of 25 and 30 % sequence similarity and query coverage. The results are presented in Table 7. Interestingly, we also found an ortholog of C. elegans tol-1 in the transcriptome of H. indica IJs (32.9 % identity, 88 query coverage at 2e-180).

Discussion
The transcriptome sequencing and assembly of H. indica IJs resulted in 13,593 unique, high-quality transcripts at N50 value of 1,371 bp. Further, 6,320 out of 13,593 (53 %) transcripts could be annotated by blastx against nr database. Most of the blastx hits showed similarity with A. suum and not H. bacteriophora which is a closely related species. This anomaly may be attributed to the absence of H. bacteriophora sequences from nr database. Standalone blast identified 2,745 hits with H. bacteriophora.
The free living-developmentally arrested infective stage is characteristic of many parasitic nematodes [55][56][57][58].   The "dauer hypothesis" proposes that similar molecular mechanisms regulate the developmental arrest and activation of both C. elegans dauer larvae and analogous developmentally arrested 3 rd stage larvae (L3i) of parasitic nematodes [56,57,59] despite their evolutionary divergence [60,61]. In the free-living model nematode, C. elegans, a developmentally arrested dauer stage is formed during conditions of low food abundance, high temperature [62], high dauer pheromone levels [63,64] and high population density [65,66]. The daf (abnormal dauer formation) genes identified in C. elegans that are involved in formation and regulation of dauer stages are placed into four dauer pathways-a cyclic guanosine monophosphate (cGMP) signaling pathway, an insulin/ IGF-1-like signaling (IIS) pathway regulated by insulinlike peptide (ILP) ligands, a dauer transforming growth factor-β (TGF-β) pathway regulated by the Ce-DAF-7 ligand, and a nuclear hormone receptor (NHR) regulated by a class of steroid ligands known as dafachronic acids (DAs) [35]. Epistatic analysis revealed that the cGMP signaling pathway operates upstream of the parallel IIS and dauer TGF-β pathways, which converge on the DA biosynthetic pathway, ultimately regulating the NHR Ce-DAF-12 [38,41]. Analysis of dauer pathways in the L3i stage of S. stercoralis revealed that out of four pathways involved in dauer formation, two were conserved while two were not, suggesting their conserved and novel modes of developmental regulation [41,67]. Our results show that at least two of the canonical dauer pathways-insulin signaling pathway and cGMP-PKG signaling pathway were represented in the top 60 active pathways by at least 46 and 32 proteins, respectively (Table 3). Further, TGF-β pathway was represented by 27 proteins, and the dafachronic acid pathway was represented by a single but important gene, daf-1 (Additional file 11: Table S8). DAF-1 encodes a TGF-beta type I receptor homolog, which, in association with the DAF-4, regulates dauer formation in response to environmental signals through the ASI chemosensory neuron [68][69][70]. Our results show that similar to C. elegans, all the four dauer formation pathways are conserved and active in the IJ stage of H. indica. EPN IJs are not known to feed, but they utilize the lipids and glycogen energy reserves stored in the body   for their survival. We found genes involved in various pathways like fatty acid degradation, glycolysis, and glyoxalate in the IJ transcriptome. All these three pathways catabolize energy reserves such as fatty acids and glucose and generate ATPs that are utilized for the IJ survival. Glyoxalate pathway has been known to be important for dauer stages of C. elegans [71] and has also been reported in an EPN, Romanomermis [72]. We found several signaling pathways in the transcriptome of H. indica IJs essential for nematode survival under stressed conditions and various other activities ( Table 3). Some of these signaling pathways, such as PI3K-Akt and mTOR signaling pathways are involved in regulation of cell cycle and in mediating oxidative stress responses and extending the lifespan in the nematodes [73,74]. Presence of other signaling pathways such as the MAPK known to be involved in nematode response to various cellular and environmental stimuli including stresses and cell proliferation, regulation of fertilization in nematodes, especially sperm activation [75,76] suggest that these signaling pathways might control the IJ nematodes from being reproductive in the arrested stage. cGMP-PKG signaling is involved in olfactory sensing and behavior regulation in the nematodes [77,78] and flies [78,79], and pharyngeal pumping rate, mouth form dimorphism, the duration of forward locomotion, and the amount of fat stored in the intestine in necromenic insect associated nematode, Pristionchus [80]. This indicated that the H. indica IJs also actively sense their environment and adapt their metabolism and behavior accordingly.
The analysis of the H. indica secretome identified several hydrolases, a large contingent of peptidases, kinases, phosphatases, and enzymes involved in stress responses. Some of these enzymes are important for the degradation of insect cuticle, tissue, and hemocoel, whereas peptidases are also known to be involved in regulatory functions. The presence of a large number of kinases and phosphatases indicates vibrant signaling in the IJ stage. All these findings suggest that although IJ is a developmentally arrested stage; it is still a hotbed of signaling and is actively sensing its environment. H. indica is a rhabditid as C. elegans, which shows the presence of 77 RNAi pathway genes [54]. Primary sequence similarity based search was carried out to identify putative orthologs of C. elegans RNAi pathway genes in H. indica. We found 24 orthologs of C. elegans RNAi pathway effector genes in H. indica IJs. The completed genome sequence of another species of the same genus, H. bacteriophora revealed the presence of only 12 RNAi pathway genes [30] indicating either incompleteness of the genome or false negatives because of poor annotation of H. bacteriophora genome. Interestingly, the RNAi pathways can differ significantly even amongst very closely related nematode species, as is evident by the fact that the number of RNAi effector genes varied from 60 to 77 amongst different species of Caenorhabditis spp. [54]. Out of the four RNAi effector genes present in most known parasitic nematodes, drsh-1, rsd-3, ego-1, and smg-2 were present in H. indica IJs. However, ego-1 was absent in the two parasitic nematodes Trichinella spiralis, and A. caninum [54], suggesting that it is not universally present in parasitic nematodes as thought earlier. We found nrde-3 in H. indica IJs at a low stringency cutoff, which is responsible for nuclear translocation of RNAi triggers in C. elegans, and is involved in processes that lead to the heritability of gene silencing events. The absence of nrde-3 in parasitic nematodes has led to speculations that silencing events cannot be passed between generations of parasitic nematodes [54]. However, sequences with loose homology to the C. elegans nrde-3 could be discovered in H. bacteriophora genome as well, suggesting that the absence of nrde-3 in H. bacteriophora might be a false negative caused by a failure to predict the H. bacteriophora nrde-3 gene. Its presence in Heterorhabditis nematodes indicated that the silencing events could probably be passed between generations, and opens up a whole new  array for use of Heterorhabditid nematodes as a model for epigenetic regulation of RNAi pathways. The sequence divergence between C. elegans and H. indica prevented discovery of C. elegans orthologs of important gene class members at a high stringency. By lowering the stringency of the blastn to 30 % identity and query coverage, we could identify several additional members of the various gene classes in H. indica, but these orthologs would need further validation. The H. indica transcriptome showed the presence of at least 22 flp, 25 nlp and 18 ins neuropeptide genes, 69 unc, 21 daf and 0 (4 at 25 %) far genes, 98 nhr, nine lec, 15 let but no lys gene class members ( Table 7, Additional file 11: Table S8). In the daf gene class, daf-1, daf-2 and daf-4 were identified, all of which are important in dauer formation in C. elegans. daf-1 encodes a TGF-beta type I receptor homolog, which together with the TGF-β-like type II receptor DAF-4, is required for the regulation of dauer formation by environmental signals [81][82][83][84]. Similarly, daf-7 encodes a member of the TGF-β superfamily; which is involved in signaling pathway that interprets environmental conditions to regulate energy balance pathways that affect dauer larval formation, fat metabolism, egg laying, feeding behavior and sperm motility [85][86][87][88]. Identification of several insulin-like peptide (ins) genes proved the role of insulin signaling in IJ formation and maintenance in H. indica. Neuropeptides like flp and nlp are involved in environmental sensing by the nematode. In the flp gene class, flp-1, flp-3, flp-5, flp-12, flp-17 and flp-18 were the prominent members. In the recent years, flp genes are emerging as important targets for nematode management, and it has been shown that disruption of flp gene expression impaired nematode parasite's ability to locate its host [89][90][91][92][93][94][95]. Other neuropeptides found in H. indica, like nlp-4, has no known homologs in other nematode species [90,96,97], whereas nlp-18 in C. elegans encodes four predicted neuropeptide-like proteins; and is expressed in a variety of neurons, spermatheca, the rectal gland, and the intestine [98]. Another important protein class, nematode lectins, are protein molecules that bind to carbohydrate moieties. They are involved in cell-cell recognition and are important in nematode recognition of bacteria and innate immune responses against pathogens. Nine members of the lec gene class were identified in H. indica including lec-6. lec-6 encodes a 'proto' type galectin (beta-galactosylbinding lectin) containing a single carbohydrate recognition domain and is suggested to be important for cell adhesion and aggregation, proliferation, or programmed cell death in C. elegans [99][100][101]. Likewise, in H. indica, members of the lectin protein family might possibly be involved in recognition of the symbiont bacteria. Similarly, tol-1 found expressing in H. indica IJs has been reported to be involved in behavioral responses to the pathogenic microbes by promoting the development of sensory neurons that monitor microbial metabolism and are required for a pathogenavoidance behavior in C. elegans [102]. Hence, it is possible that tol-1 could be involved in the maintenance of a specific symbiotic relationship between Heterorhabditis nematodes with Photorhabdus bacterium, but this hypothesis would need further testing.

Conclusions
Here we presented a transcriptomic insight into the infective juvenile stage of the EPN, H. indica. After using cd-hit-est and filtering out <300 bp transcripts, we have identified 13,592 unique transcripts in H. indica infective juveniles. 18.6 % of the proteins were similar to an animal parasite A. suum. We found that similar to C. elegans, all the four dauer formation pathways-cGMP-PKG signaling pathway, insulin signaling pathway, dafachronic acid pathway, and TGF-β were conserved in H. indica and were active in the IJ stage of the nematode. Several important signaling pathways were found active in the IJs indicating that despite being a developmentally arrested stage, IJs are a hotbed of signaling and are actively interacting with their environment. Similarly, glycolysis and fatty acid degradation pathways were highly active in IJs indicating a breakdown of food reserves required for survival. Twenty-four orthologs of C. elegans RNAi pathway effector genes were found in H. indica IJ transcriptome, including nrde-3 that has been identified in any of the parasitic worms for the first time. Using a low stringency approach, we have identified several additional members of important gene classes in H. indica. Our results and analysis lay down the groundwork for further functional genomic investigations on these gene classes in Heterorhabditis nematodes.

Nematode collection and multiplication
The Heterorhabditis indica nematodes were isolated from the soil collected from Ghaziabad district, UP, India by using greater wax moth Galleria melonella as a bait. The nematodes were maintained in the laboratory on Galleria using standard procedures.

RNA extraction, cDNA synthesis, library preparation and sequencing
Total RNA was extracted from the frozen IJs using Nucleospin RNA isolation kit (Macherey-Nagel GmbH & Co. KG, Düren, Germany) according to the manufacturer's instructions. Extracted RNA was assessed for quality and quantity using an Agilent 2100 Bioanalyzer (Agilent Technologies). RNA with an RNA integrity number (RIN) of 8.0 was used for mRNA purification. mRNA was purified from 1 mg of intact total RNA using oligodT beads (Illumina® TruSeq® RNA Sample Preparation Kit v2). The purified mRNA was fragmented at elevated temperature (90°C) in the presence of divalent cations and reverse transcribed with Superscript II Reverse Transcriptase (Invitrogen Life Technologies) by priming with random hexamers. Second strand cDNA was synthesized in the presence of DNA polymerase I and RNaseH. The cDNA was cleaned using AgencourtAmpure XP SPRI beads (Beckman-Coulter). Illumina adapters were ligated to the cDNA molecules after end repair and the addition of an ' A' base followed by SPRI clean-up. The resultant cDNA library was amplified using PCR for the enrichment of adapter-ligated fragments, quantified using a Nanodrop spectrophotometer (Thermo Scientific) and validated for quality with a Bioanalyzer (Agilent Technologies). It was then sequenced on the Illumina Hiseq 2000 platform at SciGenom Next-Gen sequencing facility, Cochin, India. Both the raw and assembled sequence data generated has been deposited in the European Nucleotide Archive (ENA) database (http://www.ebi.ac.uk/ena) for public access (raw data accession no.: PRJEB10852, assembled contigs accession numbers: HADG01000001-HADG01013593). The assembled nucleotide and protein sequences are also available for blast and download at http://insilico.iari.res.in/ hindica/. The assembled data is included with the manuscript as Additional file 12.

De novo transcriptome assembly and analysis
Paired orphan sequence reads obtained from IJs were used for assembly of the transcriptome [103]. The low quality reads (Phred score <30) were removed and sequencing statistics was generated with the help of NGSQC Toolkit version v2.3.3 [104]. High quality filtered paired-end raw reads (Phred Score ≥ 30) obtained from IJs were assembled using Velvet (v.1.2.08) and Oases (v.0.2.08) pipeline [105]. Velvet was run at different k-mer lengths (51-93 with a step size of 4)-with minimum contig length of 200. The optimal assembly was attained at k-mer 83. The oases module was used for merging transcript assemblies from k-mer 71 to 83 (71,75,79,83) with minimum transcript length of 100 using the script "oases_pipeline.py" (k-mer range 71-83, insert length 250 bp, coverage depth cut off 5). Cd-hit-est was used to remove redundant transcripts at 90 % similarity. Transcripts <300 nucleotide length were removed resulting in a unique set of nonredundant transcripts.

Annotation and quantification of the transcriptome
ORFPredictor web server (http://bioinformatics.ysu.edu/ tools/OrfPredictor.html) [51] was used to predict proteins from the 13,593 transcripts (>300 bp length) using the default cut-off value of 1e-5, and 13,592 proteins were predicted which were used for annotation. Annotation for all the unique transcripts (>300 bp) was done using blastp [106], homology search against Uniprot [107], the National Center for Biotechnology Information (NCBI)-NR Protein database [106] and NEMABASE4 (http:// www.nematodes.org/nembase4/). In addition, blastx was performed to identify homologues at ≥30 % query coverage and ≥50 % sequence identity and e-value 1e-5 in other databases including RefSeq (PRK), SWISSPROT [108], European Molecular Biology Laboratory(EMBL) [109], DNA Databank of Japan (DDBJ) [110], Protein Information Resource (PIR) [111] and Protein Data Bank (RCSB) [112]. Nematode orthologs were identified from NCBI COG [113] database and other completely sequenced genomes by the reciprocal blast method [106]. To study gene orthologs across free-living and parasitic nematode species, we used the predicted protein sets from 11 genomes available in the public domain (Wormbase, NCBI, and Sanger) viz., C. elegans, C. remanei, C. briggsae, M. hapla, M. incognita, H. bacteriophora, Pristionchus pacificus, Brugia malayi, S. ratti, Trichinella spiralis and A. suum. Blastp hits with e-value scores 1e-5 and query coverage above 50 % were considered as annotated homologous proteins and python script was employed for filtering reciprocal best hits. KEGG orthologs were identified using the KEGG Automated Annotation Server (KAAS) using nematode database. iPATH server was used for mapping it to KEGG reference pathway [114]. The gene ontology and domains were identified using InterProScan 5 with default parameters [115]. The resulting hits were processed to retrieve associated GO terms describing biological processes, molecular functions, and cellular components. Homologs of the C. elegans RNAi pathway genes were also identified in the H. indica transcriptome by performing tblastx with e-value ≤ 1e-5.
The high-quality reads were mapped to the nonredundant assembled transcripts using TopHat v-2.0.9. [116][117][118][119]. Assembly of transcript models from RNA-Seq alignments and estimation of transcripts and their abundance was performed using Cufflinks v-2.1.1 [119]. Both these software packages were used with default parameters for our analysis [119].