Transcriptomic analysis of the entomopathogenic nematode Heterorhabditis bacteriophora TTO1

Background The entomopathogenic nematode Heterorhabditis bacteriophora and its symbiotic bacterium, Photorhabdus luminescens, are important biological control agents of insect pests. This nematode-bacterium-insect association represents an emerging tripartite model for research on mutualistic and parasitic symbioses. Elucidation of mechanisms underlying these biological processes may serve as a foundation for improving the biological control potential of the nematode-bacterium complex. This large-scale expressed sequence tag (EST) analysis effort enables gene discovery and development of microsatellite markers. These ESTs will also aid in the annotation of the upcoming complete genome sequence of H. bacteriophora. Results A total of 31,485 high quality ESTs were generated from cDNA libraries of the adult H. bacteriophora TTO1 strain. Cluster analysis revealed the presence of 3,051 contigs and 7,835 singletons, representing 10,886 distinct EST sequences. About 72% of the distinct EST sequences had significant matches (E value < 1e-5) to proteins in GenBank's non-redundant (nr) and Wormpep190 databases. We have identified 12 ESTs corresponding to 8 genes potentially involved in RNA interference, 22 ESTs corresponding to 14 genes potentially involved in dauer-related processes, and 51 ESTs corresponding to 27 genes potentially involved in defense and stress responses. Comparison to ESTs and proteins of free-living nematodes led to the identification of 554 parasitic nematode-specific ESTs in H. bacteriophora, among which are those encoding F-box-like/WD-repeat protein theromacin, Bax inhibitor-1-like protein, and PAZ domain containing protein. Gene Ontology terms were assigned to 6,685 of the 10,886 ESTs. A total of 168 microsatellite loci were identified with primers designable for 141 loci. Conclusion A total of 10,886 distinct EST sequences were identified from adult H. bacteriophora cDNA libraries. BLAST searches revealed ESTs potentially involved in parasitism, RNA interference, defense responses, stress responses, and dauer-related processes. The putative microsatellite markers identified in H. bacteriophora ESTs will enable genetic mapping and population genetic studies. These genomic resources provide the material base necessary for genome annotation, microarray development, and in-depth gene functional analysis.


Background
The entomopathogenic nematode, Heterorhabditis bacteriophora, and its mutualistic bacterium, Photorhabdus luminescens, are important biological control agents of insect pests [1] and represent an emerging model for research on mutualistic and parasitic symbiosis [2,3]. The use of H. bacteriophora as a biological control agent is hampered by its susceptibility to environmental extremes including temperature, desiccation, and UV radiation, differences in virulence towards different insect pests, and short shelf life. Elucidation of molecular mechanisms underlying these biological processes may serve as a foundation for improving the biological control potential of the nematode-bacterium complex.
The entomopathogenic nematode H. bacteriophora has a distinct life style. The infective juveniles (IJs) or dauer juveniles (DJs) persist in soil in search of a suitable insect host [4]. Following entry into the insect host through natural body openings and cuticle, the IJs regurgitate the symbiotic bacteria into the insect hemocoel [5]. The bacteria kill the insect host via septicemia, usually within 24-48 h [5]. The nematodes feed on the multiplying bacteria and disintegrated host tissues and produce 1 to 3 generations within the cadaver. When the food source depletes and nematode density reaches a threshold, next-generation IJs are formed which exit the cadaver in search of a new host [3].
In contrast to the closely related genetic model Caenorhabditis elegans, few genomic resources are available for H. bacteriophora. However, some progress has been made over the past few years with the generation and analysis of 1,000 ESTs from H. bacteriophora GPS11 strain [3,6], the start of an H. bacteriophora complete genome sequence project (supported by the National Human Genome Research Institute), and the development of a reverse genetics tool using RNA interference [7]. Release of the genomes of C. elegans [8], C. briggsae [9], Brugia malayi [10], bacterium Photorhabdus luminescens subsp. laumondii TTO1 (obligate endosymbiont) [11] and over 1 million ESTs from various nematode species deposited in Gen-Bank offers unprecedented opportunities for the genetics of entomopathogenic nematodes. Here, we report on the construction of cDNA libraries from H. bacteriophora TTO1 adult hermaphrodites and the generation and analysis of 31,485 ESTs. The TTO1 strain is different from GPS11 strain in insect toxicity and their symbiotic bacteria (Grewal et al., unpublished). ESTs are valuable for gene discovery and can also be used in the identification of microsatellite markers [12]. Therefore, we also identified microsatellite loci in H. bacteriophora ESTs for use in population genetic studies. In addition, ESTs will also aid the prediction of protein-coding genes in the annotation of complete genomes. However, domain identification, secretome prediction, phylogenetic and evolution analysis on the ESTs that are of short-length and low coverage are not very informative, and therefore were not performed.  (Figure 1). In total, we identified 10,886 distinct EST sequences.

Putative functional identifications of the ESTs
In order to assess the putative identities, all distinct ESTs were subjected to BLASTx sequence similarity searches against GenBank's nr database and Wormpep190 database consisting of extensively curated C. elegans proteins from Wormbase [13]. Of the 10,886 distinct ESTs, 7,828 (71.9%) had significant matches (E value cutoff 1e-5) to proteins in GenBank's nr database. As expected, most of the best matches (95.9%) were to nematode proteins (Figure 2). A small proportion (0.3%) of the best matches was  Table 3). The currently identified H. bacteriophora RNAi genes are a small portion of those identified in C. elegans and B. malayi ( Figure 4). We also identified 22 ESTs corresponding to 14 genes potentially involved in dauer-related processes (Table 4) and 51 ESTs corresponding to 27 genes involved in defense and stress responses (see Additional File 2: H. bacteriophora distinct ESTs similar to C. elegans genes involved in defense and stress responses).

Identification of parasitic nematode-specific ESTs
In order to identify parasitic nematode-specific ESTs, a comparison of H. bacteriophora ESTs to all nematode EST sequences from GenBank (see additional file 3: Nematode species that ESTs used in the analysis came from) was performed. The nematode taxa having ESTs were divided into animal-and human-parasitic nematodes (AHPNs), plantparasitic nematodes (PPNs), and free-living nematodes (FLNs) to enable a more informative comparison using tBLASTx algorithm with an E value cutoff of 1e-5 ( Figure  5A). Of the 10,886 H. bacteriophora ESTs, 2,523 had no matches to nematode ESTs ( Figure 5B) of which 2,371 had no matches to proteins but 152 had matches to proteins of other organisms. Length distribution of H. bacteriophora ESTs before and after assembly Figure 1 Length distribution of H. bacteriophora ESTs before and after assembly.
Among these parasitic nematode-specific ESTs, 142 had matches to proteins of other organisms, enabling putative function assignment. Among these ESTs are those encoding F-box-like/WD-repeat protein, theromacin, Bax inhibitor-1-like protein, and PAZ domain containing protein, which represent interesting targets for in-depth functional analysis. The remaining 412 had no matches to any protein in the current databases, and are thus considered novel sequences. Biological Process GO category, followed by "nematode larval development" (28.3%), "positive regulation of growth rate" (27.6%), "reproduction" (24.7%), and "growth" (23,6%). Biological Process term associations of H. bacteriophora distinct ESTs that may present potential interests included: (i) 164 with "determination of adult life span"; (ii) 22 with "defense response"; (iii) 22 with dauer-related biological processes, including "dauer larval development", "dauer entry", and "dauer exit"; and (iv) 37 with stress-related biological processes. Protein binding (53.8%) was the most dominant term among the 3,190 H. bacteriophora ESTs annotated to the Molecular Function category, followed by identical protein binding (2.6%) and cytochrome-c oxidase activity (2.5%). Among a. "Identities" was presented as "length of aligned H. bacteriophora EST"/"full length of H. bacteriophora EST", percentage of identities over the aligned sequences.

Gene ontology annotation
Categories of organisms with significant protein matches of H. bacteriophora distinct ESTs Figure 2 Categories of organisms with significant protein matches of H. bacteriophora distinct ESTs. The percentage was calculated considering the total number of H. bacteriophora distinct ESTs having significant matches (E value < 1e-5) as 100%. a. "Similarity" was presented as "length of aligned H. bacteriophora EST"/"full length of H. bacteriophora EST", percentage of positives over the aligned sequences.  [11]. Given the fact that poly(A) RNA was used in EST sequencing and the prokaryotic sequences were less than 100% identical to known prokaryotes, the possibility that these sequences are contaminants from other bacteria is low, although the possibility cannot be ruled out completely. The identification of sequences of putative prokaryotic origin in H. bacteriophora ESTs are consistent with our previous observations [6] and those observed in plant parasitic nematodes [14]. The putative prokaryotic origin of these sequences could be tested more rigorously once the complete genome sequence becomes available.     ESTs (Additional file 3). Eighty-six percent of these ESTs matched ESTs from parasitic nematodes in clade V. Taken into consideration the fact that most FLNs included in this study were also from clade V, we are confident that these ESTs reflect the differences between parasitic and free-living nematodes and are not the result of phylogenetic constraint. These 554 H. bacteriophora ESTs had sequence similarities with ESTs from other parasitic nematodes, suggesting that these genes may participate in parasitismrelated activities. Although the 2,523 ESTs without matches to any nematode ESTs could be H. bacteriophora specific, we hesitate to consider them as parasitic nematode specific at this time because they lack sequence similarity to ESTs of parasitic nematodes. Among these, the 2,371 ESTs without matches to any proteins in the current GenBank nr database represent potentially novel H. bacteriophora genes, and 81 of these were identified in the EST dataset of H. bacteriophora GPS11 strain [3,6]. These findings suggest the enormous potential of discovering new genes and gene functions, genetic networks, and metabolic pathways specific to H. bacteriophora and other entomopathogenic nematodes. The identification of H. bacteriophora ESTs shared with other parasitic nematodes through our EST comparison opens the door for conducting in-depth research on gene functions that will ultimately elucidate the parasitic nematode-specific biological processes.

Figure 5 Summary of comparison between H. bacteriophora distinct ESTs and other ESTs from other nematode groups. (A) Summary of EST databases build from publicly-available animal-and human-parasitic nematodes (AHPNs), plantparasitic nematodes (PPNs), and free-living nematodes (FLNs). (B) Results of EST comparison with numbers of matches to the
Among the 554 parasitic nematode-specific ESTs are those encoding F-box-like/WD-repeat protein, theromacin, Bax inhibitor-1-like protein, and PAZ domain containing protein. EST FF678238 encodes a homolog of the F-box-like/ ED-repeat protein in Brugia malayi [10]. The WD-repeat is commonly associated with F-box domain that mediates protein-protein interactions in a variety of contexts such as polyubiquitination [15]. EST FF678397 encodes a protein similar to the PAZ domain containing protein in Brugia malayi [10]. The PAZ (Piwi, Argonaut and Zwille) domain has nucleic acid-binding capability and is potentially involved in post-transcriptional gene silencing [16]. Further investigation is needed to elucidate the functions of these two ESTs with common domains and whether they are related to parasitic nematode-specific processes.
Two other parasitic nematode-specific ESTs are Contig2528 and Contig1066. Contig2528 encodes a homolog of theromacin in the segmented worm Theromyzon tessulatum [17]. Theromacin is a novel antimicrobial peptide acting against Gram-positive bacteria but without any similarities to other known antimicrobial peptides [17]. H. bacteriophora TTO1 is obligately symbiotic with Photorhabdus luminescens subsp. laumondii TTO1 in natural environments. The production of an antimicrobial peptide could help establish the symbiotic relationship by selectively eliminating competing microbes. It is also possible that the antimicrobial peptide is a defense mechanism against potentially harmful microbes in the environment. Contig1066 encodes a protein similar to the uncharacterized protein family UPF0005 containing protein in Brugia malayi (GenBank accession number XP_001896958 http://www.ncbi.nlm.nih.gov/protein/ 170584338) and BAX inhibitor-1-like protein in wasp Nasonia vitripennis (GenBank accession number XP_001605379 http://www.ncbi.nlm.nih.gov/protein/ 156542785). BAX inhibitor-1 is a member of Bcl-2 family that suppresses programmed cell death [18]. Transmembrane BAX inhibitor motif protein (TMBI) homologs have been identified in C. elegans, C. briggsae, C. japonica, C. remanei, and Pristioncus pacifiucs (Wormbase). However, these genes have very low sequence similarities to H. bacteriophora Contig1066. BAX inhibitor-1 is involved in preventing endoplasmic reticulum stress-related programmed cell death in Arabidopsis [19] and humans [20].
GO assignments based on sequence similarity searches aid identification of H. bacteriophora distinct ESTs involved in different biological processes. Here we discuss the genes involved in some biological processes of interest in detail. A number of ESTs related to defense responses and stress responses were identified in these H. bacteriophora distinct ESTs (Additional File 2) based on GO assignments. Among H. bacteriophora distinct ESTs involved in defense response are 3 ESTs encoding a homolog of C. elegans SMEK (Dictyostelium suppressor of MEK null) homolog that is essential for DAF-16-mediated defense response to pathogenic bacteria and increased resistance to oxidative and UV-induced damage [21]. EST ES410098 encodes a heat shock protein HSP16-1 that is induced solely in response to heat shock or other environmental stresses [22]. Another 13 ESTs encoding 5 different proteins whose C. elegans homologs exhibit a "pathogen susceptibility increased" phenotype when silenced by RNAi [23]. However, the molecular functions of these proteins have yet to be elucidated. The defense response transcripts may be involved in the protection of entomopathogenic nematode IJs from bacterial or fungal pathogens and the insect innate immune system.
Five H. bacteriophora distinct ESTs involved in stress response encode a homolog to C. elegans catalase CTL-2 that likely is involved in protecting cells from reactive oxygen species as an antioxidant enzyme [24]. Another EST (ES742296) encodes a protein whose C. elegans homolog showed an "organism stress response abnormal" phenotype when silenced with RNAi [25]. Functions of other ESTs involved in stress response are yet to be clearly characterized. These transcripts related to stress responses provide workable targets for the improvement of ultraviolet, desiccation, and heat stress tolerance, traits desperately sought for improving the biological pest control potential of H. bacteriophora. Once the functions of these genes are determined, they can be potentially used for genetic manipulations of entomopathogenic nematodes. ESTs involved in dauer larval development, dauer entry, and dauer exit were also identified ( Table 4) according to GO assignments. The infective juvenile stage of entomopathogenic nematodes is developmentally similar to the dauer stage in many bacteria feeding nematodes, including C. elegans and C. briggsae. The dauer is a developmentally arrested stage triggered by food deprivation, high population density, and other harsh environmental conditions [26]. Elucidation of this process is of specific interest in the case of entomopathogenic nematodes because the dauer juvenile is the only life stage capable of infecting insects [4].
RNA interference represents a powerful technique for analysis of gene function. An RNAi system relying on soaking in double-stranded RNA solution has been established in H. bacteriophora [7]. Interestingly, we were able to identify only a small number of known RNAi related genes in H. bacteriophora (Table 3) compared to C. elegans and B. malayi [10].
We have identified genes encoding RNAi induced silencing complex (RISC) components. One EST (EX014403) encodes a homolog of C. elegans TSN-1 (71% similarity at the amino acid level) and another EST (Contig211) encodes a homolog of C. elegans VIG-1 (45% similarity at the amino acid level). TSN-1 (Tudor staphylococcal nuclease) containing 5 staphylococcal/micrococcal nuclease domains and a tudor domain is a RISC component in C. elegans, Drosophila and mammals [27]. The purified TSN-1 from C. elegans was shown to have nuclease activity and therefore thought to contribute to RNA degradation in RNAi [27]. The product of the vig-1 gene was also shown to be a component of RISC [28]. We did not identify a member of Argonaute family in this EST set based on sequence similarity. However, we identified an EST (FF678397) encoding a protein similar to a PAZ domain containing protein from Brugia malayi (57% similarity at the amino acid level). Another EST (FF679415) encodes a putative homolog to Drosophila and human Drosha [29] rather than Dicer in C. elegans. These findings suggest that H. bacteriophora may have structurally different RNAi pathway components than its relative, C. elegans.
Other RNAi related genes we were able to identify are those encoding homologs of SMG-2, SMG-5, RDE-4, GFL-1, and ZFP-1. SMG-2 and SMG-5 are involved in nonsense-medicated mRNA decay (NMD) where eukaryotic mRNAs with premature stop codons are selectively and rapidly degraded [30,31]. The other three genes, rde-4 [32], gfl-1 and zfp-1 [33] were shown to be involved in RNAi via RNAi evidence. We currently are not able to identify a gene encoding a SID-1 homolog in H. bacteriophora TTO1 that was shown to be necessary for systemic RNAi [34]. However, a sid-1 gene has been found in H. bacteriophora GPS11 [3]. It is possible that more known genes may be identified when the complete genome of H. bacteriophora TTO1 is sequenced.
This EST project also enabled the development of genetic markers. We have identified 168 microsatellite loci from H. bacteriophora distinct ESTs, of which we were able to design primers for 141 based on the flanking sequences. These microsatellite markers may be useful for genetic mapping, linkage analysis, and population genetic studies. In a separate effort, microsatellite loci with 2-or 3-bp repeat units were selected for microsatellite marker development, along with the microsatellite loci enriched from genomic DNA of H. bacteriophora [35]. Eight polymorphic microsatellite loci were demonstrated within a Northeast Ohio population.

Conclusion
We have generated 31,485 high quality H. bacteriophora ESTs representing 10,886 distinct sequences. Among these, 7,828 (71.9%) ESTs matched to proteins in Gen-Bank's nr database. The vast majority (95.9%) of the best matches was to nematode proteins, a small portion (0.3%) to prokaryotic proteins and the remaining 3.8% to other eukaryotic proteins. GO terms were assigned to 6,685 H. bacteriophora distinct ESTs. "Embryonic development ending in birth or egg hatching" and "protein binding" were the most dominant terms in the categories of Biological Process and Molecular Function, respectively. This EST collection offers unprecedented opportunities for research on this unique nematode-bacterium symbiotic complex. The comparison of ESTs of H. bacteriophora TTO1 with those of AHPNs, FLNs, and PPNs resulted in the identification of 554 parasitic nematode-specific ESTs. These ESTs should be valuable for future research related to insect parasitism by these nematodes. We were able to identify a small number of ESTs involved in RNAi, among which is an EST encoding a Drosha homolog, suggesting structurally different RNAi pathway components from those in C. elegans. In addition, we have identified 157 microsatellite loci which may prove valuable once their polymorphisms are tested and validated. Overall, novel, parasitic nematode-specific, and C. elegans homologous genes have been identified in this EST study, greatly facilitating genome annotation, gene functional analysis, population genetic studies, and microarray development.

RNA isolation, cDNA library construction, and sequencing
Total RNA and poly(A) RNA were isolated from adult hermaphrodites of the isogenic line of Heterorhabditis bacteri-ophora TTO1-M31e strain propagated on a lawn of Photorhabdus luminescens bacterium. Poly(A) RNA was used for cDNA library construction with two different strategies. The first group of libraries were constructed using the CloneMiner™ cDNA Library Construction Kit (Invitrogen) following the manufacturer's instructions. Briefly, 2 μg single-stranded mRNA was converted into double stranded cDNA containing attB sequences on each end. Through site-specific recombination, attB-flanked cDNA was cloned into the attP-containing pDONR222 vector. The second group of libraries were constructed using SMART technology with modifications. Briefly, the double stranded cDNA was synthesized with SMART oligos from poly(A) RNA with SuperScript ® III First-Strand Synthesis System (Invitrogen) and Advantage ® High Fidelity 2 PCR kit (Clontech). The double-stranded cDNA was normalized with duplex-specific nuclease (Evrogen) and then was nebulized, end repaired with End Repair Kit (Lucigen), size separated, and ligated into pSMART hinc II Vector System (Lucigen). The cloning and sequencing of both pDONR222 and pSMART libraries were not directional, leading to the production of ESTs from both 5' and 3' ends. The sequences were generated by ABI 3730 machines from the cDNA libraries using and deposited in GenBank dbEST.

Contig assembly and analysis
EST sequences in FASTA format were downloaded from GenBank dbEST. The sequences were processed by removing vector sequences with Vector NTI Advance™ 10 program. The processed EST sequences were assembled into contigs (contiguous sequences) using the ContigExpress module embedded in Vector NTI Advance™ 10 (Invitrogen). These stringent parameters of assembly (overlap length cutoff of 40 and overlap percent identity cutoff of 95%) were used to assure proper assembly. The distinct EST sequences, including the contig consensus sequences and the singleton sequences, were searched against Gen-Bank's nr database and Wormpep190 databases in a local Linux workstation using the BLASTx algorithm [36]. The E value cutoff of the BLASTx searches was 1e-5.

EST comparison
All nematode EST sequences were downloaded from Gen-Bank dbEST to a local Linux workstation and formatted as a database for tBLASTx searches. Gene index (GI) numbers of all nematode EST sequences were extracted and grouped according to the categories of AHPNs, PPNs, and FLNs. The tBLASTx searches were performed in a local Linux workstation against the complete nematode EST database with the -l option enabled to restrict the database search to the list of GI's of the targeted group [7]. For example, when comparing H. bacteriophora ESTs to ESTs of FLNs, "-l fln.gi" was included in the command with fln.gi containing all GI numbers of EST entries from free-living nematodes. The BLAST outputs were parsed with in-house developed perl scripts to extract match information. ESTs with no significant matches to ESTs of FLNs were extracted and further searched against GenBank nr and Wormpep190 databases using BLASTx algorithm. EST entries with no significant matches to proteins of FLNs were designated parasitic nematode-specific ESTs, which were further characterized. The cutoff value of BLAST searches was 1e-5.

Gene Ontology annotation
For assignment of Gene Ontology terms, the distinct H. bacteriophora ESTs were searched using the BLASTx algorithm against the annotated sequences of FASTA format in the April 2008 release of GO database. The BLAST output was parsed and terms assigned with the assistance of inhouse developed perl scripts accessing the MySQL database of "mygo" in a local Linux workstation. The distribution of GO terms in each of the main ontology categories, Biological Process, Cellular Component, and Molecular Function [37], was examined. The number of H. bacteriophora distinct ESTs assigned in a single GO category was considered as 100% [12,38].

Bioinformatics mining of microsatellite loci
The set of 10,886 H. bacteriophora distinct ESTs were searched for microsatellite loci using msatfinder v. 2.0.9 [39] in a local Linux workstation. The cut-off values of number of repeats were set to 6 for di-nucleotide loci and 5 for tri-, tetra-, penta-, and hexa-nucleotide loci. Primers were designed using Primer3 release 1.0 [40] in a local Linux workstation.