An analysis of the transcriptome of Teladorsagia circumcincta: its biological and biotechnological implications

Background: Teladorsagia circumcincta (order Strongylida) is an economically important parasitic nematode of small ruminants (including sheep and goats) in temperate climatic regions of the world. Improved insights into the molecular biology of this parasite could underpin alternative methods required to control this and related parasites, in order to circumvent major problems associated with anthelmintic resistance. The aims of the present study were to define the transcriptome of the adult stage of T. circumcincta and to infer the main pathways linked to molecules known to be expressed in this nematode. Since sheep develop acquired immunity against T. circumcincta, there is some potential for the development of a vaccine against this parasite. Hence, we infer excretory/secretory molecules for T. circumcincta as possible immunogens and vaccine candidates.
Results: A total of 407,357 ESTs were assembled, yielding 39,852 putative gene sequences. Conceptual translation predicted 24,013 proteins, which were then subjected to detailed annotation, including pathway mapping of predicted proteins (including 112 excreted/secreted [ES] and 226 transmembrane peptides); domain analysis and GO annotation were carried out using InterProScan along with BLAST2GO. Further analysis was carried out for secretory signal peptides using SignalP and for the non-classical secretory pathway using SecretomeP. For ES proteins, key pathways, including Fc epsilon RI, T cell receptor and chemokine signalling as well as leukocyte transendothelial migration, were inferred to be linked to immune responses, along with other pathways related to neurodegenerative diseases and infectious diseases, which warrant detailed future studies. KAAS identified new and updated pathways such as phagosome and protein processing in the endoplasmic reticulum. Domain analysis for the assembled dataset revealed families of serine, cysteine and proteinase inhibitors which might represent targets for parasite intervention. InterProScan identified GO terms pertaining to the extracellular region. Some of the important domain families identified included the SCP-like extracellular proteins, which belong to the pathogenesis-related proteins (PRPs) superfamily, along with C-type lectin and saposin-like proteins. The 'extracellular region' term corresponding to the allergen V5/Tpx-1 related domain, considered important in parasite-host interactions, was also identified. Six cysteine motif (SXC1) proteins, transthyretin proteins, C-type lectins and activation-associated secreted proteins (ASPs), which could represent potential candidates for developing novel anthelmintics or vaccines, were among the other important findings. Of these, SXC1, protein kinase domain-containing protein, trypsin family protein, trypsin-like protease family member (TRY-1), putative major allergen and putative lipid binding protein have not been reported in the published T. circumcincta proteomics analysis. Detailed analysis of 6,058 raw EST sequences from dbEST revealed 315 putatively secreted proteins. Amongst them, C-type single domain activation associated secreted protein ASP3 precursor, activation-associated secreted proteins (ASP-like protein), cathepsin B-like cysteine protease, cathepsin L cysteine protease, cysteine protease, TransThyretin-Related and Venom-Allergen-like proteins were the key findings.
Conclusions: We have annotated a large dataset of ESTs from T. circumcincta and undertaken detailed comparative bioinformatics analyses. The results provide a comprehensive insight into the molecular biology of this parasite and disease manifestation, and offer a potential focal point for future research. We identified a number of pathways responsible for immune response. This type of large-scale computational scanning could be coupled with proteomic and metabolomic studies of this parasite, leading to novel therapeutic intervention and disease control strategies. We have also affirmed the use of bioinformatics tools for the study of ESTs, which could now serve as a benchmark for the development of new computational EST analysis pipelines.


Introduction
Parasitic nematodes have a free-living state with their growth and survival controlled by the surrounding environment, especially by factors such as temperature and moisture.
Teladorsagia circumcincta is a key parasite that affects small ruminants in many countries around the world. Its lifecycle is direct and is similar to that of a number of gastrointestinal strongylid nematodes [1]. In brief, eggs released in faeces develop, and first-stage larvae (L1s) hatch, usually within a day. L1s develop through to infective third-stage larvae (L3s) within about a week. L3s on pasture are ingested by the ruminant host, within which they exsheath in the rumenoreticulum and then pass to the abomasum to enter gastric glands and moult to fourth-stage larvae (L4s). After this histotrophic phase, these larvae develop to adult female and male worms which reproduce.
T. circumcincta can be a major cause of economic loss due to poor productivity of ruminants, such as sheep and goats, failure to thrive and deaths, mainly in lambs [2,3]. Together with other trichostrongylid nematodes, this parasite is usually controlled using a combination of anthelmintic treatment and management strategies. The emergence of resistance in trichostrongylids to the three main classes of anthelmintic drugs, including benzimidazoles (white drenches), imidazothiazoles/tetrahydropyrimidines (yellow/pink drenches) and macrocyclic lactones (clear drenches) compromises effective control. Improved insights into the molecular biology of these parasites have the potential to support the development of alternative methods of parasite control, in order to circumvent these resistance problems. Vaccination is considered by some researchers [4] to be a possible alternative approach to anthelmintic treatment, but attempts to develop a practical, commercial vaccine have been unsuccessful to date, likely because of a lack of detailed understanding of the immuno-molecular biology of the parasites, host-parasite interactions and disease. In spite of the economic significance of T. circumcincta, particularly in lambs, our understanding of the spectrum of antigens and immunogens involved in immune responses is still limited [5][6][7]. Nonetheless, there is evidence that excretory/secretory (ES) molecules are intimately involved in inducing and/or modulating the host's immune response [8], and it has been proposed that some of them are immunogens which could serve as potential vaccine targets [9,10].
Antigenic or immunogenic molecules can be studied using a range of immunochemical or proteomic approaches [11], and transcriptomic studies can strengthen such investigations by providing annotated datasets to allow the identification and classification of such key molecules. For instance, a transcriptomic study of T. circumcincta has identified a number of components, including N-type and C-type single domain activation-associated secreted proteins (ASPs) [5]. Preliminary evidence showed that the proteins inferred to represent the secretome in T. circumcincta larvae were associated with specific antibody responses in sheep against this parasite. These proteins might be incorporated into a vaccine for immunizing sheep against teladorsagiosis [12]. Importantly, N-type and C-type single domain ASPs and T. circumcincta apyrase-1 (Tci-APY-1) in excretory/secretory products of L4s of T. circumcincta, identified also in transcriptomic studies [5,13], have been demonstrated to be targets for early, specific IgA responses in infected sheep [5]. In addition, it has been reported that Tci-MIF-1, a macrophage migration inhibitory factor (MIF)-like molecule with tautomerase activity, might influence both host immune responses and nematode physiology [14]. Therefore, a detailed exploration of the transcriptome of T. circumcincta will provide a vital insight into the molecular biology of this parasite and should also provide a basis for studying parasite-host interactions and disease as well as parasite development and reproduction, with a view towards establishing new methods of prevention, treatment or control. Extending previous studies of strongylid nematodes [15][16][17][18], we report the first comprehensive analysis of the transcriptome from the adult stage of T. circumcincta, with an emphasis on characterization of molecules inferred to be ES proteins.

Materials and methods
The ESTs (NCBI EST database accession numbers SRR328404 and SRR328405) were generated by LS454 RNAseq sequencing of the T. circumcincta 2284716780 fragment cDNA library using a 454 GS FLX Titanium instrument. The dataset was assembled and annotated using several tools. Initially, all ESTs were pre-processed using SeqClean [19] and RepeatMasker (Smit AFA & Green P) to remove low-quality regions, followed by assembly and consensus sequence generation using the Contig Assembly Program CAP3 [20]. ESTScan [21] was then used to translate the contiguous sequences (contigs) into peptides, which were characterized for domains/motifs using InterProScan [22]. Gene ontologies were inferred using BLAST2GO (V 2.3.5) [23], from Gene Ontology (MySQL-DB-data release go_200903) and InterProScan. Predicted peptides were also compared, using BLASTP, with data in the nonredundant protein sequence database of the National Center for Biotechnology Information (NCBI). The peptides were mapped to their respective pathways in C. elegans using KOBAS [24] (KEGG [25] Orthology-Based Annotation System, KOBAS-1.1.0). The results were compared with pathway mapping using KAAS [26]. Similarity searches were run against in-house protein databases for 'parasitic nematodes' and 'non-nematodes'. Homologues/orthologues were identified via comparisons against WormBase using BLASTX. In addition, data for C. elegans, including RNA interference (RNAi), gene ontology, pathway and domain analyses, were used for functional annotation.
The program SimiTri [27] was used for the comparison of inferred amino acid sequence data for T. circumcincta with those available for C. elegans, parasitic nematodes and other organisms in public databases. SimiTri provides a two-dimensional display of relative similarity relationships among three different datasets. ES proteins were predicted using SignalP [28] to infer the presence of secretory signal peptides and signal anchors in predicted proteins. SecretomeP [29] was also used to predict proteins involved in a non-classical secretory pathway. Transmembrane proteins were predicted using TMHMM [30], a hidden Markov model-based program. Predicted proteins lacking transmembrane domains were subjected to further annotation using data available in Wormpep [31].
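The way signal-peptide and transmembrane predictions combine into a secretome call can be sketched as simple decision logic. This is a minimal illustration, assuming the tool outputs have already been parsed into per-protein records; the `Prediction` class, its field names, the rule that any predicted transmembrane helix takes precedence over a signal peptide, and the example proteins are all hypothetical and are not part of SignalP or TMHMM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical per-protein summary of already-parsed tool outputs."""
    protein_id: str
    has_signal_peptide: bool  # e.g. taken from a SignalP result
    n_tm_helices: int         # e.g. taken from a TMHMM result


def classify(pred: Prediction) -> str:
    """Label a protein as 'transmembrane', 'ES' or 'other'.

    Assumption: one or more predicted helices marks a protein as
    transmembrane; a signal peptide without helices marks it as an
    excreted/secreted (ES) candidate.
    """
    if pred.n_tm_helices > 0:
        return "transmembrane"
    if pred.has_signal_peptide:
        return "ES"
    return "other"


# Invented example records, for illustration only.
examples = [
    Prediction("pep_001", has_signal_peptide=True, n_tm_helices=0),
    Prediction("pep_002", has_signal_peptide=True, n_tm_helices=3),
    Prediction("pep_003", has_signal_peptide=False, n_tm_helices=0),
]
labels = {p.protein_id: classify(p) for p in examples}
print(labels)
```

Keeping the rule in one small function makes it easy to swap in a different precedence between the two predictors if a stricter or looser secretome definition is wanted.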

cDNA analysis
From a total of 407,357 raw ESTs representing T. circumcincta, we obtained 366,897 high quality ESTs (Table 1), which ranged from 100 to 415 bp in length (mean: 206 bp; standard deviation: 43 bp). After clustering and assembly, the mean length of contigs increased to 360 bp (standard deviation: 173 bp). The G+C content of the coding sequence was 42%, consistent with other strongylid nematodes [15,32]. 6,156 sequences were mapped to 234 KEGG pathways based on the homologues identified in C. elegans. Oxidative phosphorylation (n = 357) and peptidases (n = 277) were the most highly represented according to the number of peptides mapped. Other groups of molecules were mapped to metabolic pathways such as glycine, serine and threonine metabolism (n = 93), insulin signaling pathway (n = 68), signal transduction mechanisms (n = 54), N-glycan biosynthesis (n = 33), galactose metabolism (n = 31), GnRH signaling pathway (n = 13), aminosugars metabolism (n = 11), linoleic acid metabolism (n = 5), immune and complement and coagulation cascades (n = 4). A list of the KEGG pathways and the corresponding rESTs is provided as supplementary information (Additional File 3). The contigs and singletons generated by preprocessing, overall representative ESTs (rESTs), peptides from conceptual translation and putative excretory/secretory (E/S) proteins identified are shown.
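As a quick arithmetic check on the pre-processing step, the proportion of reads retained can be recomputed from the two counts given above. The snippet below is only a worked calculation; the variable names are illustrative.

```python
# Counts taken from the text above.
raw_ests = 407_357
high_quality_ests = 366_897

# Reads discarded during pre-processing and the percentage retained.
removed = raw_ests - high_quality_ests
retained_pct = 100 * high_quality_ests / raw_ests

print(f"{removed} ESTs removed; {retained_pct:.1f}% retained")
```

The result is roughly 90% of reads retained.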

Peptides/Proteins
Of the 39,852 rESTs, 24,013 were inferred to have open reading frames (ORFs). 6,470 sequences mapped to 309 KEGG pathways; the top 30 'highly represented' pathways, categorized by the number of peptides mapped, are presented in Table 2. The main KEGG pathways represented were the peptidases (n = 254) and the ribosomal protein assembly pathway (n = 220). Other pathways highly represented by the peptides include oxidative phosphorylation (n = 187) and chaperones and folding catalysts (n = 144). Peptides were mapped to several other pathways, including purine metabolism and glycolysis/gluconeogenesis. We also compared our results by mapping the sequences using KAAS, where 2,897 sequences were characterized as belonging to 257 pathways; the top 30 'highly represented' pathways, categorized by the number of peptides mapped, are presented in Table 3. The main KAAS pathways represented were Huntington's disease (n = 91) and oxidative phosphorylation (n = 84). Other highly represented pathways include the ribosomal protein assembly pathway (n = 80), ubiquitin mediated proteolysis (n = 33) and glycolysis/gluconeogenesis (n = 29). Peptides were also mapped to several other pathways, including purine metabolism and pyrimidine metabolism, pathways in cancer, cysteine and methionine metabolism, glycolipid metabolism and glutathione metabolism. Among the highly represented pathways, both KOBAS and KAAS identified oxidative phosphorylation, purine metabolism, glycolysis/gluconeogenesis and ribosomal protein assembly pathways.
We identified GO terms using InterProScan for 24,013 proteins, with 3,801 being assigned as involved in biological process (BP), 5,220 as associated with molecular function (MF) and 1,862 as part of the cellular component (CC) (Additional File 4). The analysis revealed that oxidation reduction (GO:0055114) and metabolic process (GO:0008152) were the most common GO categories representing biological processes. The most highly represented GO terms in molecular function were binding (GO:0005488) and oxidoreductase activity (GO:0016491). In cellular component, the most highly represented GO terms were ribosome (GO:0005840) and membrane (GO:0016020). With 138 protein entries, the protein kinase-like domain family of proteins was the most represented, followed by the SCP-like extracellular domain family, with 126 protein entries. Other highly represented domains are the NAD(P)-binding domain, allergen V5/Tpx-1 related domain and transthyretin-like domain (Table 4).
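The agreement between the two pathway mappers can be illustrated with a set intersection over the pathway names mentioned in this section. This is a small sketch using only the pathways named in the text, not the full Table 2 and Table 3 lists; the variable names are illustrative.

```python
# Highly represented pathways named in the text for each mapping tool.
kobas_pathways = {
    "peptidases",
    "ribosomal protein assembly",
    "oxidative phosphorylation",
    "chaperones and folding catalysts",
    "purine metabolism",
    "glycolysis/gluconeogenesis",
}
kaas_pathways = {
    "Huntington's disease",
    "oxidative phosphorylation",
    "ribosomal protein assembly",
    "ubiquitin mediated proteolysis",
    "glycolysis/gluconeogenesis",
    "purine metabolism",
    "pyrimidine metabolism",
    "pathways in cancer",
    "cysteine and methionine metabolism",
    "glycolipid metabolism",
    "glutathione metabolism",
}

# Pathways reported by both tools, and those unique to each.
shared = sorted(kobas_pathways & kaas_pathways)
kobas_only = sorted(kobas_pathways - kaas_pathways)
kaas_only = sorted(kaas_pathways - kobas_pathways)

print("shared:", shared)
```

The shared set recovers the four pathways listed above as common to both mappings.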

Secretome
We inferred 112 excreted/secreted proteins from the present data set of 39,852 rESTs (Additional File 5). Six transthyretin proteins, followed by three saposin-like protein 1 from A. caninum, three SXC1 (Six Cysteine Motif) proteins of O. ostertagi and two C-type single domain activation associated secreted protein ASP3 precursors from O. ostertagi were identified. Two C-type lectin-1 proteins represented in Heligmosomoides polygyrus, an FMRFamide-like prepropeptide from Oesophagostomum dentatum and one each of globin-like protein and putative L3 ES protein of O. ostertagi, the bovine parasite which is closely related to T. circumcincta [33], were also identified. Neuropeptides or neuropeptide precursor molecules were represented among the annotated ES dataset. Upon detailed annotation of the 112 adult secreted proteins, a few novel proteins such as SXC1, protein kinase domain containing protein, trypsin family protein, trypsin-like protease family member (try-1) and putative lipid binding protein were also identified. These novel proteins were not reported in the T. circumcincta proteomics analysis [12,34] (Additional File 6). Subsequent detailed annotation of 226 transmembrane proteins helped in the identification of SXC1 (Six Cysteine Motif) proteins of O. ostertagi, putative L3 ES protein (O. ostertagi) and putative major allergen (Brugia malayi). The details of these proteins are listed in Additional File 7.
We were able to functionally assign GO terms to 112 putative ES proteins, with 50 being assigned as involved in biological process (BP) and 81 as associated with molecular function (MF). The GO annotation summary with biological process, cellular component and molecular function details is provided in Figure 1. Oxidation reduction (GO:0055114) and transmembrane transport with the top 30 'highly represented' pathways, categorized according to the number of putative ES proteins mapped, are presented in Table 5. Protein kinases (n = 3) and oxidative phosphorylation (n = 3) were the main KEGG pathways that mapped to the ES protein sequences. A few other pathways highly represented by the ES proteins include glycerophospholipid metabolism (n = 3), long-term depression (n = 3) and glycolysis/gluconeogenesis (n = 2). Several pathways, including purine metabolism, protein folding and associated processing, MAPK signaling pathway, linoleic acid metabolism, GnRH signaling pathway and glutathione metabolism, were mapped by ES protein sequences. The list of KEGG pathways for ES proteins is available from Additional File 9. Using KAAS, 85 sequences were mapped to 55 KEGG pathways; the top 30 'highly represented' pathways, categorized by the number of peptides mapped, are presented in Table 6. Glycerophospholipid metabolism (n = 3) and oxidative phosphorylation (n = 3) were the main KEGG pathways that mapped to the sequences. A few other pathways highly represented by ES proteins included long-term depression (n = 3) and Wnt signaling pathway (n = 2). ES proteins were mapped to several pathways such as MAPK signaling pathway, linoleic acid metabolism, GnRH signaling pathway, glutathione metabolism and TGF-β signaling pathway. The KEGG pathways with the corresponding ES proteins are provided in Additional File 10. Table 7 gives the top 20 representative protein families, with metridin-like ShK toxin as the most highly represented family of proteins, comprising 14 ES protein entries, followed by the transthyretin-like family of proteins, comprising 11 ES protein entries. C-type lectin, saposin-like domain and SCP-like extracellular domain superfamily of the pathogenesis-related proteins (PRPs) [35,36] were a few other well-represented domain families in the present datasets. SecretomeP identified 615 sequences as non-classical secreted proteins at a cut-off value of 0.9. The detailed annotation of the 615 secreted proteins revealed 62 KEGG pathways mapped by 105 sequences (Additional File 11), with the top highly represented pathways presented in Table 8.
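The score cut-off used for non-classical secretion can be sketched as a simple filter. This is a minimal illustration, assuming SecretomeP scores have already been parsed into a dictionary; the function name, the inclusive comparison (`>=`), the exclusion of proteins that already carry a predicted signal peptide and the example scores are assumptions made for illustration, not details stated in the text.

```python
CUTOFF = 0.9  # cut-off value stated in the text


def non_classical_candidates(scores, signal_peptide_ids, cutoff=CUTOFF):
    """Return sorted IDs of proteins scoring at or above the cut-off.

    Assumptions: the comparison is inclusive, and proteins with a
    predicted signal peptide are excluded because they would already
    count as classically secreted.
    """
    return sorted(
        pid
        for pid, score in scores.items()
        if score >= cutoff and pid not in signal_peptide_ids
    )


# Invented example scores, for illustration only.
scores = {"pep_A": 0.95, "pep_B": 0.90, "pep_C": 0.42, "pep_D": 0.97}
signal_peptide_ids = {"pep_D"}

candidates = non_classical_candidates(scores, signal_peptide_ids)
print(candidates)
```

Exposing the cut-off as a parameter makes it straightforward to check how sensitive the candidate count is to the chosen threshold.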

Discussion
In the absence of a genomic sequence for T. circumcincta, 407,357 raw EST sequences were analysed to obtain quality ESTs with a sequencing success of 90.06%, which is consistent with previous studies [15,34,37]. To infer the proteome for T. circumcincta, all rESTs were then subjected to analyses against three databases containing protein sequences. Data were compared with protein sequences available for (i) C. elegans (from WORMPEP v.182, WormBase (http://wormbase.org/)), (ii) parasitic nematodes (available protein sequences and peptides from conceptually translated ESTs) and (iii) organisms other than nematodes (from the NCBI nonredundant protein database) [38]. A three-way comparison of T. circumcincta rESTs with homologues from C. elegans, WORMPEP and parasitic nematodes is presented graphically (Figure 2) using SimiTri.
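The idea behind a three-way similarity plot can be sketched with barycentric coordinates: each sequence is placed inside a triangle, pulled towards the corner of the database to which it is most similar. The code below is an illustrative sketch only and is not SimiTri's actual implementation; the vertex layout, the use of raw similarity scores as weights and the function name `triangle_position` are assumptions.

```python
import math

# Corners of a unit-side equilateral triangle, one per reference database.
VERTICES = {
    "C. elegans": (0.0, 0.0),
    "parasitic nematodes": (1.0, 0.0),
    "non-nematodes": (0.5, math.sqrt(3) / 2),
}


def triangle_position(scores):
    """Map three non-negative similarity scores to an (x, y) point.

    Each score acts as a weight on its database's corner, so a sequence
    equally similar to all three databases lands at the centroid.
    """
    total = sum(scores[name] for name in VERTICES)
    if total <= 0:
        raise ValueError("at least one similarity score must be positive")
    x = sum(scores[name] * VERTICES[name][0] for name in VERTICES) / total
    y = sum(scores[name] * VERTICES[name][1] for name in VERTICES) / total
    return x, y


# Hypothetical scores: equal similarity to all three databases.
centre = triangle_position(
    {"C. elegans": 50.0, "parasitic nematodes": 50.0, "non-nematodes": 50.0}
)
# Hypothetical scores: a hit in the parasitic nematode database only.
corner = triangle_position(
    {"C. elegans": 0.0, "parasitic nematodes": 80.0, "non-nematodes": 0.0}
)
print(centre, corner)
```

Points near a corner then indicate sequences with hits mainly in one database, while points near the centre indicate broadly conserved sequences.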
Some of the proteins predicted to be parasite-or nematode-specific were identified by similarity searches of rESTs and these proteins in parasitic nematodes were either absent from or very different from the corresponding molecules in their host(s).
Comparative analysis was carried out to identify homologues in C. elegans, the best characterized nematode in relation to its genome, genetics, biology, physiology and biochemistry as well as the localization and functions of molecules (WormBase [39]). This study showed that 7,537 of them were mapped to key biological pathways including oxidative phosphorylation, peptidases and the ribosomal protein assembly pathway. Oxidative phosphorylation relates to genes that encode NADH dehydrogenases, succinate dehydrogenases, cytochrome c oxidases, cytochrome c reductases, ATPases and ATP synthases (complexes I-V) [40]. Several peptidases are known to play a vital role in the moulting process [41]; these include metallo-peptidases that might be candidates for chemotherapeutic interventions [42][43][44][45]. The ribosomal protein assembly pathway is composed of genes that encode various proteins of the ribosomal subunits. These proteins are closely related functionally and need to interact with each other physically to form a large protein complex known as the ribosome [40]. Other pathways represented include the carbon fixation pathways. Several enzymes in nematodes map to KEGG carbon fixation pathways [http://www.genome.jp/kegg-bin/show_pathway?category=Nematodes&mapno=00720], which refer to normal energy pathways such as glycolysis, gluconeogenesis (which is actually carbon fixing) and the tricarboxylic acid cycle. The pathways identified using KOBAS, such as the TGF-β signaling pathway and insulin signaling pathway, trigger an 'alternative' developmental pathway and regulate the response of C. elegans to environmental stress in the first larval stage of its life cycle [46,47]. The disruption of both insulin-like and DAF-7 transforming growth factor (TGF)-β signalling pathways causes developmental arrest [48,49].
Abundant levels of transcription of GTP-CH transcripts in some parasitic species could be associated with production of serotonin to regulate these processes, in a way that is similar to that of C. elegans, if a TGF-β pathway does indeed regulate developmental events in parasitic nematodes [34]. These areas are of great interest and deserve detailed investigation, particularly given that molecules representing the TGF-β pathway have been described for a number of parasitic nematodes such as B. pahangi, B. malayi and P. trichosuri [50][51][52].
Proteins expected to play critical roles in host-parasite interactions including immune responses are predicted to be involved in antigen processing and presentation or complement and coagulation cascades.
Nematode enzymes mapped to known human disease pathways such as Huntington's disease, Alzheimer's disease, Parkinson's disease and Vibrio cholerae infection. The neurological disorder pathways are known to describe the morbidity and depression associated with helminthic infections. The Vibrio cholerae infection pathway supports this parasite being similar to gastrointestinal strongylid nematodes.
Clearly, much more work is required to establish the functional roles of such proteins in the parasite and/or the host and also to identify essential proteins required in each pathway, even though they are not well represented. Some of the proteins are inferred to be excreted/secreted from the nematode. These include serine proteinase inhibitors and cathepsin B-like cysteine proteases, which are proposed to interfere with the immune system at the antigen processing and presentation stages, thereby interrupting the cytokine network and down-regulating inflammation [53]. Families of proteins considered as important targets for parasite intervention and control were also identified, represented by serine, cysteine as well as proteinase inhibitors, which are also supported by domain analysis [54][55][56]. The proteinase inhibitors might protect the parasite against digestion by endogenous or host-derived proteinases [53].
Of the 39,852 rESTs, 24,013 were inferred to have open reading frames (ORFs). The most represented domain families of proteins were the protein kinase-like and the SCP-like extracellular domains, followed by the NAD(P)-binding domain, allergen V5/Tpx-1 related domain and transthyretin-like domain. Analysis of several proteins and protein domains present in C. elegans [57] revealed that protein kinases comprise the second largest family of protein domains in worms. Protein kinases are required for the existence of multicellular organisms and are likely to be involved in complex signal transduction pathways including cell-substratum and cell-cell adhesion, transmembrane signaling in response to humoral factors and cell survival or programmed cell death. Other protein kinases provide signals that regulate metazoan-specific transcription factors, particularly those containing Zn-finger domains [58]. SCP/TAPS family members belong to the cysteine-rich secretory protein (CRISP) superfamily and have been identified in various eukaryotes. They also seem to have some biological roles linked with the member proteins within this superfamily [59].
The sperm-coating protein (SCP)-like extracellular proteins, also called SCP/Tpx-1/Ag5/PR-1/Sc7, play major biological roles in the host-pathogen interplay [60] along with other groups of proteins [61]. NADP+ plays a vital role in developmental processes and also acts as a reducing agent in anabolism along with NAD+, a coenzyme involved in key pathways like glucose metabolism and fatty acid synthesis [62]. In Strongyloidae, the allergen V5/Tpx-1 related domain is considered one of the most abundant InterPro domains that may be important in parasitism [32]. It represents various members such as the ancylostoma-secreted or activation-associated proteins (ASPs) that belong to the pathogenesis-related protein (PRP) superfamily [35]. The transthyretin-like domain, an abundant nematode-specific motif [63], was recently identified as being abundantly transcribed in the transcriptome of B. malayi [64]. Lectins are carbohydrate binding proteins and the CLec fold constitutes a general ligand (including protein)-binding motif [65].
C-type lectins are involved in vertebrate immune cell signalling and trafficking, activation of innate immunity in both vertebrates and invertebrates, and venom-induced haemostasis [66]. Metridin-like ShK toxin domains are highly represented in the Strongylida [32]. Though the specific function of these proteins is not known, they are assumed to be involved in defense or digestion [67]. WD40 repeats (also known as WD or beta-transducin repeats) are involved in signal transduction and transcription regulation along with cell-cycle control and apoptosis [68,69].
Heat shock proteins, such as HSP-20 are reported to be present in the parasitic nematode, H. contortus (barber's pole worm) which afflicts small ruminant species and in the adult stage of A. caninum and other nematodes including the bovine lungworm Dictyocaulus viviparus and the common roundworm of canids Toxocara canis. The expression of this molecule was shown not to be controlled by heat shock treatment [70].
'EF-hand' domains are involved in protein-protein interactions regulated by various specialized systems (e.g., Golgi system, voltage dependent calcium channels and calcium transporters) [71]. The maturation of the nervous system and the formation of ciliated sensory neurons require both EF-hand and WD40 proteins in C. elegans [72,73]. Major sperm proteins (MSPs), a large protein family, are known to be largely involved in nematode sperm motility [74,75]. MSPs (expressed in recombinant form) have been proposed as vaccine candidates [76]. The entire list of domains and their details are given in Additional File 12. The protein sequences were assigned functionality based on BLASTP against the NR database (Additional File 13). Different classes of proteases are assigned based on the catalytic mechanisms and are named based on their active catalytic centre residues (aspartic, serine and cysteine proteases) or after their dependence on co-factors for activity (metalloproteases). Of the four classes of proteases aspartic proteases are considered to be the most conserved group.
Cysteine proteases are most likely involved in tissue penetration and feeding [77]. Cysteine, aspartic and metallo-proteases represented in N. americanus are known to function in a multi-enzyme cascade to digest haemoglobin and other serum proteins [78,79]. SCP (sperm coating protein)-1 superfamily members include insect venom allergens, plant pathogenesis family-1 (PR-1) proteins and VAL proteins besides mammalian cysteine-rich sperm proteins (CRISPs). No rational function for this protein family has been demonstrated despite the sequence similarity [8]. Astacin-like metalloproteases are vital for establishment of the parasite in the host. MTP-1 and the astacin-like MTP secreted by infective larvae of hookworms are primarily reported in A. caninum [80][81][82]. The enzyme guanosine-5'-triphosphate (GTP)-cyclohydrolase may be involved in larval development [35]. In parasitic nematodes, astacin-like molecules are considered to be involved with moulting, tissue penetration and immunomodulation besides feeding [34,80]. They are also anticipated to be vaccine candidates against parasitic nematodes [82,83].
Pathway analysis using KOBAS [24] mapped a total of 6,470 sequences to 309 KEGG pathways. The results were compared by mapping the sequences using KAAS [26], where a total of 2,897 sequences were mapped to 257 KEGG pathways. Insight from mapping sequences to biological pathways will help in identifying vital proteins required in each pathway.
Functionally varied classes of molecules such as digestive enzymes, extracellular proteinases, chemokines, morphogens, cytokines, toxins, hormones, antibodies and antimicrobial peptides make up the secretome, the entire set of secreted proteins, which represents up to 30% of the proteome of an organism [84]. SXC1 (Six Cysteine Motif) proteins of O. ostertagi, transthyretin proteins, saposin-like protein 1, C-type lectin-1, globin-like protein, Na-ASP-2, a PR-1 protein from N. americanus, ASP-3 from O. ostertagi, neuropeptides and cytochrome P450s were also identified from the 112 excreted/secreted proteins inferred from the data set of 39,852 rESTs.
The SXC domain, also termed nematode-six cysteine, NC6 [85], was identified in surface coat proteins of the parasitic ascarid T. canis [86,87] along with zinc metalloproteases and tyrosinases of C. elegans. SXC domains have also been identified in other helminths such as Ascaris, Brugia, Trichuris muris and Necator [88]. The function of the motif is not known but it is suggested that it is involved in protein-protein interactions, particularly those associated with nematode surfaces [89] or that it acts as a signalling ligand [90]. In general, SXC motif containing proteins have a putative secretory signal peptide and are therefore extracellular. The transthyretin-like (TTL) gene family, also known as 'family 2' [91], has been classified as nematode-specific based on the genome-wide study of C. elegans. These are the largest conserved nematode-specific gene families, coding for a group of proteins with significant sequence similarity to transthyretins (TTR) and transthyretin-related proteins (TRP) [92]. Transthyretin-like protein families are potential vaccine candidates against human filariasis [93].
As part of transcriptomic analysis of some members of the phylum Nematoda more than 4,000 nematode-specific protein families encoded by nematode-restricted genes were defined with TTL family representing one of the largest [32]. TTL protein domain was represented 185 times in all nematodes studied. This included 18 ttl genes in O. ostertagi as a result of protein domain search using the NEMBASE database [92]. The TTL family shows characteristics comparable with those of neuropeptides, i.e., a large protein family with secretion signals and different expression patterns between the members of the family and are likely to play a role in the nervous system of the nematodes [94]. SAPLIPs (saposin-like proteins) are a diverse family of lipid interacting proteins [95] that have six conserved cysteine residues forming three disulfide bridges [95][96][97][98]. The majority of Ac-slp-1 is expressed in the L3 and adult worm, although it is detected in RNA from all developmental stages of A. caninum.
The Ac-slp-1 and Ac-slp-2 mRNAs are expressed in the intestines of multiple developmental stages of A. caninum, suggesting multiple functions in parasite biology; both Ac-SLP-1 and Ac-SLP-2 are localized to the intestine and could play a role in parasite feeding. The SLP-1 protein could also interact with host cells [99]. Worm carbohydrates may be masked from host immune cells by parasite C-type lectins (C-TLs). Nematode C-TLs may also have roles unconnected with immune evasion [8]. Antigen uptake and presentation, cell adhesion, apoptosis and T cell polarization are among the immune processes in which C-type lectins and galectins are involved [66]. C-TLs are perhaps most prominent in the mammalian immune system. Heligmosomoides polygyrus, a natural parasite of mice, is among the most widely studied parasitic nematodes. Immunological interactions with the host are presumed to be mediated by the new C-type lectins from these rodent parasites, which are preferentially expressed by the mature adult stages [100].
Craig et al. [101] identified a homologue of a globin-like ES protein from O. ostertagi in L4 and adult T. circumcincta proteins. Analysis of adult ES proteins of O. ostertagi identified a homologue of an ASP and a vitellogenin [92], which were not identified among T. circumcincta ES proteins [101]. However, we identified a globin-like protein as well as homologues of Na-ASP-2 (a PR-1 protein from Necator americanus) [102] and ASP-3 from O. ostertagi [103]. ASPs are members of a group of nematode-specific molecules [5]. Proteins in this family have been identified in a wide range of organisms [35], including human hookworm [104], filarial nematodes [105,106], trichostrongylids such as H. contortus [107,108], schistosomes [59,109,110] as well as free-living C. elegans [111]. It has been suggested that ASPs are key to the transition of nematodes from the free-living to the parasitic state [112]. It has also been suggested that they exhibit homology to a diverse, yet evolutionarily related, group of secreted proteins classified as the SCP/Tpx-1/Ag5/PR-1/Sc7 family [5].
Na-ASP-2 has recently been shown to induce neutrophil chemotaxis in vitro and in vivo [113], but it remains uncertain whether this is a widespread property of VAL homologues [8]. The role of nematode ASPs as valid vaccine candidates has also been investigated [114]. ASPs have been suggested to act as allergens [34]. They also have a role in modulation of the host immune response [115], in maintenance of the parasites in their host niche [116,117] and in maintenance of and/or exit from arrested development [118]. ASPs are highly represented in EST datasets derived from parasitic stages of T. circumcincta and are abundant in the L4 ES proteins of this nematode [34]. Neuropeptide-like proteins have been shown to be present in O. ostertagi [119]. These intercellular signaling molecules, particularly the FMRFamide-related peptides (FaRPs), have been most widely studied in Ascaris suum, where they are present throughout the nervous system [34]. Cysteine-rich proteins were highly represented in the T. circumcincta L4-specific dataset and were suggested to have a role in establishment and immune evasion [113].
Members of the astacin family have a wide range of functions [120], including immunomodulation [121], growth-factor processing, pattern formation in embryos [122], digestion, tissue penetration [80,123] and hatching [124]. Nematode AST-like metalloproteinases play a role in stimulating innate and adaptive immune responses early in infection [83]. Cytochrome P450s, which are candidate drug-resistance genes, were also identified. These could affect the expression of the functional group 'xenobiotic degradation and metabolism' [6]. We have attempted to integrate the transcriptomic data with the proteomic analyses from previous reports to understand the role of ES proteins in host-parasite interactions (Additional File 6). The Kyoto Encyclopedia of Genes and Genomes (KEGG) database was searched with KOBAS and KAAS to categorize function by assigning secreted protein sequences to biological pathways. The Fc epsilon RI signaling pathway, T cell receptor signaling pathway, leukocyte transendothelial migration and chemokine signaling pathway are the immune system-related pathways identified, and they could be critical to understanding immune responses.
We were also able to identify pathways related to neurodegenerative diseases and infectious diseases. Figure 3 shows the pathways represented using the iPath tool [125]. Identifying the role of such proteins as potential players in these pathways will help in our understanding of nematode biology in the context of parasite-host interplay. However, they are thought to be involved in immune responses in either the host or the parasite, which can be the focus of future studies. Of the molecules mapped to pathways using KAAS, the protein families comprising serine, cysteine and metallo-proteinases and proteinase inhibitors in the EST datasets could form the basis of in vitro and in vivo studies. Proteinase inhibitors might protect the parasite against digestive degradation by blocking endogenous proteinases within the host. These enzymes may facilitate tissue migration and other interactions with host cells by mediating or altering proteolytic functions [53]. Several studies have considered these enzymes as important therapeutic targets for parasite control [54-56,93]. Results from the pathway analysis carried out using KOBAS were compared with the results obtained using KAAS. Identifying a domain, motif or region in a protein sequence that is characteristic of a particular protein family helps annotation by allowing protein function to be assigned. We also searched the InterPro member databases [126] using InterProScan. Among the InterPro domains identified, the Metridin-like ShK and transthyretin-like domains were among the most represented, followed by C-type lectin, saposin-like and SCP-like extracellular domains. The Metridin-like ShK domain has already been shown to be highly represented in Strongylida and is often present in metallopeptidases [127,128]. The results showed that the most common molecules associated with the extracellular region correspond to allergen V5/Tpx-1 related proteins.
Additional File 14 contains the domain details of ES proteins. Overall, KOBAS and KAAS provided similar results.
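Agreement between two pathway annotation tools can be summarised with a simple set comparison. The following Python sketch is illustrative only: the function name `compare_pathway_sets` and the placeholder pathway labels are assumptions and do not reproduce the actual KOBAS or KAAS output of this study. It reports the pathways assigned by both tools, those unique to each, and the Jaccard index of the two sets.

```python
def compare_pathway_sets(tool_a, tool_b):
    """Compare two collections of pathway names assigned by two tools.

    Returns the shared pathways, the pathways unique to each tool and
    the Jaccard index (size of intersection / size of union).
    """
    a, b = set(tool_a), set(tool_b)
    union = a | b
    shared = a & b
    # Two empty sets are treated as identical (Jaccard index of 1.0).
    jaccard = len(shared) / len(union) if union else 1.0
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "jaccard": jaccard,
    }


if __name__ == "__main__":
    # Hypothetical placeholder labels, NOT results from the study.
    kobas_like = {"pathway_1", "pathway_2", "pathway_3"}
    kaas_like = {"pathway_2", "pathway_3", "pathway_4"}
    summary = compare_pathway_sets(kobas_like, kaas_like)
    for key, value in summary.items():
        print(f"{key}: {value}")
```

A Jaccard index close to 1 would indicate that the two tools assign largely the same pathways, whereas the `only_a` and `only_b` lists highlight tool-specific assignments that may merit manual inspection.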
RNAi phenotypes of homologues were identified by comparing the 112 predicted ES proteins with proteins of the free-living nematode C. elegans, and the associated RNAi phenotypes were examined to understand the function(s) and importance of homologous genes in other nematodes (of animals).
From these, 133 C. elegans homologues with RNAi phenotypes were retrieved (Additional File 15): Emb (embryonic lethal, including pleiotropic defects severe early emb), Lva (larval arrest), Gro (slow growth), Stp (sterile progeny), Lvl (larval lethal) and Ste (maternal sterile). In the current dataset, we selected RNAi phenotypes associated with genes essential for nematode survival or growth, as well as those representing potential drug and/or vaccine targets [129,130]. Lethality can be considered the most attractive RNAi phenotype because it results from interference with a vital process and applies to all developmental stages, including those that are less susceptible to available drugs. Other attractive phenotypes include sterility, which would lead to death. RNAi phenotypes also help to address concerns regarding genetic redundancy [131].
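The phenotype-based selection described above can be sketched as a simple filter. The Python example below is a minimal illustration, not the pipeline used in this study: the function name `select_by_phenotype`, the record layout (gene identifier mapped to a list of phenotype codes) and the gene identifiers are hypothetical. Only the six phenotype codes and their descriptions are taken from the text.

```python
# Phenotype codes and descriptions as listed in the text.
PHENOTYPES = {
    "Emb": "embryonic lethal",
    "Lva": "larval arrest",
    "Gro": "slow growth",
    "Stp": "sterile progeny",
    "Lvl": "larval lethal",
    "Ste": "maternal sterile",
}


def select_by_phenotype(records, wanted=None):
    """Keep homologues showing at least one phenotype code in `wanted`.

    `records` maps a gene identifier to an iterable of phenotype codes.
    `wanted` defaults to all six codes in PHENOTYPES.
    Returns a dict mapping each retained gene to its sorted matching codes.
    """
    wanted = set(PHENOTYPES) if wanted is None else set(wanted)
    selected = {}
    for gene, codes in records.items():
        hits = sorted(set(codes) & wanted)
        if hits:
            selected[gene] = hits
    return selected


if __name__ == "__main__":
    # Hypothetical toy records, NOT data from Additional File 15.
    # "Unc" (uncoordinated) stands in for a phenotype outside the list.
    toy = {
        "gene_1": ["Emb", "Unc"],
        "gene_2": ["Unc"],
        "gene_3": ["Ste", "Gro"],
    }
    print(select_by_phenotype(toy))
    # Restrict the selection to the lethal phenotypes only.
    print(select_by_phenotype(toy, wanted={"Emb", "Lvl"}))
```

Passing a narrower `wanted` set, such as the lethal codes alone, reflects the prioritisation of lethality described in the text, while the default retains all survival-, growth- and fertility-related phenotypes.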