Gene discovery for the carcinogenic human liver fluke, Opisthorchis viverrini

Background: Cholangiocarcinoma (CCA), cancer of the bile ducts, is associated with chronic infection with the liver fluke, Opisthorchis viverrini. Although O. viverrini is the only eukaryote designated as a 'class I carcinogen' by the International Agency for Research on Cancer, little is known about its genome.
Results: Approximately 5,000 randomly selected cDNAs from the adult stage of O. viverrini were characterized and accounted for 1,932 contigs, representing ~14% of the entire transcriptome and, presently, the largest sequence dataset for any species of liver fluke. Twenty percent of contigs were assigned GO classifications. Abundantly represented protein families included those involved in physiological functions that are essential to parasitism, such as anaerobic respiration, reproduction, detoxification, surface maintenance and feeding. GO assignments were well conserved in relation to other parasitic flukes; however, some categories, such as structural and motor proteins, were over-represented in O. viverrini. An assessment of evolutionary relationships showed that O. viverrini was more similar to other parasitic (Clonorchis sinensis and Schistosoma japonicum) than to free-living (Schmidtea mediterranea) flatworms, and 105 sequences had close homologues in both parasitic species but not in S. mediterranea. A total of 164 O. viverrini contigs contained ORFs with signal sequences, many of which were platyhelminth-specific. Examples of convergent evolution between host and parasite secreted/membrane proteins were identified, as were homologues of vaccine antigens from other helminths. Finally, ORFs representing secreted proteins with known roles in tumorigenesis were identified, and these might play roles in the pathogenesis of O. viverrini-induced CCA.
Conclusion: This gene discovery effort for O. viverrini should expedite molecular studies of cholangiocarcinogenesis and accelerate research focused on developing new interventions, drugs and vaccines, to control O. viverrini and related flukes.


Background
Throughout East Asia, there is a strikingly high prevalence of cholangiocarcinoma (CCA; cancer of the bile ducts) in regions where the human liver fluke is endemic. No stronger link occurs between a human malignancy and infection with a eukaryotic parasite than that between CCA and infection with the liver fluke, Opisthorchis viverrini (Digenea) [1]. Indeed, the International Agency for Research on Cancer (IARC) recognizes O. viverrini as a 'category I carcinogen' [2,3]. CCA is highly prevalent in Northeast Thailand, in areas where uncooked cyprinoid fish are a dietary staple. Due to poor sanitation practices and inadequate sewerage infrastructure, O. viverrini-infected people pass the trematode's eggs in their feces into natural bodies of fresh water. Aquatic snails, which represent the first intermediate hosts of O. viverrini, ingest the eggs from which the miracidia undergo asexual reproduction before a population of the free-swimming larval stage, called a cercaria, is shed from the infected snails. The cercaria then locates a cyprinoid fish, encysts in the fins, skin and musculature of the fish, and becomes a metacercaria. The metacercarial stage is infective to humans and other fish-eating mammals. Infection is acquired when people ingest raw or undercooked fish. The young adult worm escapes from the metacercarial cyst in the upper small intestine and then migrates through the ampulla of Vater into the biliary tree, where it develops to sexual maturity over four to six weeks, thus completing the life cycle. The adult worms, which are hermaphrodites, can live for many years in the liver, even decades, shedding as many as 200 eggs per day which pass out via bile into the chyme and feces [4].
In Thailand, ~6 million people are infected with O. viverrini. Despite widespread chemotherapy with the compound, praziquantel, the prevalence of O. viverrini in some endemic areas approaches 70% (reviewed in [1]). Moreover, in Thailand, liver cancer is the most prevalent of the malignant/fatal neoplasias, and the prevalence of CCA in regions in which the parasite is endemic is unprecedented [5].
While sexual reproduction takes place in the mature adults of O. viverrini within the bile ducts, asexual reproduction in the snail leads to a massive increase in the number of infectious cercarial stages exiting and swimming off to locate then infect the fish host. The adult fluke is a diploid organism which reproduces by meiosis; self fertilization of the male and female organs occurs, but it is believed that cross-fertilization between adjacent adult worms is the normal pattern. Although the genome size of O. viverrini has not yet been reported, it is known to have six pairs of chromosomes, i.e. 2n = 12 [6], distinct from the closely related liver fluke, Clonorchis sinensis, which possesses 2n = 56 chromosomes [7].
Despite its public health importance, only a small number of O. viverrini sequences (mostly ribosomal genes) have been available in public databases prior to the present study. Characterization of the genes expressed in this organism should provide a foundation for elucidating the immunopathogenesis of CCA, particularly the molecular mechanisms by which infection with this parasite induces cancer. Indeed, secreted proteins of O. viverrini induce hyper-proliferation of cells (or hyperplasia) in vitro [8], implying that carcinogenesis may not be just a consequence of chronic inflammation, but that the parasite actively secretes gene products which initiate neoplasia.
Here, we undertake gene discovery for O. viverrini after the construction of a cDNA library and characterization of 5,000 expressed sequence tags (ESTs) from this carcinogenic parasite. A similar dataset exists for C. sinensis [9], which, despite its widespread prevalence [10], is not recognized as a 'class I carcinogen' [3]. Therefore, we compared the available transcriptomic dataset from O. viverrini with those from C. sinensis, and from several other flatworms, both free-living and parasitic in humans.

Results and Discussion
Features of the dataset
Of 5,159 randomly selected ESTs, a total of 4,241 yielded acceptably high quality sequences. These in turn were clustered into contigs, establishing a catalogue of 1,932 non-redundant OvAEs. This apparently represents the largest dataset thus far for any of the liver flukes. Table 1 summarizes the key features of the dataset. Of note is that the identities for 1,070 (55%) of these OvAEs could not yet be established, as they did not share sequence homology (BLASTx/tBLASTx) with any other predicted or known molecules in public databases, including dbEST which contains 2,678 ESTs from the related liver fluke, C. sinensis [9]. The average insert size of these novel OvAEs was 550 nt; 47 of these 1,070 OvAEs had insert sizes of less than 150 nt. These ESTs may in fact be O. viverrini-specific or even digenetic fluke-specific genes. A similar situation currently pertains to the human blood fluke where a large percentage of known transcripts, and indeed proteins, are assumed to be Schistosoma- or indeed phylum Platyhelminthes-specific [11,12]. If the O. viverrini genome has 14,000 protein-coding genes (like the blood fluke S. mansoni) [13], and if each of the 1,932 O. viverrini contigs represented a protein coding gene, these newly discovered genes from the adult stage of O. viverrini are predicted to represent ~14% of the entire transcriptome of this liver fluke. EST sequences described herein have been deposited in dbEST under accession numbers EL618683-EL620614.
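The proportions quoted in this section follow directly from the reported counts. The short Python sketch below reproduces that arithmetic; the variable names are illustrative only, and the gene count of 14,000 is the assumption stated in the text (borrowed from S. mansoni), not a measured value for O. viverrini.

```python
# Counts quoted in the text above.
ests_selected = 5159
ests_high_quality = 4241
contigs = 1932
unidentified_contigs = 1070

# Assumption stated in the text: a gene count like that of S. mansoni.
assumed_gene_count = 14000


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


sequencing_yield = percent(ests_high_quality, ests_selected)
novel_fraction = percent(unidentified_contigs, contigs)
coverage = percent(contigs, assumed_gene_count)

print(f"High-quality reads: {sequencing_yield:.1f}%")
print(f"Contigs without database matches: {novel_fraction:.1f}%")
print(f"Estimated transcriptome coverage: {coverage:.1f}%")
```

The coverage figure is an upper bound under these assumptions, because two or more contigs may derive from non-overlapping regions of a single gene.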

Abundantly expressed transcripts
After manual filtering of 136 ribosomal sequences, the 10 most abundantly represented mRNAs encoded proteins with known or unknown functions, including one contig that did not have homologues in any public databases (Table 2). Abundant contigs encoded proteins involved in a range of physiological functions which are considered essential to parasitism, such as anaerobic respiration (myoglobin) [14], reproduction (vitelline precursors and egg shell proteins) and detoxification of xenobiotic compounds (glutathione-S-transferase). Other abundantly expressed OvAEs encoded proteins of likely key relevance to the host-parasite relationship, and included proteases (papain-like and legumain-like enzymes), saposin-like proteins and dynein light chains. Homologues of some of the most abundantly represented OvAEs were also highly represented in C. sinensis ESTs (cysteine proteases, myoglobin, vitelline B precursor), whereas others were uniquely over-expressed in each species. In particular, structural molecules, including tubulin and actin-binding proteins, were among the 10 most abundant clones from C. sinensis [9], but were not highly represented in the dataset from O. viverrini. An in-depth comparison of the O. viverrini and C. sinensis datasets is presented below.

Gene ontology assignments of ESTs from O. viverrini and related flukes
Three hundred and eighty-three (383) of the total 1,932 OvAEs (19.8%) could be assigned GO classifications (Figure 1). The most abundant groups represented under the molecular function category were linked to binding (34.8%), catalytic activity (27.9%) and structural molecule activity (13.9%). Other sequences of interest identified in this category were inferred to relate to caspase activity (0.2%) and transporter activity (5.3%). The most abundant groups represented under the biological process category corresponded to physiological processes (41.2%), cellular processes (39.7%) and unknown biological processes (15.6%).
We then undertook a comparative assessment of GO assignments of sequences from O. viverrini and two other trematode parasites of humans in Asia: the liver fluke, C. sinensis (2,679 contigs) and the blood fluke, S. japonicum (107,427 contigs). In general, the percentages of ESTs allocated to each GO category among these three flukes were similar (Figure 2). However, some categories were over- or under-represented in one species. For example, contigs encoding structural proteins were ~four times more abundantly represented in the two liver flukes than in S. japonicum, whereas contigs encoding motor proteins were ~three times more abundant in O. viverrini than they were in C. sinensis or S. japonicum. Sample sizes were too small to determine whether these differences were statistically significant. Both structural and motor proteins are important components of fluke teguments [15], playing roles in surface maintenance and turnover in schistosomes [16,17] and liver flukes [18]. Therefore, these molecular differences might reflect the specialised niches and physiological requirements of each parasite. From just 1,932 OvAEs, 15 different contigs had ORFs encoding components of the dynein complex of motor proteins, a category of motion-related, and surface and gut-localized EF-hand motif-containing antigens which, at least in schistosomes, represent potent allergens and targets of protective immunological responses [17,19].

Evolutionary relationships between O. viverrini and other platyhelminths
To assess the evolutionary relationships between O. viverrini and other platyhelminths (both parasitic and free-living), we used SimiTri [20] to plot the relative similarities of predicted polypeptide sequences (Figure 3). OvAEs shared most sequence identity with sequences from C. sinensis (2,679 publicly available ESTs), and S. japonicum (107,427 ESTs). OvAEs were less similar to sequences from the free-living turbellarian platyhelminth, Schmidtea mediterranea (171,472 ESTs) [21], which was not altogether surprising given the phylogenetic distance between parasitic and free-living members of the phylum Platyhelminthes [22,23]. The bulk of this phylum (including those species analysed here) represents a monophyletic group based on 18S rDNA sequences [22] and morphological characters [24], and is often referred to as the Rhabditophora [22]. Therefore, the members of the Rhabditophora are considered to be more closely related to each other than to other turbellarian clades, such as the Polycladida [22]. A total of 105 OvAEs had homologues in the ESTs from the two parasitic flukes but not in the free-living Schmidtea ([see additional file 1]; selected examples are shown in the table in Figure 3), suggesting that at least some of these are parasitism-specific genes. Thirty-three (33) of the conserved parasitic fluke genes were novel and did not have homologues of known function. Predicted proteins of known function included homologues of legumain, fatty acid binding proteins, myoglobin and potential anti-inflammatory proteins such as Ly6/UPAR domain-containing proteins. Of the parasitic fluke-specific genes, 38 encoded ORFs with N-terminal signal sequences; 14 of these OvAEs had homologues in just S. japonicum and 24 had homologues in both S. japonicum and C. sinensis.

Secreted and membrane proteins
We conducted an analysis of ORFs containing an N-terminal signal peptide or signal anchor. A total of 164 OvAEs contained ORFs with signal sequences. The dataset was divided into three categories -sequences that were (i) novel; (ii) platyhelminth-specific; (iii) conserved across multiple phyla. Novel sequences constituted 55.4% of the total, but only 5.2% of them encoded proteins with a signal sequence. Conserved sequences constituted 36.3% of the total, and 10.7% of these encoded proteins with signal sequences. Finally, the sequences inferred to be platyhelminth-specific accounted for 8.3% of the total dataset, but 20.6% of these encoded proteins with a signal sequence ( Figure 4). It should be noted, however, that not all of the OvAEs contained full-length nt sequences, and therefore the true percentage of sequences with signal peptides cannot be definitively inferred in the absence of full genome coverage.
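As a consistency check, the category percentages above can be converted back into approximate contig counts. The following Python sketch does so under the assumption that every percentage refers to the full set of 1,932 OvAEs; the dictionary layout and names are illustrative only.

```python
total_contigs = 1932  # OvAEs in the dataset

# category: (share of all contigs in %, share of that category with a signal sequence in %)
categories = {
    "novel": (55.4, 5.2),
    "conserved": (36.3, 10.7),
    "platyhelminth-specific": (8.3, 20.6),
}

signal_counts = {}
for name, (share_of_total, share_with_signal) in categories.items():
    # Approximate number of contigs in this category.
    category_size = total_contigs * share_of_total / 100.0
    # Approximate number of those contigs that encode a signal sequence.
    signal_counts[name] = round(category_size * share_with_signal / 100.0)

total_with_signal = sum(signal_counts.values())
print(signal_counts, total_with_signal)
```

The rounded counts sum to 164, in agreement with the total number of OvAEs reported to contain ORFs with signal sequences.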
Sequences encoding novel secreted and/or membrane proteins (without orthologues/paralogues in other organisms or phyla) may be of particular interest for the development of vaccines and drugs, because the absence of host homologues enhances the prospect for therapeutic margins of safety. OvAEs encoding secreted/membrane proteins involved in many aspects of parasitism were identified (Table 3), and some of these are discussed in the following section. Two of the OvAEs presented in Table 3, which are inferred to encode transforming growth factor β receptor (see section "Host-parasite cross talk") and calumenin, were more similar in sequence to vertebrate proteins from both the non-redundant and dbEST databases than they were to platyhelminth sequences, suggesting that they have evolved independently to bind host ligands. These results are reminiscent of reports of the schistosome transcriptome where, for example, receptors for mammalian hormones, including insulin, fibroblast growth factor and cytokines, have been hypothesized to bind host molecules (reviewed in [25]).
Figure 1. Summary of predicted gene product function and location using gene ontology terms. Gene ontology (GO) terms for annotated Opisthorchis viverrini assembled ESTs were extracted, if present, from the GO database and sorted into the immediate subcategories for molecular function, cellular component and biological process. The GO subcategory and percentage relative to the total number of extracted terms is indicated in the legend. Although cellular and physiological processes, structural proteins and catalytic activity were strongly represented, other categories of interest include the caspases and transporter activity that may represent proteins important for a parasitic lifestyle. The large number of unknowns in each of the three categories highlights the lack of knowledge regarding many of the proteins found in these parasites.

Proteases
As with other parasitic helminth transcriptomes [11-13,25,26], proteins with catalytic activity were abundantly represented in the O. viverrini dataset (27.9% of contigs that were assigned GO molecular functions). Many of these enzymes encoded endo- and exo-proteases belonging to established families (MEROPS classification), but which have not yet been described from liver flukes (Table 3). Of particular interest were members of the S1A serine protease family with sequence similarity to kallikrein and chymotrypsin, and, therefore, potentially involved in feeding or tissue migration [27]. Other proteases included homologues of enzymes that digest hemoglobin in blood-feeding helminths, including cathepsin D-like aspartic and cathepsin B-like cysteine proteases [28-30] as well as an asparaginyl endopeptidase, which is known to activate the gastrodermal cathepsin B enzyme, and probably other gut proteases in S. mansoni [31]. We also identified O. viverrini homologues of the cell death enzyme, caspase-2, and the neutral cysteine protease from the tegument of schistosomes, calpain.

Multiple membrane-spanning proteins
Predicted proteins with multiple membrane-spanning domains were identified. Tetraspanins, an abundantly represented family of four-transmembrane proteins in the tegument of schistosomes [32,33], were identified from O. viverrini (Figure 5). These proteins are thought to stabilize the cell membrane by forming a network of interactions, called the tetraspanin web, with other membrane-bound and -associated proteins, particularly on the surface of cells of the immune system [34]. A homologue of the six-transmembrane domain family of water channel proteins, aquaporin [35], was identified. Seven-transmembrane proteins are common drug targets [36], and at least three distinct members of this family were identified, including receptors for dystroglycan and lamin B, and a protein with homology to the DC-STAMP receptor from the surface of dendritic cells (Table 3).

Host-parasite "cross talk"
Parasitic helminths receive host-derived signals for growth and reproduction via surface receptors for host ligands [37-39]. Convergent evolution of extracellular parasite proteins to promote their interactions with host tissues is well documented [40,41], and we identified O. viverrini ORFs encoding membrane and secreted proteins, some of which were clearly more similar to vertebrate than to invertebrate homologues (Table 3). Transforming growth factor-beta (TGF-β) regulates cell growth and differentiation and is acquired on the cell surface by specific TGF-β receptors [42]. An ORF encoding a member of the TGF-β receptor type Ib family was identified in O. viverrini. The ORF included a 28 amino acid insertion absent from other type I TGF-β receptors, except for TR1 from the hydatid tapeworms of the genus Echinococcus (also members of the phylum Platyhelminthes) [43]. However, these two insertions did not share sequence identity (Figure 6A). Unlike many of the ESTs identified for which the closest homologues were from parasitic trematodes, the O. viverrini TGF-β receptor type I was divergent from SmRK-I of S. mansoni [44] and instead grouped more closely with proteins from Echinococcus multilocularis and from parasitic and free-living nematodes (Figure 6B). In pairwise sequence comparisons, however, the O. viverrini partial ORF was more similar to pig and macaque sequences (44% identity over 181 amino acids) than it was to Echinococcus TR1 (40% over 180 residues) or SmRK-I (40% over 182 residues). SmRK-I is known to bind to human TGF-β [40], suggesting that the O. viverrini receptor might also bind host growth factors for maturation and reproduction.
Another OvAE encoding a protein which is potentially involved in the acquisition of host signals (and subsequent signaling) for growth and development was a fibroblast growth factor (FGF) receptor substrate 2. Parasitic flatworms induce fibrosis (and FGF) [45], and the parasites might acquire and utilize the host FGF that they induce for development and reproduction. Indeed, schistosomes are dependent upon FGF and transferrin for growth and maturation in vitro [46]. Of the sequences presented in Table 3, another OvAE which shared greatest identity with vertebrate homologues encoded calumenin, an EF-hand calcium binding protein localized to the secretory pathway. Calumenin is an inhibitor of the gamma-carboxylation system [47] and is expressed in thrombin-activated thrombocytes. It has a modulating effect on the organization of the actin cytoskeleton and may be involved in the pathophysiology of thrombosis or in wound healing [48]. The predicted calumenin of O. viverrini was most similar to rat and frog orthologues/paralogues, suggesting that it might interact with actin on the surface of host cells which are damaged during parasite feeding and migration.
Figure 3. Evolutionary relationships between Opisthorchis viverrini and related platyhelminths based on similarities of protein coding genes using SimiTri.

Molecules associated with cancer?
O. viverrini is the major cause of CCA in South-East Asia [1]. The molecular mechanisms underlying induction of O. viverrini-induced CCA are thought to be multi-factorial (reviewed in [49]), but recent evidence suggests that O. viverrini secretes mitogenic proteins into host tissues [1,8].
OvAEs encoding secreted proteins with prospective mitogenic activity were identified in the EST dataset. First, progranulin (pgrn) is a pluripotent secreted growth factor that mediates cell cycle progression, cell motility [50] and wound repair [51]. We identified an OvAE (OvAE1732) that shared sequence identity with pgrn (data not shown). Of particular importance is that pgrn has been implicated in regulating the proliferation of tumour cells, and its expression is up-regulated in more aggressive cancers (reviewed in [50]). The kallikrein-like serine proteases are another family of enzymes whose over-expression has been linked to cancer. The expression of some kallikreins in prostate cells leads to changes indicative of an epithelial to mesenchymal transition, an important process in cancer progression [52]. An OvAE (OvAE1918) with sequence identity to kallikrein-like secreted proteases is present in the new O. viverrini gene catalogue. Phospholipase A2 (PLA-2) regulates the provision of arachidonic acid to both cyclooxygenase- and lipoxygenase-derived eicosanoids (reviewed by [53]), and the upregulation of cyclooxygenase-2 is thought to be an important feature of cholangiocarcinogenesis in both humans and experimental rodent models [49,54,55]. We identified an OvAE (OvAE1644) that encodes a secreted PLA-2 which shared greatest sequence identity with PLA-2 from venom of Heloderma (Gila monster) and an EST from C. sinensis (Table 3). Parasites utilize secreted serine proteases [56] and PLA-2s [57] to invade host tissues, and homologues of these proteins (and granulin) are potentially secreted by O. viverrini into host tissues where they might promote cell proliferation, mutagenesis and ultimately carcinogenesis. Ongoing studies in our laboratories are now focused on the physiological roles of these putative carcinogens in the host-parasite relationship and in cholangiocarcinogenesis induced by O. viverrini infection.
Figure 4. Distribution of Opisthorchis viverrini assembled ESTs (OvAEs) that contain predicted signal peptides or signal anchors. OvAEs were sorted according to their BLAST hits (E-value cut-off of 1 × 10^-5) into conserved (those matching entries for species other than platyhelminths), phylum Platyhelminthes-specific (only matching platyhelminth entries) and novel (no significant homology to any database entry). The sequences in each category were then analysed for the presence of a signal sequence using SignalP. The relative percentages of each category are indicated along with the subcategory of signal sequence positive contigs.

Potential vaccines
Digenean flukes develop through a series of morphologically and developmentally discrete stages within their mammalian hosts, and each stage can be expected to display a characteristic transcriptome, confounding efforts to develop new control measures. Adult parasitic flukes are bound by an outer epithelial tegument, a structure that is widely regarded as the most vulnerable target for vaccines and drugs [32]. Homologues and orthologues of vaccine antigens identified in the tegument (and other structures from larval stages) of other flatworms were identified in the O. viverrini dataset (Table 4). Of particular note were the membrane spanning proteins, including an orthologue of the protective tetraspanin from S. mansoni, Sm-TSP-2 [32,33] (Figure 5) and the 22.6 kDa family of antigens from the schistosome tegument [58]. Homologues of gut proteases used by blood-feeding helminths to digest their blood-meal were identified from O. viverrini, including cathepsin D-like aspartic proteases [59,60], 11 distinct papain-like cysteine proteases [61-63] and the neutral protease, calpain, which associates with the inner tegument of schistosomes [64]. Other potential immunogens include lipid-binding proteins which are efficacious vaccines in the rabbit model against the western liver fluke, Fasciola hepatica, including saposin-like proteins [65] and fatty acid-binding proteins [66,67].

Conclusion
This report provides the first description of gene discovery for the liver fluke O. viverrini. Infection with O. viverrini is an important tropical health issue, but even more important and enigmatic is that chronic O. viverrini infection leads to the development of CCA. Indeed, there is no stronger link between a human parasite and cancer than that between O. viverrini and CCA [68]. The new gene catalogue for O. viverrini represents the largest EST dataset in the public domain for any species of liver fluke, and provides a platform for explorations into the molecular basis of host-helminth parasite interactions. We [1] and others [8] are interested in the molecules secreted into host tissues by O. viverrini that induce hyper-proliferation of biliary cells which can subsequently undergo malignant transformation. Given the number of O. viverrini ESTs sequenced herein, it is possible that mRNAs corresponding to these parasite mitogens are already present in the current dataset. Proteomic analysis of proteins secreted by adult O. viverrini maintained in vitro also is underway in our laboratories, and linking peptide sequences to corresponding mRNAs can be expected to be facilitated by this gene discovery program [12]. Finally, this gene discovery information for O. viverrini should expedite molecular studies of cholangiocarcinogenesis and accelerate research focused on developing new interventions, drugs and vaccines, to control O. viverrini and related flukes.

Parasite material
Adult O. viverrini were collected from experimentally infected hamsters (Mesocricetus auratus) maintained at the animal facility of the Khon Kaen University Faculty of Medicine. Protocols approved by the Khon Kaen University Animal Ethics Committee were used for all animal research conducted in this study. Briefly, metacercariae of O. viverrini were collected from naturally infected cyprinoid fish by pepsin digestion. Metacercariae (100 per hamster) were administered intragastrically to hamsters. Hamsters were euthanized 6 weeks after inoculation, and adult worms were flushed with saline from the bile ducts [69]. Worms were washed extensively with sterile phosphate-buffered saline (pH 7.2), after which they were snap frozen and stored in liquid nitrogen or employed immediately as the source of fluke RNA.

Construction and mass excision of cDNA library
Total RNA from adult O. viverrini was extracted using Trizol (Invitrogen), following the manufacturer's instructions. Ten μg of O. viverrini total RNA was used as a template for the synthesis of double-stranded cDNA using the SMART cDNA kit (BD Bioscience), after which the cDNA modified with adapters was cloned into the Sfi I site of the pTriplEx2 plasmid (BD Bioscience) and packaged into λ arms. The titer and percentage of recombinant phages in the library were determined using the protocols

Clone selection, sequencing and annotation
Five thousand clones were randomly selected from the phagemid library and grown overnight in Luria Bertani (LB) broth supplemented with ampicillin to a final concentration of 25 μg/ml. Overnight cultures were shipped at 4°C in LB broth/ampicillin to the University of Melbourne (Department of Veterinary Science). The sequencing was performed by AgGenomics Inc., Australia, using a 3730xl DNA analyzer (Applied Biosystems). The TempliPhi™ DNA Sequencing Template Amplification system (GE Healthcare) was used to sequence each clone using the 5' λ TriplEx2 sequencing primer.

Bioinformatic analyses
Edited sequences were condensed into contigs or singletons using TGICL [70] with the default parameters of 40 bp overlap, a minimum of 95% identity and a 30 bp maximum mismatched overhang. Sequences of less than 100 nt were discarded. Sequences were named using the same convention as that used for the human blood fluke, Schistosoma mansoni [13]; OvAE for O. viverrini Assembled EST. Sequences were compared with those available in the NCBI non-redundant protein and nucleotide databases using BLASTx and BLASTn searches, respectively, in October 2006. The dbEST database was queried using BLASTn and tBLASTx searches. BLAST alignments with an E-value of ≤ 1.0 × 10^-5 were reported. OvAEs were functionally categorized by querying a local copy of the Gene Ontology (GO) database [71] (downloaded November, 2006) with an E-value cutoff of 1.0 × 10^-5. All ESTs from C. sinensis [9] and Schistosoma japonicum [11,12] were downloaded from NCBI [72], and the same methodology was used to derive ontology classifications for the C. sinensis ESTs. ORF predictions were performed using GENSCAN [73] using the HumanIso parameter set. Signal sequence prediction was accomplished using SignalP 3.0 [74], incorporating both hidden Markov models and neural networks. Positive signal sequence predictions from either method and positive signal anchor predictions using Markov models were reported. Predictions of transmembrane domains were conducted using TMPred [75]. All multiple sequence alignments were carried out using ClustalW. Clan and family assignments of proteolytic enzymes were analyzed via the MEROPS protease database [76]. Putative phosphorylation sites were predicted using the NetPhos 2.0 server [77].
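The two filtering thresholds described above (a minimum length of 100 nt and an E-value of ≤ 1.0 × 10^-5) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names and example records are hypothetical, and the parser assumes 12-column tab-separated BLAST output with the E-value in the eleventh column.

```python
import csv
import io

MIN_LENGTH_NT = 100   # sequences shorter than this were discarded
MAX_EVALUE = 1.0e-5   # alignments at or below this E-value were reported


def filter_by_length(sequences, min_length=MIN_LENGTH_NT):
    """Keep name -> sequence entries of at least min_length nucleotides."""
    return {name: seq for name, seq in sequences.items() if len(seq) >= min_length}


def significant_hits(tabular_text, max_evalue=MAX_EVALUE):
    """Yield (query, subject, evalue) for rows passing the E-value cutoff.

    Assumes 12-column tab-separated BLAST output, E-value in column 11.
    """
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue <= max_evalue:
            yield row[0], row[1], evalue


# Hypothetical example data, not taken from the study.
example_sequences = {"contigA": "ACGT" * 30, "contigB": "ACGT" * 10}
example_blast = (
    "contigA\tsubj1\t45.0\t120\t66\t0\t1\t360\t5\t124\t2e-20\t95.1\n"
    "contigA\tsubj2\t30.0\t60\t42\t0\t1\t180\t9\t68\t0.003\t33.5\n"
)

kept = filter_by_length(example_sequences)
hits = list(significant_hits(example_blast))
print(sorted(kept), hits)
```

In this toy input, only the 120 nt contig passes the length filter and only the first alignment passes the E-value cutoff.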

Phylogenetic trees
Multiple sequence alignments were assembled using ClustalW. Only regions which completely overlapped with partial ORFs of O. viverrini ESTs were used for tree construction. Alignments were imported into PAUP version 4.0 beta [78] to construct trees using the neighbour joining and maximum parsimony methods. Robustness was assessed by bootstrap analysis using 100 replicates. Clades with more than 50% support were denoted with bootstrap values on the branches.
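The bootstrap step can be sketched in Python as follows. This is a generic illustration of column resampling and of the 50% support threshold, not the PAUP procedure itself; tree construction is omitted, and the toy alignment and clade tallies are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate).

    alignment maps sequence names to equal-length aligned strings.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def supported_clades(clade_counts, replicates, threshold=50.0):
    """Return clades whose bootstrap support (%) exceeds the threshold."""
    return {
        clade: 100.0 * count / replicates
        for clade, count in clade_counts.items()
        if 100.0 * count / replicates > threshold
    }


# Hypothetical toy alignment and clade tallies, for illustration only.
toy_alignment = {"seq1": "MKTLLV", "seq2": "MKSLLV", "seq3": "MRTLIV"}
rng = random.Random(0)
replicate = bootstrap_alignment(toy_alignment, rng)

# Number of replicate trees (out of 100) in which each clade was recovered.
toy_counts = {("seq1", "seq2"): 87, ("seq2", "seq3"): 13}
labelled = supported_clades(toy_counts, replicates=100)
print(replicate, labelled)
```

In a full analysis, a tree would be built from each of the 100 resampled alignments and the clade tallies would be collected from those trees.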

Cross-taxon similarity analysis
OvAEs were compared with all entries for other organisms in the NCBI dbEST database using tBLASTx. The highest BLAST scores (above a cut-off value of 50) were used to generate SimiTri plots [20] using software developed in-house (J. Mulvenna, unpublished).
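One way to picture such a plot is as a weighted average of three triangle corners, one per comparison database. The Python sketch below assumes this barycentric scheme for illustration; it is neither the SimiTri code nor the in-house software, and the corner layout and function name are hypothetical.

```python
import math

SCORE_CUTOFF = 50.0  # only BLAST scores above this cut-off are used

# Corners of an equilateral triangle, one per comparison database (assumed layout).
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]


def triangle_position(scores, cutoff=SCORE_CUTOFF):
    """Map the three best BLAST scores of a sequence to a point in the triangle.

    Scores not above the cutoff are set to zero. Returns None when no
    database yields a score above the cutoff.
    """
    weights = [s if s > cutoff else 0.0 for s in scores]
    total = sum(weights)
    if total == 0:
        return None
    x = sum(w * cx for w, (cx, _) in zip(weights, CORNERS)) / total
    y = sum(w * cy for w, (_, cy) in zip(weights, CORNERS)) / total
    return x, y


print(triangle_position([100.0, 0.0, 0.0]))      # a hit in the first database only
print(triangle_position([100.0, 100.0, 100.0]))  # equal similarity to all three
print(triangle_position([10.0, 20.0, 30.0]))     # no score above the cut-off
```

Under this scheme, a sequence with a hit in only one database sits on that database's corner, and a sequence that is equally similar to all three sits at the centre of the triangle.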

Authors' contributions
TL, PP and JM generated and analyzed the data and contributed to drafting of the ms. BS provided parasite material, helped conceive the study and edited the drafted ms. MS and MJS provided technical assistance and edited the drafted ms. RBG facilitated interactions with the sequencing unit and edited the drafted ms. PB helped conceive the study, supervised the research and helped draft the ms. AL helped conceive the study, supervised the research, and took the lead on drafting the ms. All authors read and approved the final ms.