Analysis of expressed sequence tags and identification of genes encoding cell-wall-degrading enzymes from the fungivorous nematode Aphelenchus avenae

Background The fungivorous nematode Aphelenchus avenae is widespread in soil and is found in association with decaying plant material. This nematode is also found in association with plants but its ability to cause plant disease remains largely undetermined. The taxonomic position and intermediate lifestyle of A. avenae make it an important model for studying the evolution of plant parasitism within the Nematoda. In addition, the exceptional capacity of this nematode to survive desiccation makes it an important system for the study of anhydrobiosis. Expressed sequence tag (EST) analysis may therefore be useful in providing an initial insight into the poorly understood genetic background of A. avenae. Results We present the generation, analysis and annotation of over 5,000 ESTs from a mixed-stage A. avenae cDNA library. Clustering of 5,076 high-quality ESTs resulted in a set of 2,700 non-redundant sequences comprising 695 contigs and 2,005 singletons. Comparative analyses indicated that 1,567 (58.0%) of the cluster sequences had homologues in Caenorhabditis elegans, 1,750 (64.8%) in other nematodes, 1,321 (48.9%) in organisms other than nematodes, and 862 (31.9%) had no significant match to any sequence in current protein or nucleotide databases. In addition, 1,100 (40.7%) of the sequences were functionally classified using the Gene Ontology (GO) hierarchy. Similarity searches of the cluster sequences identified a set of genes with significant homology to genes encoding enzymes that degrade plant or fungal cell walls. The full-length sequences of two genes encoding glycosyl hydrolase family 5 (GHF5) cellulases and two pectate lyase genes encoding polysaccharide lyase family 3 (PL3) proteins were identified and characterized. Conclusion We have described at least 2,214 putative genes from A. avenae and identified a set of genes encoding a range of cell-wall-degrading enzymes.
This EST dataset represents a starting point for studies in a number of different fundamental and applied areas. The presence of genes encoding a battery of cell-wall-degrading enzymes in A. avenae and their similarities with genes from other plant parasitic nematodes suggest that this nematode can act not only as a fungal feeder but also as a plant parasite. Further studies on genes encoding cell-wall-degrading enzymes in A. avenae will accelerate our understanding of the complex evolutionary histories of plant parasitism and the use of genes obtained by horizontal gene transfer from prokaryotes.


Background
The complete genome sequence of the free-living nematode Caenorhabditis elegans and the wealth of information on gene expression and function for this nematode [1,2] provide an excellent starting point for genome analysis of other nematodes. For less well studied organisms, where whole genome sequencing is currently unlikely, Expressed Sequence Tag (EST) analysis is a cost-effective method for gene discovery. EST analysis has been widely used within the Phylum Nematoda. However, most effort has been focused on plant or animal parasitic nematodes. Free-living nematodes, with the notable exceptions of C. elegans, C. briggsae and Pristionchus pacificus, remain under-represented in terms of ESTs.
Aphelenchus avenae is a well-known fungal feeding nematode that is currently placed in the superfamily Aphelenchoidea (family Aphelenchidae) [3]. This nematode is ubiquitous in soil and is associated with saprophytic, pathogenic, and mycorrhizal fungi. As a fungal feeder, A. avenae has potential as a bio-control agent against soilborne fungal plant pathogens [4][5][6][7][8] and, because of its remarkable ability to survive desiccation, it is also used as a model system for studying anhydrobiosis in animals [9].
Although A. avenae is commonly found in soil samples taken from the rhizospheres of diseased and healthy plants, it is widely considered to be incapable of attacking healthy tissues of higher plants [10,11]. It has been suggested that when the nematode is found in association with plant material this occurs as a result of the nematode feeding on fungi associated with the plant. Alternatively, the finding of A. avenae within plant tissues [12,13] and its demonstrated ability to reproduce on plant callus material [13,14] may show that it can survive in healthy plant tissues and act as a facultative plant parasite. The role, if any, of A. avenae in relation to plant disease therefore remains uncertain.
In a previous study [15], we described the generation, analysis and annotation of over 10,000 ESTs from the pinewood nematode, Bursaphelenchus xylophilus, a pathogenic nematode species which was thought to belong to the same superfamily (Aphelenchoidea) as A. avenae [3] (but see below) and which can feed on live trees as well as on fungi. Genes encoding a range of cell-wall-degrading enzymes including cellulase (β-1,4-endoglucanase) [16], β-1,3-endoglucanase [17], pectate lyase [18] and expansin [19] were subsequently identified and characterized from this nematode. Similar enzymes [20][21][22][23][24][25][26][27][28][29][30][31][32][33] have also been identified and characterized from other plant parasitic nematodes including cyst and root-knot nematodes. These enzymes are produced within the esophageal gland cells of the nematode, secreted through the nematode stylet into host tissues and are thought to play an important role in the host-parasite interaction, allowing invasion and migration of the nematode through plant tissues. The presence of these enzymes is unusual; they are not usually present in animals and it is thought that the genes encoding them may have been acquired by horizontal gene transfer [34,35].
In classical taxonomic classification, A. avenae (family Aphelenchidae) has been placed in the same superfamily (Aphelenchoidea) as B. xylophilus (family Aphelenchoididae) whereas cyst and root-knot nematodes are placed in a different superfamily, the Tylenchoidea (family Tylenchida), although the three nematode groups are all placed within the infraorder Tylenchomorpha [3]. However, recent phylogenetic studies using ribosomal DNA suggest that A. avenae is more closely related to cyst and root-knot nematodes than it is to B. xylophilus [36][37][38]. The current view of the taxonomy of the three nematode groups is summarized in Fig. 1.
Although some of the parasitism genes are common to both superfamilies, Aphelenchoidea (Bursaphelenchus) and Tylenchoidea (cyst and root-knot nematodes), there are also differences between the parasitism genes present in the two nematode groups [3]. For example, Bursaphelenchus and cyst/root-knot nematodes contain endogenous expansins and pectate lyases which appear to have been acquired by a common ancestor via horizontal gene transfer [18,19]. However, the cellulases in the two groups are different. Those present in cyst and root-knot nematodes are from glycosyl hydrolase family 5 (GHF5) and are likely to have been acquired from bacteria whereas those in Bursaphelenchus are from GHF45 and appear to have been acquired from fungi [35]. Nothing is currently known about such pathogenicity genes in A. avenae and the presence or absence of such genes in this nematode may shed light on whether it can act as a plant parasite. In addition, the presence of such horizontally acquired genes in A. avenae may also help reveal the evolutionary history of these genes within nematodes.
Figure 1 Simplified tree showing relationships of Aphelenchus avenae, Bursaphelenchus and cyst/root-knot nematodes. A recently published phylogenetic tree based on SSU ribosomal DNA [38] has been adapted for drawing this simplified tree. Taxonomic positions are indicated based on superfamily [3].
To address these issues, we have generated over 5,000 high quality ESTs from a mixed-stage A. avenae cDNA library. We report the identification of genes that could encode enzymes that degrade the cell walls of plants or fungi. We have also analysed the clustered A. avenae sequences using the Gene Ontology (GO) classification system and undertaken comparative analysis with C. elegans and other nematode protein databases.

Generation of ESTs from an A. avenae cDNA library
A mixed-stage A. avenae cDNA library (Aamk) was constructed to generate ESTs (Table 1). Sixteen clones were randomly selected and the sizes of the inserts in these clones were assessed after digestion with appropriate restriction enzymes. These insert sizes ranged from 400 to 1,600 base pairs (bp) with an average of 1.1 kilobase pairs (kb). A total of 5,472 cDNA clones were subsequently randomly isolated and sequenced from the 5' end in order to generate ESTs. The sequences were trimmed of vector sequence, adaptor sequence, poly(A) tail and low-quality sequence and filtered for minimum length (150 bp), resulting in a total of 5,076 high quality ESTs. The average length of submitted ESTs was 468 nucleotides (nt).
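The final filtering step described above can be illustrated with a short sketch. It assumes the reads have already been trimmed of vector, adaptor, poly(A) and low-quality sequence, and applies only the 150 bp minimum-length cut-off given in the text. The function name and the toy reads are hypothetical; this is not the pipeline used in the study.

```python
MIN_LENGTH = 150  # minimum EST length in bp, as stated in the text


def filter_by_length(reads, min_length=MIN_LENGTH):
    """Keep trimmed reads that are at least min_length nucleotides long.

    reads: dict mapping a read identifier to its trimmed sequence.
    """
    return {name: seq for name, seq in reads.items() if len(seq) >= min_length}


# Toy input (hypothetical identifiers and sequences, for illustration only).
toy_reads = {
    "read_1": "ACGT" * 50,   # 200 nt, kept
    "read_2": "ACGT" * 30,   # 120 nt, discarded
    "read_3": "A" * 150,     # exactly 150 nt, kept
}
kept = filter_by_length(toy_reads)
print(sorted(kept))
```

For scale, 5,076 of the 5,472 sequenced clones (about 93%) passed trimming and filtering in this dataset.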

Cluster formation and analysis
To identify overlapping EST sequences, improve base accuracy and transcript length, and to produce non-redundant EST data for further functional annotation and comparative analysis, the 5,076 ESTs from the A. avenae library were grouped by sequence identity into clusters. Based upon regions of nucleotide identity, EST sequences were merged into contiguous consensus sequences (contigs). 'Contig' member ESTs derive from identical transcripts while 'cluster' members may derive from the same gene but represent different transcript splice isoforms (i.e. ESTs form contigs, contigs form clusters). Two thousand seven hundred non-redundant EST clusters were generated from the ESTs (Table 1). In 2,005 cases, clusters consist of a single EST, whereas the largest single cluster contains 81 ESTs (1 case) (Fig. 2). The majority of contigs (650 out of 695) were composed of 2-10 ESTs. Eighty-nine clusters were found to contain multiple contig members, revealing potential splice isoforms. By eliminating redundancy during this contig building, the total number of nucleotides used for further analysis was reduced from 2.82 million to 1.61 million. In addition, this process significantly increased the length of assembled transcript sequences from 468 ± 114 nt for submitted ESTs alone to 595 ± 321 nt for contigs. The longest sequence generated also increased from 724 to 2,154 nt.
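The cluster size distribution plotted in Fig. 2 amounts to counting how many clusters contain a given number of ESTs. A minimal sketch of that tally, assuming a mapping from each EST to its cluster identifier is available (the identifiers below are hypothetical):

```python
from collections import Counter


def cluster_size_histogram(est_to_cluster):
    """Return {cluster size: number of clusters of that size}."""
    sizes = Counter(est_to_cluster.values())   # number of ESTs in each cluster
    return dict(Counter(sizes.values()))       # number of clusters of each size


# Hypothetical toy assignment of six ESTs to three clusters.
toy = {
    "est1": "C1", "est2": "C1", "est3": "C1",
    "est4": "C2", "est5": "C2",
    "est6": "C3",
}
histogram = cluster_size_histogram(toy)
print(histogram)
```

Clusters of size 1 in such a tally correspond to the singletons; multiplying each size by its count and summing recovers the total number of ESTs.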
Based on the identified clusters, 2,700 A. avenae genes were identified, corresponding to a new gene discovery rate of 53% (2,700/5,076). However, 2,700 clusters is likely to overestimate the true number of genes discovered, as one gene could be represented by multiple non-overlapping clusters. Such "fragmentation" has been estimated at 18% using C. elegans as a reference genome [39]. After allowing for such potential fragmentation, we estimated that the A. avenae sequences derive from a minimum of 2,214 genes, giving a discovery rate of 44% (2,214/5,076). Assuming between 14,000 and 21,000 total genes, the range encompassed by Meloidogyne hapla [40], M. incognita [41] and C. elegans (Wormpep v. 203), the cluster dataset could represent approximately 11-16% of A. avenae genes.
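The arithmetic behind these estimates can be retraced in a few lines. The sketch below uses only the figures quoted in the paragraph above (5,076 ESTs, 2,700 clusters, 18% fragmentation, 14,000 to 21,000 total genes); the function and key names are illustrative.

```python
def discovery_summary(n_ests, n_clusters, fragmentation, genome_sizes):
    """Summarise gene discovery from an EST clustering.

    fragmentation: estimated fraction of clusters that are additional,
    non-overlapping pieces of genes already counted.
    genome_sizes: assumed total gene numbers used to express coverage.
    """
    min_genes = round(n_clusters * (1 - fragmentation))
    return {
        "raw_rate": n_clusters / n_ests,
        "min_genes": min_genes,
        "corrected_rate": min_genes / n_ests,
        "coverage": [min_genes / g for g in genome_sizes],
    }


summary = discovery_summary(5076, 2700, 0.18, (21000, 14000))
print(summary["min_genes"])                          # 2214
print(f"{summary['raw_rate']:.0%}")                  # 53%
print(f"{summary['corrected_rate']:.0%}")            # 44%
print([f"{c:.0%}" for c in summary["coverage"]])     # ['11%', '16%']
```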

Transcript abundance and highly represented genes
A high level of representation in a cDNA library usually correlates with high transcript abundance in the original biological sample [42], although artifacts of library construction can result in a selection for or against some transcripts. The A. avenae clusters were ranked according to the number of contributing ESTs, and the top 25 clusters are summarized in Table 2. Each of these clusters contained fifteen or more EST copies and together they represented 16% of the total number of ESTs obtained. Eighteen of the clusters had significant matches to genes with annotated functions based on BLASTX (E < 1e-5) against the non-redundant database, and all of these had homologues in nematodes. Transcripts abundantly represented in the A. avenae library included genes encoding structural proteins (such as actin, collagen, tropomyosin and troponin C) and proteins which carry out core metabolic processes (e.g. cytochrome c oxidase, ATP synthase). Other abundant ESTs included a small heat shock protein and phosphoenolpyruvate carboxykinase. The latter enzyme has previously been cloned from the parasitic nematodes Haemonchus contortus and Ascaris suum [43,44]. Cluster AAC00541, containing 23 ESTs, was similar to an SXP/RAL-2 family protein from the parasitic nematode Anisakis simplex. Similar genes have previously been characterized from plant parasitic nematodes [45], and individual genes have been shown to be expressed in a range of secretory tissues including the gland cells surrounding the main sense organs (amphids) and the hypodermis.
Figure 2 Histogram showing the distribution of ESTs from A. avenae by cluster size. For example, there were five clusters of size 23 containing a sum total of 115 ESTs. Distribution of contig sizes is not shown.
Seven of the 25 most abundantly represented transcripts from A. avenae had no significant similarity to any sequence in the non-redundant protein database (Table 2). Since most nematode data are available only as ESTs and therefore not included in the BLASTX analysis, we compared these 7 contigs against dbEST using BLASTN and TBLASTX. However, these searches returned no significant matches (E < 1e-5). We also conducted BLASTN and TBLASTX searches against the non-redundant nucleotide database for these sequences. Six of the clusters did not return any matches from this database but cluster AAC00148 produced a match using TBLASTX analysis (E < 1e-5) (Table 2).
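The significance cut-off used in these searches (E < 1e-5) can be applied to search results as in the sketch below. It assumes the hits are available in the standard 12-column tabular BLAST output, in which the E-value is the eleventh field; the function name, query and subject identifiers, and scores are invented for illustration and are not results from this study.

```python
E_VALUE_CUTOFF = 1e-5


def best_significant_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Return {query: (subject, evalue)} for the lowest-E-value hit per query,
    keeping only hits whose E-value is below the cutoff."""
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue >= cutoff:
            continue  # not significant at the chosen threshold
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical hits: two for query_1, one non-significant hit for query_2.
toy_hits = [
    "query_1\tsubj_a\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-30\t120",
    "query_1\tsubj_b\t60.0\t100\t40\t0\t1\t300\t1\t100\t1e-08\t60",
    "query_2\tsubj_c\t35.0\t50\t32\t1\t1\t150\t1\t50\t0.5\t25",
]
hits = best_significant_hits(toy_hits)
print(hits)
```

Queries absent from the returned dictionary, such as query_2 here, are the ones that would be reported as having no significant match.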

Comparisons to proteins from other species
We compared the 2,700 cluster sequences from A. avenae against three databases containing protein sequences from different organisms. The cluster sequences were compared with protein sequences from (i) C. elegans (WORMPEP v.203 [46]), (ii) other nematodes (available protein sequences and peptides from conceptually translated ESTs), and (iii) organisms other than nematodes (from the NCBI non-redundant protein database) [47]. 66% of the A. avenae clusters (1,782 of 2,700) had matches in one or more of the three databases and these matches were represented using SimiTri [48] (Fig. 3). In the majority of cases where homologies were found (1,242/1,782), matches were found in all three databases surveyed. Gene products in this category are generally widely conserved across metazoans and many are involved in core biological processes. Examination of the individual database searches showed that 1,567 (58.0%) had homologues in C. elegans, 1,750 (64.8%) in other nematodes and 1,321 (48.9%) in organisms other than nematodes. The 918 clusters (34.0%) which had no significant similarity to any sequences in these three protein databases were searched against non-redundant nucleotide and dbEST databases using BLASTN and TBLASTX (employing a cut-off of 1e-05). 56 clusters generated matches in these searches but no matches were obtained for the remaining 862 sequences (31.9%). Table 3 shows the 15 gene products with the highest level of conservation (E-value ranging from 0 to e-151) between A. avenae and C. elegans; these include gene products involved in cell structure (for example, actin, UNC-87), protein biosynthesis or regulation (for example,
Figure 3 Comparison of A. avenae cluster sequences with C. elegans, other nematodes and non-nematode protein sequence databases using SimiTri. The numbers at each vertex indicate the number of cluster sequences matching only that specific database. The numbers on the edges indicate the number of cluster sequences matching the two databases linked by that edge. The number within the triangle indicates the number of A. avenae genes with matches to sequences in all three databases.
(137), or both (320) (Fig. 3).
The most conserved (1e-111) of these nematode-specific proteins was a homolog of a serine proteinase inhibitor, previously characterized from C. elegans (K10D3.4) and parasitic nematodes [49,50]. Among the other most conserved nematode-specific clusters were homologs of previously characterized C. elegans structural proteins (for example, cluster AAC01973 matched a collagen family protein, COL-176) as well as uncharacterized C. elegans hypothetical proteins (for example, cluster AAC01948 matched the C. elegans gene C34E7.4, which has no known function).
The 137 cluster sequences where homologs were present only in other nematodes were further categorized based on their BLAST (BLASTX and TBLASTX) results (Additional file 1). Matches were found in plant parasitic, animal parasitic and free-living nematodes. 24.8% of sequences (34 of 137) had homology only to sequences from plant parasitic nematodes. Some of these sequences were similar to previously characterized cell-wall-degrading enzymes, which are known to be involved in the parasitism process of these nematodes. For example, cluster AAC01592 matched an expansin-like protein from B. xylophilus [19] and cluster AAC02968 matched a β-1,4-endoglucanase precursor from Globodera rostochiensis [20]. Further analysis of some of the cell-wall-degrading enzymes present in A. avenae is presented below.
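The vertex, edge and centre counts reported for the three-database comparison (Fig. 3) correspond to the seven regions of a three-set comparison. The following sketch shows how such counts could be derived from lists of matching cluster identifiers. It does not reproduce SimiTri itself, and the function name, region labels and toy identifiers are hypothetical.

```python
def partition_matches(celegans, other_nematodes, non_nematodes):
    """Split matched cluster IDs into the seven regions of a three-set comparison."""
    a, b, c = set(celegans), set(other_nematodes), set(non_nematodes)
    return {
        "all_three": a & b & c,
        "celegans_and_other_nematodes_only": (a & b) - c,
        "celegans_and_non_nematodes_only": (a & c) - b,
        "other_nematodes_and_non_nematodes_only": (b & c) - a,
        "celegans_only": a - b - c,
        "other_nematodes_only": b - a - c,
        "non_nematodes_only": c - a - b,
    }


# Hypothetical toy data: five clusters with matches in different databases.
regions = partition_matches(
    celegans={"q1", "q2", "q3"},
    other_nematodes={"q1", "q2", "q4"},
    non_nematodes={"q1", "q5"},
)
counts = {name: len(ids) for name, ids in regions.items()}
print(counts)
```

Because the seven regions are disjoint, their counts sum to the number of clusters with a match in at least one database.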

Identification of transcripts similar to stress-response genes related to desiccation
BLASTX (E < 1e-5) searches of A. avenae cluster sequences against nr protein databases allowed identification of genes that can encode proteins or enzymes important in providing protection against desiccation or other environmental stress (Additional file 2). One notable observation was the presence of sequences similar to late embryogenesis abundant (LEA) proteins, which are known to be associated with tolerance to water stress resulting from desiccation [9]. Protein aggregation during desiccation is likely to be a major potential hazard for anhydrobiotes; LEA proteins may act as molecular chaperones or molecular shields and play an important role in the prevention of this aggregation [51]. Thirteen ESTs, distributed in three clusters (AAC00729, AAC00888, and AAC01781), were identified as having significant similarity to LEA proteins. Cluster AAC01781, which was identified as a singleton, matched a previously characterized LEA protein from desiccated A. avenae [9]. In addition, we also identified multiple copies of cytochrome P450, superoxide dismutase, glutathione peroxidase, and glutathione S-transferase, enzymes involved in protection against oxidative damage. Desiccation stress of nematodes caused significant up-regulation of transcripts encoding these enzymes [52,53].

Functional classification based on gene ontology
Gene Ontology (GO) has been used widely to predict gene function and classification [54]. BLAST2GO [55,56], a universal, web-based annotation tool, was used to assign GO terms for the A. avenae cluster sequences, extracting them from each BLAST hit against Swiss-Prot obtained by mapping to extant annotation associations. 1,222 sequences out of 2,700 did not retrieve any BLAST results within the set E-value threshold (< 1e-5). Mapping of GO terms and annotation were not possible for 173 and 205 sequences, respectively. The remaining 1,100 (40.7%) sequences were successfully annotated and mapped to one or more of the three organizing principles of GO: biological process, molecular function and cellular component. The matches obtained from this analysis are summarized in Figs 4A-4C.
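The annotation totals above can be checked with a short calculation. The sketch treats the three failure categories as disjoint, which is an assumption, although it is the reading under which the reported figure of 1,100 is recovered; the variable names are illustrative.

```python
total_clusters = 2700
no_blast_hit = 1222     # no Swiss-Prot hit within the E-value threshold
no_go_mapping = 173     # mapping of GO terms not possible
no_annotation = 205     # annotation not possible

# Assumption: the three categories do not overlap.
annotated = total_clusters - no_blast_hit - no_go_mapping - no_annotation
fraction = annotated / total_clusters
print(annotated, f"{fraction:.1%}")   # 1100 40.7%
```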
1,003 of the A. avenae cluster sequences generated matches in the "molecular function" class, 933 in the "biological process" class and 924 in the "cellular component" class. Within the "biological process" class the "regulation of biological process (GO:0050789)", "biosynthetic process (GO:0009058)" and "transport (GO:0006810)" categories were the most represented followed by "response to stimulus (GO:0050896)", based on the annotation assigned by BLAST2GO (Fig. 4A). Within the "molecular function" class the "protein binding (GO:0005515)" term is the most represented followed by "nucleotide binding (GO:0000166)" and "transferase activity (GO:0016740)" (Fig. 4B). Many cluster sequences encoding ribosomal proteins as well as highly expressed genes coding for structural molecules (such as actin) and regulatory molecules (such as transcription factors) are assigned to the "protein binding" term. Since those clusters are abundantly present in the dataset, this may cause overrepresentation of the "protein binding" term. Within the "cellular component" class the "mitochondrion (GO:0005739)" is the most highly represented (Fig. 4C). A complete listing of GO mappings assigned for the A. avenae cluster sequences is provided in Additional file 3.
Figure 4 Summary of the Gene Ontology annotation as assigned by BLAST2GO. (A) Most represented GO terms (based on number of represented sequences) of the main category "biological process"; (B) Most represented GO terms of the main category "molecular function"; (C) Most represented GO terms of the main category "cellular component". Multi-level pie charts were generated using the sequence cut-offs 140, 50 and 40 for "biological process", "molecular function" and "cellular component", respectively.

Identification of transcripts encoding cell-wall-degrading enzymes
BLASTX analysis allowed us to identify various genes with significant similarity to genes encoding enzymes which degrade plant and fungal cell walls (Table 4). The plant cell-wall-degrading enzymes that were identified included cellulase, pectate lyase, polygalacturonase and expansin, while transcripts encoding fungal cell-wall-degrading enzymes included β-1,3-endoglucanase and chitinase.
Eight cellulase genes were present in 8 different A. avenae clusters and in all cases homologues were found in other plant parasitic nematodes. One cellulase gene (AAC01152) was identified as a contig of six individual ESTs. Two clusters (AAC00199 and AAC00801) contained 2 ESTs each and the remaining cellulase clusters were singletons. Two types of pectin-degrading enzymes, pectate lyase and polygalacturonase, were identified (Table 4). While all transcripts encoding pectate lyase genes were identified as singletons, polygalacturonase clusters contained either one or two individual ESTs. The features of the sequences of cellulase and pectate lyase are discussed in more detail below.
In addition to the plant cell-wall-degrading enzymes, we identified genes encoding expansin-like proteins in the A. avenae dataset. Expansins and expansin-like proteins have recently been described in several plant parasitic nematodes [19][31][32][33] and it is thought that these proteins disrupt non-covalent bonds in the plant cell wall, enhancing the activity of other enzymes such as cellulases. For all four expansin-like transcripts, the best match was to the expansin genes from B. xylophilus [19].
Two different types of genes, β-1,3-endoglucanase and chitinase, that could encode enzymes important in degradation of the fungal cell wall were identified. A gene encoding a β-1,3-endoglucanase has been cloned and characterized from B. xylophilus [17] and is thought to aid fungal feeding in this nematode. Chitinase, an enzyme responsible for breakdown of β-1,4-glycosidic bonds within chitin, has been found in a wide range of nematodes. Since chitin is known to be present in the eggshell [57] and the microfilarial sheath [58] of nematodes, it has been suggested that chitinases have a role in remodeling processes during the molting of filariae and in the hatching of larvae from the eggshell [59,60]. However, the existence of large families of chitinases in the free-living nematode C. elegans suggests that these enzymes may also fulfill other functions [61]. In the plant parasitic nematode Heterodera glycines, chitinase was found to be expressed in the subventral oesophageal gland cells of the parasitic stages of this nematode, suggesting a role in parasitism but not in hatching [62]. The fungal feeding plant parasitic B. xylophilus also contains chitinase [15]. Since β-1,3-glucan and chitin are the two major structural polysaccharides of the fungal cell wall, it is possible that fungal feeding nematodes like B. xylophilus and A. avenae secrete these enzymes in order to metabolize or soften the fungal cell wall as part of the feeding process.

Characterisation of genes encoding cellulases and pectate lyases: sequence analysis, phylogenetics and expression analysis
Cellulase and pectate lyase have been identified and characterized in a wide range of plant parasitic nematodes [16][18][20][21][22][23][24][25][26][27][28][29]. The presence of genes encoding cell-wall-degrading enzymes in A. avenae opens up a new avenue for further molecular studies aimed at understanding their functional role in this nematode and investigating the origin and evolution of these genes within the Nematoda. We therefore cloned the full-length cDNA and genomic sequences of two putative cellulases (named Aa-eng-1 and Aa-eng-2) and two putative pectate lyases (named Aa-pel-1 and Aa-pel-2) and compared these sequences to those from other plant parasitic nematodes.
The full-length sequences of the cellulases were identified from two plasmid clones whose EST sequences are part of cluster AAC01152 (Table 4). Although six individual ESTs form the contig that represents this cluster, one EST was selected because it showed a slightly different nucleotide sequence from the other five ESTs. Two different plasmid clones containing the full-length cDNA sequences of the cellulases were subsequently selected and sequenced using the specific primers listed in Table 5.
The Aa-eng-1 cDNA was 1,104 bp in length (excluding the polyA tail) and included a 981-bp open reading frame (ORF) that could encode a protein of 327 amino acids with an ATG start codon at position 35 and a TGA stop codon at position 1,016 (Fig. 5). The complete cDNA of Aa-eng-2 was 1,107 bp in length and also contained a potential ORF of 327 amino acids with an ATG start codon at position 35 and a TGA stop codon at 1,016. A signal peptide of 19 amino acids is predicted by SignalP [63] at the N terminus of the deduced AA-ENG-1 and AA-ENG-2 polypeptides. The predicted molecular masses of the putative mature proteins were 34.130 kDa and 34.059 kDa respectively and the theoretical pI value was 6.2 for both proteins. The AA-ENG-1 and AA-ENG-2 proteins contained a catalytic domain homologous to GHF5 β-1,4-endoglucanases as predicted by PRODOM [64]. The deduced amino acid sequences showed highest similarity with the GHF5 endoglucanase from the migratory plant parasitic nematode Radopholus similis (GenBank Accession No.
Genomic clones of Aa-eng-1 and Aa-eng-2 were obtained by PCR amplification using gene-specific primers (Table 5) and genomic DNA as template. The Aa-eng-1 and Aa-eng-2 genomic DNA products were 1,518 bp and 1,522 bp long respectively from the ATG to the stop codon. The positions of the exon/intron boundaries were determined by aligning the genomic sequences with the corresponding cDNA sequences. All introns were bordered by canonical cis-splicing sequences [65]. Five introns were identified in Aa-eng-1 (Fig. 5), of which four were small (40 bp to 85 bp), a feature commonly found in nematodes [66]. Only the first intron was larger (319 bp). Five introns were also identified in Aa-eng-2. The first intron was 337 bp long and the lengths of the remaining four introns ranged from 40 to 72 bp. The intron positions of the Aa-eng-1 and Aa-eng-2 genes were identical.
Sequence alignment of the two endoglucanases from A. avenae with GHF5 endoglucanases from nematodes and other organisms revealed that both AA-ENG-1 and AA-ENG-2 possess a consensus pattern of GHF5 endoglucanases in their primary amino acid sequences, in which two glutamic acid residues are the predicted proton donor and nucleophile/base of the catalytic site (Fig. 5). This is also true for all previously described nematode GHF5 endoglucanases. In addition to the catalytic domain, some of these proteins contain a cellulose binding domain (CBD) joined to the catalytic domain through a linker peptide. The GHF5 endoglucanase genes isolated from plant parasitic nematodes also have different structures: all have a signal peptide and catalytic domain, some have an additional linker and CBD and others only have a linker but no CBD [25,67]. However, neither peptide linkers nor CBD domains were present in the two GHF5 endoglucanases isolated from A. avenae. Expansins from cyst nematodes have also been shown to contain a CBD, but no such domains were predicted in other sequences within the EST dataset, including the putative expansins.
Figure 5 Aa-eng-1 cDNA sequence and predicted amino acid sequence. The predicted signal peptide for secretion and polyadenylation signal sequence are underlined. Predicted positions of the five intron sequences identified are indicated by darkened triangles. Primers used for obtaining the full-length cDNA sequence and genomic amplification are indicated by arrows. The amino acids within the boxes represent the predicted active site residues.
A phylogenetic tree was generated from an alignment of the β-1,4-endoglucanase protein sequences from AA-ENG-1, AA-ENG-2, cyst and root-knot nematodes, the migratory plant-parasitic nematodes R. similis, Pratylenchus penetrans, Pratylenchus coffeae and Ditylenchus africanus and GHF5 cellulases from phytophagous beetles, bacteria and protists (Fig. 6). AA-ENG-1 and AA-ENG-2 clustered into a larger group of protein sequences including all nematode GHF5 cellulases, indicating that A. avenae cellulases are closely related to those of the Tylenchida. This analysis supports the idea that all nematode GHF5 cellulases evolved from a GHF5 sequence acquired by a common ancestor of this group.
The β-1,4-endoglucanases are the largest family of cell-wall-degrading enzymes that have been identified in parasitic nematodes to date. Over the last decade, a large number of GHF5 endoglucanases have been identified and extensively studied in plant parasitic Tylenchida including cyst and root-knot nematodes [20][21][22][23][24][25]. Genes encoding β-1,4-endoglucanases have also been found in Bursaphelenchus spp. but these enzymes are most similar to GHF45 cellulases from fungi [16]. The presence of GHF5 cellulases within A. avenae (as opposed to GHF45 cellulases) provides further support for the suggestion that this nematode is more closely related to the Tylenchida than to Bursaphelenchus and its relatives.
Figure 6 Unrooted phylogenetic tree of GHF5 catalytic domains based on the protein sequences using the maximum likelihood method.
Although the presence of GHF5 cellulases within A. avenae can be readily explained given the phylogenetic arguments above, the presence of a β-1,3-endoglucanase is more surprising. These enzymes act to metabolise the fungal cell wall and have been previously described in Bursaphelenchus spp. Such genes are not usually present in animals and it was suggested that the Bursaphelenchus genes were acquired by horizontal gene transfer from bacteria [17].
No such genes are present in root-knot nematodes (for which two genome sequences are available) or other Tylenchida. It is possible that a fungal feeding common ancestor of Aphelenchoidea and Tylenchida possessed this gene but that more "advanced" plant parasites have subsequently lost it. Further sequencing within both nematode groups is required to resolve this issue.
All the transcripts potentially encoding pectate lyases were identified as singletons (Table 4). Two full-length cDNA sequences of pectate lyases, designated Aa-pel-1 and Aa-pel-2, were identified from the plasmid clones corresponding to the cluster IDs AAC01649 and AAC03048 respectively. The complete cDNA of Aa-pel-1 was 821 bp in length and contained an ORF of 247 amino acids with a putative ATG start codon at position 30 and TAA stop codon at position 771. A signal peptide of 18 amino acids is predicted by SignalP [63] at the N-terminus of the putative AA-PEL-1 amino acid sequence. The mature protein has a predicted molecular mass of 24.250 kDa and theoretical pI of 8.93.
The full-length Aa-pel-2 cDNA was 838 bp long and contained an ORF of 249 amino acids with an ATG start codon at position 31 and a TAA stop codon at position 778. An N-terminal signal sequence of 19 amino acids was predicted by SignalP [63]. The molecular mass and theoretical pI value of the putative AA-PEL-2 protein were 24.329 kDa and 9.12 respectively. AA-PEL-1 has 61% identity to AA-PEL-2.
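The ORF lengths follow from the reported start and stop positions. The following is a minimal sketch of that check; the function name is illustrative, and it assumes 1-based cDNA positions in which the stop position marks the first base of the stop codon.

```python
def orf_length_codons(start_pos: int, stop_pos: int) -> int:
    """Number of coding codons between a start codon and a stop codon.

    Assumes 1-based cDNA positions, with start_pos the 'A' of ATG and
    stop_pos the first base of the stop codon (stop codon not counted).
    """
    coding_nt = stop_pos - start_pos
    if coding_nt <= 0 or coding_nt % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return coding_nt // 3


# Positions as reported in the text for Aa-pel-1 and Aa-pel-2.
aa_pel_1 = orf_length_codons(30, 771)
aa_pel_2 = orf_length_codons(31, 778)
print(aa_pel_1, aa_pel_2)  # 247 249
```

Under these assumptions the reported coordinates give 247 and 249 codons for Aa-pel-1 and Aa-pel-2 respectively.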
To obtain genomic sequences, the entire coding regions of the Aa-pel-1 and Aa-pel-2 genes were amplified from A. avenae gDNA using gene-specific primers (Table 5). Analysis of these sequences showed that the Aa-pel-1 and Aa-pel-2 genes were 1,860 and 1,221 bp long respectively from the ATG to the stop codon. Two introns (468 bp and 651 bp) were identified in Aa-pel-1 whereas Aa-pel-2 contained only one intron (690 bp) (Fig. 7). All introns were bordered by canonical cis-splicing sequences [65]. The position of the second intron of Aa-pel-1 was identical to that of the intron in Aa-pel-2.
The intron positions in the A. avenae genes (Aa-pel-1 and Aa-pel-2) were compared with those of other nematode pectate lyase genes. The pectate lyase genes from Bursaphelenchus species (Bx-pel-1/2 and Bm-pel-1/2) each have one intron in their coding region at the same position [18]. This position is also identical to the common intron position of the A. avenae genes (Fig. 7). Moreover, one of the three introns of Mi-pel-1 from M. incognita is at the same position as that in the A. avenae and Bursaphelenchus genes. Gr-pel-1 from G. rostochiensis has six introns and Mi-pel-2 from M. incognita has two introns. Gr-pel-1 and Mi-pel-2 share two intron positions but none of the introns of these genes has the same position as that in the A. avenae genes (not shown).
A BLASTP homology search using the deduced amino acid sequences of AA-PEL-1 and AA-PEL-2 indicated high similarity to pectate lyases belonging to the polysaccharide lyase family 3 (PL3) from plant parasitic nematodes, bacteria and fungi. Multiple sequence alignment of AA-PEL-1 and AA-PEL-2 with the best matches confirmed that both A. avenae sequences contained the four highly conserved regions characteristic of PL3 pectate lyases in bacteria and fungi as well as 8-10 cysteine residues and four charged residues (Fig. 7) that are potentially involved in catalysis [26,68]. AA-PEL-1 and AA-PEL-2 were most similar to the pectate lyases (BX-PEL1/2 and BM-PEL1/2) from the pinewood nematodes B. xylophilus and B. mucronatus (52 to 59% identity for AA-PEL-1 and 53 to 63% identity for AA-PEL-2) [18]. The A. avenae sequences shared 23 to 33% identity with pectate lyases (MI-PEL-1, MI-PEL-2, MJ-PEL-1 and GR-PEL-1) from cyst and root-knot plant parasitic nematode spp. [28,29], and 39 to 45% identity with the sequences from two microbes (Fig. 7).
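Percent identity values of this kind are derived from aligned sequence pairs. The text does not state which denominator was used, so the following is only a sketch under one common convention, in which identity is counted over alignment columns where neither sequence has a gap. The function name and the two toy sequences are hypothetical and are not taken from the A. avenae data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of an alignment.

    Assumption: columns containing a gap ('-') in either sequence are
    ignored, so the denominator is the number of ungapped columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment (not real pectate lyase sequences).
toy_a = "MKT-LLCVAG"
toy_b = "MKSALLC-AG"
print(round(percent_identity(toy_a, toy_b), 1))  # 87.5
```

Other conventions, such as dividing by the full alignment length or by the length of the shorter sequence, would give somewhat different values for the same alignment.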
A phylogenetic tree was generated from an alignment of A. avenae pectate lyase sequences with selected proteins belonging to PL3 from bacteria, fungi, and nematodes using the maximum likelihood method (Fig. 8). Both Aa-pel-1 and Aa-pel-2 clustered with the Bursaphelenchus genes. Other nematode sequences were not monophyletic but were clustered into distinct clades.
The pectate lyase genes from A. avenae are more similar to the Bursaphelenchus genes than to those from cyst and root-knot nematodes (Figs. 7 and 8). The identical position of the common intron of the A. avenae genes (Aa-pel-1 and Aa-pel-2) and introns within pectate lyase genes from Bursaphelenchus and M. incognita (Fig. 7) suggests that pectate lyase genes from a wide range of plant parasitic nematodes have the same origin.
To determine which A. avenae cells express GHF5 endoglucanases and pectate lyases, in situ mRNA hybridisation was performed (Fig. 9). Digoxigenin-labeled antisense probes generated from Aa-eng-1 and Aa-pel-1 specifically hybridized with the transcripts in the esophageal gland cells of A. avenae (Figs. 9A and 9C). Staining was observed in juvenile and adult nematodes rather than being restricted to a specific life stage. No hybridisation was observed with the control (sense) cDNA probes of Aa-eng-1 or Aa-pel-1 (Figs. 9B and 9D).
As a part of the complex process of parasitism, a wide range of plant parasitic nematodes use endogenous β-1,4-endoglucanases and pectate lyases to degrade two abundant constituents of the plant cell wall and thus facilitate their migration through host tissues. The presence of signal peptides in the deduced amino acid sequences of the endoglucanases and pectate lyases from A. avenae, coupled with their expression in the esophageal glands, suggests that both enzymes have a similar role in A. avenae. The presence of such genes in A. avenae suggests that this nematode can enter and migrate through plant tissues and may also be able to feed on plant cell contents. This is supported by the observation that A. avenae is known to feed on plant tissue in culture [13,14]. A. avenae may therefore have a wide-ranging diet that includes fungi and plant tissues. This, coupled with the position of this nematode as a basal member of a clade that includes a wide range of plant parasitic nematodes, provides further support for the idea that plant parasitism has evolved from fungal feeding and suggests that A. avenae may be a very primitive plant feeder.

Conclusion
The 5,076 ESTs identified in this study are the first attempt to define the A. avenae gene set and represent over 2,200 genes. This collection of ESTs represents a starting point for studies in a number of different fundamental and applied areas. A summary of the assignment of non-redundant ESTs to functional categories, as well as their relative abundance, is listed and discussed. A substantial number of putative Aphelenchus-specific genes were found that do not share similarity with known genes and some of these may be highly expressed, based on their abundance in the EST dataset. The presence of genes encoding a battery of cell-wall-degrading enzymes in A. avenae and their similarities with the genes from other plant parasitic nematodes suggest that this nematode can act not only as a fungal feeder but also as a plant parasite. The gene structures of GHF5 cellulase and PL3 pectate lyase from A. avenae, their phylogenetics and comparative analyses with similar genes from other parasitic nematodes provide information that helps in understanding the evolutionary origins of these genes within the Nematoda. Further studies on genes encoding cell-wall-degrading enzymes in A. avenae will accelerate our understanding of the complex evolutionary histories of plant parasitism and the use of genes obtained by horizontal gene transfer from prokaryotes.

Biological material
The AaF1 isolate of A. avenae was cultured on the fungus Botrytis cinerea for 2-3 weeks at 25°C and then extracted for 3 h at 25°C using the modified Baermann funnel technique [69]. The separated mixed-stage nematodes were cleaned by flotation on a 30% (wt/vol) sucrose solution followed by three washes with 0.5× PBST [70]. Nematodes were stored at -80°C until use.

Isolation of total RNA, cDNA synthesis and cDNA library construction
Total RNA from mixed-stage A. avenae was isolated using Sepasol (Nacalai). Analysis of the total RNA on a denaturing agarose gel showed a smear from 50 to 3,000 bp with two distinct bands of ribosomal RNA. Poly(A)+ RNA was extracted from total RNA using a FastTrack® MAG micro mRNA Isolation Kit (Invitrogen). cDNA was synthesized using the SMART PCR cDNA amplification method (Clontech) with a NotI oligo-dT primer (5'-AACTGGAAGAATTCGCGGCCGCAGGAATTTTTTTTTTTTTTTTTT-3'). SalI/SmaI adaptors (Takara) were added to the double-stranded cDNA, which was then digested with NotI and size fractionated using a cDNA Size Fractionation Column (Invitrogen) to remove small cDNA (< 500 bp). The appropriate fractions containing cDNA were pooled and ethanol precipitated. Inserts were directionally cloned into the NotI and SalI sites of the pSPORT1 vector, and transformed into Escherichia coli DH5α cells.
The cDNA library was designated as Aamk.

EST generation
Individual transformants (n = 5,472) from the plasmid library were picked into 96-well plates, each well containing 0.5 ml of LB medium with 100 μg/ml ampicillin. Plates were incubated overnight at 37°C. A small aliquot of each culture was stored at -80°C after being mixed with an equal volume of 25% glycerol in LB. Plasmid DNA was isolated and purified on FB glass fiber plates (Millipore) using the glass bead method described in [71]. cDNA inserts were sequenced from the 5' end using the M13-T7 primer (5'-TAATACGACTCACTATAGGG-3') and the BigDye terminator ver. 3.1 kit (Applied Biosystems) on an ABI 3100 DNA sequencer (Applied Biosystems). Raw sequence trace data from the 3100 sequencer were processed in an automated pipeline, the trace2dbEST package [72]. Before submission to the public database (DDBJ), sequences were processed to assess quality, to remove vector sequence, contaminants and cloning artifacts, and to identify BLAST similarities.

Clustering and sequence analysis
Clustering was performed using PartiGene, a software pipeline designed to analyze and organize EST data sets [72]. Sequences were clustered into groups (putative genes) on the basis of sequence similarity using CLOBB [73]. Clusters were assembled to yield consensus sequences using Phrap (P. Green, unpublished data). Each consensus sequence was subjected to BLAST analysis against the GenBank non-redundant protein database. SimiTri [48] was used for the comparison (at the amino acid sequence level) of A. avenae cluster sequences with data in C. elegans, other nematode and non-nematode protein sequence databases, providing a two-dimensional display of relative similarity relationships among the three different datasets. "Fragmentation", defined as the representation of a single gene by multiple non-overlapping clusters, was estimated by comparing A. avenae cluster sequences with C. elegans [39].
Unrooted phylogenetic tree of selected polysaccharide lyase family 3 proteins generated using maximum likelihood analysis

Gene ontology (GO)
Cluster sequences were classified into Gene Ontology functional categories [54] based on BLAST similarities to known genes in the NCBI Swiss-Prot protein sequence database (Swiss-Prot), using the BLAST2GO annotation tool [55,56] with an E-value cut-off of 1e-05, and were summarized according to their biological processes, molecular functions and cellular components. To obtain the complete GO mapping, the node sequence filter in the GO graph was set to 50 for biological process, 20 for molecular function, and 20 for cellular component. The multi-level pie charts were generated using sequence cut-offs of 140, 50 and 40 for biological process, molecular function and cellular component respectively.
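The E-value cut-off step can be illustrated with a short sketch. This is not the BLAST2GO implementation; it only assumes that similarity hits are available in the standard 12-column BLAST tabular layout (E-value in the eleventh column) and keeps, for each query, the best hit at or below 1e-05. The function name, cluster names and subject names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

E_VALUE_CUTOFF = 1e-5  # cut-off quoted in the text


def best_hits(tabular_lines: Iterable[str],
              cutoff: float = E_VALUE_CUTOFF) -> Dict[str, Tuple[str, float]]:
    """Return {query: (subject, evalue)} for the lowest-E-value hit per query.

    Assumes 12 tab-separated columns per line: query, subject, percent
    identity, length, mismatches, gap opens, query start/end,
    subject start/end, E-value, bit score.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])
        if evalue > cutoff:
            continue  # hit is not significant at this cut-off
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical hits for two clusters; cluster_B has no hit below the cut-off.
example = [
    "cluster_A\tprot_1\t58.0\t200\t84\t0\t1\t600\t1\t200\t1e-40\t160",
    "cluster_A\tprot_2\t35.0\t150\t97\t1\t1\t450\t5\t154\t1e-06\t52",
    "cluster_B\tprot_3\t30.0\t60\t42\t0\t1\t180\t1\t60\t0.002\t35",
]
print(best_hits(example))  # {'cluster_A': ('prot_1', 1e-40)}
```

Clusters left without a retained hit, such as cluster_B here, would receive no GO assignment under this filter.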

Identification of genes encoding cell-wall-degrading enzymes and sequence analyses
BLASTX searches against the GenBank database were used to identify A. avenae clusters encoding potential cell-wall-degrading enzymes. To obtain full-length cDNA sequences, the plasmid clones from which each of these sequences was obtained were identified and resequenced in both directions using designed primers (Table 5). The genomic coding region of each cDNA clone was obtained by PCR amplification from A. avenae genomic DNA, using pairs of gene-specific primers flanking each open reading frame. PCR products were cloned into the pGEM-T Easy vector (Promega) and sequenced using standard protocols.

Phylogenetic analyses
The phylogenetic analyses of the catalytic domains of the GHF5 proteins and PL3 proteins were performed on the Phylogeny.fr platform [74] and comprised the following steps. Sequences were aligned with MAFFT (v. 6) configured for highest accuracy (MAFFT with E-INS-i option). Prior to the phylogenetic analysis, signal peptide sequences and other N- and C-terminal extensions peculiar to individual taxa were excluded. In total, 337 and 249 characters were used for the phylogenetic analysis of GHF5 endoglucanase and pectate lyase respectively. The phylogenetic tree was reconstructed using the maximum likelihood method implemented in the PhyML program (v3.0 aLRT). The JTT substitution model was selected assuming an estimated proportion of invariant sites (of 0.018) and 4 gamma-distributed rate categories to account for rate heterogeneity across sites. The gamma shape parameter was estimated directly from the data (gamma = 1.603). Reliability of internal branches was assessed using the aLRT test (SH-like). Graphical representation and editing of the phylogenetic tree were performed with TreeDyn (v198.3).
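The analyses were run on the Phylogeny.fr web platform, so no command lines are given. As a rough sketch only, the settings described above might correspond to the following local MAFFT and PhyML invocations. The mapping to command-line options is an assumption and should be checked against the installed versions. The helper functions and file names are hypothetical, and the code only builds the argument lists without running either program.

```python
from typing import List


def mafft_einsi_command(fasta_in: str) -> List[str]:
    """Argument list for a MAFFT E-INS-i alignment (output goes to stdout)."""
    # E-INS-i corresponds to --genafpair with iterative refinement.
    return ["mafft", "--genafpair", "--maxiterate", "1000", fasta_in]


def phyml_command(phylip_in: str) -> List[str]:
    """Argument list for a PhyML 3.0 run mirroring the described settings."""
    return [
        "phyml",
        "-i", phylip_in,  # alignment in PHYLIP format
        "-d", "aa",       # amino acid data
        "-m", "JTT",      # JTT substitution model
        "-v", "e",        # estimate the proportion of invariant sites
        "-c", "4",        # four gamma rate categories
        "-a", "e",        # estimate the gamma shape parameter
        "-b", "-4",       # SH-like aLRT branch support
    ]


# Hypothetical input file names.
print(" ".join(mafft_einsi_command("pl3_proteins.fasta")))
print(" ".join(phyml_command("pl3_proteins.phy")))
```

Building the commands as lists keeps each setting visible next to the option assumed to control it, which makes the correspondence with the Methods text easy to audit.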

In situ hybridisation
In situ hybridisation was performed as described previously [75]. PCR products were generated from the plasmid stocks of the cDNA clones of Aa-eng-1 and Aa-pel-1 using gene-specific primers (Table 5). Sense or antisense strands were labelled with digoxigenin by asymmetric PCR and hybridised to fixed, permeabilised fragments of mixed-stage A. avenae. After washing to remove unbound probe, specifically hybridising probe was detected using alkaline phosphatase-conjugated anti-digoxigenin antibody and NBT/BCIP stock solution (Roche Diagnostics). Specimens were examined with differential interference microscopy (Nikon).
Localization of Aa-eng-1 and Aa-pel-1 transcripts in the esophageal gland cells of A. avenae by in situ hybridisation