Stage-specific expression of protease genes in the apicomplexan parasite, Eimeria tenella

Background Proteases regulate pathogenesis in apicomplexan parasites but investigations of proteases have been largely confined to the asexual stages of Plasmodium falciparum and Toxoplasma gondii. Thus, little is known about proteases in other Apicomplexa, particularly in the sexual stages. We screened the Eimeria tenella genome database for proteases, classified these into families and determined their stage specific expression. Results Over forty protease genes were identified in the E. tenella genome. These were distributed across aspartic (three genes), cysteine (sixteen), metallo (fourteen) and serine (twelve) proteases. Expression of at least fifteen protease genes was upregulated in merozoites including homologs of genes known to be important in host cell invasion, remodelling and egress in P. falciparum and/or T. gondii. Thirteen protease genes were specifically expressed or upregulated in gametocytes; five of these were in two families of serine proteases (S1 and S8) that are over-represented in the coccidian parasites, E. tenella and T. gondii, distinctive within the Apicomplexa because of their hard-walled oocysts. Serine protease inhibitors prevented processing of EtGAM56, a protein from E. tenella gametocytes that gives rise to tyrosine-rich peptides that are incorporated into the oocyst wall. Conclusion Eimeria tenella possesses a large number of protease genes. Expression of many of these genes is upregulated in asexual stages. However, expression of almost one-third of protease genes is upregulated in, or confined to gametocytes; some of these appear to be unique to the Coccidia and may play key roles in the formation of the oocyst wall, a defining feature of this group of parasites.


Background
Proteases are essential regulators of pathogenesis in the Apicomplexa, a phylum of obligate intracellular protozoan parasites that includes species of great significance to human health (e.g., Plasmodium species, causing malaria; Toxoplasma gondii, causing toxoplasmosis; and Cryptosporidium, causing cryptosporidiosis) and to agriculture and the economy (e.g., Neospora caninum, the cause of foetal abortion in cattle, and Eimeria species, the causative agents of coccidiosis in poultry, cattle, sheep and rabbits). Extensive study of Plasmodium species and T. gondii has established that proteases help to coordinate and regulate the lifecycles of these parasites, playing key roles in host cell invasion, general catabolism, host cell remodelling and egress from host cells [1]. These processes are all associated with the asexual stages of apicomplexan parasites. By contrast, relatively little is known about the roles proteases may play in the sexual phase of the apicomplexan lifecycle, though it is known that, in P. falciparum, a subtilisin 2 is detected specifically in the gametocyte proteome [2] and expression of falcipain 1 is upregulated in gametocytes [3]. Moreover, it has been demonstrated that the cysteine protease inhibitor, E64d, or the targeted genetic disruption of falcipain 1 can inhibit oocyst production in P. falciparum [3,4]. Likewise, the proteasome inhibitors, epoxomicin and thiostrepton, exhibit gametocytocidal activity [5,6].
In comparison to P. falciparum and T. gondii, proteases from Eimeria species have been studied far less intensively, despite the economic importance of this genus of parasites. Nevertheless, homologs or orthologs of several classes of proteases found in P. falciparum and/or T. gondii have also been identified in Eimeria species, including an aspartyl protease [7-10], an aminopeptidase [11], a rhomboid protease [12,13], a subtilisin 2-like protease [10,13,14], three cathepsin Cs [15], a cathepsin L [15] and an orthologue of toxopain, a cathepsin B cysteine protease [14,15]. As for P. falciparum and T. gondii, these proteases have been found in the asexual stages of Eimeria and are mostly predicted to play roles in host cell invasion, though expression of some of these enzymes is associated with the sporulation of the developing oocyst [11,13,15]. However, it is hypothesized that proteolytic processing of two proteins from the wall forming bodies of the macrogametocytes of Eimeria, GAM56 and GAM82, is essential for the subsequent incorporation of tyrosine-rich peptides into the oocyst wall [16].
In this study, we screened the E. tenella genomic database for genes encoding proteases, classified these into clans and families and designed PCR probes for them. Using cDNA produced from E. tenella stage specific mRNA, we carried out semi-quantitative PCR to determine the stage specificity of expression of the protease genes, especially to identify protease mRNAs that were upregulated in gametocytes. In order to further resolve which of these may be involved in oocyst wall formation, we carried out a processing assay using gametocyte extracts of E. tenella, whereby a variety of specific protease inhibitors were tested for their ability to inhibit the processing of GAM56 into smaller, putative oocyst wall proteins.

Identification of potential protease genes in Eimeria tenella
The genome of E. tenella (Houghton strain) was sequenced by the Parasite Genomics Group at the Wellcome Trust Sanger Institute and provided prepublication for the current analysis. The Parasite Genomics Group plans to publish the annotated sequence in a peer-reviewed journal in the near future. The E. tenella genome database (http://www.genedb.org/Homepage/Etenella) was explored to identify genes that were automatically predicted to code for aspartic, cysteine, metallo and serine proteases. Database mining revealed over 60 gene sequences whose predicted open reading frames were associated with potential peptidase activity. Manual annotation of the genes was performed by BLAST search of apicomplexan genome databases to identify phylogenetically closely related nucleotide sequences and by BLAST search of various protein databases to identify the most closely related, experimentally characterized homologs available (Table 1). Additionally, the predicted proteins were analysed for conserved motifs and domains to further validate protein function (Table 1). Each predicted protein was then assigned a five-tiered level of confidence for function using an Evidence Rating (ER) system (Table 1). The evidence rating system, described previously [17], allocates genes an overall score (ER1-5), indicating how compelling the bioinformatic and experimental evidence is for protein function. An ER1 rating signifies extremely reliable experimental data to support protein function in the particular species being investigated, in this case Eimeria, whereas ER5 indicates no experimental or bioinformatic evidence for gene function. Genes with an ER5 rating were eliminated from further investigation.
After this validation process was performed, 45 putative protease genes remained and these could be classified into clans and families of aspartic, cysteine, metallo and serine proteases (Table 1), including: three aspartic proteases, all within family A1 in clan AA; 16 cysteine proteases, the vast majority (15) of which were in clan CA, five being cathepsins (family C1), one calpain (family C2), eight ubiquitinyl hydrolases (family C19) and one OTU protease (family C88), as well as a single clan CF pyroglutamyl peptidase (family C15); 14 metallo proteases, distributed over five clans (MA (6), ME (5), MF (1), MK (1) and MM (1)) and seven families (M1 (2), M41 (3), M48 (1), M16 (5), M17 (1), M22 (1) and M50 (1)); and 12 serine proteases in clan PA (three trypsin-like proteases in family S1), clan SB (six subtilisin-like proteases in family S8), clan SC (one prolyl endopeptidase in family S33), clan SK (one Clp protease in family S14) and clan ST (a rhomboid protease, rhomboid protease 1, in family S54). Three additional rhomboid proteases were identified in the E. tenella genome database by BLASTP searches using, as queries, homologs described in T. gondii: rhomboid protease 3 (ETH_00032220, Supercontig_69: 140161-141340; 4.0e-52); rhomboid protease 4 (ETH_00009820, Supercontig_44: 17996-24858; 9.8e-164); and rhomboid protease 5 (ETH_00040480, contig NODE_916_length_3953_cov_17.775614: 53-3466; 7.2e-65). However, we were unable to confirm coding sequences or stage-specific expression for any of these three genes.
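As a quick consistency check on the classification above, the family-level counts given in the text can be tallied programmatically. The following is a minimal sketch: the nested-dictionary layout and the function name `summarise` are illustrative choices, and the assignment of metalloprotease families to clans (M1, M41 and M48 to MA; M16 to ME; M17 to MF; M22 to MK; M50 to MM) is inferred from the per-clan counts in the text and the MEROPS hierarchy rather than stated explicitly.

```python
# Gene counts per protease type -> clan -> family, transcribed from the text.
# The clan membership of the metalloprotease families is an inference (see lead-in).
PROTEASES = {
    "aspartic": {"AA": {"A1": 3}},
    "cysteine": {
        "CA": {"C1": 5, "C2": 1, "C19": 8, "C88": 1},
        "CF": {"C15": 1},
    },
    "metallo": {
        "MA": {"M1": 2, "M41": 3, "M48": 1},
        "ME": {"M16": 5},
        "MF": {"M17": 1},
        "MK": {"M22": 1},
        "MM": {"M50": 1},
    },
    "serine": {
        "PA": {"S1": 3},
        "SB": {"S8": 6},
        "SC": {"S33": 1},
        "SK": {"S14": 1},
        "ST": {"S54": 1},
    },
}


def summarise(data):
    """Return (genes per type, number of clans, number of families, total genes)."""
    per_type = {
        ptype: sum(n for fams in clans.values() for n in fams.values())
        for ptype, clans in data.items()
    }
    n_clans = sum(len(clans) for clans in data.values())
    n_families = sum(len(fams) for clans in data.values() for fams in clans.values())
    return per_type, n_clans, n_families, sum(per_type.values())


per_type, n_clans, n_families, total = summarise(PROTEASES)
print(per_type, n_clans, n_families, total)
```

The tally reproduces the totals quoted in the text: 3, 16, 14 and 12 genes per catalytic type, 45 genes overall, spread over 13 clans and 18 families.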

Stage-specific protease gene expression
To assess the stage-specific gene expression of putative proteases identified in the E. tenella database, different stages of the parasite lifecycle were isolated and total RNA purified. These stages included merozoites, 134 h gametocytes, unsporulated oocysts and sporulated oocysts, as well as uninfected caeca control tissue. RT-PCR was performed and the stage-specific cDNA samples were subjected to control PCRs to determine purity (Figure 1). Purification of merozoite and gametocyte lifecycle stages inevitably results in co-purification of host tissue; hence, the E. tenella β-actin structural gene was amplified to optimize relative amounts of parasite starting material as described previously [18]. The E. tenella β-actin gene was amplified from each of the parasite lifecycle cDNA samples, and quantification of bands visualized by agarose gel electrophoresis allowed the E. tenella cDNA samples to be standardized relative to each other. The E. tenella gam56 gene product, which is predominantly expressed in gametocytes but largely down-regulated in unsporulated oocysts, confirmed the quality of gametocyte cDNA and served as a gametocyte-specific positive control, establishing the lack of gametocytes in merozoite and oocyst samples. The amplification of the tfp250 gene, specifically expressed in the asexual stages [19], indicated contaminating merozoite cDNA in the gametocyte cDNA sample, as anticipated, at the 134 h time point. Furthermore, amplification of a chicken host-specific lysozyme gene indicated host cDNA was present in both merozoite and gametocyte preparations, also as anticipated.
The E. tenella genome database (www.genedb.org/Homepage/Etenella) was searched for genes predicted to code for proteins with peptidase activity. All autoannotated peptidase genes identified were manually curated by performing BLAST analysis against apicomplexan genome sequence databases and various protein databases [32] such as the Protein Data Bank (PDB), Swiss-Prot and non-redundant (NR). In addition, signature protein motifs for the protein sequence of each gene were identified through Pfam (http://pfam.sanger.ac.uk/search; [33]), InterProScan (http://www.ebi.ac.uk/Tools/pfa/iprscan/) and the MEROPS databases (http://merops.sanger.ac.uk/; [34]). Further gene sequence manipulations, such as translation into amino acid sequences and ClustalW alignments, were performed using the DNASTAR Lasergene™ 9 Core Suite. Genes were assigned a five-tiered level of confidence for gene function using an Evidence Rating (ER) system giving an overall score of ER1-5, where ER1 indicates extremely reliable experimental data to support function and ER5 indicates no evidence for gene function [17].
After optimisation of parasite lifecycle stage cDNA samples, primer pairs were designed to generate PCR products from exons of less than 1 kb in size, where possible. PCRs were performed at optimal annealing temperatures specific for the individual primer pairs and annealing times optimal for predicted cDNA sized products. PCRs were performed at least twice (and normally three times) for each gene product, by different researchers each time. In the case of failed PCRs, primer pairs were redesigned and retested. Results of PCR on the different lifecycle stages of E. tenella indicated that 40 of the 45 protease genes could be amplified from parasite cDNA (Table 2). The five PCRs that failed to amplify a product from cDNA were for three of the eight ubiquitinyl hydrolases, the single OTU protease and one of the six subtilisins. However, it was possible to amplify PCR products from gDNA for all five of these proteases that, when sequenced, confirmed primer specificity (data not shown). The failure to amplify a product from cDNA for these genes may be due to genome annotation problems; possibly the sequence targeted by our primers is not transcribed or falls in unpredicted intronic regions. Alternatively, a low abundance of these transcripts may have contributed to the failure to detect cDNA amplification products. Further work will be required to characterize these genes. All other PCR products from cDNA from the four E. tenella lifecycle stages were directly sequenced to confirm the correct coding sequence. Expected and actual cDNA amplicon sizes and their corresponding sequence accession numbers are shown in Table 2.
The majority of the protease genes were expressed in more than one of the four parasite stages investigated (Table 2). However, stage-specific up- or downregulation of protease gene expression was evident. Thus, taking into account that merozoite cDNA contaminates the gametocyte samples, we conclude that expression of a large number of protease genes (at least 15, probably 17) was upregulated in merozoites, including eimepsin 3, cathepsin C1, calpain, several of the ubiquitinyl hydrolases, an ATP-dependent Zn protease, the CAAX prenyl protease, three of the five insulysins, the leucyl aminopeptidase, the O-sialoglycoprotease, one of the trypsins, a subtilisin, the Clp protease and rhomboid protease 1. Aminopeptidase N1 appeared to be downregulated specifically in merozoites. Gametocyte-specific or gametocyte-upregulated proteases were also common, with thirteen in all, also distributed across the four groups of proteases, including eimepsin 2, cathepsin C2, ubiquitinyl hydrolases 2 and 5, the pyroglutamyl peptidase, aminopeptidase N2, insulysin 4, the S2P-like metalloprotease, two trypsin-like proteases and three of the subtilisins. Additionally, two other proteases were upregulated in, or specific for, the sexual phase of the lifecycle (i.e., gametocytes and unsporulated oocysts), namely, cathepsin C3 and subtilisin 4. Cathepsin L appeared to be downregulated specifically in gametocytes. Only two protease genes, a pepsin-like protease with high homology to eimepsin (eimepsin 1) and an insulysin, were switched on exclusively in oocyst lifecycle stages. In contrast, numerous protease genes appeared to be downregulated in sporulated oocysts (Table 2).
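The proportions behind these counts depend on the denominator chosen: the 45 genes identified in the genome or the 40 that could be amplified from cDNA. The short sketch below simply works through that arithmetic using the counts stated in the text; the variable and function names are illustrative only.

```python
# Counts taken from the text; names are illustrative.
IDENTIFIED = 45       # putative protease genes remaining after validation
AMPLIFIED = 40        # genes for which a cDNA product was obtained
MEROZOITE_UP = (15, 17)   # "at least 15, probably 17"
GAMETOCYTE_UP = 13        # gametocyte-specific or gametocyte-upregulated


def share(n, total):
    """Percentage of `total` represented by `n`, rounded to one decimal place."""
    return round(100 * n / total, 1)


for label, n in [
    ("merozoite-upregulated (lower bound)", MEROZOITE_UP[0]),
    ("merozoite-upregulated (upper bound)", MEROZOITE_UP[1]),
    ("gametocyte-upregulated or -specific", GAMETOCYTE_UP),
]:
    print(f"{label}: {share(n, AMPLIFIED)}% of amplified, "
          f"{share(n, IDENTIFIED)}% of identified")
```

With either denominator, the thirteen gametocyte-associated genes account for roughly 29 to 33% of the set, which is the basis for describing them as almost one-third of the protease genes.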

Protease processing of GAM56
Gametocytes from E. tenella-infected caeca were lysed and immediately incubated with or without protease inhibitors for various lengths of time, and the native GAM56 protein was analysed by SDS-PAGE and western blotting with anti-GAM56 antibodies, as described previously [20,21], to track the disappearance of the protein and determine whether any inhibitors could prevent the degradation observed in the presence of native gametocyte proteases. The precise epitopes recognised by the anti-GAM56 polyclonal antibodies are not known for E. tenella, though there is some evidence, from work with E. maxima [21], that they are located in the conserved amino-terminus of the protein. The anti-GAM56 antibodies are, thus, very useful for sensitive and specific tracking of the degradation of GAM56. No disappearance of GAM56 was apparent after 2, 4, 6, 8, 10, 12 or 16 h (data not shown) but it was obvious at 24 h (Figure 2). The 24 h assay was therefore repeated three times with a comprehensive range of protease inhibitors (Table 3) targeting the four protease families identified in the genome. The aspartyl protease inhibitor, pepstatin A, had no effect on GAM56 disappearance (Figure 2). None of the three cysteine protease inhibitors investigated, Z-Phe-Ala-diazomethylketone (data not shown), N-ethylmaleimide (data not shown) or E64 (Figure 2), inhibited GAM56 disappearance. The serine/cysteine protease inhibitors, chymostatin (data not shown) and leupeptin (Figure 2), inhibited GAM56 disappearance but another inhibitor with the same specificity, antipain, did not (data not shown). The serine protease-specific inhibitors, benzamidine HCl (data not shown), soybean trypsin inhibitor (data not shown) and aprotinin (Figure 2), all inhibited the disappearance of GAM56 but AEBSF did not (Figure 2). The metal chelating agent, EDTA, also inhibited the disappearance of GAM56 but the more specific metalloprotease inhibitors, bestatin and phosphoramidon, did not (Figure 2).
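The inhibitor outcomes described in this paragraph are easier to compare when grouped by target class. The sketch below encodes each reported result as a record and counts, per class, how many inhibitors prevented GAM56 disappearance at 24 h. The class labels follow the wording of the text; the record layout and the function name `blocked_by_class` are illustrative assumptions, and no concentrations from Table 3 are included.

```python
from collections import defaultdict

# (inhibitor, target class as described in the text, prevented GAM56 disappearance?)
RESULTS = [
    ("pepstatin A", "aspartic", False),
    ("Z-Phe-Ala-diazomethylketone", "cysteine", False),
    ("N-ethylmaleimide", "cysteine", False),
    ("E64", "cysteine", False),
    ("chymostatin", "serine/cysteine", True),
    ("leupeptin", "serine/cysteine", True),
    ("antipain", "serine/cysteine", False),
    ("benzamidine HCl", "serine", True),
    ("soybean trypsin inhibitor", "serine", True),
    ("aprotinin", "serine", True),
    ("AEBSF", "serine", False),
    ("EDTA", "metal chelator", True),
    ("bestatin", "metallo", False),
    ("phosphoramidon", "metallo", False),
]


def blocked_by_class(results):
    """Map each target class to (number of inhibitors that blocked, number tested)."""
    tally = defaultdict(lambda: [0, 0])
    for _name, target, blocked in results:
        tally[target][0] += int(blocked)
        tally[target][1] += 1
    return {target: tuple(counts) for target, counts in tally.items()}


for target, (blocked, tested) in blocked_by_class(RESULTS).items():
    print(f"{target}: {blocked}/{tested} inhibitors prevented GAM56 disappearance")
```

Grouped this way, every inhibitor that prevented processing either has serine protease activity within its stated specificity or is the general chelator EDTA, whereas none of the aspartic, cysteine-specific or specific metalloprotease inhibitors had an effect.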

Discussion
Mining of the E. tenella genome database has revealed over 40 protease genes distributed over 13 clans and 18 families of aspartic, cysteine, metallo and serine proteases. Such diversity of proteases is not unusual; indeed, it may be an underestimate of the true number of protease genes in this parasite since other apicomplexan parasites are known to possess substantially more protease genes (Table 4). For example, there are at least 70 in Cryptosporidium parvum, more than 80 in P. falciparum and over 90 in T. gondii, though other apicomplexan parasites possess numbers of protease genes similar to E. tenella. Eimeria tenella also has lower numbers of protease genes than protozoan parasites like Leishmania, Trypanosoma and Trichomonas (though the latter is known to have an unusually expanded genome in general [22] and, apparently, in C1 and C19 cysteine proteases and M8 metalloproteases, in particular; Table 4). But, again, E. tenella has a broadly similar total number of protease genes to Entamoeba dispar and Giardia intestinalis, which are also intestinal parasites. However, the fact that our dataset for E. tenella lacks protease genes for several families, across all four types of proteases, that are represented in all other Apicomplexa and most other protozoan parasites, including A28, A22, C12, C85, C86, C13, C14, C50, C48, M24, M18, M67, S9, S26 and S16, provides reason to believe that some E. tenella protease genes remain unannotated. The apparent stage-specific regulation of protease genes in E. tenella is striking and intriguing. Most investigations of parasitic protozoan proteases have focused on the asexual stages of the apicomplexan parasites, T. gondii and P. falciparum, establishing crucial roles for proteases in host cell invasion, remodelling and egress by the asexual stages of these parasites [1].
Our finding that expression of up to 17 of 40 protease genes examined in E. tenella is upregulated in merozoites further underscores the importance of proteases in the biology of the asexual stages of apicomplexan parasites. Not surprisingly, therefore, an eimepsin, several cathepsins, a calpain, a trypsin-like protease, subtilisins, Clp and a rhomboid protease are upregulated in the asexual stages of E. tenella (Table 2). Likewise, eimepsin 1 and insulysin 3 are expressed specifically in oocysts and may play an important role in the first steps of the parasite lifecycle, such as host cell invasion; they are, therefore, worthy of further research. The downregulation of several proteases (including cathepsin B, ATP-dependent Zn proteases 1 and 3, and a prolyl endopeptidase) in sporulated oocysts may be, in part, attributed to the dormancy of this lifecycle stage, yet still warrants further investigation. Perhaps the most significant finding of our stage-specific expression study was the relatively large number of protease genes whose expression is upregulated specifically in the gametocyte stage: a total of at least 13 genes, including six that are only expressed in gametocytes (Table 2). This observation becomes even more intriguing when examined in the context of the distribution of different families of proteases across parasitic protozoa (Table 4).
Figure 2 The effect of protease inhibitors on degradation of GAM56 in Eimeria tenella gametocytes. A sample of purified E. tenella gametocytes was lysed and equal volumes of lysate added to a range of protease inhibitors (see Table 3 for details on concentrations and specificity) or PBS. A zero time point sample was taken immediately. Other samples were incubated for 24 h at 37°C, after which the assay was halted by addition to Laemmli buffer and the samples subjected to SDS-PAGE and immunoblotting as described previously [20,21] to assess the disappearance of GAM56.
Four families of proteases stand out amongst the protozoa because they are only found, or are "over-represented", in the two coccidian parasites, E. tenella and T. gondii: families C15, M50, S1 and S8. Eimeria tenella contains a total of eleven protease genes distributed unevenly across these families, with only one each in C15 and M50, and three and six in the serine protease families, S1 and S8, respectively. But, even more significantly, all but three of these unique protease genes are upregulated in, or confined in expression to, the gametocyte stage of the parasite. Thus, expression of a pyroglutamyl peptidase, a trypsin-like protease and subtilisin 4 is upregulated in gametocytes whilst expression of an S2P-like protease, a trypsin 1-like protease and three subtilisins is entirely gametocyte specific.

Conclusion
Eimeria tenella possesses a large number of genes coding for proteolytic enzymes, which display a remarkable pattern of stage specific expression. As in other apicomplexan parasites such as P. falciparum and T. gondii, expression of many of these genes is upregulated in the asexual, invasive stages, possibly indicating important roles in host cell invasion, remodelling and egress. However, expression of almost one-third of the protease genes identified in the E. tenella genome is upregulated or confined to the sexual gametocyte stage of this parasite's lifecycle; some of these appear to be unique to Coccidia and may play key roles in the formation of the resilient oocyst wall, a defining feature of this group of important parasites.

Data-base mining
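Eimeria tenella genome sequences and gene models were downloaded from GeneDB (http://www.genedb.org/Homepage/Etenella). The genome of E. tenella (Houghton strain) was produced by the Parasite Genomics Group at the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/research/projects/parasitegenomics/) and has been provided prepublication. The E. tenella genome database was searched for genes predicted to code for proteins with peptidase activity. All autoannotated peptidase genes identified were manually curated by performing BLAST analysis against apicomplexan genome sequence databases and various protein databases [32] such as the Protein Data Bank (PDB), Swiss-Prot and non-redundant (NR). In addition, signature protein motifs for the protein sequence of each gene were identified through Pfam (http://pfam.sanger.ac.uk/search; [33]), InterProScan (http://www.ebi.ac.uk/Tools/pfa/iprscan/) and the MEROPS databases (http://merops.sanger.ac.uk/; [34]). Further gene sequence manipulations, such as translation into amino acid sequences and ClustalW alignments, were performed using the DNASTAR Lasergene™ 9 Core Suite. After the bioinformatic information was collated, genes were assigned a five-tiered level of confidence for gene function using an Evidence Rating (ER) system giving an overall score of ER1-5, where ER1 indicates extremely reliable experimental data to support function and ER5 indicates no evidence for gene function [17].
[36,37]. Aliquots of parasites were either frozen at −80°C as pellets or were stored in TRIzol® reagent (Invitrogen) at −80°C for further use in RNA purification.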

RNA purification, cDNA synthesis and cDNA standardisation
To isolate total RNA, purified merozoites (1 × 10⁷) and gametocytes (1 × 10⁶) were resuspended in 1 ml TRIzol® Reagent and homogenized by pipetting. Unsporulated oocysts (2 × 10⁵) and sporulated oocysts (5 × 10⁵) were resuspended in 1 ml TRIzol® Reagent and one volume of glass beads was added to each sample, which was then vortexed at 1 min intervals until disruption of oocysts was confirmed by bright field microscopy. All TRIzol®-treated samples were left at room temperature for 10 min and total RNA was isolated by chloroform extraction and isopropanol precipitation. RNA was quantified using a NanoDrop™ ND-1000 Spectrophotometer and cDNA was synthesized using SuperScript™ III Reverse Transcriptase (Invitrogen) according to the manufacturer's instructions. Parasite cDNA samples were standardized by relative quantification of an E. tenella β-actin PCR product. β-actin forward primer E0043 (5′ ggaattcgttggccgcccaagaatcc 3′) and reverse primer E0044 (5′ gctctagattagctcggcccagactcatc 3′) were used to generate the 1020 bp β-actin cDNA PCR product. Each PCR reaction contained 50 ng of parasite stage-specific cDNA, 0.2 μM forward primer, 0.2 μM reverse primer, 1 × AccuPrime™ reaction mix, and AccuPrime™ Pfx DNA polymerase (Invitrogen). The PCR reaction was carried out as follows: initial denaturation at 95°C for 3 min, followed by 25 cycles of 95°C for 30 s, 61°C for 1 min and 68°C for 1.5 min, with a final extension at 68°C for 10 min. All products were electrophoresed on a 1% agarose gel and visualized using Gel Red™ (Biotium). The net intensity of each band was determined using the Kodak EDAS 290 Electrophoresis Documentation and Analysis System and serial dilutions were performed until the relative intensity of the PCR products was equal.
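The standardisation step can be pictured as choosing, for each stage, a dilution that brings its β-actin band down to the intensity of the weakest sample. The sketch below illustrates that idea only. It assumes band intensity scales linearly with the amount of parasite template, which is an approximation for end-point PCR and is the reason the dilutions were refined empirically in the procedure described. The intensity values and the function name `dilution_factors` are hypothetical.

```python
def dilution_factors(net_intensities):
    """Fold-dilution per sample needed to match the weakest beta-actin band.

    Assumes (as a simplification) that net band intensity is proportional
    to the amount of parasite cDNA in the reaction.
    """
    if not net_intensities:
        raise ValueError("no samples supplied")
    reference = min(net_intensities.values())
    if reference <= 0:
        raise ValueError("band intensities must be positive")
    return {sample: value / reference for sample, value in net_intensities.items()}


# Hypothetical net intensities (arbitrary units) for the four lifecycle stages.
example = {
    "merozoites": 1200.0,
    "gametocytes": 300.0,
    "unsporulated oocysts": 600.0,
    "sporulated oocysts": 900.0,
}

for sample, fold in dilution_factors(example).items():
    print(f"{sample}: dilute {fold:g}-fold")
```

In practice the computed factor would serve only as a starting point for the serial dilutions, with the β-actin PCR repeated until the band intensities agree.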
In addition, three control genes were amplified to determine the purity of parasite lifecycle stages. The GAM56 gene was used as a gametocyte-specific gene. GAM56 forward primer E0030 (5′ catatggtggagaacacggtgcac 3′) and reverse primer E0031 (5′ ctcgagttagtaccagctggaggagta 3′) were designed to amplify a 906 bp gametocyte cDNA product at an annealing temperature of 61°C. The EtTFP250 gene, a homolog of an E. maxima gene encoding a microneme protein, was used as an asexual stage control. The EtTFP250 forward primer Et250F (5′ gcaaggacgttgacgagtgtg 3′) and reverse primer Et250RV1 (5′ gttctctccgcaatcgtcagc 3′) were designed to amplify an 805 bp cDNA product at an annealing temperature of 60°C. The chicken lysozyme gene was used to determine relative quantities of contaminating host cDNA. The forward primer RW3F (5′ acaaagggaaaacgttcacgattggc 3′) and reverse primer RW4R (5′ tgcgttgttcacaccctgcatatgcc 3′) were designed to amplify a 280 bp host cDNA product at an annealing temperature of 60°C.

Semi-quantitative PCR
The predicted coding regions of each protease gene were examined for potential primer sites within 1 kb of each other where possible. Primers were designed as detailed in Table 5. PCRs were conducted on cDNA samples from E. tenella merozoites, gametocytes, unsporulated oocysts and sporulated oocysts. PCRs were optimized to produce cDNA sized products. Negative controls of no DNA template and host cDNA were run alongside a positive genomic DNA control. When genomic DNA products were not amplified, a repeat PCR was performed at longer annealing times to produce the often much larger genomic DNA product. A typical PCR contained 1 μL of standardized cDNA sample, 0.2 μM forward primer, 0.2 μM reverse primer, 1 × AccuPrime™ reaction mix, and AccuPrime™ Pfx DNA polymerase (Invitrogen). Cycling conditions typically involved an initial denaturation at 95°C for 3 min, followed by 25 cycles of denaturation at 95°C for 30 s, annealing at Tm − 5°C for 1 min and extension at 68°C for 1.5 min. When products were to be sequenced, a final extension at 68°C for 10 min was performed at the end of the PCR reaction. PCRs were performed at least twice and, generally, three times for each gene product, by a different researcher each time.
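The typical cycling conditions can be written out as an explicit programme, which makes the annealing rule and the total incubation time easy to check. The sketch below is illustrative only: the `Step` record and function names are invented for this example, the primer melting temperatures are hypothetical inputs (the text does not say how Tm was calculated), and taking the lower Tm of the pair before subtracting 5°C is an assumption about how "Tm − 5" was applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


def build_program(tm_forward, tm_reverse, cycles=25, final_extension=True):
    """Assemble the typical cycling programme described in the text.

    Annealing is set 5 degrees C below the lower primer Tm (an assumption).
    """
    anneal_c = min(tm_forward, tm_reverse) - 5.0
    cycle = [
        Step("denaturation", 95.0, 30),
        Step("annealing", anneal_c, 60),
        Step("extension", 68.0, 90),
    ]
    program = [Step("initial denaturation", 95.0, 180)] + cycle * cycles
    if final_extension:  # only used when products were to be sequenced
        program.append(Step("final extension", 68.0, 600))
    return program


def hold_minutes(program):
    """Total programmed hold time in minutes, ignoring ramping between steps."""
    return sum(step.seconds for step in program) / 60


# Hypothetical primer pair with melting temperatures of 66.0 and 68.5 degrees C.
program = build_program(66.0, 68.5)
print(program[2], hold_minutes(program))
```

For this hypothetical pair the annealing step comes out at 61°C, and the 25-cycle programme with final extension amounts to 88 min of hold time before ramping is taken into account.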
All amplified products were gel purified using a QIAquick® Gel Extraction Kit (QIAGEN) according to the manufacturer's instructions and sequenced (Australian Genome Research Facility, Queensland). When cDNA products were amplified from different parasite stages, these were pooled and used in sequencing reactions. When cDNA products were not obtained, additional primers were designed and used. If a cDNA product still could not be amplified with the second primer pair, genomic DNA products were sequenced to confirm primer specificity. Sequences were analysed using the DNASTAR Lasergene™ 9 Core Suite.