Comparative genome analysis reveals a conserved family of actin-like proteins in apicomplexan parasites

Background The phylum Apicomplexa is an early-branching eukaryotic lineage that contains a number of important human and animal pathogens. Their complex life cycles and unique cytoskeletal features distinguish them from other model eukaryotes. Apicomplexans rely on actin-based motility for cell invasion, yet the regulation of this system remains largely unknown. Consequently, we focused our efforts on identifying actin-related proteins in the recently completed genomes of Toxoplasma gondii, Plasmodium spp., Cryptosporidium spp., and Theileria spp. Results Comparative genomic and phylogenetic studies of apicomplexan genomes reveals that most contain only a single conventional actin and yet they each have 8–10 additional actin-related proteins. Among these are a highly conserved Arp1 protein (likely part of a conserved dynactin complex), and Arp4 and Arp6 homologues (subunits of the chromatin-remodeling machinery). In contrast, apicomplexans lack canonical Arp2 or Arp3 proteins, suggesting they lost the Arp2/3 actin polymerization complex on their evolutionary path towards intracellular parasitism. Seven of these actin-like proteins (ALPs) are novel to apicomplexans. They show no phylogenetic associations to the known Arp groups and likely serve functions specific to this important group of intracellular parasites. Conclusion The large diversity of actin-like proteins in apicomplexans suggests that the actin protein family has diverged to fulfill various roles in the unique biology of intracellular parasites. Conserved Arps likely participate in vesicular transport and gene expression, while apicomplexan-specific ALPs may control unique biological traits such as actin-based gliding motility.


Background
The phylum Apicomplexa contains several protozoan pathogens that cause severe disease in mammals, including humans. Members such as Plasmodium falciparum, and P. vivax, which cause severe human malaria, and Theileria parva and T. annulata, which are responsible for economic losses in cattle in Africa, result in profound medical, social, and economic effects [1,2]. Others such as Toxoplasma gondii, Cryptosporidium parvum and C. hominis are primarily health threats in HIV+/AIDS and immunosuppressed populations [3].
Apicomplexans are primarily obligate intracellular parasites that rely on actin-based motility for cell invasion [4]. Invasion occurs by active parasite motility that is coupled to timed secretion of proteins from specialized apical secretory organelles, which are a hallmark feature of this phylum [5,6]. The apical secretory organelles (called ALP5a  In addition to conventional actins, apicomplexans contain conserved Arp1, Arp4, and Arp6 proteins; however, they do not encode Arp2 or Arp3 orthologues. Many apicomplexan proteins do not group with any of the known Arp clades. These have been divided further into proteins that are highly conserved among all the apicomplexans (i.e. ALP1, ALP2, ALP3) and those that are organism-specific (i.e. ALP5, ALP7, ALP8 Parsimony analysis of actin and actin-related proteins from apicomplexans and other taxa Figure 2 Parsimony analysis of actin and actin-related proteins from apicomplexans and other taxa. Phylogenetic analysis performed using parsimony resulted in groupings that mirrored distance analysis. All major classes of Arp and ALP groups are maintained except for the Arp4 group, which differs slightly from the BioNJ consensus tree in other eukaryotic taxa as well as in the apicomplexans. Relationships were calculated in PAUP*4.01b using the heuristic algorithm and verified by bootstrapping (>100 replicates). Consensus trees were drawn according to the bootstrap 50% majority-rule. Conventional actins were defined as the out-group. Subgroups of Arps and ALPs have been highlighted to define the boundaries between groups. Taxa are defined in Figure 1. Bootstrap values ≥90% are represented by the black nodes •, values ≥ 75% are represented by the gray nodes , and values ≥50% are denoted by the white nodes ❍.
micronemes, rhoptries, and dense granules) release their contents in a highly regulated fashion upon host cell interactions [7]. Microneme proteins provide adhesion to the host cells and supply the traction needed for invasion. Rhoptry and dense granule proteins function in the establishment and maintenance of a protective, intracellular niche called the parasitophorous vacuole (reviewed in [8]). Understanding how motility and invasion are regulated is crucial to elucidating the pathobiology of these organisms, yet we know relatively little about how these functions are controlled at the cellular level.
Apicomplexans are characterized by a unique cytoskeleton that is distinct from that of other eukaryotes [9]. At their apical end is a specialized microtubule-organizing center called the polar ring complex, which coordinates a series of singlet microtubules called the subpellicular microtubules [10,11]. The remarkable stability of these microtubules provides a defined shape and polarity to the cells that is necessary for motility and invasion [12]. The subpellicular microtubules encompass the apical secretory organelles and may play a role in trafficking to the apical end of the cell. Apicomplexans also regulate their actin cytoskeleton differently, maintaining a large pool of soluble actin, both globular and in short, unstable filaments [13][14][15]. During motility, actin filaments must rapidly assemble to support gliding and then turnover rapidly to prevent unwanted movement. Actin regulation is thus crucial to the control of motility. In other eukaryotes, a large family of actin-related proteins helps control many cytoskeletal functions including vesicle transport and actin-based motility.
Actin-related proteins (Arps) are conserved across all eukaryotes and some prokaryotes. Although all members share a common actin-fold and an overall sequence similarity to actin [16][17][18], individual Arps carry out a variety of biochemical and structural roles in the cell [19]. These include roles in cell division [20], translocation of cargo along microtubules via dynein [21,22], actin polymerization [23], and transcriptional regulation via chromatin/ heterochromatin remodeling [24][25][26]. Currently, more than 11 classes of Arps have been reported from a broad range of eukaryotes including plants, animals, fungi, and protozoans (i.e. Dictyostelium, Acanthamoeba, and Tetrahymena). In each case, the Arp groups link the separate kingdoms both by protein similarity and common biochemical functions. Despite their apparent conservation among the majority of eukaryotes, no Arps have been previously described in the Apicomplexa.
Complete genome sequences have recently been provided for a variety of apicomplexan parasites. A cursory examination of these genomes reveals multiple actins and actin related proteins; however, these have been inconsistently identified and annotated. The complex biology of these parasites led us to examine actin-related proteins in this phylum relative to other eukaryotes using a combination of phylogenetic and reciprocal BLAST analyses. Our findings reveal a complexity of actin-related proteins not previously appreciated and define both conserved and unique members of this protein family within the Apicomplexa.

Phylogenetic comparisons of actin-like proteins in apicomplexans
We searched the recently completed genomes of Toxoplasma gondii, Plasmodium spp., Cryptosporidium spp., and Theileria spp. for actin-related proteins using conventional actins and conserved Arp proteins from organisms spanning several phyla including mammals, plants, flies, worms, yeast, and protozoa [see Additional File 1]. BLAST analysis identified over 60 candidate actin-related proteins in total among the apicomplexan genomes examined in this study (Table 1). Reciprocal BLASTP searches using each of these apicomplexan actin-like proteins against the NCBI CDD database revealed that the majority of them contain a conserved actin domain (pfam00022) ( Table 1). However, at present individual actin-related protein groups have not been defined by distinct domains or motifs common to members of only one group. Consequently, we sought to establish relationships between the apicomplexan actin-like proteins and conventional Arps using sequence alignment and phylogenetic analyses. Candidate actin-related proteins were aligned with a broader spectrum of Arps from a variety of eukaryotic taxa and bacterial actin-like proteins using CLUSTALX [27]. The relative divergence of actin-like proteins was determined by Neighbor-Joining distance analysis using the phylogenetic analysis program PAUP*4.01b [28]. The resulting bootstrapped phylogram is shown in Fig. 1. Parsimony analysis revealed a similar branching pattern for the major Arp groups, but was less able to resolve deep branching groups (i.e. Arp4 and various apicomplexan specific ALPs), likely due to the divergence of these sequences (Fig. 2). We have focused primarily on the relationships supported by distance analysis, since this methodology is more appropriate for highly divergent sequences.
Our analysis reveals that the apicomplexans all encode a single conventional actin (with the exception of Plasmodium which has two conventional actins), and the remaining proteins form a total of 10 distinct actin-related protein groups (Fig. 1). Three of these groups were shown to belong to well-characterized Arps including Arp1, Arp4, and Arp6 ( Fig. 1). In contrast, we discovered that several other apicomplexan actin-like proteins (ALPs) were unique to this phylum, as they did not group with any of the conventional Arps (i.e. ALP1 and ALP2) (Fig.1); therefore, we have used the designation actin-like protein (ALP) to differentiate the apicomplexan-specific protein groups. A comparison of actin-like proteins within the Apicomplexa is summarized in Table 1. The remaining ALPs were specific to a subset of apicomplexans such as Toxoplasma (i.e. ALPs 8, and 9), Cryptosporidium (i.e. ALP7) and Plasmodium (i.e. ALP5). Several of these groups also contain paralogues, for example ALP5a and ALP5b in Plasmodium (Fig.1). While some ALPs appear as deep branches (i.e. TpALP4b, PfALP2b, CpALP7a and CpALP7b, TgALP2a, TgALP3) they were grouped and hence named in part based on BLAST results (Table 1) and phylogenetic analysis of apicomplexan ALPs compared in the absence of other organisms (data not shown). Our findings suggest that some actin-like proteins play roles that are conserved across all eukaryotes while other members of this group have diverged to fulfill specific roles within the Apicomplexa.
The two key features most prominent about the apicomplexan actin-like proteins are their strong conservation of the Arp1 protein (a major component of the dynactin complex) and their remarkable lack of both Arp2 and Arp3 homologues (subunits of the Arp2/3 actin polymerization complex) (Fig. 1). The presence of a highly conserved Arp1 and the absence of Arp2 and Arp3 orthologues have important biochemical implications for these parasites regarding vesicular trafficking and actin regulation, respectively. Arp1 is an essential component of the dynactin complex involved in vesicular trafficking [29,30] while Arp2/3 forms a multi-subunit complex that is the primary means of regulating actin polymerization in eukaryotic cells [31,32]. For these reasons, we conducted a more comprehensive study of the protein components that constitute these complexes.

The dynactin complex
Dynactin is a microtubule-associated complex that is critical for tethering cellular cargo to the cytoplasmic motor protein dynein [30]. Cytoplasmic dynein consists of heavy, light, intermediate, and intermediate light chains in addition to several regulatory subunits [33]. We searched the P. falciparum and C. parvum genomes to identify components of this complex using text word searches.
Convincing orthologues for all of the subunits were found in both parasites as shown by reciprocal BLASTP and the presence of conserved pfam domains (Table 2). These hits were then used to identify orthologues in other apicomplexan genomes by BLASTP as verified by both significant BLAST E-values and the presence of conserved pfam domains (Table 2). Somewhat surprisingly, a complete complex was not readily identified in Theileria, with the exception of subunits for heavy and light chains ( Table 2).
Based on the presence of a conserved dynein complex in a majority of apicomplexans, we thought it reasonable to search for evidence of a dynactin complex. The dynactin complex consists of several protein subunits that are grouped into two domains: the Arp1 rod and a flexible arm region. The protein subunits of the Arp1 rod are more highly conserved between eukaryotes than the remaining dynactin subunits [30]; therefore, we focused our efforts on defining homologues to these proteins in the apicomplexans. The subunits comprising the Arp1 rod include Arp1, Arp11, capping protein (CapZ), p62, p25, p27, and actin (see [30] for a complete review of the dynactin complex).
We used database searches to identify the dynactin subunits within apicomplexans. Sequences from mammals, flies, worms, and protozoa were compared against the NBCI nr database and the respective genomic databases of Toxoplasma, Plasmodium, and Cryptosporidium (Table 3) [see Additional file 1]. Arp1 was readily identified in Plasmodium, Cryptosporidium, and Toxoplasma, although it is apparently absent in Theileria (Table 1). Highly conserved orthologues of the p25, p27, and p62 subunits were found in Toxoplasma, Plasmodium, and Cryptosporidium as shown by both significant BLASTP E-values and the presence of conserved pfam domains ( Table 3).
The Arp1 rod contains a short filament of Arp1 subunits [34] that is capped at both ends. The (+) or barbed end is terminated by capping protein [35] and the (-) or pointed end by the actin-related protein Arp11 [36]. Toxoplasma and Plasmodium both contain β subunits of capping protein, and the α subunit in Plasmodium showed a significant BLASTP E-value and conserved pfam motif ( Table 3). The α subunit reported for Toxoplasma is highly divergent (NCBI AAU93918) and does not have significant matches, although BLASTP searches turn up a number of α subunit orthologues (Table 3). Additionally only the β subunit was identified in Cryptosporidium (Table 3). Capping protein always exists as an α/β dimer [37] and it is possible the α subunit is divergent in Toxoplasma and Cryptosporidium and hence difficult to recognize at present. Our phylogenetic analysis of the Arps did not show strong affinities between any of the ALP proteins and the Arp11 group (Fig. 1). However, we have included TgALP3, PfALP3, and CpALP3 as possible Arp11 orthologues based on their sequence similarity to the Arp11 proteins in BLASTP searches (Table 3).
Dynamitin is a component of the flexible arm region of the dynactin complex [30]. We identified proteins with recognizable dynamitin domains in Toxoplasma and Plasmodium, but not Cryptosporidium (Table 3). The remaining subunits of the dynactin complex were not detected by BLAST or protein domain searches in these organisms. However since these other subunits are less well conserved, failure to detect them by BLAST is not surprising.
The identification of apicomplexan orthologues to all the subunits of the Arp1 rod, and the presence of dynamitin in Toxoplasma and Plasmodium, provides strongly supportive evidence that a functional complex exists in these parasites. Theileria appears to be an exception to this pattern as neither Arp1 or the other subunits were recognized. The conserved complex in parasites likely carries out duties analogous to the dynactin in other eukaryotes. One possible role for this complex would be the directed delivery of secretory protein vesicles as has been described in other systems [29]. Secretory protein trafficking occurs via an ER-Golgi mediated pathway [38] and dynactin could provide the transportation by which cargo vesicles reach their specialized secretory organelles at the apical pole. Apical secretion is an important component of cellular invasion and maintenance of this polarization is thus vital to the survival of the parasite.

The Arp2/3 actin polymerization complex
The Arp2/3 complex consists of 7 subunits that regulate actin polymerization at the leading edge in motile cells [23], as well as providing a propulsive force to move endosomes throughout the cytoplasm [39,40]. Arp2/3 is a major nucleator of actin polymerization in most eukaryotic cells; however, our phylogenetic analyses of the apicomplexan actin-related proteins did not show homologues to either Arp2 or Arp3 (Fig. 1). Notably, Arp2 and Arp3 homologues have been previously annotated in both the Plasmodium and Cryptosporidium genome databases (PfArp3: CAD51790, PfArp2: CAD49164, CpArp3: EAK88375, and CpArp2, EAK88162). These proteins correspond to our annotations PfALP5a, PfALP5b, CpALP7a, and CpALP7b, respectively. Phylogenetic comparisons do not support these previously proposed annotations, but rather indicate that these actin-like proteins are part of other ALP groups (Fig. 1).
A recent analysis of the actin family from model organisms was utilized to derive predictive models for grouping Arp groups in a variety of taxa [41]. Importantly, this analysis also found Arp1, Arp4, and Arp6 homologues among the Apicomplexa (Plasmodium and Cryptosporidium genomes) but failed to identify orthologues of Arp2, Arp3 or other Arp groups [41]. Collectively, these findings indicate apicomplexans do not encode a conserved Arp2/3 complex.
We also searched for the other 5 subunits of the Arp2/3 complex that are known as actin-related protein complex 1 (ARPC1)/p41, ARPC2/p34, ARPC3/p21, ARPC4/p20, and ARPC5/p16. A separately recognized domain is only described for ARPC4/p20 (pfam05856), perhaps reflecting the divergence of the remaining subunits across the many taxa where they are readily identified by BLAST. We conducted genome-wide BLAST searches of apicomplexans as described above using ARPC proteins from mammals, flies, yeast, plants, and protozoa (see Table 4) [see Additional file 1]. No proteins with similarity to subunits ARPC2, 3, and 5 were found in any of the four apicomplexan genomes.
Potential orthologues to the ARPC1/p41 were found in Plasmodium and Cryptosporidium ( Table 3): both of these proteins contain WD40 repeats, which are a distinguishing feature of the ARPC1/p41 proteins in other eukaryotes [42]. This analysis was supported by BLAST and also by protein domain searches using Prosite, which identified WD40 repeat domains in both proteins (Pfam PF00400, SMART domain SM00320). WD40 repeats mediate protein-protein interactions and are involved in regulating numerous biological functions in addition to their role in actin nucleation [43,44]. Since ARPC1/p41 is not necessary for the overall cohesiveness of the Arp2/3 subunits [45], we can hypothesize this protein may serve an alternative function outside of the Arp2/3 complex in Plasmodium and Cryptosporidium.
Surprisingly, Cryptosporidium encodes a conserved ARPC4/ p20 subunit as shown by BLAST analysis and by Prosite domain similarity (pfam PF05856) ( Table 4). In other eukaryotes, ARPC4 forms a stable heterodimeric complex with ARPC2/p34 that comprises the structural core of the Arp2/3 complex [45]. In the absence of ARPC4/p20, Arp2/3 complexes are not formed [45], underscoring its importance to the protein scaffold. It is therefore unusual that Cryptosporidium would retain a close orthologue to one subunit and completely lack the other (Table 4). Additionally, the ARPC2/ARPC4 heterodimer binds actin filaments and is thought to be necessary for branching of daughter filaments from existing mother filaments [45]. Actin in Toxoplasma does not appear to be branched [46], thus it is unclear why Cryptosporidium maintains an ARPC4 homologue ( Table 4).
The presence of remnant ARPC1/p41 homologues in Plasmodium and Cryptosporidium, and ARPC4/p20 in Cryptosporidium indicates that the complex may have been functional at one time in these parasites; however, they either have since lost the complex completely or the subunits have diverged to the extent that they are no longer recognizable. Support for this hypothesis comes from other alveolates, such as the closely-related but deeper branching ciliate lineages [47]. The ciliate Tetrahymena thermophilia encodes a canonical Arp2/3 complex with easily recognizable Arp2 (AAN73249), Arp3

Arp2
- Loss of a functional complex in the apicomplexans may have resulted from their highly specialized, intracellular lifestyles. Deciphering how apicomplexans control actinfilament turnover is thus an intriguing and unanswered question. We postulate that evolution of alternative proteins, such as the ALP1 proteins (Fig. 1), could enable parasites to regulate actin polymerization in a more streamlined mechanism, yet maintain the overall function of the complex.
The ALP1 group of apicomplexan-specific proteins is phylogenetically similar to both the Arp2 and Arp3. ALP1 in Toxoplasma is the second closest paralogue to conventional actin with which it shares 37% identity and 57% similarity (Table 1). Moreover, TgALP1 is 49% identical to PfALP1 and 45% identical to CpALP1, indicating the ALP1 proteins are highly conserved within this phylum (Table 1). These phylogenetic properties, in conjunction with the lack of any obvious Arp2 or Arp3 homologues, lead us to hypothesize that the ALP1 proteins may play a corresponding or complementary role to these two proteins in the apicomplexans.
Chromatin-modifying and -remodeling machinery are involved in DNA replication, DNA repair mechanisms and transcriptional regulation [52]. Arp4 is present in several complexes including the NuA4 histone acetyltransferase and several members of the ATP-dependent SWI2-SNF2 family of chromatin-remodeling complexes [51,53]. In yeast, Arp6 is also a member of SWR1, a subgroup of the SWI2/SNF2 chromatin-remodeling complexes [50]. Other roles for Arp6 include transcriptional deactivation via heterochromatin-remodeling in Drosophila and vertebrates [24].
Changes in gene expression are important means of regulating function and such changes have been shown to play a role in parasite stage-differentiation [54][55][56]. The Plasmodium and Cryptosporidium genomes appear to lack many common transcription factors leading to the hypothesis that these parasites rely heavily on chromatin-remodeling for transcriptional control [57,58]. This is supported by the fact that apicomplexans appear to contain several components of the SWI2/SNF2 ATPase chromatin-remod-eling machinery [57,58]. Recently, Saksouk et. al. showed the first direct correlation of histone acetylation and methylation to stage-specific gene expression in Toxoplasma [59], supporting the importance of chromatin modification and remodeling in these parasites. The presence of conserved Arp4 and Arp6 orthologues suggests that actin-related proteins participate in chromatin remodeling in apicomplexans similar to other eukaryotes.

Conclusion
Comprehensive analysis of the genome content of these parasites combined with phylogenetic groupings has allowed us to propose potential functions for many of these Arp/ALP groups. Our findings indicate that apicomplexans encode a variety of actin-like proteins (ALPs) that likely participate in actin-based motility, vesicle transport, and transcriptional regulation through chromatin remodeling. Delineating their respective functions will ultimately enrich our understanding of these parasites, and also the evolution of the actin family as a whole.

Assembly of actin-like proteins from apicomplexans and other alveolates
Comprehensive BLAST searches were performed against the T. gondii genome database (ToxoDB Release v3.0) using 27 protein sequences from Arp1 through Arp4 that represented major taxa including mammals, plants, flies, worms, yeast, and protozoa. Actin-like proteins were identified in Plasmodium spp. (PlasmoDB Release v4.3) and C. parvum (NCBI nr database and CryptoDB Release v3.0) by combination of tBLASTn and BLASTP searches using the above conserved Arps or Toxoplasma candidate actin-like sequences. BLAST searches were restricted to only high quality "hits" (e-value of ≤ .0001). In the case where only nucleotide data was available, the matching nucleotide sequence was translated using the GENESCAN webserver [60] using Arabidopsis thaliana to predict exon-intron structures. In these cases, the resulting amino acid sequence predictions were used in all subsequent analyses. Once identified, candidate sequences were entered into a "reverse" BLAST search of the NCBI database [61] to determine if there was a reciprocal best match to the protein used to identify it.
Protein candidates from Tetrahymena thermophila were obtained via BLASTP searches of the NCBI nr database comparing Arps from model organisms and by searching the Tetrahymena genome database [62]. Searches of the Tetrahymena genome database were done using tBLASTn and restricted to TIGR predicted proteins.