Genome-wide in silico screen for CCCH-type zinc finger proteins of Trypanosoma brucei, Trypanosoma cruzi and Leishmania major
© Kramer et al. 2010
Received: 4 February 2010
Accepted: 5 May 2010
Published: 5 May 2010
CCCH type zinc finger proteins are RNA binding proteins with regulatory functions at all stages of mRNA metabolism. The best-characterized member, tristetraprolin (TTP), binds to AU rich elements in 3' UTRs of unstable mRNAs, mediating their degradation. In kinetoplastids, CCCH type zinc finger proteins have been identified as being involved in the regulation of the life cycle and possibly the cell cycle. To date, no systematic listing of CCCH proteins in kinetoplastids is available.
We have identified the complete set of CCCH type zinc finger proteins in the available genomes of the kinetoplastid protozoa Trypanosoma brucei, Trypanosoma cruzi and Leishmania major. One fifth (20%) of all CCCH motifs fall into non-conventional classes and many had not been previously identified. One third of all CCCH proteins have more than one CCCH motif, suggesting multivalent RNA binding. One third have additional recognizable domains. The vast majority are unique to Kinetoplastida or to a subgroup within. Two exceptions are of interest: the putative orthologue of the mRNA nuclear export factor Mex67 and a 3'-5' exoribonuclease restricted to Leishmania species. CCCH motifs are absent from these proteins in other organisms and might be unique, novel features of the Kinetoplastida homologues. Of the others, several have a predicted, and in one case experimentally confirmed, connection to the ubiquitination pathways, for instance a HECT-type E3 ubiquitin ligase. The total number of kinetoplastid CCCH proteins is similar to the number in higher eukaryotes but higher than in yeast. A comparison of the genomic loci between the Trypanosomatidae homologues provides insight into the evolution of both the CCCH proteins and the CCCH motifs.
This study provides the first systematic listing of the Kinetoplastida CCCH proteins. The number of CCCH proteins with more than one CCCH motif is larger than previously estimated, due to the identification of non-conventional CCCH motifs. Experimental approaches are now necessary to examine the functions of the many unique CCCH proteins as well as the function of the putative Mex67 and the Leishmania 3'-5' exoribonuclease.
Pathogenic kinetoplastid protozoa, such as the widely studied 'Tritryps' Trypanosoma cruzi (Tc), Leishmania major (Lm) and Trypanosoma brucei (Tb), have complex biphasic life cycles and consequently require changes in gene expression in response to extrinsic and intrinsic signals. For instance, at least 5% of all Tb genes are developmentally regulated at the mRNA level between any two of the experimentally tractable life cycle stages [1–4]. Kinetoplastids regulate protein coding gene expression almost exclusively at the post-transcriptional level with the aid of RNA binding proteins (reviewed in ). One group of RNA binding proteins is defined by the presence of a CCCH type zinc finger motif that directly binds to RNA. Different CCCH proteins regulate all stages of mRNA life. Amongst the best studied are the proteins of the TIS11 family, of which the best characterized is the mammalian protein tristetraprolin (TTP). TIS11 proteins bind to AU-rich elements in the 3' UTRs of their target mRNAs, in most cases mediating their degradation (reviewed in ). The likely mechanism is the recruitment of mRNA degradation factors to the target mRNAs, many of which have been found to interact with TIS11 proteins [7–9]. Other CCCH proteins control the translation of their target mRNAs, for instance the C. elegans protein POS-1 [10, 11]. The Drosophila CCCH protein ZC3H3 regulates mRNA adenylation and nuclear export and also binds to known nuclear export factors . Five Arabidopsis CCCH proteins have been shown to possess intrinsic endonuclease activity, including the orthologue to the polyadenylation specificity factor CPSF30 [13, 14]. CCCH proteins have between 1 and 6 CCCH motifs. These were originally defined as C-X6-14-C-X4-5-C-X3-H  but recently redefined as C-X4-15-C-X4-6-C-X3-H, following the genome wide analysis of the rice and Arabidopsis CCCH proteins .
As part of a project that aimed to understand the regulation of nuclear export in trypanosomes, a putative orthologue to the yeast nuclear export factor Mex67 was identified in T. brucei. The finding of a CCCH motif in the putative Mex67 prompted us to identify and compare the entire set of CCCH proteins in the genomes of the Kinetoplastida. A previous screen for the two most common CCCH motifs (C-X7-C-X5-C-X3-H and C-X8-C-X5-C-X3-H) in the Tritryp genomes identified 50, 68 and 41 CCCH proteins in Tb, Tc and Lm, respectively [17, 18]. In addition, some proteins containing one of the common CCCH motifs also contained a C-X10-C-X5-C-X3-H motif [17, 18]. For ease of reading here, CCCH motifs in the two most common classes C-X7-C-X5-C-X3-H and C-X8-C-X5-C-X3-H will be called 'conventional'; others, such as C-X10-C-X5-C-X3-H, 'non-conventional'. This term is used to highlight a difference and does not mean that they are less likely to be CCCH motifs .
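The distinction between 'conventional' and 'non-conventional' motifs can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the function name `classify_ccch` and the regular expression are assumptions made for illustration, and when a spacer itself contains a cysteine the lazy parse below simply reports the shortest spacers, which may not be the biologically relevant assignment.

```python
import re

# Redefined CCCH definition quoted in the text: C-X4-15-C-X4-6-C-X3-H.
# Lazy quantifiers make the parse prefer the shortest spacers (an assumption).
CCCH_RE = re.compile(r"C([A-Z]{4,15}?)C([A-Z]{4,6}?)C[A-Z]{3}H")

# 'Conventional' classes named in the text:
# C-X7-C-X5-C-X3-H and C-X8-C-X5-C-X3-H.
CONVENTIONAL_SPACINGS = {(7, 5), (8, 5)}


def classify_ccch(motif):
    """Return (Y, Z, label) for a motif C-XY-C-XZ-C-X3-H, or None if it does not fit."""
    m = CCCH_RE.fullmatch(motif.upper())
    if m is None:
        return None
    spacing = (len(m.group(1)), len(m.group(2)))
    label = "conventional" if spacing in CONVENTIONAL_SPACINGS else "non-conventional"
    return spacing[0], spacing[1], label
```

Under these assumptions a C-X10-C-X5-C-X3-H motif is labelled 'non-conventional', while a string that does not fit the redefined pattern at all returns `None`.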
Only three of the CCCH zinc finger proteins were identified as having readily apparent orthologues in other organisms: the splicing factor U2AF35  and two components of the mRNA cleavage and polyadenylation apparatus, CPSF30 and FIP1 [20, 21]. Of the previously identified CCCH proteins unique to kinetoplastids, two families have been experimentally characterized: (i) the ZFP CCCH proteins involved in the regulation of differentiation [22–25] and (ii) the cycling sequence binding proteins (CSBPs) that bind a conserved sequence in S-phase regulated mRNAs [26–28]. The vast majority of the trypanosome CCCH proteins defined by the conventional CCCH motifs appeared to have only one CCCH finger, while nearly two thirds of the Arabidopsis and rice proteins have at least two . The binding of the CCCH protein TTP to AU rich elements is dependent on two intact CCCH motifs; one is not sufficient , and it has been speculated that in trypanosomes such multivalent RNA binding may be achieved by oligomerization, such as occurs between the CCCH proteins of the ZFP family [24, 30].
Here, an extended analysis of the CCCH type zinc finger proteins in the genomes of the Tritryps is presented. The inclusion of non-conventional CCCH motifs into the search increased the fraction of CCCH proteins with more than one CCCH motif to one third and resulted in the identification of many novel CCCH proteins. One example is the putative orthologue to the nuclear export factor Mex67 that has no CCCH motifs in mammals or fungi.
This investigation of CCCH proteins was initiated by an in silico search for a trypanosome homologue of the budding yeast mRNA export factor Mex67 (NXF1 and TAP in mammalian cells, reviewed in ). Using standard BLAST parameters, the protein encoded by Tb11.22.0004 gave the lowest p-value (2.3e-07), and screening the S. cerevisiae proteome with Tb11.22.0004 gave a single hit, Mex67, with a p-value of 1.5e-07 [Additional file 3: Supplemental Figure S2A]. An InterPro search for domains and motifs in Tb11.22.0004 detected the presence of PTHR10662 , characteristic of NXF1-related proteins, and a CCCH zinc finger near the N-terminus. S. cerevisiae Mex67 and mammalian NXF1 do not contain a zinc finger [Additional file 3: Supplemental Figure S2A].
Members of the NXF1 family are generally not very similar to each other; for example, the region of highest identity between the Drosophila melanogaster NXF1 and S. cerevisiae Mex67 (residues 107-598, determined by NCBI blast2seq) is 23% identical. Using the same programme, the region of highest identity between S. cerevisiae Mex67 and Tb11.22.0004 was identified as ~160 amino acids at the N-termini (Mex67 residues 96-248 and Tb11.22.0004 residues 49 to 206) and had 31% identity. The same identity was found between Drosophila NXF1 and S. cerevisiae Mex67 for the same region [Additional file 3: Supplemental Figure S2B]. The closest homologues of Tb11.22.0004 in the other Tritryps, Tc00.1047053506127.20/Tc00.1047053508271.4 and LmjF27.1690, also contain a zinc finger near the N-terminus and have closest homology to the yeast mRNA nuclear export factor Mex67 and its mammalian orthologue TAP/NXF1.
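For readers who wish to see how a percentage identity of this kind is derived from a pairwise alignment, the following Python sketch may help. It is an illustration only and not the blast2seq calculation: the function name `percent_identity` is hypothetical, and dividing by the full alignment length (gap columns included) is an assumption, since alignment tools differ in the denominator they report.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns in which both sequences carry the same residue.

    Both inputs are rows of one pairwise alignment, so they must have equal
    length. Gap columns count towards the denominator (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)
```

With this definition, a region of about 160 aligned columns reaches 31% identity when roughly 50 columns carry identical residues.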
The finding that the putative kinetoplastid Mex67 contains an RNA-binding zinc finger is novel, so Mex67 homologues from other non-Opisthokonta species were investigated. Neither the Mex67 homologues from Dictyostelium discoideum nor from Entamoeba histolytica (both Amoebozoa) contained a CCCH domain. No close Mex67 homologue was readily recognisable in the available genome sequences of organisms in either the Archaeplastida (plants) or the Chromalveolata (Figure 4B). Furthermore, other than in kinetoplastids, Mex67 homologues were not readily recognisable in other Excavata species: Naegleria gruberi, Giardia lamblia and Trichomonas vaginalis. The role of the CCCH motif in the putative trypanosome Mex67 remains unknown, but it might indicate differences in the regulation of mRNA nuclear export between trypanosomes and other organisms.
The Leishmania-specific 3' exoribonuclease, LmjF34.1240, is similar to 3' exoribonucleases of various eukaryotes and the region of homology is not restricted to the exoribonuclease domain but extends over the entire protein [Additional file 4]. The CCCH motif, however, is unique to the Leishmania protein. Several Arabidopsis CCCH proteins possess intrinsic nuclease activity, including CPSF30 and Smic1 [13, 14]. However, neither of these proteins has a clearly defined nuclease domain and the nuclease activity was dependent on one (CPSF30) or two (Smic1) of the CCCH motifs. To our knowledge, the Leishmania protein is the first protein that has both a 3' exoribonuclease domain and a CCCH motif and it would be very interesting to examine whether the CCCH motif is involved in regulating exoribonuclease activity and/or substrate specificity. The exoribonuclease is present in all Leishmania species, but absent from all trypanosome species. It is tempting to speculate that the enzyme might be involved in mRNA regulation via the cis-acting element SIDER (Short Interspersed DEgenerated Retrotransposon). SIDER elements are mainly found in 3' UTRs of Leishmania genes, where they promote mRNA degradation (SIDER2, ) or regulate translation [38, 39]. In contrast, SIDER elements are 70 times less abundant in Tb and usually found in the subtelomeric regions  and do not appear to function as regulatory cis-acting elements of mRNAs.
Ubiquitination requires the ubiquitin-activating enzyme (E1), the ubiquitin conjugating enzyme (E2) and the ubiquitin protein ligase (E3). Substrate specificity is usually determined by the E3 ligase. Three different types of E3 ligases can be distinguished, named after their catalytic domains: the RING type, the (RING-related) U-box type and the HECT type. Substrates can either be monoubiquitinated, multi-ubiquitinated (monoubiquitinated on multiple lysine residues) or polyubiquitinated (carrying a chain of ubiquitins). Depending on the type and place of ubiquitination, ubiquitinated proteins can either be targeted for degradation by the 26S proteasome, or the ubiquitination acts as a signal, for example to change the intracellular localization of the protein.
Among the kinetoplastid CCCH proteins are several with a predicted, and in one case experimentally confirmed, connection to ubiquitination. Both ZFP2 and ZFP3 have a motif upstream of the WW domain that has closest homology to a motif upstream of the WW domain of HECT type E3 ligases of the Nedd4 family [22, 24] (Figure 2). CSBP (ZC3H27) has two different types of ubiquitin interacting domains: UBA (ubiquitin associated domain) and CUE  (Figure 2). UBA domains have highest affinity for polyubiquitin; CUE domains, in contrast, have been shown to bind to monoubiquitin and promote autoubiquitination (reviewed in ). In fact, a fraction of the Leishmania CSBP protein has been shown to be monoubiquitinated and a small fraction is either polyubiquitinated or multi-monoubiquitinated , consistent with the CUE domain mediating autoubiquitination. Two putative E3 ubiquitin ligases are among the Kinetoplastida CCCH finger proteins: one U-box type and one HECT-type E3 ubiquitin ligase (Figure 2).
Proteins with a combination of domains associated with ubiquitination and RNA binding are common in many species ; trypanosomes are no exception. Ubiquitination may play important roles in the regulation of stability or localization of RNA binding proteins. C. elegans oogenesis and embryogenesis provide two good examples: five germline specific CCCH proteins (PIE-1, POS-1, MEX-1, MEX-5, MEX-6) are degraded in somatic cells via binding of the CCCH-finger binding protein ZIF-1 to the CCCH motif and recruitment of an E3 ubiquitin ligase . OMA-1, a CCCH protein involved in asymmetric distribution of determinants in the egg, is degraded during the first zygotic cell cycle via a ZIF-1-independent E3 ubiquitin ligase complex .
Four of the trypanosome CCCH proteins that have a predicted or known connection to ubiquitination are involved in the regulation of the life cycle or cell cycle. In one case, CSBP, ubiquitination of the CCCH protein has been shown. It is possible that ubiquitination is a mechanism to quickly change either stability or intracellular localization of CCCH proteins in response to life and/or cell cycle triggers.
Most CCCH proteins are present in all three Tritryps (Figure 3A); the average amino acid sequence identity between the closest homologues in Tb and Lm is 32% (Figure 2). Nevertheless, there are variations in the CCCH protein content between the Tritryps (Figure 2) and to examine the origin of these differences, the synteny between the genomic loci coding for the Tb and the Lm CCCH proteins was examined.
Nine loci have a CCCH protein gene in only one of the two species (Figure 5F and 5G). For eight loci, regional synteny is still intact, usually with one or two other genes out of synteny in addition to the gene coding for the CCCH proteins (Figure 5F). In one case, synteny was lost (Figure 5G). Are the differences between these loci due to the loss or the gain of the gene in one of the species? Four of the Lm genes are also present in Tc, indicating that the loss of the Tb gene is the more likely scenario, since Leishmanias separated from Trypanosomes before the separation of Tb and Tc. Four of the Lm genes are absent from both Tc and Tb, suggesting that they either arose after the separation of Leishmania from the trypanosomes or were lost after the separation of Leishmania in the common ancestor of Tc and Tb. One gene is present in Tb and Tc, but not in Lm, indicating that it was either lost in Leishmania, or gained in the common ancestor of Tc and Tb.
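The parsimony reasoning in the preceding paragraph can be summarised as a small decision table. The Python sketch below is only an illustration of that reasoning, assuming the branching order stated in the text (Leishmania separated first, followed by the separation of Tb and Tc); the function name `interpret_locus` and the returned phrases are hypothetical, and presence/absence patterns that the text does not discuss are deliberately left uninterpreted.

```python
def interpret_locus(in_tb, in_tc, in_lm):
    """Map presence (True) or absence (False) of a gene in Tb, Tc and Lm
    to the interpretation given in the text."""
    pattern = (in_tb, in_tc, in_lm)
    if pattern == (True, True, True):
        return "present before the species separated"
    if pattern == (False, True, True):
        # Present in Lm and Tc: one loss in Tb is the more likely scenario.
        return "loss in Tb more likely"
    if pattern == (False, False, True):
        # Lm only: the two scenarios cannot be distinguished.
        return "gain in Leishmania or loss in the Tb/Tc ancestor"
    if pattern == (True, True, False):
        # Tb and Tc only: again two scenarios remain possible.
        return "loss in Leishmania or gain in the Tb/Tc ancestor"
    return "pattern not discussed in the text"
```

For example, `interpret_locus(False, True, True)` corresponds to the four Lm genes that are also present in Tc.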
Taken together, the analysis confirms that the majority of the kinetoplastid CCCH protein genes evolved prior to the separation of the species, and the presence of many loci with more than one CCCH protein gene suggests that gene duplication was important in the evolution of CCCH proteins. Differences in the content of the genes coding for CCCH proteins between Tb and Lm can be accounted for by (i) the loss or gain of a single gene or (ii) gene duplication or loss of a previously duplicated gene; in both cases this occurred with only small changes in synteny.
The number of non-redundant CCCH proteins in the Tritryps (48 in Tb, 51 in Tc, 54 in Lm) is similar to the number in higher eukaryotes: Arabidopsis, rice, mouse and human have 68, 67, 58 and 55 predicted CCCH proteins, respectively [16, 44]. The fraction of CCCH proteins with more than one CCCH motif (34%) is about half of the fraction found among the rice (64.2%) or Arabidopsis (63.2%) CCCH proteins.
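Table: Estimation of the numbers of conventional CCCH motifs in protozoa and yeast. Columns: number of CCCH motifs (unfiltered) by Y/Z spacing (C-XY-C-XZ-C-X3-H); total number of CCCH motifs (background corrected); number of genes. Only the row label T. gondii GT1 survives; the table values are not available here.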
This study has identified the entire set of CCCH proteins in the available genomes of the Tritryps; there are 48 in Tb, 51 in Tc and 54 in Lm, excluding redundancy. The main findings are: (i) The fraction of CCCH proteins with more than one CCCH motif is larger than previously assumed; many of these proteins have one non-conventional CCCH motif. (ii) The putative Mex67 orthologue as well as a Leishmania-specific 3' exoribonuclease both have a CCCH motif that is not found in their counterparts in other eukaryotes. Many of the CCCH proteins have a predicted, or in one case experimentally confirmed, connection to ubiquitination pathways. (iii) Kinetoplastids have only slightly more CCCH proteins than some other protozoa, although the number of CCCH proteins is higher than in yeast. (iv) The vast majority of the CCCH proteins are unique to kinetoplastids or to a subgroup within. The majority evolved before the separation of the Tritryps; gene duplication played an important role. Differences in the CCCH protein content between the Tritryps are mainly due to either the loss or gain of a single gene or gene duplication or loss of a previously duplicated gene; in all cases with little disruption of synteny.
The identification of CCCH proteins in this study relies entirely on in silico data. Some of the identified proteins might not be true CCCH zinc finger proteins, whilst others might have been missed. For instance, some putative CCCH proteins were excluded from the final list because the CCCH motif is absent from their closest homologue in one or both of the other Tritryps; they might, however, be true CCCH proteins. Although the majority of CCCH motifs bind RNA, a few examples of DNA binding CCCH motifs have been reported (for example [45–47]); thus, it is possible that some of the identified trypanosome CCCH proteins are not RNA-binding proteins.
Experimental approaches are now needed to verify the in silico data and to examine the function of the many uncharacterized proteins. Of particular interest is the function of a CCCH motif in the putative nuclear export factor Mex67 and in the Leishmania-specific 3'-5' exoribonuclease. Both CCCH motifs are unique features of the Kinetoplastida proteins and might reveal differences to other eukaryotes in mRNA metabolism.
Sequence logos were produced using the software of . All analyses of the Tritryp genomes were performed using the tools at either the Tritryps genome database , GeneDB http://www.genedb.org/Homepage or EBI. Sequence alignments of multiple sequences were done using ClustalW2 with default settings as provided by the server (; http://www.ebi.ac.uk/Tools/clustalw2/index.html). Pairwise alignments were performed using the EMBOSS Needle programme http://www.ebi.ac.uk/Tools/emboss/align/index.html for global alignments and the Water programme  or BLAST2seq for local alignments. Identification of protein domains was by Pfam , SMART [53, 54], InterPro  or Prosite .
The Tritryp genome databases  were searched for CCCH motif containing proteins using a motif search for C-X4-15-C-X4-6-C-X3-H. A sequence logo was created including only C-X7/8-C-X5-C-X3-H motifs that were recognized by at least one of SMART (Sm00356), Pfam (PF00642) or InterPro (IPR000571) (the training set) and are thus very likely to be real CCCH motifs. This consensus motif was then used to arbitrarily define conditions to further filter all CCCH motifs that did not fall into the group of the training set. The stringency of the chosen conditions was tested on the training set and gradually decreased until it included all proteins in the training set. The dataset was further filtered manually to exclude proteins unlikely to contain CCCH motifs using the criteria described in the results section.
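The initial motif search described above can be approximated with a regular expression. The following Python sketch is an assumption-laden illustration rather than the database tool actually used: the function name `find_ccch_motifs` is hypothetical, the lazy quantifiers report only the shortest fit per start position, and the later filtering steps (consensus-based conditions and manual curation) are not reproduced.

```python
import re

# C-X4-15-C-X4-6-C-X3-H as a regular expression over one-letter amino acid codes.
# The lookahead lets candidates that start at different cysteines overlap;
# for each start position only the shortest fit is reported (an assumption).
CCCH_PATTERN = re.compile(r"(?=(C[A-Z]{4,15}?C[A-Z]{4,6}?C[A-Z]{3}H))")


def find_ccch_motifs(protein_seq):
    """Return a list of (0-based start, matched text) for candidate CCCH motifs."""
    seq = protein_seq.upper()
    return [(m.start(), m.group(1)) for m in CCCH_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence, not a real protein: one C-X7-C-X5-C-X3-H motif after "MK".
    toy = "MK" + "C" + "A" * 7 + "C" + "A" * 5 + "C" + "AAA" + "H" + "KK"
    for start, motif in find_ccch_motifs(toy):
        print(start, motif)
```

Such a scan only produces candidates; many hits are expected to be chance occurrences, which is why the text applies further filtering against a training set.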
The dataset contained four genes that were annotated as pseudogenes in the genome databases, two in L. major (LmjF02.0100 and LmjF22.0130) and two in T. cruzi (Tc00.1047053506977.110 and Tc00.1047053511715.50). Both Leishmania pseudogenes have premature stop codons and longer counterparts in other Leishmania strains (see Figure 2). Confirmation of the sequence for the Leishmania major isoforms was obtained from Matt Rogers (Sanger Institute). Tc00.1047053506977.110 also has a premature stop codon and a longer counterpart in T. congolense. Tc00.1047053511715.50 has an internal shift out of frame and then back in again; the CCCH motif is in the out of frame region and is therefore an artefact.
The putative T. brucei Mex67 orthologue (Tb11.22.0004) and NUP54/57 (Tb927.4.5200) were expressed in Trypanosoma brucei Lister 427 procyclic cells as C-terminally tagged eYFP fusion proteins from their endogenous loci as described in . For microscopic imaging, cells were washed once in SDM79 without serum or haem, fixed at a density of 1 × 10^7 cells/ml with 2.4% paraformaldehyde overnight, washed once in PBS and stained with Hoechst H33258. Confocal images were prepared using a BioRad Radiance 2100 on a Nikon Eclipse E800 upright microscope using a 100×/1.4 Oil DIC objective. Transgenic trypanosomes were generated using standard procedures .
CSBP: cycling sequence binding protein
This work was funded by the Wellcome Trust. NCK held a Medical Research Council PhD studentship. We would like to thank Matt Rogers (Sanger Institute) for help with the identification of Leishmania pseudogenes.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.