

A bioinformatic survey of RNA-binding proteins in Plasmodium

Abstract

Background

The malaria parasites in the genus Plasmodium have a very complicated life cycle involving an invertebrate vector and a vertebrate host. RNA-binding proteins (RBPs) are critical factors involved in every aspect of the development of these parasites. However, very few RBPs have been functionally characterized to date in the human parasite Plasmodium falciparum.

Methods

Using different bioinformatic methods and tools, we searched the P. falciparum genome to list and annotate RBPs. A representative 3D model for each RBD identified in P. falciparum was created using I-TASSER and SWISS-MODEL. Microarray and RNA-seq data pertaining to PfRBPs were analyzed using MeV software. Finally, Cytoscape was used to create protein-protein interaction networks for the CITH-Dozi and Caf1-CCR4-Not complexes.

Results

We report the identification of 189 putative RBP genes belonging to 13 different families in Plasmodium, which comprise 3.5 % of all annotated genes. Almost 90 % (169/189) of these genes belong to six prominent RBP classes, namely the RNA recognition motif, DEAD/H-box RNA helicase, K homology, zinc finger, Puf and Alba gene families. Interestingly, almost all of the identified RNA-binding helicases and KH genes have cognate homologs in model species, suggesting their evolutionary conservation. Exploration of the existing P. falciparum blood-stage transcriptomes revealed that most RBPs have peak mRNA expression levels early during the intraerythrocytic developmental cycle, which taper off in later stages. Nearly 27 % of RBPs have elevated expression in gametocytes, while 47 and 24 % have elevated mRNA expression in ookinete and asexual stages, respectively. Comparative interactome analyses using human and Plasmodium protein-protein interaction datasets suggest extensive conservation of the PfCITH/PfDOZI and PfCaf1-CCR4-NOT complexes.

Conclusions

The Plasmodium parasites possess a large number of putative RBPs belonging to most of the RBP families identified so far, suggesting the presence of extensive post-transcriptional regulation in these parasites. Taken together, the in silico identification of these putative RBPs provides a foundation for future functional studies aimed at defining a unique network of post-transcriptional regulation in P. falciparum.

Background

Malaria continues to be a major public health and socio-economic problem in developing countries; in 2013 it still caused 584,000 deaths (http://www.who.int/malaria/publications/world_malaria_report_2014/en/). Multifaceted control efforts, including vector control, early diagnosis, and effective treatment, are directed towards reducing malaria transmission. Artemisinin combination therapies (ACTs), introduced to counter continually evolving multidrug resistance, are now the cornerstone of malaria chemotherapy, but resistance to artemisinin is also emerging and spreading faster than anticipated [1]. As parasites continue to develop resistance to existing antimalarial drugs, research on new antimalarials remains a high priority [2]. One such approach has used systems biology methods in this postgenomic era of Plasmodium research to identify multiple novel pathways in the parasite as potential drug targets [3–5]. Information gleaned from comparative genomic analyses and functional studies has improved our understanding of the parasite’s biology and our ability to design new control measures. Understanding the basic regulatory mechanisms that the parasite has evolved may also help guide future decisions in selecting targets.

The Plasmodium life cycle includes multiple stages with drastically different morphologies in a mosquito vector and a vertebrate host. This sophisticated developmental program requires regulation of gene expression and protein synthesis [6, 7]. Even with the discovery of the AP2-domain-containing specific transcription factors [8], the parasite genome is still relatively deficient in identifiable transcriptional regulators [6], implying that post-transcriptional regulation (PTR) is an important mode of gene regulation. Furthermore, comparative studies examining the parasite’s transcriptomes and proteomes revealed significant lags in protein abundance relative to mRNA abundance [9]. During intraerythrocytic development, the half-life of mRNAs is substantially extended at the schizont stage compared with the ring stage [10]. Translational regulation plays particularly critical roles during parasite transmission, when the parasites must remain relatively quiescent for an extended period of time before transmission occurs [11]. In the stages that are transmitted (gametocytes and sporozoites), many mRNAs needed for subsequent development are kept in a translationally repressed state. Premature expression of these mRNAs leads to considerable defects in development [12, 13]. Altogether, these studies underscore the importance of post-transcriptional control in the development of the malaria parasite.

From transcription to degradation, every step of mRNA metabolism is subject to extensive regulation. Through mRNA maturation, export, subcellular localization, stability, and degradation, RNAs are accompanied by RNA-binding proteins (RBPs) and are thus found as messenger ribonucleoproteins (mRNPs). RBPs also play crucial roles in the processing of stable RNAs such as rRNA, tRNA, snRNA, and snoRNA [14]. The significance of RBPs in translational regulation is underscored by their abundance in diverse eukaryotes. For example, the yeast Saccharomyces cerevisiae encodes ~600 RBPs [15], whereas in humans the number of RBPs is considerably larger, with at least 1000 genes containing the RNA recognition motif (RRM) alone [16]. To date, more than a dozen RNA-binding domains (RBDs) have been identified, and the best-characterized domains include the RRM, RNA helicase, zinc-finger (C3H1 and C2H2), K Homology (KH), Pumilio and Fem-3 binding factor (Puf), and Acetylation Lowers Binding Affinity (Alba) families. While most of our understanding of RBPs and their functions comes from studies of model organisms, their importance in the development of Plasmodium has recently become more appreciated [7, 11, 12, 17–20]. Given the potential roles of RBPs in virtually every aspect of RNA metabolism and in every part of the life cycle of the malaria parasites, we performed a comprehensive in silico analysis of RBPs in the malaria parasite P. falciparum. Many recent studies have also found that some RNA-interacting proteins lack commonly known RBDs [14]; however, in this study we restricted our searches to commonly known RBDs so that only more robust predictions would be made. Using a set of bioinformatic tools, we identified 189 putative RBPs in the malaria parasite genome that contain well-characterized RBDs and provide functional annotation based on homology, domain organization, and expression patterns.

Results and discussion

Using a combination of search strategies, we identified a total of 189 putative RBPs in the P. falciparum genome, including 72 with the RRM, 48 putative RNA helicases, 11 with the KH domain, 2 with the Puf domain, 6 with the Alba domain, 31 with zinc fingers (ZnFs), and 19 from other minor RBP families (Additional file 1). Most of these putative RBPs in Plasmodium lack definitive functional annotations. For functional prediction, each RBP was searched with BLAST against model species, taking into account both the total query sequence coverage against the template and the degree of domain-architecture conservation. This analysis allowed functional predictions for 140 putative RBPs (Additional file 1). While 179 genes are conserved in both Plasmodium vivax and Plasmodium yoelii with clearly identifiable orthologs, 9 genes are absent from P. vivax, P. yoelii, or both (Additional file 1).
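Ortholog calls of the kind summarized above are often approximated from BLAST results with the reciprocal-best-hit heuristic. The sketch below is a simplified stand-in for illustration only, not the phylogeny-based procedure used in this study. It assumes hit tuples have already been parsed from BLASTP tabular output (query ID, subject ID, bitscore), and all gene identifiers in the example are hypothetical.

```python
from typing import Dict, List, Tuple

# (query_id, subject_id, bitscore), e.g. columns 1, 2 and 12 of BLASTP -outfmt 6
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Return the highest-scoring subject for each query (first hit wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> Dict[str, str]:
    """Keep pairs (a, b) where b is a's best hit AND a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {a: b for a, b in forward.items() if reverse.get(b) == a}


# Hypothetical gene IDs, for illustration only.
pf_vs_py = [("pfA", "pyA", 250.0), ("pfA", "pyB", 90.0), ("pfB", "pyB", 40.0)]
py_vs_pf = [("pyA", "pfA", 245.0), ("pyB", "pfC", 120.0), ("pyB", "pfB", 38.0)]
print(reciprocal_best_hits(pf_vs_py, py_vs_pf))  # {'pfA': 'pyA'}
```

Here "pfB" is not paired with "pyB" because the best hit of "pyB" in the reverse search is a different gene. Reciprocal best hits cannot resolve recent paralogs well, which is one reason phylogeny-based orthology is preferred.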

RNA-binding domains and RBPs in Plasmodium

RNA-Recognition Motif (RRM)

The RRM is by far the most versatile and abundant RBD, reported from bacteria to higher eukaryotes. The motif is about 70–90 amino acids in length and contains two consensus RNA-interacting motifs, RNP1 and RNP2. In the protein family database Pfam, RRMs are classified into ten different families based on profile similarities. We used representative sequences from individual RRM families as seeds to perform BLAST and hidden Markov model (HMM) searches of the P. falciparum genome, yielding a final list of 120 RRM domains distributed in 72 proteins (Table 1). The number of RRM proteins in an organism appears to have increased through evolution, with higher-order species having more RRM proteins (Table 2). One exception is Toxoplasma gondii, a close relative of Plasmodium, which encodes more than twice as many RRM proteins as P. falciparum. Compared with model organisms, Plasmodium species encode a number of RRM proteins similar to that of the yeast S. cerevisiae, which has a comparable genome size (Table 2). Five RRM families were found in Plasmodium genomes, whereas the other five families (PF08777, PF10378, PF05172, PF10567 and PF14605) are completely absent. The RRM_1 family is the most abundant with 55 members, followed by RRM_6 and RRM_5 with 10 and 8 members, respectively. The RRM_2 and RRM_4 families have only one member each (Table 1 and Fig. 1). Interestingly, the RRM_2 family is thought to be specific to plants and fungi and is vastly expanded in plants (Table 2). The identification of an RRM_2 family member in Plasmodium suggests that this family in apicomplexans is likely derived from their red algal endosymbiont ancestor.
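HMMER's `hmmsearch` can write one line per domain hit with its `--domtblout` option, which makes per-family counts like those in Table 1 straightforward to tally. A minimal sketch follows, assuming the RRM profiles were the queries and the proteome the target database; the i-Evalue cutoff of 1e-3 and the file name in the comment are illustrative assumptions, not the settings used in this study.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def tally_domtblout(text: str, max_ievalue: float = 1e-3) -> Tuple[Dict[str, Set[str]], int]:
    """Summarize an hmmsearch --domtblout table.

    Returns (family -> set of proteins with at least one accepted domain,
    total number of accepted domain hits).
    """
    proteins_per_family: Dict[str, Set[str]] = defaultdict(set)
    n_domains = 0
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HMMER comment/header lines
        fields = line.split()
        protein = fields[0]           # target name (a protein, for hmmsearch)
        family = fields[3]            # query name (the Pfam profile, e.g. RRM_1)
        i_evalue = float(fields[12])  # independent E-value of this domain hit
        if i_evalue <= max_ievalue:
            proteins_per_family[family].add(protein)
            n_domains += 1
    return dict(proteins_per_family), n_domains


# Usage (hypothetical file name):
# with open("pf_rrm_hits.domtblout") as handle:
#     families, n_domains = tally_domtblout(handle.read())
#     for family, proteins in sorted(families.items()):
#         print(family, len(proteins))
```

Because a protein can carry several RRMs, the total number of accepted domain hits is expected to exceed the number of distinct proteins, as with the 120 domains in 72 proteins reported here.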

Table 1 List of different Pfam- and other profile families used to search RBPs from P. falciparum along with corresponding number of genes found in P. falciparum
Table 2 Comparative abundance of RRMs by Pfam class (including isoforms) across evolutionarily diverse species
Fig. 1

P. falciparum RRMs are divided into five RRM families. a A multiple sequence alignment of 3D structures derived from representative members of each of the RRM families (RRM1-2, 4–6) found in P. falciparum. RRM_4 is highly divergent from the typical RRM classes (RRM_1, RRM_5, RRM_6), followed by RRM_2. b Phylogenetic reconstruction of the evolutionary relationships between RRM families from P. falciparum. Phylogenetic reconstruction using representative domains from multiple PfRRMs failed to resolve the RRM families as expected, which may be due to the small number of domains representing some classes (for example, RRM_2 and RRM_4 have one domain each). c Representative 3D homology models for each RRM family, constructed using the PDB structures 3ucg, 3u1l, 2evz, 1p27 and 3zef as templates for PF3D7_0923900, PF3D7_0515000, PF3D7_0606500, PF3D7_0623400, and PF3D7_0405400, respectively. RRM_4 (PfPrp8) is clearly divergent from the other members at both the primary sequence and structural levels

Comparative inferences drawn from other species show that the presence of single and multiple RRMs in a protein is relatively common across different species [21]. Among the 72 RRM proteins in P. falciparum, 40 contain a single RRM, whereas 32 contain more than one RRM (Table 3 and Additional file 1). In addition, 16 of 72 RRM proteins have one or more of the 10 different types of other protein domains such as WWP repeating motif, Really Interesting New Gene (RING), C3H1 and C2H2 ZnF, G-patch, Suppressor-of-White-Apricot (SWAP), or poly(A) interacting domain (Table 3).

Table 3 The frequencies of occurrence of RRM in single, modular and multi-domain organization in P. falciparum

The average length of the RRM in P. falciparum is 75 aa (range 65–188 aa) (Additional file 2), which is similar to what has been reported in other species. Comparison of the different RRM families in Plasmodium showed that the RRM_4 member, the Prp8 splicing factor, is evolutionarily divergent from the other four families (Fig. 1a). Divergence of RRM_2 and RRM_4 family members from the other three major families is particularly noticeable in the RNA-binding motifs RNP1 and RNP2 (Fig. 1a). Phylogenetic analysis using only the RRM-domain sequences of representatives from the RRM_1–6 families failed to resolve the expected evolutionary relationships; for example, members of RRM_1, RRM_5 and RRM_6 did not form monophyletic clades (Fig. 1b). Nonetheless, models of representative members of the five RRM families conform to the typical RRM organization, containing four antiparallel β strands and two α helices arranged as β1α1β2β3α2β4 (the canonical RRM domain and RNP motifs are illustrated in Additional files 2 and 3), while showing considerable diversity in overall 3D structure (Fig. 1c). For example, the predicted 3D structure of the RRM_4 family member (Prp8) is highly divergent from those of the other families.
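Divergence in RNP1 and RNP2 can be spotted with a quick scan for the commonly cited RRM consensus sequences, RNP1 = [RK]-G-[FY]-[GA]-[FY]-[ILV]-X-[FY] and RNP2 = [ILV]-[FY]-[ILV]-X-N-L. The sketch below is illustrative only: strict regular expressions miss many genuine but non-canonical RNPs (precisely the divergent cases discussed above), and the test sequence is a toy example rather than a Plasmodium protein.

```python
import re
from typing import List, Tuple

# Commonly cited RRM consensus motifs ("." = any residue).
RNP_PATTERNS = {
    "RNP1": re.compile(r"[RK]G[FY][GA][FY][ILV].[FY]"),  # 8 residues
    "RNP2": re.compile(r"[ILV][FY][ILV].NL"),            # 6 residues
}


def find_rnp_motifs(seq: str) -> List[Tuple[str, int, str]]:
    """Return (motif name, 1-based start, matched residues), ordered by position."""
    seq = seq.upper()
    hits = []
    for name, pattern in RNP_PATTERNS.items():
        for match in pattern.finditer(seq):
            hits.append((name, match.start() + 1, match.group()))
    return sorted(hits, key=lambda hit: hit[1])


toy_rrm = "MSKTLFVGNLPAEVTEEDLRKLFGKGFGFVEFENLEDA"  # toy sequence
print(find_rnp_motifs(toy_rrm))
# [('RNP2', 5, 'LFVGNL'), ('RNP1', 25, 'KGFGFVEF')]
```

As in canonical RRMs, RNP2 lies N-terminal to RNP1 (on β1 and β3, respectively).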

Phylogeny-based orthology prediction identified one-to-one orthologs in P. vivax and P. yoelii except in two instances (PF3D7_1119800, PF3D7_1131000), where the orthologs were lost in P. yoelii. Both genes possess an SR domain and are predicted to participate in pre-mRNA splicing and export (Additional file 1). We found no recent duplications or species-specific expansions of RRM genes in any Plasmodium species (i.e., no lineage-specific paralogs), suggesting evolutionary constraints on the independent evolution of the RRM gene family.

Phylogenetic analysis also identified four CUG-BP Elav-like (CELF) proteins and four potential poly(A)-binding proteins (PABPs) in Plasmodium. All CELF proteins have a similar multidomain organization, with RRM domains flanking a variable WW domain, and they might have arisen from two gene duplication events (Table 3). PfCELF1 was recently found to be a nuclear protein that participates in splicing [22]. Comparative bioinformatic analysis with human, Drosophila and Arabidopsis homologs classified the four Plasmodium PABPs into one nuclear and three cytoplasmic PABPs (Additional file 4). One cytoplasmic PABP (PfPABP1c) is evolutionarily conserved, while the other three might have been acquired specifically by Plasmodium species.

Because most of the Plasmodium RRM genes have not been characterized, we predicted their functions using several approaches. Thirty P. falciparum RRM proteins are predicted to participate in pre-mRNA splicing (13 genes), alternative splicing (10), transport (1), ribosome biogenesis (1), RNA degradation (1), translation (2), and post-transcriptional regulation (2). Another 25 genes are predicted to have other cellular functions, while 17 genes are Plasmodium-specific with unknown functions (Additional file 1). Functional analyses are needed to verify these predictions.

RNA helicases

Helicases are ubiquitous in nature and are considered to have evolved from near the very root of the evolutionary tree. Typically, helicases function in the separation of double-stranded RNA, DNA, and RNA/DNA structures in an energy-dependent manner [23]. Based on sequence similarities and domain conservation, helicases are classified into five superfamilies; superfamily II (SFII) is the most studied and most widely distributed in eukaryotes. Major components of SFII are DExD/H (Asp-Glu-x-Asp/His) helicase family members that primarily function in RNA metabolism including chaperoning snRNAs that participate in pre-mRNA splicing [24].

BLAST and HMM searches of the P. falciparum genome using three Pfam helicase families, PF00270 (DEAD/DEAH box helicases), PF00271, and PF12513, retrieved 51, 63 and 1 putative helicases, respectively (Table 1), similar to the number of helicases found in a previous study [25]. We combined all three sets to derive a final set of 63 putative helicases in Plasmodium; the helicases identified using PF00270 and PF12513 were all included in the set identified using PF00271 as the seed. PF12513 is highly conserved from bacteria to eukaryotes, with one gene on average in each species, suggesting an early origin of this family. A previous text-based search of the P. falciparum genome retrieved 60 helicases, 22 of which had DEAD helicase family signatures [25]. Because there are no definitive features for bioinformatically classifying helicases as DNA- and/or RNA-binding, it is generally assumed that the DExD family preferentially binds RNA [26–28]. To circumvent the difficulty of classifying RNA helicases, we performed BLASTp searches with all putative helicases against five model species and trypanosomes to predict their functions. This allowed us to retain 48 helicases as RNA helicases, either because they have an RNA-binding ortholog in other species or because binding to RNA has been confirmed in P. falciparum. Further mapping of the conserved motifs and domains classified 39 of them as DExD/H helicases (Additional file 5), which make up about 80 % of the RNA helicases in P. falciparum. Comparative genomic analysis showed that higher-order species have larger repertoires of helicases than lower-order species, suggestive of lineage-specific evolution of the gene family. However, species at similar levels have comparable numbers of helicases; for example, Plasmodium spp. and Toxoplasma spp. have 60 and 73 helicases, respectively (Table 4).

Table 4 A comparative table of helicases from different Phyla

Of the 48 RNA helicases, 28 contain a single helicase domain, whereas the remaining 20 contain additional domains such as the helicase associated domain (HA2), oligonucleotide/oligosaccharide binding fold (OBNTP/OB fold), SPRY, Suv3, C2HC, S-1 and DSH C-terminal domain (DSHCT) (Table 5). Similar to the conservation of the RRM superfamily in Plasmodium spp., a search of the P. vivax and P. yoelii genomes with all putative helicases detected 1:1 orthologs in both species. Furthermore, each Plasmodium species has 30 DExD and 9 DExH helicases, comparable to the numbers found in humans (36 and 14) and S. cerevisiae (27 and 7) [26]. This, together with evolutionary inferences, highlights the conservation of these helicases across species boundaries. The phylogenetic relationships among the helicases in P. falciparum further substantiate this observation: all tree nodes were consistently supported by high bootstrap values, suggesting an early origin of the helicases and evolutionarily conserved functions (Additional file 6).
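Bootstrap support of this kind is usually obtained by resampling alignment columns with replacement, inferring a tree from each replicate, and recording how often a given clade recurs. The sketch below covers only the resampling and counting steps; tree inference is assumed to be done by an external phylogenetics program, and representing each replicate tree as a set of clades (frozensets of sequence names) is a hypothetical convention chosen for illustration.

```python
import random
from typing import Dict, FrozenSet, List, Set


def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement (same length as the input)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    # The same sampled column indices are used for every sequence, so each
    # replicate column is an intact column of the original alignment.
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def clade_support(replicate_clades: List[Set[FrozenSet[str]]], clade: FrozenSet[str]) -> float:
    """Fraction of replicate trees (given as sets of clades) that contain `clade`."""
    if not replicate_clades:
        raise ValueError("need at least one replicate")
    return sum(clade in clades for clades in replicate_clades) / len(replicate_clades)


aln = {"a": "ACGTT", "b": "ACGAT", "c": "TCGAA"}  # toy alignment
print(bootstrap_replicate(aln, random.Random(1)))
ab = frozenset({"a", "b"})
print(clade_support([{ab}, set(), {ab}, {ab}], ab))  # 0.75
```

Resampling whole columns, rather than individual residues, preserves the site-wise homology assumed by the alignment.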

Table 5 The frequencies of occurrence of RNA helicases in single, modular and multi-domain organization in P. falciparum

To further illustrate the conservation of sequence motifs in RNA helicases in Plasmodium, a representative 3D model of RNA helicases was constructed using PF3D7_0422700 (eukaryotic initiation factor) as a query and ATP-dependent RNA helicase DDX48 (PDB ID: 2hyi) as a template (Fig. 2). All helicases have an evolutionarily conserved core structure made of two RecA-like, tandemly linked domains [29]. These domains possess all conserved residues required for nucleic acid binding (NAB), ATP binding and ATPase activities. At the sequence level, helicases are divided into two domains (Walker A and Walker B) with nine conserved motifs: Q, I, Ia, Ib and II to VI [30]. Alignment of all 48 helicases and mapping of the motif-specific sequence logos onto the 3D structure further confirmed the conservation of sequence and predicted structure (Fig. 2 and Additional file 5). Unlike RRMs, helicases are highly conserved at the primary sequence level.
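The height of each stack in a sequence logo reflects the information content of that alignment column, log2(20) minus the Shannon entropy of the residue frequencies for proteins. A minimal sketch follows; it ignores gap characters and omits the small-sample correction that logo tools commonly apply, so its values are assumptions-laden approximations rather than a reproduction of the logos in Fig. 2.

```python
import math
from collections import Counter
from typing import List


def column_information(alignment: List[str], alphabet_size: int = 20) -> List[float]:
    """Information content (bits) per column: log2(alphabet_size) - entropy."""
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    info = []
    for j in range(n_cols):
        residues = [seq[j] for seq in alignment if seq[j] not in "-."]
        if not residues:
            info.append(0.0)  # all-gap column carries no information here
            continue
        total = len(residues)
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in Counter(residues).values())
        info.append(math.log2(alphabet_size) - entropy)
    return info


motif_ii = ["DEAD", "DEAD", "DEAH", "DEAD"]  # toy motif II column block
print([round(x, 2) for x in column_information(motif_ii)])  # [4.32, 4.32, 4.32, 3.51]
```

Fully conserved positions reach the maximum of about 4.32 bits, while the D/H position at the end of motif II scores lower because it is polymorphic.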

Fig. 2

P. falciparum RNA helicases retain the canonical conserved sequence motifs. a A representative 3D model of an RNA helicase constructed using PF3D7_0422700 (eukaryotic initiation factor) as a query and ATP-dependent RNA helicase DDX48 (PDB ID: 2hyi) as a template. b Categorization of the putative functional roles of RNA helicases in P. falciparum. c The canonical, conserved catalytic RNA helicase domain. The helicase domain is divided into two functional units, Walker A and Walker B, which are further categorized into eight highly conserved sequence motifs named I, Ia, Ib and II to VI. Walker A consists of an ATPase functional portion, while Walker B has roles in ATP hydrolysis and nucleic acid unwinding [24]. The relative conservation of each motif in 42 PfRNA helicases is summarized as sequence logos. The DExD/H signature in motif II is highly conserved, indicating that most of the RNA helicases carry this motif

With regard to the functions of RNA helicases, generally DEAH helicases are involved in pre-mRNA processing, while DEAD helicases participate in ribosome biogenesis [26]. In P. falciparum, PF3D7_1364300, PF3D7_1231600, PF3D7_0917600 and PF3D7_1030100 all have a conserved DEAH domain and are classified as Prp (pre-mRNA processing) proteins. Similarly, almost all of the proteins classified under ribosome biogenesis (Fig. 2 and Additional file 6) have a conserved DEAD domain, indicative of evolutionary conservation of the protein synthesis apparatus. However, numerous exceptions to these rules have been observed, so these classifications should be experimentally confirmed and manually curated.
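The rule of thumb above can be made explicit by classifying each helicase from its motif II sequence. The sketch below assumes motif II has already been extracted from a multiple sequence alignment as a four-residue string; DEAD and DEAH are reported separately from the broader DExD and DExH groups, and the role labels are tentative defaults that, as noted above, require experimental confirmation.

```python
import re

# Motif II as extracted from an alignment, e.g. "DEAD", "DEAH", "DEVH".
MOTIF_II = re.compile(r"DE(.)([DH])")

# Tentative rule of thumb only; many exceptions are known.
TENTATIVE_ROLE = {
    "DEAD": "ribosome biogenesis (tentative)",
    "DEAH": "pre-mRNA processing (tentative)",
}


def classify_motif_ii(motif: str) -> str:
    """Return 'DEAD', 'DEAH', 'DExD', 'DExH' or 'unclassified' for a 4-residue motif II."""
    match = MOTIF_II.fullmatch(motif.upper())
    if match is None:
        return "unclassified"
    middle, last = match.groups()
    if middle == "A":
        return "DEAD" if last == "D" else "DEAH"
    return "DExD" if last == "D" else "DExH"


for motif in ["DEAD", "DEAH", "DEVH", "DEID", "GKTA"]:
    family = classify_motif_ii(motif)
    print(motif, family, TENTATIVE_ROLE.get(family, "no default guess"))
```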

We performed a gene enrichment analysis using the biological process and molecular function annotations available from UniProt (http://www.uniprot.org/). From this analysis, 36 and 10 genes were classified as RNA-binding and mRNA processing, respectively, leaving the remaining members unassigned. However, we could manually assign functions to 70 % of the RNA helicases from P. falciparum: ribosome biogenesis and related processes (17 genes), pre-mRNA processing (9), RNA degradation (3), mRNA turnover (1), genome repair and maintenance (2), and post-transcriptional regulation (2). Consistent with a predominant role of helicases in ribosome biogenesis, 30 of the 39 DExD/H helicases have a DExD signature (associated with ribosome biogenesis), while 9 have a DExH signature (Additional file 5). Whereas 10 genes have homologs of unknown function in model species, two genes (PF3D7_0103600 and PF3D7_1313400) appear to be specific to the Plasmodium group. Although helicases are potential targets for drug design [31], very few of them have been characterized in P. falciparum [32, 33]. One such helicase, DOZI (a homolog of human DDX6 and yeast Dhh1), is essential for zygote development in infected mosquitoes and traffics a substantial portion of the mRNA pool to storage granules [12, 34, 35]. It would be interesting to see whether Plasmodium-specific helicases perform unique functions.
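Annotation-term enrichment of a gene set is commonly assessed with a one-sided hypergeometric test: how likely is it to see at least k annotated genes in a set of n, drawn from a background of N genes of which K carry the term? The sketch below is a generic illustration, not the procedure used for this analysis; the counts in the example are placeholders, and testing many terms would additionally require multiple-testing correction. It needs Python 3.8 or later for `math.comb`.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (e.g. all annotated genes)
    K: background genes carrying the annotation term
    n: genes in the set of interest (e.g. putative RNA helicases)
    k: genes in the set of interest carrying the term
    """
    if not (0 <= n <= N and 0 <= K <= N and 0 <= k <= n):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Sum the probabilities of observing exactly i annotated genes, for i >= k.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


# Placeholder counts: 3 of 3 selected genes annotated, 4 of 10 in the background.
print(hypergeom_upper_tail(3, 3, 4, 10))  # 4/120, about 0.0333
```

Exact integer arithmetic via `math.comb` avoids overflow for genome-scale counts; only the final ratio is converted to a float.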

KH domain

The KH domain was first identified in the human heterogeneous nuclear ribonucleoprotein K (hnRNP K), also known as pre-mRNA-binding protein K, almost two decades ago [36]. The domain is about 70 aa long and primarily binds RNA [36–38]. KH domain proteins have a diverse regulatory portfolio, which includes transcriptional and translational regulation, RNA metabolism, and chromatin remodeling [37, 38].

BLAST and HMM searches of the P. falciparum genome using two different search criteria, Pfam families (PF00013, PF07650, PF13014, PF13083, and PF13184) and superfamilies (SSF54791, SSF54814), identified 19 KH domains in 11 genes (Table 1). Only two Pfam families (PF00013 and PF07650) returned hits, identifying 5 and 1 KH genes, respectively, whereas searches using the two superfamilies revealed five additional genes with KH domains. Phylogenetic analysis of the KH domain genes found that the five genes identified using the two superfamilies formed a monophyletic group (Fig. 3a), composed of members with predictable functions (Fig. 3b). Based on evolutionary origin and secondary structure, KH domains have been classified into two families, type-I and type-II [39]. Type-I occurs mainly in eukaryotes and can form modular structures, while type-II is of prokaryotic origin and mostly occurs alone [39]. Analysis of the domain structure of Plasmodium KH domain proteins revealed 9 type-I and 2 type-II members (PF3D7_1465900, PF3D7_1435800). The 3D homology models constructed using a type-I (PF3D7_1415300) and a type-II (PF3D7_1465900) KH domain illustrate the differences between the two domain types (Fig. 3c). Conservation of these two prokaryotic-type genes, which potentially function in ribosome biogenesis [40], suggests an early origin of the translational machinery. Two genes, PF3D7_0623600 and PF3D7_1435800, occur with other domains (C2HC, MMR_HSR1 and Pduv_EutP) (Additional file 1).
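A hallmark of KH domains is the GXXG loop, often flanked by hydrophobic residues in a commonly cited core consensus resembling [ILV]-[ILV]-G-X-X-G-X-X-[ILV]. The sketch below scans for that simplified pattern only as a quick sanity check; the exact pattern is an assumption, many genuine KH domains deviate from it, and profile HMM searches such as those described above are far more sensitive. The test sequence is a toy example.

```python
import re
from typing import List, Tuple

# Simplified hydrophobic core around the GXXG loop (an assumption, not a
# definitive KH signature). The lookahead lets overlapping matches be reported.
GXXG_CORE = re.compile(r"(?=([ILV][ILV]G..G..[ILV]))")


def find_gxxg_cores(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched residues) for each candidate GXXG core."""
    return [(m.start() + 1, m.group(1)) for m in GXXG_CORE.finditer(seq.upper())]


print(find_gxxg_cores("AKLIIGKGGETIKRL"))  # [(4, 'IIGKGGETI')]
```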

Fig. 3

PfKHs are divided into two gene families based on their evolutionary origin and sequence conservation. a A phylogeny showing two monophyletic clades created from Pfam- and Superfamily-based retrievals. b Categorization of the functional roles of KH domain genes in P. falciparum. c Representative 3D models of type-I and type-II KH domains constructed using PF3D7_1415300 and PF3D7_1465900 as queries and 2anr and 4d61 as templates, respectively. The typical secondary structures of type-I (β1α1α2β2 β’α’) and type-II (α’β’β1α1α2β2) KH domains are marked on the models

Functional annotation through BLASTp searches showed that seven of the eleven KH domain genes have well-defined homologs in model species, allowing better prediction of their potential roles. Two KH domain genes are predicted to function in mRNA processing, three in ribosome biogenesis, one each in poly(A) binding (PF3D7_1415300) and poly(rC) binding (PF3D7_0605100), and one in splicing (Fig. 3b). Interestingly, a recent study of the KH domain gene PF3D7_1011800 indicated that it is a novel specific transcription factor [41]. This is plausible, since some KH domains interact with both RNA and ssDNA [38]. Similar to other RBPs, all the KH domain genes have orthologs in P. vivax and P. yoelii. We failed to detect homologs outside Plasmodium for four KH domain genes, implying genus-specific evolution of KH proteins in malaria parasites.

Puf domains

Puf is named after the two founding members, Pumilio in Drosophila and FBF (fem-3 binding factor) in Caenorhabditis elegans. Puf proteins represent an evolutionarily conserved class of translational repressors found in a wide range of eukaryotic species and have diverse functions in processes such as sexual differentiation and development, stem cell maintenance and neurogenesis [42, 43]. The Puf domain typically consists of eight homologous repeat units of about 36 amino acids each. Puf domains form a modular structure that can interact with eight ribonucleotides, with each repeat recognizing a single base. Two Puf proteins, Puf1 and Puf2, have been identified in all sequenced Plasmodium species [7] (a Puf domain-only alignment of PfPuf1 and PfPuf2 is shown in Additional file 7). Homology modeling of the two Puf domains in P. falciparum showed a modular structure consistent with the typical Puf domain structure (Additional file 7). Puf1 and Puf2 have been shown to regulate sexual development and the transition from the mosquito vector to vertebrate hosts [11, 44]. Genetic deletion of Puf2 in P. berghei and P. yoelii leads to severe defects in sporozoite morphology and transmissibility, misregulation of mRNA transcript abundance, and in some cases altered male/female gametocyte ratios [12, 19, 45]. Overexpression and knockdown of PfPuf2 in P. falciparum repressed and elevated gametocytogenesis, respectively [46]. A study by Miao et al. showed that PfPuf2 regulates translationally repressed transcripts by interacting with Puf-binding elements (PBEs) located in both the 3′ and 5′ untranslated regions [18]. That study was the first to underscore the importance of 5′ UTRs in post-transcriptional regulation by Puf proteins, which now prompts investigation of additional regulation by PfPufs.
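Candidate PBEs in 5′ and 3′ UTRs can be located with a simple motif scan. The sketch below uses UGUAHAUA (H = A, C or U), the consensus commonly used for Pumilio-family binding sites in other eukaryotes; whether the PBEs recognized by PfPuf2 follow this exact consensus is an assumption here, and the input sequences are toy examples. DNA input is converted to RNA before scanning.

```python
import re
from typing import List, Tuple

# Pumilio-family consensus UGUAHAUA (H = A, C or U). The lookahead lets
# overlapping occurrences be reported.
PBE = re.compile(r"(?=(UGUA[ACU]AUA))")


def find_pbes(utr: str) -> List[Tuple[int, str]]:
    """Return (1-based start, site) for each candidate PBE in a UTR sequence."""
    rna = utr.upper().replace("T", "U")  # accept DNA or RNA input
    return [(m.start() + 1, m.group(1)) for m in PBE.finditer(rna)]


print(find_pbes("aaTGTAAATAcc"))        # [(3, 'UGUAAAUA')]
print(find_pbes("UGUACAUAGGUGUAUAUA"))  # [(1, 'UGUACAUA'), (11, 'UGUAUAUA')]
```

Sites found this way are only candidates; binding would still have to be confirmed experimentally, as was done for PfPuf2.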

Alba

The Alba domain, formerly known as Sso10b, was first identified and characterized in a hyperthermophilic archaeon [47]. Recent studies confirmed its presence in all domains of life. Previous studies characterized four Alba proteins (Alba1–4) in Plasmodium, which showed functional similarities to the canonical forms identified in Sulfolobus spp. [20, 48]. Using PF01918 and profile searches against the P. falciparum genome in HMMER, we identified two new members (PfAlba5: PF3D7_0216200 and PfAlba6: PF3D7_1202800) (Fig. 4a). PfAlba6 is highly diverged from the rest of the group, with only limited sequence identity to other Plasmodium Alba proteins (Fig. 4b and c). Phylogenetic reconstruction showed that PfAlba1–2 and PfAlba3–4 formed two separate monophyletic clades, leaving the newly identified Albas as singletons (Fig. 4a). Interestingly, of the four previously known genes, three have homologs of unknown function in Arabidopsis, suggesting their evolutionary conservation. BLAST searches with a relaxed E-value threshold (10) failed to identify homologs outside Apicomplexa, suggesting possible lineage-specific evolution of PfAlba5 and 6. It will therefore be interesting to determine the functions of these putatively novel genes in Plasmodium species. To map the conserved nucleic acid binding interface of PfAlbas, Alba domain-only sequences were aligned and residues conserved at the 70 % consensus level were extracted and mapped; this showed that the amino acid positions putatively interacting with DNA/RNA are also conserved in PfAlba5 and 6 (Fig. 4b). A 3D model of PfAlba2 (PF3D7_1346300) built with the archaea-specific DNA-binding protein (PDB ID: 2h9u) as the template showed 27 % identity over 77 % query coverage (Fig. 4a). Alba domains typically form a homodimer of two 10 kDa subunits. The predicted PfAlba2 model showed the conserved feature of an extended β-sheet hairpin loop [47].
PfAlba proteins exist as a single domain as well as in association with other functional domains, such as the RGG box, an RNA-binding motif found in PfAlba1 and 2 [20]. Alba proteins are conserved, with corresponding orthologs in other Plasmodium species (Additional file 1).
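A 70 % consensus of the kind used to map conserved Alba residues can be computed column by column: a position is reported as a single residue if that residue reaches the threshold, and otherwise as the smallest physicochemical class whose members together reach it. A minimal sketch follows. The class definitions are patterned on the Fig. 4 key (taking positive residues as H, K and R and small residues as A, C, D, G, N, P, S, T and V), the one-character class symbols are an assumed convention, and gaps count toward the denominator; the tool actually used for Fig. 4b may differ in these details.

```python
from collections import Counter
from typing import Dict, List

# Residue classes (symbol -> members); symbols are an assumed convention.
RESIDUE_CLASSES: Dict[str, str] = {
    "-": "DE",              # negative
    "l": "ILV",             # aliphatic
    "+": "HKR",             # positive
    "u": "AGS",             # tiny
    "a": "FHWY",            # aromatic
    "c": "DEHKR",           # charged
    "s": "ACDGNPSTV",       # small
    "p": "CDEHKNQRST",      # polar
    "b": "EFIKLMQRWY",      # big
    "h": "ACFGHIKLMRTVWY",  # hydrophobic
}
# Try the most specific (smallest) classes first.
CLASSES_BY_SIZE = sorted(RESIDUE_CLASSES.items(), key=lambda kv: len(kv[1]))


def consensus(alignment: List[str], threshold: float = 0.7) -> str:
    """Per-column consensus: residue, class symbol, or '.' if nothing reaches threshold."""
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    out = []
    for j in range(length):
        counts = Counter(seq[j].upper() for seq in alignment if seq[j] not in "-.")
        if counts:
            residue, n = counts.most_common(1)[0]
            if n / n_seqs >= threshold:
                out.append(residue)
                continue
        symbol = "."
        for class_symbol, members in CLASSES_BY_SIZE:
            if sum(counts[r] for r in members) / n_seqs >= threshold:
                symbol = class_symbol
                break
        out.append(symbol)
    return "".join(out)


toy = ["GKDA-", "GKDA-", "GKDL-", "GKDL-", "GKDL-",
       "GKELA", "GREWA", "GREWA", "GRLWA", "GRLWA"]
print(consensus(toy))  # G+-b.
```

Counting gaps in the denominator is a conservative choice: a column that is mostly gaps cannot reach the threshold even if its few residues are identical.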

Fig. 4

A comparison of identifiable Alba proteins in P. falciparum. a A representative 3D model of an Alba domain constructed using PF3D7_1346300 as a query and 2h9u as a template, and a phylogenetic reconstruction of PfAlbas showing that Alba1, 2 and Alba3, 4 form monophyletic groups. b A multiple sequence alignment of the Alba domain sequences from PfAlba1–6. The predicted secondary structural elements (arrow = alpha helix, block = beta strand) are illustrated, and residues conserved at 70 % consensus that putatively interact with nucleic acids are highlighted. Key to the color-coded and highlighted amino acid letters: negative DE; aliphatic ILV; positive HKR; tiny AGS; aromatic FHWY; charged DEHKR; small ACDGNPSTV; polar CDEHKNQRST; big EFIKLMQRWY; hydrophobic ACFGHIKLMRTVWY. The same color code is applied to the rest of the alignments in this manuscript. c A matrix of the percent identities for pairwise comparisons of PfAlbas 1–6

The Alba domain has been implicated in transcriptional and translational regulation through its ability to bind both DNA and RNA and its association with Sir2 [49, 50]. Functional annotation of PfAlbas is not possible based on homology searches of the genomes of model organisms. Whereas homologs of Alba1–3 with unknown functions were found in Arabidopsis, we did not identify homologs of Alba4–6 in model organisms even after relaxing the search parameters, suggesting lineage-specific evolution. Similar to the canonical Alba proteins, PfAlba1–4 were reported to bind both DNA and RNA [20, 48]. Several Alba proteins from Apicomplexa (including Plasmodium) were reported to be involved in diverse cellular functions, such as binding and regulating their own transcripts, regulating transcription through chromatin condensation, and post-transcriptional regulation of mRNAs involved in development [49–51]. PfAlba1 is essential for asexual erythrocytic development and binds ~30 % of the trophozoite transcriptome, regulating the timing of translation [52]. Yeast two-hybrid data revealed interactions between PfAlba3 and PfAlba4. Similar observations were made for Toxoplasma TgAlba2 and TgAlba1, where the former depends on the latter for expression [51]. In P. berghei, PbAlba1–4 were associated with the DOZI and CITH translational repression complexes, confirming their roles in Plasmodium RNA biology [13].

Zinc finger domain

Zinc finger (ZnF) domains are small protein domains present in all forms of life and are among the most studied domains in transcription factors. The functional versatility of ZnF-containing proteins arises from the modular structure of ZnFs, which can occur in multiple copies and in different forms. At least 46 different types of ZnFs have been identified in mammalian transcriptomes [52]. ZnFs are classified into groups based on structural similarities, including the number of zinc ligands and the number and arrangement of cysteine (C) and histidine (H) residues surrounding one or more zinc atoms [53]. ZnFs can bind DNA, RNA, or protein, and the spacing between two ZnF domains on a protein critically influences these interactions. The best-characterized RNA-binding ZnFs are the C2H2 and C3H1 forms, which fold to create RNA-binding surfaces composed of α-helices and aromatic side chains [54].

Using various Pfam and other profile families as seed sequences (Table 1), we retrieved a total of 31 putative RNA-binding ZnF proteins, of which 20 and 11 belong to the C3H1 and C2H2 forms, respectively. Both forms coexist with other protein domains: the RRM, RING, YTH, and PWI domains (C3H1) and the CactinC and RANB2 domains (C2H2) (Additional file 1). Based on homology searches, functional annotation was possible for eight of the eleven C2H2 genes; five may be involved in splicing and two in ribosome biogenesis. For 18 of the 20 C3H1 genes, specific functions could not be ascertained because of a lack of orthologs in model species (Additional file 1).
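The C2H2 and C3H1 arrangements of cysteine and histidine residues described above can be illustrated with a simple pattern scan. The Python sketch below is not the profile-based (Pfam/HMM) retrieval used in this study; it swaps in simplified regular-expression consensus patterns, C-x(2,4)-C-x(12)-H-x(3,5)-H for C2H2 and C-x(4,15)-C-x(4,6)-C-x(3)-H for C3H1, which are assumptions chosen for illustration and will miss divergent or degenerate zinc fingers. The function name `count_znf_motifs` and both test peptides are hypothetical.

```python
import re

# Simplified consensus patterns (assumptions for illustration only):
#   C2H2: C-x(2,4)-C-x(12)-H-x(3,5)-H
#   C3H1 (CCCH): C-x(4,15)-C-x(4,6)-C-x(3)-H
ZNF_PATTERNS = {
    "C2H2": re.compile(r"C.{2,4}C.{12}H.{3,5}H"),
    "C3H1": re.compile(r"C.{4,15}C.{4,6}C.{3}H"),
}


def count_znf_motifs(protein_seq: str) -> dict:
    """Count non-overlapping matches of each simplified ZnF pattern in a protein sequence."""
    seq = protein_seq.upper()
    return {name: len(pattern.findall(seq)) for name, pattern in ZNF_PATTERNS.items()}


# Hypothetical peptides: one C2H2-like repeat and one CCCH-like repeat.
c2h2_like = "CPECGKAFSQSSNLQKHQRTH"
c3h1_like = "CRTFSESGRCRYGAKCQFAH"

print(count_znf_motifs(c2h2_like))
print(count_znf_motifs(c3h1_like))
```

A profile HMM, as used in the study, scores residue preferences at every position rather than only the spacing of C and H residues, which is why it is the more sensitive choice for genome-wide retrieval.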

Other potential RBDs

In addition to the major RBDs described above, we identified several minor RBP families including proteins containing the pseudouridine synthase and archaeosine transglycosylase (PUA) domain, YT521-B homology, S-1 motif, SWAP (Suppressor-of-White-APricot domains), PWI, and G-patch motif. All these minor domains have predicted orthologs in P. vivax and P. yoelii genomes.

The PUA domain is a compact 67–94 aa motif frequently found in RNA modification enzymes and nucleoproteins [55]. It is also common in other proteins with functional roles in translation and ribosome biogenesis [55]. Our analysis revealed five PUA-containing genes (Additional file 1). Functional annotation of these genes indicates potential roles in tRNA and rRNA post-transcriptional modification and maturation, RNA methylation, and translation initiation. In Plasmodium, the PUA domain coexists with the S-adenosyl methionine domain (important for methylation functions) and the DKCLD domain (an N-terminal domain of dyskerin-like proteins that is associated with a TruB_N/PUA domain variant).

The YTH (YT521-B homology) domain, a member of the PUA domain superfamily, constitutes a new class of RBP in eukaryotes [56] and was first identified and characterized in the YT521-B protein [57]. The domain is typically 100–150 aa long and is rich in aromatic residues, reminiscent of the RRM and PUA domains [56]. It has been implicated in alternative splicing and in preventing untimely meiosis in yeast by degrading meiosis-specific transcripts during vegetative growth [58]. Two genes in the P. falciparum genome (PF3D7_0309800 and PF3D7_1419900) encode this domain together with other putative RBDs such as the C3H1 ZnF (Additional file 1). In silico functional annotation suggests that the YTH domain may participate in modulating alternative splicing, mRNA cleavage and polyadenylation in P. falciparum.

The S1 motif was first identified in E. coli ribosomal S1 protein and exhibits an evolutionarily conserved nucleic acid binding OB (oligonucleotide/oligosaccharide binding) structural fold [59]. The S1 motif in P. falciparum was found to co-exist with other RBDs such as KH and RNA helicase domains. These proteins may be involved in pre-mRNA processing, ribosome biogenesis and translation in Plasmodium (Additional file 1).

The SWAP domain was first identified in Drosophila splicing regulators. Pfam searches of the P. falciparum genome revealed two genes with SWAP domains, namely PF3D7_1474500 (splicing factor 3A) and PF3D7_1402700 (pre-mRNA splicing factor). While PF3D7_1474500 has two SWAP domains, PF3D7_1402700 has one SWAP domain and one RRM (Additional file 1).

The PWI domain is another RNA-binding domain first reported in splicing factors [60, 61]. Of the three PWI-containing genes in P. falciparum, one (PF3D7_0610200) also has an N-terminal RRM domain. PWI genes may play roles in splicing and alternative splicing in Plasmodium (Additional file 1).

The glycine-rich nucleic acid binding domain called G-patch was first described by Aravind and Koonin [62]. We identified three G-patch genes (PF3D7_1454000, PF3D7_1110300, and PF3D7_0531400) in P. falciparum genome. Only PF3D7_1454000 is associated with an RRM (Additional file 1).

Functional roles of Plasmodium RBPs

RBPs are at the center of RNA metabolism and involved in all aspects of RNA biology. Based mostly on homology with RBPs in model organisms with known functions, we manually annotated the predicted functions of some putative RBPs in Plasmodium and categorized them into various cellular processes.

RBPs in splicing

Splicing of precursor mRNAs is carried out by a specialized, massive ribonucleoprotein (RNP) complex termed the spliceosome, which is highly conserved in eukaryotes. The spliceosome consists of five small nuclear ribonucleoproteins (U1, U2, U4/U6, U5 snRNPs) and non-snRNP proteins such as the serine/arginine-rich (SR) family proteins [63]. Although splicing in Plasmodium remains to be fully characterized [64], some conserved components of the splicing machinery have been identified [31, 48, 65–67], including five snRNAs [66, 68] and 28 RBPs with putative functions in pre-mRNA splicing (Table 6). Among them, 13 and 6 proteins belong to the RRM and RNA helicase families, respectively. All of the major spliceosome initiation factors, namely U2AF65, U2AF35, SF1, SF3b, pre-mRNA processing (Prp) 5, Prp28, SF3A3, SNRPC, ZRANB2, and Snu23, are encoded by the Plasmodium genome. In addition, proteins involved in proofreading of the splicing and joining processes, such as Prp16, Prp22, and Prp43, were also identified in the Plasmodium genome [69] (Additional file 1). PfPrp16 has been shown to bind RNA and to hydrolyze ATP in the presence of its helicase-associated domain (HA2) [70].

Table 6 List of genes and their putative functions involved in splicing mechanism in P. falciparum

Alternative splicing creates multiple transcripts from a single gene, thus contributing to the diversity of the cellular proteome without the need for genomic expansion. Whereas 95 % of multi-exon genes in humans have more than one transcript isoform, alternative splicing in P. falciparum occurs to a much lesser extent [64, 71–73]. RNA-seq analyses of P. falciparum transcriptomes found evidence of alternative splicing in about 300 genes [64, 71]. Through bioinformatic analysis, we identified 13 genes in P. falciparum with predicted roles in alternative splicing (Table 6), most of which belong to the SR (7 genes) and CELF (4 genes) families. SR family proteins have RRM domain(s) and arginine-serine repeats. Two SR genes in P. falciparum (PfSrrm1 and PfRSrrm3) were shown to bind RNA [68, 79], and PfSrrm1 was predicted to regulate alternative splicing [74]. PfSF2, a homolog of serine/arginine-rich splicing factor 1 (AF1), or pre-mRNA-splicing factor SF2 (SF2), was predicted to function in alternative splicing in P. falciparum and affected parasite proliferation in erythrocytes [74]. The CELF/Bruno-like family RBPs regulate pre-mRNA splicing and alternative splicing in the nucleus, as well as mRNA deadenylation and translation in the cytoplasm [75–77]. Of the four Plasmodium CELF family genes, PfCELF1 has been shown to function in pre-mRNA processing [22]. The polypyrimidine tract binding proteins (PTBPs), a family of proteins containing multiple RRM domains, regulate alternative splicing by binding to polypyrimidine regulatory tracts in introns [78, 79]. While at least two PTBPs are encoded in the human genome, we identified only one PTBP-like protein, PfPTBP1, in the P. falciparum genome (Table 6).

RNA maturation, exon-exon junction complex formation and mRNA shuttling

RNA maturation in eukaryotes includes 5′ methyl capping and 3′ poly(A)-tailing of mRNAs. These processes are predicted to be conserved in malaria parasites. For example, PF3D7_1419900 is a homolog of the 30 kDa subunit of human cleavage and polyadenylation specificity factor (CPSF), an RNA-binding endonuclease that plays a role in 3′ processing of pre-mRNA [80]. Following complete maturation, export of mRNAs to the cytoplasm is achieved by a special mRNP complex termed the exon-exon junction complex (EJC) [81, 82]. It comprises mRNA export factors (Aly/REF, TAP, Upf3b, UAP56 [67]) and nonsense-mediated mRNA decay (NMD) surveillance components (Y14 and Magoh). Our analysis identified homologs of all known components of both the EJC and NMD complexes; however, their predicted functions have yet to be confirmed in P. falciparum, except for PfUAP56, which was shown to harbor RNA-binding and helicase activities that depend on glycine 181, isoleucine 182 and arginine 206 [67].

RBPs in ribosome biogenesis and translation initiation

Ribosome biogenesis in eukaryotes involves the processing of rRNAs, assembly of the 40S and 60S subunit precursors in the nucleus, and export of the precursors to the cytoplasm. Many of the non-ribosomal factors that drive this process belong to energy-consuming enzyme families, including the ATP-dependent RNA helicases. Comparative genomic analyses using yeast proteins involved in ribosome biogenesis identified 14 P. falciparum helicases with potential roles in this process (Table 7). Interestingly, homologs of all but one (Dbp9p) of the yeast helicases involved in ribosome biogenesis were identified in Plasmodium. These helicases are further divided into eight and nine helicases involved in small-subunit and large-subunit pre-processing, respectively. As with other RBP classes, all of these homologs remain to be experimentally characterized in P. falciparum (Table 7).

Table 7 A list of genes and their putative functions involved in ribosome biogenesis in P. falciparum

RBPs in genome repair and maintenance

Genome repair and maintenance are crucial for genome integrity. Based on a homology search, we identified two RBPs in the P. falciparum genome with putative functions in genome maintenance. Human DDX1 is reported to be activated by phosphorylation in response to DNA double-strand breaks. DDX1 has RNase activity towards single-stranded RNA as well as ADP-dependent RNA-DNA and RNA-RNA unwinding activities [83, 84]. The putative Plasmodium DDX1 homolog (PF3D7_0521700) is conserved, sharing 29 % identity at 93 % total gene coverage. Another gene, PF3D7_0623700, has a C-terminal domain resembling the yeast Suv3p protein, which is associated with mitochondrial genome stability [85, 86].

RBPs in RNA granules, degradation and translational regulation

RNA granules (stress granules, storage granules, P-bodies, P-granules) formed under stress and non-stress conditions provide a well-conserved means for a cell to regulate its gene expression. Although they all regulate RNA homeostasis, their compositions and functions differ. Moreover, the classification and functional assignment of these granules is fluid, as they are now thought to exist in a continuum and are only loosely defined by the presence or absence of various protein and RNA components [87]. Classically, stress granules form in response to different stressors, for example glucose depletion. Stress granules typically contain translation initiation factors (eIF2, eIF3, eIF4G, eIF4A, eIF4B, and eIF4E) and PABPs [88]. Putative components of stress granules, the exosome, and processing bodies (P-bodies) found in the P. falciparum genome are listed in Table 8. Few of these proteins have been experimentally shown to associate with granules in Plasmodium, so experimental confirmation is warranted.

P-bodies are seen in both the presence and absence of stress, and their composition is likely independent of the stressor. P-bodies differ from stress granules in that they contain mRNA degradation proteins that decap and deadenylate transcripts. There are 13 core, canonical P-body proteins, including XRN1, HCCR4, DCP1, DCP2, and eIF4E [89–91]. In Plasmodium, BLASTp alignments identified predicted orthologs of DCP, RCK1, LSM1-7, XRN1, and Rap55 (11 of the 13 core components) (Table 8). The predicted DCP1 and DCP2 proteins share homology with the DCP1 superfamily domain and the NUDIX domain, respectively, strengthening these assignments. In contrast, no DCPS ortholog was identified even with relaxed search parameters. RCK, which is also a decapping activator, has been identified in Plasmodium.
These proteins, which likely traffic to cytosolic granules, are important for the development and transmission of the parasite. During the development of eukaryotes, many mRNAs are stored in a translationally repressed state in storage granules, such as the P-granules of metazoan germ cells. Similarly, P. berghei gametocytes produce a P-granule-like storage granule that contains the RNA helicase DOZI, the Sm-like factor CITH, PABPs, a Bruno homolog, a Musashi homolog, and four Alba proteins [13]. Moreover, the DOZI complex was found to associate with a substantial portion of the transcripts present in gametocytes [35]. The components of this RNA granule are highly conserved across Plasmodium species.

Table 8 The inferred contents of exosomes, P-bodies, and stress granules in Plasmodium species. The composition of RNA granules in Plasmodium was inferred by BLASTp queries using the amino acid sequences of components of exosomes, P-bodies, and stress granules from model organisms (D. melanogaster, S. cerevisiae, C. elegans) against known and predicted Plasmodium amino acid sequences. Other Plasmodium proteins that traffic to granules but cannot be definitively placed in a currently annotated granule type are listed separately. Gene identifiers for these proteins in three commonly studied malaria species (P. falciparum, P. vivax, P. yoelii) were obtained from PlasmoDB.org

RNA degradation is largely initiated through the removal of the poly(A) tail by the deadenylation complex Caf1-CCR4-Not. In eukaryotes including Drosophila, Saccharomyces, and Homo sapiens, the core Caf1-CCR4-Not complex is conserved [92]. The various subunits of the Caf1-CCR4-Not complex contribute functionally in different ways, including deadenylation of transcripts, RNA processing, nuclear export, translational repression and feeding into the DNA damage response [91, 93, 94]. Through a BLASTp search, we identified nine potential members of the Plasmodium Caf1-CCR4-Not complex (Table 8). These include the scaffold protein Not1, the deadenylases Caf1 and an HCCR4-like protein, as well as CNOT4 and CNOT3, which are responsible for ubiquitination and chromatin modification, respectively. Only Caf1 has been genetically characterized in P. falciparum: disruption of PfCaf1 by the piggyBac transposon resulted in mistimed expression of transcripts, abnormal expression of merozoite invasion proteins and a slight growth defect in blood-stage cultures [95]. The Caf1-CCR4-Not complex is important for tasks ranging from deadenylation to ubiquitination, and may be differentially employed by Plasmodium to progress through its complex life cycle.

The eukaryotic exosome consists of multiple subunits and plays an essential role in RNA quality control, turnover and processing, including 3′-to-5′ mRNA degradation. In Plasmodium, we found eight predicted subunits that align through BLASTp with common eukaryotic exosome components (Table 8). Rrp6 and Rrp44, the two active exoribonuclease components of the eukaryotic complex, are also present. An RBP (PF3D7_0903400) with a putative exosome-related function has also been identified; it is a homolog of human DDX60 and yeast Ski2 [96].

Transcriptomic analysis of RBPs

Analysis of the time-course transcriptomes of RBPs during malaria parasite development revealed several interesting features [71, 97–99]. Hierarchical clustering and K-means analysis of RNA-seq data showed that 44 % (81) of RBP genes had correlated expression profiles: their expression was detected during the early ring stage, peaked at the early and/or late trophozoite stage, and decreased at the early schizont stage (Fig. 5). Similarly, analysis of microarray data for the intraerythrocytic developmental cycle (IDC) showed that 73 % (127) of RBP transcripts were at their peak expression levels at the ring or trophozoite stage. The abundance of most RBP transcripts (67 %, 111 genes) was suppressed during the schizont stage. This expression pattern is consistent with increased metabolic activity in trophozoites. While 27 % (51) of RBP genes showed elevated expression at gametocyte stage II or V, 44 % (81) of RBP genes were expressed in multiple stages. About 24 % (44) of RBP genes were upregulated during the IDC. Interestingly, several genes (PF3D7_0103600, PF3D7_0504200, PF3D7_0807100, PF3D7_1021500, and PF3D7_1307300) with putative or predicted functions in translation or translational regulation have elevated expression during the gametocyte stage. Consistent with previous observations, PfDOZI (PF3D7_0320800) and PfDhhx (PF3D7_0807100) had higher expression at the gametocyte stage (Fig. 5). Of the 48 RNA helicases, five are upregulated in ookinetes (PF3D7_1459000, PF3D7_1021500, PF3D7_0821300, PF3D7_0602100 and PF3D7_0508700), whereas the others conform to the general transcriptional program, with reduced transcription at the schizont stage.
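Percentages such as the share of RBP transcripts peaking at ring or trophozoite stages amount to assigning each gene to the time point of its maximum expression and counting. A minimal Python sketch of that tally is shown below, assuming one expression value per stage per gene; the function names and toy profiles are hypothetical and are not the microarray or RNA-seq data analysed here.

```python
from typing import Dict, List, Sequence, Set

STAGES = ["R", "ET", "LT", "S"]  # ring, early trophozoite, late trophozoite, schizont


def peak_stage(profile: Sequence[float], stages: List[str]) -> str:
    """Return the stage with maximal expression (the first such stage wins ties)."""
    if len(profile) != len(stages):
        raise ValueError("profile and stage list must have the same length")
    best = max(range(len(stages)), key=lambda i: profile[i])
    return stages[best]


def fraction_peaking_in(
    profiles: Dict[str, Sequence[float]], stages: List[str], target: Set[str]
) -> float:
    """Fraction of genes whose expression peak falls in any of the target stages."""
    if not profiles:
        return 0.0
    peaks = [peak_stage(p, stages) for p in profiles.values()]
    return sum(stage in target for stage in peaks) / len(peaks)


# Hypothetical toy profiles (arbitrary expression units).
toy_profiles = {
    "gene_a": [5, 9, 3, 1],
    "gene_b": [2, 3, 8, 1],
    "gene_c": [1, 2, 3, 7],
    "gene_d": [9, 1, 1, 1],
}
share = fraction_peaking_in(toy_profiles, STAGES, {"R", "ET", "LT"})
print(f"{share:.0%} of genes peak at ring or trophozoite stages")
```

Calling the peak on raw maxima is sensitive to noise in flat profiles; a clustering-based assignment, as used for Fig. 5, groups genes by overall profile shape instead.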

Fig. 5

A heatmap of the expression profiles of PfRBPs throughout the blood and sexual stages. Each identified RBP is plotted in a single row, and the experimental data for each time point are provided as columns (R, ring; ET, early trophozoite; LT, late trophozoite; S, schizont; GII, gametocyte stage II; GV, gametocyte stage V; O, ookinete). Groups with similar expression profiles identified by hierarchical clustering are marked with braces to the right of the heatmap

It is noteworthy that of the 28 single-RRM genes (Table 3), 13 are upregulated at the gametocyte stage. Notably, PF3D7_1126800 and PF3D7_0205700 both lack homologs in model species and showed remarkably specific elevated expression in young and mature gametocytes. PF3D7_1320900 encodes a putative peptidyl-prolyl cis-trans isomerase, which catalyzes cis-trans isomerization of peptide bonds preceding proline residues, and was expressed at higher levels in gametocytes. A Plasmodium-specific gene, PF3D7_1139100, showed higher expression at the ring and merozoite stages but was virtually undetectable in other stages. Most of the 21 genes containing two RRMs (Table 3), however, had a uniform expression pattern across life stages, except for two genes [PF3D7_0414500 (Musashi homolog 1) and PF3D7_1119800 (AFS-1)] that had notably higher expression during the gametocyte stage.

Even though the Plasmodium transcriptome generally shows rigid, just-in-time expression patterns, and ribosome profiling demonstrates that mRNA abundance correlates with translational efficiency, many mRNAs do not fit within these bounds [100]. Therefore, RBP candidates, especially those whose mRNA levels are enriched in a stage-specific manner, merit further investigation to determine their downstream roles in gene regulation.

Predicted protein-protein interaction network of RBPs in Plasmodium

Because ~40 % of P. falciparum genes still await functional characterization, predicting their functions may benefit from high-throughput analyses such as coexpression analysis and protein-protein interaction network analysis [101–103]. Similar analyses have been conducted in P. falciparum and have proven informative [104]. Based on the available data and a protein pull-down analysis of DOZI and CITH in P. berghei [13], we attempted to construct a protein network for the P. falciparum orthologs using these data along with yeast two-hybrid data and interactome information retrieved from the STRING database, with a combinatorial search strategy that included co-occurrence, co-expression and text-mining of the published literature (Fig. 6a). CITH and DOZI are two important core components of an ancient P-granule in Plasmodium that protect quiescent mRNA from degradation in gametocytes [13, 34]. This complex also contains Albas, eIF4E, PABP, Bruno, Musashi, enolase, and phosphoglycerate mutase. A total of 155 interactions were mapped, with DOZI and CITH topping the list with 29 and 20 interactions, respectively (Fig. 6a). Gene enrichment analysis of hits obtained from the pull-down study suggested possible direct control over cell division, the glycolytic pathway and translation. To assess the evolutionary preservation of the interacting partners of CITH and DOZI, we interrogated the interologous network information available for their human counterparts. A total of 407 interactions (DOZI, 350; CITH, 57) were obtained, of which ~35 were common to both human and P. berghei, supporting an ancient origin and evolutionary conservation of the P-granules (Additional file 8).

Fig. 6

Predicted protein-protein interaction networks. a A bioinformatically predicted protein interaction network for the PfCITH and PfDOZI complexes. An interactome network for PfCITH and PfDOZI is provided, where protein-protein interactions (PPIs) that provide a larger contribution to the predicted network are represented with larger fonts and nodes. b As in Panel a, a predicted Caf1-CCR4-NOT complex interaction network for P. falciparum based on the PPIs found in human interactome is illustrated. The major nodes are highlighted with the functional description (for example, HCCR4). Note that these interactions warrant experimental confirmation

Similarly, we constructed an interactome network for another important complex that governs post-transcriptional regulation, the PfCaf1-CCR4-NOT deadenylation complex (Fig. 6b). Currently, no studies have described the composition of this complex in Plasmodium species. Hence, we used published information on the human Caf1-CCR4-NOT complex to derive the corresponding homologs in P. falciparum (Additional file 9). The interologous network for the human genes was then extracted, and the final gene set was searched against the P. falciparum genome using BLASTp at E-value <0.1. Of the 1090 interactions studied, 774 (71 %) have homologs in P. falciparum, suggesting extensive conservation of the interacting partners of this complex. Channeling these hits into PlasmoDB, we extracted and enriched gene ontology terms for biological processes. Most of the 774 predicted proteins of the Pf interactome fall under primary metabolic process (GO:0044238), which branches into lipid metabolic process (GO:0006629), protein metabolic process (GO:0019538), carbohydrate metabolic process (GO:0005975), tricarboxylic acid cycle (GO:0006099), nucleobase-containing compound metabolic process (GO:0006139), and cellular amino acid metabolic process (GO:0006520), suggesting extensive interactions of the complex (Additional file 9). All protein network analyses performed in this study are based purely on extrapolation from information found in human or P. berghei, and the data presented here should be interpreted with those qualifiers.

Conclusions

Post-transcriptional regulation is a critical means by which the malaria parasite controls its developmental processes, and RBPs are fundamental elements of this process. Very few PfRBPs have been functionally characterized experimentally, leaving a large portion without functional assignments. About 80 % of the 189 retrieved PfRBPs were assigned putative functions using literature searches and in silico methods. Most of these genes are predicted to be involved in pre-mRNA processing (42 genes) and ribosome biogenesis (29 genes), and a few are predicted to function in cytosolic granules and as translational regulators. About 60 % (25 genes) of the 42 RBPs involved in pre-mRNA processing belong to the RRM family, while 55 % of the 29 RBPs participating in ribosome biogenesis are from the RNA helicase family, suggesting that a large fraction of these RBP families is devoted to these two basic functions. Transcriptome analyses of RBPs show both stage-specific enrichment of transcripts and mixed-curve expression profiles, suggesting that complex cues are involved in their regulation. Some components of pre-mRNA processing and ribosome biogenesis, which are thought to be essential for these basic processes, show stage-specific enrichment of mRNA levels. Because most PfRBPs have no experimentally defined functions, these data may help prioritize a subset of genes for studies aimed at better understanding the basic biology of the parasite.

Methods

Database search for sequence retrieval

A multipronged search strategy was employed to retrieve putative homologs of RNA-binding protein (RBP) genes from public databases. First, a text-based search was performed against PlasmoDB version 12.0 (http://plasmodb.org/plasmo/) [105]. For example, to identify RBPs with a zinc finger (ZnF)-like domain, the keywords “RNA-binding” followed by “Zinc finger” were used. Similarly, the keywords RRM, RNA helicase, Puf, K homology, Alba, PUA, S-1, YTH, PWI, SWAP and G-patch were used in quotes to search for genes containing RNA recognition motifs, RNA helicase domains, Pumilio-homology domains, K homology domains, Alba (Acetylation Lowers Binding Affinity) domains, pseudouridine synthase and archaeosine transglycosylase (PUA) domains, S-1 motifs, YT521-B homology domains, PWI domains, Suppressor-of-White-APricot (SWAP) domains, and G-patch motifs, respectively. Second, a hidden Markov model (HMM) for each RNA-binding domain was constructed with hmmbuild from the HMMER 3.0 package [106], using a reference set of genes annotated in the text-based search. Multiple sequence alignments were performed using the MUSCLE program with default parameters [107]. The resulting HMM profiles were then used to run hmmsearch (http://hmmer.janelia.org/search/hmmsearch) against the P. falciparum genome. As a final strategy, the Pfam IDs of each putative RBD (Additional file 1) were used to search PlasmoDB. The genes retrieved by these analyses were combined, and duplicates retrieved by multiple search strategies were removed to arrive at the final list of putative RBPs.
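The final merging step can be sketched in Python as a set union that also records which strategy retrieved each gene. This is an illustration only: the strategy labels and input lists are hypothetical (the gene IDs are simply ones mentioned in this article), and real PlasmoDB or HMMER outputs would first need to be parsed into plain gene identifiers.

```python
from typing import Dict, Iterable, Set


def merge_search_results(results: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Union gene IDs from several search strategies, recording which strategies found each."""
    merged: Dict[str, Set[str]] = {}
    for strategy, gene_ids in results.items():
        for gene_id in gene_ids:
            key = gene_id.strip()  # tolerate stray whitespace from parsed files
            if key:  # skip blank entries
                merged.setdefault(key, set()).add(strategy)
    return merged


# Hypothetical search outputs from the three strategies.
search_hits = {
    "text": ["PF3D7_1346300", "PF3D7_0320800"],
    "hmmsearch": ["PF3D7_0320800", " PF3D7_1474500"],
    "pfam_id": ["PF3D7_1474500"],
}
final_list = merge_search_results(search_hits)
for gene_id in sorted(final_list):
    print(gene_id, sorted(final_list[gene_id]))
```

Keeping the provenance set, rather than only a deduplicated list, makes it easy to see which genes were supported by more than one strategy.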

Domain mapping and confirmation

To define the protein domain organization of the putative RBPs, sequences were subjected to domain profiling using the Simple Modular Architecture Research Tool (SMART) [108] and the Conserved Domain Database (CDD) search tools [109]. While SMART searches use the underlying SMART database of manually annotated protein profiles [110], the NCBI-CDD search covers multiple databases, including CDD profiles v3.13. In addition, the CDD uses protein 3D structures in conjunction with primary sequences to classify domains into superfamilies [109]. Where possible, the superfamily of each identified domain was used to predict RBP function, in addition to annotations derived from homology searches (see below).

Functional annotations

Functional assignment of the genes predicted to encode RBPs was achieved by combining existing annotations from PlasmoDB v12.0, protein BLAST searches of GenBank [111], literature searches, and domain superfamily classification from CDD searches. BLASTp was carried out against the reference sequences of five model organisms, Saccharomyces cerevisiae (taxid: 4932), Caenorhabditis elegans (6239), Arabidopsis thaliana (3702), Drosophila melanogaster (7227) and Homo sapiens (9606), and of Trypanosoma cruzi (5693), using the following parameters: word size 3; BLOSUM62 substitution matrix; gap opening penalty 11 and gap extension penalty 1. Because Plasmodium genes are often interspersed with low-complexity regions (LCRs), LCR filtering was enabled in the BLAST algorithm parameters to minimize the impact of these regions on the results. To avoid false functional assignments due to partial sequence matches, we performed reciprocal searches against Plasmodium genomes using sequences from model species or trypanosomes, and applied more stringent criteria (≥40 % identity to the query protein and ≥80 % coverage of the target gene) to assign specific functions. In certain cases, the criteria were relaxed if orthologs from more than one model species had a similar functional assignment, and when protein homology extended beyond the functional unit of the query protein. When no homologs were found in model species, a relaxed search was performed with a less stringent E-value threshold (e.g. 10), and its use is noted where applied in this study.
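The reciprocal-search and threshold logic above can be sketched as a post-processing step on BLASTp hits. The Python below is a minimal sketch, assuming the hits have already been parsed from tabular BLAST output into records with percent identity, percent coverage of the target and bit score; it treats a reciprocal best hit (by bit score) as the reciprocity test and applies the 40 % identity and 80 % coverage cut-offs to the forward hit, which is an assumption about how the criteria were combined. The `Hit` record, function names and gene identifiers are hypothetical, and the BLAST search parameters themselves are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Hit:
    query: str
    subject: str
    pct_identity: float      # percent identity of the alignment
    subject_coverage: float  # percent of the target (subject) length covered
    bitscore: float


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep the highest-scoring hit for each query."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


def passes_stringent(hit: Hit, min_identity: float = 40.0, min_coverage: float = 80.0) -> bool:
    return hit.pct_identity >= min_identity and hit.subject_coverage >= min_coverage


def reciprocal_assignments(forward: Iterable[Hit], reverse: Iterable[Hit]) -> Dict[str, str]:
    """Map each Plasmodium gene to a model-species gene when the two are reciprocal
    best hits and the forward hit meets the identity/coverage thresholds."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    assigned: Dict[str, str] = {}
    for query, hit in fwd.items():
        back = rev.get(hit.subject)
        if back is not None and back.subject == query and passes_stringent(hit):
            assigned[query] = hit.subject
    return assigned


# Hypothetical parsed hits (Plasmodium -> model species, and back).
forward_hits = [
    Hit("pf_gene_1", "yeast_A", 52.0, 91.0, 310.0),
    Hit("pf_gene_1", "yeast_B", 35.0, 60.0, 120.0),
    Hit("pf_gene_2", "yeast_C", 29.0, 93.0, 250.0),  # fails the 40 % identity cut-off
    Hit("pf_gene_3", "yeast_D", 61.0, 85.0, 400.0),
]
reverse_hits = [
    Hit("yeast_A", "pf_gene_1", 52.0, 90.0, 305.0),
    Hit("yeast_C", "pf_gene_2", 29.0, 90.0, 248.0),
    Hit("yeast_D", "pf_gene_9", 70.0, 95.0, 450.0),  # best reverse hit is a different gene
]
print(reciprocal_assignments(forward_hits, reverse_hits))
```

Genes that fail here would, under the relaxation rules described above, be re-examined manually rather than discarded outright.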

Multiple sequence alignments and phylogenetic reconstruction

All multiple sequence alignments in this study were performed using MUSCLE with default parameters (gap opening and extension penalties of −2.9 and 0) as implemented in MEGA version 6.0 [112]. Similarly, all phylogenetic reconstructions and molecular evolutionary analyses were conducted using MEGA v6. Genetic distances were estimated using the Poisson correction, and phylogenetic trees were constructed with the Neighbor-Joining method [113]. Tree robustness was evaluated using 1000 bootstrap replicates.
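For protein sequences, the Poisson correction converts the observed proportion of differing sites, p, into an estimated number of amino acid substitutions per site, d = −ln(1 − p). A minimal Python sketch is below; it assumes pairwise deletion of gapped positions (MEGA also offers complete deletion), uses hypothetical function names and a toy alignment, and does not reproduce MEGA's Neighbor-Joining or bootstrap steps.

```python
import math


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap (pairwise deletion)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / compared


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p); infinite when every site differs."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        return math.inf
    return -math.log(1.0 - p)


# Toy aligned fragments: 7 comparable sites, 2 differences, so p = 2/7.
print(round(poisson_distance("MKV-LAGE", "MKVQLSGD"), 4))
```

The correction matters most for divergent pairs: at small p it is close to p itself, but as p approaches 1 it grows without bound, compensating for multiple substitutions at the same site.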

Homology modeling

Three-dimensional structures and domain folds of proteins are commonly more conserved than the amino acid sequences themselves. Hence, in this study we built 3D models by threading to define different classes of RBPs, to locate conserved residues, or to differentiate prokaryotic and eukaryotic protein structures. Representative homology models for each of the five major RBDs (RRM, RNA helicase, KH, Puf, and Alba) were constructed by structural threading using the algorithms implemented in I-TASSER (Iterative Threading ASSEmbly Refinement) [114] or SWISS-MODEL [115]. The SWISS-MODEL server automates homology model building by first searching for a suitable template on which to base the model. The model is then subjected to strained-angle correction, and quality parameters are estimated (e.g. the QMEAN Z-score, which indicates how the quality of the model compares with that expected of a native structure [116]). Like SWISS-MODEL, the I-TASSER server automates model building, but it combines three conventional 3D model-building approaches (homology modeling, sequence threading, and ab initio modeling) [114, 117]. It uses the C-score and TM-score to estimate model quality [114, 118]: the C-score is a confidence score (typically ranging from −5 to 2; higher is better), while the TM-score (0–1; higher values indicate greater confidence in the model) measures the structural similarity between the built model and the native structure [114].
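For reference, the TM-score of a model against a native structure of length L_target is commonly defined as the maximum, over all superpositions, of (1/L_target) Σ 1/(1 + (d_i/d0)²), with d0 = 1.24·(L_target − 15)^(1/3) − 1.8 Å, where d_i is the distance between the i-th pair of aligned residues. The Python sketch below evaluates this sum for a single, already-fixed superposition, so it gives a lower bound on the optimized value that I-TASSER reports rather than the value itself; the 0.5 Å floor on d0 for short chains is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 in angstroms; floored at 0.5 (assumption for short chains)."""
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE given superposition.

    distances: C-alpha distances (angstroms) of aligned residue pairs after superposition.
    target_length: number of residues in the native (reference) structure.
    Unaligned residues contribute nothing but still count in the normalization.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if len(distances) > target_length:
        raise ValueError("more aligned pairs than residues in the target")
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# A perfect model of a 100-residue domain scores 1.0, while a model with every
# residue displaced by exactly d0 scores 0.5.
print(tm_score_fixed([0.0] * 100, 100))
print(round(tm_score_fixed([tm_d0(100)] * 100, 100), 3))
```

Because d0 grows with protein length, the score is roughly comparable across domains of different sizes, which is why it is preferred over raw RMSD for judging models of the varied RBDs modeled here.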

Transcriptome analysis

Transcriptome analysis of the putative RBPs was performed using curated microarray and RNA-seq [119] datasets downloaded from PlasmoDB. Heat maps and clustering of the RNA-seq data were generated using the MeV software [120]. The average-linkage agglomeration rule was applied to hierarchically cluster genes with similar expression patterns. We also combined self-organizing map (SOM) analysis with the hierarchical clustering to derive stage-specific gene expression, using 2000 iterations at α = 0.05.
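Average-linkage agglomeration repeatedly merges the two clusters whose members have the smallest mean pairwise distance. The pure-Python sketch below illustrates the rule using 1 − Pearson correlation as the gene-to-gene distance; that metric, the stopping rule (a fixed number of clusters) and the toy profiles are assumptions for illustration, not the MeV settings or data used in this study. The naive loop is cubic in the number of genes and only suitable for small examples.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles: Dict[str, Sequence[float]], n_clusters: int) -> List[List[str]]:
    """Agglomerate genes until n_clusters remain, using mean (1 - r) between clusters."""
    genes = list(profiles)
    dist: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(genes, 2):
        d = 1.0 - pearson(profiles[a], profiles[b])
        dist[(a, b)] = dist[(b, a)] = d

    def cluster_distance(c1: List[str], c2: List[str]) -> float:
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[g] for g in genes]
    while len(clusters) > max(n_clusters, 1):
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical profiles over four time points: two early-peaking, two late-peaking genes.
demo_profiles = {
    "gene_a": [5, 8, 3, 1],
    "gene_b": [4, 9, 2, 1],
    "gene_c": [1, 2, 4, 9],
    "gene_d": [0, 1, 5, 8],
}
print(sorted(sorted(c) for c in average_linkage(demo_profiles, 2)))
```

A correlation-based distance groups genes by the shape of their profiles rather than their absolute expression level, which suits the stage-pattern grouping shown in Fig. 5.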

Interactome analysis

An interactome analysis for PfCITH and PfDOZI was performed based on published protein-protein interaction (PPI) data for the orthologs of these proteins in the rodent parasite P. berghei [13]. The top six hits with putative functions assigned in PlasmoDB were then used to search the STRING v9.1 database for interacting partners. STRING stores known and predicted protein-protein interactions: known interactions are experimentally confirmed physical interactions between proteins, while predicted interactions are derived from four sources (genomic context, high-throughput experiments, coexpression, and the literature) [121]. We applied a high-confidence score cutoff (0.7) to select the most likely interactions for network construction in Cytoscape (www.cytoscape.org).
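To make the 0.7 cutoff concrete, the sketch below assumes STRING-style scoring in which per-channel scores are treated as independent evidence: each is corrected for a prior chance of interaction (0.041 here, an assumed value that should be checked against the STRING documentation for the version used), combined as 1 − Π(1 − s_i), and the prior is added back once. Edges at or above the cutoff are written in Cytoscape's simple interaction format (SIF). This is a simplified illustration rather than STRING's exact pipeline, and all protein names and scores are hypothetical.

```python
PRIOR = 0.041  # assumed prior probability that two random proteins interact


def remove_prior(score, prior=PRIOR):
    """Rescale a 0-1 channel score so that the random-pair baseline maps to 0."""
    return max((score - prior) / (1.0 - prior), 0.0)


def combine_channels(channel_scores, prior=PRIOR):
    """Naive-Bayes style combination of independent evidence channels (0-1 scale)."""
    p_no_interaction = 1.0
    for s in channel_scores:
        p_no_interaction *= 1.0 - remove_prior(s, prior)
    combined = 1.0 - p_no_interaction
    return combined + prior * (1.0 - combined)  # add the prior back once


def high_confidence_sif(edges, cutoff=0.7):
    """Cytoscape SIF lines ('source<TAB>pp<TAB>target') for edges scoring >= cutoff."""
    return [f"{a}\tpp\t{b}" for a, b, score in edges if score >= cutoff]


if __name__ == "__main__":
    # Hypothetical proteins and channel scores, for illustration only.
    edges = [
        ("protA", "protB", combine_channels([0.6, 0.5])),  # two moderate channels
        ("protA", "protC", combine_channels([0.45])),      # one weak channel
    ]
    for line in high_confidence_sif(edges):
        print(line)
```

With these toy values, two moderate channels (0.6 and 0.5) combine to about 0.79 and pass the cutoff, whereas a single 0.45 channel does not, so only the protA-protB edge would enter the network.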

We also constructed an interactome network for genes associated with the PfCaf1-CCR4-NOT complex using their human homologs. PPI data for the human homologs were retrieved from the Interologous Interaction Database (http://128.100.137.135/ophidv2.204/ppi.jsp), and the resulting hits were used to identify P. falciparum homologs by BLASTp searches against PlasmoDB (E-value < 0.1). The interacting partners of each core component were annotated with Gene Ontology terms in PlasmoDB and tested for enrichment of biological process and primary metabolic process terms.
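The homolog-mapping and enrichment steps can be sketched as follows, assuming the BLASTp results were saved in BLAST+ tabular format (-outfmt 6, whose 11th and 12th columns are the E-value and bit score) and that enrichment is assessed with a one-sided hypergeometric test, a common choice that the text does not specify. No multiple-testing correction is applied here. Query, subject, and term names are hypothetical placeholders.

```python
import math


def best_hits(blast_lines, max_evalue=0.1):
    """Best subject per query from BLAST tabular output (-outfmt 6, default 12 columns).

    Hits with E-value >= max_evalue are discarded; among the remaining hits for a
    query, the one with the highest bit score is kept.
    """
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue < max_evalue and (query not in best or bitscore > best[query][1]):
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def hypergeom_sf(k, population, annotated, drawn):
    """P(X >= k) for a hypergeometric X: one-sided over-representation p-value."""
    upper = min(annotated, drawn)
    hits = sum(math.comb(annotated, i) * math.comb(population - annotated, drawn - i)
               for i in range(k, upper + 1))
    return hits / math.comb(population, drawn)


def term_enrichment(study_genes, annotations, term):
    """Number of study genes carrying `term` and the hypergeometric p-value.

    `annotations` maps every background gene ID to its set of GO terms.
    """
    population = len(annotations)
    annotated = sum(term in terms for terms in annotations.values())
    k = sum(term in annotations.get(g, set()) for g in study_genes)
    return k, hypergeom_sf(k, population, annotated, len(study_genes))


if __name__ == "__main__":
    # Hypothetical BLASTp rows: human query -> P. falciparum subject (IDs are placeholders).
    rows = [
        "hs_query_1\tpf_gene_1\t35.0\t400\t250\t10\t1\t400\t5\t410\t1e-40\t150.0",
        "hs_query_1\tpf_gene_2\t25.0\t120\t90\t4\t1\t120\t1\t118\t0.05\t40.0",
        "hs_query_2\tpf_gene_3\t22.0\t80\t60\t3\t1\t80\t1\t80\t0.5\t25.0",
    ]
    print(best_hits(rows))  # hs_query_2 is dropped: E-value 0.5 is not < 0.1
    # Toy background of 10 genes, 3 of which carry a hypothetical term.
    annotations = {f"pf_gene_{i}": ({"GO_TERM_X"} if i <= 3 else set()) for i in range(1, 11)}
    print(term_enrichment(["pf_gene_1", "pf_gene_2", "pf_gene_3"], annotations, "GO_TERM_X"))
```

In the toy example all three study genes carry the term, giving a p-value of 1/120; a real analysis would use the full PlasmoDB background and correct for the number of terms tested.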

References

  1.

    Tun KM, Imwong M, Lwin KM, Win AA, Hlaing TM, Hlaing T, et al. Spread of artemisinin-resistant Plasmodium falciparum in Myanmar: a cross-sectional survey of the K13 molecular marker. Lancet Infect Dis. 2015;15:415–21.

  2.

    Cui L, Wang Z, Miao J, Miao M, Chandra R, Jiang H, et al. Mechanisms of in vitro resistance to dihydroartemisinin in Plasmodium falciparum. Mol Microbiol. 2012;86:111–28.

  3.

    Fidock DA, Rosenthal PJ, Croft SL, Brun R, Nwaka S. Antimalarial drug discovery: efficacy models for compound screening. Nat Rev Drug Discov. 2004;3:509–20.

  4.

    Foth BJ, Ralph SA, Tonkin CJ, Struck NS, Fraunholz M, Roos DS, et al. Dissecting apicoplast targeting in the malaria parasite Plasmodium falciparum. Science. 2003;299:705–8.

  5.

    De Silva EK, Gehrke AR, Olszewski K, León I, Chahal JS, Bulyk ML, et al. Specific DNA-binding by apicomplexan AP2 transcription factors. Proc Natl Acad Sci U S A. 2008;105:8393–8.

  6.

    Coulson RMR, Hall N, Ouzounis CA. Comparative genomics of transcriptional control in the human malaria parasite Plasmodium falciparum. Genome Res. 2004;14:1548–54.

  7.

    Cui L, Fan Q, Li J. The malaria parasite Plasmodium falciparum encodes members of the Puf RNA-binding protein family with conserved RNA binding activity. Nucleic Acids Res. 2002;30:4607–17.

  8.

    Painter HJ, Campbell TL, Llinás M. The Apicomplexan AP2 family: Integral factors regulating Plasmodium development. Mol Biochem Parasitol. 2011;1–7.

  9.

    Hall N, Karras M, Raine JD, Carlton JM, Kooij TWA, Berriman M, et al. A comprehensive survey of the Plasmodium life cycle by genomic, transcriptomic, and proteomic analyses. Science. 2005;307:82–6.

  10.

    Shock JL, Fischer KF, DeRisi JL. Whole-genome analysis of mRNA decay in Plasmodium falciparum reveals a global lengthening of mRNA half-life during the intra-erythrocytic development cycle. Genome Biol. 2007;8:R134.

  11.

    Cui L, Lindner S, Miao J. Translational regulation during stage transitions in malaria parasites. Ann N Y Acad Sci. 2014;1–9.

  12.

    Gomes-Santos CSS, Braks J, Prudêncio M, Carret C, Gomes AR, Pain A, et al. Transition of Plasmodium sporozoites into liver stage-like forms is regulated by the RNA binding protein Pumilio. PLoS Pathog. 2011;7, e1002046.

  13.

    Mair GR, Lasonder E, Garver LS, Franke-Fayard BMD, Carret CK, Wiegant JCAG, et al. Universal features of post-transcriptional gene regulation are critical for Plasmodium zygote development. PLoS Pathog. 2010;6, e1000767.

  14.

    Gerstberger S, Hafner M, Tuschl T. A census of human RNA-binding proteins. Nat Rev Genet. 2014;15:829–45.

  15.

    Tsvetanova NG, Klass DM, Salzman J, Brown PO. Proteome-wide search reveals unexpected RNA-binding proteins in Saccharomyces cerevisiae. PLoS One. 2010;5:1–12.

  16.

    Malhotra S, Sowdhamini R. Sequence search and analysis of gene products containing RNA recognition motifs in the human genome. BMC Genomics. 2014;15:1159.

  17.

    Tarique M, Ahmad M, Ansari A, Tuteja R. Plasmodium falciparum DOZI, an RNA helicase interacts with eIF4E. Gene. 2013;522:46–59.

  18.

    Miao J, Fan Q, Parker D, Li X, Li J, Cui L. Puf mediates translation repression of transmission-blocking vaccine candidates in malaria parasites. PLoS Pathog. 2013;9, e1003268.

  19.

    Lindner SE, Mikolajczak SA, Vaughan AM, Moon W, Joyce BR, Sullivan WJ, et al. Perturbations of Plasmodium Puf2 expression and RNA-seq of Puf2-deficient sporozoites reveal a critical role in maintaining RNA homeostasis and parasite transmissibility. Cell Microbiol. 2013;15:1266–83.

  20.

    Chêne A, Vembar SS, Rivière L, Lopez-Rubio JJ, Claes A, Siegel TN, et al. PfAlbas constitute a new eukaryotic DNA/RNA-binding protein family in malaria parasites. Nucleic Acids Res. 2012;40:3066–77.

  21.

    De Gaudenzi J, Frasch AC, Clayton C. RNA-binding domain proteins in Kinetoplastids: a comparative analysis. Eukaryot Cell. 2005;4:2106–14.

  22.

    Wongsombat C, Aroonsri A, Kamchonwongpaisan S, Morgan HP, Walkinshaw MD, Yuthavong Y, et al. Molecular characterization of Plasmodium falciparum Bruno/CELF RNA binding proteins. Mol Biochem Parasitol. 2014;198:1–10.

  23.

    Cordin O, Banroques J, Tanner NK, Linder P. The DEAD-box protein family of RNA helicases. Gene. 2006;367:17–37.

  24.

    Linder P, Fuller-Pace FV. Looking back on the birth of DEAD-box RNA helicases. Biochim Biophys Acta - Gene Regul Mech. 2013;1829:750–5.

  25.

    Tuteja R, Pradhan A. Unraveling the “DEAD-box” helicases of Plasmodium falciparum. Gene. 2006;376:1–12.

  26.

    Abdelhaleem M, Maltais L, Wain H. The human DDX and DHX gene families of putative RNA helicases. Genomics. 2003;81:618–22.

  27.

    Tanner NK, Linder P. DExD/H box RNA helicases: from generic motors to specific dissociation functions. Mol Cell. 2001;8:251–62.

  28.

    De la Cruz J, Kressler D, Linder P. Unwinding RNA in Saccharomyces cerevisiae: DEAD-box proteins and related families. Trends Biochem Sci. 1999;192–198.

  29.

    Banroques J, Tanner NK. Bioinformatics and biochemical methods to study the structural and functional elements of DEAD-box RNA helicases. Methods Mol Biol. 2015;1259:165–81.

  30.

    Rocak S, Linder P. DEAD-box proteins: the driving forces behind RNA metabolism. Nat Rev Mol Cell Biol. 2004;5:232–41.

  31.

    Tuteja R. Helicases - feasible antimalarial drug target for Plasmodium falciparum. FEBS J. 2007;274:4699–704.

  32.

    Mehta J, Tuteja R. A novel dual Dbp5/DDX19 homologue from Plasmodium falciparum requires Q motif for activity. Mol Biochem Parasitol. 2011;176:58–63.

  33.

    Prakash K, Tuteja R. A novel DEAD box helicase Has1p from Plasmodium falciparum: N-terminal is essential for activity. Parasitol Int. 2010;59:271–7.

  34.

    Mair GR, Braks JAM, Garver LS, Wiegant JCAG, Hall N, Dirks RW, et al. Regulation of sexual development of Plasmodium by translational repression. Science. 2006;313:667–9.

  35.

    Guerreiro A, Deligianni E, Santos JM, Silva PAGC, Louis C, Pain A, et al. Genome-wide RIP-Chip analysis of translational repressor-bound mRNAs in the Plasmodium gametocyte. Genome Biol. 2014;15:493.

  36.

    Siomi H, Choi M, Siomi MC, Nussbaum RL, Dreyfuss G. Essential role for KH domains in RNA binding: impaired RNA binding by a mutation in the KH domain of FMR1 that causes fragile X syndrome. Cell. 1994;77:33–9.

  37.

    Valverde R, Edwards L, Regan L. Structure and function of KH domains. FEBS J. 2008;275:2712–26.

  38.

    Siomi H, Matunis MJ, Michael WM, Dreyfuss G. The pre-mRNA binding K protein contains a novel evolutionarily conserved motif. Nucleic Acids Res. 1993;21:1193–8.

  39.

    Grishin NV. KH domain: one motif, two folds. Nucleic Acids Res. 2001;29:638–43.

  40.

    Dennerlein S, Rozanska A, Wydro M, Chrzanowska-Lightowlers ZMA, Lightowlers RN. Human ERAL1 is a mitochondrial RNA chaperone involved in the assembly of the 28S small mitochondrial ribosomal subunit. Biochem J. 2010;430:551–8.

  41.

    Komaki-Yasuda K, Okuwaki M, Nagata K, Ichiro KS, Kano S. Identification of a novel and unique transcription factor in the intraerythrocytic stage of Plasmodium falciparum. PLoS One. 2013;8, e74701.

  42.

    Galgano A, Forrer M, Jaskiewicz L, Kanitz A, Zavolan M, Gerber AP. Comparative analysis of mRNA targets for human PUF-family proteins suggests extensive interaction with the miRNA regulatory system. PLoS One. 2008;3, e3164.

  43.

    Wickens M, Bernstein DS, Kimble J, Parker R. A PUF family portrait: 3′UTR regulation as a way of life. Trends Genet. 2002;18:150–7.

  44.

    Miao J, Li J, Fan Q, Li X, Li X, Cui L. The Puf-family RNA-binding protein PfPuf2 regulates sexual development and sex differentiation in the malaria parasite Plasmodium falciparum. J Cell Sci. 2010;123(Pt 7):1039–49.

  45.

    Müller K, Matuschewski K, Silvie O. The Puf-family RNA-binding protein Puf2 controls sporozoite conversion to liver stages in the malaria parasite. PLoS One. 2011;6, e19860.

  46.

    Fan Q, Li J, Kariuki M, Cui L. Characterization of PfPuf2, member of the Puf family RNA-binding proteins from the malaria parasite Plasmodium falciparum. DNA Cell Biol. 2004;23:753–60.

  47.

    Wardleworth BN, Russell RJM, Bell SD, Taylor GL, White MF. Structure of Alba: an archaeal chromatin protein modulated by acetylation. EMBO J. 2002;21:4654–62.

  48.

    Goyal M, Alam A, Iqbal MS, Dey S, Bindu S, Pal C, et al. Identification and molecular characterization of an Alba-family protein from human malaria parasite Plasmodium falciparum. Nucleic Acids Res. 2012;40:1174–90.

  49.

    Schimanski B, Heller M, Acosta-Serrano A, Mani J, Gu A, Güttinger A, et al. Alba-domain proteins of Trypanosoma brucei are cytoplasmic RNA-binding proteins that interact with the translation machinery. PLoS One. 2011;6, e22463.

  50.

    Dupé A, Dumas C, Papadopoulou B. An Alba-domain protein contributes to the stage-regulated stability of amastin transcripts in Leishmania. Mol Microbiol. 2013;91:548–61.

  51.

    Gissot M, Walker R, Delhaye S, Alayi TD, Huot L, Hot D, et al. Toxoplasma gondii Alba proteins are involved in translational control of gene expression. J Mol Biol. 2013;425:1287–301.

  52.

    Katayama S, Tomaru Y, Kasukawa T, Waki K, Nakanishi M, Nakamura M, et al. Antisense transcription in the mammalian transcriptome. Science. 2005;309:1564–6.

  53.

    Krishna SS, Majumdar I, Grishin NV. Structural classification of zinc fingers: survey and summary. Nucleic Acids Res. 2003;31:532–50.

  54.

    Lunde BM, Moore C, Varani G. RNA-binding proteins: modular design for efficient function. Nat Rev Mol Cell Biol. 2007;8:479–90.

  55.

    Pérez-Arellano I, Gallego J, Cervera J. The PUA domain - a structural and functional overview. FEBS J. 2007;274:4972–84.

  56.

    Hartmann AM, Nayler O, Schwaiger FW, Obermeier A, Stamm S. The interaction and colocalization of Sam68 with the splicing-associated factor YT521-B in nuclear dots is regulated by the Src family kinase p59(fyn). Mol Biol Cell. 1999;10:3909–26.

  57.

    Corsi A, Robbins A, Agarwal R, Megee P, Cohen-Fix O, Stoilov P, et al. YTH: a new domain in nuclear proteins. Trends Biochem Sci. 2002;27:495–7.

  58.

    Siomi H, Dreyfuss G. RNA-binding proteins as regulators of gene expression. Curr Opin Genet Dev. 1997;345–353.

  59.

    Bycroft M, Hubbard TJ, Proctor M, Freund SM, Murzin AG. The solution structure of the S1 RNA binding domain: a member of an ancient nucleic acid–binding fold. Cell. 1997;88:235–42.

  60.

    Blencowe BJ, Ouzounis CA. The PWI motif: a new protein domain in splicing factors. Trends Biochem Sci. 1999;24:179–80.

  61.

    Szymczyna BR, Bowman J, McCracken S, Pineda-Lucena A, Lu Y, Cox B, et al. Structure and function of the PWI motif: a novel nucleic acid-binding domain that facilitates pre-mRNA processing. Genes Dev. 2003;17:461–75.

  62.

    Aravind L, Koonin EV. G-patch: a new conserved domain in eukaryotic RNA-processing proteins and type D retroviral polyproteins. Trends Biochem Sci. 1999;24:342–4.

  63.

    Dreyfuss G, Kim VN, Kataoka N. Messenger-RNA-binding proteins and the messages they carry. Nat Rev Mol Cell Biol. 2002;3:195–205.

  64.

    Sorber K, Dimon MT, DeRisi JL. RNA-Seq analysis of splicing in Plasmodium falciparum uncovers new splice junctions, alternative splicing and splicing of antisense transcripts. Nucleic Acids Res. 2011;39:3820–35.

  65.

    Tuteja R. Genome wide identification of Plasmodium falciparum helicases: a comparison with human host. Cell Cycle. 2014;9:104–20.

  66.

    Chakrabarti K, Pearson M, Grate L, Sterne-Weiler T, Deans J, Donohue JP, et al. Structural RNAs of known and unknown function identified in malaria parasites by comparative genomics and RNA analysis. RNA. 2007;13:1923–39.

  67.

    Shankar J, Pradhan A, Tuteja R. Isolation and characterization of Plasmodium falciparum UAP56 homolog: evidence for the coupling of RNA binding and splicing activity by site-directed mutations. Arch Biochem Biophys. 2008;478:143–53.

  68.

    Upadhyay R, Bawankar P, Malhotra D, Patankar S. A screen for conserved sequences with biased base composition identifies noncoding RNAs in the A-T rich genome of Plasmodium falciparum. Mol Biochem Parasitol. 2005;144:149–58.

  69.

    Tuteja R. Helicases involved in splicing from malaria parasite Plasmodium falciparum. Parasitol Int. 2011;335–340.

  70.

    Singh PK, Kanodia S, Dandin CJ, Vijayraghavan U, Malhotra P. Plasmodium falciparum Prp16 homologue and its role in splicing. Biochim Biophys Acta - Gene Regul Mech. 2012;1819:1186–99.

  71.

    Otto TD, Wilinski D, Assefa S, Keane TM, Sarry LR, Böhme U, et al. New insights into the blood-stage transcriptome of Plasmodium falciparum using RNA-Seq. Mol Microbiol. 2010;76:12–24.

  72.

    Iriko H, Jin L, Kaneko O, Takeo S, Han E-T, Tachibana M, et al. A small-scale systematic analysis of alternative splicing in Plasmodium falciparum. Parasitol Int. 2009;58:196–9.

  73.

    Dixit A, Singh PK, Sharma GP, Malhotra P, Sharma P. PfSRPK1, a novel splicing-related kinase from Plasmodium falciparum. J Biol Chem. 2010;285:38315–23.

  74.

    Eshar S, Allemand E, Sebag A, Glaser F, Muchardt C, Mandel-Gutfreund Y, et al. A novel Plasmodium falciparum SR protein is an alternative splicing factor required for the parasites’ proliferation in human erythrocytes. Nucleic Acids Res. 2012;40:9903–16.

  75.

    Dasgupta T, Ladd AN. The importance of CELF control: molecular and biological roles of the CUG-BP, Elav-like family of RNA-binding proteins. Wiley Interdiscip Rev RNA. 2012:104–21.

  76.

    Ladd AN, Charlet N, Cooper TA. The CELF family of RNA binding proteins is implicated in cell-specific and developmentally regulated alternative splicing. Mol Cell Biol. 2001;21:1285–96.

  77.

    Beisang D, Bohjanen PR, Louis IAV. CELF1, a multifunctional regulator of posttranscriptional networks. INTECH Open Access Publisher; 2012:181–206.

  78.

    Chen M, Manley JL. Mechanisms of alternative splicing regulation: insights from molecular and genomics approaches. Nat Rev Mol Cell Biol. 2009;10:741–54.

  79.

    Han A, Stoilov P, Linares AJ, Zhou Y, Fu XD, Black DL. De novo prediction of PTBP1 binding and splicing targets reveals unexpected features of its RNA recognition and function. PLoS Comput Biol. 2014;10, e1003442.

  80.

    Barabino SML, Hübner W, Jenny A, Minvielle-Sebastia L, Keller W. The 30-kD subunit of mammalian cleavage and polyadenylation specificity factor and its yeast homolog are RNA-binding zinc finger proteins. Genes Dev. 1997;11:1703–16.

  81.

    Gatfield D, Izaurralde E. REF1/Aly and the additional exon junction complex proteins are dispensable for nuclear mRNA export. J Cell Biol. 2002;159:579–88.

  82.

    Lau C-K, Diem MD, Dreyfuss G, Van Duyne GD. Structure of the Y14-magoh core of the exon junction complex. Curr Biol. 2003;13:933–41.

  83.

    Li L, Monckton EA, Godbout R. A role for DEAD box 1 at DNA double-strand breaks. Mol Cell Biol. 2008;28:6413–25.

  84.

    Edgcomb SP, Carmel AB, Naji S, Ambrus-Aikelin G, Reyes JR, Saphire ACS, et al. DDX1 is an RNA-dependent ATPase involved in HIV-1 Rev function and virus replication. J Mol Biol. 2012;415:61–74.

  85.

    Guo XE, Chen CF, Wang DDH, Modrek AS, Phan VH, Lee WH, et al. Uncoupling the roles of the SUV3 helicase in maintenance of mitochondrial genome stability and RNA degradation. J Biol Chem. 2011;286:38783–94.

  86.

    Minczuk M, Dmochowska A, Palczewska M, Stepien PP. Overexpressed yeast mitochondrial putative RNA helicase Mss116 partially restores proper mtRNA metabolism in strains lacking the Suv3 mtRNA helicase. Yeast. 2002;19:1285–93.

  87.

    Buchan JR, Parker R. Eukaryotic stress granules: the ins and outs of translation. Mol Cell. 2009;36:932–41.

  88.

    Kedersha N, Anderson P. Mammalian stress granules and processing bodies. Methods Enzymol. 2007;431:61–81.

  89.

    Marnef A, Sommerville J, Ladomery MR. RAP55: insights into an evolutionarily conserved protein family. Int J Biochem Cell Biol. 2009;41:977–81.

  90.

    Sheth U, Parker R. Decapping and decay of messenger RNA occur in cytoplasmic processing bodies. Science. 2003;300:805–8.

  91.

    Coller JM, Tucker M, Sheth U, Valencia-Sanchez MA, Parker R. The DEAD box helicase, Dhh1p, functions in mRNA decapping and interacts with both the decapping and deadenylase complexes. RNA. 2001;7:1717–27.

  92.

    Collart MA, Panasenko OO. The Ccr4–Not complex. Gene. 2012;492:42–53.

  93.

    Tucker M, Valencia-Sanchez MA, Staples RR, Chen J, Denis CL, Parker R. The transcription factor associated Ccr4 and Caf1 proteins are components of the major cytoplasmic mRNA deadenylase in Saccharomyces cerevisiae. Cell. 2001;104:377–86.

  94.

    Mulder KW, Inagaki A, Cameroni E, Mousson F, Winkler GS, De Virgilio C, et al. Modulation of Ubc4p/Ubc5p-mediated stress responses by the RING-finger-dependent ubiquitin-protein ligase Not4p in Saccharomyces cerevisiae. Genetics. 2007;176:181–92.

  95.

    Balu B, Maher SP, Pance A, Chauhan C, Naumov AV, Andrews RM, et al. CCR4-associated factor 1 coordinates the expression of Plasmodium falciparum egress and invasion proteins. Eukaryot Cell. 2011;10:1257–63.

  96.

    Halbach F, Reichelt P, Rode M, Conti E. The yeast Ski complex: crystal structure and RNA channeling to the exosome complex. Cell. 2013;154:814–26.

  97.

    Bozdech Z, Llinás M, Pulliam BL, Wong ED, Zhu J, DeRisi JL. The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biol. 2003;1:085.

  98.

    Llinás M, Bozdech Z, Wong ED, Adai AT, DeRisi JL. Comparative whole genome transcriptome analysis of three Plasmodium falciparum strains. Nucleic Acids Res. 2006;34:1166–73.

  99.

    Natalang O, Bischoff E, Deplaine G, Proux C, Dillies M-A, Sismeiro O, et al. Dynamic RNA profiling in Plasmodium falciparum synchronized blood stages exposed to lethal doses of artesunate. BMC Genomics. 2008;9:388.

  100.

    Caro F, Ahyong V, Betegon M, DeRisi JL. Genome-wide regulatory dynamics of translation in the Plasmodium falciparum asexual blood stages. Elife. 2014;3:1–24.

  101.

    Bischoff E, Vaquero C. In silico and biological survey of transcription-associated proteins implicated in the transcriptional machinery during the erythrocytic development of Plasmodium falciparum. BMC Genomics. 2010;11:34.

  102.

    LaCount DJ, Vignali M, Chettier R, Phansalkar A, Bell R, Hesselberth JR, et al. A protein interaction network of the malaria parasite Plasmodium falciparum. Nature. 2005;438:103–7.

  103.

    Suthram S, Sittler T, Ideker T. The Plasmodium protein network diverges from those of other eukaryotes. Nature. 2005;438:108–12.

  104.

    Wuchty S, Adams JH, Ferdig MT. A comprehensive Plasmodium falciparum protein interaction map reveals a distinct architecture of a core interactome. Proteomics. 2009;9:1841–9.

  105.

    Aurrecoechea C, Brestelli J, Brunk BP, Dommer J, Fischer S, Gajria B, et al. PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 2009;37:D539–43.

  106.

    Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011;39:W29–37.

  107.

    Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004;5:113.

  108.

    Schultz J, Copley RR, Doerks T, Ponting CP, Bork P. SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res. 2000;28:231–4.

  109.

    Marchler-Bauer A, Derbyshire MK, Gonzales NR, Lu S, Chitsaz F, Geer LY, et al. CDD: NCBI’s conserved domain database. Nucleic Acids Res. 2014;43(Database issue):D222–6.

  110.

    Letunic I, Doerks T, Bork P. SMART: recent updates, new developments and status in 2015. Nucleic Acids Res. 2015;43(Database issue):D257–60.

  111.

    Johnson M, Zaretskaya I, Raytselis Y, Merezhuk Y, McGinnis S, Madden TL. NCBI BLAST: a better web interface. Nucleic Acids Res. 2008;36:W5–9.

  112.

    Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 2013;30:2725–9.

  113.

    Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987;4:406–25.

  114.

    Roy A, Kucukural A, Zhang Y. I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc. 2010;5:725–38.

  115.

    Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, et al. SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res. 2014;42:W252–8.

  116.

    Bordoli L, Schwede T. Automated protein structure modeling with SWISS-MODEL workspace and the protein model portal. Methods Mol Biol. 2012;857:107–36.

  117.

    Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;3389–3402.

  118.

    Zhang Y, Skolnick J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 2005;33:2302–9.

  119.

    López-Barragán MJ, Lemieux J, Quiñones M, Williamson KC, Molina-Cruz A, Cui K, et al. Directional gene expression and antisense transcripts in sexual and asexual stages of Plasmodium falciparum. BMC Genomics. 2011;12:587.

  120.

    Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, et al. TM4: a free, open-source system for microarray data management and analysis. Biotechniques. 2003;34:374–8.

  121.

    Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, et al. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 2011;39:D561–8.

Acknowledgements

This work was supported by grants R01AI104946 and U19AI089672 to LC, and by an NIAID K22 award (1K22AI101039-01) and Pennsylvania State University start-up funds to SEL.

Author information

Correspondence to Liwang Cui or Scott E. Lindner.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

LC and SEL conceived the study. BPNR, SS, XL, KJH, and KK performed in silico sequence retrieval and analyses and interpreted the data. BPNR and KJH prepared the initial draft, and LC, SEL, and BPNR further improved the manuscript. All authors read and approved the final manuscript.

Additional files

Additional file 1:

A comprehensive list of putative RNA-binding proteins retrieved from Plasmodium spp.: This summary table lists each putative RNA-binding protein retrieved from P. falciparum with its Gene ID, protein length, orthologs in P. vivax and P. yoelii, search results against model species, functional annotation, domain organization, structural similarity to homologs, and Gene Ontology (GO) terms. Blue, pink, and green boxes denote transmembrane, low-complexity, and coiled-coil regions, respectively, and NH denotes no homologs. Naming of Plasmodium genes is tentative and should be used cautiously until further evidence about their functions is available. (XLSX 2203 kb)

Additional file 2:

A multiple sequence alignment of the 122 RRM domains found in 66 Plasmodium falciparum proteins. The Gene IDs of proteins predicted to contain the highly conserved RRM motifs (RNP1 and RNP2) are provided, along with the predicted secondary structure (mapped to the top of the alignment). See the legend key at the end of the alignment for the meaning of the color-coded residues and the letters used in the consensus sequence. (PDF 291 kb)

Additional file 3:

An expanded functional description and 3D model of the RRM domain in Plasmodium. (a) A representative 3D model of an RRM domain constructed using PfPABP3 (PF3D7_0923900) as the query and PDB ID: 2jwn as the template with default parameters, as described in the Materials and Methods section. The canonical RRM fold is marked on the 3D model, where L stands for link, h for helix, and 1–4 for β-strands; RNP1 and RNP2 are conserved features of the RRM domain. (b) A categorization of the putative functional roles of RRM motifs in P. falciparum and their associated genes. (PDF 75 kb)

Additional file 4:

A list of Poly(A)-binding proteins retrieved from Plasmodium genomes along with their putative cellular locations. (PDF 63 kb)

Additional file 5:

Multiple sequence alignment of DExD/H RNA helicases from P. falciparum. All the conserved helicase motifs are mapped onto the multiple sequence alignment. For residue colors and the meaning of consensus letters, please see the key at the end of the alignment. (PDF 336 kb)

Additional file 6:

A phylogenetic representation of the RNA helicases found in P. falciparum. Phylogenetic reconstruction of 48 PfRNA helicases shows uniformly high support for all tree branches, which suggests deep evolutionary conservation at the sequence level and probably at the functional level. Two subfamily clusters representing two functions, small subunit rRNA helicases (SSU) and pre-mRNA processing (Prp), are each monophyletic. (PDF 57 kb)

Additional file 7:

A structural and sequence-based comparison of PfPuf1 and PfPuf2. (a) A predicted model of PfPuf1 (using PDB ID: 4dzs as a template) superimposed on a predicted model of PfPuf2 (using PDB ID: 3k49 as a template) with a root-mean-square deviation of 3.4 Å, confirming the signature concave structure common to PUF domains. (b) A multiple sequence alignment of predicted PUF-domain sequences. PfPuf1 and PfPuf2 are 25 % identical at the domain level. (PDF 233 kb)

Additional file 8:

A unified gene list identified as common between human and P. falciparum DOZI interactomes. Gene IDs from P. falciparum, along with their predicted function and biological process are provided. (PDF 49 kb)

Additional file 9:

A list of Caf1-CCR4-NOT complex genes identified by searching the human PPI database. The interacting proteins were then used as BLASTp queries against PlasmoDB at E-value < 0.1. (PDF 41 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Keywords

  • RNA-binding proteins (RBPs)
  • Post-transcriptional regulation (PTR)
  • Pre-mRNA splicing
  • Ribosome biogenesis
  • mRNA processing
  • Stress granules
  • Malaria
  • Plasmodium