Eukaryotic Protein Kinases (ePKs) of the Helminth Parasite Schistosoma mansoni
© Andrade et al; licensee BioMed Central Ltd. 2011
Received: 23 November 2010
Accepted: 6 May 2011
Published: 6 May 2011
Schistosomiasis remains an important parasitic disease and a major economic problem in many countries. The Schistosoma mansoni genome and predicted proteome sequences were recently published, providing the opportunity to identify new drug candidates. Eukaryotic protein kinases (ePKs) play a central role in mediating signal transduction through complex networks and are considered druggable targets from the medical and chemical viewpoints. Our work aimed to analyze the S. mansoni predicted proteome in order to identify and classify all ePKs of this parasite through combined computational approaches. Functional annotation was performed mainly to yield insights into the parasite signaling processes relevant to its complex lifestyle and to select some ePKs as potential drug targets.
We have identified 252 ePKs, which correspond to 1.9% of the S. mansoni predicted proteome, through sequence similarity searches using HMMs (Hidden Markov Models). Amino acid sequences corresponding to the conserved catalytic domain of ePKs were aligned by MAFFT and further used in distance-based phylogenetic analysis as implemented in PHYLIP. Our analysis also included the ePK homologs from six other eukaryotes. The results show that S. mansoni has proteins in all ePK groups. Most of them are clearly clustered with known ePKs in other eukaryotes according to the phylogenetic analysis. None of the ePKs are exclusively found in S. mansoni or belong to an expanded family in this parasite. Only 16 S. mansoni ePKs have been experimentally studied, 12 proteins are predicted to be catalytically inactive, and approximately 2% of the parasite ePKs remain unclassified. Some proteins are highlighted as good targets for drug development since they have a predicted essential function for the parasite.
Our approach has improved the functional annotation of 40% of S. mansoni ePKs through combined similarity and phylogenetic-based approaches. As we continue this work, we will highlight the biochemical and physiological adaptations of S. mansoni in response to diverse environments during the parasite development, vector interaction, and host infection.
Human schistosomiasis, caused by blood fluke parasites of the genus Schistosoma, remains an important parasitic disease and a major health and economic problem in many tropical and subtropical countries. Schistosomes have a complex life cycle that includes six different stages (cercariae, schistosomula, adult worms - male and female, egg, miracidia and sporocyst) in different environments: water, definitive host (mammals) and intermediate host (snail). During parasite development, signals from the environment are sensed and stimulate physiological, morphological, and biochemical adaptations. Oils are shown to stimulate cercarial penetration; hormones and exposure to the snail haemolymph trigger specific physiological adaptations [1–3]. The free-living parasite forms display phototropism and geotropism, and female development is dependent on signals from the male adult worm through mechanisms not completely understood [4, 5]. It has been demonstrated that worm pairing induces changes in gene expression in the female vitelline gland and the accumulation of glutathione and lipids in the male. Furthermore, microarray analysis revealed distinct differential gene expression profiles between males and females [6–8]. Therefore, the success of the parasite infection depends on sensing the environment at the cellular and molecular levels and on transmitting signals to physiological regulatory networks that will collectively stimulate adaptations.
The maintenance of homeostasis and complex cellular adaptations in Schistosoma mansoni require specific extracellular signals that must be integrated to generate an appropriate response from the sensory receptor via intracellular proteins. Signal transduction involves non-linearly integrated networks that interact mostly by switching activity status via phosphorylation (protein kinases) and dephosphorylation (protein phosphatases) of amino acid residues, or the incorporation of GTP. Other cellular non-protein messengers include cyclic AMP, Ca2+ and diacylglycerol.
Protein kinases (PKs) play a central role in mediating intracellular signals by adding a phosphate group from ATP or GTP to an amino acid residue, leading to a conformational change in the target protein that will switch its activation status. Most PKs have a catalytic domain, which binds and phosphorylates target proteins, and a regulatory region. Many PKs are autophosphorylated or may be phosphorylated by other PKs, an interaction regulated by the accessory protein domains.
Protein kinase classification
Eukaryotic Protein Kinases (ePKs)
PKs are considered druggable targets from the medical and chemical viewpoints, as a growing number of PK inhibitors have been developed and approved for the treatment of different human diseases. An example of a successful PK inhibitor is Gleevec®, which induces a conformational change in PTK and mimics substrate binding and therefore prevents activation by upstream kinases. Beyond this, PKs have gained interest as targets for treatment strategies to fight many parasites, including S. mansoni [18–21].
The current schistosomiasis treatment frequently does not cure 100% of those treated in high-risk communities, and the emergence of resistant Schistosoma strains is a real possibility [22–25]. Thus, the identification of potential drug targets should be further emphasized. The recent sequencing of the S. mansoni genome and large-scale transcriptome projects have yielded crucial information for the identification of new candidate drugs [26–29]. Understanding protein structure and function in many model organisms can help elucidate the function of their parasite homologs and further enable the application of such information in drug design and development. The study of the kinase complement (kinome) is therefore of major importance for understanding the physiology of the organism and also provides insights into how to disrupt its fine adaptive mechanisms. The present work aimed to analyze the S. mansoni predicted proteome data in order to identify all ePKs encoded in the genome of this parasite. For this purpose, we combined computational approaches such as sequence similarity searches using Hidden Markov Models (HMMs) and distance-based phylogenetic analyses. The functional annotation was performed mainly to yield insights into the signaling process related to the complex lifestyle of S. mansoni.
Amino acid sequences corresponding to the conserved catalytic domain of ePKs were aligned by MAFFT and further used in phylogenetic analysis based on a distance method as implemented in PHYLIP. The dataset for each ePK group also included the ePK homologs from six other eukaryotes: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, and Brugia malayi. This approach allowed us to classify the S. mansoni ePKinome at the group, family, and/or subfamily levels based on the hierarchy proposed elsewhere [12, 13, 32, 32], and sometimes provided insights into kinase function and evolution. Detailed information is available in Additional file 1, which contains, among other things, all S. mansoni ePKs with the corresponding identifier from the genome project linked to the SchistoDB database. SchistoDB http://www.schistodb.net allows the community to access all sequences, annotations and other data types integrated into the genomic information. It also provides several tools to analyze, retrieve, and display the data. In SchistoDB it is possible to find, for each ePK, the developmental expression stages by EST evidence, information about orthologs, Gene Ontology (GO) function, metabolic pathways, structural information, PDB structures, and links to external databases such as the TDR database. The TDR database contains additional information for S. mansoni genes, such as antigenicity, essentiality, phenotypes and associated compounds (druggability).
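To make the idea of a distance method concrete, the following is a minimal sketch of how a pairwise distance matrix can be derived from aligned catalytic-domain sequences. It is not the procedure used in this work: PHYLIP offers model-based protein distances, whereas this sketch substitutes the simple p-distance (the fraction of differing residues) for illustration. The sequence names and the short toy alignment are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip alignment columns containing a gap
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances keyed by alphabetically ordered name pairs."""
    return {
        (m, n): p_distance(aligned[m], aligned[n])
        for m, n in combinations(sorted(aligned), 2)
    }


# Hypothetical toy alignment (not real kinase sequences).
toy = {"kinase_A": "GKGSFGKV", "kinase_B": "GKGAFGEV", "kinase_C": "G-GSFGKV"}
matrix = distance_matrix(toy)
for pair, dist in matrix.items():
    print(pair, round(dist, 3))
```

A matrix of this kind is the input that a clustering step such as neighbor joining would turn into a tree; sequences separated by smaller distances end up in the same clade.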
Of the 252 ePKs identified in the S. mansoni proteome, only 16 have been experimentally studied, as highlighted in the supplementary material (Additional file 1); the other 236 ePKs were previously annotated only by automatic methods based on sequence similarity searches [26, 29].
S. mansoni ePKs were examined for the presence of the 12 smaller subdomains present in the catalytic domain and also for the presence of a lysine in subdomain II and aspartic acid residues in subdomains VIb and VII, which are known to play essential roles in kinase function [9, 12, 34]. According to our analysis, 12 proteins are predicted to be catalytically inactive ePKs, as they lack one or more of the three essential amino acid residues in the catalytic domain (Additional file 1), including all members of the S. mansoni RGC group (see below).
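The rule described above (a lysine in subdomain II and aspartic acids in subdomains VIb and VII) can be expressed as a short check. This is only a sketch: it assumes the residue at each essential position has already been read off an alignment, and the protein names and residue calls below are hypothetical rather than taken from Additional file 1.

```python
# Essential residues named in the text: Lys (K) in subdomain II,
# Asp (D) in subdomains VIb and VII.
ESSENTIAL_RESIDUES = {"II": "K", "VIb": "D", "VII": "D"}


def missing_essential_residues(observed):
    """Return the subdomains whose essential residue is absent or substituted."""
    return [
        subdomain
        for subdomain, residue in ESSENTIAL_RESIDUES.items()
        if observed.get(subdomain) != residue
    ]


def predicted_inactive(observed):
    """A kinase is predicted inactive if it lacks one or more essential residues."""
    return bool(missing_essential_residues(observed))


# Hypothetical residue calls (illustrative only).
toy_active = {"II": "K", "VIb": "D", "VII": "D"}
toy_inactive = {"II": "K", "VIb": "N", "VII": "D"}

print(predicted_inactive(toy_active), missing_essential_residues(toy_active))
print(predicted_inactive(toy_inactive), missing_essential_residues(toy_inactive))
```

The second toy protein mirrors the situation described later for the RGC group, where the aspartic acid in subdomain VIb is substituted.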
Approximately 2% of the S. mansoni ePKs remain unclassified because they do not have similarity to any known PK family. All these proteins have a truncated catalytic domain, probably because of an incorrect protein prediction. The unclassified ePKs from C. elegans, D. melanogaster, H. sapiens and S. cerevisiae range from 19% to about 38% of their kinomes.
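The counts and percentages quoted so far can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text; the implied proteome size and the implied number of unclassified proteins are back-calculations from rounded percentages, so they are approximate and are not values reported by this work.

```python
TOTAL_EPKS = 252             # ePKs identified in S. mansoni
STUDIED = 16                 # experimentally studied ePKs
PREDICTED_INACTIVE = 12      # predicted catalytically inactive ePKs
PROTEOME_FRACTION = 0.019    # ePKs as a fraction of the predicted proteome
UNCLASSIFIED_FRACTION = 0.02 # approximate fraction of unclassified ePKs

automatically_annotated = TOTAL_EPKS - STUDIED
implied_proteome_size = round(TOTAL_EPKS / PROTEOME_FRACTION)          # approximate
implied_unclassified = round(TOTAL_EPKS * UNCLASSIFIED_FRACTION)       # approximate
inactive_percent = round(100 * PREDICTED_INACTIVE / TOTAL_EPKS, 1)

print(automatically_annotated, implied_proteome_size,
      implied_unclassified, inactive_percent)
```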
Around 13 families have been classified as part of the AGC group in eukaryotic organisms. In S. mansoni, most AGC proteins belong to the PKA (Protein Kinase A) (5 proteins), DMPK (Myotonic Dystrophy Protein Kinase) (4 proteins and 1 product of alternative splicing), PKC (Protein Kinase C) (4 proteins) and PKG (4 proteins) families. The remaining AGC families each have only one representative in S. mansoni (Additional file 1). According to our phylogenetic analysis, S. mansoni has no homolog of the YANK (Yet Another Novel Kinase) family (Additional file 2).
The most significant difference between PKA and PKG family members is that in PKA, the regulatory and catalytic activities are performed by separate gene products known as PKA-R and PKA-C, respectively, whereas in PKG the cNMP-binding (cyclic nucleotide-binding) and catalytic domains are usually present in the same polypeptide. The inactive conformation of PKA is a heterotetramer of two PKA-R and two PKA-C subunits, while PKG exists as a homodimer. S. mansoni possesses five homologs of the PKA-C subunit (Additional file 1) and six predicted homologs of the PKA-R subunit (Smp_131050, Smp_147320, Smp_079010, Smp_030400, Smp_019280, Smp_022100), allowing for a variety of different holoenzymes to be formed in this parasite. Some studies demonstrated that PKG proteins of Toxoplasma and Eimeria, and PKG and PKA proteins of Plasmodium [38, 39], are essential, as their inhibitors have an anti-parasitic effect in these organisms. Recently it was shown that inhibition of the SmPKA-C subunit (Smp_152330), expressed in adult worms of S. mansoni, resulted in the death of the parasites. This result, together with the range of holoenzymes that can be formed, indicates that genes in this family are critical for the development of S. mansoni and may represent good targets for drug development.
PKC belongs to a large protein family that is classified into four important subfamilies: the PKC Alpha subfamily, which contains the conventional PKCs (γ, βI, βII, and α) that are sensitive to diacylglycerol (DAG) and Ca2+; the PKC Eta and Delta subfamilies, which contain the novel PKCs (ε, δ, η, and θ) that are regulated by DAG alone; and the PKC Iota subfamily, which contains the atypical PKCs (ζ and ι) that are insensitive to both compounds [41, 42]. PKC is considered to be a mechanistic regulator of development in vertebrates, playing a key role in cell growth and differentiation [43–45]. S. mansoni has representatives in three of the PKC subfamilies mentioned above (Iota, Eta and Alpha) but lacks homologs in the Delta subfamily, which is present in C. elegans, D. melanogaster, M. musculus, and H. sapiens. The two PKC Alpha proteins found in S. mansoni (Smp_128480 and Smp_176360) belong to the PKCβI isoform and were recently characterized [46–48]. Both are associated with the neural mass, excretory vesicle, ridge cyton, tegument and germinal cells in schistosomula and miracidia, suggesting a possible role in larval transformation [46–48].
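The PKC classification described above amounts to a small decision rule based on sensitivity to DAG and Ca2+. The sketch below encodes that rule and the subfamily presence reported for S. mansoni; the function and variable names are hypothetical, and a kinase that responds to Ca2+ but not DAG is treated as undefined because the text does not describe such a class.

```python
# PKC classes and their subfamilies, as described in the text.
SUBFAMILIES_BY_CLASS = {
    "conventional": ["Alpha"],
    "novel": ["Eta", "Delta"],
    "atypical": ["Iota"],
}

# Subfamilies with representatives in S. mansoni, per the text.
SMANSONI_SUBFAMILIES = {"Alpha", "Eta", "Iota"}


def pkc_class(dag_sensitive, calcium_sensitive):
    """Classify a PKC by its cofactor sensitivity."""
    if dag_sensitive and calcium_sensitive:
        return "conventional"
    if dag_sensitive:
        return "novel"
    if not calcium_sensitive:
        return "atypical"
    # Ca2+-sensitive but DAG-insensitive is not a class described in the text.
    raise ValueError("combination not covered by the described classification")


missing_in_smansoni = sorted(
    subfamily
    for subfamilies in SUBFAMILIES_BY_CLASS.values()
    for subfamily in subfamilies
    if subfamily not in SMANSONI_SUBFAMILIES
)

print(pkc_class(True, True), pkc_class(True, False), pkc_class(False, False))
print(missing_in_smansoni)
```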
One protein in the AGC group, Smp_157370, remains unclassified. In the phylogenetic tree, this protein appears more closely related to the GRK (G-protein coupled Receptor Kinase) family (Additional file 2). Despite the good conservation of the catalytic domain, this protein lacks the accessory domain that is characteristic of the GRK proteins (Additional file 2). Furthermore, Smp_157370 does not form a clade with the GRK family members according to our phylogenetic tree, which corroborates its divergence in relation to GRK homologs in other eukaryotes (Additional file 2).
Interestingly, according to SchistoDB EST evidence, the two most highly transcribed ePKs in S. mansoni (Smp_151140 and Smp_158560.1) belong to the DMPK family of the AGC group and are transcribed mainly in cercariae, schistosomula, eggs and adult worms. This finding is notable because these are the four life cycle stages of the parasite that are in contact with the definitive host. In C. elegans, proteins of the DMPK family are expressed in hypodermal cells and are involved in embryonic elongation.
The divalent cation calcium (Ca2+) is one of the ions most widely used as a second messenger in cellular signaling. A significant portion of calcium-mediated signaling is controlled by calmodulin-binding kinases. Some members of the CaMK group are dependent on the binding of Ca2+/CaM. In the S. mansoni ePKinome, 32 proteins were classified as CaMK, with the majority (18 proteins) belonging to the CaMKL (Calcium/Calmodulin Regulated Kinase-Like) family. A similar number was found in the other organisms analyzed here (Additional file 3). S. mansoni also contains members of the DAPK (death associated protein kinase), MAPKAPK (MAPK associated protein kinase), MLCK (myosin light chain kinase), and PHK (phosphorylase kinase) families in the CaMK group (Additional file 4).
MLCK is a Ca2+/calmodulin-dependent protein kinase whose only known substrate is the myosin II regulatory light chain (RLC). The primary function of MLCK is to stimulate muscle contraction through the phosphorylation of the RLC of myosin II, a eukaryotic motor protein that interacts with filamentous actin. Although MLCK has only one known substrate, this protein is linked to a variety of cellular processes due to the diverse biological functions of myosin II. Two distinct smooth muscle MLCK genes were identified in S. mansoni (Smp_121780, Smp_126240), although no homologs were identified for the non-smooth muscle vertebrate MLCK through our phylogenetic analysis. This likely reflects the absence of striated muscle in this parasite.
DCAMKL (Doublecortin and CaMK-like) is a protein that regulates the microtubule cytoskeleton and, in the chick, is specifically expressed in the developing brain [52, 53]. CASK is a protein that participates in cell adhesion. According to our phylogenetic analysis, a single homolog of each of the DCAMKL (Smp_053560) and CASK (Smp_131690) families was found in S. mansoni (Additional file 4).
While the CaMK2 (CaMK family 2) family is encoded by four genes in humans, only a single CaMK2 gene, with two predicted alternatively spliced transcripts, was identified in the S. mansoni genome (Additional file 1). S. mansoni CaMK2 was recently identified as a putative target for drug development by a comparative chemogenomics approach using the S. mansoni proteome and the proteomes of two model organisms, C. elegans and D. melanogaster. The function of this protein in S. mansoni is still unknown. In sea urchin, CaMK2 is required for nuclear envelope breakdown following fertilization.
CMGC kinases are relatively abundant in S. mansoni, a feature that can be explained by the requirement to control cell proliferation and to ensure correct replication and segregation of organelles, which together are essential mechanisms for parasites with a complex life cycle. In the CMGC group, all of the main families are conserved between S. cerevisiae, C. elegans, M. musculus, H. sapiens, and S. mansoni, including CDK (Cyclin Dependent Kinase), MAPK (Mitogen Activated Protein Kinase), GSK (Glycogen Synthase Kinase 3), CLK (CDC-Like Kinase), SRPK (SR Protein Kinase), CK2 (Cell Kinase 2), DYRK (Dual-specificity Tyrosine Regulated Kinase), and RCK (Additional file 5).
S. mansoni has 14 CDKs, the same number found in C. elegans (compared with only seven in S. cerevisiae), including homologs of all subfamilies (CDK7, CDK4, CDK8, CRK7, CDK9, PITSLRE, CDK10, PCTAIRE, PFTAIRE, CDK5 and CDC2) (Additional file 5). On the other hand, only one RCK family protein (Smp_132890) was identified in the parasite. The RCK proteins are similar to mammalian MAK (male germ cell-associated kinase), which has been implicated in spermatogenic meiosis and in signal transduction pathways for sight and smell.
The GSK family is represented by 3 proteins in S. mansoni. One of those (Smp_008260.1) was selected as a putative target for drug development by a comparative chemogenomics approach. GSK proteins are involved in development and cell proliferation, are overexpressed in colon carcinomas, and positively regulate the Wnt signaling pathway during embryonic development and the oocyte-to-embryo transition in C. elegans.
In S. mansoni, the STE group includes seven STE7 (MEK or MAPKK), two STE11 (MEKK or MAPKKK), and 13 STE20 (MEKKK) kinases (Additional file 6). The large number of STE family members in S. mansoni could translate into an enormous potential for downstream signal specificity and diversity. SmSLK (Smp_150260) is a recently characterized Ste20 family protein of S. mansoni that is able to activate MAPK/JNK in human embryonic kidney (HEK) cells as well as in Xenopus oocytes. In addition, immunofluorescence showed that SmSLK was abundant in the tegument of adult schistosomes. These findings indicate that signals sensed in the environment by many different proteins may activate the MAPK cascade, which will generate an adaptive physiological response. Furthermore, molecules that activate the MAPK pathways, such as some hormone and cytokine signals, are not found in the S. mansoni predicted proteome (Figure 3). It has been demonstrated that the parasite takes advantage of host proteins for its growth and development [66, 67]. Other ePKs, such as members of the PKA, PKC, Raf and receptor protein tyrosine kinase (RTK) families, also participate in the MAPK signaling pathway. RTKs are anchored to the membrane and have an important role in transmitting the signal from the extracellular environment to the cytoplasm (Figure 3).
Orthology relationships among ePKs of S. mansoni, B. malayi, and C. elegans and RNAi phenotype for C. elegans proteins
C. elegans RNAi phenotype
C44C8.6_Ce e K08F8.1
Embryonic lethal, maternal sterile, organism morphology variant.
Larval lethal, fertility reduced
Body morphology defect, slow growth
Embryonic lethal, sterile, body morphology defect
Embryonic lethal, body morphology defect
Embryonic lethal, sterile, reduced brood size, exploded through vulva, slow growth
cortical dynamics defective early embryonic, maternal sterile
Abnormal cell migration, protruding vulva, developmental delay
Embryonic lethal, body morphology defect, slow growth, sterile, larval lethal, dumpy
More depolarized oocytes
Slow growth, larval lethal late L3/L4
Embryonic lethal, body morphology defect, larval lethal, sterile, paralyzed, uncoordinated movement
Sterile, osmotic integrity problems
Embryonic lethal, slow growth, sterile progeny, uncoordinated movement
Embryonic lethal, larval lethal, maternal sterile
Locomotion variant, larval lethal, sterile progeny
Apoptosis increased, sterile progeny, reduced brood size, embryonic lethal
Smp_134260, Smp_133490, Smp_133500
Reduced brood size, embryonic lethal
embryonic lethal, sterile progeny
Smp_155720, Smp_125310, Smp_008260.1
Larval lethal, slow growth, embryonic lethal
Embryonic lethal, slow growth, uncoordinated movement
Embryonic lethal, larval lethal
Embryonic lethal, sterile progeny
PTKs can be classified, based on the presence or absence of transmembrane domains, into receptor tyrosine kinases (RTKs), which relay intracellular signals, and cytoplasmic tyrosine kinases (CTKs). The S. mansoni kinome contains 15 RTKs and 19 CTKs. The 15 RTKs include two InsRs (Insulin Receptors), four EGFRs (Epidermal Growth Factor Receptors), two VKRs (Venus Flytrap Kinase Receptors), and one representative each of the Eph (Ephrin receptor), Ror, CCK4 (Colon Carcinoma Kinase 4), and MUSK (Muscle-Specific Kinase) families, as well as three unknown receptors.
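The RTK/CTK split and the family counts above can be checked with a few lines of code. This is a sketch only: it assumes one representative each for the Eph, Ror, CCK4 and MUSK families (the reading under which the families sum to the stated 15 RTKs), and the function name is hypothetical.

```python
# RTK family counts as read from the text; one representative each for
# Eph, Ror, CCK4 and MUSK is an assumption consistent with the total of 15.
RTK_FAMILY_COUNTS = {
    "InsR": 2,
    "EGFR": 4,
    "VKR": 2,
    "Eph": 1,
    "Ror": 1,
    "CCK4": 1,
    "MUSK": 1,
    "unknown": 3,
}
CTK_COUNT = 19


def classify_ptk(has_transmembrane_domain):
    """Receptor (RTK) if a transmembrane domain is present, otherwise cytoplasmic (CTK)."""
    return "RTK" if has_transmembrane_domain else "CTK"


rtk_total = sum(RTK_FAMILY_COUNTS.values())
ptk_total = rtk_total + CTK_COUNT

print(classify_ptk(True), classify_ptk(False))
print(rtk_total, ptk_total)
```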
The two InsRs of S. mansoni, SmIR-1 (Smp_009990) and SmIR-2 (Smp_074030), have distinct functions during parasite development. These two receptors cluster well within the InsR family but are more divergent than the mammalian and D. melanogaster proteins (Additional file 7). SmIR-1 was localized in the muscles, intestinal epithelium, and basal membrane of adult male and female worms and at the periphery of schistosomula, mainly in the tegument. SmIR-1 co-localized in the schistosome tegument with glucose transporters, suggesting a role in the regulation of the uptake of glucose, which is an essential nutrient for the intra-mammalian stages of S. mansoni. SmIR-2, in contrast, was distributed in the parenchyma of adult males and females, indicating a possible involvement of the receptor in parasite growth. S. mansoni is the first invertebrate in which two insulin receptors have been characterized that seem to have distinct functions, as in vertebrates [18, 75, 76]. Mammals have two InsR members: the insulin-like growth factor receptor (IGFR), which has a role in controlling growth, and the insulin receptor (InR), which has specialized in metabolic regulation.
In C. elegans, EGFR signaling induces behavioral quiescence. One S. mansoni EGFR homolog (Smp_173590) was localized in the parasite muscle and is perhaps related to muscle development or function. Vertebrate EGF activates S. mansoni EGFR and the downstream classical ERK pathway (Figure 3), indicating the conservation of EGFR function in S. mansoni. Moreover, human EGF was shown to increase protein and DNA synthesis as well as protein phosphorylation in parasites, supporting the hypothesis that host EGF could regulate schistosome development. The similarity of schistosome proteins to sex hormone receptors of mammalian hosts provides a good example of a host-parasite relationship in which the adult worm depends on host hormone synthesis for its maturation and reproduction.
Five S. mansoni proteins are not clustered with the main RTK families, as shown in our phylogenetic analyses (Additional file 7). Three of them have a truncated catalytic domain (Smp_175590, Smp_093500 and Smp_157300) and two are RTKs with a Venus flytrap domain (VKR family). VKR is a family of receptors found in invertebrates, especially in insects. One S. mansoni VKR protein, Smp_153500 (SmVKR), was recently studied. We identified another protein (Smp_019790) that clusters with SmVKR (Additional file 7) and is highly similar to it. Despite the similarity of the catalytic domain of the VKR proteins to that of the InsRs, these two proteins are not clustered with the InsR family. In this respect, the most interesting finding is that VKR family members are not found in mammals and could represent good targets for drug development, as a specific inhibitor for this family will probably not affect any protein of the host.
The CTKs in S. mansoni are represented by 11 different families (Additional file 7). SmTK3 (Smp_054500) and SmTK5 (Smp_136300), both src family members, and SmTK4 (Smp_149460), a syk family member, are present in reproductive organs and are possibly involved in the development of gonads and the multiplication of germinal and vitelline cells [82–84]. Abl proteins of S. mansoni (Smp_128790 and Smp_169230) were recently studied using an Abl-specific inhibitor (Imatinib, Gleevec®). The results showed an important morphological alteration in adult worms of S. mansoni that led to the death of the parasites. C. elegans contains 42 members of the Fer family, while only a single member, SmFes, was found in S. mansoni. The Fer gene of S. mansoni (SmFes, Smp_164810) exhibits the characteristic features of Fes/Fps/Fer (fes, feline sarcoma; fps, Fujinami poultry sarcoma; fer, fes related) PTKs. Immunolocalization assays showed that SmFes is particularly expressed at the terebratorium of miracidia and in the tegument of cercariae and skin-stage schistosomula. These findings suggest that SmFes may play a role in signal transduction pathways involved in larval transformation after penetration into intermediate and definitive hosts [85, 86].
Proteins in this group share sequence similarity to the catalytic domain (Pfam: PF07714) found in proteins of the TK group. The RGC group is underrepresented in most species, except in C. elegans, which has a large expansion of these proteins, and S. cerevisiae, which has no protein with similarity to the TK catalytic domain (Figure 2). Only three RGC members were identified in the S. mansoni ePKinome. All of them are more closely related to the mammalian and insect families than to the worm family. C. elegans and B. malayi RGC proteins form at least two different families that are noticeably more divergent from the S. mansoni, D. melanogaster, M. musculus, and H. sapiens families, as suggested by our phylogenetic analysis (Additional file 8). Most RGC proteins remain functionally uncharacterized. In C. elegans, several RGC proteins are highly expressed in restricted sets of neurons and are implicated in chemosensation. One RGC is involved in dauer stage formation. Other parasites such as L. major, T. brucei, T. cruzi and P. falciparum also lack homologs in the RGC group [20, 89]. The three S. mansoni RGC proteins have an amino acid substitution at the aspartic acid in subdomain VIb of the catalytic domain, rendering them catalytically inactive. Although the catalytic center of an enzyme is usually highly conserved, there have been reports of proteins, like those of the RGC group of ePKs, with substitutions at essential catalytic positions that convert the enzyme into a catalytically inactive form. A recent study showed that inactive enzymes are found in a large variety of families conserved among metazoan species and that, having lost their catalytic activity, they have adopted new functions and are involved in regulatory processes [34, 90].
TKL consists of a divergent group that is phylogenetically close to the tyrosine kinases (Figure 3). However, TKL proteins have an unusual catalytic domain that is a hybrid between the serine/threonine and tyrosine kinases. The catalytic domain may display greater similarity to the tyrosine catalytic domain (Pfam: PF07714) or to the serine/threonine catalytic domain (Pfam: PF00069) [10, 11]. In S. mansoni, the TKL group includes the MLK (Mixed Lineage Kinases), LISK (family containing the closely related LIMK and TESK subfamilies), Raf, RIPK (Receptor Interacting Protein Kinase), STKR (Serine/threonine kinase Receptors for activin and TGFβ ligands), and LRRK (Leucine Rich Repeat Kinase) (type 1 and type 2) families. Of the 19 TKL proteins found in S. mansoni, 15 display greater similarity to the serine/threonine catalytic domain and four (of the Raf and MLK/ILK families) to the tyrosine catalytic domain. S. mansoni has no homologous proteins of the IRAK (interleukin-1 (IL-1) receptor-associated kinase) family that is present in C. elegans, B. malayi, D. melanogaster, H. sapiens, and M. musculus (Additional file 9). Although S. cerevisiae does not have any TKL protein homolog, other fungal species do contain such proteins. Raf (also known as MAPKKK) is a TKL family that plays an important role in the activation of STE proteins in the signaling cascade that culminates in the activation of ERK1/2 (Figure 4). A recent study showed that blocking the expression of the C. elegans homolog of the S. mansoni Raf protein (Smp_176990) by RNAi generates a sterile phenotype, which supports the hypothesis that Raf is involved in germline development, somatic gonad development, oogenesis, spermatogenesis, ovulation or fertilization (Table 2). The Raf protein may represent a good target for drug development in S. mansoni.
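One simple way to decide which of the two catalytic domain models a TKL protein resembles more is to compare its scores against the two Pfam profiles and keep the higher one. The sketch below illustrates that comparison; the text does not state how "greater similarity" was measured, so the use of HMM bit scores is an assumption, and the protein names and scores are hypothetical.

```python
# Pfam accessions named in the text and the catalytic domain type each represents.
PFAM_LABELS = {
    "PF07714": "tyrosine-like",
    "PF00069": "serine/threonine-like",
}


def closer_catalytic_domain(bit_scores):
    """Return the label of the Pfam model with the highest score for one protein."""
    best_accession = max(bit_scores, key=bit_scores.get)
    return PFAM_LABELS[best_accession]


# Hypothetical bit scores for two toy proteins (illustrative only).
toy_scores = {
    "toy_TKL_1": {"PF07714": 180.2, "PF00069": 95.4},
    "toy_TKL_2": {"PF07714": 88.0, "PF00069": 140.7},
}
calls = {name: closer_catalytic_domain(s) for name, s in toy_scores.items()}
print(calls)

# Counts stated in the text: 15 serine/threonine-like plus 4 tyrosine-like TKLs.
tkl_total = 15 + 4
print(tkl_total)
```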
STKR members that bind TGFβ (transforming growth factor β) are membrane receptors that can be divided into two subclasses (type I and type II). The type II receptor binds TGFβ and then recruits the type I receptor. The TGFβ type I receptor was cloned in S. mansoni (SmTβRI - Smp_173400.2) and was found to be localized on the parasite surface [91, 92]. Another type I STKR (Smp_049760) was identified in the S. mansoni predicted proteome and has not been experimentally characterized so far (Additional file 9). Three type II STKRs (TGFβ type II receptors) were identified in the same contig and are predicted to be products of alternative splicing. A recent study revealed the presence of two transcripts that are translated into two different isoforms of the type II receptor. These transcripts are produced from the same gene by alternative splicing of the last two exons. The authors indicated that these different type II receptors might signal in different cells or developmental stages. Furthermore, that study showed that in the presence of human TGFβ, SmTβRII (Smp_165310) activated SmTβRI. The results also provide evidence for a role of the TGFβ signaling pathway in male-induced female reproductive development [94, 95].
The Other group consists of a mixed collection of kinases with representatives in higher eukaryotes, including the SCY1, NEK (Mitotic Kinase family, also known as NRK), PEK, Haspin, WEE, NAK (Numb-Associated Kinase), ULK (Unc-51 Like Kinase), IRE (Inositol Requiring), PLK (Polo Like Kinases), AUR (Aurora Kinase), and CDC7 (Cell Division Control 7) families (Additional file 1). Our analysis showed that 15% of the S. mansoni ePKinome does not fall into any of the eight major groups but instead comprises 20 smaller, conserved families.
Most ePKs also have a second domain that is involved in protein-protein interaction and allosteric regulation of the catalytic domain. In this work, only the catalytic domain sequence was used in the phylogenetic analyses. Interestingly, when the information on the ePK accessory domains was integrated into the phylogenies, we observed a correlation between the diversity of protein architecture and the phylogenetic patterning. We also believe that the diversification of the ePKs is an ancient event.
The most common Pfam accessory domains found in S. mansoni kinases are Pkinase_C (Pfam: PF00433), all found in the AGC group; C1_1 (Pfam: PF00130), found in the AGC and TKL groups; SH2 (Pfam: PF00017), all found in the TK group; and SH3 (Pfam: PF00018), found in the TK and TKL groups. These domains are commonly found in protein kinase families, as we observed in other species from KinBase.
More than 40% of the S. mansoni AGC group proteins have the Pkinase_C domain associated with the catalytic domain. The C1_1 domain is conserved in the N-terminal regions of all PKC proteins of S. mansoni (Figure 6) and has been shown to bind PE (phorbol esters) and DAG (diacylglycerol). DAG is an important second messenger, and phorbol esters are analogues of DAG [32, 96]. The C1_1 domain is present in one or two copies depending on the PKC isozyme (Figure 6). cNMP_binding is an N-terminal domain of PKG proteins that binds cyclic nucleotides (cAMP or cGMP) to relieve the inhibition of the catalytic domain [15, 35]. The AKT protein of S. mansoni (Smp_162120) has an unusual domain combination (Figure 6), as its two C-terminal domains (Glutaredoxin and Na_K-ATPase) are not found in the D. melanogaster, C. elegans, M. musculus and H. sapiens proteins.
CASK is a member of the CaMK group and plays a key role in establishing inter-cellular contacts and plasticity at cellular junctions. The accessory domains found in the S. mansoni CASK protein (L27 and PDZ domains, which serve as protein interaction modules, an SH3 domain, and a C-terminal guanylate kinase domain) are conserved in higher eukaryotes. However, UPF0061 (Pfam: PF02696), an uncharacterized domain, is unusually found in the C-terminal region of the S. mansoni CASK protein (Figure 6). The long protein kinase MLCK (Figure 5a) possesses a large number of Ig repeats (I-set, V-set and Ig) that, in other species, are involved in a variety of functions, including cell-cell recognition, cell-surface receptors, muscle structure and the immune system. It also possesses fn3 repeats, domains of approximately 100 amino acids commonly found in a variety of organisms.
The CMGC and CK1 groups have few or no accessory domains in S. mansoni. However, it is known that small regions in these proteins play an important role in recognizing and binding to the substrate [97, 98]. For example, the CD domain (common docking domain) is a C-terminal region of MAPK proteins composed of a set of negatively charged amino acids that is used to anchor protein activators (such as STE proteins), substrates (such as MAPKAPK) and inactivating proteins (such as MAPK phosphatases). Thus, this region governs a series of signal transduction events in the MAPK reaction cascade. Other regions, including the ED (ERK docking) site, work with the CD domain to ensure specificity and interaction strength.
The PBD (p21-Rho-binding) domain and the C-terminal CNH domain are usually found in the STE20 families (Figure 6). PBD binds to Cdc42 GTPases, activating a signaling cascade that acts upstream of the MAPK cascade. The CNH domain interacts with small GTPases and regulates the actin cytoskeleton.
The SH3 and SH2 domains (Pfam: PF00018 and PF00017, respectively) are commonly found in CTK proteins. SH2 domains function as regulatory modules of intracellular signaling cascades and were found in eight out of 19 S. mansoni CTKs. Fer PTK is usually composed of three domains, an FHC domain, an SH2 domain, and a C-terminal kinase domain, as occurs in the Fer proteins of H. sapiens, M. musculus, and D. melanogaster. However, the S. mansoni Fer protein (SmFes - Smp_164810) and the 42 Fer proteins of C. elegans seem to have lost the N-terminal FHC domain (Figure 7). RTKs are characterized by an extracellular domain, a membrane-spanning segment and an intracellular kinase domain. The extracellular ligand-binding domains of the EGFR and InsR proteins are composed of two receptor_L domains sandwiching a Furin_like domain (Figure 7). SmVKR is composed of an unusual extracellular Venus flytrap (VFT) module linked through a single transmembrane domain to an intracellular tyrosine kinase catalytic domain similar to that of the insulin receptor; a putative function in reproduction and development has been reported. Other extracellular domains found in S. mansoni are Ephrin_Ibd (Pfam: PF01404) in the Ephrin receptors (Eph) and Ig domains (Pfam: PF00047) in CCK4 proteins (Additional file 1).
In conclusion, the protein architecture, including the accessory domains, may indicate potential protein partners. The signaling roles of schistosome-specific or otherwise unusual architectures are of special biological interest.
Table: S. mansoni ePKinome annotation improved by phylogenomics. Columns: change after phylogenetic analysis; number of changes; % of S. mansoni ePKinome. Rows: change in group classification; change in family classification; classification of unknown proteins; total of changes.
S. mansoni [NCBI taxid: 6183] and six other organisms were selected for this work: Homo sapiens [taxid: 9606], Mus musculus [taxid: 10090], Drosophila melanogaster [taxid: 7227], Caenorhabditis elegans [taxid: 6239], Brugia malayi [taxid: 6279], and Saccharomyces cerevisiae [taxid: 4932]. The S. mansoni predicted proteome data were downloaded from SchistoDB, version 2.0, which contains the original gene and genomic information provided by the Wellcome Trust Institute and described elsewhere. Datasets of protein kinases from the other organisms were downloaded from the kinase database at Sugen/Salk - KinBase, except for Brugia malayi, which was retrieved from KEGG.
Functional classification of protein kinases into groups, families, and subfamilies followed the proposed hierarchy described elsewhere [12, 13, 32]. Potential protein kinases of S. mansoni were identified and characterized by combined approaches based on sequence similarity and phylogenetic relationships. These proteins were first identified by similarity to Hidden Markov Models (HMMs) as described below. Also based on sequence similarity, each predicted protein kinase was manually annotated by integrating data from InterProScan and reverse PSI-BLAST (rpsblast) output searches into Artemis. Further analysis was performed by HMM searches for non-catalytic (accessory) domains associated with the conserved catalytic domain of protein kinases, based on data available at the Protein families database - Pfam. Functional classification was also devised based on the literature and on the assumption of a broad conservation of the molecular functions. Phylogenetic analyses of the ePK groups performed in the present work corroborated this classification and supported new functional assignments for previously uncharacterized proteins (see below).
In order to identify potential homologs in S. mansoni, amino acid sequences of known protein kinases of five model organisms (H. sapiens, M. musculus, D. melanogaster, C. elegans, and S. cerevisiae) were selected. A total of 68 diverse amino acid sequences corresponding to the kinase catalytic domain and sharing less than 50% sequence identity were aligned in MAFFT and manually edited for further analysis. Local and global HMMs were built from the multiple sequence alignments with the HMMer package (http://hmmer.janelia.org) and used for sensitive searches against the S. mansoni proteome.
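The selection of diverse sequences sharing less than 50% identity can be illustrated with a minimal sketch. It assumes that identity is counted over alignment columns where neither sequence has a gap and that sequences are kept greedily in input order; both choices, the function names, and the toy sequences are our assumptions and not necessarily the procedure used in this work.

```python
def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def select_diverse(aligned, max_identity=0.5):
    """Greedily keep sequences sharing less than max_identity with every kept one."""
    kept = []
    for name, seq in aligned.items():
        if all(pairwise_identity(seq, aligned[k]) < max_identity for k in kept):
            kept.append(name)
    return kept


# Toy aligned fragments (hypothetical, for illustration only).
toy = {
    "seq1": "GKGSFGKV",
    "seq2": "GKGSFGEV",  # differs from seq1 at one of eight positions
    "seq3": "GEG-YAQV",  # more divergent, with one gap
}
print(select_diverse(toy))
```

A greedy pass depends on the input order, so a different ordering of the same sequences may retain a different representative set.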
Amino acid sequences corresponding to the conserved catalytic domain (Pfam: PF00069 or PF07714) of each group of protein kinases were separately aligned using the default parameters of MAFFT. Multiple sequence alignments were filtered to keep proteins sharing 50% to 90% pairwise sequence identity using the decreased redundancy tool and manually edited to remove ambiguous regions using BioEdit. Final alignments were used in phylogenetic reconstructions through multiple programs available in the Phylogeny Inference Package - PHYLIP, version 3.69. Initially, 1000 random datasets (replicates) were created for each alignment using seqboot with default parameters. For each dataset, a distance matrix was calculated by protdist under the JTT model with gamma-distributed rates across sites. Next, phylogenies were estimated from the distance matrix data adopting the Fitch-Margoliash criterion as implemented in fitch. Finally, the results from the random datasets were summarized by consense, which computes consensus trees by the majority-rule method. Phylogenetic trees were visualized and edited using the Tree Figure Drawing Tool - FigTree, version 1.3.1. Nodes with bootstrap values of at least 80% were considered to support functional prediction.
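The bootstrap logic behind this pipeline can be sketched in a few lines. The sketch below is not PHYLIP code: it only illustrates resampling alignment columns with replacement (the idea behind seqboot) and applying the 80% support threshold to clades recovered across replicates. Tree inference itself is omitted, and the toy alignment, the clade lists, and all function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement; every row gets the same columns."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def clade_support(replicate_clades):
    """Percentage of replicates in which each clade (a frozenset of taxa) appears."""
    counts = Counter(c for clades in replicate_clades for c in set(clades))
    total = len(replicate_clades)
    return {clade: 100.0 * n / total for clade, n in counts.items()}


def supported(support, threshold=80.0):
    """Clades whose bootstrap support reaches the threshold."""
    return {clade for clade, pct in support.items() if pct >= threshold}


# Toy alignment (hypothetical sequences).
toy_alignment = {"A": "MKVLA", "B": "MKILA", "C": "MRVLG"}
replicate = bootstrap_columns(toy_alignment, random.Random(0))

# Hypothetical clades recovered from 10 replicate trees (tree building not shown).
AB = frozenset({"A", "B"})
CD = frozenset({"C", "D"})
replicate_clades = [[AB, CD]] * 6 + [[AB]] * 3 + [[]]
support = clade_support(replicate_clades)
print(supported(support))
```

In this toy example the clade {A, B} appears in 9 of 10 replicates and passes the 80% threshold, whereas {C, D} appears in 6 of 10 and does not.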
cAMP: cyclic adenosine monophosphate
NCBI: National Center for Biotechnology Information
This work was funded by the National Institutes of Health - NIH/Fogarty International Center (PPM-00439-10 to GO), Secretaria de Ciência, Tecnologia e Ensino Superior de Minas Gerais, Fundação de Amparo à Pesquisa do Estado de Minas Gerais, SECTES/FAPEMIG, Brazil (CBB-1181/08 and PPM-00439-10 to GO), and Conselho Nacional de Desenvolvimento Científico e Tecnológico - CNPq (CNPq-Universal 476036/2010-0 to LAN, CNPq-INCT 573839/2008-5 to GO, and INCT-DT 573839/2008-5 to GO). LFA is a fellow of FIOCRUZ. The authors acknowledge the use of the computing resources of the Center for Excellence in Bioinformatics, CEBio/FIOCRUZ, Brazil.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.