Protein phosphorylation is the most abundant post-translational modification in both prokaryotic and eukaryotic organisms. This process is regulated through the enzymatic activities of protein kinases and phosphatases. Phosphorylation occurs predominantly on serine, threonine, and tyrosine residues and has been shown to be a key regulatory switch in a variety of cellular processes, ranging from cell cycle and differentiation to motility and learning [1, 2]. In particular, Leishmania lacks transcription factors, and phosphorylation has been proposed as an important regulatory mechanism.
Recent advances in mass spectrometry have enabled the identification of a large number of phosphorylation sites in most eukaryotes (see [4, 5] for a review). Information on the phosphoproteome of parasitic protozoa is also becoming available: in-depth analyses have only recently been initiated in African trypanosomes and Leishmania [6–10].
These studies reported phosphorylation sites whose sequences did not match known kinase recognition motifs; for example, 25% of the sites identified by Nett et al. were not recognized by either Scansite or Netphos. Moreover, the data reveal phosphorylation events that are not conserved in orthologous proteins. For instance, Hem et al. showed that a number of chaperones and heat-shock proteins that are highly conserved from Leishmania to human possess parasite-specific phosphorylation sites.
These findings imply that new prediction tools, specific to a family or genus, are required. Here we use the results of phosphoproteomic experiments in Leishmania to develop a novel method that improves P-site prediction in Leishmania and other organisms of the Trypanosomatidae group.
The complete spectrum of protein phosphorylation is difficult to assess due to the low stoichiometry of many phosphorylation events and the highly dynamic nature of this modification. Thus the bioinformatic identification of putative phosphorylation sites and the subsequent analysis of these sites by biochemical assays may be an important alternative strategy to discover new phosphorylation events.
Phosphorylation site prediction tools are usually grouped into two categories: generic and kinase-specific. Tools in the first category predict whether a site is phosphorylated, without making any assumption about the protein kinase responsible for the phosphorylation. Methods in the second category aim to infer which kinase family is responsible for the phosphorylation event. This information is extremely useful for the elucidation of signaling networks; however, experimental data linking a protein kinase to its substrate are available only for a limited number of sites [13, 14].
Netphos, introduced in 1999, was the first predictor to use neural networks, and it outperformed phosphorylation site identification based on sequence motifs alone. Besides the primary sequence, the structural context is also important in determining whether a site is phosphorylated or not [15, 16]. Indeed, several predictors, such as DISPHOS and PHOSIDA, include the predicted structural characteristics of the putative phosphorylation sites.
Protein kinase-specific predictors include NetphosK, Scansite, KinasePhos, PredPhospho, GPS, pkaPS and PrediKin. NetphosK extends the Netphos method to kinase-specific predictions. Scansite uses Position Specific Scoring Matrices (PSSMs) for 62 different kinase phosphorylation motifs. KinasePhos and PredPhospho use HMMER profiles and Support Vector Machines (SVMs), respectively. In both cases the prediction models are trained on sets of peptides phosphorylated by protein kinases of the same family. GPS performs profile searches with short motifs instead of using a machine learning approach. To achieve a higher coverage of known phosphorylation sites, the algorithm relaxes the stringency of the profiles, which increases the number of false positive predictions. The pkaPS tool was developed to predict protein kinase A-specific phosphorylation sites; it is based on an extensive analysis of the PKA motifs and therefore achieves the best results for these particular predictions. PrediKin is based on the analysis of the contact positions between kinases and substrates in proteins of known structure. The authors were able to associate specific kinase residues with corresponding sequence preferences in the substrate.
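To make the PSSM idea concrete, the following is a minimal sketch of how a position-specific scoring matrix can be built from aligned phosphopeptides and used to score candidate serine, threonine and tyrosine residues. It is an illustration only and does not reproduce the matrices or scoring scheme of Scansite or of any other tool cited here. The training peptides, the test sequence, the uniform background frequencies, the pseudocount and the function names `build_pssm` and `score_sites` are all assumptions introduced for this example.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(peptides, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length peptides centred on the phospho-acceptor.

    Assumes a uniform background distribution over the 20 amino acids,
    which is a simplification chosen for this sketch.
    """
    length = len(peptides[0])
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        # Start every residue at the pseudocount to avoid log(0).
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for peptide in peptides:
            counts[peptide[pos]] += 1
        total = sum(counts.values())
        pssm.append(
            {aa: math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS}
        )
    return pssm


def score_sites(sequence, pssm, acceptors="STY"):
    """Score every S/T/Y residue that has a complete flanking window.

    Returns a list of (1-based position, window, score) tuples.
    """
    flank = len(pssm) // 2
    hits = []
    for i in range(flank, len(sequence) - flank):
        if sequence[i] in acceptors:
            window = sequence[i - flank : i + flank + 1]
            score = sum(column[aa] for column, aa in zip(pssm, window))
            hits.append((i + 1, window, score))
    return hits


# Hypothetical 7-mer training peptides with basic residues upstream of the site.
training_peptides = ["RRASVAG", "KRQSLPE", "RRLSIDA", "RRGTLEK"]
pssm = build_pssm(training_peptides)

# Hypothetical protein sequence containing two candidate serines.
hits = score_sites("MAGRRASVAGLLSPEDK", pssm)
for position, window, score in hits:
    print(f"{position}\t{window}\t{score:.2f}")
```

In this toy setting the serine preceded by two arginines scores higher than the serine in the unrelated context, because the matrix rewards residues that recur at the same position in the training peptides. A real predictor would in addition need a score threshold calibrated on known positive and negative sites.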
Moreover, a number of organism-specific prediction systems have been developed in recent years [25–28]. These methods aim to increase prediction accuracy by training on peptides derived from a single organism. This approach makes it possible to capture organism-specific differences in known phosphorylation motifs and to reduce false positives arising from kinase families that are under-represented in the organism of interest. Following this line of reasoning, the aim of this work is to use Leishmania phosphoproteomics data to develop a tool that improves phosphorylation site prediction in trypanosomatids.