- Methodology article
- Open Access
RSpred, a set of Hidden Markov Models to detect and classify the RIFIN and STEVOR proteins of Plasmodium falciparum
- Nicolas Joannin†1,
- Yvonne Kallberg†2, 3,
- Mats Wahlgren1 and
- Bengt Persson2, 3
© Joannin et al; licensee BioMed Central Ltd. 2011
- Received: 17 October 2010
- Accepted: 18 February 2011
- Published: 18 February 2011
Many parasites use multicopy protein families to avoid their host's immune system through a strategy called antigenic variation. RIFIN and STEVOR proteins are variable surface antigens uniquely found in the malaria parasites Plasmodium falciparum and P. reichenowi. Although these two protein families are different, they have more similarity to each other than to any other proteins described to date. As a result, they have been grouped together in one Pfam domain. However, a recent study has described the sub-division of the RIFIN protein family into several functionally distinct groups. These sub-groups require phylogenetic analysis to sort out, which is not practical for large-scale projects, such as the sequencing of patient isolates and meta-genomic analysis.
We have manually curated the rif and stevor gene repertoires of two Plasmodium falciparum genomes, isolates DD2 and HB3. We found that 25% of the rif and stevor genes were mis-annotated and that ~30 were missing. Using these data sets, as well as sequences from the well-curated reference genome (isolate 3D7) and field isolate data from Uniprot, we have developed a tool named RSpred. The tool, based on a set of hidden Markov models and an evaluation program, automatically identifies STEVOR and RIFIN sequences as well as the sub-groups: A-RIFIN, B-RIFIN, B1-RIFIN and B2-RIFIN. In addition to these groups, we distinguish a small subset of STEVOR proteins that we named STEVOR-like, as they either differ remarkably from typical STEVOR proteins or are too fragmented to reach a high enough score. When compared to Pfam and TIGRFAMs, RSpred proves to be a more robust and more sensitive method. We have applied RSpred to the proteomes of several P. falciparum strains, P. reichenowi, P. vivax, P. knowlesi and the rodent malaria species. All groups were found in the P. falciparum strains, and also in the P. reichenowi parasite, whereas none were predicted in the other species.
We have generated a tool for the sorting of RIFIN and STEVOR proteins, large antigenic variant protein groups, into homogeneous sub-families. Assigning functions to such protein families requires their subdivision into meaningful groups such as we have shown for the RIFIN protein family. RSpred removes the need for complicated and time-consuming phylogenetic analysis methods. It will benefit both research groups sequencing whole genomes and others working with field isolates. RSpred is freely accessible via http://www.ifm.liu.se/bioinfo/.
- Hidden Markov Model
- Plasmodium falciparum
- Broad Institute
- Plasmodium falciparum Erythrocyte Membrane Protein
Many pathogens have evolved strategies to survive within the hosts they infect. One strategy consists of varying the antigens the pathogen exposes to its host's immune system, usually resulting in the proliferation of multicopy protein families, commonly named Variable Surface Antigens (VSA). In the case of the malaria parasite Plasmodium falciparum, there are three major VSA that allow the parasite to avoid the host's immune system and establish chronic infections: the Plasmodium falciparum Erythrocyte Membrane Protein 1, RIFIN and STEVOR proteins (reviewed in [2, 3]).
The RIFIN and STEVOR families are groups of VSA proteins that are unique to the Plasmodium falciparum and P. reichenowi parasites [4–9]. They are only present in two species, but they number more than 200 copies per genome. Although the genome of Plasmodium falciparum has been fully sequenced, the information obtained for the reference strain does not represent the full knowledge of these antigenic variant protein families. Field isolates investigated for their repertoire of rif and stevor genes show an extensive variability [10, 11]. This hypervariability makes these proteins difficult to study and their primary function(s) remain to be discovered. A recent analysis of the whole rif gene repertoire, which encodes the RIFIN proteins, from the reference genome has concluded that this family can be sub-divided into functionally distinct groups. One of these sub-groups, A-RIFIN, as well as the STEVOR proteins are predominantly exposed to the host's immune system at the surface of the infected red blood cell (RBC) [4, 7, 8].
Sequestration of infected RBCs is a virulence factor that allows the parasite to avoid passage through the spleen, therefore increasing its chances of survival. A recent analysis of gene expression of VSA of a P. falciparum strain isolated from a splenectomized patient showed that A-rif and stevor genes were not expressed, whereas, in isolates from normal patients, these genes are expressed [4, 7, 10, 11]. The authors relate this loss of expression to the loss of the sequestration phenotype. Conversely, B-rif genes are expressed regardless of the absence of this virulent phenotype. These differences in phenotype as well as in the localization of these proteins [4, 11, 14, 15] and the predicted sub-functionalization of RIFIN proteins demonstrate the importance of distinguishing each of these sub-groups.
Figure 1 shows a schematic representation of A-RIFIN, B-RIFIN and STEVOR proteins, including the potential signal peptide (SP?), variable regions (V1 and V2), Plasmodium export element motif (PEXEL) [16, 17], conserved regions (C1 and C2) and finally the two predicted transmembrane regions, first a questionable one (TM?) and second a highly probable one (TM).
Curation of the RIFIN and STEVOR repertoires of the Plasmodium falciparum DD2 and HB3 genomes
We have carried out manual curation of the RIFIN and STEVOR repertoires in the DD2 and HB3 draft genomes. We used BLAST to detect the DD2 and HB3 sequences, using the entire 3D7 rif and stevor gene repertoire as query and the DD2 and HB3 supercontigs as databases. This allowed us to detect all potential rif and stevor genes.
Sub-grouping, a new take on the matter
We needed curated data sets of sequences belonging to each group in order to train the HMMs. STEVOR and RIFIN proteins share little similarity, which makes them easy to distinguish from one another after completion of multiple sequence alignment with known STEVOR and RIFIN sequences. Full-length A-RIFIN and B-RIFIN proteins are easily recognized, upon visual inspection of multiple sequence alignments, based on the presence (A-RIFIN) or absence (B-RIFIN) of a fairly conserved 25 amino acid residue indel in the conserved region (Figure 1). However, the sub-groups within the B-RIFIN cluster are not so easily sorted without the help of phylogenetic analysis.
Previous research, based on the RIFIN repertoire of the reference genome, describes three sub-groups in the B-RIFIN cluster: B1-, B2- and B3-RIFIN. Our present analysis confirms the integrity of the B1- and B2-RIFIN sub-groups. However, we find that there is too little coherence (less than 50% average pairwise identity in the reference strain, and low confidence bootstrap scores in the phylogenetic trees) within the B3-RIFIN cluster to make it form a defined sub-group. We propose to redefine these sequences simply as B-RIFIN.
Sorting out the results and limits of detection
The first limit of detection (LOD) is the detection of sequences as True or False: whether they are RIFIN or STEVOR sequences or neither. Any score <20 is considered False, i.e. not a RIFIN or a STEVOR. Of all the curated sequences in our dataset, three have scores <20: PFDG_05381, PFDG_04771 and PFDG_04350. The first protein, PFDG_05381, is an extremely short protein derived from a gene at the end of the supercontig 1.45. The sequencing coverage and assembly of contig ends are often questionable, generating erroneous sequences; therefore it is not surprising that this protein is not detected with the STEVOR HMM. The second protein, PFDG_04771, is one of the three sequences of the rifA2 group described by Wang et al. The two other rifA2 sequences, PFD0070c and PFHG_03700, are among the proteins with the lowest of all the positive A-RIFIN HMM scores (60.9 and 63.8 respectively). These three sequences are extremely similar to each other with the exception of a short variable region preceding the C-terminal transmembrane domain. In the case of PFDG_04771, it is a low-complexity repeat of SSGGS motifs. Additionally, this sequence is missing its N-terminal end. We assume that these circumstances, as well as the divergence of the rifA2 proteins from the basic RIFIN type, reduced its score below the detection limit. Although these sequences are full length (with the exception of PFDG_04771), all other low-scoring (but higher than any rifA2) A-RIFIN sequences are fragments, again stressing the atypical properties of rifA2. The third protein below the first LOD, PFDG_04350, is a partial sequence (119 residues) covering only the C-terminal part of the protein. It is most similar to PFL2585c, a protein with very atypical N- and C-terminal ends, although the majority of the protein is typical of A-RIFIN proteins. The limited length and odd sequence of PFDG_04350 prevent its recognition as a RIFIN protein.
Thus the three proteins failing to reach the first LOD have too little sequence similarity to be identified as RIFIN or STEVOR sequences.
The second LOD is specific to STEVOR proteins: if the score against the STEVOR HMM is higher than the True/False cut-off, but <120, then the sequence is reliably related to STEVOR proteins, but either differs from typical STEVOR sequences or is too fragmented to reach a high enough score. We refer to these potential STEVOR sequences as STEVOR-like proteins. The protein fragment PFHG_05644 is an example of a low-confidence sequence (score <120) that we assign as STEVOR-like, although it probably is a valid STEVOR fragment. Among the sequences that score <120 with the STEVOR HMM are two identical sequences, PFC0045w and PFDG_03056, found in the 3D7 and DD2 strains, respectively. The PlasmoDB version 7.1 annotation for the PFC0045w protein is "RIFIN". However, although they are distinct from STEVOR proteins, our phylogenetic analysis clearly shows that these sequences are not RIFIN proteins, as they tend to cluster separately from the RIFIN and closer to STEVOR proteins. Until we can accumulate more sequences of this type, RSpred will predict these proteins to be similar to STEVOR and will assign them the STEVOR-like tag.
The third LOD is specific to RIFIN proteins: if the score against either the A-RIFIN or the B-RIFIN HMM is higher than the score against the STEVOR HMM, but <300, then the sequence is reliably a RIFIN protein, but it is not possible to identify its sub-group. Typical examples are fragments of proteins, e.g. PFDG_04007, PFHG_05281 and A1KQT0 (from DD2, HB3 and Uniprot respectively). In several cases, the short length of the sequence and the absence of determining properties (e.g. the 25 amino acid residue indel) result in these sequences having low scores against both the A-RIFIN and the B-RIFIN HMMs. Some rare proteins include enough of the conserved C1 region to identify them as A- or B-RIFIN, but nevertheless score <300 and are thus sorted into the RIFIN group. These sequences are most often truncated or have a very odd amino acid composition, e.g. PFDG_02116 and PFHG_03477, respectively, possibly caused by low sequencing coverage or genome assembly problems.
Finally, the fourth limit of detection concerns B1- and B2-RIFIN proteins: if the score against the B-RIFIN HMM is >300, but the B1- and B2-RIFIN HMMs do not reach the cut-offs, then the protein will be evaluated as B-RIFIN instead of its proper sub-group. Among all the sequences from our curated dataset, we have not detected any false negative B1- or B2-RIFIN sequences.
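The four limits of detection can be read as a small decision procedure. The Python sketch below is only an illustration of that reading, not the RSpred evaluation program itself: the function name `classify`, the dictionary of scores, the treatment of ties and boundary values, and the `b_sub_cutoff` parameter for the B1- and B2-RIFIN HMMs are assumptions, since only the cut-offs 20, 120 and 300 are stated in the text.

```python
from typing import Dict

# Cut-offs quoted in the text.
TRUE_FALSE_CUTOFF = 20.0   # first LOD: lower scores are "False"
STEVOR_CUTOFF = 120.0      # second LOD: lower STEVOR scores are "STEVOR-like"
RIFIN_CUTOFF = 300.0       # third LOD: lower RIFIN scores stay "RIFIN"

MISSING = float("-inf")    # assumption: an HMM without a hit has no score


def classify(scores: Dict[str, float], b_sub_cutoff: float) -> str:
    """Assign a group from per-HMM scores.

    b_sub_cutoff is a hypothetical stand-in for the B1-/B2-RIFIN cut-offs,
    which are not given in the text.
    """
    stevor = scores.get("STEVOR", MISSING)
    a_rifin = scores.get("A-RIFIN", MISSING)
    b_rifin = scores.get("B-RIFIN", MISSING)
    best_rifin = max(a_rifin, b_rifin)

    # First LOD: neither RIFIN nor STEVOR.
    if max(stevor, best_rifin) < TRUE_FALSE_CUTOFF:
        return "False"

    # Second LOD: STEVOR versus STEVOR-like (ties go to STEVOR, an assumption).
    if stevor >= best_rifin:
        return "STEVOR" if stevor >= STEVOR_CUTOFF else "STEVOR-like"

    # Third LOD: a RIFIN whose sub-group cannot be determined.
    if best_rifin < RIFIN_CUTOFF:
        return "RIFIN"

    if a_rifin >= b_rifin:
        return "A-RIFIN"

    # Fourth LOD: fall back to B-RIFIN when B1/B2 do not reach their cut-off.
    sub_scores = {
        "B1-RIFIN": scores.get("B1-RIFIN", MISSING),
        "B2-RIFIN": scores.get("B2-RIFIN", MISSING),
    }
    best_sub = max(sub_scores, key=sub_scores.get)
    return best_sub if sub_scores[best_sub] >= b_sub_cutoff else "B-RIFIN"
```

For instance, under these assumptions `classify({"STEVOR": 80.0}, b_sub_cutoff=300.0)` falls between the first and second limits and is tagged STEVOR-like.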
Automatic detection of RIFIN and STEVOR sub-groups in draft genomes
Table: Prediction of RIFIN and STEVOR proteins in 15 draft genome datasets (column headings: RIFIN, STEVOR, STEVOR-like).
Currently, RIFIN and STEVOR proteins have only been found in Plasmodium falciparum and the related P. reichenowi. Neither Pfam nor TIGRFAMs detect these proteins in any other known species. Additionally, orthology prediction tools and databases do not yield any RIFIN or STEVOR homologues in any other species [24–26]. Finally, investigations of other Plasmodium multigene families have not detected any RIFIN or STEVOR homologous proteins [27, 28]. Hence, we decided to use other Plasmodium species as negative controls. No RIFIN or STEVOR sequences were predicted in P. vivax, P. yoelii, P. berghei, P. knowlesi or P. chabaudi. RSpred was also run against the entire Uniprot database, but there were no RIFIN or STEVOR sequences predicted, except for those belonging to P. falciparum.
Comparison with Pfam and TIGRFAMs
Other prediction methods exist for the RIFIN and STEVOR protein families, although each one has its limitations. Pfam [18] only predicts whether a sequence is a RIFIN/STEVOR (PF02009) or not, while TIGRFAMs [19] only separates RIFIN (TIGR01477) from STEVOR (TIGR01478) proteins. Additionally, the TIGRFAMs were trained as global models and therefore do not detect sequence fragments. Neither of the two predicts RIFIN sub-groups, as RSpred does.
In order to test the sensitivity of the three methods, we applied them to the set of RIFIN and STEVOR sequences that were not used for the training of RSpred. Out of 339 RIFIN/STEVOR sequences, RSpred identified 338 (99.7%) of them, whereas Pfam detected 332 (97.9%) and TIGRFAMs only detected 297 (87.6%). Both TIGRFAMs and Pfam fail to identify low scoring STEVOR, and the former also fails to identify fragments. The sorting of RIFIN and STEVOR proteins into sub-groups makes RSpred more specific than the other models. In addition, RSpred detects more sequences than Pfam and TIGRFAMs; it is therefore also the most sensitive of the three methods.
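The percentages above are each method's detected count divided by the 339 held-out sequences. As a quick check of that arithmetic, a minimal Python sketch (the function name `sensitivity` is only illustrative):

```python
def sensitivity(detected: int, total: int) -> float:
    """Percentage of true RIFIN/STEVOR sequences that a method detects."""
    return 100.0 * detected / total


TOTAL = 339  # RIFIN/STEVOR sequences not used for training RSpred
DETECTED = {"RSpred": 338, "Pfam": 332, "TIGRFAMs": 297}

for method, count in DETECTED.items():
    print(f"{method}: {count}/{TOTAL} = {sensitivity(count, TOTAL):.1f}%")
```

Rounded to one decimal place, this reproduces the reported 99.7%, 97.9% and 87.6%.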
Redefining the RIFIN and STEVOR sub-groups
Previous studies describe RIFIN and STEVOR sequences as a large group of related proteins unique to P. falciparum. Subsequent analysis of the RIFIN protein family, based on the reference genome, showed that the RIFIN family can be further sub-grouped into A- and B-RIFIN sequences and the latter divided into B1-, B2- and B3-RIFIN.
Our current analysis, which includes many more sequences, confirms the sub-division of RIFIN sequences into A-, B1- and B2-RIFIN groups, which all have defined characteristics. However, it is an overstatement to create a defined group for the remaining B-RIFIN sequences. These sequences represent a heterogeneous cluster (10 genes in the 3D7 reference strain) of sequences that are defined by the fact that they are not A-RIFIN sequences and have relatively little similarity to B1- and B2-RIFIN proteins. We have therefore decided to relegate the B3-RIFIN sequences to the rank of B-RIFIN.
A recent study has defined potential sub-groups within the A-RIFIN sequences, rifA1 and rifA3. These groupings rely on sequence similarity of 71% and 84% and, for a large majority, their genomic location in a head-to-head orientation with group A var genes. We have not trained HMMs to recognize these groups because of the low number of sequences available from the curated datasets. Also, we find that there are several other such sub-group candidates, but the small number of sequences within a single genome makes it difficult to distinguish between bona fide sub-groups and recently expanded genes.
These authors also defined a sub-group, rifA2, which is composed of one divergent RIFIN sequence that is present, with 78% conservation, in all genomes investigated. Single-copy genes that are highly conserved between genomes are possibly better classified as conserved genes rather than sub-groups. Also, we have noted that the proteins that compose the rifA2 group score the lowest of all RIFIN sequences, with one of them predicted as "false". The fact that partial A-RIFIN protein sequences score higher than the full-length rifA2, together with the divergence of these sequences from typical RIFIN proteins, strongly suggests that these are related to RIFIN proteins but have a different function not requiring multiple copies for the survival of the parasite.
In this study, we have only focused on the three genomes (3D7, HB3 and DD2) for which annotations are available as well as the Uniprot database that contains data from field studies. We confirm the finding, by Wang et al., that several RIFIN sequences are relatively conserved across strains; however, it is difficult to evaluate whether this represents a measure of the divergence of parasite populations or whether they have been evolutionarily selected for specific functions.
Also, we have chosen to adopt a conservative approach to the STEVOR designation. All sequences that are clearly related to STEVOR sequences, but that do not score high enough, will be tagged STEVOR-like by the RSpred program.
Four sequences predicted to be A-RIFIN proteins also had relatively high scores (>300) with either the B1- or the B2-RIFIN HMM. Upon closer inspection of these sequences, applying phylogenetic analysis to alignments of each half of these proteins, it appears that their N-terminal half corresponds well with A-RIFIN sequences whereas their C-terminal half is characteristic of B1- or B2-RIFIN proteins (data not shown). These sequences are hybrids between A- and B1/2-RIFIN proteins and confirm previous reports of recombination as a means for the diversification of these VSA gene families.
Advantages, limits and utility of RSpred
We have named our set of HMMs and the evaluation program RSpred, for RIFIN and STEVOR predictor. We have shown that it efficiently detects RIFIN and STEVOR proteins and classifies them according to their sub-group. Although there are no false positive detections, RSpred is conservative with truncated and remotely related sequences. However, most of these sequences are at least recognized and predicted as RIFIN or STEVOR proteins. Finally, RSpred proves to be more sensitive than the existing Pfam and TIGRFAMs HMMs [18, 19], which are also limited in the scope of their classification, as they do not recognize RIFIN or STEVOR sub-groups.
We have applied RSpred to whole proteomes extracted from novel genome assemblies. Although these genomes are mostly sequenced to a very low coverage (1.25×), we were able to detect all sub-groups within these genomes. This resource will be increasingly useful as more genomes are being sequenced: in particular, there is a large Plasmodium genome sequencing project that is scheduled to sequence over 100 Plasmodium parasite genomes, which will allow for meta-genomic analysis of the RIFIN and STEVOR protein families.
The analysis of proteins that are members of large families is often overwhelming due to the difficulty of assigning a proper classification. The RIFIN and STEVOR families are such groups of proteins: complications are in part due to their large diversity within each parasite's genome, but even more so to the extreme diversity between parasite populations [4, 5, 10, 11, 31]. Our prediction tool, RSpred, is designed to simplify the classification of these proteins into previously identified sub-groups [6, 12] with the following benefits:
It eliminates the need to manually retrieve reference sequences and perform multiple sequence alignments;
It eliminates the need for any prior knowledge of these protein families in order to sort them properly;
It outperforms existing tools;
It identifies and sorts RIFIN proteins into RIFIN, A-RIFIN, B-RIFIN, B1-RIFIN and B2-RIFIN.
Although these groups probably have diverged in function, the sequence conservation between these proteins suggests that their respective functions are still closely related. Greater knowledge of the smaller sub-groups B1- and B2-RIFIN proteins will improve our understanding of the larger A-RIFIN and STEVOR groups that play a more preponderant role at the surface of the infected host cell [4, 13].
Data sets, retrieval and curation
We used the search functionalities of PlasmoDB v6.3 to retrieve all proteins annotated as RIFIN and STEVOR (221 sequences), excluding MAL7P1.208, which is annotated as RIFIN-like but is more similar to Rhoptry Associated Membrane Antigen (RAMA) proteins.
DD2 & HB3 retrieval and curation
We downloaded all data files pertaining to the DD2 and HB3 genomes (version 1) from the Broad Institute website.
Either there was an annotated gene corresponding to the manually curated rif or stevor gene. In this case, the gene would take the BIA gene name.
Or there was an annotated gene that did not quite overlap with the manual curation. In this case, the manually curated gene would take the BIA gene name.
Or there was no annotated gene at or near those coordinates. In this case, a new gene would be annotated with a new name.
We detected 193 and 179 RIFIN and STEVOR sequences from DD2 and HB3, respectively.
Field isolate data
We retrieved all RIFIN and STEVOR protein sequences from the Uniprot Knowledgebase (446 sequences). We then removed all sequences from the 3D7 reference genome (215 sequences after filtering).
Additional draft genomes
Finally, we retrieved additional draft genome sequences from the Broad Institute and Wellcome Trust Sanger Institute websites [22, 23]. The additional genomes downloaded from the Broad Institute were Plasmodium falciparum supercontigs files of 7G8 nucleus, D10 nucleus, D6 nucleus, Fcc-2/Hainan nucleus, RO-33 nucleus, Santa Lucia (SL) nucleus, K1 nucleus, Senegal_V34.04 nucleus, VS/1 nucleus, IGH-CR14 nucleus, RAJ116 nucleus http://www.broadinstitute.org/annotation/genome/plasmodium_falciparum_spp/MultiDownloads.html and from the Wellcome Trust Sanger Institute were the Plasmodium falciparum Ghanaian Isolate contigs version 20080302 ftp://ftp.sanger.ac.uk/pub/pathogens/Plasmodium/falciparum/Ghanaian_Isolate/ and IT strain supercontigs version 2007114.phusion ftp://ftp.sanger.ac.uk/pub/pathogens/Plasmodium/falciparum/IT_strain/Archive/, as well as the Plasmodium reichenowi contigs version 031104 ftp://ftp.sanger.ac.uk/pub/pathogens/Plasmodium/reichenowi/.
These sequence data were produced by the Broad Institute and Wellcome Trust Sanger Institute, respectively.
At the time of writing, these genomes have no official annotations; therefore, using Artemis, we extracted from them all coding sequences (CDS) equal to or greater than 100 amino acids long, regardless of the presence of a start codon (see Table 1).
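The extraction itself was done with Artemis. As a rough illustration of what "all coding sequences of at least 100 amino acids, regardless of the presence of a start codon" amounts to, the Python sketch below translates a contig in all six reading frames with the standard genetic code and keeps every stop-free stretch of at least `min_len` residues. The function names, the handling of ambiguous bases (translated as `X`) and the choice of the standard code are assumptions made for this example and do not describe Artemis itself.

```python
from itertools import product
from typing import List

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna: str) -> str:
    """Translate codon by codon; unknown codons become 'X', stops become '*'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def open_frames(contig: str, min_len: int = 100) -> List[str]:
    """Return stop-free translated segments of at least min_len residues."""
    forward = contig.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    found = []
    for strand in (forward, reverse):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                if len(segment) >= min_len:
                    found.append(segment)
    return found
```

Because no start codon is required, fragments of genes that run off the end of a short contig are still retained, which matters for low-coverage assemblies.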
Sequence analysis for sub-group determination
All alignments were carried out using MAFFT or Kalign 2, with default parameters [37, 38]. We used Jalview and Bioedit for alignment visualization and editing [39, 40]. Phylogenetic analysis was carried out with Molecular Evolutionary Genetics Analysis 4 (MEGA 4). All phylogenetic trees were built with the Neighbor-Joining method, considering gaps and missing data as pairwise deletions and using the Amino: Poisson correction model. Phylogenetic trees were tested with 500 bootstrap replicates.
We first aligned all sequences together in order to distinguish STEVOR and RIFIN proteins from each other. During this process, we detected a small subset of sequences that are related to STEVOR proteins but do not have a high enough HMM score. These sequences will be tagged as STEVOR-like until more sequences become available to allow for better categorization.
The RIFIN sequences were subsequently sub-divided according to the classification described in Joannin et al. A first approximation of the sub-grouping relies on the presence or absence of the characteristic 25 amino acid sequence that is present in A-RIFIN but absent in B-RIFIN proteins [6, 12, 42]. Sequences that were either truncated or contained large indels, and that were not identifiable as A- or B-RIFIN according to this criterion, were gathered into an "Unknown RIFIN" group. The remaining RIFIN sequences (A- and B-RIFIN) were aligned and sorted into groups according to the resulting phylogenetic tree. Sequences were grouped into A-RIFIN, B-RIFIN, B1-RIFIN, B2-RIFIN, modified from Joannin et al. with the B3-RIFIN sub-group here renamed as B-RIFIN (see Results), as well as an "Ambiguous" subgroup. The Ambiguous group gathered all sequences that were identifiable as A- or B-RIFIN sequences but were not resolved in the phylogenetic trees.
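For reference, the Poisson correction distance between two aligned protein sequences is commonly written d = -ln(1 - p), where p is the proportion of compared sites that differ. With pairwise deletion, sites with a gap or missing residue in either sequence are skipped for that pair only. The Python sketch below illustrates this calculation; the function name and the set of characters treated as gaps or missing data are assumptions, and the code is not taken from MEGA 4.

```python
import math


def poisson_distance(seq_a: str, seq_b: str, skip: str = "-?") -> float:
    """Poisson-corrected distance between two aligned sequences.

    Sites where either sequence has a character in `skip` are ignored
    for this pair only (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a in skip or res_b in skip:
            continue
        compared += 1
        if res_a != res_b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differences / compared
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)
```

A distance matrix built this way over all sequence pairs is the kind of input a Neighbor-Joining tree is computed from.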
HMM training, testing and evaluation program
The HMMs for the five different groups of RIFIN and STEVOR sequences were built using HMMER2. Both global and local build options were tried and the local option (hmmbuild -f) was found to perform best with this type of data, containing full-length as well as truncated and fragmented sequences.
For the purpose of HMM training, all alignments were created using Mafft-linsi. A number of protein sequences were either truncated compared to typical sequences or contained indels. We decided that sequences should be complete and typical from the PEXEL motif (Plasmodium Export Element motif) [16, 17] to the C-terminal transmembrane domain; the alignments were constrained to start at this motif as well. The five training sets were made non-redundant using FASTA, so that the final sets contained no sequence with more than 80% identity to any other. Outliers were removed using a jack-knifing test. During this test each sequence in the training set was excluded, one at a time, an alignment created and a new HMM built. The removed sequence was scored against this new HMM, together with every sequence from the other training sets (i.e. a negative dataset). If the excluded sequence did not score higher than every sequence from the negative dataset it was removed from the final training set. The final training sets consisted of 259 A-RIFIN, 96 B-RIFIN, 26 B1-RIFIN, 9 B2-RIFIN and 51 STEVOR sequences.
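The jack-knifing test described above is a leave-one-out loop. The following Python sketch shows its structure under stated assumptions: `build_model` and `score` are hypothetical callables standing in for the alignment plus HMM-building step and the HMM search step, and each sequence is tested against a model built from all the other original training sequences (the text does not say whether removals were applied iteratively).

```python
from typing import Callable, List, Sequence, TypeVar

Model = TypeVar("Model")


def jackknife_filter(
    positives: Sequence[str],
    negatives: Sequence[str],
    build_model: Callable[[List[str]], Model],
    score: Callable[[Model, str], float],
) -> List[str]:
    """Keep the training sequences that pass the leave-one-out test.

    build_model and score are placeholders for "align and build an HMM"
    and "score a sequence against that HMM".
    """
    kept = []
    for index, held_out in enumerate(positives):
        # Train on every positive sequence except the one being tested.
        rest = [seq for j, seq in enumerate(positives) if j != index]
        model = build_model(rest)
        held_out_score = score(model, held_out)
        # Keep the sequence only if it outscores every negative sequence.
        if all(held_out_score > score(model, neg) for neg in negatives):
            kept.append(held_out)
    return kept
```

In practice the two callables would wrap external alignment and HMM programs; passing them in as arguments keeps the loop itself independent of any particular tool.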
A program, written in C, was created to manage the results obtained when the five HMMs were used in database searches. Figure 4 displays the decision process and the cut-offs. The cut-offs were set using the manually curated dataset as 'truth', including the odd sequences (with respect to the amino acid composition or sequence length) removed from the final training set.
Control data sets
In order to test our HMMs for false positives, we retrieved the proteomes of several other Plasmodium species. All Plasmodium-specific datasets were downloaded from PlasmoDB version 7.1: protein coding sequences from Plasmodium falciparum 3D7 (5418, version: 2010-06-01), Plasmodium vivax Sal-1 (5393, version: 2007-06-13), Plasmodium chabaudi chabaudi (5123, version: 2010-06-01), P. knowlesi strain H (5194, version: 2010-06-01), P. yoelii yoelii strain 17XNL (7724, version: 2005-09-01) and P. berghei strain ANKA (4857, version: 2010-06-01). Additionally, we used the original Broad Institute annotated protein sequences from the DD2 (5380, version: 2007-04-13) and HB3 (5623, version: 2007-03-16) genomes.
This study was supported by PREGVAX (FP7-Health-2007-A-201588), the Kungl. Vetenskapsakademin, T. och R. Söderbergs Professur, the Karolinska Institutet (Distinguished Professor Award), Linköping University and the Swedish Research Council. Some of the sequence data used in this study were generated by the Wellcome Trust Sanger Institute and the Broad Institute of Harvard and MIT (see text for details). Finally, we would like to thank the three anonymous reviewers who have helped us improve the clarity of this article.
- Deitsch KW, Lukehart SA, Stringer JR: Common strategies for antigenic variation by bacterial, fungal and protozoan pathogens. Nat Rev Microbiol. 2009, 7 (7): 493-503. 10.1038/nrmicro2145.PubMedPubMed CentralView ArticleGoogle Scholar
- Deitsch KW, Hviid L: Variant surface antigens, virulence genes and the pathogenesis of malaria. Trends Parasitol. 2004, 20 (12): 562-566. 10.1016/j.pt.2004.09.002.PubMedView ArticleGoogle Scholar
- Rasti N, Wahlgren M, Chen Q: Molecular aspects of malaria pathogenesis. FEMS Immunol Med Microbiol. 2004, 41 (1): 9-26. 10.1016/j.femsim.2004.01.010.PubMedView ArticleGoogle Scholar
- Niang M, Yan Yam X, Preiser PR: The Plasmodium falciparum STEVOR Multigene Family Mediates Antigenic Variation of the Infected Erythrocyte. PLoS Pathog. 2009, 5 (2): e1000307-10.1371/journal.ppat.1000307.PubMedPubMed CentralView ArticleGoogle Scholar
- Jeffares DC, Pain A, Berry A, Cox AV, Stalker J, Ingle CE, Thomas A, Quail MA, Siebenthall K, Uhlemann A-C, et al: Genome variation and evolution of the malaria parasite Plasmodium falciparum. Nat Genet. 2007, 39 (1): 120-125. 10.1038/ng1931.PubMedView ArticleGoogle Scholar
- Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, et al: Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002, 419 (6906): 498-511. 10.1038/nature01097.PubMedView ArticleGoogle Scholar
- Fernandez V, Hommel M, Chen Q, Hagblom P, Wahlgren M: Small, clonally variant antigens expressed on the surface of the Plasmodium falciparum-infected erythrocyte are encoded by the rif gene family and are the target of human immune responses. J Exp Med. 1999, 190 (10): 1393-1404. 10.1084/jem.190.10.1393.PubMedPubMed CentralView ArticleGoogle Scholar
- Kyes SA, Rowe JA, Kriek N, Newbold CI: Rifins: a second family of clonally variant proteins expressed on the surface of red cells infected with Plasmodium falciparum. Proc Natl Acad Sci USA. 1999, 96 (16): 9333-9338. 10.1073/pnas.96.16.9333.PubMedPubMed CentralView ArticleGoogle Scholar
- Helmby H, Cavelier L, Pettersson U, Wahlgren M: Rosetting Plasmodium falciparum-infected erythrocytes express unique strain-specific antigens on their surface. Infect Immun. 1993, 61 (1): 284-288.PubMedPubMed CentralGoogle Scholar
- Albrecht L, Merino EF, Hoffmann EHE, Ferreira MU, de Mattos Ferreira RG, Osakabe AL, Dalla Martha RC, Ramharter M, Durham AM, Ferreira JE, et al: Extense variant gene family repertoire overlap in Western Amazon Plasmodium falciparum isolates. Mol Biochem Parasitol. 2006, 150 (2): 157-165. 10.1016/j.molbiopara.2006.07.007.PubMedView ArticleGoogle Scholar
- Blythe JE, Yam XY, Kuss C, Bozdech Z, Holder AA, Marsh K, Langhorne J, Preiser PR: Plasmodium falciparum STEVOR proteins are highly expressed in patient isolates and located in the surface membranes of infected red blood cells and the apical tips of merozoites. Infect Immun. 2008, 76 (7): 3329-3336. 10.1128/IAI.01460-07.PubMedPubMed CentralView ArticleGoogle Scholar
- Joannin N, Abhiman S, Sonnhammer E, Wahlgren M: Sub-grouping and sub-functionalization of the RIFIN multi-copy protein family. BMC Genomics. 2008, 9 (1): 19-10.1186/1471-2164-9-19.PubMedPubMed CentralView ArticleGoogle Scholar
- Bachmann A, Esser C, Petter M, Predehl S, von Kalckreuth V, Schmiedel S, Bruchhaus I, Tannich E: Absence of erythrocyte sequestration and lack of multicopy gene family expression in Plasmodium falciparum from a splenectomized malaria patient. PLoS ONE. 2009, 4 (10): e7459-10.1371/journal.pone.0007459.PubMedPubMed CentralView ArticleGoogle Scholar
- Petter M, Bonow I, Klinkert M: Diverse Expression Patterns of Subgroups of the rif Multigene Family during Plasmodium falciparum Gametocytogenesis. PLoS ONE. 2008, 3 (11): e3779-10.1371/journal.pone.0003779.PubMedPubMed CentralView ArticleGoogle Scholar
- Petter M, Haeggström M, Khattab A, Fernandez V, Klinkert M-Q, Wahlgren M: Variant proteins of the Plasmodium falciparum RIFIN family show distinct subcellular localization and developmental expression patterns. Mol Biochem Parasitol. 2007, 156 (1): 51-61. 10.1016/j.molbiopara.2007.07.011.
- Marti M, Good RT, Rug M, Knuepfer E, Cowman AF: Targeting malaria virulence and remodeling proteins to the host erythrocyte. Science. 2004, 306 (5703): 1930-1933. 10.1126/science.1102452.
- Hiller NL, Bhattacharjee S, van Ooij C, Liolios K, Harrison T, Lopez-Estraño C, Haldar K: A host-targeting signal in virulence proteins reveals a secretome in malarial infection. Science. 2004, 306 (5703): 1934-1937. 10.1126/science.1102737.
- Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K: The Pfam protein families database. Nucleic Acids Res. 2010, 38 (Database issue): D211-222. 10.1093/nar/gkp985.
- Haft DH, Selengut JD, White O: The TIGRFAMs database of protein families. Nucleic Acids Res. 2003, 31 (1): 371-373. 10.1093/nar/gkg128.
- Hayes C, Diez D, Joannin N, Honda W, Kanehisa M, Wahlgren M, Wheelock C, Goto S: varDB: a pathogen-specific sequence database of protein families involved in antigenic variation. Bioinformatics. 2008.
- Wang C, Magistrado P, Nielsen M, Theander T, Lavstsen T: Preferential transcription of conserved rif genes in two phenotypically distinct Plasmodium falciparum parasite lines. Int J Parasitol. 2008.
- The Broad Institute of Harvard and MIT - Plasmodium falciparum download page. [http://www.broadinstitute.org/annotation/genome/plasmodium_falciparum_spp/MultiHome.html]
- The Wellcome Trust Sanger Institute - Protozoan genomes. [http://www.sanger.ac.uk/resources/downloads/protozoa/]
- Datta RS, Meacham C, Samad B, Neyer C, Sjölander K: Berkeley PHOG: PhyloFacts orthology group prediction web server. Nucleic Acids Res. 2009, 37 (Web Server issue): W84-89. 10.1093/nar/gkp373.
- Chen F, Mackey AJ, Stoeckert CJ, Roos DS: OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res. 2006, 34 (Database issue): D363-368. 10.1093/nar/gkj123.
- Ostlund G, Schmitt T, Forslund K, Köstler T, Messina DN, Roopra S, Frings O, Sonnhammer ELL: InParanoid 7: new algorithms and tools for eukaryotic orthology analysis. Nucleic Acids Res. 2010, 38 (Database issue): D196-203. 10.1093/nar/gkp931.
- Janssen CS, Barrett MP, Lawson D, Quail MA, Harris D, Bowman S, Phillips RS, Turner CM: Gene discovery in Plasmodium chabaudi by genome survey sequencing. Mol Biochem Parasitol. 2001, 113 (2): 251-260. 10.1016/S0166-6851(01)00224-9.
- Cunningham D, Lawton J, Jarra W, Preiser P, Langhorne J: The pir multigene family of Plasmodium: antigenic variation and beyond. Mol Biochem Parasitol. 2010, 170 (2): 65-73. 10.1016/j.molbiopara.2009.12.010.
- Freitas-Junior LH, Bottius E, Pirrit LA, Deitsch KW, Scheidig C, Guinet F, Nehrbass U, Wellems TE, Scherf A: Frequent ectopic recombination of virulence factor genes in telomeric chromosome clusters of P. falciparum. Nature. 2000, 407 (6807): 1018-1022. 10.1038/35039531.
- Group TPW: Plasmodium White Paper V8.
- Volkman SK, Sabeti PC, DeCaprio D, Neafsey DE, Schaffner SF, Milner DA, Daily JP, Sarr O, Ndiaye D, Ndir O, et al: A genome-wide map of diversity in Plasmodium falciparum. Nat Genet. 2007, 39 (1): 113-119. 10.1038/ng1930.
- Aurrecoechea C, Brestelli J, Brunk B, Dommer J, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb O, et al: PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 2009, 37 (suppl 1): D539-D543. 10.1093/nar/gkn814.
- Consortium U: The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res. 2010, 38 (Database issue): D142-148. 10.1093/nar/gkp846.
- McGinnis S, Madden TL: BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res. 2004, 32 (Web Server issue): W20-25. 10.1093/nar/gkh435.
- Carver TJ, Rutherford KM, Berriman M, Rajandream M-A, Barrell BG, Parkhill J: ACT: the Artemis Comparison Tool. Bioinformatics. 2005, 21 (16): 3422-3423. 10.1093/bioinformatics/bti553.
- Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell BG: Artemis: sequence visualization and annotation. Bioinformatics. 2000, 16 (10): 944-945. 10.1093/bioinformatics/16.10.944.
- Katoh K, Toh H: Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinformatics. 2008, 9 (4): 286-298. 10.1093/bib/bbn013.
- Lassmann T, Frings O, Sonnhammer E: Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. Nucleic Acids Res. 2009, 37 (3): 858-865. 10.1093/nar/gkn1006.
- Clamp M, Cuff J, Searle SM, Barton GJ: The Jalview Java alignment editor. Bioinformatics. 2004, 20 (3): 426-427. 10.1093/bioinformatics/btg430.
- Hall T: BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic acids symposium series. 1999, 41: 95-98.
- Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 2007, 24 (8): 1596-1599. 10.1093/molbev/msm092.
- Bultrini E, Brick K, Mukherjee S, Zhang Y, Silvestrini F, Alano P, Pizzi E: Revisiting the Plasmodium falciparum RIFIN family: from comparative genomics to 3D-model prediction. BMC Genomics. 2009, 10: 445. 10.1186/1471-2164-10-445.
- Eddy SR: Profile hidden Markov models. Bioinformatics. 1998, 14 (9): 755-763. 10.1093/bioinformatics/14.9.755.
- Pearson WR, Lipman DJ: Improved tools for biological sequence comparison. Proc Natl Acad Sci USA. 1988, 85 (8): 2444-2448. 10.1073/pnas.85.8.2444.
- Carlton JM, Adams JH, Silva JC, Bidwell SL, Lorenzi H, Caler E, Crabtree J, Angiuoli SV, Merino EF, Amedeo P, et al: Comparative genomics of the neglected human malaria parasite Plasmodium vivax. Nature. 2008, 455 (7214): 757-763. 10.1038/nature07327.
- Pain A, Böhme U, Berry AE, Mungall K, Finn RD, Jackson AP, Mourier T, Mistry J, Pasini EM, Aslett MA, et al: The genome of the simian and human malaria parasite Plasmodium knowlesi. Nature. 2008, 455 (7214): 799-803. 10.1038/nature07306.
- Carlton J, Silva J, Hall N: The genome of model malaria parasites, and comparative genomics. Current issues in molecular biology. 2005, 7 (1): 23-37.
- Hall N, Karras M, Raine JD, Carlton JM, Kooij TWA, Berriman M, Florens L, Janssen CS, Pain A, Christophides GK, et al: A comprehensive survey of the Plasmodium life cycle by genomic, transcriptomic, and proteomic analyses. Science. 2005, 307 (5706): 82-86. 10.1126/science.1103717.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.