Malaria is a major global health problem. Although its incidence is currently declining, malaria remains responsible for at least 200 million infections and half a million deaths every year, particularly among immune-naïve African children under the age of five. Human malaria is an infectious disease caused by five species of eukaryotic parasites of the genus Plasmodium and is transmitted by Anopheles mosquitoes. Plasmodium falciparum is the most prevalent and virulent human malaria parasite, accounting for almost all malarial deaths worldwide. Plasmodium vivax is the most prevalent malaria parasite outside Africa, where, despite much lower mortality rates, it represents a huge socioeconomic burden in many countries. Plasmodium ovale and Plasmodium malariae cause a more benign form of human malaria and are responsible for only a small percentage of global infections. Plasmodium knowlesi, although traditionally considered a non-human parasite, causes a potentially life-threatening zoonotic form of human malaria acquired from Southeast Asian macaque monkeys. Other Plasmodium parasites are important model organisms in malaria research, including Plasmodium yoelii, Plasmodium chabaudi, and Plasmodium berghei (rodent parasites), as well as Plasmodium gallinaceum (a bird parasite). A tree detailing the phylogenetic relationships of these and other Plasmodium species is provided in Additional file 1.
One of the most pressing issues in malaria research is the development of an effective antimalarial vaccine. Unfortunately, despite many decades of research, this goal has yet to be achieved. RTS,S is currently the most promising P. falciparum vaccine candidate, but the latest results from clinical trials showed that it provides only modest protection against both clinical and severe malaria in young infants [4, 5]. For P. vivax, the situation is even bleaker: no vaccine candidate is currently in advanced clinical trials. In 2002, the publication of the first two Plasmodium genomes, P. falciparum and P. yoelii, promised to revolutionize vaccine development by laying out the complete map of P. falciparum genes, including a comprehensive inventory of putative antigens that could serve as vaccine targets [7, 8]. This monumental achievement was followed by the publication of the genomes of four additional Plasmodium species: P. chabaudi and P. berghei in 2005, and P. vivax and P. knowlesi in 2008 [10, 11]. The P. gallinaceum genome was sequenced to three-fold coverage in 2007 and is currently unpublished. Although the many available Plasmodium genome sequences now provide a rich resource for (comparative) genomics studies of parasite biology and immune evasion strategies [12, 13], the promise of an effective antimalarial vaccine has yet to be fulfilled.
One major obstacle in malaria vaccine development is the notorious variability of parasite antigens. These antigens are expressed on the surface of the parasite or of the infected erythrocyte and are typically encoded by large gene families located in the subtelomeric regions of chromosomes. We now know that each of the sequenced Plasmodium genomes possesses an extensive and often species-specific array of variant surface antigens (VSAs). The clinically most important and best-studied Plasmodium VSA gene family is var, which encodes about 60 erythrocyte surface-expressed proteins known as P. falciparum erythrocyte membrane protein 1 (PfEMP1). In a process called antigenic variation, var gene expression switches from one family member to another over the course of an infection, allowing the parasite to evade the host immune system and establish a chronic infection. In addition, PfEMP1 mediates adherence of infected erythrocytes to both uninfected erythrocytes and endothelial cells; this adherence is responsible for the most severe clinical complications of P. falciparum malaria and makes PfEMP1 the prime virulence factor in this species.
Besides PfEMP1, Plasmodium parasites express many additional VSAs throughout their complex life cycle. In P. falciparum, these include rif/stevor, the largest P. falciparum gene family (~190 genes) [8, 16], the surfin gene family (10 genes), and Pfmc-2TM (12 genes). In P. vivax, by far the largest gene family is vir (~300 genes) [10, 19], which is related to the homologous VSA gene families kir (65 genes) in P. knowlesi [11, 20] and yir (~800 genes), bir (~100 genes), and cir (~200 genes) in P. yoelii, P. berghei, and P. chabaudi, respectively [7, 9, 21]. Together, these five gene families form the pir superfamily, the largest known gene family in Plasmodium parasites. To date, no pir genes have been identified in P. falciparum. P. knowlesi possesses an additional large VSA gene family named SICAvar (28 genes), the first Plasmodium gene family shown to undergo antigenic variation [11, 22]. In rodent malaria parasites, the second-largest VSA gene family after pir is pyst-a, which consists of only a single member in primate parasites, suggesting extensive expansion of this family in the rodent malaria species. Besides VSAs, Plasmodium genomes encode another large repertoire of proteins, termed the ‘exportome’, that is of potential interest for vaccine development. Proteins in this set carry an N-terminal sequence motif, termed the Plasmodium export element (PEXEL) or vacuolar transport signal (VTS), that targets them beyond the parasitophorous vacuole to the cytosol of the infected erythrocyte [24, 25]. Exported proteins are then either trafficked further to the erythrocyte surface or remain in the cytosol to help remodel the infected host cell. Like VSAs, exported proteins are typically organized into subtelomeric variant gene families, most of which are restricted to a subset of species.
Probably the most prominent exported gene family is phist, a large and highly divergent family with ~40-100 known members in each of the three primate parasites but only a single known member in rodent parasites.
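To make the idea of an N-terminal export motif more concrete, the following Python sketch scans a protein sequence for the commonly cited PEXEL consensus RxLxE/Q/D (where x is any residue). It is a simplification under stated assumptions: the function name `find_pexel_sites` and the 80-residue N-terminal window are hypothetical choices for illustration, and dedicated exportome predictors typically take additional sequence context into account.

```python
import re

# Commonly cited PEXEL consensus R-x-L-x-(E|Q|D), x = any residue.
# The lookahead lets finditer report overlapping candidate motifs.
PEXEL_PATTERN = re.compile(r"(?=(R.L.[EQD]))")

# Hypothetical cutoff: only motifs starting in the first N residues count.
N_TERMINAL_WINDOW = 80


def find_pexel_sites(protein_seq: str, window: int = N_TERMINAL_WINDOW) -> list[tuple[int, str]]:
    """Return (1-based start position, motif) pairs for candidate PEXEL
    motifs whose first residue lies within the N-terminal window."""
    seq = protein_seq.upper()
    hits = []
    for match in PEXEL_PATTERN.finditer(seq):
        start = match.start()  # 0-based index of the R residue
        if start < window:
            hits.append((start + 1, match.group(1)))
    return hits


if __name__ == "__main__":
    # Toy sequence, not a real Plasmodium protein.
    print(find_pexel_sites("MKRSLAEGG"))
```

For the toy sequence above, `find_pexel_sites("MKRSLAEGG")` returns `[(3, "RSLAE")]`; a motif placed beyond the window is ignored unless a larger `window` is passed.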
Although their large numbers suggest that VSAs and exported proteins are of major importance to the parasite, we currently know surprisingly little about their biological functions. For example, proposed roles for PIR proteins include antigenic variation, immune evasion, signaling, trafficking, protein folding, and adhesion, but direct evidence for any PIR function is still lacking. Similarly, apart from expression and localization data, the exact functions of PfMC-2TM, SURFIN, PYST-A, and PHIST proteins are currently unknown. Furthermore, the P. falciparum genome contains over a dozen exported gene families, named hyp1 to hyp17, whose functions have yet to be determined. Elucidating the function of these gene families is difficult in part because of the many functionally redundant paralogs, which make gene knockout studies challenging. In such cases, it would help to identify, and work with, low-copy-number orthologous gene families in more experimentally accessible model parasites. Beyond their unknown functions, the evolutionary history of many Plasmodium variant gene families is also poorly understood. For example, standard sequence similarity searches reveal no obvious homologs of the major surface antigens of P. falciparum (PfEMP1) and P. knowlesi (SICAvar) outside their respective species, raising the question of their evolutionary origin. Similarly, there are currently no known pir homologs in P. falciparum, although rif/stevor has been suggested as a related gene family based on shared sequence motifs and secondary structural features. Identifying functional homologs of VSAs across Plasmodium species is important because it aids comparative immunological studies, gives new insights into the evolutionary adaptation of malaria parasites to their respective hosts, and provides a means to transfer functional annotations from model to human parasites and vice versa.
In this study, we use a recently developed comparative gene family classification strategy to classify VSAs and exported proteins across seven Plasmodium genomes: P. falciparum, P. vivax, P. knowlesi, P. yoelii, P. chabaudi, P. berghei, and P. gallinaceum. We hypothesized that sensitive sequence-based clustering of all currently available Plasmodium proteins would yield new insights into VSA function and evolution, which in turn could open up new avenues for vaccine development. In this strategy, protein sequences from the Plasmodium genomes are first clustered into a hierarchical tree using average-linkage clustering, and the resulting tree is then searched for clusters corresponding to known VSA gene families. Finally, the identified clusters are analyzed in detail for gene content and inter-cluster relationships. This analysis resulted in several noteworthy findings, including the identification of unusually well-conserved PIR orthologs that are of potential interest for vaccine development, a prediction of the likely function of PYST-A proteins, the discovery of a novel and putatively functional PfMC-2TM domain named MC-TYR, new conclusive evidence supporting the common evolutionary origin of PfEMP1, SICAvar, and SURFIN proteins, and the identification of many new VSA gene family members, including new phist members in rodent parasites. Collectively, these findings provide important starting points for future experimental studies.
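The clustering step described above can be illustrated with a small Python sketch of average-linkage (UPGMA-style) agglomerative clustering. It assumes that pairwise distances between protein sequences have already been computed (the distance measure is not specified here) and uses a made-up distance table for two hypothetical families, not real Plasmodium data. The helper names `average_linkage`, `cut_tree`, and `smallest_cluster_containing` are hypothetical; the last one is a simplified stand-in for searching the tree for the lowest cluster that contains all known members of a gene family.

```python
from itertools import combinations


def average_linkage(items, dist):
    """Agglomerative average-linkage clustering.

    items: list of sequence identifiers.
    dist:  dict mapping frozenset({a, b}) -> distance between leaves a and b.
    Returns the merges in order as (cluster_a, cluster_b, height) tuples,
    where each cluster is a frozenset of identifiers.
    """

    def avg_dist(c1, c2):
        # Average linkage: mean distance over all leaf pairs across the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [frozenset([x]) for x in items]
    merges = []
    while len(clusters) > 1:
        # Merge the pair of current clusters with the smallest average distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        height = avg_dist(c1, c2)
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(c1 | c2)
        merges.append((c1, c2, height))
    return merges


def cut_tree(items, merges, threshold):
    """Flat clusters from applying only merges at or below `threshold`.
    Average linkage gives non-decreasing merge heights, so we can stop early."""
    clusters = {frozenset([x]) for x in items}
    for c1, c2, height in merges:
        if height > threshold:
            break
        clusters -= {c1, c2}
        clusters.add(c1 | c2)
    return clusters


def smallest_cluster_containing(merges, seeds):
    """Return (cluster, height) for the lowest merge whose cluster contains
    all seed identifiers (e.g. known family members), or None."""
    for c1, c2, height in merges:
        merged = c1 | c2
        if set(seeds) <= merged:
            return merged, height
    return None


# Made-up distances for two hypothetical families (A*, B*); not Plasmodium data.
ITEMS = ["A1", "A2", "A3", "B1", "B2"]
PAIR_DIST = {
    ("A1", "A2"): 0.10, ("A1", "A3"): 0.20, ("A2", "A3"): 0.15,
    ("B1", "B2"): 0.12,
    ("A1", "B1"): 0.90, ("A1", "B2"): 0.85, ("A2", "B1"): 0.80,
    ("A2", "B2"): 0.88, ("A3", "B1"): 0.82, ("A3", "B2"): 0.86,
}
DIST = {frozenset(pair): d for pair, d in PAIR_DIST.items()}

if __name__ == "__main__":
    merges = average_linkage(ITEMS, DIST)
    print(cut_tree(ITEMS, merges, threshold=0.5))
    print(smallest_cluster_containing(merges, {"A1", "A3"}))
```

With this toy table, cutting at 0.5 separates the A and B proteins into two clusters. This naive version recomputes cluster distances at every step and scales poorly; for genome-scale protein sets, an established implementation such as `scipy.cluster.hierarchy.linkage` with `method='average'` would be the more practical choice.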