Serial Analysis of Gene Expression in Plasmodium berghei salivary gland sporozoites

Background The invasion of Anopheles salivary glands by Plasmodium sporozoites is an essential step for transmission of the parasite to the vertebrate host. Salivary gland sporozoites undergo a developmental programme to express genes required for their journey from the site of the mosquito bite to the liver and subsequent invasion of, and development within, hepatocytes. A Serial Analysis of Gene Expression was performed on Anopheles gambiae salivary glands infected or not with Plasmodium berghei and we report here the analysis of the Plasmodium sporozoite transcriptome. Results Annotation of 530 tag sequences homologous to Plasmodium berghei genomic sequences identified 123 genes expressed in salivary gland sporozoites and these genes were classified according to their transcript abundance. A subset of these genes was further studied by quantitative PCR to determine their expression profiles. This revealed that sporozoites modulate their RNA amounts not only between the midgut and salivary glands, but also during their storage within the latter. Among the 123 genes, the expression of 66 is described for the first time in sporozoites of rodent Plasmodium species. Conclusion These novel sporozoite expressed genes, especially those expressed at high levels in salivary gland sporozoites, are likely to play a role in Plasmodium infectivity in the mammalian host.


Background
The sporozoite is the stage of the malaria parasite transmitted from the mosquito vector to the mammalian host. The success of infection depends on the sporozoites leaving their site of inoculation in the dermis, rapidly attaining the liver, invading hepatocytes, and developing therein [1][2][3][4]. This results in the release of thousands of merozoites from the infected hepatocytes that subsequently invade red blood cells, causing the malaria disease [5]. Sporozoites are formed within oocysts of the mosquito midgut and are initially poorly infectious. They migrate to mosquito salivary glands (SG) and must undergo a developmental programme, with concomitant changes in gene expression, in order to become highly infectious to the mammalian host. These SG sporozoites exhibit a circular gliding movement and in certain conditions can elicit a strong protective immune response in the mammalian host [6][7][8].
Recent technological advances have improved our understanding, at the molecular level, of the Plasmodium parasite, including the less well known sporozoite stage. The completion of the Plasmodium falciparum (P. falciparum) genome sequence allowed the analysis of gene expression at different stages of the parasite life cycle with microarrays [9]. The data showed that approximately 2000 genes are expressed in SG sporozoites, 500 of which are expressed at high levels, and over 100 of these are not expressed at significant levels in blood stages. In addition, proteomic analyses detected 274 proteins in P. falciparum sporozoites [10,11].
Plasmodium species infecting rodents are powerful laboratory models as they are more easily amenable to genetic studies and their genomes have also been sequenced [11,12]. However, the transcriptome of sporozoites from these species is less well known and has been obtained mainly from Plasmodium yoelii (P. yoelii) cDNA libraries [13] and Suppressive Subtractive Hybridisation studies [14,15]. In addition, 119 proteins have been identified in Plasmodium berghei (P. berghei) sporozoites, of which 34 are sporozoite specific (PlasmoDB release 5.1).
We recently reported a SAGE study aimed at characterising Anopheles gambiae (A. gambiae) SG genes that are differentially expressed in response to infection with P. berghei sporozoites [16]. In that study 530 tag sequences, found exclusively in libraries from infected mosquitoes, were identical to P. berghei genomic sequences, and 41 of these were readily annotated. Considering the data presently available for the transcriptome of P. berghei sporozoites, we decided that it would be worthwhile to annotate the remaining tags. SAGE allows gene expression profiling based on the quantification of short 14-15 nucleotide (nt) sequence tags, each sequence being, in theory, associated with the transcript of a single gene. It provides an overall estimation of the abundance of transcripts, while requiring no a priori information about the sequence of the transcripts to be studied [17,18]. However, only short cDNA sequences, usually located in the 3' UTR, are obtained and consequently the attribution of a tag sequence to a gene is highly dependent on the quality of annotation of the genome of the organism studied and on available cDNA or EST studies [19,20]. On the other hand, positioning tags on the genomic sequence can provide information on both the orientation of a transcript and the length of the 3' UTR.
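How a tag relates to a transcript can be illustrated with a short sketch. It assumes, as in standard SAGE protocols, that a tag starts at the most 3' NlaIII recognition site (CATG) of the cDNA and that the 14 nt tag length includes the CATG itself; the function name and the example sequence are hypothetical and are not taken from the pipeline used in this study.

```python
from typing import Optional


def most_3prime_tag(cdna: str, site: str = "CATG", tag_len: int = 14) -> Optional[str]:
    """Return the tag starting at the most 3' NlaIII site of a sense-strand cDNA.

    Assumption: the tag length (14 nt here) counts the CATG site plus the
    bases immediately downstream of it. Returns None when the sequence has
    no site or is too short downstream of the site to yield a full tag.
    """
    seq = cdna.upper()
    pos = seq.rfind(site)  # rfind gives the last, i.e. most 3', occurrence
    if pos == -1:
        return None
    tag = seq[pos:pos + tag_len]
    return tag if len(tag) == tag_len else None


# Hypothetical transcript with two CATG sites; only the 3'-most one defines the tag.
example = "AACATGTTTTCATGACGTACGTAAGGCC"
print(most_3prime_tag(example))
```

Because only the 3'-most site is used, a transcript with a longer or shorter 3' UTR than predicted can yield a tag that falls outside the annotated gene, which is why tag attribution depends on the quality of the annotation.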
The tags from our SAGE were annotated by combining information on their position in the P. berghei genomic sequences, predicted gene models and ESTs from P. berghei and P. yoelii. We unambiguously identified 123 genes expressed in P. berghei sporozoites, of which 66 are detected for the first time in rodent Plasmodium species. The hierarchical classification of these transcripts according to the abundance of the tags was confirmed by qPCR and the characterisation of the gene structure and/or gene expression was undertaken for some. Finally, our results provide evidence that mRNA levels may vary not only between midgut and SG sporozoites but also during storage of the sporozoites in the SG.

Results
Annotation of Plasmodium berghei tags: rationale
SAGE libraries were constructed from four different SG RNA preparations: SG isolated from A. gambiae mosquitoes 14, or 18, days after a blood meal on P. berghei infected mice and SG isolated 14, or 15-18, days after feeding on uninfected mice [16]. Sequence analysis showed that among the tags identified in the infected SG libraries, 17181 were absent from uninfected SG libraries and were thus considered as potential P. berghei sequences (see Figure 1). These sequences were then screened using the following criteria: 1) to be found more than once in the cumulative libraries, 2) to give a single hit in the P. berghei genome, 3) to be derived from the most 3' NlaIII site predicted in the annotated gene sequence in the sense orientation ("primary" tags in Figure 1) or to be found within 500 nt of the stop codon of an annotated gene in the sense orientation.
We found 2154 tag sequences at least twice in the cumulative libraries and 530 of these gave one or more perfect matches by BLASTN comparison on P. berghei genomic sequences and annotated genes. These tag sequences and their annotations are provided in Additional file 1. We discarded from further analysis 162 tags that gave multiple hits in the genome due to the impossibility of assigning them to a unique gene. Of the 368 tags that gave a single hit in the genome, 166 were found within annotated genes and 62 of these were derived from the most 3' NlaIII predicted site.
Sampling of the 3' untranslated regions of about 30 Plasmodium genes showed that 77% of them were less than 500 nt in length (I.C. unpublished observation). Therefore, we analysed in more detail the position in the genome of the 202 tags that were not found within annotated genes. Sixty-three tag sequences, found within 500 nt downstream of the stop codon of an annotated gene and in the sense orientation, were retained in the analysis as it seemed likely that they came from the 3' end of the adjacent genes. As two genes were each identified by two different tag sequences, only one tag was retained for each gene, resulting in a total of 61 (see Figure 1).
In total, 123 identifying tag sequences fulfilled all our criteria (see Table 1 and Additional file 2). BLASTN analysis of the 123 genes identified by these tags showed that 55 aligned with known transcripts from P. yoelii or P. berghei sporozoites, whereas the remaining 68 did not. Among the latter, proteomic approaches have detected PB000876.00.0 in sporozoites and a study by Ishino et al. [21] showed that PB000892.00.0 (Pbs36) is also present. It should be noted that the criteria used have resulted in the elimination of the majority of tag sequences, including several tags corresponding to genes known to be expressed in SG sporozoites (see Additional file 1). For example, two tags aligned with PB001026.00.0, coding for circumsporozoite protein (CS). The most abundant tag, found 77 times, gave 4 hits in the genome and the second tag sequence was in the antisense orientation. Although it is likely that the most abundant tag originated from the CS transcript, it was not taken into account in order to be consistent in the analysis. Another example concerned two tags aligning with PB000892.03.0 (UIS3), a gene known to be upregulated in sporozoites [14]. Again the most abundant tag gave multiple hits in the genome and the second was in the antisense orientation.
Taken together, this SAGE analysis has identified 66 novel sporozoite expressed genes (indicated as SIS in Table 1).
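The counts reported in this section can be tallied as a quick consistency check. The sketch below only restates figures given in the text; the function and key names are illustrative and do not correspond to any script used in the study.

```python
def tag_funnel() -> dict:
    """Recompute the tag-selection counts reported in the text."""
    matching_genome = 530          # tags seen at least twice with a perfect genome match
    multiple_hits = 162            # discarded: cannot be assigned to a unique gene
    single_hit = matching_genome - multiple_hits

    within_genes = 166             # single-hit tags located inside annotated genes
    outside_genes = single_hit - within_genes

    primary_tags = 62              # derived from the most 3' NlaIII site, sense orientation
    downstream_tags = 63           # within 500 nt of a stop codon, sense orientation
    redundant = 2                  # two genes were each identified by two tags
    downstream_retained = downstream_tags - redundant

    identified_genes = primary_tags + downstream_retained

    aligned_with_known_transcripts = 55
    otherwise_reported = 2         # PB000876.00.0 (proteomics) and PB000892.00.0 (Pbs36)
    novel_genes = identified_genes - aligned_with_known_transcripts - otherwise_reported

    return {
        "single_hit": single_hit,
        "outside_genes": outside_genes,
        "downstream_retained": downstream_retained,
        "identified_genes": identified_genes,
        "novel_genes": novel_genes,
    }


for name, value in tag_funnel().items():
    print(f"{name}: {value}")
```

Each derived value agrees with the number stated in the text: 368 single-hit tags, 202 tags outside annotated genes, 61 retained downstream tags, 123 identified genes and 66 novel genes.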

Validation of SAGE data
SAGE, like microarrays, is designed to give quantitative information on gene expression and the interpretation of the results depends on correct gene identification. This identification is not always straightforward in P. berghei since genomic sequence clusters are generally shorter than in P. falciparum; this results in the prediction of a large number of truncated genes, often lacking 5' or 3' ends. In addition, there are fewer EST resources.
As a first step towards confirming the gene expression in sporozoites and validating tag annotation, we obtained more accurate data on gene structure and organization, by clustering ESTs and comparing with P. yoelii and P. chabaudi orthologous sequences. In Table 1, 15 genes (indicated by *) have been manually reannotated resulting in a different structure and a longer ORF than that predicted in the databases. These structures have been confirmed for 13 genes by RT-PCR on sporozoite RNA (see Additional file 2). For genes that are split between two genomic clusters, the intervening fragment was cloned and sequenced (see Additional file 2).
To confirm the identification of the 66 novel genes expressed in salivary gland sporozoites, RT-PCR experiments were performed. PCR fragments were obtained for all of them, indicating that these genes are truly expressed (Additional file 3).
Figure 1. Flow chart of the rationale used to select tag sequences from the SAGE library. Shaded boxes correspond to tag sequences retained, clear boxes correspond to tag sequences rejected.
The number of times a tag sequence is identified in a SAGE library is expected to correlate with the relative abundance of the steady state mRNA [17]. To determine whether the SAGE data correctly reflects transcript abundances, we selected eighteen known or novel genes, predicted to be expressed at high or low levels, and quantified their RNA by qPCR in sporozoites isolated from SG at d18 of infection. The values were normalised to the reference gene PB001026.00.0 (CS) and plotted against the number of times the identifying tag was found in the SAGE library at d18 of infection (Figure 2). A good correlation (R² = 0.8) was found and was even greater (R² = 0.96, not shown), when a second abundant tag sequence for two genes, UIS4 and TRAP, was taken into account. Thus, the number of times a tag was found in our SAGE data correctly reflects the gene expression levels in sporozoites.
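The comparison between tag counts and qPCR values can be sketched as follows. The sketch assumes that R² is the square of the Pearson correlation between the two series, which the text does not state explicitly, and every number in it is a hypothetical placeholder, not data from Figure 2.

```python
def r_squared(x, y):
    """Square of the Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return (s_xy * s_xy) / (s_xx * s_yy)


# Hypothetical values for illustration only: tag counts in the d18 library and
# qPCR expression normalised to the reference gene, one pair per gene.
tag_counts = [2, 5, 9, 14, 30, 95]
qpcr_normalised = [0.02, 0.04, 0.10, 0.12, 0.35, 0.90]

print(f"R^2 = {r_squared(tag_counts, qpcr_normalised):.2f}")
```

For genes with more than one abundant tag, the tag counts would be summed per gene before computing R², which is the adjustment described above for UIS4 and TRAP.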

Hierarchical classification of sporozoite expressed genes
In Table 1, the 123 genes unambiguously identified by this SAGE as being expressed in SG sporozoites have been classified into three groups according to the number of times the identifying tag sequence was found in the combined d14 and d18 libraries (with the caveat that for some genes there may be an underestimation due to the criteria used above).
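The grouping rule can be written as a small function. The thresholds are those given for the three groups in this section (more than 20 tags, 10 to 20 tags, fewer than 10 tags in the combined d14 and d18 libraries); the function name is illustrative.

```python
def expression_group(tag_count: int) -> int:
    """Assign a gene to group 1, 2 or 3 from its combined d14 + d18 tag count."""
    if tag_count > 20:
        return 1   # highly expressed
    if tag_count >= 10:
        return 2   # moderately expressed
    return 3       # expressed at low levels


# The most abundant UIS4 tag (tag 186) was found 95 times, placing UIS4 in group 1.
print(expression_group(95))
```

Because genes are ranked on a single identifying tag, a gene whose transcripts are split between several tags may be placed in a lower group than its true abundance would warrant, which is the caveat noted above.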

Group 1
The first group of highly expressed genes, defined by tags found more than 20 times in the combined libraries, contains five genes of which four have been described previously: PB000374.03.0, which codes for TRAP, a major sporozoite protein having an essential role in sporozoite motility [22]; PB100551.00.0 and PB107461.00.0, which code for UIS4 and UIS7 respectively, and were identified by SSH as genes whose expression is upregulated in SG sporozoites compared to midgut sporozoites [14]; PB001063.00.0, also known as S23, which codes for a reticulocyte binding protein and was identified by SSH as a gene upregulated in SG sporozoites compared to blood stages [15].
Surprisingly, the gene PB402615.00.0, for which the identifying tag (tag 335) was found the most often, has not been described previously. The gene sequence aligns with numerous ESTs of P. yoelii sporozoites and parasites developing in the absence of host cells [13,23]. To further characterize the gene structure, a manual clustering of these ESTs was performed. Several tags were found along this cluster, both in sense and antisense orientations and, as expected, the most 3' tag was also the most abundant (Figure 3A). The other internal tags, one of which is highly represented (tag 189, see Additional file 1), may be due to either alternative polyadenylation events or priming at internal polyA stretches during cDNA synthesis, as hypothesized by others [24]. To rule out the possibility that the cluster covers two different genes, an RT-PCR was performed using primers at the 5' and 3' ends of the cluster and this detected a unique transcript of the predicted size (data not shown). Based on our results and annotation, we propose that the ORF predicted for PB402615.00.0 during automatic annotation is incorrect and that this gene codes for a 17 kDa protein, rich in tyrosine and basic amino acids. Furthermore, this new annotation indicates that there are orthologous sequences in P. yoelii (PY02432), P. chabaudi (PC400629.00.0) and also in P. falciparum (PFgenefinder_63r.) (Figure 3B).
Interestingly, PB402615.00.0 is not the only gene characterized by several abundant tags. For instance, two tags are characteristic of PB100551.00.0, encoding UIS4: the most abundant (tag 186, found 95 times) is located at the end of the ORF, while the other (tag 153, found 74 times), is located 650 nt downstream and probably defines a transcript with a longer 3' UTR. Such observations may be useful when defining the gene structure with the complete potential regulatory sequences.

Group 2
The second group contains eight moderately expressed genes defined by tags found 10 to 20 times. Among these, there is one UIS gene (UIS10/PB000484.00.0) and three S genes: S13/SPECT2 (PB000252.01.0), S21/PbTRSP (PB000881.01.0) and S12 (PB104017.00.0). UIS10/PbPL codes for a lecithin-cholesterol acyltransferase involved in cell traversal [25]. The S13/SPECT2 protein is characterized by a MAC/Perforin domain and was found to be essential for membrane attack during cell invasion [26]. The protein S21/PbTRSP, which contains a thrombospondin type 1 domain, has recently been shown to have a role in host cell invasion [27]. Finally, S12 codes for a protein with a predicted signal peptide, but whose function has yet to be characterized.

Group 3
Of the remaining 110 genes, expressed at low levels and defined by tags found fewer than ten times, 43 aligned with P. yoelii or P. berghei sporozoite ESTs, one was detected in P. berghei sporozoites by proteomics and PB000892.00.0/Pbs36 has been described in sporozoites [21]. Thus 65 genes in this group are shown for the first time to be expressed in this stage of the P. berghei parasite (indicated as SIS2-66 in Table 1). Orthologous P. falciparum genes are predicted for 90 genes in this group, and expression in sporozoites has been detected for 74 by microarrays (of which 20 also by proteomics) and one by proteomics only. The 20 P. berghei predicted genes for which there are no obvious P. falciparum orthologues could correspond to incomplete genes and require more precise annotation. Alternatively, they may be highly divergent genes or genes whose prediction has been missed during P. falciparum genome annotation.
In addition, two other genes could have a potential role in adhesion/invasion of host cells: PB001517.02.0 codes for a protein with a Thrombospondin 1 domain and a von-Willebrand type A domain, and PB001080.02.0, which encodes a protein defined as thrombospondin-related 3 (also found as AF375983 in Genbank). Proteins involved in the molecular motor needed for sporozoite motility are also represented by aldolase (PB000757.02.0), which provides the link between TRAP and myosinA [34,35], and by a kinesin-related protein (PB000464.00.0/S25). While myosinA, which is thought to play a major role in the motility of Apicomplexa zoites, is absent from our description (with a tag found only once), another gene encoding a potential myosin is identified (PB000450.03.0). Two genes encoding proteases of the rhomboid family (PB001432.02.0/SIS14 and PB000352.00.0/SIS28) are also found. Interestingly, these proteins are the orthologues of PfROM4 and PfROM1, respectively, which are able to cleave adhesins, such as TRAP, AMA1, MAEBL and others, that are involved in interactions with host-cell receptors [36].
Figure 2. Correlation between the levels of gene expression as determined by qPCR and the number of times the identifying tag sequence was found in the SAGE library. R² is the coefficient of correlation.
The expression of only two genes encoding ribosomal proteins has been detected, confirming, as previously reported [13], their under-representation in the sporozoite transcriptome. Interestingly, two genes coding for proteins with RNA-binding domains are identified, one of which has a pumilio domain suggesting a role in the negative regulation of translation [37]. These proteins were previously described as being upregulated in P. falciparum gametocytes and sporozoites [9] and may be involved in the regulation of sporozoite protein expression.
Other genes found in this group code for proteins with diverse biological functions for example energy metabolism, signal transduction and protein modification. Interestingly, a gene encoding a putative sugar transporter (PB000161.03.0), whose P. falciparum orthologue is expressed specifically in sporozoites, is present [9]. We previously reported an increase in mRNA levels of an Anopheles sugar transporter in infected SG [16]. These two transporters may play a role in providing the sporozoite with sufficient energy resources for its journey from the oocysts to the SG and from the bite site to the liver in the vertebrate host. The presence of three genes coding for proteins involved in iron-sulphur cluster formation and iron homeostasis can also be noted. Iron-sulphur cluster formation is essential for a wide variety of processes, including facilitation of electron transfer in oxidative phosphorylation and enzymatic activities in mitochondria, cytoplasm or nucleus as well as sensing of intracellular iron and/or oxidant levels. Expression of these genes may be preparing the sporozoite for high mitochondrial activities related to motility and/or for the future iron-rich blood environment.
Figure 3. New annotation of PB402615.00.0.
All genes showed higher levels of expression in midgut and/or SG sporozoites compared to ookinete or blood stages, with the exception of PB000787.03.0 whose expression is clearly not sporozoite specific. Other genes, for example PB000400.01.0, PB000757.02.0 and PB001432.02.0 are easily detectable in blood stages, suggesting they may also have a role at this stage. Finally, PB001432.02.0, PB000317.00.0, PB001080.02.0 and PB000400.01.0 are expressed, like CS, at relatively high levels in the ookinete stage, although this does not mean that the protein is produced.
As observed in Table 1, the number of times a tag is found in the SG libraries at d18 of infection often differs from that at d14. This is due in part to the higher number of tags sequenced in the d18 library (2-fold) and to the increase (2-fold) in the number of sporozoites inside the glands at d18. However, the tags of some genes, for example UIS4, increase up to 15-fold between d14 and d18 whereas they decrease for others. This suggested that there might be variations in gene expression during sporozoite storage in the glands. To further characterize these variations, the relative levels of gene expression for the same panel of genes as above were determined in sporozoites isolated from SG and midguts of A. gambiae mosquitoes at d14 or d18 of infection. The values were calculated from the geometric averages and normalised to the geometric mean of PB001026.00.0 (CS), as this gene was determined to be the best reference using GeNORM software.
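The normalisation step can be sketched as below. This is a minimal illustration, assuming that replicate relative quantities are summarised by their geometric mean and divided by the geometric mean of the CS reference measured in the same sample; the function names and all numerical values are hypothetical and do not come from Table 2.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive replicate measurements."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalised_expression(target_replicates, reference_replicates):
    """Target quantity relative to the reference gene (CS) in the same sample."""
    return geometric_mean(target_replicates) / geometric_mean(reference_replicates)


# Hypothetical replicate quantities (arbitrary units) for one gene and for CS.
sg_d18 = normalised_expression([4.0, 5.0, 4.5], [10.0, 11.0, 9.5])
mg_d18 = normalised_expression([0.30, 0.25, 0.35], [9.0, 10.0, 10.5])

# Ratio of normalised expression between salivary gland and midgut sporozoites.
print(f"SG/midgut ratio at d18: {sg_d18 / mg_d18:.1f}")
```

The same ratio computed between d18 and d14 salivary gland samples gives the change in expression during storage of the sporozoites in the glands.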
The results show large increases (4- to 66-fold) in RNA amounts in SG compared to midgut sporozoites for UIS4, UIS1, UIS3, SPECT2, UIS7 and PB402615.00.0 at d18 (see Table 2). These results were expected for the UIS genes as they were found in an SSH library between SG and midgut sporozoites [14]. They also suggest that PB402615.00.0 may be important for preparing the sporozoites for infection. In contrast, there is significantly less MAEBL RNA in SG than in midgut sporozoites, in agreement with MAEBL having a role in sporozoite adhesion to mosquito salivary glands [33].
In addition, the ratios obtained between d18 and d14 SG sporozoites indicate that there is a significant increase in expression over this period for UIS4, UIS1, UIS3 and UIS7, whereas there is no substantial change in RNA quantities for the other genes. Similar increases in expression were also seen for UIS3 and UIS4 in A. stephensi infected salivary glands suggesting that P. berghei sporozoites develop similarly in A. gambiae and A. stephensi (not shown).
Taken together, the qPCR data show that sporozoites are capable of modulating their RNA amounts between the midgut and salivary glands, as well as during their storage within the latter.

Discussion
Although the transcriptional repertoire of Plasmodium sporozoites has been investigated in several laboratories using different techniques, including cDNA libraries, SSH, microarrays and proteomics, only the microarray data has provided information concerning the level of gene expression [9]. We have obtained new data concerning the transcriptional repertoire of P. berghei sporozoites using SAGE on A. gambiae infected salivary glands. SAGE does not require a priori knowledge of the sequence of genes to be analysed and provides quantification of gene expression by the number of times a tag sequence is obtained [17]. However, since the sequence and annotation of the P. berghei genome is incomplete, the attribution of a tag sequence to a gene was not straightforward.
Figure 4. Histogram representation of qPCR analysis of gene expression in mixed blood stages, ookinetes, d18 midgut sporozoites and d18 salivary gland sporozoites. X axis: genes tested; Y axis: log scale of mean normalised expression.
Several criteria were applied before retaining a tag sequence as a gene identifier: it was found twice or more in the cumulative SAGE libraries, the BLASTN analysis gave a unique hit in the genome, which was in the sense orientation of an annotated gene and was located at the most 3' predicted NlaIII site or within 500 nt of the stop codon of a neighbouring gene. These combined criteria resulted in the unambiguous identification of 123 genes expressed in SG sporozoites, of which 16 were already known to be sporozoite-expressed genes: TRAP, several UIS and S genes, SPECT, Pbs36p, Pbs36 and MAEBL. It should be noted that this list of 123 P. berghei SG sporozoite-expressed genes is not exhaustive.
Of the 2154 unique tag sequences found at least twice in the libraries from infected mosquitoes, only 25% (530) matched perfectly to the present version of the P. berghei genome. The 1624 remaining tags may derive from A. gambiae cDNAs. Indeed, 587 match on A. gambiae ESTs and 943 on the A. gambiae genome (see Additional file 1, sheet 2). Alternatively, they may derive from genes not yet sequenced in the P. berghei genome. In addition, the absence of matches of some tags could be due to polymorphisms between the P. berghei strain sequenced, ANKA, and the NK65 strain used in our SAGE.
Also, 202 (55%) of the tag sequences that gave a single hit in the P. berghei genome were not found within an annotated gene. Upon more detailed analysis of their position in the contigs, 63 were found to be within 500 nt of the stop codon of an annotated gene, in the sense orientation and were considered to derive from transcripts of these genes. However, we were unable to place 139 tag sequences either within an annotated ORF or within 500 nt of a stop codon. These tags should help in refining the present annotation of the P. berghei genome and may, in the future, be formally proven as corresponding to sporozoite-expressed genes.
Among the 368 tag sequences that gave a single match in the P. berghei genome, 64 were found in the antisense orientation, either within annotated genes (56) or within 500 nt of the stop codon of an annotated gene (8). Antisense RNAs have been described previously in P. falciparum and they are suggested to be involved in transcriptional regulation [38][39][40]. At the present time, there is insufficient P. berghei cDNA data and no microarrays using sense and antisense probes, to establish whether or not these tags correspond to antisense RNAs or to transcripts from a gene on the opposite strand.
Amongst the P. berghei SIS genes identified in this study as being expressed in sporozoites, eighteen are predicted to encode proteins with one or more transmembrane regions and/or a signal peptide sequence, suggesting that they are membrane associated or secreted proteins. Other genes of potential interest are PB000464.00.0/S25, a kinesin-related protein, which could be involved in motility; PB000416.02.0, a putative RNA-binding protein of the pumilio/mpt5 family known for their role in repression of gene expression; PB000903.01.0/SIS43, which contains an ankyrin repeat suggesting a role in protein-protein interaction; and PB000650.00.0/SIS36, which contains a fasciclin domain and may have a role in cell adhesion. Finally, PB402615.00.0, for which the identifying tag was the most abundant in this SAGE analysis, is annotated as a hypothetical protein and aligns with P. yoelii ESTs from sporozoites as well as parasites developing in the absence of host cells. Our qPCR data show that this gene is expressed in sporozoites but not in ookinetes or blood stages and that there is a substantial increase in the amount of RNA for this gene between midgut and salivary gland sporozoites. This differential regulation between organs and the mRNA abundance in SG suggest a role for this gene in sporozoite infectivity in the mammalian host.
Several properties (motility, infectivity, etc.) differ between midgut and SG sporozoites, but it is not known whether these developmental changes are time and/or tissue dependent nor which signalling factors are involved. Interestingly, the qPCR data presented here indicate that, at least for UIS3 and UIS4, not only the tissue (midgut versus salivary gland) but also the time spent in the SG significantly influences the level of expression of individual genes. This change in expression of genes that are essential for development in the liver is in agreement with the increase in infectivity of SG sporozoites between d14 and d18 post blood meal [41].
Since SAGE provides a quantitative read out of gene expression, and as our qPCR analysis confirmed this, the 123 genes were classified into 3 groups according to the number of times the tag was found in the libraries. Among the thirteen genes presenting the most abundant tags (groups 1 and 2), there are seven, UIS4, TRAP, S13/SPECT2, S21/PbTRSP, ECP1, UIS10/PbPL, and Pbs36p, that have been shown via gene knockout experiments to be essential for the sporozoite or liver stage development [7,21,22,25,26,27,28]. Furthermore, among the genes with less abundant tags there are three, SPECT, Pbs36 and MAEBL, which have also been shown to have essential roles [21,32,33]. This indicates that our approach has identified several genes known to be essential for sporozoites and points to additional genes that may be required at this stage. It will be of interest to inactivate the novel sporozoite-expressed genes identified in this paper to define their function in the parasite. Mutants that are defective in their development in the mammalian host would be of particular interest as they could provide new tools to probe the host immune response to Plasmodium infection.

Conclusion
The SAGE described here has led to the identification of 66 novel genes expressed in P. berghei sporozoites. These novel sporozoite expressed genes, especially those expressed at high levels in salivary gland sporozoites, are likely to play a role in Plasmodium infectivity in the mammalian host.

Mosquito Infections
A. gambiae (Yaounde strain) and Anopheles stephensi (A. stephensi) (Sda500 strain) mosquitoes were reared at the Centre for Production and Infection of Anopheles at the Pasteur Institute under standard conditions [42]. Female mosquitoes, 3-4 days old, were fed on anaesthetised P. berghei infected mice. A recombinant P. berghei strain expressing GFP in late midgut stages and SG of mosquitoes was used [43].

Construction of SAGE libraries
The construction of the SAGE libraries has been described elsewhere [16]. Briefly, two libraries were constructed from SG of infected A. gambiae mosquitoes: one 14 days (early stage of infection) and the other 18 days (stage at which sporozoites are considered to be fully infectious) after feeding on P. berghei infected mice. Two control libraries were made from SG of mosquitoes 14 and 15-18 days post-feeding on uninfected mice.

Analysis of Tag Sequences
The analysis of tag sequences has been described in detail in [16]. Tag sequences found only in libraries from the infected mosquitoes were considered as potential P. berghei sequences. BLASTN comparisons were performed against the library of P. berghei contigs and annotated genes obtained from the ftp site of the Sanger Institute P.