Exploring the midgut transcriptome of Phlebotomus papatasi: comparative analysis of expression profiles of sugar-fed, blood-fed and Leishmania major-infected sandflies

Background In sandflies, the blood meal is responsible for the induction of several physiologic processes that culminate in egg development and maturation. During blood feeding, infected sandflies are also able to transmit the parasite Leishmania to a suitable host. Many blood-induced molecules play significant roles during Leishmania development in the sandfly midgut, including parasite killing within the endoperitrophic space. In this work, we randomly sequenced transcripts from three distinct high quality full-length female Phlebotomus papatasi midgut-specific cDNA libraries from sugar-fed, blood-fed and Leishmania major-infected sandflies. Furthermore, we compared the transcript expression profiles from the three different cDNA libraries by customized bioinformatics analysis and validated these findings by semi-quantitative PCR and real-time PCR. Results Transcriptome analysis of 4010 cDNA clones resulted in the identification of the most abundant P. papatasi midgut-specific transcripts. The identified molecules included those with putative roles in digestion and peritrophic matrix formation, among others. Moreover, we identified sandfly midgut transcripts that are expressed only after a blood meal, such as microvilli associated-like protein (PpMVP1, PpMVP2 and PpMVP3), a peritrophin (PpPer1), trypsin 4 (PpTryp4), chymotrypsin PpChym2, and two unknown proteins. Of interest, many of these overabundant transcripts such as PpChym2, PpMVP1, PpMVP2, PpPer1 and PpPer2 were of lower abundance when the sandfly was given a blood meal in the presence of L. major. Conclusion This tissue-specific transcriptome analysis provides a comprehensive look at the repertoire of transcripts present in the midgut of the sandfly P. papatasi. Furthermore, the customized bioinformatic analysis allowed us to compare and identify the overall transcript abundance from sugar-fed, blood-fed and Leishmania-infected sandflies. 
The suggested upregulation of specific transcripts in the blood-fed cDNA library was validated by real-time PCR, suggesting that this customized bioinformatic analysis is a powerful and accurate tool for analysing expression profiles from different cDNA libraries. Additionally, the findings presented in this work suggest that the Leishmania parasite is modulating key enzymes or proteins in the gut of the sandfly that may be beneficial for its establishment and survival.


Background
Cutaneous leishmaniasis due to L. major is found throughout the Old World, including the Middle East and West Africa. Phlebotomus papatasi is the principal vector for this parasite and is refractory to the development of other species of Leishmania.
Upon taking a blood meal, hematophagous arthropods express a large number of molecules that participate in various physiologic processes ranging from blood digestion to egg development. Furthermore, many insects can either obtain or transmit pathogens during the acquisition of a blood meal. In blood-feeding arthropods, the midgut plays a crucial role as the primary organ involved in processing the blood meal and, in some instances, molecules expressed in the midgut of an insect vector have been shown to directly influence pathogen establishment [1,2]. Certain pathogens, such as Leishmania, appear able to modulate the activity of sandfly midgut proteases for their own benefit or survival [3,4].
Sequenced data sets containing information regarding expression profiles of anopheline and culicine mosquitoes, such as Anopheles gambiae and Aedes aegypti, following a blood meal have become available [5,6]. Other datasets now encompass insects such as Pediculus humanus [7] and Culicoides sonorensis [8]. In comparison, transcriptome information regarding sandflies is limited. Previous work has focused mainly on the sandfly salivary gland [9][10][11], whereas only a small number of sandfly-specific midgut cDNAs have been identified [12][13][14][15][16]. Recently, a large set of cDNA transcripts from the whole sandfly Lutzomyia longipalpis has been sequenced, providing greater information regarding molecules present in sandflies [17]. However, the information regarding sandfly midgut-specific transcripts remains poor.
In this work, we embarked on a comprehensive study of P. papatasi midgut-specific transcripts and compared the expression profiles of these transcripts in midguts of females fed on sugar only, on blood or on blood containing L. major. With this approach, we have identified several P. papatasi midgut-specific transcripts that are differentially expressed after a blood meal and in the presence of L. major.

Results and discussion
The midgut is the tissue where Leishmania development takes place within its sandfly vector. Within the midgut environment, Leishmania possibly interacts with various secreted molecules and cell types lining the midgut epithelia. In order to gain greater insight into the repertoire of the proteins present in the midgut of P. papatasi, we constructed and sequenced three high quality full-length cDNA libraries from the midgut of sandflies fed either on sugar only (unfed), blood or blood containing L. major. A total of 4010 high quality sequenced clones obtained from the three cDNA libraries were combined and analysed, resulting in the formation of 1382 clusters. Each cluster may contain a large number of transcripts that together form a contig (high quality consensus sequence) or may have a single transcript that can be defined as a singleton. Therefore, we will use the term "cluster" in the remainder of the manuscript to denote either a consensus sequence from various transcripts or a singleton.
Consensus sequences were compared with various databases and putative functions were assigned. The categories for the transcripts' potential biologic functions included protein synthesis machinery, protein modification machinery, transcription machinery, transporters, extracellular matrix, signal transduction, immunity, adhesion, and conserved proteins of unknown function. Table 1 summarizes this analysis, listing transcripts from female P. papatasi midguts fed on sugar, on blood, and on blood containing L. major. The first column shows the putative biological function; the first section of columns shows the number of clusters found in each of the three cDNA libraries in relation to this function; the second section of columns indicates the total number of sequences for these clusters; and the third section of columns shows the average number of sequences per cluster. The category of "conserved unknown function" had the largest number of clusters in all three of the cDNA libraries. This was followed by metabolism, energy in the sugar-fed library (95 clusters); metabolism, amino acid, which includes digestive enzymes, in the blood meal library (40 clusters); and protein synthesis machinery in the L. major blood meal library (51 clusters). The category with the highest number of sequences per cluster differed between the three cDNA libraries: extracellular matrix (27.33 seq/cluster) in the sugar-fed cDNA library and cytoskeletal transcripts in both the blood meal (19.40 seq/cluster) and L. major blood meal (15.00 seq/cluster) cDNA libraries. The sugar-fed cDNA library has 669 clusters with an average of 3.23 sequences per cluster. The cDNA library constructed from blood-fed midguts consisted of 441 clusters with an average of 3.27 sequences per cluster. The library from P. papatasi midguts fed on blood containing L. major produced 555 clusters with an average of 3.01 sequences per cluster.
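The three column groups described for Table 1 (clusters, sequences, and sequences per cluster for each functional category) can be derived from a simple list of annotated clusters. The following is a minimal sketch of that tabulation, not the customized pipeline used in this work; the function name `summarise_by_category` and the example counts are hypothetical.

```python
from collections import defaultdict

def summarise_by_category(clusters):
    """Summarise annotated clusters for one cDNA library.

    clusters: iterable of (category, n_sequences) pairs, one pair per cluster.
    Returns {category: {"clusters", "sequences", "seq_per_cluster"}}.
    """
    totals = defaultdict(lambda: [0, 0])  # category -> [cluster count, sequence count]
    for category, n_sequences in clusters:
        totals[category][0] += 1
        totals[category][1] += n_sequences
    return {
        category: {
            "clusters": n_clusters,
            "sequences": n_seqs,
            "seq_per_cluster": round(n_seqs / n_clusters, 2),
        }
        for category, (n_clusters, n_seqs) in totals.items()
    }

# Hypothetical counts for illustration only (not taken from Table 1).
example = [("digestion", 10), ("digestion", 5), ("unknown", 1)]
summary = summarise_by_category(example)
```

A singleton contributes one cluster with one sequence, so categories dominated by singletons tend towards a sequences-per-cluster value of 1.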
The number of sequences in each category for the three cDNA libraries is graphically represented in Figure 1. After blood feeding, there is a decrease in the number of sequences in all categories other than cytoskeletal, amino acid metabolism, and heme metabolism. Noticeable differences in the number of sequences between the blood-fed and L. major-infected blood-fed libraries occur in the protein synthesis machinery, extracellular matrix, cytoskeletal, heme metabolism, and conserved unknown function categories.
[Figure 1: Distribution of sequences analysed from each cDNA library separated by putative biologic function.]
Table 2 gives a more detailed description of the different types of transcripts identified in the combined analysis of the three cDNA libraries. Only high quality sequences and, for the most part, full-length coding sequences submitted to GenBank are shown. This table shows the different clusters arranged in the order of cluster number in the combined analysis of the three cDNA libraries. The first column of Table 2 gives the cluster number, the second column shows the clone that produced the full-length sequence, the third column shows the best match in the non-redundant protein database (GenBank, NCBI), the fourth column shows the e-value for the best matching BLAST result in column 3, the fifth column shows the assigned putative function of that cluster, and the sixth column shows the accession number of the transcript submitted to GenBank. The four most abundant transcripts were microvilli-associated like protein, followed by peritrophin-like protein, 40S ribosomal protein S30 and a transcript coding for a protein of unknown function. Other abundant transcripts include those coding for various ribosomal proteins, chymotrypsins, carboxypeptidases, trypsins, a zinc metalloprotease astacin, a Kazal-type serine protease inhibitor, glutathione S-transferase (GST) and various proteins of unknown function (Table 2). All the sequences generated from these three cDNA libraries have been deposited as an EST database at the National Center for Biotechnology Information (NCBI) (accession numbers ES346912-ES351350 and ES351429). The following is a more detailed description of relevant transcripts identified in the cDNA libraries:

Microvilli-associated like proteins
Among the most abundant transcripts found in the combined analysis of all three libraries were those coding for proteins with similarities to microvilli membrane proteins from A. aegypti and A. gambiae. These transcripts are also homologous to major allergens identified in the cockroaches Blattella germanica and Periplaneta americana [18] and to a nitrile-specifier protein (PrNSP) from the midgut of Pieris rapae. PrNSP converts toxic compounds, such as isothiocyanates, into less toxic compounds, such as nitriles, that are excreted in the feces of larval stages of this lepidopteran [19]. Four different putative microvilli-associated proteins were identified in the three P. papatasi midgut cDNA libraries (Figure 2). Clusters 1, 2, and 3 represent likely polymorphisms of the same transcript, named here "microvilli protein 1" (PpMVP1), which has a predicted molecular weight of 23.7 kDa. Another three transcripts coding for microvilli proteins and derived from clusters 94, 96 and 98 were named PpMVP2, PpMVP3, and PpMVP4, respectively. The predicted molecular weights for these microvilli-associated like proteins are 24.0, 25.6, and 25.6 kDa, respectively. Additionally, each of these microvilli proteins has a potential signal peptide as predicted by SignalP 3.0 and no evidence of transmembrane helices as predicted using the TMHMM 2.0 server. Identity between the amino acid sequences of these microvilli proteins ranges from 21 to 36 percent (Figure 2, black-shaded amino acids) and similarity from 45 to 57 percent (Figure 2, grey-shaded amino acids). This low degree of conservation may indicate that these proteins are biochemically distinct from one another and share a name only because of the previous annotation of similar sequences in other organisms.
[Figure 2: Multiple sequence alignment of the four putative microvilli associated-like proteins found in the midgut of Phlebotomus papatasi. Predicted signal peptide sequence is underlined and the accession numbers are given in parentheses.]
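Percent identity and similarity values such as those quoted for the microvilli proteins are computed column by column over an alignment. The sketch below illustrates one way to do this for two pre-aligned sequences. It rests on assumptions: the grouping of "similar" residues in `SIMILAR_GROUPS` is an illustrative choice, not the scheme used to shade Figure 2, and gapped columns are excluded from the denominator, which is only one of several conventions.

```python
# Illustrative conservative-substitution groups (an assumption, not the
# shading scheme used for the published alignment).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ", "AG"]

def _similar(a, b):
    """True if residues a and b fall in the same illustrative group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)

def identity_and_similarity(seq_a, seq_b, gap="-"):
    """Return (percent identity, percent similarity) for two aligned sequences.

    Columns where either sequence has a gap are skipped; identical residues
    also count as similar.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
            similar += 1
        elif _similar(a, b):
            similar += 1
    if compared == 0:
        return 0.0, 0.0
    return 100.0 * identical / compared, 100.0 * similar / compared

# Toy aligned fragments for illustration only.
pct_identity, pct_similarity = identity_and_similarity("ACDE-K", "ACEEQR")
```

Because the result depends on the denominator and on the definition of similarity, values from different tools are comparable only when these conventions match.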

Peritrophin-like proteins
Transcripts coding for three different putative peritrophin-like molecules were identified in the midgut of P. papatasi. PpPer1 (cluster 9) and PpPer2 (clusters 12 and 13) transcripts code for secreted proteins with predicted molecular masses of 29.8 and 9.6 kDa, respectively. PpPer1 comprises four potential chitin-binding peritrophin-A domains (Figure 3A). PpPer2 is a much smaller predicted protein and has only one potential chitin-binding domain (Figure 3A). A third putative peritrophin, PpPer3, was identified from cluster 26 with an apparent molecular mass of approximately 32 kDa (Figure 3A) and contains two distant putative chitin-binding domains. Phylogenetic analysis using the chitin-binding domains of PpPer1, PpPer2, PpPer3 and those of peritrophins from several insects (Figure 3B) suggests a low level of conservation between the domains. Insect peritrophins have been reported to bind to chitin fibers via multiple chitin-binding domains, forming the scaffold that maintains the molecular structure of the peritrophic matrix (PM) in the insect gut [20]. In addition to their role in the formation of the PM, peritrophins may also play a role in preventing the toxic effects of heme, a by-product of blood meal digestion. In A. aegypti, AeIMUC1, a mucin that contains putative chitin-binding domains, was recently shown to bind heme [21]. Although peritrophins have been characterised from several insects, including A. aegypti and A. gambiae [20,22], no information exists on sandfly midgut-specific peritrophins. PpPer1 and PpPer3 have high sequence similarity, at the protein level, to the translated sequences SFM-03c06 and SFM-02h07 from the L. longipalpis EST database. However, PpPer2 has low sequence similarity to any of the assembled and translated sequences from the L. longipalpis EST database, suggesting a more divergent or novel molecule.

Chymotrypsin
Two previously characterised P. papatasi chymotrypsin-like cDNAs, PpChym1 and PpChym2 [13], as well as a novel chymotrypsin-like cDNA, PpChym3 (cluster 113), were also found in the transcriptome database. This newly identified chymotrypsin-like molecule was found in low abundance in the blood-fed midgut library. The predicted PpChym3 has 36% amino acid identity to PpChym1 and 30% amino acid identity to PpChym2. Furthermore, PpChym3 has a secretory signal peptide (Figure 5A) and has the His/Asp/Ser amino acid triad necessary for catalytic activity (Figure 5B). PpChym1 and PpChym2 both share sequence homology with the assembled sequence NSFM-01d03 from the L. longipalpis EST database, while PpChym3 is most similar to sequence SFM-01b03.
[Figure: Characterisation of peritrophin sequences.]

Carboxypeptidase
A number of sequences were identified with homology to carboxypeptidases. The full-length transcript of a putative carboxypeptidase B, PpCpepB, was assembled from 37 sequences in cluster 16 and has high homology to a carboxypeptidase B identified in A. aegypti (GenBank accession# AAT36733). The predicted amino acid sequence of PpCpepB contains a signal peptide, a propeptide domain, and a carboxypeptidase domain. A putative carboxypeptidase A, PpCpepA, was also identified from cluster 113 based on amino acid sequence homology. Phylogenetic analysis shows that the identified P. papatasi putative carboxypeptidases are separated into distinct clades (Figure 6A). Comparison of sequence homology indicates the potential for these molecules to have substrate specificities of either carboxypeptidase A or B (Figure 6B). Sequence alignment of the two carboxypeptidases shows the difference in amino acid composition; however, both sequences contain the zinc ion binding motifs of metallocarboxypeptidases (Figure 6B). Additionally, the presence of a putative signal peptide suggests that these molecules are midgut digestive enzymes. Similarity between these carboxypeptidases and those present in the L. longipalpis EST database is evident from the high homology between PpCpepA and SFM-05c11 and between PpCpepB and NSFM-32d09.

Astacin-like zinc metalloprotease
A putative astacin-like zinc metalloprotease (PpAstacin) was identified from cluster 37, a product of five sequences. This putative astacin-like protein displays a predicted signal peptide and a slightly modified form of the signature zinc binding catalytic domain for proteins in the astacin family (HEXXHXXGFXHEXXRXDR). In PpAstacin, changes in two residues (E to M and R to A) resulted in the motif HEFLHALGFFHMQSASDR (Figure 7A). Although the altered residues may be involved in target specificity, the zinc-binding catalytic domain remains conserved. The likely role of this putative protein is blood meal digestion, as astacin molecules have not been implicated in immune functions and a considerable number of transcripts constituting this cluster were derived from the blood-fed midgut cDNA library. This is the first report of this type of protease from the gut of a sandfly, though NSFM-127b08 of the L. longipalpis EST database was identified based on sequence homology.
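The two substitutions reported for the PpAstacin motif can be recovered by comparing the observed sequence with the astacin family signature position by position, treating X as a wildcard. The sketch below uses the two motif strings given in the text; the helper name `motif_deviations` is illustrative.

```python
def motif_deviations(consensus, observed, wildcard="X"):
    """List (position, expected, found) wherever the observed motif departs
    from a fixed (non-wildcard) consensus residue. Positions are 1-based."""
    if len(consensus) != len(observed):
        raise ValueError("motifs must have the same length")
    return [
        (position, expected, found)
        for position, (expected, found) in enumerate(zip(consensus, observed), start=1)
        if expected != wildcard and expected != found
    ]

ASTACIN_SIGNATURE = "HEXXHXXGFXHEXXRXDR"  # family signature quoted in the text
PPASTACIN_MOTIF = "HEFLHALGFFHMQSASDR"    # motif reported for PpAstacin

deviations = motif_deviations(ASTACIN_SIGNATURE, PPASTACIN_MOTIF)
# deviations == [(12, "E", "M"), (15, "R", "A")]
```

The three histidines of the signature (positions 1, 5 and 11) are unchanged in the observed motif, in line with the statement that the zinc-binding domain remains conserved.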

Kazal-type serine protease inhibitor
Two Kazal-type serine protease inhibitors were identified from clusters 111 (PpKZL1) and 859 (PpKZL2) in the cDNA midgut libraries. PpKZL1 codes for a small peptide of 78 amino acids while PpKZL2 codes for a peptide of 89 amino acids. Both proteins are predicted to be secreted based on the presence of signal peptides (Figure 8). PpKZL1 is similar to various small Kazal-type inhibitors found in Drosophila pseudoobscura (gi:125986397), C. sonorensis (gi:56199538) and the mosquitoes A. aegypti and A. gambiae, and to larger Kazal-type molecules such as infestin [23] from Triatoma infestans (Figure 8A). There is only 28% identity and 42% similarity between PpKZL1 and PpKZL2 (Figure 8B), suggesting these may have different functions. Additionally, these two Kazal-type cDNAs are similar to the previously characterised thrombin inhibitor, rhodniin, from the triatomine Rhodnius prolixus [24] (data not shown). Due to their anti-hemostatic effect, rhodniin and infestin are believed to play a role in the fluidity of the blood within the midgut of these vectors. It is conceivable that one or both transcripts coding for Kazal-type thrombin inhibitors identified in P. papatasi may play a role in blood fluidity within the sandfly midgut, allowing the blood to be fully digested by the various proteases secreted within the midgut following the blood meal. These represent the first Kazal-type serine protease inhibitors identified from sandflies. PpKZL2 shares low sequence similarity with SFM-0406 from the L. longipalpis EST database and no significant similarities were identified for PpKZL1.

Ferritin
Two transcripts encoding putative ferritin light (PpFLC) and heavy (PpFHC) chain subunits were identified in clusters 103 and 122, respectively (Figure 9). After the ingestion of a blood meal the fly encounters a tremendous dose of iron and heme, which would be fatal to most organisms. Ferritin is one of the important factors in controlling the high iron load in hematophagous insects. The midgut of blood-feeding insects envelops the blood meal, which makes the midgut tissue the most likely site of iron regulatory molecules. However, ferritin may also be important for oxidative stress not related to the presence of iron or heme, as it is induced by the presence of H2O2 in A. aegypti [25]. PpFLC and PpFHC are similar to NSFM-144g07 and NSFM-146d09, respectively, molecules identified by searching the L. longipalpis EST database.

Glutathione S-transferase (GST)
From clusters 125 and 232, two transcripts were identified that encode putative GSTs with homology to other dipteran GSTs in the Sigma and Delta/Epsilon classes, respectively. The predicted molecular weights of the two putative proteins are similar, at 23.2 kDa for cluster 125 and 24.5 kDa for cluster 232. Within the midgut, these proteins may play an important role in the regulation of reactive oxygen species, which occur as a by-product of hemoglobin digestion. Clusters 125 and 232 share high protein sequence similarity with L. longipalpis ESTs NSFM-105e10 and NSFM-74c11, respectively.

Unknown proteins
A large number of clusters produced by the three cDNA libraries have no sequence similarity to other known proteins. This has also been observed in the analysis of the Chironomus tentans midgut with good evidence that the unknown transcripts contained coding sequences [26]. It is also possible that the abundance of unidentifiable sequences may be caused by the sequence quality of the transcripts or that the captured sequences are 3' untranslated regions, non-coding small nuclear RNA, or sequences of uncharacterised organisms such as bacteria and yeast present in the sandfly midgut. A number of clusters with unknown functions were identified as coding sequences which exhibited signal peptides, such as clusters 11 and 126.

Functionally characterised proteins
From the three cDNA libraries, we identified chitinase transcripts which were then expressed as recombinant proteins for the demonstration of activity in the midgut of P. papatasi sandflies [15]. Another product of the cDNA libraries was the identification and characterisation of a galectin protein as the first arthropod receptor for a parasite; specifically, L. major within the P. papatasi sandfly midgut [1].

Comparative analysis of transcripts that significantly differ between the sugar-fed and blood-fed midgut cDNA libraries
To investigate the effects of blood feeding on the midgut expression profile in P. papatasi, we compared the abundance of transcripts in the sugar-fed and blood-fed cDNA libraries. We hypothesized that a blood meal will have an effect on the expression of sandfly midgut transcripts that will be reflected in the relative abundance of sequences forming a cluster in the two libraries. Chi-square statistical analysis was used to evaluate the significance of the differences in the abundance of midgut transcripts between the unfed and blood-fed cDNA libraries, thereby identifying different expression profiles of selected midgut transcripts in each cDNA library.
We observed a significant difference (P value ≤ 0.05) in the abundance of a number of midgut transcripts when we compared the sugar-fed and blood-fed sandfly midgut cDNA libraries. Table 3 shows a list of selected transcripts that were either more abundantly or less abundantly represented in these two cDNA libraries.
As expected, transcripts coding for proteolytic enzymes such as trypsin (PpTryp4) and chymotrypsin (PpChym2) were more abundantly represented in the blood-fed cDNA library than in the sugar-fed cDNA library (Table 3). Other transcripts coding for peritrophin, microvilli-like proteins and ferritin were also more abundantly represented in the blood-fed cDNA library. We also observed a number of transcripts that were less abundantly represented in the blood-fed cDNA library, such as trypsin 1 (PpTryp1) and peritrophin (PpPer2).
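The library comparison described above reduces, for each cluster, to a 2 × 2 contingency table: sequences inside versus outside the cluster, in one library versus the other. The following is a minimal sketch of such a test under stated assumptions. It uses a Pearson chi-square with one degree of freedom and no continuity correction, which may differ from the customized analysis used in this work. The PpPer1 counts (0 and 54) are those reported in the text, whereas the library totals of 1300 sequences are hypothetical placeholders.

```python
import math

def chi_square_2x2(count_a, total_a, count_b, total_b):
    """Pearson chi-square (1 d.f., no continuity correction) comparing the
    share of a cluster's sequences in library A versus library B.

    Returns (chi-square statistic, p-value).
    """
    a, b = count_a, total_a - count_a  # library A: in cluster / not in cluster
    c, d = count_b, total_b - count_b  # library B: in cluster / not in cluster
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# PpPer1: 0 sequences (sugar-fed) versus 54 (blood-fed), as stated in the text.
# Library totals of 1300 are hypothetical, for illustration only.
chi2, p = chi_square_2x2(0, 1300, 54, 1300)
significant = p <= 0.05
```

With small expected counts an exact test is often preferred, and testing many clusters at once raises the usual multiple-comparison concerns; neither point is addressed in this sketch.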

Validation of transcript abundance of selected sequences by real-time PCR
In order to validate the results observed by the chi-square analysis, we further characterised several transcripts by semi-quantitative end-point reverse-transcriptase PCR as well as by real-time PCR. These were utilised to assess the […]
[Figure: Phlebotomus papatasi midgut carboxypeptidase-like proteins.]
The results of semi-quantitative PCR can be seen in Figures 10B and 10D, where the induction of PpPer1 is clearly evident. The difference in PpPer2 expression between the two midgut conditions is less clear using this technique (Figure 10D). Figure 10A shows the transcript abundance of PpPer1 as fold change over the control gene in non blood-fed midguts and after blood-meal ingestion, as measured by real-time PCR. Figure 10C shows the same real-time PCR analysis of the PpPer2 transcript. The profile of the peritrophin transcripts by real-time PCR strongly correlates with the profile found in the libraries based on the number of sequences.
Based on real-time PCR, PpPer1 expression is induced by blood digestion and is not detected in sugar-fed midguts, corresponding with the absence of PpPer1 sequences in the sugar-fed midgut cDNA library, compared to 54 sequences found in the blood-fed library. As predicted by the high sequence abundance of PpPer2 in the sugar-fed cDNA library, the expression of this transcript is highest in unfed sandflies and seems to be down-regulated by the ingestion of a blood meal (Figure 10C).
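Real-time PCR results expressed as fold change relative to a reference transcript are commonly derived from threshold cycle (Ct) values. The sketch below shows the widely used 2^-ΔΔCt calculation as one possible way to obtain such values. It is an assumption that this corresponds to the calculation behind Figure 10, since the quantification method is not described in this section; the Ct values are hypothetical and the calculation assumes close to 100% amplification efficiency.

```python
def fold_change(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the 2^-(ddCt) method.

    ct_target_*: Ct of the transcript of interest in the test / control sample.
    ct_ref_*:    Ct of the reference transcript in the same samples.
    """
    delta_test = ct_target_test - ct_ref_test  # normalise test sample
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # normalise control sample
    return 2.0 ** -(delta_test - delta_ctrl)

# Hypothetical Ct values: the target crosses threshold 5 cycles earlier
# (after normalisation) in blood-fed than in sugar-fed midguts.
blood_fed_vs_sugar_fed = fold_change(20.0, 18.0, 25.0, 18.0)
# blood_fed_vs_sugar_fed == 32.0
```

A transcript that is not detected in one condition has no Ct value there, so a ratio of this kind cannot be computed and the result is better reported as presence versus absence.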

[Figure residue: fragment of a multiple sequence alignment of Pp EU045340 with Ae AAK15639, Ag XP_312474, Gm ABC48949 and Dm AAB70121.]
[…] 11 and illustrate the induction of transcription by the ingestion of a blood meal. This mirrors what is seen by the sequence abundance of the cDNA library, in which only one sequence of PpMVP2 was observed in the sugar-fed cDNA library. The remaining sequences were contributed by the cDNA library produced from blood-fed sandflies.
The low transcript abundance of PpTryp1 and the high transcript abundance of PpTryp4 were in accordance with the results of previously published end-point reverse-transcriptase PCR [13]. Additionally, the previously characterised chitinase molecule, PpChit1, was identified in cluster 243, which was produced by three sequences contributed by the blood-fed cDNA library, with none present in the sugar-fed cDNA library. The mRNA expression levels of PpChit1 peak at 72 hours post blood-meal ingestion [15].

Comparative analysis of transcripts that significantly differ between the blood-fed and L. major-infected midgut cDNA libraries
During its development within the midgut of the sandfly, Leishmania is faced with various potential barriers that may prevent the establishment of the infection. Among such potential barriers are digestive proteases (trypsins and chymotrypsins), the peritrophic matrix and the requirement for parasite attachment to the midgut epithelia to prevent excretion of parasites with remnants of the digested blood. Previous data suggested that Leishmania is able to downregulate proteolytic activity in the sandfly midgut [4]. Also, chitinases produced either by the sandfly [15] or by Leishmania [27] facilitate the escape of parasites from the peritrophic matrix. Attachment to the midgut epithelia occurs via L. major lipophosphoglycan receptors, such as PpGalec [1] or, in the case of permissive sandflies, via midgut glycoproteins bearing terminal N-acetyl-galactosamine [28].
[Figure 11: Transcript abundance of microvilli associated-like proteins compared between unfed and blood fed sand flies. A, C, E: PpMVP1, PpMVP2, and PpMVP4 transcript fold over control (reference transcript = alpha tubulin) in unfed and blood fed P. papatasi midgut. B, D, F: PpMVP1, PpMVP2, and PpMVP4 semi-quantitative PCR amplified transcripts separated by agarose electrophoresis.]
In sandflies, only a handful of midgut proteins have been clearly implicated in Leishmania development. Previous data indicated that Leishmania is able to manipulate the activity of certain digestive proteases, inhibiting or delaying their peak activity, possibly in order to survive the proteolytic attack it faces in the midgut of the vector [3,27]. We hypothesized that a blood meal containing L. major will affect the expression profile of midgut transcripts, altering the abundance of the different transcripts in each of these cDNA libraries. Table 4 shows the results of the chi-square analysis when transcripts from the blood-fed and L. major-infected blood-fed cDNA libraries were compared. Of interest, the abundance of transcripts coding for proteolytic enzymes was dramatically decreased in the midgut cDNA library of sandflies fed on L. major-infected blood. Additionally, other transcripts whose numbers also appear to be reduced included those coding for microvilli-associated like proteins and peritrophins. Transcripts such as the one corresponding to PpTryp1 (trypsin 1) and the one corresponding to PpPer2 (peritrophin 2) were more abundant. Other transcripts coding for unknown proteins were also less abundant in the L. major-infected cDNA library than in the blood-fed cDNA library.
These data suggest that the parasite may be affecting the expression profile of these transcripts and this inhibition, particularly of proteolytic enzymes, may be advantageous for the survival and establishment of the parasite in the midgut of the sandfly.

Conclusion
Development of Leishmania within its sandfly host is largely restricted to the vector midgut. Within the midgut, Leishmania begins its development confined within a peritrophic matrix and is subjected to the onslaught of digestive enzymes. Later, the parasites attach to the epithelia to prevent excretion with remnants of the blood meal and detach as they develop into the infective metacyclic form before being transmitted to a suitable host during a subsequent blood meal. The sandfly midgut presents a number of biological barriers that the Leishmania parasite must circumvent or overcome to proliferate and develop inside the insect vector. Acquiring a better understanding of the molecules present in this organ will illuminate the potential molecular interactions occurring between the Leishmania parasite and the sandfly vector. Comparative transcriptome analysis provides a powerful global approach, as demonstrated by the repertoire of molecules identified from a whole organism or from a specific tissue and the generation of new hypotheses from these data. Large scale genome analyses benefit from data generated from transcriptome analyses, for example, by aiding in the annotation of exons and introns.
The results of the present work provide insights into the repertoire of the molecules present in the midgut of the sandfly P. papatasi, the natural vector of L. major. We identified a variety of molecules and obtained high quality, full-length sequences from many of them. The high quality sequences were deposited at NCBI, significantly augmenting the available midgut-specific coding sequences. A large number of non-annotated sequences were deposited in the EST database for the scientific communities to access these transcripts.
The global changes in sandfly midgut expression profile were assessed by comparing data generated from randomly sequenced midgut cDNA clones obtained from cDNA libraries of adult females fed on sugar only, blood or blood with the addition of L. major. Our approach allowed for the identification of transcripts that are induced by blood feeding and likely participate in the digestion of the blood meal and events leading to egg production. Digestion of blood as a nutritional source is complicated by the cellular and molecular responses and components of the blood itself, once ingested by the insect vector. Transcripts identified in the P. papatasi midgut, such as ferritin, Kazal-type serine protease inhibitors, and GST, are examples of the molecules identified in the gut of this insect. Additionally, the inclusion of an L. major-infected midgut cDNA library provides insight into genes potentially regulated by this parasite during its development within the sandfly midgut. The random sequencing approach followed by the in silico analysis of the transcript abundance was supported by experimental analyses obtained via real-time PCR.

Sandflies
Phlebotomus papatasi sandflies (Saudi Arabia strain) were obtained from colonies maintained at the Walter Reed Army Institute of Research (WRAIR) and at NIAID-NIH. Three- to 5-day-old female sandflies were fed either on 20% sucrose solution (sugar-fed) or on BALB/c mouse whole blood, via artificial meals [1], with or without the addition of 2 × 10⁶ L. major (V1 strain) amastigotes per ml. Technologies, Carlsbad, CA) and 100 ng of mRNA was used to produce a first strand cDNA. A cDNA library, enriched for full-length cDNA, was synthesized using the SMART cDNA library construction kit (Clontech Laboratories, Mountain View, CA). One microgram of double-stranded DNA for each original library (sugar-fed, blood-fed, L. major-infected) was fractionated using a Chromaspin 1000 column (Clontech Laboratories, Mountain View, CA) into small (S), medium (M) and large (L) transcripts based upon their electrophoresis profile on a 1.1% agarose gel. Pooled fractions were ligated into the Lambda TriplEx2 vector (Clontech, Mountain View, CA) and packaged into lambda phage (Stratagene, La Jolla, CA). Individual libraries were plated on LB agar plates in order to achieve roughly 200-300 plaques per 182 mm plate.

Random sequencing
Unidirectional sequencing of randomly selected clones was completed as previously described [10]. Single, isolated plaques were picked from the plate using sterile wooden sticks and placed into 70 µl of water. Amplification of the cDNA was performed using Platinum PCR SuperMix (Invitrogen), 4 µl of template, and primers PT2F1 (AAG TAC TCT AGC AAT TGT GAG C) and PT2R1 (CTC TTC GCT ATT ACG CCA GCT G). PCR amplification products were cleaned using either Multiscreen PCR cleaning plates (Millipore) or Edge Biosystems PCR cleaning plates and three washes with ultra pure water. The cleaned PCR product was resuspended in 25 µl of water, of which 4 µl were used for cycle sequencing with the PT2F3 primer (TCT CGG GAA GCG CGC CAT TGT) and either the DTCS reaction kit (Beckman) or Big Dye 3.1 (Applied Biosystems). Sequencing reaction products were cleaned using Sephadex G-50 (GE Healthcare) in a multiscreen cleaning plate (Millipore) and analysed using either a CEQ8000 (Beckman Coulter) or an ABI3700 (Applied Biosystems) DNA sequencing instrument.
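As a quick sanity check on the primers listed above, their length, GC content and an approximate melting temperature can be computed as in the sketch below. The Wallace rule (2 °C per A/T, 4 °C per G/C) is used here as a rough estimate only; it is an illustrative choice and not part of the described protocol, and the function names are hypothetical.

```python
# Primer sequences as given in the text, with spaces removed below.
PRIMERS = {
    "PT2F1": "AAG TAC TCT AGC AAT TGT GAG C",
    "PT2R1": "CTC TTC GCT ATT ACG CCA GCT G",
    "PT2F3": "TCT CGG GAA GCG CGC CAT TGT",
}


def clean(seq):
    """Remove spaces and upper-case a primer sequence."""
    return seq.replace(" ", "").upper()


def gc_count(seq):
    """Number of G and C bases in a cleaned sequence."""
    return seq.count("G") + seq.count("C")


def wallace_tm(seq):
    """Rough melting temperature in degrees C by the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C). This is a coarse approximation and
    is an assumption of this sketch, not a value reported in the text.
    """
    gc = gc_count(seq)
    at = seq.count("A") + seq.count("T")
    return 2 * at + 4 * gc


summary = {}
for name, raw in PRIMERS.items():
    seq = clean(raw)
    summary[name] = {
        "length": len(seq),
        "gc_fraction": gc_count(seq) / len(seq),
        "wallace_tm": wallace_tm(seq),
    }

for name, props in summary.items():
    print(name, props)
```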

Bioinformatic analysis
Detailed descriptions of the bioinformatic analysis of the data appear in [10,29]. Briefly, prior to analysis the vector sequence was removed from the cDNA nucleotide sequences, and sequences with more than 5% Ns were discarded. Sequence data from the three libraries were then grouped together and aligned to generate clusters of contiguous sequences (contigs) based on 90% homology over 90 nucleotides.
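The two thresholds mentioned above can be illustrated with a short sketch. It is a simplification under stated assumptions: the real pipeline is described in [10,29] and relies on sequence alignment, whereas the check below only compares two segments that are assumed to be already aligned, ungapped and of equal length. All function names and example reads are hypothetical.

```python
def n_fraction(seq):
    """Fraction of positions in a read that are ambiguous (N)."""
    if not seq:
        return 1.0  # treat empty reads as fully ambiguous so they are dropped
    return seq.upper().count("N") / len(seq)


def filter_reads(reads, max_n=0.05):
    """Discard reads with more than max_n (default 5%) Ns."""
    return {name: seq for name, seq in reads.items() if n_fraction(seq) <= max_n}


def meets_cluster_threshold(a, b, min_identity=0.90, min_len=90):
    """Check the 90% over 90 nucleotides criterion on two aligned segments.

    Assumes a and b are already aligned, ungapped and of equal length;
    a real clustering step would obtain such segments from an aligner.
    """
    if len(a) != len(b) or len(a) < min_len:
        return False
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return matches / len(a) >= min_identity


# Hypothetical reads for illustration only.
reads = {
    "clone_001": "ATGGCT" * 20,      # 120 nt, no Ns
    "clone_002": "ATGNNN" * 20,      # 50% Ns, discarded
    "clone_003": "A" * 95 + "N" * 5,  # exactly 5% Ns, kept
}
kept = filter_reads(reads)
print(sorted(kept))
```

Reads that pass the filter would then be compared pairwise, and pairs satisfying the identity criterion would be placed in the same cluster.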

Quantitative PCR
Quantitative PCR (qPCR) was performed on selected clones using first-strand cDNA obtained from 100 ng of total RNA isolated from midguts dissected from P. papatasi females fed on sugar (unfed) or dissected after a blood meal (24-72 h post blood meal, or PBM). cDNAs were synthesized using the 1st Strand cDNA Synthesis kit (Invitrogen, San Diego, CA). Transcript levels were measured with SYBR green dye using a LightCycler 2.0 (Roche Diagnostics, Mannheim, Germany). For qPCR reactions, samples were subjected to an initial holding step at 95°C for 15 minutes, followed by an amplification step consisting of 35 cycles of 95°C for 10 seconds, 54°C for 20 seconds and 72°C for 20 seconds with a single acquisition. The reaction continued with a single-cycle melting step of 95°C for 10 seconds, 67°C for 30 seconds and 95°C for 10 seconds, prior to cooling for 1 minute. Equal amounts of cDNA were amplified using gene-specific primer sets targeting individual transcripts as well as P. papatasi alpha tubulin as the control or reference transcript. Reactions were routinely done in duplicate. The relative expression ratio of the target transcript to the control or reference transcript (fold over control) was calculated using the LightCycler relative quantification software (Roche).
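For readers unfamiliar with relative quantification, the sketch below shows one widely used way of expressing a fold-over-control value, the 2^-ΔΔCt calculation, which assumes roughly 100% amplification efficiency for both target and reference. This is offered as an illustration only: the ratios in this work were computed by the LightCycler software, whose calculation may differ. The function name and all Ct values are hypothetical.

```python
from statistics import mean


def fold_over_control(target_ct_sample, ref_ct_sample,
                      target_ct_control, ref_ct_control):
    """Fold change of a target transcript by the 2^-ddCt approach.

    Each argument is a list of replicate Ct values (e.g. duplicates).
    'ref' is the reference transcript (alpha tubulin in the text);
    'control' is the baseline condition (e.g. sugar-fed midguts).
    Assumes about 100% PCR efficiency for target and reference.
    """
    delta_sample = mean(target_ct_sample) - mean(ref_ct_sample)
    delta_control = mean(target_ct_control) - mean(ref_ct_control)
    return 2 ** -(delta_sample - delta_control)


# Hypothetical duplicate Ct values, NOT data from the study.
fold = fold_over_control(
    target_ct_sample=[22.0, 22.2],   # target, blood-fed midguts
    ref_ct_sample=[18.0, 18.2],      # alpha tubulin, blood-fed midguts
    target_ct_control=[25.1, 24.9],  # target, sugar-fed midguts
    ref_ct_control=[18.1, 17.9],     # alpha tubulin, sugar-fed midguts
)
print(f"fold over control: {fold:.2f}")
```

Normalizing to the reference transcript corrects for differences in the amount of input cDNA between samples, which is why the same reference is amplified alongside every target.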

Semi-quantitative PCR
Semi-quantitative RT-PCR reactions were performed with selected transcripts to further demonstrate the differential expression of these genes in the P. papatasi midgut. In this case, 100 ng of total RNA isolated from midguts dissected from P. papatasi females fed on sugar (unfed) or dissected after a blood meal (48 h PBM) was used to synthesize cDNA using the 1st Strand cDNA Synthesis kit (Invitrogen). PCR reactions were carried out with an initial hot start at 95°C for 5 minutes, followed by 25 cycles of 95°C for 30 seconds, 54°C for 1 minute and 72°C for 1.5 minutes, and a final extension cycle of 72°C for 5 minutes. PCR products were separated on 1.5% agarose.