A deep insight into the sialotranscriptome of the mosquito, Psorophora albipes
© Chagas et al.; licensee BioMed Central Ltd. 2013
Received: 26 July 2013
Accepted: 4 December 2013
Published: 13 December 2013
Psorophora mosquitoes are exclusively found in the Americas and have been associated with transmission of encephalitis and West Nile fever viruses, among other arboviruses. Mosquito salivary glands represent the final route of differentiation and transmission of many parasites. They also secrete molecules with powerful pharmacologic actions that modulate host hemostasis, inflammation, and immune response. Here, we employed next generation sequencing and proteome approaches to investigate, for the first time, the salivary composition of a member of the Psorophora genus. We additionally discuss the evolutionary position of this mosquito genus within the Culicidae family by comparing the identity of its secreted salivary compounds to other mosquito salivary proteins identified so far.
Illumina sequencing resulted in 13,535,229 sequence reads, which were assembled into 3,247 contigs. Annotation of these sequences allowed classification of their products into 83 salivary protein families, twenty (24.09%) of which were confirmed by our subsequent proteome analysis. All families were classified according to their in silico-predicted function/activity. Two protein families were deorphanized from Aedes and one from Ochlerotatus, while four protein families were described as novel to the Psorophora genus because they had no match with any other known mosquito salivary sequence. Several protein families described as exclusive to Culicines were present in Psorophora mosquitoes, while we did not identify any member of the protein families already known as unique to Anophelines. Also, the Psorophora salivary proteins most often had their best matches to homologs in Aedes (69.23%), followed by Ochlerotatus (8.15%), Culex (6.52%), and Anopheles (4.66%).
This is the first sialome (from the Greek sialo = saliva) catalog of salivary proteins from a Psorophora mosquito, which may be useful for better understanding the lifecycle of this mosquito and the role of its salivary secretion in arboviral transmission.
Psorophora mosquitoes, commonly known as “giant mosquitoes,” belong to the subfamily Culicinae, which includes many genera with epidemiologic importance to humans and animals such as Aedes, Ochlerotatus, Haemagogus, and Culex. Notably, members of the Psorophora genus are found only in the New World. Psorophora mosquitoes are opportunistic blood feeders, with mammals and birds as their main hosts [1, 2]. Psorophora females have been associated with transmission of equine encephalitis virus, West Nile fever virus, and other arboviruses [3–9].
The phylogeny of mosquitoes includes three subfamilies within the Culicidae: Anophelinae, Culicinae, and Toxorhynchitinae. Studies based on morphology, behavior, biogeographic distribution, and life history suggest that the Anophelinae subfamily is monophyletic and basal within the Culicidae family. The Culicinae subfamily, on the other hand, includes the majority of the remaining mosquito genera, distributed into ten tribes. Psorophora mosquitoes share the tribe Aedini with Aedes, Ochlerotatus, and other mosquito genera, while Culex mosquitoes belong to the Culicini tribe. Previous studies have supported the genera from the tribe Culicini as basal to genera of the tribe Aedini. These results are in agreement with the phylogeny proposed by Besansky and Fahey. The Psorophora genus contains 48 species divided into three subgenera: Grabhamia (15 species), Janthinosoma (23 species), and Psorophora (10 species). Recently, morphologic and molecular studies have supported Psorophora as a sister group to Aedes/Ochlerotatus [13–15]. In contrast, studies using 18S rDNA sequence have suggested Psorophora species as a sister group to Culex and/or to Aedes/Ochlerotatus species [12, 16].
The salivary glands (SGs) of hematophagous insects secrete a cocktail of biochemically active compounds that interact with the hemostasis [18–21], immunity, and inflammation of their hosts [22, 23]. Perhaps because of the continuous contact of mosquito salivary proteins with host immunity, salivary proteins evolve and diverge at a fast pace, even in closely related species. In the past decade, continuous advances in the fields of transcriptome and proteome analysis led to the development of high-throughput sialotranscriptome studies (from the Greek sialo = saliva) [23, 25]. These studies resulted in a large database of secreted salivary proteins from different blood-feeding arthropod families, including members of the Culicidae family.
All mosquito sialotranscriptome studies so far have targeted members of the Aedes, Ochlerotatus, Anopheles, and Culex genera, which are important vectors of human and animal diseases. Although some Psorophora species are known to be vectors of several arboviruses, the molecular composition of their salivary secretion remains unknown. Our primary aim was to investigate the salivary transcriptome and proteome of a member of the Psorophora genus (Psorophora albipes) to ultimately better understand the evolution of SG composition within the Culicidae family. In addition, our work makes available the first platform of salivary proteins from this mosquito genus, relevant for improving our understanding of mosquito evolution and of the evolving risks to public health due to the recent northward expansion of Psorophora mosquitoes, and for development of markers of exposure to mosquito bites and to vector-borne diseases transmitted by mosquitoes.
Psorophora mosquitoes were collected in fragments of unflooded rain forest in Manacapuru municipality, Amazonas state, Brazil, using modified CDC traps. The mosquitoes were maintained with water and sugar solution and transported to the Biodiversity Laboratory of the Leônidas and Maria Deane Institute (Fiocruz/Manaus). The mosquitoes were identified using the taxonomic keys proposed by Forattini and by Consoli and Lourenço de Oliveira.
SGs from P. albipes (50 pairs) were dissected in 150 mM sodium chloride pH 7.4, immediately transferred to 50 μl RNAlater® solution, and maintained at 4°C until RNA extraction. SG RNA was extracted and isolated using the Micro-FastTrack™ mRNA isolation kit (Invitrogen, San Diego, CA) per manufacturer’s instructions. The integrity of the total RNA was checked on a Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA).
The SG library was constructed using the TruSeq RNA sample prep kit, v2 (Illumina Inc., San Diego, CA). The resulting cDNA was fragmented using a Covaris E210™ focused ultrasonicator (Covaris, Woburn, MA). Library amplification was performed using eight cycles to minimize the risk of over-amplification. Sequencing was performed on a HiSeq 2000 (Illumina) with v3 flow cells and sequencing reagents, using a paired-end protocol. One lane of the HiSeq machine was used for this and two other libraries, distinguished by bar coding. A total of 135,651,020 sequences of 101 nt in length were obtained. Raw data were processed using RTA 126.96.36.199 and CASAVA 1.8.2. mRNA library construction and sequencing were done by the NIH Intramural Sequencing Center (NISC). Reads were trimmed of low-quality regions (< 10) and assembled with the assembly by short sequences (ABySS) software (Genome Sciences Centre, Vancouver, BC, Canada) [27, 28] using various kmer (k) values (every even number from 24 to 96). Because the ABySS assembler tends to miss highly expressed transcripts, the Trinity assembler was also used. The resulting assemblies were joined by an iterative BLAST and cap3 assembler. Sequence contamination between bar-coded libraries was identified and removed when sequence identities were over 98% but read abundance differed by more than 50-fold between libraries. Coding sequences (CDS) were extracted using an automated pipeline, based on similarities to known proteins or by obtaining CDS containing a signal peptide. Coding sequences and their protein sequences were mapped into a hyperlinked Excel spreadsheet (presented as Additional file 1, and also located at http://exon.niaid.nih.gov/transcriptome/Psorophora_albipes/Pso-s2-web.xlsx).
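As a rough illustration of two steps in the assembly workflow described above, the following Python sketch lists the k values of the ABySS sweep and applies the cross-library contamination rule (identity over 98% and a read-abundance difference of more than 50-fold). The function name, its arguments, and the choice to treat the lower-abundance copy as the contaminant are assumptions made for illustration; the text does not say how the rule was implemented.

```python
# k-mer sweep stated in the text: every even number from 24 to 96.
KMER_VALUES = list(range(24, 97, 2))


def is_cross_library_contaminant(identity_pct, reads_here, reads_other,
                                 min_identity=98.0, min_fold=50.0):
    """Flag a contig as likely carry-over from another bar-coded library.

    Hypothetical helper. Assumption: when two libraries share a nearly
    identical sequence, the copy supported by far fewer reads is treated
    as the contaminant.
    """
    if identity_pct <= min_identity:
        return False
    # Multiplication instead of division avoids a zero-division error.
    return reads_other > min_fold * reads_here


if __name__ == "__main__":
    print(len(KMER_VALUES), "k values, from", KMER_VALUES[0], "to", KMER_VALUES[-1])
    print(is_cross_library_contaminant(99.5, reads_here=10, reads_other=1000))
```

The thresholds are exposed as keyword arguments so that the 98% and 50-fold cut-offs quoted in the text can be varied without editing the function body.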
Signal peptides, transmembrane domains, furin cleavage sites, and mucin-type glycosylation were determined with software from the Center for Biological Sequence Analysis (Technical University of Denmark, Lyngby, Denmark) [32–35]. Reads were mapped to the contigs using blastn with a word size of 25, masking homonucleotide decamers and allowing mapping to up to three different CDS if the BLAST results had the same score values. Mapping of the reads was also included in the Excel spreadsheet. Automated annotation of proteins was based on a vocabulary of nearly 250 words found in matches to various databases, including Swissprot, Gene Ontology, KOG, PFAM, and SMART, and a subset of the non-redundant protein database of the NCBI containing proteins from vertebrates. Further manual annotation was done as required. Detailed bioinformatics analysis of our pipeline can be found in our previous publication. Sequence alignments were done with the ClustalX software package. Phylogenetic analysis and statistical neighbor-joining bootstrap tests of the phylogenies were done with the Mega package. Blast score ratios were calculated as indicated previously. For visualization of synonymous and non-synonymous sites within coding sequences, the tool BWA aln was used to map the reads to the CDS, producing SAI files that were joined by the BWA sampe module, converted to BAM format, and sorted. The sequence alignment/map tools (samtools) package was used to do the mpileup of the reads (samtools mpileup), and the binary call format tools (bcftools) program from the same package was used to make the final vcf file containing the single-nucleotide polymorphic (SNP) sites, which were retained only if the site coverage was at least 100 (-D100), the quality was 13 or better, and the SNP frequency was 5 or higher (default).
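The SNP acceptance criteria described above (coverage of at least 100, quality of 13 or better, SNP frequency of 5 or higher) can be sketched as a simple filter over already-parsed variant calls. This is an illustration only, not the samtools/bcftools invocation used in the study: the `SnpCall` record and its field names are hypothetical, and the frequency threshold is interpreted here as a percentage of reads supporting the alternative base, which is an assumption because the text does not state the unit.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    """Hypothetical, already-parsed record for one candidate SNP site."""
    contig: str
    position: int
    depth: int        # reads covering the site
    quality: float    # variant quality score
    alt_reads: int    # reads supporting the alternative base


def passes_filters(call, min_depth=100, min_quality=13.0, min_alt_percent=5.0):
    """Apply the three thresholds quoted in the text.

    Assumption: 'SNP frequency 5 or higher' is read as >= 5% of the
    reads at the site supporting the alternative base.
    """
    if call.depth < min_depth or call.quality < min_quality:
        return False
    return 100.0 * call.alt_reads / call.depth >= min_alt_percent


if __name__ == "__main__":
    calls = [
        SnpCall("contig_A", 120, depth=250, quality=40.0, alt_reads=30),
        SnpCall("contig_A", 300, depth=80, quality=40.0, alt_reads=30),
    ]
    kept = [c for c in calls if passes_filters(c)]
    print(len(kept), "of", len(calls), "candidate sites retained")
```

Checking depth before computing the frequency means a site with zero coverage is rejected before any division takes place.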
Whether each SNP leads to a synonymous or non-synonymous codon change was determined by a program written in Visual Basic by JMCR; the results are mapped into the Excel spreadsheet and color-coded in hyperlinked rtf files within Additional file 1.
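The original classification program was written in Visual Basic and is not reproduced here. As a minimal sketch of the underlying idea, the Python example below substitutes one base in the affected codon and compares the encoded amino acids using the standard genetic code. The function name and the 0-based position convention are assumptions for illustration, and the sketch assumes the CDS starts in frame at its first base.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(cds, position, alt_base):
    """Return 'synonymous' or 'non-synonymous' for a single-base change.

    cds      -- coding sequence, in frame from its first base (assumption)
    position -- 0-based index of the SNP within the CDS (assumption)
    alt_base -- the alternative nucleotide observed at that position
    """
    cds = cds.upper()
    start = position - position % 3
    ref_codon = cds[start:start + 3]
    if len(ref_codon) < 3:
        raise ValueError("SNP falls in an incomplete trailing codon")
    offset = position - start
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "non-synonymous"


if __name__ == "__main__":
    # GCT -> GCC keeps alanine; GCT -> ACT changes alanine to threonine.
    print(classify_snp("ATGGCTTAA", 5, "C"))
    print(classify_snp("ATGGCTTAA", 3, "A"))
```

Codons containing ambiguous bases such as N are not handled and would raise a `KeyError`; a production tool would need to decide how to treat them.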
Fifty SG pairs from female P. albipes were used in the proteome analysis. Briefly, the glands were sonicated and the supernatant was boiled for 10 min in reducing Laemmli gel-loading buffer and subsequently resolved on a NuPAGE 4-12% Bis-Tris precast gradient gel. Proteins were visualized with SimplyBlue stain (Invitrogen). The gel was arbitrarily sliced into 19 individual sections (coded as F1–19) that were destained and digested overnight with trypsin at 37°C. ZipTips® (Millipore, Bedford, MA) were used to extract and desalt the peptides, which were resuspended in 0.1% TFA before mass spectrometry (MS) analysis.
Nanoflow reverse-phase liquid chromatography coupled with tandem MS (MS/MS) was performed as described. As a final product, we obtained a database of the tryptic peptides identified by MS, which was used to search for matches in our transcriptome database of P. albipes. Additional details about the proteome procedure and analysis can be found in the methodology described in Chagas et al.
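To make the peptide-to-transcriptome matching step more concrete, the sketch below performs an in silico tryptic digestion of predicted proteins and counts how many observed peptides each contig explains. It uses the common simplified trypsin rule (cleave after K or R unless the next residue is P) with no missed cleavages; this is an assumption for illustration and not the search procedure used in the study, which relied on dedicated MS/MS software. The contig names and sequences are invented.

```python
def tryptic_digest(protein):
    """Split a protein after K or R, except when followed by P.

    Simplified rule: no missed cleavages, no modifications.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        at_end = i + 1 == len(protein)
        if residue in "KR" and (at_end or protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def count_peptide_hits(observed_peptides, proteins):
    """Count, per contig, the observed peptides found in its digest."""
    observed = set(observed_peptides)
    return {
        name: sum(1 for pep in set(tryptic_digest(seq)) if pep in observed)
        for name, seq in proteins.items()
    }


if __name__ == "__main__":
    # Hypothetical contigs and peptides, for illustration only.
    proteins = {"contig_A": "MKRPAAKGG", "contig_B": "MSTLLK"}
    observed = ["RPAAK", "MSTLLK", "XXXX"]
    print(tryptic_digest("MKRPAAKGG"))
    print(count_peptide_hits(observed, proteins))
```

Counting distinct matching peptides per contig mirrors the "Fraction → Number of tryptic peptides" summaries reported for each protein, although real search engines also score the spectra rather than matching strings exactly.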
Table: Functional classification of transcripts from salivary glands of the mosquito Psorophora albipes. Columns: Number of coding sequences; % of CDS; Number of reads; % of total reads.
After annotation, 7,537,805 reads (55.69% of the reads mapped to CDS) were classified as originating from transcripts encoding putative secreted (S) proteins, and these were assembled into 802 contigs (24.70% of the total contigs) (Table 1). A signal peptide was detected in these contig sequences, suggesting that these contigs encode proteins secreted in the saliva. In addition, 5,473,151 reads (40.44% of the total reads) mapped to transcripts encoding housekeeping (H) proteins, which were assembled into 1,973 contigs (60.76% of the total contigs). Another 85,213 reads (0.63% of total reads) correspond to transposable elements, and 439,060 reads (3.24% of total reads) were classified as originating from transcripts that encode products of unknown function (U) (Table 1).
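The read and contig shares quoted above can be recomputed directly from the reported counts. The short Python check below uses only numbers given in the text (the class labels are shorthand chosen here) and confirms that the four read classes add up to the 13,535,229 reads reported for the assembly.

```python
# Counts as reported in the text.
READS = {
    "secreted (S)": 7_537_805,
    "housekeeping (H)": 5_473_151,
    "transposable elements": 85_213,
    "unknown (U)": 439_060,
}
CONTIGS = {"secreted (S)": 802, "housekeeping (H)": 1_973}
TOTAL_CONTIGS = 3_247


def percent(part, whole):
    """Percentage of `whole` represented by `part`, to two decimals."""
    return round(100.0 * part / whole, 2)


total_reads = sum(READS.values())
read_share = {name: percent(n, total_reads) for name, n in READS.items()}
contig_share = {name: percent(n, TOTAL_CONTIGS) for name, n in CONTIGS.items()}

if __name__ == "__main__":
    print("total reads:", total_reads)
    for name, share in read_share.items():
        print(f"{name}: {share}% of reads")
    for name, share in contig_share.items():
        print(f"{name}: {share}% of contigs")
```

The recomputed values agree with the percentages in the text (55.69%, 40.44%, 0.63%, and 3.24% of reads; 24.70% and 60.76% of contigs).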
Table: Functional classification of the housekeeping products expressed in the female Psorophora albipes salivary glands. Columns: Number of contigs; Number of reads. Classes listed: Protein synthesis machinery; Transporters and channels; Unknown conserved with transmembrane domains; Amino acid metabolism; Signal transduction - apoptosis.
Table: Functional classification of transcripts coding for putative secreted proteins in female Psorophora albipes salivary glands. Column: % Secreted reads. Families listed:
5′-nucleotidase/Apyrase family
Adenosine deaminase family
Ubiquitous protease inhibitor domains
Kazal domain-containing peptides
TIL domain family found in mosquitoes
Schistocerca protease inhibitor
Cystatins—may be housekeeping
Fred/Ficolin domain-containing proteins
C type lectins
Peptidoglycan recognition domains
Gram-negative binding proteins
Mucin I mosquito family
gSG5 mucin protein family
SG3 mucin family
Long-D7 mosquito family
Culicine short-D7 proteins
Salivary mosquito OBP
Yellow Phlebotomine family
Ubiquitous protein families existing outside Nematocera, function unknown
Aedes hypothetical secreted conserved proteins
12- to 14-kDa mosquito family similar to Drosophila proteins
15- to 17-kDa insect family
Culex/Drosophila WAP subfamily
Protein families exclusive of bloodsucking Nematocera
30 kDa/Aegyptin family - Mosquitoes and black flies
Mucin II mosquito family
Protein families specific to mosquitoes
HHH peptide family
hyp8.2 Culicine family
Aedes W-rich peptides
Aedes 6.5–8.5 protein family
Aedes 62-kDa family
Basic tail mosquito family
34-kDa Aedes family
Aedes/Anopheles darlingi 14–15 family
GQ-rich Culicine family
23.5 kDa Culicine family
Culex WRP/16-kDa family
Salivary protein 16 family
HHH family 2
Anopheline SG1 family
Protein families specific to black flies
Simulium disintegrin similar to phenoloxidase inhibitor peptides
H-rich, acidic proteins of Simulium
Salivary-orphan proteins of conserved secreted families
Similar to OT-19—contains HH repeats
Aedes 5-kDa family
Aedes 7-kDa family
Families not reported on Nematocera - sialome review
Pso 4.7 kDa
Pso 4.01 kDa ultrashort-D7 family
Pso 4.2 kDa
Pso 6.3 kDa
Pso 12 kDa
Pso 12.8 kDa—novel mosquito peptide family
Pso 20.44 kDa—unique to Culicine
Pso 4.69 kDa—unique to Aedes
Other putative secreted proteins
Putative secreted proteins identified in the sialotranscriptome of Psorophora albipes and confirmed by our proteomic studies
Protein name | Fraction → Number of Tryptic peptides*
Psor-13556|F11→8, Psor-17515|F11→8, Psor-22761|F11→8, Psor-17516|F11→7, Psor-12600|F11→5, Psor-17320|F11→4
Psor-34082|F12→18, Psor-34081|F12→18, Psor-34084|F12→8
Psor-18383|F12→594, Psor-18379|F12→562, Psor-18398|F12→504, Psor-18380|F12→473, Psor-20123|F12→263, Psor-20135|F12→239, Psor-18402|F12→207, Psor-20121|F12→159, Psor-18262|F12→151, Psor-12755|F12→19
Psor-13510|F19→7, Psor-14008|F19→7, Psor-12935|F19→7, Psor-29999|F19→7
Psor-13808|F7→34, Psor-12515|F7→29, Psor-13816|F7→24, Psor-21012|F7→23, Psor-12520|F7→16, Psor-16372|F14→3
Psor-34191|F15→29, Psor-34198|F15→28, Psor-34194|F15→28, Psor-34190|F15→22, Psor-34202|F15→17, Psor-32651|F15→9, Psor-16545|F14→26, Psor-14130|F14→22, Psor-27735|F14→22, Psor-22249|F14→22, Psor-21941|F14→10, Psor-14244|F14→5, Psor-22962|F14→5, Psor-16516|F14→4, Psor-16517|F14→3, Psor-5239|F14→2
Psor-24290|F19→11, Psor-14799|F19→8, Psor-24280|F19→8, Psor-14821|F19→8, Psor-24282|F19→8, Psor-24283|F19→7, Psor-11438|F19→5, Psor-33959|F19→3
Psor-12020|F16→22, Psor-14194|F16→5, Psor-12762|F15→7, Psor-17302|F15→3
Psor-20191|F15→39, Psor-18075|F14→19, Psor-18076|F14→18, Psor-18880|F14→10, Psor-19455|F14→7
Basic tail mosquito
Psor-23962|F18→7, Psor-23027|F12→4, Psor-18121|F12→2, Psor-15772|F12→2, Psor-15774|F12→2
Psor-19686|F19→5, Psor-19685|F19→4, Psor-19523|F19→3, Psor-12072|F19→3
Psor-31485|F19→2, Psor-31484|F19→2, Psor-31419|F19→2
Psor-4 kDa ultrashort-D7
We confirmed expression of 20 of 83 (24.09%) S protein families described in the sialotranscriptome. The three strongly stained bands of the gel appear to match F9 (glycosidase family), F11 (apyrase, adenosine deaminase), and F15 (long-D7 mosquito family, 30-kDa Aegyptin family, Antigen-5). Six of the ten protein families described as highly expressed in our P. albipes SG transcriptome (glycosidases, 30.5-kDa family, long-D7 mosquito family, 30-kDa Aegyptin-like family, Serpin family, and Culicine D7 mosquito family) were confirmed to be present in the salivary proteome of P. albipes by our subsequent proteomics analysis. Furthermore, seven families (35% of the total families confirmed by proteome) described in the transcriptome as specific for mosquitoes (basic tail mosquito, Aedes 62 kDa, 9.7-kDa family, Hyp8.2 Culicine, 30.5-kDa family, 23.5-kDa family, Aedes 34 kDa) were also confirmed by our proteome analysis. Additionally, the proteomics analysis confirmed the presence of the newly described protein family named “Psor-4 kDa ultrashort-D7 family” (contig Psor-9075). More details about contigs/families found in the proteome of Psorophora can be seen in Figure 1 and Table 4. Tryptic peptides were assigned to several contigs encoding H proteins (Additional file 1), such as a P. albipes sphingomyelin phosphodiesterase that shows 55% amino acid identity to the homolog/ortholog from Culex quinquefasciatus. Previous proteomic studies using mosquito SGs identified some abundant protein families in Aedes aegypti such as long-D7 protein, adenosine deaminase, serpin, and 30-kDa Aegyptin. Members of all these families were similarly identified in our P. albipes proteome. Additionally, members of the two mosquito-specific families known as the 34-kDa and 32-kDa families were identified in our Psorophora proteome; members of these families were described as immunogenic in the proteome study of Ae. aegypti saliva.
Also, the antigen-5 protein was confirmed in the Psorophora proteome, and members of this family have been previously described as an SG-secreted product in Culex. Many of the identified proteins have homologs/orthologs in other mosquitoes that have been described as related to blood feeding.
The following highlights are related to the secreted sialome of P. albipes compared with others from bloodsucking Nematocera.
5′-nucleotidase/apyrases, adenosine deaminase, ribonuclease, endonuclease, alkaline phosphatase, serine proteases, lipase, destabilase/lysozyme, hyaluronidase, and glycosidases were identified. Cathepsins and serine-type carboxypeptidase are also noted but could have H functions. These enzymes have all been found before in mosquito sialotranscriptomes, and their role in blood and sugar feeding has been reviewed. Notable in the case of Psorophora, however, is the finding of both endonuclease (identified by MS/MS in band 12) and hyaluronidase, which were previously restricted to C. quinquefasciatus and sand flies and not found in Aedes or Anopheles sialotranscriptomes. This enzyme combination may help decrease skin-matrix viscosity and aid diffusion of salivary components, as well as break down neutrophil extracellular traps. Apyrase, adenosine deaminase, and glycosidases were found by MS/MS in fraction 10, consistent with their expected sizes. The presence of transcripts encoding sphingomyelin phosphodiesterases (SMases), some of which are highly transcribed with coverages higher than 500, is an unusual finding in mosquito sialotranscriptomes. Although it lacks the initial methionine, Psor-15064 matches, starting at position 6, a C. quinquefasciatus protein that has a predicted signal peptide, with 55% identity over 564 amino acids. The SMases are members of the DNase I superfamily of enzymes responsible for breaking sphingomyelin into phosphocholine and ceramide. In addition, activation of SMase is suggested to play a role in production of ceramide in response to cellular stresses. Tryptic peptides originating from SMase were found in fractions 11 and 12 of the NuPage gel in our proteomic analysis. The high expression of this enzyme suggests it may be secreted.
Among antimicrobial agents, lysozyme, gambicin, cecropin, and defensins were found. Pathogen recognition proteins of the ML domain, Fred/ficolin, Gram-negative binding, peptidoglycan recognition, leucine-rich, galectin, and C-type lectin families were identified. Of these, lysozyme was identified in gel fraction 19 by MS/MS.
The yellow gene in Drosophila is responsible for tanning of the cuticle, and the mosquito homolog was shown to have a dopachrome oxidase function [53, 54]. This protein family is specific to insects, the royal jelly protein being a member of the superfamily. Interestingly, sand fly sialotranscriptomes, but those of no other insects, have two members of this family, recently shown to be scavengers of serotonin [56–58]. The P. albipes sialotranscriptome revealed two members of this family, probably alleles, relatively well expressed, assembled with over 200× coverage. This is the first description of a yellow family member in mosquito sialotranscriptomes; however, these results derive from a high-coverage mosquito sialotranscriptome, and members of this family may yet be found in previously studied species if higher transcript coverage is attained.
A total of 1,319,744 reads (18% of the total reads classified as S products) mapped to transcripts encoding proteins that can be classified according to their sequence similarity to 18 different protein families (21.68% of the total S protein families described in this transcriptome) previously described as unique to mosquitoes, i.e., they are not recognized in any other organism apart from mosquitoes. A total of 69.23% of these mosquito-specific contigs had their best matches originating from Aedes, followed by 8.15% best matching to Ochlerotatus, 6.52% to Culex, and 4.66% to Anopheles. A previous review of Nematocera sialomes proposed that some of these mosquito-specific families appear to be spread across all mosquito genera studied so far, while others show specific distributions to a certain mosquito subfamily and/or genus. Accordingly, we conceptually divided our discussion regarding the mosquito-specific protein families present in Psorophora sialomes into four groups: i) mosquito-specific protein families common to Culicines and Anophelines, ii) mosquito-specific protein families thus far found only in Culicines, iii) mosquito-specific protein families unique to Aedes/Ochlerotatus, and iv) mosquito-specific protein families unique to Culex.
Nine of the 12 protein families previously known as common to Culicines and Anophelines were found in the P. albipes transcriptome: the HHH peptide family, the HHH peptide family 2, the mosquito basic tail family, the salivary protein 16 family, the Aedes/Anopheles darlingi 14-15 family, the gSG8 family, the Hyp6.2 family, the Aedes 62 kDa family, and the Anopheline SG1 family. Although commonly found in mosquito SG transcriptome analyses, no member of these families has been functionally characterized so far. Moreover, studies based on RT-PCR have assigned to some of these family members a tissue and/or sex specificity in their expression that suggests a role in the physiology of Ae. albopictus SGs.
Mosquito basic tail proteins contain a Lys dipeptide tail (Figure 3B) and have been suggested to bind to negatively charged phospholipids found in cell membranes, such as on the surface of platelets. They may also be associated with plasminogen activation [61, 62]. In the Psorophora transcriptome, six contigs (0.43% of the total contigs classified as S products) match mosquito basic tail peptides with 50% identity to Ae. albopictus family members (Additional file 1). Three tryptic peptides in our proteome analysis match contig Psor-13880, which encodes a member of this family. Phylogenetic analysis of the basic tail mosquito family supports divergence of Culicine salivary proteins from the Anopheline family members, with Anopheline and Culicine proteins grouped in distinct clades (Figure 3D). Although Anophelines lack the basic tail, they have a conserved backbone (Figure 3B). In the Culicine clade, we observe that all Psorophora proteins are isolated in a genus-specific branch, separated from the other Culicine proteins with strong bootstrap support (Figure 3D).
Family Hyp6.2, represented by three truncated sequences, is approximately 45% identical to the homologs from Ochlerotatus (Additional file 1). Additionally, all the contigs found in the P. albipes transcriptome from the mosquito-specific families HHH family-2, salivary protein 16 family, Aedes/An. darlingi family, gSG8 family, and Aedes 62-kDa family have as their best matches the homologs from Ae. aegypti, with identities varying from 42% to 80% (Additional file 1). Proteome analysis revealed tryptic peptides originating from Psorophora family members showing higher similarities to the Aedes 62-kDa family (Additional file 1).
To date, five protein families found in the P. albipes sialotranscriptome are unique to Culicines. Two of these (9.7-kDa and Hyp8.2 Culicine protein families) may play a role in blood feeding, as they are abundantly expressed in female Ae. albopictus SGs. The 30.5-kDa and 23.5-kDa protein families appear to be involved in mosquito sugar feeding due to their reported expression in male and female SGs; however, the tissue specificity of the fifth protein family, namely the GQ-rich Culicine family, is still unknown [63, 64]. So far, no member from these families has been functionally characterized.
Two abundantly expressed families in our transcriptome analysis are represented by the 30.5-kDa (4.35% of total reads encoding for S products) and Hyp8.2 Culicine families (1.30% of total reads encoding for S products). The first family was also within the 50 most-expressed families in this transcriptome (Table 3). Expression of these two families in Psorophora SGs was confirmed by our proteome analysis (Figure 1 and Table 4). Overall, they share 53% amino acid identity with the family member from Ae. albopictus (Additional file 1). The Psorophora 9.7-kDa and 23.5-kDa families had Ae. aegypti proteins as their best BLAST similarity matches; tryptic peptides were found in our proteome analysis identifying these family members. In contrast, members of the GQ-rich Culicine family revealed 58% identity to its homologs from C. quinquefasciatus (Additional file 1).
The phylograms of the 30.5-kDa (Figure 4B) and 23.5-kDa (Figure 4C) families confirm the pattern seen for the 9.7-kDa family (Figure 4A), in that Psorophora proteins are grouped in the same clade with Aedes proteins (Figure 4A–C). The GQ-rich family shows Psorophora members grouped within the same clade containing Ochlerotatus proteins (Figure 4D). Although previous studies using 18S rDNA sequence suggested Psorophora species as a sister group to Culex and/or a sister group to the Aedes/Ochlerotatus species [12, 16], our results suggest, based on the composition of the salivary proteins, that Psorophora is much closer to Aedes than to Culex.
Two putative S protein sequences match black fly proteins previously thought to be unique to Simulium sialomes. Three protein families previously thought to be orphans of Aedes and Ochlerotatus (Aedes 7-kDa and 5-kDa families and Ochlerotatus OT-19 family) were deorphanized. Eight novel salivary protein families were found in the Psorophora sialotranscriptome, four of which appear unique to Psorophora, while the others have matches to mosquito hypothetical proteins not previously described in sialotranscriptomes.
We additionally identified 372 transcript sequences encoding secreted polypeptides, most of which have no relevant matches to any sequence deposited thus far in the NR database. Two of these were identified by proteome analysis. All details of these proteins are in the hyperlinked Excel spreadsheet available in Additional file 1.
Table: Psorophora albipes polymorphisms detected on a set of 1,100 coding sequences of 16 functional classes. Columns: Average (Syn/codon) × 100; Average (NS/codon) × 100.
The sialotranscriptome of P. albipes as described here is the first, or among the first, to use solely Illumina sequences for its assembly, in the absence of a reference genome. Over 3,000 coding sequences were recovered, 1,790 of which were submitted to GenBank. This is also the first transcriptome of a member of the Psorophora genus. As expected, the protein sequences presented more similarities to Aedes, followed by Culex and Anopheles proteins. Despite this more Aedine nature, P. albipes presented some Culex characters, such as the presence of endonuclease and hyaluronidase, which are common in sand flies and black flies but among mosquitoes had so far been found only in Culex. A Psorophora protein similar to the WRP/16-kDa family, also unique so far to Culex, allowed the discovery of a “missing link” between this Culex family and hypothetical Ae. aegypti proteins, indicating that this gene family is ancestral in all Culicines but poorly or not expressed in Aedes SGs. Orphan protein families from Aedes and Ochlerotatus were deorphanized, and several new families of proteins were identified, four of which appear unique to Psorophora, supporting the idea that sialotranscriptomes of new bloodsucking genera yield at least two novel protein families. However, these novel sequences may result from misassemblies or chimeras. Further sequencing of other Psorophora species may clarify this area. Also unique to Psorophora is the finding of SMase, not previously found in mosquito sialomes. Because the sample derived from 50 field-collected mosquitoes, we were also able to derive an estimate of SNPs and the rate of synonymous and non-synonymous mutations in this data set.
All data from the transcriptome and proteome analysis of P. albipes SGs are disclosed in Additional file 1, a hyperlinked Excel spreadsheet available at http://exon.niaid.nih.gov/transcriptome/Psorophora_albipes/Pso-s2-web.xlsx. Raw reads were deposited in the SRA of the NCBI under bioproject numbers PRJNA208524 and 208958 and raw data file SRR908278. One thousand seven hundred and ninety coding sequences have been publicly deposited in the Transcriptome Shotgun Assembly project at DDBJ/EMBL/GenBank under accession GALA00000000. The version described in this paper is the first version, GALA01000000, ranging from GALA01000001 to GALA01001790.
ABySS: Assembly by short sequences software
bcftools: Binary call format tools
NCBI: National Center for Biotechnology Information
NISC: NIH Intramural Sequencing Center
samtools: Sequence alignment/map tools
SRA: Sequence read archives
U: Proteins of unknown function
This work was supported by the Intramural Research Program of the Division of Intramural Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health and by Fundação Oswaldo Cruz (Fiocruz) represented by Instituto Leônidas e Maria Deane (ILMD). We also thank the PAPES V program support FIOCRUZ/CNPq. We are grateful to Dr. Michalis Kotsyfakis for the critical reading of the manuscript and to Brenda Rae Marshall, DPSS, NIAID, for editing assistance. In addition, we thank Dr. Roberto Rocha (Fiocruz/Amazonia/Brazil) for his support. Because JMCR, EC, and ACC are government employees and this is a government work, the work is in the public domain in the United States. Notwithstanding any other agreements, the NIH reserves the right to provide the work to PubMedCentral for display and use by the public, and PubMedCentral may tag or modify the work consistent with its customary practices. You can establish rights outside of the U.S. subject to a government use license.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.