An insight into the sialotranscriptome of the West Nile mosquito vector, Culex tarsalis
© Calvo et al. 2010
Received: 25 August 2009
Accepted: 20 January 2010
Published: 20 January 2010
Saliva of adult female mosquitoes helps sugar and blood feeding by providing enzymes and polypeptides that help sugar digestion, control microbial growth and counteract the hemostasis and inflammation of their vertebrate hosts. Mosquito saliva also potentiates the transmission of vector-borne pathogens, including arboviruses. Culex tarsalis is a bird-feeding mosquito vector of West Nile Virus closely related to C. quinquefasciatus, a mosquito relatively recently adapted to feed on humans and the only mosquito of the genus Culex whose sialotranscriptome has so far been described.
A total of 1,753 clones randomly selected from an adult female C. tarsalis salivary gland (SG) cDNA library were sequenced and used to assemble a database that yielded 809 clusters of related sequences, 675 of which were singletons. Primer extension experiments were performed on selected clones to further extend sequence coverage, allowing the identification of 283 protein sequences, 80 of which code for putative secreted proteins.
Comparison of the C. tarsalis sialotranscriptome with that of C. quinquefasciatus reveals accelerated evolution of salivary proteins compared with housekeeping proteins. The average amino acid identity among salivary proteins is 70.1%, while that for housekeeping proteins is 91.2% (P < 0.05), and the codon volatility of secreted proteins is significantly higher than that of housekeeping proteins. Several protein families previously found exclusively in mosquitoes, including some found only in the Aedes genus, have been identified in C. tarsalis. Interestingly, a protein family so far unique to C. quinquefasciatus, with 30 genes, is also found in C. tarsalis, indicating that it was not a specific acquisition of C. quinquefasciatus in its evolution to optimize mammalian blood feeding.
Most adult female mosquitoes are hematophagous, in addition to taking sugar meals. Saliva helps blood feeding by interfering with host reactions that could disrupt blood flow, assists sugar meals with glycosidases, and contains antimicrobials that may control microbial growth in the meals. With the advent of tissue transcriptomics, we can postulate that these functions are mediated by large numbers (70-100) of polypeptides, many of which are expressed solely in the adult female salivary glands. Unique protein families have been found in Anopheles, Aedes or Culex mosquitoes, as well as a group of common proteins or enzymes [3–5]. Functional characterization of these proteins has uncovered scavengers of biogenic amines [6, 7] or leukotrienes, inhibitors of blood clotting [9–11], bradykinin formation [12, 13] and platelet aggregation [14, 15], and vasodilators [16, 17]. Other, molecularly uncharacterized activities include inhibition of mast cell TNF production and of T cell activation [19, 20]. The complexity of the salivary components affecting host hemostasis and inflammation apparently mirrors the complexity of these host processes themselves, which must be disarmed for successful blood feeding. Indeed, mosquitoes whose salivation is abolished by salivary duct ablation feed less and spend a longer, more dangerous time exposed to their hosts [21, 22].
Perhaps because of the potent pharmacological activities of saliva, or the immune reactions it elicits, mosquito saliva also plays a role in pathogen transmission, including arboviral transmission [23–25]. Accordingly, determining the salivary composition of vector mosquitoes not only uncovers new, potentially pharmacologically active molecules, but can also help identify vaccine targets for disrupting arboviral transmission. These proteins may also be of epidemiological significance as selective markers of human or animal exposure to vectors [26–29].
We have previously described the sialotranscriptome of C. quinquefasciatus, in which several unique protein families were discovered, many of them products of gene duplications. Horizontal gene transfer from bacteria to mosquitoes has also been pointed out as contributing to the generation of mosquito sialomes [3, 4]. To date, no other sialotranscriptome from a member of the Culex genus has been described. Here we describe the sialotranscriptome of Culex tarsalis, a North American bird-feeding mosquito that, like C. pipiens, is an efficient vector of West Nile virus.
A total of 1,753 cDNA clones were used to assemble a database (Additional file 1, Table S1) that yielded 809 clusters of related sequences, 675 of which contained only one expressed sequence tag (EST). The 809 clusters were compared, using the programs BlastX, BlastN, or RPSBLAST, to the nonredundant protein database of the National Center for Biotechnology Information (NCBI), to a gene ontology database, to the conserved domains database of the NCBI, and to a custom subset of the NCBI nucleotide database containing either mitochondrial or rRNA sequences.
Classification and relative accumulation of Culex tarsalis salivary glands transcripts that are associated with housekeeping function
Columns: Number of transcripts; Percent of housekeeping group
Rows: Protein synthesis machinery; Transporters and storage; Protein modification machinery; Amino acid metabolism; Protein export machinery; Nucleic acid metabolism
Classification and relative accumulation of Culex tarsalis salivary glands transcripts that are associated with secreted products
Columns: Number of transcripts; Percent of secreted group; Function known or presumed? *
Ubiquitous protein families: Salivary immunity related products; Antigen 5 family
Unique hematophagous Diptera proteins: 30 kDa antigen/Aegyptin; 6.3 kDa family
Uniquely Culicine families: 30.5 kDa family; 23.4 kDa family; 41.9 kDa family; 62 kDa family; Fragment of culicine salivary protein
Uniquely Culex families: 16.7 kDa family; GQP repeat family; 9.7 kDa family; 4.2 kDa family; HHI repeat family; Cys rich family; 7.8 kDa family
From the sequenced cDNAs, a total of 283 novel C. tarsalis protein sequences were derived, 80 of which code for putative secreted products (Additional file 2, Table S2). Because cDNA sequences coding for many of these proteins were found as singletons, this secretome is to be considered preliminary and incomplete, but many parallels with previous mosquito sialotranscriptomes can be drawn, as follows:
The D7 proteins constitute a unique multigene family expressed in the salivary glands of blood-sucking Diptera, belonging to the superfamily of odorant binding proteins (OBP). Long and short versions exist, the long versions containing two OBP domains and the short versions one. In An. gambiae three genes code for long and five for short D7 proteins, while in Ae. aegypti two genes code for long and three for short D7 proteins. Short D7 proteins of anopheline mosquitoes were shown to bind biogenic amines such as serotonin and histamine, thus counteracting hemostasis and inflammation. More recently, the amino-terminal OBP domain of a long D7 protein of Ae. aegypti was shown to bind peptidic leukotrienes with high affinity. The crystal structures of a short D7 protein from An. gambiae and a long D7 protein from Ae. aegypti revealed that the D7 OBP domains have 7 alpha helices, 2 more than the canonical OBP family. In addition to these inflammatory agonist-binding functions, a short D7 protein from An. stephensi, named hamadarin, was shown to inhibit bradykinin formation by inhibiting the FXII/kallikrein pathway.
Ctar-34, Ctar-35, Ctar-37 and Ctar-38 possess the signature [ED]-[EQ]-x(7)-C-x(12,17)-W-x(2)-W-x(7,9)-[TS]-x-C-[YF]-x-[KR]-C-x(8,22)-Q-x(22,32)-C-x(2)-[VLI] found in the lipid-binding D7 domain of Ae. aegypti. The serotonin-binding motif found in short anopheline D7 proteins, as well as in the carboxy-terminal domain of the long D7 protein of Ae. aegypti [6, 7], was not found in any C. tarsalis D7 protein (nor in any D7 protein of C. quinquefasciatus; results not shown), indicating that this motif may have evolved beyond recognition, or that other proteins in Culex may undertake this task (see below under the CWRP heading).
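The D7 signature quoted above is written in PROSITE pattern syntax ("x" for any residue, brackets for allowed alternatives, parentheses for repeat counts). As a minimal sketch, assuming only Python's standard `re` module, such a pattern can be translated into a regular expression and scanned against candidate protein sequences. The helpers `prosite_to_regex` and `has_d7_signature` are hypothetical illustrations, not part of the original analysis, and the demo sequence is synthetic, built only to satisfy the motif.

```python
import re

# D7 signature as quoted in the text (PROSITE syntax).
D7_SIGNATURE = ("[ED]-[EQ]-x(7)-C-x(12,17)-W-x(2)-W-x(7,9)-[TS]-x-C-[YF]-x-[KR]-C-"
                "x(8,22)-Q-x(22,32)-C-x(2)-[VLI]")

# One PROSITE element: a residue, 'x', [alternatives] or {exclusions}, optionally (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern (no '<'/'>' anchors) into a Python regex string."""
    parts = []
    for element in pattern.replace(" ", "").rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            core = "."                              # any amino acid
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"       # any amino acid except those listed
        else:
            core = residue                          # single residue or [alternatives]
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

D7_REGEX = re.compile(prosite_to_regex(D7_SIGNATURE))

def has_d7_signature(protein: str) -> bool:
    """True if the protein sequence contains at least one match to the D7 signature."""
    return D7_REGEX.search(protein.upper()) is not None

# Synthetic sequence built to satisfy the motif (not a real D7 protein).
demo = ("EQ" + "A" * 7 + "C" + "A" * 12 + "W" + "AA" + "W" + "A" * 7
        + "TACYAKC" + "A" * 8 + "Q" + "A" * 22 + "CAAV")
print(has_d7_signature(demo))  # True
```

Dedicated tools such as ScanProsite handle the full pattern grammar; this sketch covers only the element types that appear in the D7 signature, plus exclusions.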
Ctar-195 and Ctar-525 represent 2 canonical OBP protein sequences whose transcripts were found expressed in the salivary glands of C. tarsalis. Homologs of these canonical OBPs were also previously found expressed in the C. quinquefasciatus and An. gambiae sialotranscriptomes. These proteins tend to be much more conserved in sequence relative to other mosquito, and even Drosophila, proteins, and they may play an endogenous or housekeeping function.
The 30 kDa antigen/Aegyptin protein family is typical of the salivary glands of adult female mosquitoes. It was first identified as a salivary antigen in Ae. aegypti, and later found in the salivary transcriptomes and proteomes of both culicine and anopheline mosquitoes [4, 5, 35–37, 43, 44], where it was named GE rich protein. Distantly related proteins also exist in black flies and sand flies. More recently, proteins of this family from Aedes and Anopheles were shown to prevent collagen-induced platelet aggregation [14, 45], indicating conservation of function since the split of the Culicidae into culicines and anophelines, approximately 150 MYA.
Serine- and threonine-rich proteins are a common finding in sialotranscriptomes. These proteins are generally modified post-translationally, and their mature forms carry N-acetylgalactosamine residues, typical of mucins. They probably lubricate the food canals and may also have an antimicrobial function. Ctar-246 and Ctar-429 encode related protein sequences that might reflect splice variants of the same gene. These 2 truncated protein sequences have over 60 predicted glycosylation sites and are similar to a previously described salivary mucin of C. quinquefasciatus. Ctar-261, which has 13 putative glycosylation sites, is related to Aedes and Culex mucins; the Aedes protein was associated with induction by viral infection, suggesting an immune function for this protein. Ctar-581 encodes a 5'-truncated protein sequence similar to salivary mucins of C. quinquefasciatus and Aedes albopictus.
Additional file 1, Table S1 indicates the presence of transcripts coding for enzymes possibly associated with the sugar and blood meals. Related to the sugar meal are several transcripts coding for glycosidases similar to proteins annotated as maltase and amylase. Additional file 2, Table S2 provides one full-length sequence of a salivary alpha-glucosidase similar to other culicine salivary maltases, plus 4 other truncated sequences coding for different sugar hydrolases. Regarding blood meal-related enzymes, 4 ESTs coding for fragments of adenosine deaminase were found, as well as ESTs coding for an endonuclease. In C. quinquefasciatus a salivary endonuclease has been previously characterized and associated with helping form the feeding hematoma. However, the C. tarsalis enzyme is most closely related to sand fly salivary endonucleases and only distantly related to the salivary endonuclease of C. quinquefasciatus; because it is closely related to a non-salivary C. quinquefasciatus enzyme, this C. tarsalis enzyme may play a housekeeping function.
Remarkably absent from the C. tarsalis sialotranscriptome are ESTs coding for members of the 5'-nucleotidase family, which functions as the salivary apyrase of mosquitoes. Apyrase hydrolyzes ATP and ADP to AMP and orthophosphate, destroying these important agonists of inflammation and platelet aggregation [2, 49]. It has been previously noticed that C. quinquefasciatus has little salivary apyrase activity compared with Aedes and Anopheles mosquitoes, and this observation was postulated to be a consequence of the lack of platelets in birds, the most common hosts of Culex, whereas Aedes and Anopheles feed mostly on mammals. This may explain the absence of 5'-nucleotidase/apyrase-coding transcripts in C. tarsalis, although an increased sequencing effort could reveal 5'-nucleotidases that might be secreted. To illustrate the abundance of this type of transcript in mosquito sialotranscriptomes: only one of 503 clones coded for apyrase in the C. quinquefasciatus sialotranscriptome, while 99 of 4,066 were found in a similarly made library from An. gambiae, and 66 of 4,232 ESTs were found in the Ae. aegypti sialotranscriptome. From these observed frequencies, the expected numbers of ESTs for C. tarsalis and C. quinquefasciatus are 27 and 8, respectively, producing a chi-square of 54.6 and P < 0.001, indicating low expression of apyrase transcripts in Culex.
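The expected counts quoted above can be reproduced with simple arithmetic. The text does not state which reference frequency was used, so this sketch assumes the Ae. aegypti frequency (66 of 4,232 ESTs), because it yields the stated expectations of about 27 and 8. The test behind the reported chi-square of 54.6 is not specified, so the sketch does not attempt to reproduce it. Instead, it adds a Poisson tail probability as a separate illustration of how unlikely the observed counts are under that assumed frequency. This is a swapped-in check, not the authors' test, and the function names are hypothetical.

```python
import math

# Assumption: expected counts use the Ae. aegypti apyrase frequency (66 of 4,232 ESTs),
# which reproduces the expectations of ~27 and ~8 quoted in the text.
REF_HITS, REF_TOTAL = 66, 4232

# Culex libraries from the text: (apyrase ESTs observed, total clones sequenced).
CULEX = {
    "C. tarsalis": (0, 1753),
    "C. quinquefasciatus": (1, 503),
}

def expected_count(total_clones: int, ref_hits: int = REF_HITS, ref_total: int = REF_TOTAL) -> float:
    """Apyrase ESTs expected in a library of total_clones at the reference frequency."""
    return total_clones * ref_hits / ref_total

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam); an illustrative check, not the authors' chi-square."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

for species, (observed, total) in CULEX.items():
    lam = expected_count(total)
    print(f"{species}: observed {observed}, expected {lam:.1f}, "
          f"P(X <= {observed}) = {poisson_cdf(observed, lam):.1e}")
```

Run as written, it reports expectations of about 27.3 (C. tarsalis) and 7.8 (C. quinquefasciatus), matching the rounded values in the text.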
Three ESTs coding for a phosphatase produce the predicted truncated sequence encoded by Ctar-194, which is similar to a salivary alkaline phosphatase previously described in Ae. aegypti. The function of this enzyme in feeding, if any, is unknown.
Transcripts coding for at least 6 different serine proteases were found in the C. tarsalis sialotranscriptome (Additional files 1 and 2, Tables S1 and S2). These enzymes may function in immunity as prophenoloxidase activators, in digesting skin matrix components (for example, as elastases), in hydrolysing host clotting proteins such as fibrinogen/fibrin, or in activating plasminogen.
Antimicrobial peptides, lysozyme, and pathogen pattern recognition polypeptides are commonly found in the sialotranscriptomes of blood-sucking arthropods. Additional file 2, Table S2 shows the full-length sequence of C. tarsalis salivary lysozyme, which is 91% identical to the C. quinquefasciatus homolog and 75% identical to the salivary homolog of Ae. albopictus. Truncated ORFs of a C-type lectin and a Gram-negative binding protein were also found; both match previously described salivary proteins of Aedes and Culex.
The antigen-5 (AG-5) family is a ubiquitous protein family found in animals and plants, and in all sialotranscriptomes of blood-sucking Diptera analyzed so far. The function of these proteins in mosquito saliva is not known, but in blood-sucking Brachycera two proteins of this family have been functionally characterized. Remarkably, in a tabanid fly, a member of the AG-5 family acquired a typical RGD domain flanked by Cys residues and acts as a major platelet aggregation inhibitor, and in the stable fly a salivary AG-5 protein binds immunoglobulins and may function as an inhibitor of the classical complement pathway. We present evidence, in the form of truncated transcripts, for the expression of at least two members of the family in C. tarsalis salivary glands: Ctar-151, assembled from 3 ESTs, matches with 94% identity the salivary secreted antigen-5 precursor AG5-3 from Culex quinquefasciatus, while Ctar-438 matches with 91% identity another C. quinquefasciatus protein of the same family.
Two transcripts were found in the C. tarsalis sialotranscriptome coding for basic (pI = 8.8) peptides with a mature MW of 9.7 kDa and 12 Cys residues. This peptide family was previously found in the salivary transcriptome of C. quinquefasciatus, but close relatives of the same size exist in Drosophila, Bombyx, Tribolium and Apis. The ubiquity of this protein family in insects, together with its size and pI, suggests an antimicrobial role.
The first member of the 41.0 kDa family was characterized in the sialotranscriptome of Ae. aegypti, and members were later found in C. quinquefasciatus and Ae. albopictus [4, 5, 36, 55]. Although the family is absent from the sialotranscriptomes of An. funestus, An. stephensi and An. gambiae, members of the anopheline Cellia subgenus (including a scan of the deduced proteome), it was recently found in An. darlingi, a member of the Nyssorhynchus subgenus, characterizing this family as unique to the Culicidae. Additional file 2, Table S2 provides evidence of a member of this protein family in C. tarsalis, encoded by Ctar-541, which produces a predicted protein with a mature MW of 43 kDa that is 69% identical to its C. quinquefasciatus homolog. As expected, the first blastp cycle of a Psiblast search of Ctar-541 against the NR protein database retrieves only mosquito salivary proteins. The second cycle retrieves, with lower significance (e value > 0.005), mostly bacterial proteins, but also salivary proteins from Simulium vittatum and Culicoides sonorensis. Further Psiblast iterations retrieve, with high significance, bacterial proteins of the methyl-accepting chemotaxis receptor (MCP) family, which may suggest that this family originated by horizontal transfer of a bacterial gene to an ancestral blood-feeding nematoceran. The gene structure of this protein is similar in C. quinquefasciatus and Ae. aegypti, with 2 exons separated by a short intron of ca. 60 nt. The inclusion of Simulium and Culicoides sequences in this family indicates an ancient origin, before the divergence of these nematoceran lineages.
Ctar-345 codes for a protein related to mosquito proteins so far only found expressed in mosquito adult salivary glands. Psiblast of Ctar-345 against the NR database does not retrieve any additional protein with an e value better than 0.02.
A member of this mosquito salivary protein family was first identified in An. stephensi, and later also found in An. gambiae, but not previously in other mosquito transcriptomes, including those of C. quinquefasciatus and Ae. aegypti. Ctar-769, represented by a single EST in our database, codes for a member of this family, as indicated by blastp comparisons to the NR protein database, where it retrieves only Culex and Anopheles proteins. Psiblast of Ctar-769 against the NR database retrieves no matches beyond additional mosquito proteins, including proteins deduced from the C. quinquefasciatus genome, with and without a signal peptide, and additional An. gambiae proteins lacking a signal peptide indicative of secretion. Interestingly, the An. gambiae gene expressed in the salivary glands was shown to consist of a single exon on chromosome arm 2R. This An. gambiae gene was also found to be selectively expressed in the adult female salivary glands, suggesting a role in blood feeding.
Ctar-146, Ctar-147 and Ctar-208 encode peptides with similarities to C. quinquefasciatus 4.2 kDa salivary peptide, which has similarity solely to another salivary peptide previously found in Ae. albopictus.
Ctar-520 and Ctar-29 encode truncated peptides matching the previously described putative 9.7 kDa salivary peptide of C. quinquefasciatus, which has no other matches to known proteins.
Ctar-31, Ctar-32 and Ctar-33 produce matches to a low-complexity protein previously described in the C. quinquefasciatus sialome, characterized by a polyhistidine repeat in the mature amino-terminal region and a series of GQP/GQG repeats. The polyhistidine domain, which also occurs in various antimicrobial peptides [59–61], may confer a bacteriostatic role on this protein if it chelates Zn ions, a bacterial growth factor.
Over 170 deduced protein sequences coding for putative housekeeping (H) products are presented in Additional file 2, Table S2. These proteins allow the evolutionary rate of the secreted (S) proteins to be compared with that of the H proteins, using the C. quinquefasciatus proteome as a reference set, as was done before for comparing An. stephensi salivary proteins with those of An. gambiae. For this comparison, we used only C. tarsalis protein sequences that had at least 100 AA of blastp alignment to a C. quinquefasciatus protein, and we excluded possible alleles or closely related gene duplicates by removing the smaller sequence(s) of any pair with 80% or more similarity within the set. We thus compared 50 putative secreted C. tarsalis proteins with their C. quinquefasciatus homologs, obtaining an average protein identity of 70.1%, while 169 putative housekeeping proteins from C. tarsalis were on average 91.2% identical to C. quinquefasciatus predicted proteins (see Additional file 2, Table S2 worksheets; P < 0.001, Mann-Whitney rank sum test). This significant difference further supports the concept that mosquito salivary secreted proteins evolve at a faster pace than housekeeping proteins, as indicated before for anopheline proteins [35, 37, 56].
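The filtering and comparison steps described above can be sketched as follows, under stated assumptions. Each C. tarsalis protein is represented by a hypothetical `Hit` record holding its length, its blastp alignment length and its percent identity to the best C. quinquefasciatus match. Pairwise similarities within the set are assumed to be supplied by the caller. The Mann-Whitney U test uses a normal approximation without tie or continuity correction. All names and toy values here are invented for illustration and are not the paper's data or code.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from statistics import mean

@dataclass
class Hit:
    seq_id: str        # hypothetical C. tarsalis protein identifier
    length: int        # length of the C. tarsalis protein sequence (AA)
    aligned_aa: int    # blastp alignment length to its best C. quinquefasciatus match
    identity: float    # percent identity over that alignment

def filter_hits(hits, pair_similarity, min_aligned=100, max_similarity=80.0):
    """Keep hits aligned over >= min_aligned AA, then drop the shorter member of any pair
    sharing >= max_similarity % similarity (possible alleles or recent duplicates)."""
    kept = {h.seq_id: h for h in hits if h.aligned_aa >= min_aligned}
    for a, b in combinations(sorted(kept), 2):
        if a in kept and b in kept and pair_similarity.get(frozenset((a, b)), 0.0) >= max_similarity:
            del kept[a if kept[a].length <= kept[b].length else b]
    return list(kept.values())

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney(x, y):
    """U statistic for sample x and a two-sided p-value from the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, math.erfc(abs(z) / math.sqrt(2))

# Toy, invented values (not the paper's data) to show the workflow.
hits = [Hit("S1", 300, 250, 68.0), Hit("S2", 200, 180, 71.0), Hit("S3", 120, 60, 55.0)]
kept = filter_hits(hits, {frozenset(("S1", "S2")): 85.0})
print([h.seq_id for h in kept])  # S3 fails the alignment filter; S2 is the shorter of a similar pair
secreted_identity = [70, 65, 72, 68, 75, 60, 71, 69]
housekeeping_identity = [91, 90, 93, 92, 89, 94, 88, 95]
u, p = mann_whitney(secreted_identity, housekeeping_identity)
print(mean(secreted_identity), mean(housekeeping_identity), u, p)
```

With real data, a library routine such as SciPy's `mannwhitneyu` would normally replace the hand-written test. The explicit version here only shows what the rank-sum comparison computes.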
It has been suggested that codon volatility (the proportion of the point-mutation neighbors of a codon that encode different amino acids) could be a measure of selection for fast protein evolution, as could occur in pathogens under constant pressure to avoid antibody recognition. Although this intuitive idea has met strong opposition [63–65], it is supported by published models [66, 67]. We accordingly measured the average codon volatility for the 205 sequences coding for housekeeping proteins and the 80 sequences coding for putative secreted proteins shown in Additional file 2, Table S2. The average codon volatility for the H class genes was 0.7609 ± 0.001, while the S class had an average volatility of 0.7746 ± 0.002 (average ± SE), a highly significant difference (P = 6.9 × 10⁻⁸, two-tailed t test). Whatever the debate regarding the value of this index, it indicates that a single point mutation in an S class gene has a significantly higher chance of producing a non-synonymous amino acid substitution than in an H class gene.
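The volatility definition above can be made concrete with a short sketch. It assumes the standard genetic code. Following the convention commonly used when volatility was proposed, point-mutation neighbors that are stop codons are ignored. The gene-level value is taken as the mean over sense codons. These choices and the function names are assumptions for illustration and may differ in detail from the procedure the authors cite.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def codon_volatility(codon: str) -> float:
    """Fraction of non-stop single-point-mutation neighbors that encode a different amino acid."""
    aa = CODE[codon]
    neighbors = [
        codon[:pos] + base + codon[pos + 1:]
        for pos in range(3)
        for base in BASES
        if base != codon[pos]
    ]
    sense = [n for n in neighbors if CODE[n] != "*"]  # assumption: stop neighbors are ignored
    return sum(CODE[n] != aa for n in sense) / len(sense)

def gene_volatility(cds: str) -> float:
    """Average codon volatility over the sense codons of an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    sense = [c for c in codons if c in CODE and CODE[c] != "*"]
    return sum(codon_volatility(c) for c in sense) / len(sense)

print(codon_volatility("TGG"), codon_volatility("CTG"), gene_volatility("ATGCTGTAA"))
```

For example, every sense neighbor of the Trp codon TGG encodes a different amino acid (volatility 1), whereas 5 of the 9 neighbors of the Leu codon CTG do (volatility 5/9). Genes rich in codons like TGG therefore have a higher chance that a random point mutation is non-synonymous.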
From a conservative perspective, the sialotranscriptome of C. tarsalis confirms the presence of ubiquitous mosquito salivary protein families, such as the D7, 41 kDa, hyp37, 30 kDa antigen/Aegyptin, 23.4 kDa, mucin, cysteine-rich peptide, antigen 5 and amylase/maltase families, and the immunity-related proteins lysozyme, pathogen recognition molecules, and serine proteases possibly associated with immunity or with host matrix/fibrin degradation. Adenosine deaminase has also been found, but not the 5'-nucleotidase/apyrase that is abundant in mammal-feeding mosquitoes and possibly absent or reduced in the bird-feeding C. tarsalis. From another standpoint, the C. tarsalis sialotranscriptome confirmed the presence of proteins so far known exclusively in culicine mosquitoes, such as endonucleases and the 62 kDa, 30.5 kDa, 7.8 kDa and 4.2 kDa families, all without a known function. Additionally, further confirmation of proteins unique to the Culex genus was found for members of the highly expanded CWRP, 9.7 kDa and GQP repeat families.
It has been previously indicated that many unique mosquito protein families appear to be related to bacterial proteins (when using the Psiblast tool against the NR database), suggesting their acquisition by horizontal transfer. Interestingly, the genes coding for such proteins, now available for An. gambiae, C. quinquefasciatus and Ae. aegypti, are often single-exonic, as is the case for bacterial genes. Among the C. tarsalis families, this appears to apply to the 41 kDa family (whose genes have a single short intron) and the CWRP family (mostly uniexonic in C. quinquefasciatus). However, single-exonic genes can also arise by gene duplication through retrotransposition of an mRNA derived from an endogenous, multi-exonic gene. At any rate, the relatively unusual processes of horizontal transfer and/or retrotransposition appear to contribute at a high rate to the acquisition of new genes associated with blood feeding in the formation of mosquito sialomes.
Since our transcriptome was based on ~2,000 ESTs from a non-normalized library, the question arises as to what extent novel putative secreted proteins could be discovered with more extensive sequencing, or with a normalized library. As indicated in a recent review, our experience has been that 1,000-2,000 sequenced clones are enough to reveal the majority of the sialome, perhaps because of the relatively low complexity of the salivary gland proteome (compared with organs such as the mammalian liver or brain). Indeed, ~2,000 clones sequenced from an Ae. aegypti salivary gland cDNA library revealed virtually all the proteins found in another effort that obtained ~20,000 sequences. A similar situation was encountered with the An. gambiae sialome, where ~4,000 sequences identified essentially all those found in a larger sequencing effort, also of ~20,000 sequences, by the Pasteur Institute. Sequencing of normalized libraries, or more intensive sequencing of existing libraries, perhaps with newer pyrosequencing methodology, may increase the number of salivary secreted peptides, but possibly by no more than 10% of the number found with random sequencing of ~2,000 clones. It should be noted, however, that these additional low-abundance transcripts may include pharmacologically potent peptides.
Finally, the fast divergence of salivary proteins allows the possibility of using C. tarsalis proteins as specific markers of vector exposure, as is being attempted for ticks [70–72], mosquitoes [26–29] and sand flies .
Culex tarsalis, strain Bakersfield (Bakersfield, California), was supplied by Dr. W.K. Reisen, University of California-Davis. Colonized mosquitoes were maintained on mouse blood (for egg laying) and given 10% sucrose ad libitum. Larvae were reared and adults maintained under controlled conditions of temperature (27°C), humidity (70% RH), and light (16:8 L:D diurnal cycle). PolyA+ RNA was extracted from 80 dissected pairs of salivary glands using the Micro-FastTrack mRNA isolation kit (Invitrogen, Carlsbad, CA) and then used to make a non-normalized PCR-based cDNA library with the SMART™ cDNA library construction kit (BD Biosciences-Clontech, Palo Alto, CA), as described before.
The salivary gland cDNA library was plated on LB/MgSO4 plates containing X-gal/IPTG to an average of 250 plaques per 150 mm Petri plate. Recombinant (white) plaques were randomly selected and transferred to 96-well MICROTEST™ U-bottom plates (BD BioSciences, Franklin Lakes, NJ) containing 100 μl of SM buffer (0.1 M NaCl; 0.01 M MgSO4·7H2O; 0.035 M Tris-HCl (pH 7.5); 0.01% gelatin) per well. The plates were covered and placed on a gyrating shaker for 30 min at room temperature. The phage suspension was either immediately used for PCR or stored at 4°C for future use.
To amplify the cDNA by PCR, 4 μl of the phage sample was used as a template. The primers were sequences from the λ TriplEx2 vector, named pTEx2 5seq (5'-TCC GAG ATC TGG ACG AGC-3') and pTEx2 3LD (5'-ATA CGA CTC ACT ATA GGG CGA ATT GGC-3'), positioned at the 5' end and the 3' end of the cDNA insert, respectively. The reaction was carried out in 96-well flexible PCR plates (Fisher Scientific, Pittsburgh, PA) using TaKaRa EX Taq polymerase (TAKARA Mirus Bio, Madison, WI) on a GeneAmp® PCR system 9700 (Perkin Elmer Corp., Foster City, CA). The PCR conditions were: one hold of 95°C for 3 min; 25 cycles of 95°C for 1 min, 61°C for 30 sec; 72°C for 5 min. The amplified products were analysed on a 1.5% agarose/EtBr gel. PCR-amplified cDNA library clones showing a single band were selected for sequencing. Approximately 200-250 ng of each PCR product was transferred to Thermo Fast 96-well PCR plates (ABgene Corp., Epsom, Surrey, UK) and frozen at -20°C before cycle sequencing. Samples were shipped on dry ice to the Rocky Mountain Laboratories Genomics Unit with primer and template combined in an ABI 96-well Optical Reaction Plate (P/N 4306737) at the manufacturer's recommended concentrations. Sequencing reactions were set up as recommended for the Applied Biosystems BigDye® Terminator v3.1 Cycle Sequencing Kit by adding 1 μl ABI BigDye® Terminator Ready Reaction Mix v3.1 (P/N 4336921), 3 μl 5× ABI Sequencing Buffer (P/N 4336699), and 2 μl of water for a final volume of 10 μl. Cycle sequencing was performed at 96°C for 10 seconds, 50°C for 5 seconds and 60°C for 4 minutes for 27 cycles on either a Bio-Rad Tetrad 2 (Bio-Rad Laboratories, Hercules, CA) or ABI 9700 (Applied Biosystems, Inc., Foster City, CA) thermal cycler.
Fluorescently-labeled extension products were purified following Applied Biosystems BigDye® XTerminator™ Purification protocol and subsequently processed on an ABI 3730xL DNA Analyzer (Applied Biosystems, Inc., Foster City, CA). The AB1 file generated for each sample from the 3730xL DNA Analyzer was provided to researchers in Rockville, MD through a secure network drive for all subsequent downstream sequencing analysis. In addition to the sequencing of the cDNA clones, primer extension experiments were performed in selected clones to further extend sequence coverage.
Expressed sequence tags (ESTs) were trimmed of primer and vector sequences. The BLAST tool, the CAP3 assembler and ClustalW software were used to compare, assemble, and align sequences, respectively. For EST assembly, a pipeline combining blast and the CAP3 assembler was used: all sequences were first compared using blastn with a word size of 36, and the results were fed as a fasta file to the CAP3 assembler. The CAP3 parameters were set at default values, with an overlap length cutoff of 40 and an overlap percent identity of 80%. Phylogenetic analysis and statistical neighbor-joining (NJ) bootstrap tests of the phylogenies were done with the Mega package. For functional annotation of the transcripts we used BlastX to compare the nucleotide sequences to the non-redundant (NR) protein database of the National Center for Biotechnology Information (NCBI, National Library of Medicine, NIH) and to the Gene Ontology (GO) database. The reverse position-specific BLAST (RPSBLAST) tool was used to search for conserved protein domains in the Pfam, SMART, Kog, and conserved domains (CDD) databases. We also compared the transcripts with subsets of mitochondrial and rRNA nucleotide sequences downloaded from NCBI and with several organism proteomes downloaded from NCBI, ENSEMBL, or VectorBase. Segments of the three-frame translations of the ESTs (six-frame translations were not used because the libraries were unidirectional), starting with a methionine found in the first 300 predicted amino acids (AA), or the predicted protein translation in the case of complete coding sequences, were submitted to the SignalP server to help identify translation products that could be secreted. O-glycosylation sites on the proteins were predicted with the program NetOGlyc. Functional annotation of the transcripts was based on all the comparisons above.
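The selection of EST translation segments for SignalP can be sketched as follows. This is one reading of the procedure described above. Only the three forward frames are used, because the libraries were unidirectional. In each frame, the first Met within the first 300 predicted residues is taken and extended to the next stop codon or the end of the EST. Whether additional downstream Met starts were also submitted is not stated. The function names are hypothetical, and the SignalP submission itself is not shown.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(nt: str) -> str:
    """Translate a nucleotide string codon by codon; ambiguous codons become 'X'."""
    return "".join(CODE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))

def met_segments(est: str, max_start: int = 300) -> dict[int, str]:
    """For each forward frame, return the peptide from the first Met found within the first
    max_start residues up to the next stop codon (or the end of the EST)."""
    est = est.upper().replace("U", "T")
    segments = {}
    for frame in range(3):  # forward frames only: the library is unidirectional
        protein = translate(est[frame:])
        start = protein.find("M", 0, max_start)
        if start == -1:
            continue
        stop = protein.find("*", start)
        segments[frame] = protein[start:] if stop == -1 else protein[start:stop]
    return segments

print(met_segments("CCCATGGCCTAAGG"))
```

Each returned segment would then be a candidate for signal peptide prediction. For complete coding sequences, the text indicates that the predicted protein was submitted instead.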
Following inspection of all these results, transcripts were classified as either Secretory (S), Housekeeping (H) or of Unknown (U) function, with further subdivisions based on function and/or protein families. Codon volatility was calculated as previously described .
AG-5: antigen 5 family
EST: expressed sequence tag
OBP: odorant binding protein
SMART: switching mechanism at 5' end of RNA transcript
This work was supported by the Intramural Research Program of the Division of Intramural Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, and NIH R01 AI46435 grant to KEO. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organization imply endorsement by the government of the United States of America. We are grateful to NIAID intramural editor Brenda Rae Marshall for assistance.
Because E.C., A.J.V, K.D.B, V.M.P, and J.M.C.R. are government employees and this is a government work, the work is in the public domain in the United States. Notwithstanding any other agreements, the NIH reserves the right to provide the work to PubMedCentral for display and use by the public, and PubMedCentral may tag or modify the work consistent with its customary practices. You can establish rights outside of the U.S. subject to a government use license.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.