Comparative profiling of the transcriptional response to infection in two species of Drosophila by short-read cDNA sequencing
© Sackton and Clark; licensee BioMed Central Ltd. 2009
Received: 10 December 2008
Accepted: 07 June 2009
Published: 07 June 2009
Homology-based comparisons of the genes involved in innate immunity across many insect taxa with fully sequenced genomes has revealed a striking pattern of gene gain and loss, particularly among genes that encode proteins involved in clearing pathogens (effectors). However, limited functional annotation in non-model systems has hindered understanding of evolutionary novelties in the insect innate immune system.
We use short read sequencing technology (Illumina/Solexa) to compare the transcriptional response to infection between the well studied model system Drosophila melanogaster and the distantly related drosophilid D. virilis. We first demonstrate that Illumina/Solexa sequencing of cDNA from infected and uninfected D. melanogaster recapitulates previously published microarray studies of the transcriptional response to infection in this species, validating our approach. We then show that patterns of transcription of homologous genes differ considerably between D. melanogaster and D. virilis, and identify potential candidates for novel components of the D. virilis immune system based on transcriptional data. Finally, we use a proteomic approach to characterize the protein constituents of the D. virilis hemolymph and validate our transcriptional data.
These results suggest that the acquisition of novel components of the immune system, and particularly novel effector proteins, may be a common evolutionary phenomenon.
Host-pathogen interactions are ubiquitous in nature, leading to coevolutionary dynamics that are predicted to drive rapid evolution of the immune system. It is now increasingly clear that this coevolutionary "arms race" leads to increased rates of protein evolution in genes encoding components of the immune system across a large number of taxa [1–7]. Recent work in mosquitoes [8, 9] and fruit flies  has suggested that the immune system may also be unusual in the rate at which new genes are recruited into the system, and existing components of the system turn over by gene duplication or loss. Genes encoding effector proteins (proteins involved in bacterial killing and clearance), and particularly antimicrobial peptides (AMPs), often have lineage-restricted patterns of homology and show very rapid rates of gene turnover within gene families . Most strikingly, two multigene families – the Drosomycin antimicrobial peptide family  and the Turandot family [11, 12] of induced but otherwise uncharacterized proteins – appear to be evolutionary novelties restricted to D. melanogaster and related species in the Sophophora subgenus of drosophilids . This pattern is in contrast to genes encoding components of immune-related signaling pathways, which are typically found as single copy orthologs even among distantly related insects [8, 9, 13], and have identifiable homologs in mammals . Together, these observations suggest that disruption of stoichiometry, dosage, and other conserved interactions among signaling pathways is usually deleterious, leading to very low tolerance of gene copy variation among signaling pathways and preservation of single-copy orthologs across deep evolutionary time. Conversely, these observations suggest that pathway outputs retain flexibility, allowing novel effectors to be easily recruited into the system, potentially leading to rapid, and perhaps advantageous, proliferation of effector components across evolutionary time.
If this model is correct, effector proteins should be recruited and lost from the immune system at a relatively high frequency, which implies that novel components of the immune system remain to be discovered in species of Drosophila distantly related to D. melanogaster. It has long been recognized that highly divergent insect clades often harbour unique antimicrobial peptides: gambicin in mosquitoes , lebocin in Bombyx , thanatin from the bug Podisus maculiventris , and many others [reviewed in ]. However, the evolutionary dynamics of the acquisition of novel effector components in the innate immune system have not been considered previously. While the genome sequences of twelve Drosophila species  have allowed the discovery of many novel genes across this phylogeny, functional annotation still depends largely on homology to known D. melanogaster genes. Thus, our knowledge of the gain and loss of immune function genes has a strong ascertainment bias, and is restricted to genes that are paralogous to known genes in D. melanogaster.
In this study, we used short-read sequencing technology to characterize the transcriptional response to infection in D. virilis, a member of the Drosophila subgenus that last shared a common ancestor with D. melanogaster and the rest of the Sophophora subgenus 40 million years ago . Short-read sequencing technology allows identification of differentially expressed genes without the need for prior annotations, and thus is ideal for detecting induced components of the D. virilis immune system that lack homologs in D. melanogaster. Here, we demonstrate that short-read sequencing of oligo(dT)-primed double-stranded cDNA provides a robust and accurate method to identify differentially transcribed regions of the genome. We then use this approach to sequence cDNA from infected and uninfected samples of D. virilis to characterize the genes that are induced by infection, and use that sequencing data to annotate novel components of the D. virilis immune system. Furthermore, using a LC-MS/MS proteomic approach, we identify immune-regulated protein constituents of the D. virilis hemolymph that validate novel transcripts that had been identified by the short-read cDNA sequencing as being responsive to infection in D. virilis.
Results and discussion
Aligning sequencing reads to the reference genome and identified genes regulated by infection
Induced and strongly induced HMM classes
Number of expressed regions assigned to each induction class
Validation of cDNA sequencing by comparison to published microarray data
Additionally, we can quantitatively compare induction between our study and the previously published microarray where the infection protocol and sampling time post-infection where most similar to our study (; in which D. melanogaster was infected by septic injury with a mixed bacterial culture and assayed at 12 hours post infection). Despite differences in the line (Oregon R vs. iso-1) and sex (male vs. female) of the flies, and the species and pathogenicity of bacteria used (non-pathogenic E. coli and M. luteus mixture vs. pathogenic S. marcescens and E. faecalis mixture), we still find a significant correlation between induction measured by microarray in DeGregorio et al.  and induction measured by our method (r = 0.3225, P < 2.2 × 10-16). However, this correlation is much weaker when limited to genes that are weakly induced (2-fold or less induction in both our data and the microarray data; r = 0.0429, P = 0.0388), and much stronger when limited to genes that are strongly induced (greater than 2-fold induction in both datasets; r = 0.5053, P = 0.002). While these results demonstrate considerable consistency between induction measured by short-read sequencing of oligo(dT) primed, double-stranded cDNA and induction measured by traditional microarray methods, they also suggest that higher depth of coverage may be needed to accurately assay weakly induced genes. However, at least for identifying strongly induced genes, short read sequencing approaches appear to be robust and accurate, suggesting that this approach may prove to be a simple and cost-effective way to identify differentially regulated genes in poorly annotated genomes in response to any number of treatments of biological interest.
The transcriptional response to infection in D. virilis and D. melanogaster
We identified 841 expressed regions that appear to be induced by infection in D. virilis. Because of the 3' bias in our cDNA preparation, the relatively low coverage of our sequencing, and the lack of annotation of 3' UTR sequence in the D. virilis genome, only about 5% of these induced regions overlap with an annotated exon. To attempt to associate a greater percentage of induced regions with genes, we analyzed the genomic region in more detail for these 841 regions, and preliminarily assigned expressed regions to annotated gene models if they were fewer than 500 bp from the 3' end of the nearest gene model, and more than 1 kb from the 3' end of all other gene models (see Methods for details). We eliminated from further analysis induced regions (but not strongly induced regions) located on minor scaffolds (< 1 Mbp), leaving a total of 199 candidate induced regions in D. virilis, 101 of which can be preliminarily associated with 95 D. virilis genes.
In order to understand the similarities between the D. melanogaster and D. virilis immune responses, we focused on the 490 genes in D. melanogaster (Additional file 3) and the 95 genes in D. virilis (Additional file 4) associated with induced regions. We used three approaches to identify orthologs and paralogs of these gene models. First, for any gene model included in the manual homology annotation of immune system genes , we used the homology and orthology assignments from that work. For the remaining genes, we used homologs annotated by FlyBase. If FlyBase reported no homolog, we verified the absence of homologs by reference to the homology assignments generated by the Drosophila 12 Genomes Consortium .
Of the 490 induced genes in D. melanogaster (Additional file 3), 19 have no identifiable homologs in D. virilis, 444 have homologs in D. virilis, and 27 have ambiguous homology. Genes associated with expressed regions assigned to state 1 (highly induced) are significantly more likely to lack homologs in D. virilis that genes associated with expressed regions assigned to state 2 (Odds ratio = 5.22, Fisher's Exact Test P-value = 0. 001). As highly induced genes are much more likely to represent effectors, this result is expected based on the analysis of Sackton et al. , which showed that effector proteins are much more likely to have lineage restricted patterns of homology than immune system genes as a whole.
This same pattern seems to hold in D. virilis, suggesting that the high rate of turnover in effector proteins may be quite general. In D. virilis, the 95 gene models associated with induced regions (Additional file 4) include 8 with no identifiable homologs in D. melanogaster and 87 with homologs in D. melanogaster. Like in D. melanogaster, genes associated with expressed regions assigned to state 1 are more likely to lack homologs that those associated with expressed regions assigned to state 2, although this pattern is not significant (Odds ratio = 3.08, Fisher's Exact Test P-value = 0.15).
In both species, gene models that are highly induced (state 1) encode significantly shorter predicted polypeptides than induced (state 2) gene models (Mann-Whitney U; D. melanogaster P = 1.7 × 10-9, D. virilis P = 2.3 × 10-4). Furthermore, in both species gene models that lack homologs in the other species are significantly shorter than those gene models that have homologs (Mann-Whitney U; D. melanogaster P = 7.4 × 10-7, D. virilis P = 0.006). Because it can be difficult to infer homology patterns between highly divergent short peptides, we cannot rule out the possibility that the deficiency of gene models with detectable homologs among the highly induced class reflects high rates of divergence rather than lack of homologs, although the observation that immune effector proteins tend to be highly conserved at the protein sequence level among Drosophilids  suggests against this interpretation.
The putatively induced genes that have identifiable homologs between species reveal broad similarities in the transcriptional response to infection between D. melanogaster and D. virilis. As expected based on our comparison of the D. melanogaster induced genes to previous microarray studies, most of the highly induced genes are antimicrobial peptides, Turandots, and other immune-induced peptides such as the IMs [Drosophila immune molecule, ]. We also see other immune genes such as PGRP-SB1, Transferrin 1, and TepII strongly induced after infection in D. melanogaster. In D. virilis, homologs of many effectors or putative effectors are identified as induced in our sample. The genes associated with expressed regions assigned to state 1 in D. virilis encode Attacins, Cecropins, Metchnikowin, Diptericins, PGRP-SB1, and a protein with homology to IM1. We also find evidence for induction of homologs of two additional peptidoglycan recognition proteins (PGRP-LB and PGRP-LC) and the D. virilis homologs of Relish, MP1, and necrotic (Additional file 4). Given the limitations of our sequencing strategy in detecting weakly induced genes, we have almost certainly not detected all induced signalling and recognition genes in D. virilis. Thus, in the next section, we focus on the most strongly induced category of genes in both species, the AMPs.
Differences in the members of AMP families induced after infection
Despite the overall similarity of the transcriptional response to infection in D. virilis and D. melanogaster, notable differences exist in the pattern of induction of members of AMP gene families. The D. melanogaster genome encodes 20 antimicrobial peptides that are members of seven gene families. These peptides can be broadly grouped into three categories: cysteine-rich peptides characterized by pairs of disulfide bonds (Defensin: Def; and Drosomycins: Drs, Drs-l, Dro2, Dro3, Dro4, Dro5, Dro6), peptides with an amphiphilic α-helical conformation (Cecropins: CecA1, CecA2, CecB, CecC), and proline or glycine rich peptides (Attacins: AttA, AttB, AttC, AttD; Drosocin: Dro; Metchnikowin: Mtk, and Diptericins: Dpt and DptB). Five of these families – Diptericins, Cecropins, Attacins, Metchnikowin, and Defensin – have homologs in D. virilis, encoding total of 15 known antimicrobial peptides; Drosocin and Drosomycins are absent from the entire Drosophila subgenus .
Induction of antimicrobial peptides in D. melanogaster.
D. melanogaster gene
Average Coverage (Naïve)
Average Coverage (Infected)
Induction of antimicrobial peptides in D. virilis.
D. virilis gene
D. melanogaster homolog
Average Coverage (Naïve)
Average Coverage (Infected)
On a broader scale, in both species the proline- and glycine- rich peptides represent most of the total AMP transcription (D. virilis: 93.6%, D. melanogaster: 81.9% of the total coverage across all AMPs, normalized for length). Again, though, D. melanogaster appears to transcribe a broader spectrum of antimicrobial peptides in response to infection, with a substantial fraction of the total transcription of AMPs in D. melanogaster associated with cysteine-rich (Drs/Def; 13.5%) AMPs. This analysis of course excludes any uncharacterized AMPs in D. virilis. However, as discussed below, the most promising candidates for novel D. virilis AMPs appear to be in the glycine- and proline- rich family, suggesting that D. virilis may in fact produce a narrower range of AMP types in response to our artificial infection protocol. In is possible that this observation could be attributable to differences in the ability of the bacterial species used to replicate efficiently between D. virilis and D. melanogaster, resulting in differences in the nature of the infection experienced by the two species of Drosophila. Alternatively, D. virilis and D. melanogaster differ at a number of ecological traits, any of which could potentially lead to different selective pressures for the diversity of AMPs produced: D. melanogaster is tropical, D. virilis is Holarctic; D. melanogaster breeds on a wide range of substrates, typically rotting fruit, D. virilis breeds on sap fluxes . While it is tempting to speculate, a fuller understanding of the diversity of D. virilis AMPs and the persistence of differences in transcription in response to multiple challenges will be needed before the hypothesis that D. virilis responds to infection with a narrower and less diverse range of AMPs can be established.
Novel components of the D. virilis immune system
Genes associated with induced regions in D. virilis that lack homologs in D. melanogaster.
D. virilis gene model
Gene models lacking a signal peptide
Two putatively induced D. virilis gene models lack a signal peptide. One, dvir_GLEANR_13841 is a short protein (155 aa) that is moderately induced (corrected induction 2.04, assigned to state 2). The second, dvir_GLEANR_15023, is somewhat longer, 280 amino acids, but is highly repetitive, consisting of 20 repeats of a 12–17 amino acid motif. The repetitive nature of this predicted gene makes identifying putative homologs difficult; we fail to detect any via BLAST, and no homologs in any species are reported in Drosophila 12 Genomes Consortium . This gene model is flagged as potentially representing a repeat-contaminated gene model, suggesting that this result may be artefactual.
Secreted, IM-like peptides
Two gene models in D. virilis that are putatively associated with an induced region, dvir_GLEANR_345 and dvir_GLEANR_5361, have strong evidence for a signal peptide, and are short (fewer than 50 amino acids), suggesting the possibility that these are novel effector proteins. However, they are unlikely to be antimicrobial peptides. Almost all antimicrobial peptides have a net positive charge; the predicted proteins encoded by these two GLEANR models both have a negative net charge. The proteins with the closest homology are the IM proteins of D. melanogaster. These are a family of short, strongly induced peptides of unknown function.
Putative novel AMPs
The remaining three GLEANR models, plus the two non-GLEANR gene models (which are paralogs of each other) that appear to be strongly induced by infection, are all secreted peptides with predicted molecular weights between 4 and 6 kD and predicted positive charge at physiological pHs (Table 5). All three have homologs in D. mojavensis and D. grimshawi, the two species with sequenced genomes that are most closely related to D. virilis, but not in any other species of Drosophila with sequenced genomes. In addition dvir_GLEANR_3774 and the non-GLEANR gene model are both ~18% proline, which combined with their patterns of induction, size, and charge suggests that they might be similar in function to the proline-rich family of AMPs, including abaecins and apidaecins from bees (~30% proline), as well as Mtk and Dro from Drosophila (~25% proline).
Protein constituents of the D. virilis hemolymph
In order to further characterize the D. virilis immune response, we used a proteomic approach (iTRAQ/tandem MS) to analyze the protein constituents of D. virilis hemolymph, in both naïve (uninfected) and immune-challenged flies. Previous work in D. melanogaster has focused on identifying proteins that increase in concentration in the hemolymph after immune challenge [27, 29–32], characterizing the proteins involved in clotting [33, 34], and mapping the proteins present in larval hemolymph in an unchallenged state [35, 36]. This wealth of information about the protein constituents of hemolymph in D. melanogaster thus provides a comparative context for proteomic studies of other species of Drosophila.
In order to characterize the protein content of the D. virilis hemolymph, we used two methods. First, we assessed the relative concentration of each identified protein using the emPAI statistic implemented in MASCOT . This statistic is calculated based on the fraction of potentially observable peptides that are in fact observed in the sample for any given protein. Second, we used the iTRAQ chemistry (Applied Biosystems), which uses tags to mark each sample (in this case, infected and uninfected hemolymph), to estimate the relative abundance of each identified protein in the infected and uninfected samples using the software ProteinPilot (Applied Biosystems).
In this study, we used short read cDNA sequencing to characterize the transcriptional response to infection in D. virilis and D. melanogaster. We show that even a relatively small number of sequencing reads (1 lane per sample, about 5 million reads before filtering and about 1.2 million mapped reads) can produce reliable estimates of induction, at least for strongly induced genes, consistent with recent results suggesting high technical repeatability of this approach . By comparing the relative induction of AMP gene families in D. melanogaster and D. virilis, we show that significant differences in the relative induction of different peptides exist between species. The striking increase in induction of Dpt homologs in D. virilis relative to D. melanogaster is supported by the observation that Dpt homologs represent 11.5% of the protein in D. virilis hemolymph from infected flies, far higher than any other AMP. Finally, we show that some predicted D. virilis genes that lack homologs to D. melanogaster share characteristics with the proline-rich AMP superfamily, and the protein encoded by at least one of these genes is detectable in the hemolymph of infected flies, suggesting that D. virilis likely possesses lineage-restricted immune system components, and that the pattern we observe in D. melanogaster is general. Taken together, these results suggest that novel downstream components of the immune system can be rapidly integrated of relatively short time scales. The adaptive potential of gene gain and loss should not be overlooked in the evolutionary dynamics of host immune systems.
Biological samples for transcriptional analysis
The Drosophila stocks used in this experiment were the sequenced strains of D. melanogaster (iso-1) and D. virilis (15010-1051.87). Flies were maintained in bottle cultures on a rich dextrose medium at 25° and in 12 hr:12 hr light/dark for the duration of the experiment. We infected 50 females of each species with a mixed bacterial culture of Serratia marcescens and Enterococcos faecalis by pricking the thorax with a 0.1-mm dissecting pin (Fine Science Tools, Foster City, CA) dipped in bacterial culture, as previously described . We chose to use a mixed culture, consisting of one Gram-negative bacteria and one Gram-positive bacteria, in order to maximize our ability to detect a diversity of immune-inducible proteins. At 12 hours after infection, infected flies and a sample of 50 naïve flies were frozen in liquid nitrogen. We extracted total RNA from frozen flies using standard protocols (Trizol). After extraction, total RNA was treated with DNase to remove potential genomic DNA contamination, according to the manufacturer's protocols. We synthesized first strand cDNA using oligio-d(T) primers, and then synthesized second strand cDNA, according to standard protocols. Sequencing was done by the Cornell University Life Sciences Core Laboratories Center on an Illumina GA2 sequencer.
Aligning reads to the reference genome
Prior to mapping reads to the reference genome, we filtered low quality, low complexity, and repetitive reads. We first removed any read having fewer than 24 bases with a Phred quality score (Q) greater than 20; this is our 'low quality' filter. Next, we removed any read with low nucleotide complexity (80%+ of the sequence composed of only 2 bases) or repetitive elements (more than half the sequence composed of dinucleotide or trinucleotide repeats). Finally, we removed any reads with a mononucleotide run greater than 24 bases.
After filtering, we did an initial round of mapping to the repeat-masked D. virilis or D. melanogaster reference genome with Mosaik, a software program written by the Gabor Marth lab (A. Quinlan and G. Marth, http://bioinformatics.bc.edu/marthlab/Mosaik, unpublished) that uses a BLAT-like approach. The program hashes the genome into unique n-mers (where n, the hash size, can be specified by the user; we used 17 bp), which it uses as seeds to align the sequencing reads to the reference. We required all matches to align for at least 91% of the read length (33 of 36 bp) and have no more than 3 mismatches.
To supplement the mapping from this initial Mosaik run, we took two approaches. First, we noticed that some reads fail to map because low quality ends or partial polyA sequence cause them to fail to pass our alignment length filter. In order to get around this, we trimmed up to 10 bp from the end of any read where average quality across a 5 or 10 bp segment was less than Phred Q20. We also trimmed any mononucleotide run from the end of a read. After trimming, we rejected any read that was shorter than 20 bp, or that failed to pass our QC filters (which we reran on the trimmed sequence). The remaining reads were then rerun in Mosaik, using the same parameters described above.
Finally, we attempted to map the remaining sequences using BLAST. Any reads that passed all our quality control filters, but could not be mapped using Mosaik even after end-trimming, were run through a BLAST pipeline: we used blastn with a word size of 7 and an E-value cutoff of 1 × 10-6, and considered any read mapped if either 1) there was only a single BLAST hit to the reference genome, or 2) there were fewer than 10 hits and the best hit aligned over at least 90% of the read length and had a lower E-value than the next best hit. Any read with more than 10 hits was considered repetitive and was not mapped.
After mapping, we combined the output from both Mosaik runs and the BLAST pipeline to produce a single file for each contig containing the depth of coverage at each base in the genome (using the program ace2dep, from the Marth lab, to convert Mosaik output into depth, and custom Perl scripts to convert BLAST output into depth information). The depth of coverage at each base along a scaffold was then the input to our pipeline to identify expressed regions of the genome regulated by infection.
Identifying regions regulated by infection
To determine the extent to which each expressed region responds to infection, we developed a Hidden Markov Model, with five hidden states representing the degree of induction (highly induced, moderately induced, unchanged, moderately repressed, highly repressed), where the emission probability for each state is the binomial probability of observing x infected coverage given n total coverage at each base pair, and the observed data is the coverage of infected reads at each base. We used the HiddenMarkov package in R to optimize our HMM using the Baum-Welch algorithm, and to calculate the most probable set of states using the Viterbi algorithm. The optimized emission probabilities for each state are given in Table 1. Before running the HMM, we removed sites with less than 10× coverage pooled across samples, as there is very little power to distinguish between states with so few reads. In order to increase the number of expressed regions for which all sites are assigned to the same state, we tuned the transition probabilities by increasing the probability of remaining in the same state, and decreasing the probabilities of transitioning between states proportionally, so that the highest probability in the matrix was equal to 0.999. Empirically, this tuning appeared to increase the consistency of our results, with fewer expressed regions being assigned to multiple states.
To determine the most likely state for any given expressed region, we weighted the Viterbi estimate of the state of each base in an expressed region by the summed coverage of that base. If one state had a majority of this measure, we assigned the expressed region to that state. For the 5.2% of expressed regions in D. virilis and 13.5% of expressed regions in D. melanogaster for which either the weighted sum assigned to the best state was not at least 50% of the total weighted sum, or for which no single base had sufficiently high coverage to be included in our HMM, we consider the state as "not determined" (Table 2).
Associating expressed regions with genes
Association of induced expressed regions with gene models in D. virilis.
Strongly Induced (state 1)
Overlap GLEANR model
Associate with GLEANR model
Not associated with annotated GLEANR model
In order to understand why there is a large difference in the fraction of expressed regions that can be associated with annotated genes between the two induced classes in D. virilis, we divided the D. virilis scaffolds into "major" scaffolds (the 23 scaffolds with at least 1 Mb of sequence, which represent 77% of the total D. virilis sequence) and the remaining "minor" scaffolds. Induced regions assigned to state 1 are much more likely to be on a major scaffold than induced regions assigned to state 2 (Odds ratio = 14.3, Fisher's Exact Test P-value = 4.85 × 10-12). As expected, regions on minor scaffolds, irrespective of class, are much more likely to fail to be associated with an annotated gene (Odds ratio = 144.6, Fisher's Exact Test P-value < 2.2 × 10-16). However, the difference between minor and major scaffolds does not seem to fully explain the difference between state 1 and state 2, as even when restricted to just the major scaffolds expressed regions assigned to state 1 are more likely to be associated with genes (Odds ratio = 4.4, Fisher's Exact Test P-value = 0.0027). It could be that highly induced genes are more likely to have homologs in D. melanogaster, increasing the probability that those genes would be annotated in D. virilis. However, among just the regions that are associated with genes, it is actually state 2 that is more likely to have homologs in D. melanogaster (based on fuzzy reciprocal BLAST calls downloaded from FlyBase under the "Genomes FTP" section; Odds ratio = 3.60, Fisher's Exact Test P-value = 0.0176). Because of the difficulties in annotating genes on minor scaffolds, we have limited our primary analysis to the 33 expressed regions in state 1, plus the 166 expressed regions in state 2 that are on major scaffolds: this sample of 199 expressed regions includes 101 that can be associated with an annotated gene, as described above, and 98 that cannot.
Sample preparation and iTRAQ labeling for proteomic studies
We extracted hemolymph from approximately 200 D. virilis females 24 hours after infection with a mixed culture of E. faecalis and S. marcescens (infected sample) and 200 uninfected controls (uninfected sample). Hemolymph extracts were spun for 10 minutes at low speed to pellet cellular material. Equal amounts of each sample (40 μg protein) were then aliquoted in duplicate, reduced, cysteine-blocked and digested by trypsin. Each aliquot was labeled with a different iTRAQ tag according to the manufacturer's instructions (document #4351918A and 4350831C downloaded from http://docs.appliedbiosystems.com/search.taf, Applied Biosystems, Foster City, CA). The 114 and 115 tags were used to label the peptides in the two identical samples from control (uninfected) sample and the 116 and 117 tags were used for two extracts from the infected sample. After labeling, the four samples were pooled and subjected to cation exchange chromatography as described below.
Strong Cation Exchange Fractionation
Strong cation exchange (SCX) fractionation was completed using an Agilent 1100 HPLC with UV detector (Agilent Technologies, Inc. Santa Clara, CA). The tryptic peptides labeled with iTRAQ tags were reconstituted in buffer A (10 mM potassium phosphate pH 3.0, 25% ACN), prior to SCX LC. The samples (~400 μg) were loaded onto a PolyLC Polysulfoethyl A column (2.1 mm × 150 mm) purchased from PolyLC Inc. (Columbia, MD). Buffer B was composed of 10 mM potassium phosphate pH 3.0, 25% ACN with 1 M KCl. Sample fractionation was completed using the gradient 0% B, 15 min, 0–25% B in 40 min, 25–50% B in 10 min and hold 50% B for 10 min. During this elution, forty fractions were collected at a flow rate of 200 μl/min on the basis of the UV trace at 214 nm. Several fractions were pooled post-collection to yield a total of 10 sample containing fractions. Salts were removed via solid phase extraction using Waters SepPak C18 cartridge (Waters, Milford, MA) and purified peptide fractions were dried and reconstituted in 2% ACN, 0.05% formic acid and injected on the nLC-MS/MS.
Reverse-Phase Separation and Tandem Mass Spectrometry
The 10 SCX fractions were partially evaporated to remove ACN and desalted by solid phase extraction. All SPE-extracted and gel-extracted peptide samples were reconstituted in 50 μl of 0.1% formic acid with 2% acetonitrile prior to mass spectrometry (MS) analyses. NanoLC was carried out by an LC Packings Ultimate integrated capillary HPLC system equipped with a Switchos valve switching unit (Dionex, Sunnyvale, CA). An aliquot of peptide fractions (5.0 μl) were injected using a Famous auto sampler onto a PepMap C18 trap column (5 μm, 300 μm × 5 mm, Dionex) for on-line desalting and then separated on a PepMap C-18 RP nano column, eluted in a 90-minute gradient of 5% to 40% acetonitrile in 0.1% formic acid at 250 nl/min. The nanoLC was connected in-line to a hybrid triple quadrupole linear ion trap mass spectrometer, 4000 Q Trap from ABI/MDS Sciex (Framingham, MA) equipped with Micro Ion Spray Head ion source. MS data acquisition was performed using Analyst 1.4.2 software (Applied Biosystems) in the positive ion mode for information dependant acquisition (IDA) analysis. The nanospray voltage was 2.0 kV used for all experiments in positive ion mode. Nitrogen was used as the curtain (value of 10) and collision gas (set to high) with heated interface at 150°C. The declustering potential was set at 50 eV and Gas1 was 15 (arbitrary unit). In IDA analysis, after each survey scan for m/z 400 to m/z 1550 and an enhanced resolution scan, the three highest intensity ions with multiple charge states were selected for tandem MS (MS/MS) with rolling collision energy applied for detected ions based on different charge states and m/z values.
Data analysis and protein identifications
MS/MS spectra generated from nanoLC/ESI-based IDA analyses were interrogated using ProteinPilot™ software 2.0 (Applied Biosystems, Foster City, CA) for database search against the FlyBase FB2008_06 D. virilis peptides database by Paragon method. The default setting for trypsin with MMTS modification of cysteine and a methionine oxidation was used for quantitative processing and rapid ID. The protein identifications are reported with total ProtScore >1.3 for each protein representing > 95% statistical significance in ProteinPilot. In order to estimate abundance ratios and statistical significance for each identified protein, we created custom R scripts (available upon request) that extended the ProteinPilot statistical approach to account for the technical replication included in our experiment. Briefly, for each peptide we calculate an average infected/uninfected ratio as well as an average error (based on the reported ratio error for each of the two technical replicates per treatment). These ratios are then averaged using 1/Error as a weighting factor, and significance is determined using the weighted standard deviation as described in the ProteinPilot manual (ABI).
For the estimated abundance analysis of each identified protein, the MS/MS data were also submitted to Mascot 2.2 for database searching using in-house licensed Mascot local server against the same FlyBase FB2008_06 D. virilis peptides database with one missed cleavage site by trypsin allowed and iTRAQ-4-plex quantitation. The peptide tolerance was set to 1.2 Da and MS/MS tolerance was set to 0.6 Da. MMTS modification of cysteine and iTRAQ modification of N-terminal and lysine were fixed and a methionine oxidation and iTRAQ modification of tyrosine were set as variable modifications. Only significant scores for the peptides defined by Mascot probability analysis http://www.matrixscience.com/help/scoring_help.html#PBM greater than "identity" were considered for the protein identifications. The exponentially modified protein abundance index (emPAI) is obtained for each identified protein from Mascot searching and the corresponding protein content in mol percent is calculated as mol % = emPAI/Σ(emPAI) * 100 as reported previously . For our analysis, we focus on the 144 proteins identified by both MASCOT and ProteinPilot, for which we can calculate both a infected/uninfected ratio and emPAI.
We thank Rich Meisel, Nadia Singh, and Jamie Walters for helpful discussion. We also acknowledge the support and advice from Peter Schweitzer and the Cornell University Life Sciences Core Laboratories Center, where the Illumina sequencing was performed; and Sheng Zhang and Sabine Baumgart at the Cornell Proteomics and Mass Spectrometry Core Facility where the proteomics experiments were performed. This work was funded by a Howard Hughes Predoctoral Fellowship to T.B.S., NSF DDIG DEB-0608196 to T.B.S., and NIH RO1 AI064950 to A.G.C. and Brian P. Lazzaro.
- Begun DJ, Whitley P: Adaptive evolution of relish, a Drosophila NF-kappaB/IkappaB protein. Genetics. 2000, 154 (3): 1231-1238.PubMed CentralPubMedGoogle Scholar
- Hughes AL, Nei M: Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature. 1988, 335 (6186): 167-170. 10.1038/335167a0.View ArticlePubMedGoogle Scholar
- Murphy PM: Molecular mimicry and the generation of host defense protein diversity. Cell. 1993, 72 (6): 823-826. 10.1016/0092-8674(93)90571-7.View ArticlePubMedGoogle Scholar
- Obbard DJ, Jiggins FM, Halligan DL, Little TJ: Natural selection drives extremely rapid evolution in antiviral RNAi genes. Current biology. 2006, 16 (6): 580-585. 10.1016/j.cub.2006.01.065.View ArticlePubMedGoogle Scholar
- Sackton TB, Lazzaro BP, Schlenke TA, Evans JD, Hultmark D, Clark AG: Dynamic evolution of the innate immune system in Drosophila. Nature genetics. 2007, 39 (12): 1461-1468. 10.1038/ng.2007.60.View ArticlePubMedGoogle Scholar
- Schlenke TA, Begun DJ: Natural selection drives Drosophila immune system evolution. Genetics. 2003, 164 (4): 1471-1480.PubMed CentralPubMedGoogle Scholar
- Tiffin P, Moeller DA: Molecular evolution of plant immune system genes. Trends in genetics. 2006, 22 (12): 662-670. 10.1016/j.tig.2006.09.011.View ArticlePubMedGoogle Scholar
- Christophides GK, Zdobnov E, Barillas-Mury C, Birney E, Blandin S, Blass C, Brey PT, Collins FH, Danielli A, Dimopoulos G, et al: Immunity-related genes and gene families in Anopheles gambiae. Science. 2002, 298 (5591): 159-165. 10.1126/science.1077136.View ArticlePubMedGoogle Scholar
- Waterhouse RM, Kriventseva EV, Meister S, Xi Z, Alvarez KS, Bartholomay LC, Barillas-Mury C, Bian G, Blandin S, Christensen BM, et al: Evolutionary dynamics of immune-related genes and pathways in disease-vector mosquitoes. Science. 2007, 316 (5832): 1738-1743. 10.1126/science.1139862.PubMed CentralView ArticlePubMedGoogle Scholar
- Fehlbaum P, Bulet P, Michaut L, Lagueux M, Broekaert WF, Hetru C, Hoffmann JA: Insect immunity. Septic injury of Drosophila induces the synthesis of a potent antifungal peptide with sequence homology to plant antifungal peptides. The Journal of biological chemistry. 1994, 269 (52): 33159-33163.PubMedGoogle Scholar
- Ekengren S, Hultmark D: A family of Turandot-related genes in the humoral stress response of Drosophila. Biochemical and biophysical research communications. 2001, 284 (4): 998-1003. 10.1006/bbrc.2001.5067.View ArticlePubMedGoogle Scholar
- Ekengren S, Tryselius Y, Dushay MS, Liu G, Steiner H, Hultmark D: A humoral stress response in Drosophila. Current biology. 2001, 11 (9): 714-718. 10.1016/S0960-9822(01)00203-2.View ArticlePubMedGoogle Scholar
- Evans JD, Aronstein K, Chen YP, Hetru C, Imler JL, Jiang H, Kanost M, Thompson GJ, Zou Z, Hultmark D: Immune pathways and defence mechanisms in honey bees Apis mellifera. Insect molecular biology. 2006, 15 (5): 645-656. 10.1111/j.1365-2583.2006.00682.x.PubMed CentralView ArticlePubMedGoogle Scholar
- Silverman N, Maniatis T: NF-kappaB signaling pathways in mammalian and insect innate immunity. Genes & development. 2001, 15 (18): 2321-2342. 10.1101/gad.909001.View ArticleGoogle Scholar
- Vizioli J, Bulet P, Hoffmann JA, Kafatos FC, Muller HM, Dimopoulos G: Gambicin: a novel immune responsive antimicrobial peptide from the malaria vector Anopheles gambiae. Proceedings of the National Academy of Sciences of the United States of America. 2001, 98 (22): 12630-12635. 10.1073/pnas.221466798.PubMed CentralView ArticlePubMedGoogle Scholar
- Hara S, Yamakawa M: Moricin, a novel type of antibacterial peptide isolated from the silkworm, Bombyx mori. The Journal of biological chemistry. 1995, 270 (50): 29923-29927. 10.1074/jbc.270.50.29923.View ArticlePubMedGoogle Scholar
- Fehlbaum P, Bulet P, Chernysh S, Briand JP, Roussel JP, Letellier L, Hetru C, Hoffmann JA: Structure-activity analysis of thanatin, a 21-residue inducible insect defense peptide with sequence homology to frog skin antimicrobial peptides. Proceedings of the National Academy of Sciences of the United States of America. 1996, 93 (3): 1221-1225. 10.1073/pnas.93.3.1221.PubMed CentralView ArticlePubMedGoogle Scholar
- Bulet P, Hetru C, Dimarcq JL, Hoffmann D: Antimicrobial peptides in insects; structure and function. Developmental and comparative immunology. 1999, 23 (4–5): 329-344. 10.1016/S0145-305X(99)00015-4.View ArticlePubMedGoogle Scholar
- Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W, Iyer VN, et al: Evolution of genes and genomes on the Drosophila phylogeny. Nature. 2007, 450 (7167): 203-218. 10.1038/nature06341.View ArticlePubMedGoogle Scholar
- Markow TA, O'Grady PM: Drosophila biology in the genomic age. Genetics. 2007, 177 (3): 1269-1276. 10.1534/genetics.107.074112.PubMed CentralView ArticlePubMedGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.View ArticlePubMedGoogle Scholar
- Apidianakis Y, Mindrinos MN, Xiao W, Lau GW, Baldini RL, Davis RW, Rahme LG: Profiling early infection responses: Pseudomonas aeruginosa eludes host defenses by suppressing antimicrobial peptide gene expression. Proceedings of the National Academy of Sciences of the United States of America. 2005, 102 (7): 2573-2578. 10.1073/pnas.0409588102.PubMed CentralView ArticlePubMedGoogle Scholar
- Boutros M, Agaisse H, Perrimon N: Sequential activation of signaling pathways during innate immune responses in Drosophila. Developmental cell. 2002, 3 (5): 711-722. 10.1016/S1534-5807(02)00325-8.View ArticlePubMedGoogle Scholar
- De Gregorio E, Spellman PT, Rubin GM, Lemaitre B: Genome-wide analysis of the Drosophila immune response by using oligonucleotide microarrays. Proceedings of the National Academy of Sciences of the United States of America. 2001, 98 (22): 12590-12595. 10.1073/pnas.221458698.PubMed CentralView ArticlePubMedGoogle Scholar
- De Gregorio E, Spellman PT, Tzou P, Rubin GM, Lemaitre B: The Toll and Imd pathways are the major regulators of the immune response in Drosophila. The EMBO journal. 2002, 21 (11): 2568-2579. 10.1093/emboj/21.11.2568.PubMed CentralView ArticlePubMedGoogle Scholar
- Irving P, Ubeda JM, Doucet D, Troxler L, Lagueux M, Zachary D, Hoffmann JA, Hetru C, Meister M: New insights into Drosophila larval haemocyte functions through genome-wide analysis. Cellular microbiology. 2005, 7 (3): 335-350. 10.1111/j.1462-5822.2004.00462.x.View ArticlePubMedGoogle Scholar
- Uttenweiler-Joseph S, Moniatte M, Lagueux M, Van Dorsselaer A, Hoffmann JA, Bulet P: Differential display of peptides induced during the immune response of Drosophila: a matrix-assisted laser desorption ionization time-of-flight mass spectrometry study. Proceedings of the National Academy of Sciences of the United States of America. 1998, 95 (19): 11342-11347. 10.1073/pnas.95.19.11342.PubMed CentralView ArticlePubMedGoogle Scholar
- Bendtsen JD, Nielsen H, von Heijne G, Brunak S: Improved prediction of signal peptides: SignalP 3.0. J Mol Biol. 2004, 340 (4): 783-795. 10.1016/j.jmb.2004.05.028.View ArticlePubMedGoogle Scholar
- Sabatier L, Jouanguy E, Dostert C, Zachary D, Dimarcq JL, Bulet P, Imler JL: Pherokine-2 and -3: two Drosophila molecules related to pheromone/odor-binding proteins induced by viral and bacterial infections. European journal of biochemistry/FEBS. 2003, 270 (16): 3398-3407. 10.1046/j.1432-1033.2003.03725.x.View ArticlePubMedGoogle Scholar
- de Morais Guedes S, Vitorino R, Domingues R, Tomer K, Correia AJ, Amado F, Domingues P: Proteomics of immune-challenged Drosophila melanogaster larvae hemolymph. Biochemical and biophysical research communications. 2005, 328 (1): 106-115. 10.1016/j.bbrc.2004.12.135.View ArticlePubMedGoogle Scholar
- Vierstraete E, Verleyen P, Baggerman G, D'Hertog W, Bergh Van den G, Arckens L, De Loof A, Schoofs L: A proteomic approach for the analysis of instantly released wound and immune proteins in Drosophila melanogaster hemolymph. Proc Natl Acad Sci USA. 2004, 101 (2): 470-475. 10.1073/pnas.0304567101.PubMed CentralView ArticlePubMedGoogle Scholar
- Vierstraete E, Verleyen P, Sas F, Bergh Van den G, De Loof A, Arckens L, Schoofs L: The instantly released Drosophila immune proteome is infection-specific. Biochemical and biophysical research communications. 2004, 317 (4): 1052-1060. 10.1016/j.bbrc.2004.03.150.View ArticlePubMedGoogle Scholar
- Karlsson C, Korayem AM, Scherfer C, Loseva O, Dushay MS, Theopold U: Proteomic analysis of the Drosophila larval hemolymph clot. J Biol Chem. 2004, 279 (50): 52033-52041. 10.1074/jbc.M408220200.View ArticlePubMedGoogle Scholar
- Scherfer C, Karlsson C, Loseva O, Bidla G, Goto A, Havemann J, Dushay MS, Theopold U: Isolation and characterization of hemolymph clotting factors in Drosophila melanogaster by a pullout method. Current biology. 2004, 14 (7): 625-629. 10.1016/j.cub.2004.03.030.View ArticlePubMedGoogle Scholar
- Vierstraete E, Cerstiaens A, Baggerman G, Bergh Van den G, De Loof A, Schoofs L: Proteomics in Drosophila melanogaster: first 2D database of larval hemolymph proteins. Biochemical and biophysical research communications. 2003, 304 (4): 831-838. 10.1016/S0006-291X(03)00683-1.View ArticlePubMedGoogle Scholar
- de Morais Guedes S, Vitorino R, Tomer K, Domingues MR, Correia AJ, Amado F, Domingues P: Drosophila melanogaster larval hemolymph protein mapping. Biochemical and biophysical research communications. 2003, 312 (3): 545-554. 10.1016/j.bbrc.2003.10.156.View ArticleGoogle Scholar
- Perkins DN, Darryl JC, Pappin David M, Creasy John S Cottrell: Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis. 1999, 20 (18): 3551-3567. 10.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2.View ArticlePubMedGoogle Scholar
- Marioni JC, Mason CE, Shrikant MM, Stephens M, Gilad Y: RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays. Genome Research. 2008, 18 (9): 1509-17. 10.1101/gr.079558.108.PubMed CentralView ArticlePubMedGoogle Scholar
- Lazzaro BP, Sceurman BK, Clark AG: Genetic basis of natural variation in D. melanogaster antibacterial immunity. Science. 2004, 303 (5665): 1873-1876. 10.1126/science.1092447.View ArticlePubMedGoogle Scholar
- Ishihama Y, Oda Y, Tabata T, Sato T, Nagasu T, Rappsilber J, Mann M: Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol Cell Proteomics. 2005, 4 (9): 1265-1272. 10.1074/mcp.M500061-MCP200.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.