Gene discovery using massively parallel pyrosequencing to develop ESTs for the flesh fly Sarcophaga crassipalpis
© Hahn et al; licensee BioMed Central Ltd. 2009
Received: 13 February 2009
Accepted: 19 May 2009
Published: 19 May 2009
Flesh flies in the genus Sarcophaga are important models for investigating endocrinology, diapause, cold hardiness, reproduction, and immunity. Despite the prominence of Sarcophaga flesh flies as models for insect physiology and biochemistry, and in forensic studies, little genomic or transcriptomic data are available for members of this genus. We used massively parallel pyrosequencing on the Roche 454-FLX platform to produce a substantial EST dataset for the flesh fly Sarcophaga crassipalpis. To maximize sequence diversity, we pooled RNA extracted from whole bodies of all life stages and normalized the cDNA pool after reverse transcription.
We obtained 207,110 ESTs with an average read length of 241 bp. These reads assembled into 20,995 contigs and 31,056 singletons. Using BLAST searches of the NR and NT databases we were able to identify 11,757 unique gene elements (E<0.0001) representing approximately 9,000 independent transcripts. Comparison of the distribution of S. crassipalpis unigenes among GO Biological Process functional groups with that of the Drosophila melanogaster transcriptome suggests that our ESTs are broadly representative of the flesh fly transcriptome. Insertion and deletion errors in 454 sequencing present a serious hurdle to comparative transcriptome analysis. Aided by a new approach to correcting for these errors, we performed a comparative analysis of genetic divergence across GO categories among S. crassipalpis, D. melanogaster, and Anopheles gambiae. The results suggest that non-synonymous substitutions occur at similar rates across categories, although genes related to response to stimuli may evolve slightly faster. In addition, we identified over 500 potential microsatellite loci and more than 12,000 SNPs among our ESTs.
Our data provides the first large-scale EST-project for flesh flies, a much-needed resource for exploring this model species. In addition, we identified a large number of potential microsatellite and SNP markers that could be used in population and systematic studies of S. crassipalpis and other flesh flies.
Flesh flies in the genus Sarcophaga (Diptera: Sarcophagidae) have long been important models in insect physiology and biochemistry, particularly with respect to the study of endocrinology [1, 2], diapause [3, 4], cold hardiness [5, 6], reproduction [7, 8], and immunity . Larvae of Sarcophaga flies typically feed on carrion or feces, playing an important ecological role in decomposition and feature prominently in human forensic studies [10, 11]. Sarcophaga species can cause myiasis in humans and livestock and therefore are of importance to medical and veterinary entomology . In addition, Sarcophaga flies are natural hosts of the parasitic wasp Nasonia vitripennis and are often used in laboratory studies and for wasp culture. Nasonia vitripennis parasitizes pupae of many species of higher flies, including house flies and other filth flies, making them economically important for biological control . Furthermore, N. vitripennis is a popular model for understanding host-parasite relationships, behavioral ecology, development, and evolution, thus offering an impetus for sequencing the genome of this wasp and two congeners N. giraulti and N. longicornis [14–16].
Despite its prominence as a model in several sub-disciplines, there is currently no genome project for any Sarcophaga species, and genomic resources are limited. Several laboratories have performed low-throughput EST projects such as differential display  and subtractive hybridization , but the mainstay of functional genomic research on species in this genus has focused on homology-based, gene-by-gene cloning. The genomic resources developed for a handful of insects including Drosophila, honey bees, mosquitoes, and Bombyx mori has prompted much work on these species, pushing our understanding of processes in these few models far ahead of other insects. Development of such resources for Sarcophaga would greatly benefit the study of flesh flies as has been done with other non-traditional model insects such as Polistes wasps , Glanville fritillary butterflies , and tobacco hornworm moths .
RNA Extraction Strategy.
Exposed to short days to induce diapause development
Exposed to long days to induce direct development
Fed only sugar
Fed sugar and protein (liver)
First Instar Larvae
First Instar Larvae
Day 1 Males
Day 1 Males
Second Instar Larvae
Second Instar Larvae
Day 1 Females
Day 1 Females
Early Third Instar Larvae
Early Third Instar Larvae
Day 2 Males
Day 2 Males
Late Third Instar Larvae
Late Third Instar Larvae
Day 2 Females
Day 2 Females
Day 3 Males
Day 3 Males
Early Diapause Pupae (15 days)
Day 3 Females
Day 3 Females
Early Pharate Adult
Mid-diapause Pupae (30 days)
Day 4 Males
Day 4 Males
Mid-Pharate Adult (red eye stage)
Late Diapause Pupae (60 Days)
Day 4 Females
Day 4 Females
Late Pharate Adult (black bristle stage)
Day 5 Males
Day 5 Males
Day 5 Females
Day 5 Females
Day 7 Males
Day 7 Males
Day 7 Females
Day 7 Females
Day 9 Males
Day 9 Males
Day 9 Females
Day 9 Females
Results and discussion
Our normalized S. crassipaplis library was first subjected to a one-quarter plate titration run on the 454 GS-FLX machine that yielded 1,512,233 total bases from 5,727 sequence reads that assembled into 554 contigs and 3,182 singletons. Of these contigs and singletons 1,465 were identifiable at an E ≤ 0.001 using BLASTX against the NCBI non-redundant protein database (NR).
We followed this one-quarter plate preliminary run with a full-plate production run that yielded 72,816,811 total bases from 281,537 sequence reads and combined the results of both runs into a single dataset which was assembled de novo. Files containing our results are available from the National Center for Biotechnology Information Short Read Archive (accession numbers SRR005065 and SRR006884). After filtering adaptors and low-quality sequences  we were left with 74,329,044 total bases from 207,110 sequence reads with an average length of 241 bases. These data were assembled into 20,995 contigs with a mean length of 332 bp and a range of 30 to 2,958 bp as well as an additional 31,056 singletons for a total of 52,051 high-quality sequences. A file of all assembled contigs and singletons is available from the authors upon request.
Summary of Sarcophaga crassipalpis EST data.
Average Read Length
Number of Contigs
Average Contig Length
Range Contig Length
30 to 2,958
Number of Reads in Contigs
Number of Singletons
Unigenes with E ≤ 0.001 vs. NR or NT Database
Estimated Unique Transcripts
While the ~9,000 identified transcripts are certainly less than the entire transcriptome, we acquired many sequences of interest. Because our groups, among others, are particularly interested in diapause and stress biology, we compiled a brief list of genes we expect play a role in diapause (Additional file 1). Clearly, even this abbreviated list will provide fodder for years of functional genomics work on these candidate genes for our group and others.
In addition to the focal animal, transcriptomics projects can also provide insights into parasites, pathogens, and other associated microorganisms. Because we used whole organisms grown in non-sterile culture conditions we expected a portion of our library to include microorganismal sequences, some which are directly associated with S. crassipalpis and some that are inadvertent environmental contaminants. We identified only 160 non-metazoan sequences that appeared to be independent transcripts with an E<0.0001 from our BLASTX and BLASTN searches. Although we expect that flesh flies have a rich microorgansimal community, our selection of poly-A RNA as the starting material for our library likely eliminated many potential microbial sequences. Few hits were ascribed to any single microorganismal taxa. Roughly one-third of the sequences had high sequence similarity to various eukaryotic microorganisms including yeasts (e.g., Candida, Saccharomyces), filamentous fungi (e.g., Aspergillus, Neurospora), and protozoans (e.g., Plasmodium, Theileria). Sequences having apparent bacterial origin included many genera with species that have been associated with insects either as pathogens or as normal flora in the gut (e.g., Burkholderia and Escherichia), but these sequences could just as easily represent environmental contamination. Perhaps most interesting of the non-metazoan sequences were those for transposable elements, particularly sequences in the Mariner-Tc1 family and several retroviral elements. With future development such sequences may provide insight into new functional genomics tools for Sarcophaga, potentially including transgenic modification.
Another interesting feature of our 454 sequence data is that a number of sequences appear to contain repeat motifs (i.e., microsatellites) or single nucleotide polymorphisms (SNPs). We searched for microsatellite motifs within our sequences using the program MSATCOMMANDER  to identify sequences with di-, tri-, tetra-, penta, and hexanucleotide repeats. Our search for repeat motifs revealed 521 sequences containing some such motif from the total set of 52,053 sequences (using a minimum repeat length of seven, Additional file 2). The high levels of variation typically found for microsatellite markers have led to their widespread use for a broad range of studies such as genome mapping, parentage analysis, and analyses of gene flow and genetic structure. Because our library was constructed from a long-standing laboratory colony expected to harbor little standing variation compared to field populations, we did not screen any of these loci for allelic variability. Similarly, to identify SNPs in our contigs we used the program PolyBayesShort, a version of the PolyBayes program  that has been optimized for data from short read platforms such as 454, under the default parameters. We found at total of 12,090 SNPs that were distributed among 3436 of our 20,995 contigs, yielding an average of 7.23 SNPs per 10 kb. However, because our samples were derived from a long-standing, inbred laboratory colony our SNP estimates are likely a small fraction of the variation existing in S. crassipalpis populations in the field. We hope that identification of candidate microsatellite and SNP loci will be useful for others interested in field-based population studies of S. crassipalpis and closely related Sarcophaga species, some of which are important for forensic studies.
Studies of flesh flies in the genus Sarcophaga have contributed greatly to our understanding of diapause, stress resistance, reproduction, and immunity in insects, but work in this "non-traditional" model organism has been hampered by a lack of genomic resources. Our use of pyroseqencing to develop a new EST library containing approximately 9,000 transcripts provides a new resource for innovative work that we hope will draw renewed interest in using flesh flies as a model. We expect that these EST data will provide the foundation for a proliferation of functional work on flesh flies including "-omics" approaches such as the development of microarrays for large scale gene expression studies as well as serving as a foundation for identifying previously unrecognizable proteins in proteomics studies. These data will also increase our ability to explore specific gene and pathway-based approaches such as transcript quantification by qRT-PCR and RNA interference of candidate genes that have previously been refractory to homology-based cloning. Furthermore, these data compliment the growing number of EST and genome projects in flies other than Drosophila and mosquitoes, such as the medfly Ceratitis capitata and the Hessian fly Mayetiola destructor [26, 27], enhancing resources for comparative studies of molecular evolution across the Diptera. Our initial comparative analysis of evolutionary rates across GO categories illustrates this utility, and we anticipate that similar genome and transcriptome-wide analyses will yield important insights into insect diversification.
The culture of S. crassipalpis used for this work originated in Columbus, Ohio, and was maintained in the laboratory for nearly 8 years before the onset of this project. Non-diapausing pupae were obtained by rearing the colony under long-day conditions (15L:9D) with adults held at 25°C and larvae and pupae at 20°C, whereas diapausing pupae were generated by rearing the maternal generation of adults, larvae and pupae under short-day conditions (12L:12D) with adults held at 25°C and larvae and pupae held at 20°C, as previously described .
We created a single normalized cDNA library from all life stages to maximize representation of the S. crassipalpis transcriptome. Because we are particularly interested in pupal diapause and reproduction, we included both diapause-destined and direct developing individuals, and among adults we included both protein-fed and protein-starved males and females sampled from eclosion through one reproductive bout (see Table 1 for full sampling scheme). RNA from each sample group was extracted separately in Tri-reagent (Ambion) and the quality of total RNA was verified on 1.4% agarose-MOPS-formaldehyde denaturing gels. Poly-A RNA was extracted from each total RNA sample using the Oligotex® mRNA Midi Kit (Qiagen) and the quality again verified on a denaturing gel. Because the total mRNA yield from each pool differed, 1 μg of mRNA from each sample group was combined into a single large pool and mixed well. This single large, equally-mixed pool was the source of starting material for the creation of the normalized cDNA library. An initial cDNA library was constructed using the SMART cDNA Synthesis Kit (BD Clonetech, Mountain View, CA) and normalized using the Trimmer-Direct Kit (Evrogen, Moscow). The quality of the normalized cDNA library was checked on a 1.4% agarose gel.
Approximately 2.5 μg of the normalized cDNA pool was used for a titration run using one quarter of a plate on the Roche GS FLX sequencer and was followed by a whole plate run on the same equipment using 15 μg of cDNA and a full GS FLX plate at the Interdisciplinary Center for Biotechnology Research at the University of Florida (UF-ICBR). The pyroluminescence intensity for each bead-containing well was used to determine base calls for each nucleotide incorporation step. Sequence assembly was performed first using the Newbler Assembler v1.1.02.15 (454 Life Science, Branford, CT). Afterwards, Newbler output, including all contigs and singletons, was subjected to a second assembly step using the Paracel Transcript Assembler v3.0.0 (Paracel Inc., Pasadena, CA). Sequences were masked for oligonucleotide adaptors used in SMART cDNA synthesis and normalization using a threshold of 15. Low-quality data (base call score < 10) was trimmed from the ends of individual sequences. Low complexity sequence regions (simple sequence repeats) were identified and excluded from consideration during initial pair-wise comparison, but were included during final alignment and consensus building. Assembly was performed in two stages. The first stage used the "Haste" algorithm to build groups (or clusters) of sequences sharing a minimal amount of identifiable sequence similarity (threshold = 50). The second stage assembled sequences within individual clusters into consensus transcripts using the software defaults, except parameters MinCovRep (500), InOverhang (30), EndOverhang (30), RemOverhang (30), QualSumLim (300), MaxInternalGaps (15) and PenalizeN (0). For our dataset, an advantage of the second assembly step using Paracel Transcript Assembler was in cleaning-up the data by further masking any SMART adaptor sequences or poly-A stretches that were included in the initial Newbler assembly.
All contigs and singlets were annotated by BLAST search against NCBI NR and NT databases where the e-value threshold was set at 1e-4. For each query sequence, the top 100 BLAST hits were obtained if available and stored in BlastQuest . If a BLAST hit represented a NCBI Gene database entry, it was used to map Sarcophaga transcripts to putative homologs from Drosophila, human, mouse, zebrafish, dog or other organisms. Gene Ontology (GO) annotations were also derived based on sequence similarity. Transcripts were classified into 14 major Gene Ontology categories under biological process. If a transcript was annotated with more than one GO category it was split equally among them.
We also used our GO annotations to assess whether coding sequence divergence within Diptera is relatively homogenous across GO categories. We obtained full translation databases for D. melanogaster and A. gambiae from FlyBase http://flybase.org/ and VectorBase http://www.vectorbase.org/, respectively. We then used our S. crassipalpis EST database as a query in a BlastX search against each of these additional databases setting the e-value threshold to 1 × 10-4 and using default values for all other parameters. The 454 sequencing technology is prone to producing apparent frame-shift errors via erroneous insertions or deletions of extra bases , limiting the quality and length of codon alignments and blastX hits. Without correction, codons and proteins will align in overlapping, frame-shifted segments, and to our knowledge no pyrosequencing study has previously addressed this issue. We developed a procedure to identify and correct for 454 insertion (overcall) and deletion (undercall) errors including the following steps: 1) Identify BlastX hits with two frame-shifted, overlapping High-scoring Segment Pairs (HSPs) both with e-values less than 1 × 10-5 (if present, we excluded additional, higher e-value HSPs from our alignments). 2) Determine protein alignment score for each HSP in the region of overlap. We calculated alignment scores by assigning a value of 1 for identical residues, 0.5 for a positive alignment score in the BLOSUM62 scoring matrix, and 0 for non-positive BLOSUM62 scores or a gap. 3) Retain the region of overlap in the HSP alignment with the highest alignment score, and remove the region of overlap from both the un-aligned nucleotide and aligned protein sequence of the lower scoring HSP. 4) Remove one or two additional nucleotides past the region of overlap in the un-aligned nucleotide sequence for the lower-scoring HSP if the frameshift is +1 or +2, respectively. These are nucleotides in the 5' or 3' direction if the lower scoring HSP is more 5' or 3', respectively. 5) Insert a single gap in the query protein sequence at the 3' end of the first HSP (or the 5' end of the second HSP) if the frameshift is +2. 6) Concatenate the highest scoring HSP (protein) with the lowest scoring, truncated HSP. The procedure corrects for single insertion and deletion errors to produce the best scoring full alignment based on our scoring criteria for the region of overlap. In the case of an insertion (+1 frameshift) error, a single nucleotide is removed in the correction. In the case of a deletion (+2 frameshift) two nucleotides are removed and a gap in the protein sequence is introduced. This effectively removes a codon from the alignment, reflecting the uncertainty in reconstructing the translated amino acid associated with the deletion. Out of 12,321 total blastX hits from a Sarcophaga to Drosophila translation database search, 1036 (8 percent) had putative frame-shift errors, 681 +1 (insertion) shifts and 355 +2 (deletion) shifts. Clearly correcting these frame-shift errors will substantially improve genome-wide alignments using any 454-derived transcriptome.
After applying frame-shift corrections, we retained the top-scoring amino acid alignment from each BlastX hit and produced the associated transcript alignments using tranalign in the EMBOSS package  in conjunction with transcription databases for D. melanogaster and A. gambiae. Using the program YN00 from the PAML package  we estimated synonymous (dS) and non-synonymous (dN) substitution rates for each alignment via the method of Yang and Nielsen . Our initial intention was to estimate dN/dS ratios, but alignments were largely oversaturated with synonymous substitution rates (often inestimable). Thus, we retained only dN estimates. Finally, we estimated the median, interquartile range (IQR), and an estimate of the 95% confidence interval of the median as ± 1.58 IQR/sqrt(n) for dN values from each GO category (genes falling into more than one GO category contribute to estimates in all associated categories). This particular 95% CI estimate provides a confidence interval that is largely insensitive to the underlying distribution . We also performed all of the above steps on the pairwise comparison between D. melanogaster and A. gambiae, yielding three pairwise comparisons in the complete analysis.
We thank Drs. Bill Farmerie and Li Liu from the UF-ICBR core facility for providing critical advice and bioinformatics support during all phases of this project, as well as Dietmar Schwarz and Rob Michaud for comments on the manuscript. This work was supported by funds from USDA-CSREES 2004-35302-745994, NSF-IOS-641505, and the Florida State Agricultural Experiment Station to DH, USDA NRICGP 2006-35302-18001 (Agreement No. 58-6615-7-137) to DS, and NSF-IOB-0416720 to DD.
- Fraenkel G, Sivasubramanian P, Zdarek J: Hormonal factors in CNS and hemolymph of pupariating fly larvae which accelerate puparium formation and tanning. Biol Bull. 1972, 143 (1): 127-139. 10.2307/1540333.View Article
- Verleyen P, Huybrechts J, Sas F, Clynen E, Baggerman G, De Loof A, Schoofs L: Neuropeptidomics of the grey flesh fly, Neobellieria bullata. Biochem Biophys Res Commun. 2004, 316 (3): 763-770. 10.1016/j.bbrc.2004.02.115.View ArticlePubMed
- Denlinger DL: Regulation of diapause. Annu Rev Entomol. 2002, 47: 93-122. 10.1146/annurev.ento.47.091201.145137.View ArticlePubMed
- Hahn DA, Denlinger DL: Meeting the energetic demands of insect diapause: Nutrient storage and utilization. J Insect Physiol. 2007, 53 (8): 760-773. 10.1016/j.jinsphys.2007.03.018.View ArticlePubMed
- Lee RE, Chen CP, Denlinger DL: A rapid cold-hardening process in insects. Science. 1987, 238 (4832): 1415-1417. 10.1126/science.238.4832.1415.View ArticlePubMed
- Rinehart JP, Li A, Yocum GD, Robich RM, Hayward SAL, Denlinger DL: Up-regulation of heat shock proteins is essential for cold survival during insect diapause. PNAS. 2007, 104 (27): 11130-11137. 10.1073/pnas.0703538104.PubMed CentralView ArticlePubMed
- Browne LB: Quantitative aspects of the regulation of ovarian development in selected anautogenous Diptera: integration of endocrinology and nutrition. Entomologia Experimentalis et Applicata. 2001, 100 (2): 137-149. 10.1046/j.1570-7458.2001.00859.x.View Article
- Hahn DA, James LN, Milne KR, Hatle JD: Life-history plasticity after attaining a dietary threshold for reproduction is associated with protein storage in flesh flies. Funct Ecol. 2008, 22 (6): 1081-1090. 10.1111/j.1365-2435.2008.01451.x.PubMed CentralView ArticlePubMed
- Natori S, Shiraishi H, Hori S, Kobayashi A: The roles of Sarcophaga defense molecules in immunity and metamorphosis. Developmental & Comparative Immunology. 1999, 23 (4–5): 317-328. 10.1016/S0145-305X(99)00014-2.View Article
- Pape T: Catalogue of the Sarcophagidae of the world. Memoires of Entomology International. 1996, 8: 1-558.
- Wells JD, Stevens JR: Application of DNA-based methods in forensic entomology. Annu Rev Entomol. 2008, 53 (1): 103-120. 10.1146/annurev.ento.52.110405.091423.View ArticlePubMed
- Sherman RA, Hall MJR, Thomas S: Medicinal maggots: An ancient remedy for some contemporary afflictions. Annu Rev Entomol. 2000, 45 (1): 55-81. 10.1146/annurev.ento.45.1.55.View ArticlePubMed
- Legner EF: Biological control of Diptera of medical and vetrinary importance. J Vector Ecol. 1995, 20 (1): 59-120.
- Rivers DB, Ruggiero L, Hayes M: The ectoparasitic wasp Nasonia vitripennis (Walker) (Hymenoptera: Pteromalidae) differentially affects cells mediating the immune response of its flesh fly host, Sarcophaga bullata Parker (Diptera: Sarcophagidae). J Insect Physiol. 2002, 48 (11): 1053-1064. 10.1016/S0022-1910(02)00193-2.View ArticlePubMed
- Beukeboom L, Desplan C:Nasonia. Curr Biol. 2003, 13 (22): R860-R860. 10.1016/j.cub.2003.10.042.View ArticlePubMed
- Opijnen T, Baudry E, Baldo L, Bartos J, Werren JH: Genetic variability in the three genomes of Nasonia : nuclear, mitochondrial and Wolbachia. Insect Mol Biol. 2005, 14 (6): 653-663. 10.1111/j.1365-2583.2005.00595.x.View ArticlePubMed
- Flannagan RD, Tammariello SP, Joplin KH, Cikra-Ireland RA, Yocum GD, Denlinger DL: Diapause-specific gene expression in pupae of the flesh fly Sarcophaga crassipalpis. PNAS. 1998, 95 (10): 5616-5620. 10.1073/pnas.95.10.5616.PubMed CentralView ArticlePubMed
- Yamamoto-Kihara M, Kotani E: Isolation and characterization of a C-type lectin cDNA specifically expressed in the tip of mouthparts of the flesh fly Sarcophaga peregrina. Insect Mol Biol. 2004, 13 (2): 133-140. 10.1111/j.0962-1075.2004.00468.x.View ArticlePubMed
- Toth AL, Varala K, Newman TC, Miguez FE, Hutchison SK, Willoughby DA, Simons JF, Egholm M, Hunt JH, Hudson ME, et al: Wasp gene expression supports an evolutionary link between maternal behavior and eusociality. Science. 2007, 318 (5849): 441-444. 10.1126/science.1146647.View ArticlePubMed
- Vera JC, Wheat CW, Fescemyer HW, Frilander MJ, Crawford DL, Hanski I, Marden JH: Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing. Molecular Ecology. 2008, 17 (7): 1636-1647. 10.1111/j.1365-294X.2008.03666.x.View ArticlePubMed
- Zou Z, Najar F, Wang Y, Roe B, Jiang H: Pyrosequence analysis of expressed sequence tags for Manduca sexta hemolymph proteins involved in immune responses. Insect Biochemistry and Molecular Biology. 2008, 38 (6): 677-682. 10.1016/j.ibmb.2008.03.009.PubMed CentralView ArticlePubMed
- Huse S, Huber J, Morrison H, Sogin M, Welch D: Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biology. 2007, 8 (7): R143-10.1186/gb-2007-8-7-r143.PubMed CentralView ArticlePubMed
- Friedrich M, Tautz D: Evolution and phylogeny of the Diptera: A molecular phylogenetic analysis using 28S rDNA sequences. Systematic Biology. 1997, 46 (4): 674-698. 10.2307/2413500.View ArticlePubMed
- Faircloth BC: MSATCOMMANDER: detection of microsatellite repeat arrays and automated, locus-specific primer design. Mol Ecol Resour. 2008, 8 (1): 92-94. 10.1111/j.1471-8286.2007.01884.x.View ArticlePubMed
- Marth GT, Korf I, Yandell MD, Yeh RT, Gu ZJ, Zakeri H, Stitziel NO, Hillier L, Kwok PY, Gish WR: A general approach to single-nucleotide polymorphism discovery. Nature Genet. 1999, 23 (4): 452-456. 10.1038/70570.View ArticlePubMed
- Chen M-S, Zhao H-X, Zhu YC, Scheffler B, Liu X, Liu X, Hulbert S, Stuart JJ: Analysis of transcripts and proteins expressed in the salivary glands of Hessian fly (Mayetiola destructor) larvae. J Insect Physiol. 2008, 54 (1): 1-16. 10.1016/j.jinsphys.2007.07.007.View ArticlePubMed
- Gomulski LM, Dimopoulos G, Xi ZY, Soares MB, Bonaldo MF, Malacrida AR, Gasperi G: Gene discovery in an invasive tephritid model pest species, the Mediterranean fruit fly, Ceratitis capitata. BMC Genomics. 2008, 9: 15-10.1186/1471-2164-9-243.View Article
- Denlinger DL: Induction and termination of pupal diapause in Sarcophaga (Diptera: Sarcophagidae). Biol Bull. 1972, 142 (1): 11-24. 10.2307/1540242.View Article
- Farmerie WG, Hammer J, Liu L, Sahni A, Schneider M: Biological workflow with BlastQuest. Data Knowl Eng. 2005, 53 (1): 75-97. 10.1016/S0169-023X(04)00113-2.View Article
- Brockman W, Alvarez P, Young S, Garber M, Giannoukos G, Lee WL, Russ C, Lander ES, Nusbaum C, Jaffe DB: Quality scores and SNP detection in sequencing-by-synthesis systems. Genome Res. 2008, 18 (5): 763-770. 10.1101/gr.070227.107.PubMed CentralView ArticlePubMed
- Rice P, Longden I, Bleasby A: EMBOSS: The European molecular biology open software suite. Trends Genet. 2000, 16 (6): 276-277. 10.1016/S0168-9525(00)02024-2.View ArticlePubMed
- Yang Z: PAML 4: Phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution. 2007, 24 (8): 1586-1591. 10.1093/molbev/msm088.View ArticlePubMed
- Yang Z, Nielsen R: Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 2000, 17 (1): 32-43.View ArticlePubMed
- McGill R, Tukey JW, Larsen WA: Variations of box plots. American Statistician. 1978, 32 (1): 12-16. 10.2307/2683468.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.