- Research article
- Open Access
A BAC-based physical map of the Hessian fly genome anchored to polytene chromosomes
© Aggarwal et al; licensee BioMed Central Ltd. 2009
- Received: 13 January 2009
- Accepted: 02 July 2009
- Published: 02 July 2009
The Hessian fly (Mayetiola destructor) is an important insect pest of wheat. It has tractable genetics, polytene chromosomes, and a small genome (158 Mb). Investigation of the Hessian fly presents excellent opportunities to study plant-insect interactions and the molecular mechanisms underlying genome imprinting and chromosome elimination. A physical map is needed to improve the ability to perform both positional cloning and comparative genomic analyses with the fully sequenced genomes of other dipteran species.
An FPC-based genome-wide physical map of the Hessian fly was constructed and anchored to the insect's polytene chromosomes. Bacterial artificial chromosome (BAC) clones corresponding to 12-fold coverage of the Hessian fly genome were fingerprinted, using high information content fingerprinting (HICF) methodology, and end-sequenced. Fluorescence in situ hybridization (FISH) co-localized two BAC clones from each of the 196 longest contigs on the polytene chromosomes. An additional 70 contigs were positioned using a single FISH probe. The 266 FISH-mapped contigs were evenly distributed and covered 60% of the genome (95,668 kb). The ends of the fingerprinted BACs were then sequenced to develop the capacity to create sequence-tagged site (STS) markers on the BACs in the map. Only 3.64% of the BAC-end sequence (BES) was composed of transposable elements, helicases, ribosomal repeats, simple sequence repeats, and sequences of low complexity. A relatively large fraction (14.27%) of the BES was composed of multi-copy gene sequences. Nearly 1% of the end sequence was composed of simple sequence repeats (SSRs).
This physical map provides the foundation for high-resolution genetic mapping, map-based cloning, and assembly of complete genome sequencing data. The results indicate that restriction fragment length heterogeneity in BAC libraries used to construct physical maps lowers the length and the depth of the contigs, but is not an absolute barrier to the successful application of the technology. This map will serve as a genomic resource for accelerating gene discovery, genome sequencing, and the assembly of BAC sequences. The Hessian fly BAC-clone assembly, and the names and positions of the BAC clones used in the FISH experiments, are publicly available at http://genome.purdue.edu/WebAGCoL/Hfly/WebFPC/.
- Bacterial Artificial Chromosome
- Bacterial Artificial Chromosome Clone
- Nile Tilapia
- Polytene Chromosome
- Bacterial Artificial Chromosome Library
The Hessian fly (Mayetiola destructor) is an important pest of wheat (Triticum spp.) and a member of one of the largest and most economically important families of insects, the gall midges (Cecidomyiidae). Its genetic tractability and short generation time (~28 days) make it especially attractive as an experimental model for investigating insect-plant interactions. This capacity was first demonstrated when it became the first insect shown to have a gene-for-gene interaction with its host plant. Previously, it was used to study chromosome elimination and the function of the germ-line-limited "E" chromosomes that characterize all gall midge species [3–5]. Later investigations demonstrated that it has other biological attributes that are compatible with genomic analyses, including polytene chromosomes in the larval salivary glands and a small genome (158 Mb). The development of a physically anchored genetic map, a syntenic analysis of a BAC-based contig, and transcriptomic analyses of the first instar salivary glands [10, 11] also demonstrated the potential of this insect for comparative genomics with other dipteran species.
Improved genomic maps are essential to further develop this insect as an experimental organism. However, certain biological characteristics make fine-scale genetic mapping in the Hessian fly problematic. The first problem is the relatively low number of offspring produced by single females, typically ranging from 50 to 200, which limits the resolution of a genetic map. The second problem is the small size (2 to 3 mm in length) of the insect. This severely limits the yield of the genomic DNA extracted from individuals, and the number of molecular markers that can be genetically mapped. A third problem, the insect's unusual chromosome cycle and mechanism of sex determination, makes the construction of inbred strains difficult. Because this phenomenon is both central to the genetics of the insect and worthy of genomic investigation, it is briefly described below.
The germ line of the Hessian fly contains both maternally and paternally derived copies of each of two autosomes (A1 and A2) and two X chromosomes (X1 and X2). It also contains a variable number (~32) of maternally inherited germ-line-limited "E" chromosomes (A1A2X1X2/A1A2X1X2; E). The E chromosomes are eliminated from all future somatic cells during the fifth cleavage division of embryogenesis. This post-zygotic division also determines sex. Embryos that eliminate the paternally inherited X chromosomes, together with the E chromosomes, possess a male determining somatic karyotype (A1A2X1X2/A1A2OO) and develop as males. Those that retain both sets of X chromosomes possess a female determining somatic karyotype (A1A2X1X2/A1A2X1X2) and develop as females. Maternal genotype determines whether the paternally derived X chromosomes are eliminated or retained in the soma. As a consequence, an unusual mating structure is established that is resistant to inbreeding; females produce either all-female or all-male families. Chromosome elimination also occurs during spermatogenesis. The result is that all sperm carry only the maternally derived autosomes and X chromosomes (A1A2X1X2). Genetic recombination between the homologous autosomes and X chromosomes occurs only during meiosis of oogenesis. All ova carry a single copy of each autosome and each X chromosome and a full complement (~32) of E chromosomes (A1A2X1X2; E). There is no heterogametic sex since all sperm and all ova are genetically equivalent and sex is determined after fertilization.
BAC libraries fingerprinted for the Hessian fly physical map
Mean Insert Size (kb)
No. FP BAC clones
Mean no. bands/clone
Summary of the Hessian fly physical map
Number of processed clones
Number of clones used in contig assembly
Number of singletons
Number of contigs
Physical length of the contigs (kb)
Contigs were assembled from the fingerprint data using the computer program FPC version 8.5.1 [15, 19, 20]. FPC parameters were adjusted for the HICF technique as described by Luo et al. and Nelson et al. In each assembly, we used a tolerance of 5, which prevented two bands whose lengths differed by more than 0.5 bp from being considered the same band. In the initial assembly, we used a cutoff (the threshold for the probability score that matching bands are a coincidence) of 1e-35. Using these settings, FPC built 1258 contigs containing from 2 to 126 BAC clones per contig. The DQer function was then used to identify contigs with ≥ 10% questionable clones (Qs). These contigs were then gradually split using three more stringent cutoffs (1e-38, 1e-41, and 1e-44). This produced 1477 contigs containing from 2 to 67 BAC clones per contig. The assembly was then end-merged at a cutoff of 1e-29. This produced the first-generation HICF Hessian fly map, which consisted of 1377 contigs, containing 7,716 BACs, and 4,236 (35%) singletons. The map had an average of 5.6 BACs per contig and the largest contig contained 73 BACs (Table 2). FPC analysis estimated that the clones used to produce the map had 1,061,478 unique bands, averaging 88.8 bands per clone. We used the known lengths of 23 BAC clones and the total number of labeled fragments for each of these clones to estimate average band size (1182 bp). Using this value, the assembled contigs had an estimated length of 283,555 kb, providing approximately 1.8-fold coverage of total genome length. The average contig length was 206 kb and the longest contig covered 1166 kb of the genome. FPC identified a total of 441 questionable clones (Qs) in the assembly. There were 88 contigs with >10% Qs. However, the average number of Qs per contig was only 0.32. Furthermore, 1212 contigs (88%) had no Qs, no contig had more than 14 Qs, and the majority of the Qs were in a few large contigs.
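The summary figures quoted above follow from a handful of counts, and the arithmetic can be checked with a short sketch. The Python below is illustrative only and is not part of the FPC workflow; the variable and function names are our own, and it assumes that the clones in the assembly are the BACs in contigs plus the singletons.

```python
# Counts quoted in the text for the first-generation HICF map.
GENOME_KB = 158_000          # genome size (158 Mb)
CONTIGS = 1377
BACS_IN_CONTIGS = 7716
SINGLETONS = 4236
UNIQUE_BANDS = 1_061_478
CONTIG_LENGTH_KB = 283_555   # estimated total contig length
QUESTIONABLE = 441           # questionable clones (Qs)


def map_summary():
    """Derive the per-contig and per-clone averages from the raw counts."""
    # Assumption: assembled clones = BACs in contigs + singletons.
    total_clones = BACS_IN_CONTIGS + SINGLETONS
    return {
        "bacs_per_contig": BACS_IN_CONTIGS / CONTIGS,
        "singleton_pct": 100 * SINGLETONS / total_clones,
        "bands_per_clone": UNIQUE_BANDS / total_clones,
        "mean_contig_kb": CONTIG_LENGTH_KB / CONTIGS,
        "fold_coverage": CONTIG_LENGTH_KB / GENOME_KB,
        "qs_per_contig": QUESTIONABLE / CONTIGS,
    }


summary = map_summary()
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Rounded to the precision used in the text, these values agree with the reported averages.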
Compared to the contigs of other HICF maps, the Hessian fly contigs were relatively small. For example, the B. rapa, catfish, and Nile tilapia maps each had greater average numbers of BACs per contig (37.7, 23.4, and 9.0 respectively), greater average contig lengths (512 kb, 521 kb, 390 kb) and lower percentages of singletons (20.7, 2.0, and 7.5) [16–18]. Since both BAC coverage and fragment size reproducibility were good, other factors were probably responsible for the shorter and shallower contigs in the Hessian fly map. The organization of the much smaller Hessian fly genome might have been one factor. However, we suspect that a greater frequency of restriction fragment length polymorphisms (RFLPs) in the Hessian fly BAC libraries was the major problem. This had been anticipated because the Hessian fly BAC libraries were each constructed with DNA derived from thousands of individuals in heterogeneous strains that are poorly characterized. Thus, indels, single nucleotide polymorphisms, gene duplications, and other rearrangements may all increase the number of mismatched bands. Regardless of the cause and in spite of the relatively low number of BACs per contig, the coverage of total genome length (1.8) appeared to exceed that of the maps of B. rapa (1.3), catfish (0.93), and Nile tilapia (1.65). In addition, the percentage of Qs in the Hessian fly assembly (0.32) was lower than that observed in maize (11.0), B. rapa (15.0), catfish (7.3), and Nile tilapia (9.6). Thus, it appeared that the Hessian fly contigs provided reasonable coverage with few questionable clones. This suggested that an abundance of RFLPs is not an absolute barrier to the construction of a HICF-based physical map.
Because the BAC clones were largely derived from two different libraries (Table 1), we performed an FPC assembly of each library separately using the same parameters that were used to assemble both libraries combined. The BACs in the CL library assembled into a greater number of contigs (888 vs. 795) with a greater average number of BACs per contig (4.74 vs. 3.78) than the BACs in the Hf library. In addition, the percentage of singletons in the CL assembly (31%) was lower than that in the Hf assembly (48%), but the number of contigs with >10% Qs was nearly the same (CL = 33 and Hf = 30). Thus, it appeared that if RFLPs were interfering with contig assembly, they were more abundant in the Hf library. Interestingly, the sum of the total number of contigs assembled using the CL (888) and Hf BACs (793) separately was only 22% greater than the total number of contigs in the combined assembly (1377), and the sum of the coverage provided by the CL (159,451 kb) and Hf BACs (149,099 kb) was only 9% greater than that of the combined assembly (283,555 kb). Thus, it appeared as if the contigs of one library only occasionally overlapped with the contigs of the other library and this lowered contig size more than total coverage when the libraries were combined. This possibility was also evident in the contigs of the FPC map where CL BACs tended to stack with other CL BACs and Hf BACs tended to stack with other Hf BACs. It is also consistent with the suggestion that using libraries prepared with different restriction enzymes reduces the number of gaps in the assembly [22–24].
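The comparison between the separate and combined assemblies is a simple excess calculation. A minimal Python sketch of it, using the values quoted in the sentence that reports the sums (the function name is hypothetical):

```python
def excess_pct(sum_of_parts, combined):
    """Percent by which the summed separate assemblies exceed the combined one."""
    return 100 * (sum_of_parts - combined) / combined


# Values quoted in the text.
cl_contigs, hf_contigs, combined_contigs = 888, 793, 1377
cl_kb, hf_kb, combined_kb = 159_451, 149_099, 283_555

contig_excess = excess_pct(cl_contigs + hf_contigs, combined_contigs)
coverage_excess = excess_pct(cl_kb + hf_kb, combined_kb)

print(f"contig excess: {contig_excess:.1f}%")
print(f"coverage excess: {coverage_excess:.1f}%")
```

If the two libraries had merged freely, the combined assembly would contain far fewer contigs than the sum of the separate assemblies; an excess of only about one fifth is what one would expect when contigs from the two libraries rarely join.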
Assessment of 196 contigs using two BAC clones per contig as FISH probes.
Total no. contigs
Total no. BACs
Mean no. BACs/contig: 19 ± 10; 21 ± 11; 17 ± 11; 24 ± 10*†
Total no. Qs
Mean no. Qs/contig: 3.8 ± 4.7; 6.0 ± 5.3*; 3.7 ± 5.7; 7.5 ± 4.6**†
Mean no. Qs/BAC/contig: 0.16 ± 0.13; 0.25 ± 0.17**; 0.15 ± 0.16; 0.31 ± 0.15**††
Total CB units
Mean CB units/contig: 344 ± 129 (407 ± 152); 378 ± 148 (447 ± 175); 324 ± 126 (383 ± 150); 415 ± 154* (491 ± 182)
Distribution of 266 FISH-positioned contigs in 26 Hessian fly chromosomal segments (A-Z)
% Relative length
No. (%) of contigs
No. (%) of BAC clones
No. (%) of Qs
contig length (% of total)
Cytogenetic data have been used previously to validate physical maps in Drosophila species [25, 26]. However, to our knowledge, this is the first time they have been used to anchor the physical map of an insect of agricultural importance. Our method was analogous to performing linkage analysis with the two most terminal BACs in the contigs. Using linkage analysis, 18 of the 19 longest contigs (95%) in the catfish HICF map were validated. Therefore, although the two approaches are not entirely comparable, our results suggest that the Hessian fly map may have a few more errors. Nevertheless, our results also clearly indicate that the Hessian fly assembly was largely valid and that most errors are associated with the contigs containing the greatest number of Qs. Moreover, they demonstrated that the Hessian fly map can be easily improved using FISH as a manual contig-editing tool.
To further evaluate the coverage of the assembled contigs, FISH was used to position 70 additional contigs. In these experiments, only one BAC was used as a probe, but we avoided invalid contigs by selecting those that had an average of only 0.1 ± 0.1 Qs per contig. These experiments brought the total number of BACs used as probes to 489 and the total number of contigs that had been FISH-mapped to 266 (Table 4). Of these contigs, 257 were assigned to single chromosome segments. Nine contigs hybridized to the pericentromeric heterochromatin of all four chromosomes, and could not be assigned to a single chromosome segment. The relative length of each chromosome segment was used as a measure of genome content so that the amount of coverage provided to each segment could be compared (Table 4). Except for the nucleolar organizing region (NOR), segment E, every chromosome segment contained at least one contig. The subtelomeric segments A, P, and Q, as well as segments B, F, and H, had a slightly greater proportion of contigs than their relative lengths. In the remaining segments, the proportion of contigs and BACs approximated relative length. Thus, the contigs appeared to be evenly distributed. Disregarding the FPC assembly errors previously discovered, and ignoring the centromeric contigs, 95,668 kb of genome length was covered by the FISH-mapped contigs. We observed no evidence suggesting that any contig was exclusively associated with the E chromosomes. This coverage therefore represents approximately 60% of the total content of the Hessian fly autosomes and X chromosomes. Moreover, the heterochromatin of these chromosomes is restricted to the centromeres. These contigs therefore clearly appeared to be distributed in the gene rich regions of these chromosomes.
The FPC assembly was developed into a publicly available WebFPC archive that shows the BAC clones that were used as probes to physically anchor each contig. Ordering the contigs that are currently grouped into chromosome segments will be one of the most immediate improvements we make to the map. The segments on the autosomes and the long arm of chromosome X1 are expected to provide sufficient resolution to order the contigs using FISH. However, morphologies of the short arm of chromosome X1 and all of chromosome X2 are problematic. We therefore expect that genetic mapping will be necessary to order the contigs in the corresponding segments.
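The two comparisons made in this paragraph, genome coverage and the evenness of contig distribution, can be expressed as a short sketch. The Python below is illustrative; the function names are hypothetical, and the example segment values in the final call are invented for demonstration and are not taken from Table 4.

```python
GENOME_KB = 158_000  # genome size quoted in the text (158 Mb)


def coverage_pct(covered_kb, genome_kb=GENOME_KB):
    """Percent of the genome covered by the FISH-mapped contigs."""
    return 100 * covered_kb / genome_kb


def representation_ratio(contigs_in_segment, total_contigs, relative_length_pct):
    """Share of contigs in a segment divided by the segment's relative length.

    A ratio near 1 means the segment holds about as many contigs as its
    length predicts; above 1 means it is over-represented.
    """
    contig_share_pct = 100 * contigs_in_segment / total_contigs
    return contig_share_pct / relative_length_pct


fish_coverage = coverage_pct(95_668)
print(f"FISH-mapped coverage: {fish_coverage:.1f}%")

# Hypothetical segment: 13 of 260 contigs in a segment spanning 5% of the length.
print(representation_ratio(13, 260, 5.0))
```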
In order to fully utilize the physical map as a resource in genetic investigations, comparative analyses, and whole genome sequence assembly, we end sequenced the 13,614 fingerprinted BAC clones that were used to generate the physical map. This resulted in 21,814 sequenced BAC ends, 4,708 paired reads, and 13,351,753 bp of high quality bases of Hessian fly genomic DNA [GenBank Trace Archive TI numbers 2136865139–2136875614 and 2136877165–2136888504]. If there were no overlap among the BAC ends, this sequence would represent 8.4% of the Hessian fly genome. The average number of bases sequenced per successfully sequenced BAC end was 764 ± 237. G/C content of all sequenced BACs was relatively low, averaging 33.4 ± 0.03%.
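The upper bound on genome representation and the G/C content are both simple ratios. The following Python sketch shows one way to compute them; the function names are our own, and ignoring ambiguous bases such as N when computing G/C content is an assumption, not a description of the pipeline used here.

```python
GENOME_BP = 158_000_000  # genome size quoted in the text (158 Mb)


def max_genome_fraction_pct(total_bp, genome_bp=GENOME_BP):
    """Percent of the genome represented if no BAC ends overlap."""
    return 100 * total_bp / genome_bp


def gc_content_pct(seq):
    """G/C content as a percent of unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100 * (seq.count("G") + seq.count("C")) / called


bes_fraction = max_genome_fraction_pct(13_351_753)
print(f"BES upper bound: {bes_fraction:.2f}% of the genome")
print(gc_content_pct("ATGCNN"))
```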
Repetitive sequence composition of Hessian fly BAC ends.
% Total repetitive
Class I transposable elements
Class II transposable elements
Simple sequence repeats
Gene families & unclassified TEs
Within the Class I TEs, LTR retrotransposons were more prominent than the non-LTR retrotransposons (Table 5). The most prominent Class II TE was the Mos1 mariner-like DNA transposable element, which accounted for 1.7% of the repetitive fraction. The remaining Class II TEs, which had sequence similarities to the Tigger2, Blackjack, Looper, MER45R, and Zaphod TEs, composed 0.24% of the repetitive sequence. A previous investigation found an abundance of mariner-like elements in Hessian fly pericentromeric heterochromatin. Therefore, it was interesting to note that the combined proportion of TEs and low-complexity sequence in the BES (2.6%) was virtually the same as the proportion of the BACs that hybridized to pericentromeric heterochromatin in the FISH experiments that anchored the physical map to the chromosomes (2.4%). These observations were consistent with a low abundance of TEs in the euchromatin of the small Hessian fly genome, and they suggested that the BACs that hybridized to heterochromatin contained one or more TEs. This in turn suggests that seven of the eleven contigs that were initially classified as "invalid-repetitive" (Table 3) would have been classified as "valid" if not for the presence of one or more TEs in the BACs used as probes.
SSRs, which have a wide range of applications, formed 0.81% of the total BES and 4.51% of the total repetitive fraction. The di- and tri-nucleotide repeats were 29.1% and 33.4% of the SSRs, respectively, and were the most frequent types of SSRs found in this analysis. The most abundant di- and tri-nucleotide motifs were (TC/GA)n and (TAA/TTA)n, which composed 49.9% of the total di-nucleotides and 43.7% of the total tri-nucleotides, respectively. The value of 'n' ranged from 10 to 56 in the (TC/GA)n motif and from 7 to 76 in the (TAA/TTA)n motif.
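To make the (motif)n notation concrete, perfect SSRs of this kind can be located with a back-referencing regular expression. The Python sketch below is an illustration, not the RepeatMasker-based procedure used in this study; the function name is hypothetical, and the minimum repeat counts in the example (10 for di-nucleotides, 7 for tri-nucleotides) are assumptions taken from the lowest values of 'n' reported above.

```python
import re


def find_ssrs(seq, motif_len, min_repeats):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Only exact repeats of a motif of length `motif_len` are reported.
    Homopolymer runs (e.g. AAAA read as (AA)n) are skipped.
    """
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # single-base run, not a di- or tri-nucleotide SSR
        hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits


# Hypothetical sequences for demonstration.
di_hits = find_ssrs("GG" + "TC" * 12 + "AAGT", motif_len=2, min_repeats=10)
tri_hits = find_ssrs("C" + "TAA" * 7 + "G", motif_len=3, min_repeats=7)
print(di_hits, tri_hits)
```

A production tool would also need to handle imperfect repeats and count a motif and its reverse complement (TC and GA) as one class, which this sketch does not attempt.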
To identify putative genic sequences in the BES, both the repeatmasked and unmasked BESs were searched against the NCBI non-redundant database using BLASTX. Of the 20,487 total unmasked BES analyzed, 4,221 had BLAST hits, of which 2,770 were mapped to gene ontology (GO) terms and 1,847 were annotated. For the repeatmasked BES dataset, 3,639 had BLAST hits, 2,374 were assigned to GO categories and 1,565 were returned with annotations. Among completely sequenced genomes, mosquitoes, which belong to the same suborder (Nematocera) as the Hessian fly, had the highest percentages of the top scoring BLASTX hits (Aedes aegypti, 17.3%; Culex quinquefasciatus, 15.7%; and Anopheles gambiae, 10.7%). Of the total number of reads that had a BLAST hit, 3,386 hit a conserved protein sequence in an insect genome. In both the repeatmasked and unmasked datasets the largest GO categories comprised genes involved in transcription and cellular communication. For example, genes involved in protein binding, nucleic acid and nucleotide binding, hydrolase activity, transferase and signal transduction activities comprised the largest GO categories (86% of the total) based on the association with gene ontology terms in the molecular function category [see Additional file 1]. Interestingly, fewer annotated sequences were observed in the repeatmasked set for almost all categories. This observation indicated that a proportion of the predicted gene products were derived from the repetitive sequences, which when masked, resulted in lower numbers of annotations. This was also consistent with the earlier observation that multigene families composed a relatively large proportion of the repetitive BES. To examine this further, we unmasked the SSRs so that they were excluded from the repeatmasked set [see Additional file 2]. Again, we observed fewer annotations with that data set than with unmasked BES. Therefore, multigene families and transcripts from transposable elements probably accounted for the observed differences between the two data sets.
The Hessian fly's small genome and polytene chromosomes provided an excellent opportunity to determine if BAC libraries constructed from heterogeneous strains would unduly limit the mapping abilities of the HICF and FPC technologies. Although less heterogeneity would have been preferred, the results have clearly improved the Hessian fly as an experimental model. A first generation map now exists that includes 266 FPC contigs containing 4,563 BAC clones and their associated BES positioned to 26 segments of the Hessian fly genome. The BES provided new knowledge regarding Hessian fly genome organization, and the mapped BES constitutes an abundant resource for the development of physically mapped DNA markers. We expect that these improvements will be made as genetic investigations focus on the discovery of avirulence genes, the genes that are involved in plant-gall formation, and the mechanisms of chromosome elimination. In the near future, we also expect the map to facilitate the assembly of whole-genome shotgun sequences and the assignment of sequenced scaffolds to chromosome segments.
Three BAC libraries were utilized in this investigation. Each library was constructed using a separate heterogeneous source of genomic DNA. The Hf library, previously described by Liu et al. (2004), was prepared from DNA isolated from a Kansas "Great Plains" (GP) population. To supplement the Hf library, we used clones derived from the MD_Bb library developed by the Clemson University Genomics Institute. Herein referred to as the "CL" library, these BACs were prepared from DNA isolated from an Indiana "L" Hessian fly population. From a third library (Mde), 78 BACs were fingerprinted. Seventy-three of these clones were present in two contigs that were previously developed by chromosome walking. These clones therefore served as an internal control while developing contigs from the fingerprint data using FPC software. The Mde library was constructed with DNA isolated from the Georgia derived "vH13" Hessian fly population as described previously. Plates and filters of all of these clones are available on a cost-recovery basis by request from the Purdue Genomics Center.
High-information content DNA fingerprints of BACs were obtained using the SNaPshot Primer Extension Kit (Applied Biosystems, CA) as described by Luo et al. (2003) with little modification. Each BAC clone was grown in a separate well of a 96-well micro-plate (QIAGEN, CA) containing 150 μl of 2× YT medium (12.5 μg/ml of chloramphenicol). BAC DNA was extracted from each well using the R.E.A.L 96-prep kit following the manufacturer's recommendations (QIAGEN, CA). A QIAvac 96 Manifold, attached to a BioRobot 3000 (QIAGEN, CA), was used to collect the lysate under vacuum (-930 mbar) through the QIAfilter into new 96-well blocks. The DNA was then precipitated with isopropanol, collected by centrifugation, and re-suspended in 40 μl of distilled (dd) H2O. The DNA concentration in four wells per plate was then determined by agarose gel electrophoresis. DNA concentration ranged from 50 to 100 ng/μl.
To fingerprint each BAC, 1 μg of each sample was transferred to a separate well of a 96-well plate. This DNA was then restricted at 37°C for 3 h with 4 units of each of five restriction endonucleases (Eco RI, Bam HI, Xba I, Xho I, and Hae III) in 50 μl of a solution containing 5 mM NaCl, 10 mM Tris-Cl (pH 7.9), 1 mM MgCl2, 0.1 mM dithiothreitol, 100 μg/μl bovine serum albumin, 0.5 μg/μl DNase free RNase A, and 0.02% β-mercaptoethanol. After the restriction digestions, 10 μl of a solution containing 0.4 μl of the SNaPshot multiplex kit, 7 μl of 33 mM Tris-Cl (pH 9.0), 50 μmoles of NaCl, and 10 μmoles of MgCl2 were added directly to the solution in each well. This mixture was incubated at 65°C for 1 h. The labeled DNA was then precipitated and air-dried. To prepare each sample for analysis, each labeled DNA pellet was suspended in 6 μl of Hi-Di Formamide (Applied Biosystems, CA) and 3 μl of ddH2O. To this solution, 0.05 μl of an internal size standard (Liz 500; Applied Biosystems, CA) was added. The DNA was then denatured at 95°C and the labeled fragments were separated and sized using an ABI 3730 DNA analyzer.
Fluorescence in situ hybridization (FISH)
Polytene chromosome preparations and in situ hybridizations were performed as previously described. Briefly, BAC DNA (1 μg) was labeled with either biotin- or digoxigenin-conjugated dUTP (Roche) by nick-translation. Hybridizations were performed overnight at 37°C under a coverslip in 10 μl of hybridization solution (10% dextran sulfate, 2× SSC, 50% deionized formamide, and 10 μg denatured salmon sperm DNA) containing 40–100 ng of denatured probe DNA. Detection was performed using Alexa Fluor 488-conjugated anti-biotin and rhodamine-conjugated anti-digoxigenin (Molecular Probes-Invitrogen). When testing for the co-localization of BACs in the same contig, both labeled BACs (one labeled with biotin and the other labeled with digoxigenin) were hybridized simultaneously to the same polytene chromosome preparations with the expectation that the probes would co-localize if the contigs were genuine. Digital images were captured using UV optics, an ORCA-ER (Hamamatsu) digital camera mounted on an Olympus BX51 microscope, and MetaMorph (Universal Imaging Corp.) imaging software.
Two BAC contigs were used to test the reliability of the FPC-based Hessian fly contigs. These contigs were constructed by chromosome walking from two markers flanking the Hessian fly Avirulence gene, vH13. The walk from marker 124 was previously described. The walk from marker 134 is described above (Figure 3). Chromosome walking experiments consisted of BAC library screens using 32P-dCTP-labeled probes prepared from markers and BAC-end DNA sequence. Probe labeling and the isolation of BAC-end DNA were performed as described previously.
BAC-end Sequence Analysis
BAC clone DNA was prepared and sequenced in an automated process using a 384-well format. BAC clones were grown in 375 μl of sterile TB supplemented with chloramphenicol (12.5 μg/ml) for 20 hours at 30°C. Alkaline lysis was used to isolate BAC DNA. Sequencing was performed in 7.5 μl volumes containing 0.5 μl of Big Dye Terminator v3.1 (Applied Biosystems), 1.8 μl of Big Dye Terminator v1.1/v3.1 5× buffer (Applied Biosystems), 3 pmoles of primer, and 100 to 250 ng of BAC DNA. Oligonucleotides T7-ZL (TAATACGACTCACTATAGGG) and BES-HR (CACTCATTAGGCACCCCA) were used to prime separate reactions from opposite ends of each BAC template. Those reactions were performed in a GeneAmp PCR System 9700 (Applied Biosystems). Using primer T7-ZL, sequencing reactions underwent 120 cycles of 96°C for 10 s, 45°C for 5 s, and 60°C for 8 min. Using primer BES-HR, sequencing reactions underwent 120 cycles of 96°C for 10 s, 45°C for 5 s, and 52°C for 8 min. The sequencing products were cleaned using ethanol precipitation and then resuspended in 15 μl of ddH2O. Sequences were then determined using a 3730XL Genetic Analyzer (Applied Biosystems).
Analysis of repetitive sequence in BES was performed using a custom repeat database combined with RepBase database version 20061006. The custom repeat database was constructed using RECON, a de novo approach to identifying repetitive sequences. PERL scripts were used to select repetitive sequences greater than 40 bp in length and present in 5 or more copies, which were then annotated using BLASTX at e = 1e-5 to the NCBI non-redundant database. The annotated repeat database was used as a custom repeat library for RepeatMasker. RepeatMasker version 3.1.9 was used in default mode with WU-BLAST as the search engine (blastp version 2.0 MP-Washington University) to mask the repetitive sequences in the BESs. A second round of RepeatMasker using RepBase update 20061006 was also run on the same set of sequences. Results from both runs were combined after manually removing the overlaps. The masked sequences were then categorized into different classes of repetitive sequences.
To identify potential genic sequences and to look for differences between the repeat-masked and unmasked BESs, we used both sets of sequences for GO analysis. The two sets of sequences were used as queries to the NCBI non-redundant database using BLASTX (e = 1e-5). The BLAST output in the XML format was imported into BLAST2GO (B2G) for GO analysis and functional annotation of gene or protein sequences. The resulting annotations were converted into 'GO-Slim' format and retrieved for the three GO categories (biological process, molecular function and cell component) with an alpha score of at least 0.6 and an ontology depth level of 3.
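The selection step applied to the de novo repeat families (longer than 40 bp and present in five or more copies) amounts to a two-condition filter. The original scripts were written in PERL and are not reproduced here; the Python below is a minimal sketch of the same criteria, with a hypothetical `RepeatFamily` record and invented example data.

```python
from dataclasses import dataclass


@dataclass
class RepeatFamily:
    """Hypothetical record for one de novo repeat family."""
    name: str
    consensus: str  # consensus sequence of the family
    copies: int     # number of copies found in the BES


def select_repeats(families, min_len=40, min_copies=5):
    """Keep families longer than `min_len` bp with at least `min_copies` copies."""
    return [
        fam for fam in families
        if len(fam.consensus) > min_len and fam.copies >= min_copies
    ]


# Invented example families for demonstration.
families = [
    RepeatFamily("fam1", "ACGT" * 20, 12),  # 80 bp, 12 copies: kept
    RepeatFamily("fam2", "ACGT" * 10, 30),  # 40 bp: too short (not > 40)
    RepeatFamily("fam3", "ACGT" * 25, 4),   # only 4 copies: dropped
]
kept = select_repeats(families)
print([fam.name for fam in kept])
```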
The authors would like to thank Sue Cambron (USDA-ARS) for material contributions and Alison Witt, Phillip San Miguel, and Rick Westerman, of the Purdue Genomics Center, for technical and bioinformatics assistance. The authors would also like to thank William M Nelson, Jeff Tomkins, Catherine Hill, Scott Jackson, Shannon Schlueter, Jessica Schlueter, and Brian Abernathy for valuable contributions and assistance. This work was supported by the USDA Cooperative State Research Education and Extension Service National Research Initiative Grant #2004-03099 and the Indiana 21st Century Research and Technology Fund Grant 042700-0207.
- Harris MO, Stuart JJ, Mohan M, Nair S, Lamb RJ, Rohfritsch O: Grasses and gall midges: Plant defense and insect adaptation. Annu Rev Entomol. 2003, 48: 549-577. 10.1146/annurev.ento.48.091801.112559.View ArticlePubMedGoogle Scholar
- Hatchett JH, Gallun RL: Genetics of the ability of the Hessian fly, Mayetiola destructor, to survive on wheats having different genes for resistance. Ann Entomol Soc Am. 1970, 63 (5): 1400-1407.View ArticleGoogle Scholar
- Bantock C: Cytology: Chromosome elimination in Cecidomyiidae. Nature. 1961, 190: 466-467. 10.1038/190466a0.View ArticleGoogle Scholar
- Bantock CR: Experiments on chromosome elimination in the gall midge, Mayetiola destructor. J Embryol Exp Morph. 1970, 24 (2): 257-286.PubMedGoogle Scholar
- White MJD: Animal Cytology and Evolution. Cambridge university press. 1973, 3: 516-546.Google Scholar
- Stuart JJ, Hatchett JH: Morphogenesis and cytology of the salivary gland of the Hessian fly, Mayetiola destructor (Diptera: Cecidomyiidae). Ann Entomol Soc Am. 1987, 80 (4): 475-482.View ArticleGoogle Scholar
- Johnston JS, Ross LD, Bean L, Hughes DP, Kathirithamby J: Tiny genomes and endoreduplication in Strepsiptera. Insect Mol Biol. 2004, 13 (6): 581-585. 10.1111/j.0962-1075.2004.00514.x.View ArticlePubMedGoogle Scholar
- Behura SK, Valicente FH, SD Rider J, Chen MS, Jackson S, Stuart JJ: A physically anchored genetic map and linkage to avirulence reveal recombination suppression over the proximal region of Hessian fly chromosome A2. Genetics. 2004, 167: 343-355. 10.1534/genetics.167.1.343.PubMed CentralView ArticlePubMedGoogle Scholar
- Lobo NF, Behura SK, Aggarwal R, Chen M-S, Hill CA, Collins FH, Stuart JJ: Genomic analysis of a 1 Mb region near the telomere of Hessian fly chromosome X2 and avirulence gene vH13. BMC Genomics. 2006, 7: 7-10.1186/1471-2164-7-7.PubMed CentralView ArticlePubMedGoogle Scholar
- Chen MS, Fellers JP, Zhu YC, Stuart JJ, Hulbert S, El-Bouhssini M, Liu X: A super-family of genes coding for secreted salivary gland proteins from the Hessian fly, Mayetiola destructor. J Insect Sci. 2006, 6: 12-10.1673/2006.06.12.1.PubMed CentralView ArticleGoogle Scholar
- Chen M-S, Zhao H-X, Zhu YC, Scheffler B, Liu X, Liu X, Hulbert S, Stuart JJ: Analysis of transcripts and proteins expressed in the salivary glands of Hessian fly (Mayetiola destructor) larvae. J Insect Phys. 2008, 54: 1-16. 10.1016/j.jinsphys.2007.07.007.
- Stuart JJ, Hatchett JH: Cytogenetics of the Hessian fly, Mayetiola destructor (Say): II. Inheritance and behavior of somatic and germ-line-limited chromosomes. J Hered. 1988, 79: 190-199.
- Stuart JJ, Hatchett JH: Genetics of sex determination in the Hessian fly, Mayetiola destructor. J Hered. 1991, 82 (1): 43-52.
- Luo M-C, Thomas C, You FM, Hsiao J, Ouyang S, Buell CR, Malandro M, McGuire PE, Anderson OD, Dvorak J: High-throughput fingerprinting of bacterial artificial chromosomes using the SNaPshot labeling kit and sizing of restriction fragments by capillary electrophoresis. Genomics. 2003, 82: 378-389. 10.1016/S0888-7543(03)00128-9.
- Nelson WM, Bharti AK, Butler E, Wei F, Fuks G, Kim H, Wing RA, Messing J, Soderlund C: Whole-genome validation of high-information-content fingerprinting. Plant Physiol. 2005, 139: 27-38. 10.1104/pp.105.061978.
- Katagiri T, Kidd C, Tomasino E, Davis JT, Wishon C, Stern JE, Carleton KL, Howe AE, Kocher TD: A BAC-based physical map of the Nile tilapia genome. BMC Genomics. 2005, 6: 89. 10.1186/1471-2164-6-89.
- Quiniou SM-A, Waldbieser GC, Duke MV: A first generation BAC-based physical map of the channel catfish genome. BMC Genomics. 2007, 8: 40. 10.1186/1471-2164-8-40.
- Mun J-H, Kwon S-J, Yang T-J, Kim H-S, Jin M, Kim JA, Lim M-H, Lee SI, Kim H-I, Kim H, et al: The first generation of a BAC-based physical map of Brassica rapa. BMC Genomics. 2008, 9: 280. 10.1186/1471-2164-9-280.
- Soderlund C, Longden I, Mott R: FPC: a system of building contigs from restriction fingerprinted clones. Comput Appl Biosci. 1997, 13: 523-535.
- Soderlund C, Humphray S, Dunham A, French L: Contigs built with fingerprints, markers, and FPC v4.7. Genome Res. 2000, 10: 1772-1787. 10.1101/gr.GR-1375R.
- WebFPC: Hfly. [http://genome.purdue.edu/WebAGCoL/Hfly/WebFPC/]
- Tao Q, Chang Y-L, Wang J, Chen H, Islam-Faridi MN, Scheuring C, Wang B, Stelly DM, Zhang H-B: Bacterial artificial chromosome-based physical map of the rice genome constructed by restriction fingerprint analysis. Genetics. 2001, 158: 1711-1724.
- Zhang H-B, Wu C: BAC as tools for genome sequencing. Plant Physiol Biochem. 2001, 39: 195-209. 10.1016/S0981-9428(00)01236-5.
- Meyers BC, Scalabrin S, Morgante M: Mapping and sequencing complex genomes: Let's get physical. Nature Rev Genet. 2004, 5: 578-589. 10.1038/nrg1404.
- Hoskins RA, Nelson CR, Berman BP, Laverty TR, George RA, Ciesiolka L, Naeemuddin M, Arenson AD, Durbin J, David RG, et al: A BAC-based physical map of the major autosomes of Drosophila melanogaster. Science. 2000, 287: 2271-2274. 10.1126/science.287.5461.2271.
- González J, Nefedov M, Bosdet I, Casals F, Calvete O, Delprat A, Shin H, Chiu R, Mathewson C, Wye N, et al: A BAC-based physical map of the Drosophila buzzatii genome. Genome Res. 2005, 15: 885-892. 10.1101/gr.3263105.
- Stuart JJ, Hatchett JH: Cytogenetics of the Hessian fly: I. Mitotic karyotype analysis and polytene chromosome correlations. J Hered. 1988, 79: 184-189.
- Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J: Repbase Update, a database of eukaryotic repetitive elements. Cytogenetic and Genome Research. 2005, 110: 462-467. 10.1159/000084979.
- Chen MS, Fellers JP, Stuart JJ, Reese JC, Liu XM: A group of unrelated cDNAs encoding secreted proteins from Hessian fly [Mayetiola destructor (Say)] salivary glands. Insect Mol Biol. 2004, 13: 101-108. 10.1111/j.1365-2583.2004.00465.x.
- Shukle RH, Russell VW: Mariner transposase-like sequences from the Hessian fly, Mayetiola destructor. J Hered. 1995, 86: 364-368.
- Ellegren H: Microsatellites: simple sequences with complex evolution. Nature Rev Genet. 2004, 5: 435-445. 10.1038/nrg1348.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Molec Biol. 1990, 215: 403-410.
- Liu XM, Fellers JP, Wilde GE, Stuart JJ, Chen MS: Characterization of two genes expressed in the salivary glands of the Hessian fly [Mayetiola destructor (Say)]. Insect Biochem Mol Biol. 2004, 34 (3): 229-237. 10.1016/j.ibmb.2003.10.008.
- Clemson University Genomics Institute Homepage. [https://www.genome.clemson.edu]
- Purdue Genomics Center Homepage. [http://www.genomics.purdue.edu]
- Shukle RH, Stuart JJ: Physical mapping of DNA sequences in the Hessian fly, Mayetiola destructor. J Hered. 1995, 86: 1-5.
- Bao Z, Eddy SR: Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 2002, 12: 1269-1276. 10.1101/gr.88502.
- Institute for Systems Biology: Repeat Masker. [http://www.repeatmasker.org/]
- Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21 (18): 3674-3676. 10.1093/bioinformatics/bti610.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.