A BAC-based physical map of the Hessian fly genome anchored to polytene chromosomes

Background The Hessian fly (Mayetiola destructor) is an important insect pest of wheat. It has tractable genetics, polytene chromosomes, and a small genome (158 Mb). Investigation of the Hessian fly presents excellent opportunities to study plant-insect interactions and the molecular mechanisms underlying genome imprinting and chromosome elimination. A physical map is needed to improve the ability to perform both positional cloning and comparative genomic analyses with the fully sequenced genomes of other dipteran species. Results An FPC-based genome wide physical map of the Hessian fly was constructed and anchored to the insect's polytene chromosomes. Bacterial artificial chromosome (BAC) clones corresponding to 12-fold coverage of the Hessian fly genome were fingerprinted, using high information content fingerprinting (HIFC) methodology, and end-sequenced. Fluorescence in situ hybridization (FISH) co-localized two BAC clones from each of the 196 longest contigs on the polytene chromosomes. An additional 70 contigs were positioned using a single FISH probe. The 266 FISH mapped contigs were evenly distributed and covered 60% of the genome (95,668 kb). The ends of the fingerprinted BACs were then sequenced to develop the capacity to create sequenced tagged site (STS) markers on the BACs in the map. Only 3.64% of the BAC-end sequence was composed of transposable elements, helicases, ribosomal repeats, simple sequence repeats, and sequences of low complexity. A relatively large fraction (14.27%) of the BES was comprised of multi-copy gene sequences. Nearly 1% of the end sequence was composed of simple sequence repeats (SSRs). Conclusion This physical map provides the foundation for high-resolution genetic mapping, map-based cloning, and assembly of complete genome sequencing data. The results indicate that restriction fragment length heterogeneity in BAC libraries used to construct physical maps lower the length and the depth of the contigs, but is not an absolute barrier to the successful application of the technology. This map will serve as a genomic resource for accelerating gene discovery, genome sequencing, and the assembly of BAC sequences. The Hessian fly BAC-clone assembly, and the names and positions of the BAC clones used in the FISH experiments are publically available at .


Background
The Hessian fly (Mayetiola destructor) is an important pest of wheat (Triticum spp.) and a member of one of the largest and most economically important families of insects, the gall midges (Cecidomyiidae). Its genetic tractability and short generation time (~28 days) make it especially attractive as an experimental model for investigating insect-plant interactions [1]. This capacity was first demonstrated when it was the first insect shown to have a gene-for-gene interaction with its host plant [2]. Previously, it was used to study chromosome elimination and the function of the germ-line-limited "E" chromosomes that characterize all gall midge species [3][4][5]. Later investigations demonstrated that it has other biological attributes that are compatible with genomic analyses, including polytene chromosomes in the larval salivary glands [6] and a small genome (158 Mb) [7]. The development of a physically anchored genetic map [8], a syntenic analysis of a BAC-based contig [9], and transcriptomic analyses of the first instar salivary glands [10,11] also demonstrated the potential of this insect for comparative genomics with other dipteran species.
Improved genomic maps are essential to further develop this insect as an experimental organism. However, certain biological characteristics make fine-scale genetic mapping in the Hessian fly problematic. The first problem is the relatively low number of offspring produced by single females, typically ranging from 50 to 200, which limits the resolution of a genetic map. The second problem is the small size (2 to 3 mm in length) of the insect. This severely limits the yield of the genomic DNA extracted from individuals, and the number of molecular markers that can be genetically mapped. A third problem, the insect's unusual chromosome cycle and mechanism of sex determination, makes the construction of inbred strains difficult. Because this phenomenon is both central to the genetics of the insect and worthy of genomic investigation, it is briefly described below.
The germ line of the Hessian fly contains both maternally and paternally derived copies of each of two autosomes (A1 and A2) and two X chromosomes (X1 and X2) [12]. It also contains a variable number (~32) of maternally inherited germ-line-limited "E" chromosomes (A1A2X1X2/A1A2X1X2; E). The E chromosomes are eliminated from all future somatic cells during the fifth cleavage division of embryogenesis [4]. This post-zygotic division also determines sex. Embryos that eliminate the paternally inherited X chromosomes, together with the E chromosomes, possess a male determining somatic karyotype (A1A2X1X2/A1A2OO) and develop as males. Those that retain both sets of X chromosomes possess a female determining somatic karyotype (A1A2X1X2/A1A2X1X2) and develop as females. Maternal genotype determines whether the paternally derived X chromosomes are eliminated or retained in the soma [13]. As a consequence, an unusual mating structure is established that is resistant to inbreeding; females produce either all-female or all-male families. Chromosome elimination also occurs during spermatogenesis. The result is that all sperm carry only the maternally derived autosomes and X chromosomes (A1A2X1X2). Genetic recombination between the homologous autosomes and X chromosomes occurs only during meiosis of oogenesis. All ova carry a single copy of each autosome and each X chromosome and a full complement (~32) of E chromosomes (A1A2X1X2; E). There is no heterogametic sex since all sperm and all ova are genetically equivalent and sex is determined after fertilization.
In contrast to the problems that complicate genetic analyses, the Hessian fly polytene chromosomes make physical mapping relatively simple. Although they lack the banding resolution of Drosophila polytene chromosomes, they still provide a relatively high mapping resolution when bacterial artificial chromosomes (BACs) are used as probes for fluorescence in situ hybridization (FISH) [8]. This facility suggested that the development of a dense physical map of the Hessian fly's polytene chromosomes should precede high-resolution genetic mapping. We therefore applied the high-information content fingerprinting (HICF) method developed by Luo et al. [14] to construct a BAC contig map using three Hessian fly BAC libraries (Table 1). The result was a first generation FPCbased physical map of the Hessian fly genome that is anchored to the polytene chromosomes. It is based on restriction fingerprints of 13,614 BAC clones. The largest 266 contigs were positioned on the chromosomes by fluorescence in situ hybridization (FISH). These contigs contain 4,563 BAC clones, and cover ~60% of the genome. The BAC end sequences (BES) of the clones used in this analysis were determined to provide the capacity to develop sequenced tagged site (STS) markers evenly distributed over the genome. BAC-end sequence data were also analyzed to improve our understanding of Hessian fly genome organization.

BAC fingerprinting
We subjected 13,614 BACs to high-information content fingerprinting ( Table 2). The DNA fingerprint profile of each BAC clone was then converted into an FPC compatible format using the FP Miner 1.1 software (BioinforSoft LLC, OR). The standard deviation associated with the size of two vector bands present in the fingerprints of each BAC (157.4 ± 0.01 bp and 369.5 ± 0.02 bp) indicated that fragment size reproducibility was high [15]. The FP Miner software was used to remove background signals and vector bands, and to assign a quality score to each BAC DNA fingerprint. We examined all of the fingerprints and removed 1662 that failed to meet each of the following quality standards: 1) 30 to 250 fragments, 2) = 10 fragments of each of the four colors, 3) a size standard match quality score that was >0.8, and 4) a fingerprint editing quality score that was >10. The fingerprints of 11,952 BACs (88% of the total) remained available for FPC assembly. The percentage of successfully fingerprinted clones compared favorably with the HICF maps developed for maize (Zea mays; 87%) [15], Nile tilapia (Oreochromis niloticus; 87%) [16], channel catfish (Ictalurus punctatus; 91.5%) [17], and Brassica rapa (94.2%) [18]. DNA fragment sizes ranging from 50 to 500 bp were imported into the FPC v8.5.1 [19,20] to build contigs. Among the successfully fingerprinted clones, 5,766 were derived from the Hf library, 6,108 were derived from the CL library, and 78 were derived from the Mde library. BAC clones from the Hf library had an average insert size of 150 kb and the CL library had an average insert size of 130 kb. The average number of scored bands per clone in the Hf library (90.6) and the CL library (87.6) was slightly less than the average observed during the construction of the maize map (107) [15], but slightly greater than the averages of the two BAC libraries used to construct the Nile tilapia map (54 and 70) [16]. Combined, the BACs in the Hf and CL libraries provided an estimated 10.5-fold cov-erage of the Hessian fly genome (Table 1). This was greater coverage than that used to construct either the Nile tilapia (5.6-fold) [16] or the catfish (6.8-fold) [17] maps, but considerably less than the coverage used to construct the maps of B. rapa (15.2-fold) [18] and maize (22-fold) [15].

Contig assembly
Contigs were assembled from the fingerprint data using the computer program FPC version 8.5.1 [15,19,20]. FPC parameters were adjusted for the HICF technique as described by Lou et al. [14] and Nelson [15]. In each assembly, we used a tolerance (5) that restricted two bands with lengths greater than 0.5 bp from being considered the same band. In the initial assembly, we used a cutoff (the threshold for the probability score that matching bands are a coincidence) of 1e-35. Using these settings, FPC built 1258 contigs containing from 2 to 126 BAC clones per contig. The DQer function was then used to identify contigs with ≥ 10% questionable clones (Qs). These contigs were then gradually split using three more stringent cutoffs (1e-38, 1e-41, and 1e-44). This produced 1477 contigs containing from 2 to 67 BAC clones per contig. The assembly was then end merged at a cutoff of 1e-29. This produced the first generation HICF Hessian fly map, which consisted of 1377 contigs, containing 7,716 BACs, and 4,236 (35%) singletons. The map had an average of 5.6 BACs per contig and the largest contig contained 73 BACs (Table 2). FPC analysis estimated that the clones used to produce the map had 1,061,478 unique bands, averaging 88.8 bands per clone. We used the known lengths of 23 BAC clones and the total number of labeled fragments for each of these clones to estimate average band size (1182 bp). Using this value, the assembled contigs had an estimated length of 283,555 kb, providing approximately 1.8-fold coverage of total genome length. The average contig length was 206 kb and the longest contig covered 1166 kb of the genome. FPC identified a total of 441 questionable clones (Qs) in the assembly. There were 88 contigs with >10% Qs. However, the average number of Qs per contig was only 0.32. Furthermore, 1212 contigs (88%) had no Qs, no contig had  In addition, the percentage of Qs in the Hessian fly assembly (0.32) was lower than that observed in maize (11.0) [15], B. rapa (15.0), catfish (7.3), and Nile tilapia (9.6). Thus, it appeared that the Hessian fly contigs provided reasonable coverage with few questionable clones. This suggested that an abundance of RFLPs is not an absolute barrier to the construction of a HICF-based physical map.
Because the BAC clones were largely derived from two different libraries (Table 1), we performed an FPC assembly of each library separately using the same parameters that were used to assemble both libraries combined. The BACs in the CL library assembled into a greater number of contigs (888 vs. 795) with a greater average number of BACs per contig (4.74 vs. 3.78) than the BACs in the Hf library. In addition, the percentage of singletons in the CL assembly (31%) was fewer than the singletons in the Hf assembly (48%), but the number of contigs with >10% Qs was nearly the same (CL = 33 and Hf = 30). Thus, it appeared that if RFLPs were interfering with contig assembly, they were more abundant in the Hf library. Interestingly, the sum of the total number of contigs assembled using the CL (888) and Hf BACs (793) separately was only 22% greater than the total number of BACs in the combined assembly (1377), and the sum of the coverage provided by the CL (159,451 kb) and Hf BACs (149,099 kb) was only 9% greater than that of the combined assembly (283,555 kb). Thus, it appeared as if the contigs of one library only occasionally overlapped with the contigs of the other library and this lowered contig size more than total coverage when the libraries were combined. This possibility was also evident in the contigs of the FPC map where CL BACs tended to stack with other CL BACs and Hf BACs tended to stack with other Hf BACs [21]. It is also consistent with the suggestion that using libraries prepared with different restriction enzymes reduces the number of gaps in the assembly [22][23][24].

Contig validation
Two different approaches were used to validate contigs in the first generation Hessian fly HICF map. The first examined the FPC assembly of two contigs that had been constructed by chromosome walking (Figure 1). This was the only region of the Hessian fly genome in which chromosome walking had been previously performed. The construction and composition of one walk (walk-124) was described previously [9]. It consisted of 64 BAC clones covering a 7.5-cM genetic distance that was marked with four STS markers and a physical distance of nearly 1 Mb. The other chromosome walk began from STS marker 134 (walk-134) and was composed of 60 BAC clones. Together, the contigs constructed by walk-124 and walk-134 contained 73 Mde BAC clones that were analyzed by FPC. The 134-walk was deeper and contained a greater proportion of Hf clones. Neither chromosome walk contained clones from the CL library because that library had not yet been prepared when the walking experiments were performed. Only 5 clones in the 124-walk were discovered in the contigs of the FPC assembly ( Figure 1). Those clones were present in 3 small contigs (contigs #837, #938, and #1176) containing a combined total of 7 clones. FPC assembly of the 134-walk was more successful. FPC assembled 52 (87%) of the clones in the 134walk into six contigs ( Figure 1). The majority of those clones were assembled into two contigs: one (contig #12) containing 24 134-walk clones and the other (contig #56) containing 19 134-walk clones. An additional 28 CL clones were identified in these contigs. Reducing the stringency of FPC assembly (cutoff of 1e-15) merged all six contigs associated with walk-134 into a single contig. However, the assembly of the 124-walk was not improved. This analysis suggested that, as expected, regions of the genome with relatively deep coverage (walk-134) were well represented in the map whereas regions with less depth (walk-124) were poorly represented.
The second approach of assessing contig validity was to test the continuity of the 196 longest FPC assembled contigs using FISH. In these experiments, a BAC clone at one extreme of each FPC assembled contig was labeled with biotin and a BAC clone at the opposite extreme was labeled with digoxigenin. Separate FISH experiments were BAC contigs established by chromosome walking performed for each contig, in which both BAC clones were hybridized to the same polytene chromosome preparations ( Figure 2). The FPC assembly was judged "valid" when both probes from the same contig co-localized on the Hessian fly polytene chromosomes ( Table 3). The contigs examined contained from 9 to 73 BAC clones per contig and ranged in length from 172 to 1157 kb. Using the banding patterns of the four polytene Hessian fly chromosomes, we divided the Hessian fly genome into 26 chromosomal segments (Figure 2), and determined which segments contained each valid contig (Table 4). Among the 196 contigs tested, 169 contigs (86%) were scored as valid (Table 3). In 63 of these valid contigs (32% of the total) it was possible to discern which of the two BAC clones was the most proximal ( Figure 2). It was, therefore, possible to suggest the orientation of those 63 contigs on the chromosomes. There were 27 contigs (14%) that were determined to be "invalid." These had a greater number of Qs than the valid contigs (Table 3). Moreover, they were easily separated into two groups: A "repetitive" group of 11 contigs, and a "non-repetitive" group of 16 contigs. The contigs in the repetitive group had at least one probe that hybridized to multiple locations ( Figure 3). Seven of these had a probe that hybridized to pericentromeric heterochromatin. The nonrepetitive group had probes that hybridized to different, but unique, genomic locations. This group of contigs had a greater number of Qs per contig than both the valid group of contigs and the invalid-repetitive group of contigs (Table 3). We reexamined all invalid contigs by performing FISH with additional BAC clones selected from the same contigs. In each experiment, these clones hybridized in the same position as one of the previous BAC clones ( Figure 3). These data were then used to assign the unreliable contigs to a chromosomal segment (Table 4).
Using cytogenetic data to validate a physical map has been performed previously in Drosophila species [25,26]. However, to our knowledge, this is the first time it has been used to anchor the physical map of an insect of agricultural importance. Our method was analogous to performing linkage analysis with the two most terminal BACs in the contigs. Using that approach, 18 of the 19 longest contigs (95%) in the catfish HICF map were validated [17]. Therefore, although the two approaches are not entirely comparable, our results suggest that the Hessian fly map may have a few more errors. Nevertheless, our results also clearly indicate that the Hessian fly assembly was largely valid and that most errors are associated with the contigs containing the greatest number of Qs. Moreover, they demonstrated that the Hessian fly map can be easily improved using FISH as a manual contig-editing tool.

Genome coverage
To further evaluate the coverage of the assembled contigs, FISH was used to position 70 additional contigs. In these experiments, only one BAC was used as probe, but we avoided invalid contigs by selecting those that had an average of only 0.1 ± 0.1 Qs per contig. These experiments brought the total number of BACs used as probes to 489 and the total number of contigs that had been FISHmapped to 266 (Table 4). Of these contigs, 257 were assigned to single chromosome segments. Nine contigs hybridized to the pericentromeric heterochromatin of all four chromosomes, and could not be assigned to a single chromosome segment. The relative length of each chromosome segment was used as a measure of genome content so that the amount of coverage provided to each segment could be compared (Table 4). Except for the nucleolar organizing region (NOR), segment E, every chromosome segment contained at least one contig. The Contigs in which the probes co-localized were assessed as valid. Contigs in which the probes failed to co-localize were assessed as invalid. Invalid contigs were further divided into contigs that either had one or more repetitive BAC probes (Invalid-R) or no repetitive BAC probes (Invalid-NR). *Means significantly different from the Valid mean (Student's t; p < 0.10). **Means significantly different from the Valid mean (Student's t; p < 0.05). † Means significantly different from the Invalid-R mean (Student's t; p < 0.10). † † Means significantly different from the Invalid-R mean (Student's t; p < 0.05).
subtelomeric segments A, P, and Q, as well as segments B, F, and H, had a slightly greater proportion of contigs than their relative lengths. In the remaining segments, the proportion of contigs and BACs approximated relative length. Thus, the contigs appeared to be evenly distributed. Disregarding the FPC assembly errors previously discovered, and ignoring the centromeric contigs, 95,668 kb of genome length was covered by the FISH mapped contigs. We observed no evidence suggesting that any contig was exclusively associated with the E chromosomes. This coverage therefore represents approximately 60% of the total content of the Hessian fly autosomes and X chromosomes. Moreover, the heterochromatin of these chromosomes is restricted to the centromeres [27]. These contigs therefore clearly appeared to be distributed in the gene rich regions of these chromosomes.
The FPC assembly was developed into a publically available WebFPC archive that shows the BAC clones that were used as probes to physically anchor each contig [21].
Ordering the contigs that are currently grouped into chromosome segments will be one of the most immediate improvements we make to the map. The segments on the autosomes and the long arm of chromosome X1 are expected to provide sufficient resolution to order the contigs using FISH. However, morphologies of the short arm of chromosome X1 and all of chromosome X2 are problematic. We therefore expect that genetic mapping will be necessary to order the contigs in the corresponding segments.

BAC-end sequence
In order to fully utilize the physical map as a resource in genetic investigations, comparative analyses, and whole genome sequence assembly, we end sequenced the 13  Hessian fly polytene chromosomes and BAC clone co-locali-zations Figure 2 Hessian fly polytene chromosomes and BAC clone co-localizations. Chromosome segments (A-Z) are indicated on geimsa stained chromosomes (a, f, k, and p). White bars (10 μm) indicate the relative lengths of corresponding DAPI stained chromosomes (b, g, l, and q). Four examples of co-localizing BAC clones are shown in color overlay images (e, j, o, and t). Contig #320 BAC clones CL4h4 (c) and CL31e21 (d) hybridized to chromosome A1 segment G (e). Contig #199 BAC clones Hf30g15 (h) and Hf4e2 (i) hybridized to chromosome A2 segment L (j). Note that Hf30g15 hybridization (green) was more distal than Hf4e2 hybridization (red), allowing contig orientation. Contig #320 BAC clones CL25e22 (m) and Hf10j14 (n) hybridized to chromosome X1 segment U (o). Contig #320 BAC clones CL29m24 (r) and Hf10j14 (s) hybridized to chromosome X2 segment X (t).
Resolving "invalid" contigs using FISH Figure 3 Resolving "invalid" contigs using FISH. (a) The hybridization of BACs CL24B4 (green arrow, A2 segment L) and Hf15M12 (red arrows, pericentromeric heterochromatin) defined contig #36 as invalid and repetitive. (b) The hybridization of a third contig #36 BAC (CL24N4, yellow arrow) resolved the position of contig #36 to A2 segment L. (c) The hybridization of BACs CL17J4 (green arrow, A2 segment L) and Hf7B18 (red arrows, A1 segment J) defined contig #234 as invalid but non-repetitive. (d) Hybridization of a third contig #234 BAC clone (Hf13K2, yellow arrow) with BAC clone CL17J4 resolved the position of contig #234 to A2 segment L.
A combination of a custom repeat library and RepBase database [28] was used to identify repetitive DNA sequence. Because the BACs were derived from heterogeneous strains, this analysis required parameters that limit the possibility of confusing alleles with gene families. Therefore, only repetitive sequences that were >40 bp and present in = 5 copies were selected. These criteria made identification of gene families more likely, given the low coverage of BES. Using these settings, we found that 17.91% of the BES contained repetitive elements (Table  5). There was, however, a 1.48% overestimate of repetitive content due to the difficulty of assigning boundaries to different repeat classes. Transposable Elements (TEs) composed only 4.53% of the repetitive DNA and 0.81% of total BES (Table 5). Combined, Helicases, ribosomal DNA (including tRNA, rRNA and 60S and 40S ribosomal proteins), simple sequence repeats (SSRs), and sequence of low complexity composed 15.79% of the repetitive fraction. The largest fraction of repetitive DNA (79.68%) consisted of multigene families and unclassified TEs. This observation was consistent with previous investigations of the Hessian fly salivary gland transcriptome [10,11,29] that discovered over 200 gene families of small putatively secreted proteins with no sequence similarities to other genes in GenBank. In the BES, the most prominent multigene families identified appeared to be tandem copies of the calponin homology domain (an actin binding domain), thrombospondin type1 repeats (which bind and activate TGF-beta), U1 small nuclear ribonucleoprotein C proteins (involved in mRNA splicing), and SMChinge family genes (involved in the structural maintenance of chromosomes).
Within the Class I TEs, LTR retrotransposons were more prominent than the non-LTR retrotransposons ( Table 5). The most prominent Class II TE was the Mos1 mariner-like DNA transposable element, which accounted for 1.7% of the repetitive fraction. The remaining Class II TEs composed 0.24% of the repetitive sequence, which had sequence similarities to the Tigger2, Blackjack, Looper, MER45R, and Zaphod TEs. Previous investigation found an abundance of mariner-like elements in Hessian fly pericentromeric heterochromatin [30]. Therefore, it was interesting to note that the combined proportions of TEs and low-complexity sequence in the BES (2.6%) was virtually the same as the proportion of the BACs that hybridized to pericentromeric heterochromatin in the FISH experiments that anchored the physical map to the chromosomes (2.4%). These observations were consistent with a low abundance of TEs in the euchromatin of the small Hessian fly genome, and they suggested that the BACs that hybridized to heterochromatin contained one or more TEs. This in turn suggests that seven of the eleven contigs that were initially classified as "invalid-repetitive" (Table 3) would have been classified as "valid" if not for the presence of one or more TEs in the BACs used as probe.
SSRs, which have a wide range of applications [31], formed 0.81% of the total BES and 4.51% of the total repetitive fraction. The di-and tri-nucleotide repeats were 29.1% and 33.4% of the SSRs, respectively, and were the most frequent types of SSRs found in this analysis. The most abundant di-and tri-nucleotide motifs were (TC/ GA)n and (TAA/TTA)n, which composed 49.9% of the total di-nucleotides and 43.7% of the total tri-nucleotides, respectively. The value of 'n' ranged from 10 to 56 in the (TC/GA)n motif and from 7 to 76 in the (TAA/TTA)n motif.
To identify putative genic sequences in the BES, both the repeatmasked and unmasked BESs were searched against the NCBI non-redundant database using BLASTX [32]. Interestingly, fewer annotated sequences were observed in the repeatmasked set for almost all categories. This observation indicated that a proportion of the predicted gene products were derived from the repetitive sequences, which when masked, resulted in lower numbers of annotations. This was also consistent with the earlier observation that multigene families composed a relatively large proportion of the repetitive BES. To examine this further, we unmasked the SSRs so that they were excluded from the repeatmasked set [see Additional file 2]. Again, we observed fewer annotations with that data set than with unmasked BES. Therefore, multigene families and transcripts from transposable elements probably accounted for the observed differences between the two data sets.

Conclusion
Hessian fly's small genome and the polytene chromosomes provided an excellent opportunity to determine if BAC libraries constructed from heterogeneous strains would unduly limit the mapping abilities of the HICF and FPC technologies. Although less heterogeneity would have been preferred, the results have clearly improved the Hessian fly as an experimental model. A first generation map now exists that includes 266 FPC contigs containing 4,563 BAC clones and their associated BES positioned to 26 segments of the Hessian fly genome. The BES provided new knowledge regarding Hessian fly genome organization, and the mapped BES constitutes an abundant resource for the development of physically mapped DNA markers. We expect that these improvements will be made as genetic investigations focus on the discovery of avirulence genes, the genes that are involved in plant-gall formation, and the mechanisms of chromosome elimination. In the near future, we also expect the map to facilitate the assembly of whole-genome shotgun sequences and the assignment of sequenced scaffolds to chromosome segments.

BAC libraries
Three BAC libraries were utilized in this investigation. Each library was constructed using a separate heterogeneous source of genomic DNA. The Hf library, previously described by Liu et al. 2004 [33], was prepared from DNA isolated from a Kansas "Great Plains" (GP) population. To supplement the Hf library, we used clones derived from the MD_Bb library developed by the Clemson University Genomics Institute [34]. Herein referred to as the "CL" library, these BACs were prepared from DNA isolated from an Indiana "L" Hessian fly population. From a third library (Mde), 78 BACs were fingerprinted. Seventy-three of these clones were present in two contigs that were previously developed by chromosome walking [9]. These clones therefore served as an internal control while developing contigs from the fingerprint data using FPC software [19]. The Mde library was constructed with DNA isolated from the Georgia derived "vH13" Hessian fly population as described previously [8]. Plates and filters of all of these clones are available on a cost-recovery basis by request from the Purdue Genomics Center [35].
Detection was performed using Alexa Fluor 488-conjugated anti-biotin and rhodamine-conjugated anti-digoxigenin (Molecular Probes-Invitrogen). When testing for the co-localization of BACs in the same contig, both labeled BACs (one labeled with biotin and the other labeled with digoxigenin) were hybridized simultaneously to the same polytene chromosome preparations with the expectation that the probes would co-localize if the contigs were genuine. Digital images were taken using UV optics on an ORCA-ER (Hammanmatsu) digital camera mounted on an Olympus BX51 microscope, and Met-aMorph (Universal Imaging Corp.) imaging software.

Chromosome walking
Two BAC contigs were used to test the reliability of the FPC-based Hessian fly contigs. These contigs were constructed by chromosome walking from two markers flanking the Hessian fly Avirulence gene, vH13. The walk from marker 124 was previously described [9]. The walk from marker 134 is described above (Figure 3). Chromosome walking experiments consisted of BAC library screens using 32 P-dCTP-labeled probes prepared from markers and BAC-end DNA sequence. Probe labeling and the isolation of BAC-end DNA were performed as described previously [9].

BAC-end Sequence Analysis
BAC clone DNA was prepared and sequenced in an automated process using a 384-well format. BAC clones were grown in 375 μl of sterile TB supplemented with chloramphenicol (12.5 μg/ml) for 20 hours at 30°C. Alkaline lysis was used to isolate BAC DNA. Sequencing was performed in 7.5 μl volumes containing 0.5 μl of Big Dye Terminator v3.1 (Applied Biosystems), 1.8 μl of Big Dye Terminator v1.1/v3.1 5× buffer (Applied Biosystems), 3 pmoles of primer, and 100 to 250 ng of BAC DNA. Oligonucleotides T7-ZL (TAATACGACTCACTATAGGG) and BES-HR (CACTCATTAGGCACCCCA) were used to prime separate reactions from opposite ends of each BAC template. Those reactions were performed in a GeneAmp PCR System 9700 (Applied Biosystems). Using primer T7-ZL, sequencing reactions underwent 120 cycles of 96°C for 10 s, 45°C for 5 s, and 60°C for 8 min. Using primer BES-HR, sequencing reactions underwent 120 cycles of 96°C for 10 s, 45°C for 5 s, and 52°C for 8 min. The sequencing products were cleaned using ethanol precipitation and then resuspended in 15 μl of ddH2O. Sequences were then determined using a 3730XL Genetic Analyzer (Applied Biosystems).
Analysis of repetitive sequence in BES was performed using a custom repeat database combined with RepBase database version 20061006 [28]. The custom repeat database was constructed using RECON [37]-a de novo approach to identify repetitive sequences. PERL scripts were used to select repetitive sequences greater than 40 bp in length and present in 5 or more copies, which were then annotated using BLASTX [32] at e = 10 -5 to the NCBI non-redundant database. The annotated repeat database was used as a custom repeat library for RepeatMasker [38]. RepeatMasker version 3.1.9 was used at a default mode with WU-BLAST as the search engine (blastp version 2.0 MP-Washington University) to mask the repetitive sequences in the BESs. Another round of RepeatMasker using the RepBase update 20061006 (RepBase database version 20061006) was also run on the same set of sequences. Results from both runs were combined after manually removing the overlaps. The masked sequences were then categorized into different classes of repetitive sequences.
To identify potential genic sequences and to look for differences between the repeat-masked and unmasked BESs, we used both sets of sequences for GO analysis. The two sets of sequences were used as queries to the NCBI nonredundant database using BLASTX (e = 10 -5 ). The BLAST output in the XML format was imported into BLAST2GO (B2G) for GO analysis and functional annotation of gene or protein sequences [39]. The resulting annotations were converted into 'GO-Slim' format and retrieved for the three GO categories (biological process, molecular func-