A first generation BAC-based physical map of the channel catfish genome

Background Channel catfish, Ictalurus punctatus, is the leading species in North American aquaculture. Genetic improvement of catfish is performed through selective breeding, and genomic tools will help improve selection efficiency. A physical map is needed to integrate the genetic map with the karyotype and to support fine mapping of phenotypic trait alleles such as Quantitative Trait Loci (QTL) and the effective positional cloning of genes. Results A genome-wide physical map of the channel catfish was constructed by High-Information-Content Fingerprinting (HICF) of 46,548 Bacterial Artificial Chromosome (BAC) clones using the SNaPshot technique. The clones were assembled into contigs with FPC software. The resulting assembly contained 1,782 contigs and covered an estimated physical length of 0.93 Gb. The validity of the assembly was demonstrated by 1) anchoring 19 of the largest contigs to the microsatellite linkage map, 2) comparing the assembly of a multi-gene family to Restriction Fragment Length Polymorphism (RFLP) patterns seen in Southern blots, and 3) contig sequencing. Conclusion This is the first physical map for channel catfish. The HICF technique allowed the project to be completed in a high-throughput manner with limited human resources. This physical map will greatly facilitate the detailed study of many different genomic regions in channel catfish, and the positional cloning of genes controlling economically important production traits.


Background
Channel catfish is now the leading aquaculture species in the U.S., with 600 million pounds processed annually [1]. Consequently, selective breeding of catfish broodstock is ongoing in order to improve the genetic potential of the species for commercial production. To support selective breeding research, molecular tools are being developed to help researchers characterize the catfish genome, which consists of 28 pairs of autosomes and one pair of sex chromosomes, and identify genomic regions that control important production traits. However, a considerable amount of available data has not been organized within the structural framework of catfish chromosomes. Briefly, framework genetic linkage maps have been produced based on microsatellite loci [2] and Amplified Fragment Length Polymorphism (AFLP) loci [3]. Channel catfish Expressed Sequence Tags (EST) have been identified from several tissues [4,5] and clustered and annotated in the Gene Index Project [6]. A new EST project is underway through the Community Sequencing Program to sequence an additional 300,000 cDNA clones [7]. One cDNA microarray (L. Hanson, personal communication) and one high-density oligonucleotide array [8] have been developed for global gene expression studies.
Two large-insert catfish Bacterial Artificial Chromosome (BAC) libraries have been produced: CCBL1 contains a 7-fold genome equivalent of HindIII-digested genomic DNA from a third-generation gynogenetic female [9], and CHORI-212 contains a 10-fold genome equivalent of EcoRI-digested genomic DNA from a diploid male [10]. More than 20,000 BAC end sequences were recently generated from CHORI-212 [11] and 37,251 BAC end sequences have been generated from the CCBL1 library (S. Quiniou, unpublished data). Thus, a physical map of the channel catfish genome is needed to integrate the genetic map with catfish chromosomes, allowing for fine mapping of phenotypic trait alleles such as Quantitative Trait Loci (QTL) and effective positional cloning of genes controlling economically important traits to improve germplasm. The integrated physical map will also be essential for comparative genomic analyses and the assessment of conserved synteny. In addition, a minimal tiling path provided by the physical map would provide the framework for a whole genome sequencing project [12].
To date, most vertebrate physical maps have been produced based on imaging of restriction fragments in agarose gels [13]. Though a proven technique, agarose fingerprinting is time-prohibitive with limited personnel, even with the use of band-calling software. As a result, alternative fluorescence-based techniques have been developed in order to make use of high-throughput DNA analyzers [14][15][16]. In this study, we report the construction of a whole-genome BAC-based Fingerprinted Contigs (FPC) map for catfish, Ictalurus punctatus, using the four-color High-Information-Content Fingerprinting (HICF) SNaPshot technique [16]. To our knowledge, this is the first reported assembly of a whole genome with SNaPshot HICF in the literature, although several other projects are under way (wheat [17]; rice [18]). Only two other HICF whole-genome maps have been built, one based on three-color fingerprinting for maize [19] and one based on one-color fingerprinting for tilapia [20].

BAC fingerprinting
The HICF FPC technique used by Luo et al. [16] was chosen as it is well suited to a high-throughput format and the SNaPshot labeling kit is commercially available (Applied Biosystems, Foster City, CA). We fingerprinted 54,816 clones from the HindIII CCBL1 catfish BAC library [9]. This library was chosen, rather than CHORI-212, because of the lower DNA sequence polymorphism of the gynogenetic donor. In this library, three percent of the wells were empty and ten percent of the clones did not contain an insert [9]. Also, 1,142 clones were substituted with control clones, so approximately 46,550 clones were effectively fingerprinted. We used the FPMiner v1.2 software (BioinforSoft LLC, Beaverton, OR) to size the DNA fragments from the capillary fingerprinting chromatograms and to identify and remove the clones not meeting our quality standards. Data were then exported to GenoProfiler [21] to remove the plate control wells and clones resulting from potential cross-contamination from the 384- and 96-well plate formats. Altogether, 42,616 fingerprinted BAC clones (91.5%) were analyzed with FPC software; 41,749 were assembled into contigs and 867 remained singletons. The channel catfish genome size is estimated to be 1 Gb [22][23][24] and the average size of inserts is 165 kb [9]; therefore the fingerprinted BAC clones represented approximately a 6.8-fold coverage of the genome. Table 1 summarizes the data for the physical map. The assembly resulted in 1,782 contigs with 867 singletons. The resulting contigs of the channel catfish physical map can be accessed at the ARS-USDA-CGRU-Genomics website [25]. Each genome-fold coverage of fingerprinted BACs contributed to the increase in average contig length and physical length of the map (Table 1). After assembly of approximately 7 genome-fold coverage of BAC clones, the contigs reached an average length of 524 kb and the physical map reached a physical length of 0.93 Gb.
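The summary figures in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the text; the variable names are illustrative and are not taken from any of the software packages mentioned.

```python
# Figures quoted in the text; variable names are illustrative only.
clones_fingerprinted = 46_550   # approximate number effectively fingerprinted
clones_analyzed = 42_616        # clones that passed the quality filters
clones_in_contigs = 41_749
singletons = 867
n_contigs = 1_782
avg_contig_kb = 524             # average contig length after assembly

# Every clone analyzed by FPC is either in a contig or left as a singleton.
assert clones_in_contigs + singletons == clones_analyzed

# Percentage of fingerprinted clones retained for assembly.
retained_pct = 100 * clones_analyzed / clones_fingerprinted

# Physical map length = number of contigs x average contig length (kb -> Gb).
map_length_gb = n_contigs * avg_contig_kb / 1_000_000

print(f"retained for assembly: {retained_pct:.1f}%")
print(f"estimated physical length: {map_length_gb:.2f} Gb")
```

Both derived values agree with the reported 91.5% and 0.93 Gb.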
FPC identified 3,060 questionable (Q) clones in this assembly, corresponding to 7.3% of the clones. This is similar to the 11% reported in the maize study [19]. While these numbers are high compared to numbers obtained using an agarose-based technique, Q clones in HICF projects can arise from the lower overall reproducibility of the fingerprints, which hinders alignment of such clones in the assembly. For this project the average fingerprinting reproducibility of the control clones was about 85%. Similarly, Nelson et al. [19] showed a reproducibility rate of 75%. The number of Q clones is therefore less indicative of assembly quality in HICF than in agarose fingerprinting. With our current assembly, more fingerprinting of the same library would not likely close the gaps efficiently, as we are approaching full-length coverage and because some regions of the genome could be poorly represented in the HindIII CCBL1 library. Addition of data from a 1- to 2-fold coverage of the EcoRI CHORI-212 library would be more useful but still potentially biased because it was also generated by restriction enzyme digestion. The most effective addition would be data from a randomly sheared BAC library generated from the same gynogenetic fish used to make the CCBL1 library.
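Reproducibility in this study is expressed as the average percentage of bands shared between control-clone fingerprints. The following is a minimal sketch of such a measure, not the GenoProfiler implementation: it assumes each fingerprint is a list of fragment sizes (in bp) for a single dye color, treats two bands as shared when they differ by at most 0.4 bp, and does not enforce one-to-one matching. The function names and the toy data are hypothetical.

```python
from bisect import bisect_left

def shared_band_pct(reference, other, tolerance=0.4):
    """Percentage of bands in `reference` that have a band in `other`
    within +/- `tolerance` bp (assumed matching rule)."""
    other_sorted = sorted(other)
    shared = 0
    for size in reference:
        # First band in `other` that is not below the lower tolerance bound.
        i = bisect_left(other_sorted, size - tolerance)
        if i < len(other_sorted) and other_sorted[i] <= size + tolerance:
            shared += 1
    return 100.0 * shared / len(reference)

def mean_reproducibility(reference, replicates, tolerance=0.4):
    """Average shared-band percentage of one reference fingerprint
    against a set of replicate fingerprints."""
    scores = [shared_band_pct(reference, r, tolerance) for r in replicates]
    return sum(scores) / len(scores)

# Toy fingerprints (made-up sizes, for illustration only).
reference = [75.2, 120.0, 233.5, 410.1]
replicates = [
    [75.4, 120.3, 250.0, 410.0],   # three of four reference bands recovered
    [75.2, 120.0, 233.5, 410.1],   # identical fingerprint
]
print(mean_reproducibility(reference, replicates))
```

A real comparison would be run per dye color (or on color-encoded sizes) and over many replicate wells of the same control clone.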

Contig validation
Three different approaches were used to validate the contig assembly. First, 19 of the largest contigs were anchored to the microsatellite-based genetic linkage map [2] using markers developed from the two most terminal BACs in each contig (Table 2 and Table 3). If no polymorphic marker was available for an end clone, markers were developed from internal BACs which assembled at a Sulston score higher than 1e-40, as these could potentially be weaker points of the assembly. Marker pairs developed within each contig displayed close genetic linkage for 18 of 19 contigs, and 16 contigs were placed on the catfish genetic map. The two sets of markers for contigs 23 and 321 were closely linked within each respective contig (Table 3). However, these markers were not linked to any other marker on the current genetic map (Table 2). One possible explanation is that these two contigs are telomeric, and placement of these contigs on the genetic map will require additional linked markers. Contig 84 was found to be the result of a false join (Table 3). Two of the markers present in contig 84 belonged to linkage group U22 and 5 markers belonged to linkage group U16 (Table 2). If this contig represented a junction of the two linkage groups, one would expect the markers on U16 to be linked to the markers on U22, but this was not supported by the linkage analysis. Also, this result was unlikely to be due to a chimeric BAC clone, given the number of clones covering the region between the markers. In the current analysis, the ratio of average contig length to average recombination distance between the most distal markers was 311 kb/cM, compared with 447 kb/cM estimated from the low-resolution catfish genetic linkage map [2]. Addition of markers to the genetic map will likely increase map length and decrease the latter estimate. While the ratio of physical to linkage distances varied within the sampled contigs (Table 3), this variability was also evident in the human genome [26].
Further integration of the linkage and physical maps will help clarify this relationship in the catfish genome. This limited analysis produced a contig assembly error rate estimated at 5% (1 of 19 contigs). Similarly, 4% of false joins were found when the maize genome was assembled using the three-color HICF FPC technique [19]. While the rate of misassembly between two clones should be constant, the probability of a contig containing a false join should be proportional to the number of clones in the contig. Since our analysis was skewed toward the largest contigs, containing 89 to 156 clones, the whole-map error rate may be lower. The current validation does not exclude the possibility that a few BAC clones were placed in the wrong contig. Nevertheless, the successful anchoring of 18 of the largest contigs to the microsatellite linkage map attests to the overall reliability of the contigs and the physical map.
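The argument that larger contigs are more likely to contain a false join can be illustrated with a simple independence model. This is a sketch under stated assumptions, not an analysis from the study: it assumes a contig of n clones is built from n - 1 joins, that each join is false independently with the same probability, and the per-join error rate used below is a hypothetical value chosen only for illustration.

```python
def p_false_join(n_clones, per_join_error):
    """Probability that a contig contains at least one false join,
    assuming n_clones - 1 independent joins with equal error rate."""
    joins = max(n_clones - 1, 0)
    return 1.0 - (1.0 - per_join_error) ** joins

# Hypothetical per-join error rate (not estimated in the study).
PER_JOIN_ERROR = 5e-4

# 89 and 156 are the clone counts of the validated contigs quoted in the text;
# 20 stands in for a much smaller contig.
for n in (20, 89, 156):
    print(n, round(p_false_join(n, PER_JOIN_ERROR), 4))
```

For small error rates the probability is approximately (n - 1) times the per-join rate, which is the proportionality described above, so an error rate measured on the largest contigs would overestimate the rate for the map as a whole.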
As a second validation, the grouping of the Major Histocompatibility Complex (MHC) class I multigene family was examined to determine if clones containing these genes would be correctly assembled using this HICF technique. MHC class I genes were chosen as the extensive multigenicity of MHC I genes in channel catfish has been shown [27,28]. Twenty-one of the clones previously identified by PCR screening of the CCBL1 BAC library with a MHC class I alpha 3 domain primer pair were examined [28]. Those clones were then grouped either by capillary fingerprinting or by Southern blot pattern following a DraI digest and hybridization with a MHC class I alpha 3 domain probe. Figure 1 shows the Southern blot pattern obtained for each clone. The restriction patterns were segregated into groups A, B, or C. Clones within each group shared one or more bands (Figure 1). Each common band within the group presumably represented a shared MHC I gene between overlapping BACs. Depending on the extent of their overlap, BAC clones shared different bands or even had some unique bands within a group depending on their location in the genome. The remaining 16 BAC clones grouped consistently with the observed Southern blot patterns: A, B, and C corresponded to Contigs 646, 1499 and 1648, respectively, thus confirming the correct assembly of those BAC clones in contigs. Five MHC class I BAC clones were eliminated from the assembly during the quality check of the fingerprinting as they had either too few or too many bands to fit our standards. However, a separate assembly that included these lower quality clones showed that they still mapped according to their grouping in the Southern blot analysis.

Table footnotes: 1 LG = linkage group. 2 Sense-strand primer was 5' labeled with FAM or HEX fluorophore. 3 Informative meioses. 4 No linkage to markers on the current genetic map.
Third, the validity of the assembly was also confirmed by the ongoing sequencing of the channel catfish immunoglobulin heavy chain locus [29,30]. Two structurally related genomic clusters of catfish immunoglobulin heavy chain constant region gene segments are known and the distance between the two has been estimated at 725 kb [31]. Six BAC clones representing the two clusters were identified [30]. Those clones from the non-arrayed CCBL2 library were independently fingerprinted, and were localized to two contigs (Contigs 499 and 528, [30]). One clone from each contig/cluster was selected for sequencing and adjacent clones were identified for sequencing using the minimum tiling path and BAC end sequence markers. No discrepancies were found in the assembly of those two contigs when sequencing the clones. Even though FPC could not join those two contigs because the overlap was too small, sequence analysis demonstrated one of the clones from CCBL2 in Contig 499 overlapped with clones GY079K15 and GY099L11 in Contig 528 by 7 kb and 28 kb, respectively (data not shown).

Conclusion
This is the first whole-genome physical map of the channel catfish. By adapting the SNaPshot-based technique to a high-throughput DNA analyzer, we were able to fingerprint a whole genome in a relatively short period of time with limited human resources (two people processed four 384-well plates per day). The resulting stage 1 map was assembled with an initial Sulston score of 1e-40 to maintain a conservative core of contigs as the backbone for further analysis. The validation studies confirmed the reliability of this assembly. We are currently integrating this physical map with our other catfish genomic resources, most notably the genetic map [2], to enable QTL studies, comparative genomic analyses and fine mapping of genes to assist in identification of markers associated with economically important traits such as disease resistance, growth rates and carcass yield.

Table footnotes: 1 Percent recombination based on two-point linkage. 2 Log10 likelihood ratio of linkage vs. non-linkage. 3 No linkage to markers on the current genetic map. * When a polymorphic marker was not available (n/a) for a terminal BAC clone, the most distal mapped clone was used for linkage analysis.

BAC library fingerprinting
BAC clones were obtained from the CCBL1 BAC library [9]. The CCBL1 library was produced from a female channel catfish obtained by meiotic gynogenesis. The BAC clones from each 384-well plate were inoculated using the GeneTAC G3 robot (Genomic Solutions, Ann Arbor, MI) in four 96-well 2.2 mL plates (ABgene, Rochester, NY) containing 1.5 mL of LB with 12.5 μg/mL chloramphenicol. The clones were grown at 37°C in a HiGro shaker (Genomic Solutions) for 24 hours. Wells E7 and H12 were inoculated with the same BAC clone on all plates to serve as internal controls for plate orientation and fingerprinting quality. The DNA was isolated via an alkaline lysis method with Qiagen reagents (Qiagen, Inc., Valencia, CA) in a 96-well format using an Apricot pipettor (PerkinElmer Life and Analytical Sciences, Wellesley, MA). The DNA was then resuspended in 30 μl ddH2O. All steps of the fingerprinting were performed according to Luo et al. [16] except that all amounts of reagents, including the DNA, were divided by 3. Briefly for 60 minutes C in a PTC-200 thermal cycler (MJ Research). The resulting labeled fragments were precipitated with sodium acetate and ethanol.

Capillary electrophoresis
To attain high-throughput, the samples were run on a 3730 xl DNA Analyzer (Applied Biosystems, Foster City, CA). The fluorescent BAC fingerprinting fragments were resuspended in 10 μl per well of Hi-Di formamide (Applied Biosystems) solution containing 0.05 μl GeneScan-500 LIZ size standard (Applied Biosystems) for at least 1 hour at 4°C. Samples were denatured for 5 min at 95°C, cooled to 4°C on ice and centrifuged (3,220 × g for 2 min) to the bottoms of the microplate wells. The DNA fragments were injected on a 50-cm length, 96 capillary array filled with POP-7 (Performance Optimized Polymer, Applied Biosystems) and resolved using the instrument run module settings shown in Table 4. Initially, the run module and spectral calibration protocols in Data Collection version 1.0 were adapted from sequencing and 36-cm fragment analysis protocols to enable separation and detection of the HICF fragments on the 3730 xl DNA Analyzer. However, functionality for both 50-cm fragment analysis and custom 5-color spectral access were enabled by Applied Biosystems in Data Collection versions 2.0 and 3.0. Injection voltage and time were adjusted to optimize fluorescent peak heights and minimize cross-talk between adjacent capillaries.

BAC contig assembly
The chromatogram files were analyzed with FPMiner v1.2 software (BioinforSoft LLC, Beaverton, OR) to size the fragments and determine fingerprint quality. All samples with fewer than 50 or more than 160 fragments were removed, as were all samples with a Size Standard Matching Quality Score below 0.9 or a Fingerprint Editing Quality Score below 10. All fragments present in more than 20% of the samples, which included the vector fragments and potential repetitive DNA, were also removed. Those values were determined after initial examination of the raw fingerprinting data. For channel catfish, the average numbers of blue (BamHI) and red (XhoI) fragments per clone were lower than the numbers of green (EcoRI) and yellow (XbaI) fragments. As a result, clones with unusual ratios of band numbers between the four colors (too high or too low compared to the average number per color) were manually removed. The sample files were then exported to GenoProfiler software [21] to remove the controls and samples demonstrating potential contamination from neighboring wells in the 96- or 384-well plate format. Contigs were assembled from bands between 50 and 500 bp using FPC software Version 8.5 [32,33]. FPC parameters were adjusted as described by Luo et al. [16] and Nelson et al. [19] for the HICF technique. Briefly, as FPC did not accept color labels or fractional sizes, every size was multiplied by 10 and the color labels were converted to non-overlapping numeric ranges by adding a different offset value for each color. As a result, the gel length was set at 18,000 bp and tolerance was set at 4 to obtain the 0.4 bp optimal tolerance value determined by Luo et al. [16] for HICF-SNaPshot fingerprinting. The clones had an average of 93 bands and an average size of 165 kb [9]; hence, the estimated band size was set at 1,800 bp in the configuration file. Those values allowed the FPC software to estimate contig length and physical map length.
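The size and color conversion described above can be sketched as follows. This is an illustration only: the offsets actually used are not given in the text, so the `COLOR_OFFSET` values below are hypothetical (chosen so the four ranges do not overlap, without trying to reproduce the 18,000 gel length setting), and `encode_band` is an assumed helper name rather than part of FPC, FPMiner or GenoProfiler.

```python
# Hypothetical per-color offsets; the values used in the study are not stated.
COLOR_OFFSET = {"blue": 0, "green": 5_000, "yellow": 10_000, "red": 15_000}
SCALE = 10  # sizes are multiplied by 10 so that 0.1 bp steps become integers

def encode_band(size_bp, color):
    """Convert a sized, colored fragment into a single integer for FPC."""
    if not 50 <= size_bp <= 500:
        raise ValueError("only bands between 50 and 500 bp are assembled")
    return round(size_bp * SCALE) + COLOR_OFFSET[color]

# A 0.4 bp tolerance becomes 4 after the same scaling.
fpc_tolerance = round(0.4 * SCALE)

# Estimated band size: average insert size divided by average band count.
avg_insert_bp = 165_000
avg_bands_per_clone = 93
estimated_band_bp = avg_insert_bp / avg_bands_per_clone

print(encode_band(123.4, "green"))   # scaled size plus the green offset
print(fpc_tolerance)
print(round(estimated_band_bp))      # about 1,774 bp, set to 1,800 in the text
```

With scaled sizes spanning 500 to 5,000 per color, any set of offsets spaced more than 4,500 apart keeps the four colors in separate numeric ranges, so bands of different colors can never be scored as matches.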
An initial Sulston score of 1e-40 was determined to be optimal for our data set in order to minimize the number of contigs without overly increasing the number of clones with questionable alignments (Q clones). Contigs with more than 10 Q clones were reassembled with a stricter cutoff by setting the value of the DQer function of FPC to 10 and setting the step value to 5. We also set the "Best of" function to 50 builds, as this setting controls how many different attempts FPC makes when building the consensus band (CB) maps to try to find CB maps with fewer Q clones. Next, the "Ends to Ends" auto merge function was used with default settings and the stringency was decreased from 1e-40 to 1e-15. To finish, the "Singles to Ends" function was used with a minimum of 2 ends matching. In that case the stringency was decreased from 1e-40 to only 1e-25 because singletons only had to match on one side. Reproducibility of the fingerprinting technique was assessed by determining the average percentage of shared bands between one selected representative control clone and 100 randomly selected control clones using GenoProfiler.
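To give a sense of what a cutoff of 1e-40 means, the sketch below computes the Sulston score in the binomial form in which it is commonly described for FPC: the probability that two fingerprints share at least a given number of bands by chance. It is an illustration under stated assumptions and has not been checked against the FPC source code; the function names are hypothetical, and the defaults reuse the tolerance (4) and gel length (18,000) quoted in this section.

```python
from math import comb

def sulston_score(n_low, n_high, n_shared, tolerance=4, gel_length=18_000):
    """Chance probability of at least `n_shared` matching bands between a
    clone with `n_low` bands and a clone with `n_high` bands (n_low <= n_high)."""
    # Probability that one band matches none of the n_high bands of the other clone.
    p_no_match = (1.0 - 2.0 * tolerance / gel_length) ** n_high
    p_match = 1.0 - p_no_match
    # Upper tail of a binomial distribution over the n_low bands.
    return sum(
        comb(n_low, m) * p_match**m * p_no_match ** (n_low - m)
        for m in range(n_shared, n_low + 1)
    )

def min_shared_for_cutoff(n_low, n_high, cutoff):
    """Smallest number of shared bands whose score is at or below `cutoff`."""
    for m in range(n_low + 1):
        if sulston_score(n_low, n_high, m) <= cutoff:
            return m
    return None

# Two clones with the average band count reported in the text (93 bands each).
for cutoff in (1e-40, 1e-25, 1e-15):
    print(cutoff, min_shared_for_cutoff(93, 93, cutoff))
```

Relaxing the cutoff from 1e-40 to 1e-25 or 1e-15 lowers the number of shared bands needed for a join, which is why the less stringent values are reserved for the end-merging steps.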

Linkage analysis
The terminal BAC clones of each contig, or clones at potentially weak points (resulting from "Ends to Ends" assembly), were chosen for linkage mapping. Parents and 48 offspring from two reference families were genotyped, and two-point linkage analysis was performed as described [2]. BAC-specific microsatellites were identified by STRAP sequencing [34] or from BAC end sequencing data (Table 2). Sense-strand primers were synthesized containing a 5' FAM or HEX fluorescent label (Invitrogen Corp., Carlsbad, CA), and anti-sense strand primers were unlabeled (Integrated DNA Technologies, Coralville, IA). The 10 μl PCR reactions were performed using Titanium Taq DNA polymerase following the manufacturer's protocol (BD Biosciences, San Jose, CA) with the following modifications: we used Promega (Madison, WI) dNTPs and added 3 pmol of each primer.