A first generation integrated map of the rainbow trout genome

Background Rainbow trout (Oncorhynchus mykiss) are the most-widely cultivated cold freshwater fish in the world and an important model species for many research areas. Coupling great interest in this species as a research model with the need for genetic improvement of aquaculture production efficiency traits justifies the continued development of genomics research resources. Many quantitative trait loci (QTL) have been identified for production and life-history traits in rainbow trout. An integrated physical and genetic map is needed to facilitate fine mapping of QTL and the selection of positional candidate genes for incorporation in marker-assisted selection (MAS) programs for improving rainbow trout aquaculture production. Results The first generation integrated map of the rainbow trout genome is composed of 238 BAC contigs anchored to chromosomes of the genetic map. It covers more than 10% of the genome across segments from all 29 chromosomes. Anchoring of 203 contigs to chromosomes of the National Center for Cool and Cold Water Aquaculture (NCCCWA) genetic map was achieved through mapping of 288 genetic markers derived from BAC end sequences (BES), screening of the BAC library with previously mapped markers and matching of SNPs with BES reads. In addition, 35 contigs were anchored to linkage groups of the INRA (French National Institute of Agricultural Research) genetic map through markers that were not informative for linkage analysis in the NCCCWA mapping panel. The ratio of physical to genetic linkage distances varied substantially among chromosomes and BAC contigs with an average of 3,033 Kb/cM. Conclusions The integrated map described here provides a framework for a robust composite genome map for rainbow trout. This resource is needed for genomic analyses in this research model and economically important species and will facilitate comparative genome mapping with other salmonids and with model fish species. 
This resource will also facilitate efforts to assemble a whole-genome reference sequence for rainbow trout.


Background
Rainbow trout (Oncorhynchus mykiss) are the most widely cultivated cold freshwater fish in the world and are considered by many to be the "aquatic lab-rat". Interest in the use of rainbow trout as a model species for genome-related research on carcinogenesis, toxicology, comparative immunology, disease ecology, physiology, transgenics, evolutionary genetics, and nutrition has been well documented [1].
Rainbow trout are cultured on every continent except Antarctica, with 2008 global production estimated at 576,289 metric tons and valued at $2.39 billion [2]. Coupling great interest in this species as a research model with the need for genetic improvement for aquaculture production efficiency and product quality justifies the continued development of genome resources facilitating selective breeding.
The rainbow trout genome is large and complex. Genome size estimates derived from determining the molecular weight of DNA per cell for rainbow trout and other salmonids vary from 2.4 to 3.0 × 10^9 bp [3,4]. As with most salmonids, rainbow trout experienced a recent genome duplication event resulting in a semi-tetraploid state (i.e. after an autotetraploid event in the salmonids, their genome is undergoing reversion to a diploid state) [5]. All ray-finned fishes share an additional (3R) round of ancestral genome duplication in their evolutionary history compared to mammals and birds, but the salmonids' common ancestor underwent an additional recent (4R) whole genome duplication event and more than half of the loci are still duplicated [6]. In addition, it is estimated that 50% to 60% of the rainbow trout genome contains interspersed repeat sequences (Genet et al.: Analysis of BAC-end sequences in rainbow trout: content characterization and assessment of synteny between trout and other fish genomes, submitted).
Two microsatellite-based genetic maps with medium to high marker densities were recently developed for rainbow trout by INRA [13] and the NCCCWA [16]. The INRA map is based on a panel of two DH gynogenetic lines. It has more than 900 microsatellites over 31 linkage groups and a total length of 2,750 cM (average resolution of 3 cM). The NCCCWA map is based on a panel of five families that represent the starting genetic material of the NCCCWA selective breeding program. It has 1,124 microsatellite loci over 29 linkage groups and a total length of 2,927 cM (average resolution of 2.6 cM). The linkage groups from the two microsatellite genetic maps were anchored to the physical chromosomes using fluorescent in-situ hybridization and were found to represent 52 chromosome arms [23,24].
Qualitative/quantitative trait loci (QTL) mapping experiments in rainbow trout have been very successful because of the species' high fecundity, external fertilization, and ease of gamete handling and manipulation. Many QTL have been identified for production and life-history traits including resistance to the parasite C. shasta [25], resistance to IHNV [26,27] and to IPNV [28], whirling disease resistance [29], Killer cell-like activity [30], upper thermal tolerance [31,32], embryonic development rate [9,33,34], spawning time [35,36], confinement stress response [37], early maturation [38] and smoltification [39]. The availability of a BAC physical map integrated with the genetic map will facilitate fine mapping of QTL, the selection of positional candidate genes and the incorporation of marker-assisted selection (MAS) into rainbow trout breeding programs. A major shortcoming of QTL studies is that they are limited to the variation present in a small number of families and typically do not detect loci with small effects. This can be overcome by whole genome association studies and other approaches, such as genomic selection, that capture the effects of most QTL that contribute to the population-wide variation in a trait. Recently we demonstrated the feasibility of low resolution LD association studies in rainbow trout [40,41]. In the absence of a reference genome sequence assembly, a robust integrated physical and genetic map will provide better resolution than the current genetic maps for ordering of genetic markers and estimating physical distances between markers, thus facilitating future whole genome association studies in rainbow trout.
The first BAC-based physical map of the rainbow trout genome was recently assembled using DNA fingerprints of 154,439 clones from the 10X HindIII Swanson library [8]. The map contains 4,173 contigs and 9,379 singletons. The physical length of the map contigs was estimated to be approximately 2.0 Gb, which represents approximately 80% of the rainbow trout genome. Here we report the construction of the first integrated physical and genetic map of the rainbow trout genome, using microsatellites isolated from BAC end sequences and PCR superpools for library screening to identify BACs that harbor previously mapped markers. This integrated map provides a framework for a robust composite genome map and future reference genome sequence assemblies.

BAC end sequencing (BES) microsatellites
We screened the BES reads from 184 of the largest BAC fingerprinting contigs and selected 205 microsatellites from 117 contigs for PCR optimization and genotyping (Table 1). Of the 205 markers genotyped, 128 markers appeared to amplify single marker regions and were polymorphic. Ten markers were monomorphic, and 58 markers could not be resolved and unambiguously scored. Fifteen markers generated duplicated patterns, of which 8 could be scored for a single marker region and 1 produced a scorable duplicated pattern. Hence, 7 of the duplicated markers produced a monomorphic or an unresolved pattern for one of the two marker regions. Two of the 128 informative markers could not be assigned to linkage groups (i.e. 126 markers were mapped using the NCCCWA mapping families). The BES reads from which the 126 mapped markers were isolated represent 88 unique BAC FPC contigs. The 205 BES microsatellites are listed in Additional file 1, sheet 1, with the corresponding PCR conditions, number of alleles and size range, GenBank accessions, primer sequences and physical map contigs for each marker. We have also mapped an additional six BES microsatellites onto linkage groups of the INRA genetic map (Additional file 1, sheet 1).

Library screening with PCR superpools

Previously mapped microsatellites
The 10X Swanson BAC library was screened with the NCCCWA PCR superpools using 137 markers that were previously mapped with high confidence to the NCCCWA genetic map, representing 25 of the 29 chromosomes. The INRA superpools were screened with 265 markers that were previously mapped onto the INRA genetic map, representing all linkage groups. Combined, 146 markers covering all linkage groups were localized to one or two BAC FPC contigs (Table 2). The markers with positive hits are listed in Additional file 1, sheet 2, with the corresponding positive clones and physical map contigs.

Immune response genes
The BAC library was also screened with PCR primers from 12 immune response genes that were not previously mapped to the rainbow trout genome (Additional file 2, Table S1). Positive clones were verified by PCR of the individual clones and direct sequencing from the BAC DNA. The BAC clones that were positive and their corresponding physical map contigs are listed in Additional file 1, sheet 3.

Single nucleotide polymorphism (SNP) markers
The experimental design and results of SNP discovery in rainbow trout using a reduced representation library (RRL) were recently published [42]. Of the 183 SNPs that were validated, 167 were polymorphic in the NCCCWA genetic mapping panel and 159 were mapped to chromosomes on the genetic map (Table 3). The HaeIII RRL SNP discovery database was aligned with the BES database (Genet et al.: Analysis of BAC-end sequences in rainbow trout: content characterization and assessment of synteny between trout and other fish genomes, submitted) to find matches that can be useful for the integration of the genetic and physical maps. We found 618 unique matches using SSAHA2 [43]. Assuming a 48% validation rate for this SNP database [42], we expect that approximately 300 of the matched SNPs will be useful for integration between the physical and genetic maps. Two of the matching SNPs were among the 183 validated by Castaño-Sánchez et al. [42]. One marker (OMS00144) was among the 159 that were mapped. The other SNP (OMS00174) was not informative for linkage analysis in the NCCCWA panel, but it had two positive hits on end sequences from two BACs that overlap in contig number 431 of the physical map (Additional file 1, sheet 3).
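The expected number of usable SNP matches follows directly from the two figures quoted above. A minimal sketch of that arithmetic, with variable names chosen here purely for illustration:

```python
# Figures taken from the text; the variable names are illustrative only.
unique_matches = 618     # SNP-to-BES matches found with SSAHA2
validation_rate = 0.48   # validation rate assumed for the SNP database [42]

# Expected number of matched SNPs that would validate and be usable
# for integrating the physical and genetic maps.
expected_useful = unique_matches * validation_rate
print(f"Expected useful SNPs: {expected_useful:.0f}")
```

The result is just under 300, consistent with the "approximately 300" stated in the text.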

The genetic map
Information from 1,486 genetic loci was used for linkage analysis (Table 3). Two-point linkage analysis placed 1,229 loci in 29 linkage groups at LOD ≥8.75. An additional 192 markers with two-point LOD <8.75 were added to linkage groups manually, of which only six markers had a two-point LOD <3.0 (2.90, 2.89, 2.64, 2.12, 2.10 and 1.80). The best two-point LOD score for each marker is provided in Additional file 3, Worksheet 1. The total combined sex-averaged map distance was 3,346.3 cM (Kosambi). A sample map representing chromosome 2 is presented in Figure 1, and maps representing all chromosomes are presented in Additional file 4. Multipoint linkage analysis was conducted on individual linkage groups to assign LOD scores for the specific position of each marker within the linkage group. The number of markers included in a framework map created at LOD ≥4 for the specific position of the marker in the linkage group was 460. The only chromosome that did not contain any framework markers at LOD ≥4 was OMY21, for which a framework map was created at LOD ≥3. Additional loci were added at LOD ≥3 (77), ≥2 (80), ≥1 (56), and ≥0 (748) (Table 3).
It is noteworthy that this type of sex recombination ratio estimate does not take into account the larger differences in recombination rate that exist between males and females throughout most of the length of the linkage groups. It is likely that female:male ratios will be elevated throughout most of the length of the chromosome arms, while they will be much lower in the more contracted telomeric ends of the linkage groups because of elevated male recombination rates in these regions [15]. It should be pointed out that overall estimates of recombination rate may not be accurately depicted in the current study, because recombination estimates were not obtained by direct comparisons of adjacent intervals. Therefore, the reported recombination distances given in this study are likely an underestimate of the real recombination ratio values.
In this version of the map, we have added to the map of Rexroad et al. [16], through multipoint linkage analysis, 159 RRL SNPs, 126 microsatellites from BES and 9 microsatellites isolated from BACs that harbor immune response genes (Additional file 2, Table S2). The SNPs were distributed across all the chromosomes (2-10 per chromosome; Additional file 3, worksheet 3) and the BES microsatellites were mapped to all but chromosome 24 (1-10 per chromosome; Additional file 3, worksheet 4). Twenty-seven loci that were previously mapped to expand the length of linkage groups [16] were not mapped in this version, and 29 loci that were previously genotyped but were not linked were assigned to linkage groups in the current version. A high frequency of duplicated microsatellite loci was observed as previously reported [16], but in many cases only one locus was successfully ordered on the map. Overall, 88 duplicated markers were successfully mapped to two loci (176 loci), which means that the total number of markers mapped was 1,333.
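The locus and marker counts reported for the genetic map can be cross-checked against each other. The sketch below uses only numbers from the text; the variable names are illustrative, and it assumes that each duplicated marker contributes exactly two loci to the locus total.

```python
# Counts reported in the text; names are illustrative.
two_point_auto = 1229    # loci placed at two-point LOD >= 8.75
two_point_manual = 192   # loci added manually (two-point LOD < 8.75)
# Loci positioned at each multipoint LOD threshold (framework = 4).
multipoint_by_lod = {4: 460, 3: 77, 2: 80, 1: 56, 0: 748}
duplicated_markers = 88  # markers mapped to two loci each

loci_mapped = two_point_auto + two_point_manual
# Assumption: every duplicated marker accounts for two loci, so removing
# one locus per duplicated marker gives the number of distinct markers.
markers_mapped = loci_mapped - duplicated_markers

# The two-point and multipoint tallies should describe the same set of loci.
assert loci_mapped == sum(multipoint_by_lod.values())
print(loci_mapped, markers_mapped)
```

Both tallies give 1,421 mapped loci, and subtracting the 88 duplicates reproduces the 1,333 mapped markers stated above.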

The integrated map
Anchoring of 203 BAC contigs from the physical map to linkage groups was accomplished through mapping of 266 loci onto the NCCCWA genetic map. The marker loci were derived from the PCR screening of the BAC superpools, BES microsatellites (OMY4000), microsatellites isolated from BACs that harbor genes of interest (OMM3000) and one SNP marker matched with BES (OMS00144). A schematic illustration of a BAC fingerprinting contig anchored to a linkage group is presented in Figure 2. Markers from 12 of the anchored contigs were mapped to two different linkage groups as a result of FPC assembly errors or linkage mapping errors, as we have previously discussed [8]. The fraction of contigs that are in disagreement between the physical map and genetic map is used to estimate the error rate in the FPC assembly. This error rate of 6% (12/203) is similar to the 5% estimated for the catfish physical map of Quiniou et al. [45] or the 4% rate detected in the 3-color HICF physical map of the maize genome [46]. The number of contigs anchored per chromosome ranged from 3 to 17 with an average of 7.4. Chromosomes OMY18, 24 and 28 had the fewest, with 3 anchored contigs each, and OMY12 had the most, with 17 anchored contigs.
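The error rate quoted above is a simple ratio, sketched here with illustrative variable names. The second calculation is an assumption made only for this sketch and is not stated in the text: counting each of the 12 conflicting contigs once per linkage group happens to reproduce the reported average of 7.4 contigs per chromosome, whereas 203/29 alone gives 7.0.

```python
anchored_contigs = 203
conflicting_contigs = 12   # contigs with markers on two linkage groups
chromosomes = 29

# Estimated FPC assembly error rate: contigs in disagreement / contigs anchored.
error_rate = conflicting_contigs / anchored_contigs

# ASSUMPTION (not from the text): conflicting contigs are counted on both
# linkage groups when averaging per chromosome.
placements = anchored_contigs + conflicting_contigs
mean_per_chromosome = placements / chromosomes

print(f"error rate: {error_rate:.1%}")
print(f"mean contigs per chromosome: {mean_per_chromosome:.1f}")
```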
The combined physical length of the 203 anchored contigs was 138,525 consensus bands (CB), which is equal to 235,493 Kb based on a conversion ratio of 1 CB = 1.7 Kb [8]. Therefore, we estimate that the integrated map covers ~12% of the physical map, or ~10% of the rainbow trout genome, assuming a haploid genome size of 2.4 × 10^9 bp. The length of anchored contigs ranged from 119 Kb to 4,590 Kb.
The integrated map we developed for the rainbow trout genome will facilitate comparative genomics studies with other salmonids and with model fish species. Many microsatellite markers can be used for genetic mapping across salmonid species, which is very useful for comparative genome mapping [23,48] and can benefit research in species with less developed genome maps. In addition, the rainbow trout BAC end sequences can be used to infer conserved synteny with other fish genomes as we have previously shown (Genet et al.: Analysis of BAC-end sequences in rainbow trout: content characterization and assessment of synteny between trout and other fish genomes, submitted), and this integrated map provides a larger framework, expanding the size of the syntenic blocks that can be identified between fish genomes.
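The coverage estimate for the anchored contigs reported above can be reproduced from the consensus band count. This is a minimal sketch using the conversion ratio, the ~2.0 Gb physical map length and the 2.4 × 10^9 bp genome size given in the text; the variable names are illustrative.

```python
KB_PER_CB = 1.7            # conversion ratio: 1 consensus band = 1.7 Kb [8]
anchored_cb = 138_525      # combined length of the 203 anchored contigs (CB)
physical_map_kb = 2.0e6    # physical map contigs, approximately 2.0 Gb
genome_kb = 2.4e6          # assumed haploid genome size, 2.4 x 10^9 bp

anchored_kb = anchored_cb * KB_PER_CB
frac_of_physical_map = anchored_kb / physical_map_kb
frac_of_genome = anchored_kb / genome_kb

print(f"anchored length: {anchored_kb:,.1f} Kb")
print(f"share of physical map: {frac_of_physical_map:.1%}")
print(f"share of genome: {frac_of_genome:.1%}")
```

The shares come out at roughly 12% of the physical map and 10% of the genome, matching the rounded values in the text.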

Conclusions
The first generation integrated map of the rainbow trout genome is composed of 238 BAC contigs anchored to chromosomes of the genetic map. It covers more than 10% of the genome across segments from all 29 chromosomes. This map provides a framework for a robust composite genome map. The availability of an integrated physical and genetic map will enable detailed comparative genome analyses, fine mapping of QTL, positional cloning, selection of positional candidate genes for economically important traits and the incorporation of MAS into rainbow trout breeding programs. A comprehensive integrated map will also provide a minimal tiling path for genome sequencing and a framework for whole genome sequence assembly.

BAC end sequencing and markers development
The 10X HindIII rainbow trout BAC library [6] was used for BAC-end sequencing (BES) as previously described (Genet et al.: Analysis of BAC-end sequences in rainbow trout: content characterization and assessment of synteny between trout and other fish genomes, submitted). Briefly, BAC culture was conducted using standard protocols and end sequencing with SP6 and T7 primers was done using the standard Sanger technique. The raw, untrimmed files were processed by PHRED software [49]. The PHRED quality score cut-off value was set at 20 for the acquisition of Q20 values. The BESs were trimmed of vector sequences (pBeloBAC11 vector [50]) and filtered of E. coli sequences.
Figure 1 Chromosome 2 from the new NCCCWA linkage map is shown as an example. Annotation of genes linked to the marker or BAC contig from the first generation physical map are connected to the marker name by underscore (e.g. OMM3080_TAP1_ctg260). Annotation of "or_?" means that the marker is duplicated and only one of the two BAC contigs was identified for the marker. Blue, green, red, black and italicized font markers were mapped to their specific location on the linkage group at LOD scores of 4, 3, 2, 1 and 0, respectively. Sex-averaged distances between markers are shown in cM.
Microsatellites and other simple sequence repeats (SSR) were analyzed using Tandem Repeats Finder software [51]. We examined ten classes of SSRs by using a maximum period size of 10. BES reads harboring at least 50 base pairs (bp) of flanking sequence on either side of the SSRs were selected for PCR primer design. Primers for BESs containing microsatellites were designed using Primer3 software [52]. The primer product size range was chosen between 150 and 450 nucleotides. The optimum size of primers was set to 20 nucleotides (range from 18 to 27 nucleotides) with an optimum melting temperature of 60.0°C (range from 57 to 63°C).
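The SSR selection rule described in this section (repeat unit of at most 10 bp, at least 50 bp of flanking sequence on both sides) can be illustrated with a simplified stand-in. The sketch below is not Tandem Repeats Finder: it detects only perfect tandem repeats with a regular expression, whereas Tandem Repeats Finder also scores approximate repeats. The function name `find_ssrs` and the `MIN_COPIES` threshold are hypothetical and not taken from the text.

```python
import re

MAX_PERIOD = 10   # maximum repeat unit length, as in the text
MIN_FLANK = 50    # flanking sequence required on each side (bp), as in the text
MIN_COPIES = 5    # HYPOTHETICAL minimum copy number; not stated in the text

# A lazily matched unit of 1..MAX_PERIOD bases followed by at least
# MIN_COPIES - 1 further perfect copies of that same unit.
SSR_PATTERN = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (MAX_PERIOD, MIN_COPIES - 1))

def find_ssrs(seq):
    """Yield (start, end, unit, copies) for perfect repeats with usable flanks."""
    for match in SSR_PATTERN.finditer(seq):
        start, end = match.span()
        unit = match.group(1)
        # Keep only repeats with enough sequence on both sides for primers.
        if start >= MIN_FLANK and len(seq) - end >= MIN_FLANK:
            yield start, end, unit, (end - start) // len(unit)

# Made-up 50 bp flank with no tandem repeats of its own.
flank = "ACGTTGCAAGCTTAGGCATCGATCCGTAAGCTAGGCTTAACGGATCCATG"
read = flank + "CA" * 12 + flank
print(list(find_ssrs(read)))  # [(50, 74, 'CA', 12)]
```

A read whose repeat sits closer than 50 bp to either end yields nothing, which mirrors the flanking requirement applied before primer design.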

Microsatellites Genotyping
The NCCCWA mapping panel of 5 families was genotyped with microsatellites as previously described [16]. A total of 205 microsatellite markers isolated from BAC end sequences (Additional file 1, sheet 1) were genotyped using the tailed protocol of Boutin-Ganache et al. [53]. Primers were obtained from commercial sources (Alpha DNA, Montreal, Quebec, Canada). Three oligonucleotide primers were used in each DNA amplification reaction (forward: 5' GAGTTTTCCCAGTCACGAC-primer sequence 3'; reverse: 5' GTTT-primer sequence 3'; fluorescent labeled primer with FAM: 5' GAGTTTTCCCAGTCACGAC 3'). Primers were optimized for amplification by varying annealing temperatures and MgCl2 concentrations. PCR reactions (12 μl total volume) included 50 ng DNA, 1.5-2.5 mM MgCl2, 2 pmol of forward primer, 6 pmol of reverse primer, 1 pmol of fluorescent labeled primer, 200 μM dNTPs, 1X manufacturer's reaction buffer, and 0.5 unit Taq Polymerase (ABI, Foster City, CA, USA). Amplifications were conducted in an MJ Research DNA Engine thermal cycler model PTC 200 (MJ Research, Waltham, MA) as follows: an initial denaturation at 95°C for 10 min; 30 cycles consisting of 94°C for 60 sec, annealing temperature for 45 sec, and 72°C extension for 45 sec; followed by a final extension at 72°C for 10 min. PCR products were visualized on agarose gels after staining with ethidium bromide. Three μl of each PCR product was added to 20 μl of water, and 1 μl of the diluted sample was added to 12.5 μl of loading mixture made up of 12 μl of HiDi formamide and 0.5 μl of Genscan 400 ROX internal size standard. Samples were denatured at 95°C for 5 min and kept on ice until loading on an ABI 3730 DNA Analyzer (ABI, Foster City, CA, USA). Output files were analyzed using GeneMapper version 3.7 (ABI, Foster City, CA, USA), formatted using Microsoft Excel and stored in a Microsoft Access database.
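The three-primer design used in the tailed protocol above can be expressed as a small helper. This is only an illustrative sketch: the tail sequences and pmol amounts come from the text, while the function name `tailed_primer_set` and the locus-specific example sequences are hypothetical.

```python
FORWARD_TAIL = "GAGTTTTCCCAGTCACGAC"  # 5' tail on the forward primer (from the text)
REVERSE_TAIL = "GTTT"                 # 5' tail on the reverse primer (from the text)

# Amount of each oligonucleotide per 12 ul reaction (pmol), from the text.
PMOL_PER_REACTION = {"forward": 2, "reverse": 6, "labeled": 1}

def tailed_primer_set(forward_specific, reverse_specific):
    """Build the three oligos for one marker, each written 5' to 3'."""
    return {
        "forward": FORWARD_TAIL + forward_specific,
        "reverse": REVERSE_TAIL + reverse_specific,
        # The FAM-labeled primer carries the forward tail sequence only.
        "labeled": FORWARD_TAIL,
    }

# Hypothetical locus-specific sequences, for illustration only.
oligos = tailed_primer_set("ACCTGATCGGATTCAGGTCA", "TGGACCTTAGCAAGTCCATG")
for name, sequence in oligos.items():
    print(name, PMOL_PER_REACTION[name], "pmol", sequence)
```

Because the labeled primer matches the tail rather than the locus, one fluorescent oligonucleotide can be shared across all markers.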

Library screening with PCR superpools
The 10X Swanson BAC library was screened using the NCCCWA or the INRA PCR superpools with microsatellites that were mapped to the NCCCWA or INRA genetic maps [13,16] as previously described [54,55]. The screening results were cross-referenced with the physical map to localize the positive clones onto contigs.
Figure 2 A schematic illustration of a BAC fingerprinting contig anchored to the rainbow trout Chr. 2 using microsatellites isolated from BACs. The four microsatellite markers from Ctg260 (224 clones; 1,584 CB or approximately 2.7 Mb) were mapped to Chr. 2 and the TAP1 positive BACs (highlighted in green) were previously identified by probe hybridization and confirmed by PCR and direct sequencing. The microsatellites order shown is based on the FPC map (not the genetic map). Markers in bold blue (OMY4005 and 4006) were localized on the linkage group at LOD4 and markers in regular font at LOD0. The genetic distance between the LOD4 markers is marked by a solid-line arrow and between markers that were localized at lower confidence by broken-line arrows.
For microsatellite markers that did not have at least two positive clones from the same FPC BAC contig, the individual positive clones were picked from glycerol stock and confirmed by PCR as previously described [6].
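The decision rule described above (a marker is localized when at least two positive clones fall in the same FPC contig; otherwise the individual clones are confirmed by PCR) can be sketched as follows. The function name, the clone identifiers and the dictionary layout are hypothetical; the contig names are used only as example labels.

```python
from collections import Counter

def localize_marker(positive_clones, clone_to_contig):
    """Return (supported_contigs, clones_needing_pcr_confirmation).

    clone_to_contig maps clone IDs to FPC contig names; clones absent from
    the mapping are treated as singletons outside any contig.
    """
    hits = Counter(
        clone_to_contig[clone]
        for clone in positive_clones
        if clone in clone_to_contig
    )
    # A contig is supported when two or more positive clones fall in it.
    supported = sorted(contig for contig, n in hits.items() if n >= 2)
    # Without such support, every positive clone is confirmed individually.
    needs_confirmation = [] if supported else list(positive_clones)
    return supported, needs_confirmation

# Hypothetical clone IDs for illustration.
clone_to_contig = {"A01": "ctg260", "A02": "ctg260", "B07": "ctg431"}
print(localize_marker(["A01", "A02", "B07"], clone_to_contig))
print(localize_marker(["A01", "B07"], clone_to_contig))
```

In the first call two clones agree on one contig, so the marker is placed there; in the second no contig has two supporting clones, so both clones are flagged for individual PCR confirmation.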

SNPs discovery using reduced representation libraries (RRL)
Protocols developed and used for SNP discovery in cattle and swine [56-58] were adapted for rainbow trout using RRL libraries and high throughput parallel 454 GS FLX pyrosequencing. The experimental design and results of the rainbow trout work were recently published [42]. Briefly, DNA from 96 unrelated individuals representing the families of the NCCCWA broodstock was pooled into one sample. The reduced representation library consisted of 440 bp fragments resulting from complete digestion of the pooled DNA with the restriction enzyme HaeIII; sequencing produced 2,000,000 reads, providing an average 6-fold coverage of the estimated 150,000 unique genomic restriction fragments (300,000 fragment ends). Three independent computational data analyses identified 22,022 to 47,128 putative SNPs on 13,140 to 24,627 contigs. A set of 384 putative SNPs, randomly selected from the sets produced by the three analyses, was genotyped on individual fish to determine the validation rate of putative SNPs among analyses, distinguish apparent SNPs that actually represent paralogous loci in the semi-tetraploid genome, examine Mendelian segregation, and place the validated SNPs on the rainbow trout linkage map.

Alignments between SNPs and BES
To find matches we aligned the HaeIII RRL SNP discovery database of Castaño-Sánchez et al. [42] with the BES database (Genet et al.: Analysis of BAC-end sequences in rainbow trout: content characterization and assessment of synteny between trout and other fish genomes, submitted). Matches were found using SSAHA2 [43] (http://www.sanger.ac.uk/Software/analysis/SSAHA2/) for pairwise sequence alignment with a threshold Smith-Waterman score of 160 (a restrictive, conservative threshold chosen to avoid matches between paralogous loci).
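The score filter described above can be sketched as a post-processing step on alignment results. This is not SSAHA2 itself and does not reproduce its output format: the tuple layout `(snp_id, bes_id, score)`, the function name and all identifiers and scores below are hypothetical; only the threshold of 160 comes from the text.

```python
from collections import defaultdict

MIN_SW_SCORE = 160  # Smith-Waterman score threshold from the text

def matches_above_threshold(hits, min_score=MIN_SW_SCORE):
    """Group BES reads by SNP sequence, keeping only hits at or above the threshold.

    hits: iterable of (snp_id, bes_id, score) tuples (hypothetical layout).
    """
    by_snp = defaultdict(set)
    for snp_id, bes_id, score in hits:
        if score >= min_score:
            by_snp[snp_id].add(bes_id)
    return {snp_id: sorted(bes_ids) for snp_id, bes_ids in by_snp.items()}

# Made-up alignment records for illustration.
example_hits = [
    ("snp_1", "bes_a", 212),
    ("snp_1", "bes_b", 198),
    ("snp_2", "bes_c", 120),  # below threshold, discarded
]
print(matches_above_threshold(example_hits))
```

A SNP sequence may retain more than one BES read, as happened for the SNP that hit end sequences from two overlapping BACs in the same contig.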

Linkage analysis
The microsatellites and SNPs were placed on the rainbow trout genetic map using the genetic linkage mapping programs MULTIMAP [59] and CRI-MAP [60]. First, genotype data combined for both sexes were formatted into the standard LINKAGE [61] file format and checked for Mendelian inheritance using PEDCHECK [62]. RECODE [63] was then used to convert the allele sizes into number-coded alleles. Using an in-house Perl script, make_gen, the genotype data and the locus names were assembled into CRI-MAP input format. The resulting file was then added to that of Rexroad et al. [16] using another in-house Perl script, join_gens, and MULTIMAP was used to conduct two-point linkage analyses to identify the closest markers with LOD ≥8.75 and recombination fraction r ≤0.2. An additional 192 markers with two-point LOD <8.75 were added to linkage groups manually, of which only six markers had a two-point LOD <3.0 (2.90, 2.89, 2.64, 2.12, 2.10 and 1.80). The best two-point LOD score for each marker is provided in Additional file 3, Worksheet 1. Multipoint linkage analysis was conducted on individual linkage groups to assign LOD scores for the specific position of each marker within the linkage group. Framework maps were constructed at LOD ≥4 for all linkage groups but OMY21, for which the framework map was created at LOD ≥3. Markers were added to comprehensive maps by lowering the LOD threshold one integer at a time, starting with the previous order. The resulting maps are consensus maps, accounting for co-informative meioses across the five families. Chromosome numbers were assigned to linkage groups using the integrated cytogenetic/linkage map of Phillips et al. [24].
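To give a sense of what the two-point thresholds above (LOD ≥8.75, r ≤0.2) mean, the sketch below computes a LOD score for the simplest textbook case: phase-known, fully informative meioses in which recombinants can be counted directly. This is an assumption made for illustration; CRI-MAP and MULTIMAP evaluate likelihoods across multi-family pedigrees and do not reduce to this formula in general. The function names are hypothetical.

```python
import math

def two_point_lod(recombinants, meioses, theta):
    """log10 likelihood ratio of linkage at theta versus free recombination (0.5)."""
    k, n = recombinants, meioses
    log_likelihood = 0.0
    if k:
        log_likelihood += k * math.log10(theta)
    if n - k:
        log_likelihood += (n - k) * math.log10(1 - theta)
    return log_likelihood - n * math.log10(0.5)

def passes_two_point_criteria(recombinants, meioses, min_lod=8.75, max_r=0.2):
    """Apply the thresholds from the text at the maximum-likelihood estimate r = k/n."""
    r = recombinants / meioses
    return r <= max_r and two_point_lod(recombinants, meioses, r) >= min_lod

print(passes_two_point_criteria(0, 30))   # True
print(passes_two_point_criteria(0, 29))   # False
print(passes_two_point_criteria(10, 40))  # False, since r = 0.25 exceeds 0.2
```

Under these simplifying assumptions, each non-recombinant meiosis contributes log10(2) ≈ 0.301 to the LOD, so at least 30 informative meioses without a recombinant are needed to reach 8.75.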
Additional file 3: Additional material.
physical map and wrote the manuscript draft; CG participated in the study design, obtained BAC end sequences, identified microsatellites in BAC end sequences and designed PCR primers for microsatellites genotyping and supervised the INRA BAC library screening; MCL participated in the study design, supervised DNA extractions and BAC fingerprinting and assembled the physical map; AC screened the INRA PCR superpools; GG improved the genetic linkage analysis pipeline, conducted the linkage analysis and assembled the genetic map, and conducted the SSAHA2 alignment between the SNP and BES databases; YH conducted DNA extractions and BAC fingerprinting; KTA contributed to the INRA BAC library pooling and screening; FK contributed to the INRA BAC library screening; CCS conducted the SNPs discovery and validation experiments; JY co-supervised the SNPs discovery and validation experiments; RLV participated in the study design and developed the genetic linkage analysis pipeline; CER participated in the study design and in the genetic linkage analysis and supervised the SNPs discovery and validation experiments. All authors reviewed and contributed to the manuscript.