Comparative mapping of expressed sequence tags containing microsatellites in rainbow trout (Oncorhynchus mykiss)

Background: Comparative genomics, through the integration of genetic maps from species of interest with whole genome sequences of other species, will facilitate the identification of genes affecting phenotypes of interest. The development of microsatellite markers from expressed sequence tags will serve to increase marker densities on current salmonid genetic maps and initiate in silico comparative maps with species whose genomes have been fully sequenced.

Results: Eighty-nine polymorphic microsatellite markers were generated for rainbow trout, of which at least 74 amplify in other salmonids. Fifty-five have been associated with functional annotation and 30 were mapped on existing genetic maps. Homologous sequences were identified for 20 of the EST-containing microsatellites to identify comparative assignments within the tetraodon, mouse, and/or human genomes.

Conclusion: The addition of microsatellite markers constructed from expressed sequence tag data will facilitate the development of high-density genetic maps for rainbow trout and comparative maps with other salmonids and better studied species.


Background
Genome research in agriculturally important species is facilitated by the availability of species-specific molecular genetic tools and resources such as chromosome maps and large volumes of sequence data. Recently such resources have been developed for important aquaculture species including rainbow trout, which are also widely used as a model system for carcinogenesis, toxicological, and comparative immunological research [1].
The recent evolutionary divergence of the salmonids [22] and the importance of many of these species to aquaculture will allow for comparative QTL mapping. For example, the development of genetic linkage maps for Atlantic salmon and Arctic char [23][24][25] has enabled the identification of QTLs for growth characteristics, disease resistance, and temperature tolerance in those species [18,26,27]. The development of microsatellite markers from EST sequences will facilitate the use of genome information in salmonid species by 1) increasing Type II [5] marker densities on genetic maps; 2) integrating physical and genetic maps; 3) developing comparative genetic maps among salmonids; and 4) developing comparative maps with aquatic model organisms such as zebrafish, fugu, and tetraodon and with better studied avian and mammalian species. This comparative information will aid in the identification of positional candidate genes [28] for production traits in salmonid aquaculture and for basic research that uses rainbow trout as a model organism.
An expressed sequence tag (EST) [29] project was initiated for rainbow trout with the following aims: 1) identify as many unique transcribed sequences as possible; 2) annotate sequence data with information from other species; 3) develop functional genome tools for rainbow trout; and 4) identify microsatellite and single nucleotide polymorphism (SNP) genetic markers for the construction of high-density chromosome maps [30]. Sequences from a normalized cDNA library (NCCCWA 1RT) constructed from brain, gill, liver, muscle, kidney, and spleen tissue resulted in the creation of the Rainbow Trout Gene Index (RTGI) [31]. Microsatellite marker development was conducted simultaneously with the sequencing phase of the project through hybridization of (GT)11 and (GA)11 probes to high-density filters representing 27,648 clones from the library. Positive clones were selected for further analyses, resulting in 89 polymorphic microsatellite markers derived from ESTs, of which 30 were informative in mapping reference families, 55 were associated with functional annotation, and 20 were given comparative mapping assignments.

Marker development
Hybridization of high-density filters representing 27,648 cDNA clones from a normalized cDNA library with (GA)11 and (GT)11 oligonucleotide probes identified 415 clones potentially containing microsatellite repeats. Forward and reverse sequencing of 384 of these clones yielded 755 sequences of good quality (PHRED score > 20 over 100 bp [32]). Dinucleotide microsatellite repeats were identified in 181 clone sequences, and analysis of redundancy identified 161 unique sequences. PCR primer design was possible for 128 of the 161 sequences, which were assigned locus names using OMM5000 nomenclature (in-house terminology for microsatellite markers derived from ESTs). PCR optimization was successful for 93 of the 128 primer pairs. Testing for polymorphism in three reference parents and five doubled haploids resulted in the development of 89 polymorphic microsatellite markers with an average of 4.52 alleles (range 2-7), 40% of which were duplicated, as determined by the observation of multiple alleles in clonal lines (see Additional File 1). Cross-amplification in other salmonid species, using PCR conditions optimized for rainbow trout, was determined (Table 1) and was similar to that of markers from previous publications [33].

Table 1 (partial rows; column headers not recovered)
OMM5091  276  201-210  178  201-205  221-262  368  223-244  265-283
OMM5092  161  161  186  186  202-208  192-217  --
OMM5093  285  285  285  285  285  285  285  285
OMM5099  244  243  228

Functional annotation
Functional annotations were associated with ESTs by BLAST analyses of the RTGI, which previously included EST sequence data for the clones described in this manuscript. The highest scoring matches all had E-values ranging from 0 to 10^-40 and percent identities ranging from 91% to 100% (see Additional File 2). TIGR gene index annotation for tentative consensus sequences (TCs) includes three levels of significance based on percent identity: matches in the range of 90 to 100% are categorized as "homologues," matches in the range of 70-90% are categorized as "similar," and matches less than 70% are categorized as "weakly similar." Annotation of ESTs in this manuscript resulted in 10 highly significant matches to genome sequences, 8 categorized as homologues, 28 as similar, 9 as weakly similar, and 41 for which no associations were determined. Locus or gene symbols from Locus Link [34] or UniProt [35] were added to the 8 loci designated as homologues.

Genetic and comparative mapping
Linkage analyses of 33 informative markers resulted in the assignment of 30 markers to linkage groups (see Additional File 3). Twenty-three markers were informative in the reference families of Sakamoto et al. [3] and 7 markers were placed on the map of Nichols et al. [2], in addition to 3 that could not be placed in previous linkage groups (Table 2). Comparisons to zebrafish and fugu databases identified homologous assignments for 16 ESTs each (see Additional File 4 and Additional File 5); however, the chromosomal assignments in these 2 species are not yet available.

Microsatellite marker development
Marker development strategies for the construction of high-density genetic maps typically use random or targeted approaches. Random approaches are commonly employed in the early phases of map construction and are characterized by the use of sequence data that are not associated with mapping or functional annotation. In targeted approaches, commonly employed to increase marker density in a specific chromosome region or to map genes of interest, only sequence data meeting specified parameters with respect to mapping or function are used for marker development. Our approach for increasing the marker densities of rainbow trout genetic maps was a hybrid of the two. Although clones for marker development were not chosen based on functional annotation, the sequence data used were known to be transcribed. The benefit of this approach is that these microsatellites are both Type I and Type II markers [5], serving to increase marker densities on both genetic and comparative maps. Similar strategies have been employed in the development of microsatellite markers for other agriculturally important animals including sheep, turkey, cattle, catfish, and pig [36][37][38][39][40].

Cross-amplification within the Salmonidae
Salmonids are believed to have diverged from a common tetraploid ancestor some 25 million years ago [22]. As a result of this evolutionarily recent divergence, microsatellite markers can be used in the development of comparative genetic maps among the Salmonidae. Cross-species amplification was obtained for 74 markers and ranged between 83% and 97% per species, with observed polymorphism that ranged between 36% and 82% per marker. Sampling additional individuals from multiple populations is likely to increase observations of polymorphism. This high level of cross-amplification and polymorphism should facilitate the development of comparative and genetic maps for the salmonids.

Functional annotation
The RTGI was used to associate ESTs with functional annotation as their sequence data was previously included in RTGI Version 4.0. Unfortunately, 42% of the markers were not associated with any annotation, demonstrating an overall lack of functional annotation of the rainbow trout transcriptome.
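The three TIGR percent-identity categories used for this annotation amount to a simple threshold rule. The sketch below is illustrative only: the function name `tigr_category` is hypothetical, and because the published ranges overlap at their boundaries, it assumes that a match of exactly 90% counts as a homologue and a match of exactly 70% counts as similar.

```python
from collections import Counter


def tigr_category(percent_identity):
    """Map a BLAST percent identity to a TIGR-style annotation category.

    Assumption (not stated in the text): boundary values go to the
    higher category, i.e. 90 -> "homologue" and 70 -> "similar".
    """
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity >= 90:
        return "homologue"
    if percent_identity >= 70:
        return "similar"
    return "weakly similar"


def tally_categories(identities):
    """Count how many matches fall into each category."""
    return Counter(tigr_category(p) for p in identities)


if __name__ == "__main__":
    # Hypothetical percent identities, for illustration only.
    print(tally_categories([98.5, 91.0, 84.2, 72.0, 55.3]))
```

ESTs with no match at all would need to be tracked separately, since they have no percent identity to classify.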

Genetic and comparative mapping
The goal of the activities outlined in this manuscript was to identify homologous regions of chromosomes between rainbow trout and species for which there is an abundance of genome information, including whole genome sequence. Eight regions of homology were identified between trout and tetraodon, seven with human, and 10 with mouse (Table 2). Although mapping single loci does not identify segments of conserved synteny, the homologies reported in this paper are supported by the examination of direct comparative information between tetraodon and human and mouse. For instance, OMM5000 was observed to be homologous with TNI8, HSA19, and MMU7. The NCBI human/mouse comparative map [41] reveals a homologous region between HSA19 and MMU7, and the tetraodon comparative map [42,43] reveals regions of homology between TNI8 and both HSA19 and MMU7. Similar analyses of comparative assignments in two or more species supported our findings for every marker reported.

Conclusion
This project was initiated at a time when very little sequence data was publicly available for salmonid species. The RTGI now contains over 150,000 ESTs, which represent ~50,000 unique sequences. Current methods to develop new microsatellite markers from EST sequences would most likely replace hybridization with an in silico strategy on the RTGI data set. Therefore, the continuation of microsatellite marker development from expressed sequence tag data is feasible and will be useful for developing comparative maps with other salmonids and with better studied species.
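An in silico strategy of this kind could be as simple as scanning EST sequences for perfect dinucleotide repeats. The following is a minimal sketch, not the method used in this study: the function name `find_dinucleotide_repeats` is hypothetical, the default of 11 repeat units is an assumption chosen only to mirror the length of the (GT)11 and (GA)11 probes, and the reverse-complement motifs AC and TC are included on the assumption that an EST may be read from either strand.

```python
import re

# Motifs matching the (GT)n and (GA)n probes, plus their reverse
# complements (AC and TC) for ESTs sequenced from the opposite strand.
MOTIFS = ("GT", "GA", "AC", "TC")


def find_dinucleotide_repeats(seq, min_units=11):
    """Return sorted (start, end, motif, units) tuples for perfect repeats.

    Coordinates are 0-based and end-exclusive. min_units=11 is an
    illustrative threshold, not a value taken from the study.
    """
    seq = seq.upper()
    hits = []
    for motif in MOTIFS:
        # e.g. "(?:GT){11,}" matches 11 or more consecutive GT units
        pattern = re.compile("(?:%s){%d,}" % (motif, min_units))
        for match in pattern.finditer(seq):
            units = (match.end() - match.start()) // 2
            hits.append((match.start(), match.end(), motif, units))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sequence: flanking bases around a (GT)12 repeat.
    example = "TTTT" + "GT" * 12 + "CCCC"
    for hit in find_dinucleotide_repeats(example):
        print(hit)
```

A production pipeline would also need to handle imperfect repeats and leave enough flanking sequence for primer design, neither of which this sketch attempts.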

Identification of cDNA clones with microsatellites
A rainbow trout normalized cDNA library was constructed using mRNA from brain, gill, liver, spleen, kidney, and muscle tissues. The library was plated, picked, and arrayed into 384-well plates. Sets of 72 plates were gridded onto single 20 cm² positively charged nylon membranes for hybridization experiments. One high-density membrane (representing 27,648 clones) was hybridized overnight at 65°C with radioactively (32P) labeled (GA)11 and (GT)11 oligonucleotide probes using standard protocols [44]. Membranes were removed from hybridization solution, washed, and exposed to storage phosphor screens for 1 hour. The phosphor screens were scanned on a Storm (Amersham Biosciences Corp, Piscataway, NJ) and positive clones identified.

Sequencing and primer design
Positive clones were re-arrayed into 96-well plates and grown overnight. DNA was isolated for each clone using the manufacturer's standard miniprep protocols for the BioRobot 8000 (QIAGEN, Valencia, CA, USA). Sequencing reactions were carried out using ABI Dye Terminator Chemistry (Applied Biosystems, Foster City, CA, USA) with SP6 and T7 primers. Sequencing reactions were purified and electrophoresed on an ABI 3700. Sequences were trimmed for quality and vector using PHRED and Cross_match [32]. Consensus sequences were constructed for clones having multiple sequence data files. Those containing microsatellites were analyzed for redundancy within the dataset and against previously discovered salmonid microsatellites using Vector NTI Suite 6.0 (InforMax, Bethesda, MD). PCR primer pairs were designed to amplify unique microsatellite sequences using Oligo 6.0 [45].

[...] Creek, and Clearwater [46]). Cross-species amplifications were attempted in two samples representing each of various other salmonids including cutthroat, sockeye, kokanee, Chinook, Atlantic salmon, brown trout, brook trout, and Arctic char. PCR products were electrophoresed and verified by visualization in 3% agarose gels. PCR reactions were then combined according to label and size. Typical combinations of markers for capillary electrophoresis were made by combining PCR reactions for markers having alleles that differed in size by at least 100 bp (based on agarose results) and carried different fluorescent labels. One microliter of each PCR product was added to 20 microliters of water; one microliter of this dilution was then added to 12 microliters of HiDi formamide and 0.5 microliters of ROX standard for electrophoresis on an ABI PRISM 3700 DNA Analyzer or an ABI PRISM 3100 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). Genescan output files were analyzed using Genotyper 3.5 software (Applied Biosystems, Foster City, CA, USA).
Markers for which the parents of the reference families were informative were genotyped on the offspring. Markers that were not informative in the reference families of Sakamoto et al. [3] but had been associated with mapping annotation were genotyped on the reference families of Nichols et al. [2].
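The rule for combining PCR reactions before capillary electrophoresis (different fluorescent labels and allele sizes at least 100 bp apart) can be sketched as a greedy grouping. This is an illustration under stated assumptions, not the procedure actually followed: the `Marker` class, the function names, the marker names, the dye names, and the size ranges are all hypothetical, and the sketch assumes both conditions must hold for every pair of markers in a pool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    name: str    # hypothetical marker identifier
    dye: str     # fluorescent label on the primer
    min_bp: int  # smallest allele size seen on agarose
    max_bp: int  # largest allele size seen on agarose


def compatible(a, b, min_gap=100):
    """True if two markers differ in dye and are at least min_gap bp apart."""
    gap = max(a.min_bp, b.min_bp) - min(a.max_bp, b.max_bp)
    return a.dye != b.dye and gap >= min_gap


def pool_markers(markers, min_gap=100):
    """Greedily place each marker in the first pool where it fits."""
    pools = []
    for marker in sorted(markers, key=lambda m: m.min_bp):
        for pool in pools:
            if all(compatible(marker, other, min_gap) for other in pool):
                pool.append(marker)
                break
        else:
            # No existing pool is compatible, so start a new one.
            pools.append([marker])
    return pools


if __name__ == "__main__":
    # Hypothetical markers, dyes, and size ranges for illustration only.
    markers = [
        Marker("M1", "FAM", 100, 120),
        Marker("M2", "HEX", 250, 270),
        Marker("M3", "FAM", 260, 280),
        Marker("M4", "HEX", 400, 420),
    ]
    for pool in pool_markers(markers):
        print([m.name for m in pool])
```

A greedy pass does not guarantee the smallest possible number of pools, but it keeps every pool consistent with the stated rule.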

Annotation
A FASTA file was generated containing clone sequence data for use in standalone BLAST with the goal of obtaining functional and mapping annotation. Functional annotation was associated by comparison to the RTGI Version 4.0 (Appendix 2) [31]. Mapping annotation was obtained by comparisons to sequence data from the Tetraodon Genome Browser [47] and zebrafish, fugu, human and mouse genome sequences from NCBI (Appendices 4 and 5) [48].