International genome initiatives have resulted in draft genome sequences of several farm animals (cattle, pig, chicken, and horse) and of model fish species (zebrafish (Danio rerio), medaka (Oryzias latipes), stickleback (Gasterosteus aculeatus), takifugu (Takifugu rubripes), and tetraodon (Tetraodon nigroviridis)). Whole-genome sequencing is currently underway for a number of aquaculture species: rainbow trout (Oncorhynchus mykiss), Atlantic salmon (Salmo salar), Nile tilapia (Oreochromis niloticus), Asian seabass (Lates calcarifer), European seabass (Dicentrarchus labrax), channel catfish (Ictalurus punctatus) and common carp (Cyprinus carpio). At the same time, high-throughput genomic tools have been developed, improving the description of genomic structure and function.
Projects associated with genome sequencing activities using different breeds from the same species have provided the opportunity to discover hundreds of thousands of potential single-base changes, known as single nucleotide polymorphisms (SNPs), and short insertion/deletion mutations (indels). The bi-allelic nature of SNPs makes them less informative per locus than multi-allelic microsatellites. Nevertheless, SNPs are considered a highly reliable and valuable molecular marker system for genotyping and selective breeding because they occur throughout the entire genome, in both coding and non-coding regions.
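The lower per-locus informativeness of a bi-allelic marker can be illustrated with expected heterozygosity, H = 1 - sum(p_i^2), which cannot exceed 0.5 when only two alleles exist. The sketch below is illustrative only: the function name and the ten-allele microsatellite with equal allele frequencies are hypothetical and are not data from this study.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2) for one locus.

    allele_freqs: iterable of allele frequencies that must sum to 1.
    """
    freqs = list(allele_freqs)
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


# Best case for a bi-allelic SNP: both alleles at frequency 0.5.
snp_h = expected_heterozygosity([0.5, 0.5])

# Hypothetical microsatellite with ten equally frequent alleles.
microsat_h = expected_heterozygosity([0.1] * 10)

print(f"SNP: {snp_h:.2f}, microsatellite: {microsat_h:.2f}")
```

Under these assumed frequencies the SNP reaches its ceiling of 0.5 while the microsatellite approaches 0.9, which is why many more SNPs than microsatellites are typically needed for comparable information content.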
SNPs in gene coding sequences can be either synonymous (silent polymorphism) or non-synonymous (replacement polymorphism). They are of particular interest for studying the genetics of expressed genes and for mapping functional traits. Synonymous SNPs may alter RNA secondary structures and can affect protein conformation and function. Non-synonymous SNPs can potentially have deleterious functional effects because they lead to changes in amino acid sequences and possibly affect protein structure and function.
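The distinction between synonymous and non-synonymous SNPs can be made concrete with a minimal sketch that translates the reference and the mutated codon using the standard genetic code. The function name `classify_coding_snp` and the example codons are hypothetical choices made for illustration; a real annotation pipeline would also need the reading frame and strand of each transcript.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon, position, alt_base):
    """Classify a single-base change inside one codon.

    codon: reference codon (three DNA letters, coding strand).
    position: 0-based index of the changed base within the codon.
    alt_base: the alternative allele at that position.
    """
    codon, alt_base = codon.upper(), alt_base.upper()
    if len(codon) != 3 or position not in (0, 1, 2):
        raise ValueError("expected a 3-base codon and a position in 0..2")
    if alt_base == codon[position]:
        raise ValueError("alternative allele equals the reference base")
    mutant = codon[:position] + alt_base + codon[position + 1:]
    same_aa = CODON_TABLE[codon] == CODON_TABLE[mutant]
    return "synonymous" if same_aa else "non-synonymous"


# GAA (Glu) -> GAG (Glu): third-position change, amino acid unchanged.
print(classify_coding_snp("GAA", 2, "G"))
# GAA (Glu) -> GTA (Val): second-position change, amino acid replaced.
print(classify_coding_snp("GAA", 1, "T"))
```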
SNPs in non-coding regions can occur in introns, promoters, intergenic sequences, and in 5'- or 3'-untranslated regions. They may alter gene expression by affecting gene splicing, transcription factor binding, mRNA degradation, or non-coding RNA sequences.
Over the last decades, large-scale SNP discovery initiatives have been associated with the development of high-throughput genotyping technologies that facilitate the simultaneous analysis of hundreds of thousands of SNPs. These low-cost but highly reliable assays have permitted fine-scale gene mapping and candidate gene association studies for complex traits in several species such as humans and sheep.
In species whose complete genome sequences are not yet accessible, the increasing availability of expressed sequence tags (ESTs) represents an alternative in silico strategy for de novo SNP identification. This approach does not require any additional bench work, offers a low-cost source of SNPs, and has recently been used in a few aquaculture species such as blue and channel catfish, and salmonids [10–13]. Moreover, EST-derived SNPs are considered gene-derived SNPs since they are located within gene coding and 3′-UTR regions, and they can lead to the identification of quantitative trait nucleotides (QTN).
However, the usefulness of EST-derived SNPs remains putative until their true informativity (sequence polymorphism) and duplication status have been checked with genomic DNA in the populations of interest. Although it is possible to use base quality values to discern true allelic variations from sequencing errors, validation is a key step for the detection of true SNPs. This is generally carried out by genotyping several population samples with a subset of the EST-derived SNPs.
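As a minimal sketch of how base quality values can help separate allelic variation from sequencing errors, the example below converts Phred scores to error probabilities (P = 10^(-Q/10)) and retains an allele only when enough high-quality reads support it. The thresholds (`min_quality=20`, `min_reads=2`), the function names, and the example observations are assumptions chosen for illustration; they are not the criteria used to build the SNP database discussed here.

```python
def phred_to_error_prob(quality):
    """Convert a Phred quality score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-quality / 10)


def supported_alleles(observations, min_quality=20, min_reads=2):
    """Return alleles seen in at least `min_reads` high-quality reads.

    observations: list of (base, phred_quality) pairs at one aligned site.
    Thresholds are illustrative assumptions, not published criteria.
    """
    counts = {}
    for base, quality in observations:
        if quality >= min_quality:
            counts[base] = counts.get(base, 0) + 1
    return {base for base, n in counts.items() if n >= min_reads}


def is_candidate_snp(observations, min_quality=20, min_reads=2):
    """A site is a candidate SNP if two or more alleles are well supported."""
    return len(supported_alleles(observations, min_quality, min_reads)) >= 2


# Both alleles are backed by two high-quality reads: kept as a candidate.
well_supported = [("A", 30), ("A", 35), ("G", 32), ("G", 28)]
# The second G has quality 8 (error probability about 0.16): rejected.
poorly_supported = [("A", 30), ("A", 35), ("G", 8), ("G", 32)]

print(is_candidate_snp(well_supported), is_candidate_snp(poorly_supported))
```

Such a filter only reduces false positives caused by base-calling errors; it cannot distinguish true alleles from paralogous sequence variants, which is one reason genotyping on genomic DNA remains necessary.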
Rainbow trout is the most widely cultivated cold freshwater fish in the world. It has great potential for aquaculture and recreational sport fisheries. In addition to its commercial interest, rainbow trout is also a model species for a wide range of genome-related research activities.
The rainbow trout haploid genome size was estimated to be between 2.4 and 3.0 × 10⁹ bp [17, 18]. A common ancestor of rainbow trout and other salmonids underwent a fourth whole-genome duplication (4R WGD) event about 25 to 100 million years ago, which was followed by a period of re-diploidization resulting in a semi-tetraploid state. It has been estimated that up to half of the loci are still duplicated. Although the tetraploidization event increases genome complexity, it also makes the salmonids an attractive model to study the mechanisms behind whole-genome duplication and the subsequent reduction of one of the two copies of the duplicated gene(s).
Both the interest in rainbow trout as a research model and the need for its genetic improvement for aquaculture production efficiency and product quality have led to the development of several genomic resources for this species. Meanwhile, great efforts have been and are still devoted to the development of SNP genetic markers.
Previous efforts using reduced representation libraries and reference transcriptome datasets resulted in the production of up to 47,000 and 58,000 putative SNPs, respectively. A subset of 384 randomly selected SNPs was genotyped on individual fish and 184 (48 %) were validated. The observed low validation rate could be partly explained by the presence of paralogous sequences with allelic variation, which resulted in the production of false positive SNPs.
Finally, these putative SNPs were not yet publicly available. Therefore, EST-derived SNPs could represent an alternative and complementary in silico approach to assess the quality and to validate larger numbers of SNPs. These resources will add to the already available 184 SNPs validated from the reduced representation libraries study.
Miller and co-authors have also used the RAD (Restriction site Associated DNA) sequencing technology for low density SNP genotyping and reported the construction of a high-resolution linkage map containing 4,563 markers. However, the flanking sequences for these SNPs were only 68 nucleotides long and thus may not be suitable for the design of high-throughput genotyping assays, such as the Illumina assays. Retrieving longer flanking sequences suitable for high-throughput genotyping studies using these RAD-associated markers will require additional information on the whole genome sequence. Efforts are in progress in France and the USA [25, 26] to provide a rainbow trout reference genome sequence in the near future. Nevertheless, in both cases, to facilitate the assembly step, sequencing was performed using a homozygous doubled haploid DNA sample, which hinders the identification of new SNPs.
Mining EST datasets remains an attractive alternative approach for in silico SNP identification in rainbow trout. Up to 31,121 in silico EST-derived SNPs are currently available at the INRA Sigenae database (http://www.sigenae.org/). However, the database provides no information on either their true informativity or their duplication status. Therefore, it is necessary to validate the status of these markers. Validation of rainbow trout EST-derived SNPs in a large number of populations will not only allow the identification of fully informative true SNPs but will also highlight the proportion of informative SNPs shared across different populations, which is crucial information for efficiently designing future rainbow trout-specific SNP chips. These new tools will contribute to studies on population genetics and will facilitate quantitative trait loci (QTL) identification and marker-assisted selection.
In the present study, a panel of 1,152 EST-derived SNPs was selected from the Sigenae SNP database and subsequently assayed for allelic variation in several rainbow trout population samples using the Illumina GoldenGate assay. Successfully validated EST-derived SNPs were used to analyse the genetic diversity in three bisexually reproducing experimental stocks and a collection of doubled haploid (DH) clones, and to update the INRA linkage map by integrating 223 new markers.