Identification of SNPs associated with muscle yield and quality traits using allelic-imbalance analyses of pooled RNA-Seq samples in rainbow trout

Background Coding/functional SNPs change the biological function of a gene and, therefore, could serve as “large-effect” genetic markers. In this study, we used two bioinformatics pipelines, GATK and SAMtools, for discovering coding/functional SNPs with allelic-imbalances associated with total body weight, muscle yield, muscle fat content, shear force, and whiteness. Phenotypic data were collected for approximately 500 fish, representing 98 families (5 fish/family), from a growth-selected line, and the muscle transcriptome was sequenced from 22 families with divergent phenotypes (4 low- versus 4 high-ranked families per trait). Results GATK detected 59,112 putative SNPs; of these SNPs, 4798 showed allelic imbalances (>2.0 as an amplification and <0.5 as loss of heterozygosity). SAMtools detected 87,066 putative SNPs; and of them, 4962 had allelic imbalances between the low- and high-ranked families. Only 1829 SNPs with allelic imbalances were common between the two datasets, indicating significant differences in algorithms. The two datasets contained 7930 non-redundant SNPs of which 4439 mapped to 1498 protein-coding genes (with 6.4% non-synonymous SNPs) and 684 mapped to 295 lncRNAs. Validation of a subset of 92 SNPs revealed 1) 86.7–93.8% success rate in calling polymorphic SNPs and 2) 95.4% consistent matching between DNA and cDNA genotypes indicating a high rate of identifying SNPs with allelic imbalances. In addition, 4.64% SNPs revealed random monoallelic expression. Genome distribution of the SNPs with allelic imbalances exhibited high density for all five traits in several chromosomes, especially chromosome 9, 20 and 28. Most of the SNP-harboring genes were assigned to important growth-related metabolic pathways. Conclusion These results demonstrate utility of RNA-Seq in assessing phenotype-associated allelic imbalances in pooled RNA-Seq samples. The SNPs identified in this study were included in a new SNP-Chip design (available from Affymetrix) for genomic and genetic analyses in rainbow trout. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3992-z) contains supplementary material, which is available to authorized users.


Background
Fish growth rate, muscle yield and fillet quality are major traits affecting profitability of aquatic food animal production. As feed cost is a major factor influencing the profitability, efficiency of growth is important and related to growth rate and muscle yield and composition.
Skeletal muscle constitutes about 50-60% of the fish weight [1]. Given that growth efficiency and fillet firmness and appearance are critical for profitability and production of premium products [2], optimizing fish growth, muscle yield and fillet quality traits is a key objective in aquaculture breeding programs. Traditional phenotype-based selection is typically used to select for fast growth; however, muscle yield and quality traits are difficult to improve by conventional selection because measurement of these traits requires sacrificing the animal [2]. Genomic selection tools have been created to improve economically important traits in plants and livestock. Genetic maps, which characterize the linkage or coinheritance patterns of genetic markers, have been developed for a wide range of species, including fish, with the aim of discovering allelic variation affecting traits; and ultimately identify DNA sequences underlying phenotypes [3,4]. Markers have been identified by various molecular techniques, including numerous and genomewide single nucleotide polymorphisms (SNPs). In addition, recent technological developments have enabled high throughput genotyping of these SNPs rendering them useful for genome-wide association studies [5][6][7][8]. Functional SNPs are generally defined as SNPs from genome sequences that affect structure, expression or function of a gene. These sequences include coding SNPs (e.g. non-synonymous, splicing), promoter and noncoding SNPs, as well as functional elements identified from studying of genome conservation [9]. Coding/ Functional/ SNPs (c/fSNPs) are especially important because they have the potential to change the function of a protein [4,10,11]. In addition, c/fSNP markers, because they are located within expressed genes, they are unlikely to become unlinked from their associated genes due to genetic recombination. Therefore, c/fSNPs can be useful genetic markers for detecting significant associations with phenotypes. Understanding molecular mechanisms of muscle growth and quality can help in making better selection decisions. In terrestrial livestock, several genes, genetic markers and QTLs associated with production traits, including growth, have been characterized using molecular techniques [12,13]. In addition, marker-assisted selection has been used to enhance genetic improvement in livestock breeding programs by direct selection on genes affecting economic traits [14] and to optimize selection for quantitative traits [12,13]. However, the genetic basis of muscle growth and quality traits is not well studied in fish [15].
Rainbow trout is the most cultivated cool and cold freshwater fish in the U.S. [16], and it is considered a model species for studies in several fields of biology, including ecology [17], pathology [18], physiology [19], toxicology [20] and carcinogenesis [21]. Several studies used RNA sequencing to identify markers in human [22,23] and non-model species [11,24,25]. However, most SNP detection algorithms were developed for DNA-Seq analyses and are not optimized/tested for RNA-Seq, especially in pooled samples. The objective of this study was using RNA-Seq analyses of pooled samples to identify c/fSNP markers and develop a resource for studies of marker association with production traits in rainbow trout. First, transcriptome-wide SNP allele frequencies were correlated to phenotypic variations in fish whole body weight (WBW) and muscle yield, fat content, shear force and whiteness. Second, SNPs with allelic imbalance scores (ratios between the allelic frequencies of the highend families and that of the low-end families) were identified. Then, a subset of the putative SNPs was validated for allelic polymorphism and tested for trait association. Finally, genes harboring SNPs with allelic imbalances were annotated to obtain insight into the potential functional effects of the SNPs.

Result and discussion
Phenotypes SNPs were identified in fish families with divergent phenotypes in WBW, muscle yield, fat content, shear force (texture) and whiteness of the fillet. These rainbow trout were from a growth-selected line developed by the NCCCWA breeding program [26]. Briefly, this line was created through artificial selection, starting in 2004, from 7 founder strains with documented diversity and domestication history. Over five generations, the population responded to selection by 9.8-12.7% increase in WBW per generation, and rate of inbreeding averaged 0.86% per generation [26]. In this study population, which was sampled after three generations of selection (hatch year of 2010), WBW was positively correlated with muscle yield and muscle fat content (R 2 = 0.56 and 0.50 respectively, data not shown). Our previous reports showed that fast growth may be genetically associated with improved muscle yield, paler fillets (affected by intramuscular fat content) and firmer texture [27]. The trait heritability estimates for muscle yield, muscle weight, WBW10, WBW13, carcass weight, fat percentage, shear force and fillet color were moderate to high (0.31-0.81) [6,27]. Those moderate to high heritability estimates imply that substantial additive genetic variation exist in the study population for growth and carcass traits.
For RNA sequencing, muscle samples were collected from 7 to 9 different full-sib families showing divergent phenotypes per trait (i.e. 3-5 high ranked families versus 3-5 low ranked families per trait). Five fish were sampled from each family. Divergent phenotypic attributes ( Fig. 1

Identification of putative SNPs
RNA pools from muscle tissues of 5 fish per family were used for RNA-Seq analyses. A total of 259,634,620 reads (100 bp single-end) were generated from 22 families at an average of 11,801,573 reads per family. Reads were aligned against the rainbow trout genome [28] using the STAR [29] alignment tool. Percentage of reads mapped to the genome ranged from 80% to 82% per family.
A total of 204,604 putative SNPs were detected for the five traits using Haplotypecaller tool of Genome Analysis Toolkit v3.3.0 (GATK) [30], with an average of 40,920 SNPs per trait. Using the SAMtools/Popoolation software package [31,32], a total of 304,805 putative SNPs were predicted, with an average of 60,961 SNPs per trait (Table 1). After removing redundant SNPs among all traits, we had 59,112 SNPs from GATK and 87,066 from SAMtools/Popoolation2 with 50,885 shared between the two bioinformatics pipelines (Table 1).
After identifying putative SNPs, an in-house Perl script was used to estimate allelic imbalances of the SNPs in each trait. A total of 6275 SNPs with allelic imbalances were identified from the GATK dataset at cutoff values of >2.0 as an amplification and <0.5 as loss of heterozygosity. In addition, 969 SNPs explicitly existed in only the high or low phenotypic group. After removing redundant SNPs between traits at the two cutoff values, there were 4798 unique SNPs (Table 1). Similarly, SAMtools/Popoolation2 identified 5070 SNPs with allelic imbalances at cutoff values of >2.0 as an amplification and <0.5 as loss of heterozygosity. In addition, 1450 SNPs existed in families at one of the two ends of each trait variation scale but not in the other (Table 1). There were 4962 non-redundant SNPs among the five traits that were identified with SAMtools/ Popoolation2 at the two cutoff values. There were only 1829 non-redundant SNPs shared between GATK and SAMtools/Popoolation2. Differences in variant calling and filtering steps might have caused the observed differences in number of SNPs between GATK and SAMtools/Popoo-lation2. There were 7930 non-redundant SNPs with allelic imbalances from both methods. The results of the SNPs' allelic imbalances should be taken with caution because we could not find a reliable statistical test associated with the ratio calls derived from the allelic imbalance calculation to report statistical significance. However, by utilizing exact allele counts instead of frequencies, we were able to assign Chi Square P-Values to most of the SNPs with For subsequent analyses, we combined SNPs from GATK and SAMtools/Popoolation2 into three different groups: 1) Non-redundant SNPs with allelic imbalances from both methods (7930 SNPs); 2) Common putative SNPs from both methods (50,885 SNPs); 3) Putative non-redundant SNPs from both methods (95,234 SNPs) ( Table 1). All SNPs data are provided in Additional file 1.

SNP validation
A total of 92 putative SNPs including 88 SNPs from the GATK/SAMtools common pool (50,885 SNPs) were selected for SNP validation. Among the 92 putative SNPs, 68 SNPs showed allelic imbalances (Table 2), including 25 SNPs identified by GATK pipeline, 10 SNPs identified by SAMtools pipeline, and 33 SNPs identified by both pipelines ( Table 2). Among the 92 tested SNPs, 72 (78.2%) SNPs were polymorphic, 11(11.9%) SNPs were monomorphic and 9 failed the assay ( Table 2). Failure of the Fluidigm assay can be caused by unsuccessful or non-specific primer binding to the target genomic DNA. Therefore, we cannot assume that a failed assay indicates failure of our bioinformatics pipeline to detect a SNP in the RNA sequence data, and can remove the failed SNP assays from the calculation of SNP validation rate. As 72 out of the 83 working Fluidigm SNP assays were polymorphic, we can claim 86.7% validation rate in detecting polymorphic SNPs in the overall putative SNP pool and 90% validation rate in the GATK/SAMtools shared SNPs pool. This success rate is much higher than what we previously achieved in rainbow trout using RNA-Seq (70%) and genomic reduced representation libraries (48%) [11,33]. The improved success rate in this study is perhaps due to use of a reference genome instead of de novo assembled references used in the previous studies. In addition, a transcriptome sequence coverage of ∼7.4X per fish was used compared to only ∼0.97X in our previous RNA-Seq study [11]. The 90% successful SNP validation rate is comparable to that reported in diploid fish or using genomic RADs and doubled haploid fish in rainbow trout [7,34]. In addition, a recent rainbow trout genome re-sequencing study with at least 10× genome coverage per fish had 86% successful validation rate [7]. Relatively lower success rates in SNP detection were  reported from RNA-Seq studies in rainbow trout due to genome duplication and assembly errors in the genome/ transcriptome references [11,35,36]. Noteworthy and in a separate study, we found variation in gene expression in only 75 genes distributed between all 5 traits (data will be published elsewhere). Therefore, differential gene expression effects on estimating allelic imbalances were negligible as only 75 genes distributed between all five traits were differentially expressed between the high and low families. Minor effects of variation in gene expression on allele frequency estimation accuracy were previously reported [37]. The SNP validation data, albeit small, indicated that the GATK method was more successful in calling polymorphic SNPs with allelic imbalances than the SAMtools pipeline; 87.5% versus 66.7%, respectively. However, combined GATK and SAMtools data had a 93.8% success rate. Success rates between SNPs with and without allelic imbalances were 88.7% and 86.7%, respectively. Importantly and out of 72 validated SNPs, 61 (84.7%) and 58 SNPs (80.5%) were polymorphic in fish from two different commercially important rainbow trout populations in the US, Troutlodge Inc. and Clear Springs Foods Inc., respectively. These results suggest that the SNPs identified in this study are also useful for other commercial rainbow trout populations.
To evaluate ability of the pipeline in calculating allelic imbalances, DNA and cDNA of the 35 fish used for RNA-Seq analyses of high versus low muscle yield were also genotyped. For all 72 validated SNPs, all DNA and cDNA genotypes were consistent except for 4.64% that indicated mono-allele specific gene expression as explained below.

Assessment of mono-allelic gene expression
Out of the 72 validated polymorphic SNPs (Table 2), there were 46 SNPs that showed potential mono-allelic expression in cDNA in at least one fish. In other words, the genomic DNA is heterozygous for the SNP while cDNA is monomorphic. Thirty-three of the 35 fish showed mono-allelic expression in at least one SNP. Out of the aforementioned 46 SNPs, 5 SNPs were randomly selected for validation using Sanger sequencing. All SNPs were heterozygous at the DNA level. However, manual investigation of the cDNA sequence chromatograms exhibited existence of substantial allelic imbalances ranging from existence of two alleles with >2.0 X peak height ratios between the 2 alleles at the SNP base to a complete mono-allelic expression (a single peak).
Overall, approximately 4.64% random mono-allelic/allelic imbalances existed in gene expression of rainbow trout. These data are consistent with a recent study in human stem cells showing that most allelic imbalances did not represent 'on/off' events, but instead revealed biased expression from each allele [38]. None of the 8 tested families in our study showed mono-allelic expression in all individuals specific to a given family, indicating no parental origin effect through genomic imprinting. Likewise, the human stem cell study suggested that most of the allele-biased gene expression is not due to genomic imprinting [38]. Compared to our estimated 4.64% mono-allelic expression, recent studies showed 12-24% random mono-allelic expression in mammals and 7-9% in interspecies catfish [4,[39][40][41]. Our mono-allelic expression assessment is based on only 72 SNPs, and hence a genome-wide assessment of mono-allelic expression in rainbow trout warrants further investigation.

SNP genomic/functional classification
Three sets of SNPs were considered for genomic/functional classifications. For the 7930 SNPs with allelic imbalances, 2898 (37.69%) were intergenic. Of them, 635 (8.01%) and 721 (9.09%) SNPs were located within 5Kb upstream or downstream of protein-coding genes, respectively. The rest of the intergenic SNPs, 1633 (20.59%) were located more than 5Kb distant to proteincoding genes.
In these three SNP datasets, there were large percentages of intergenic (including upstream/downstream) SNPs (37-49%). Approximately 10% intergenic in addition to 30% non-coding SNPs were reported in humans from RNA-Seq data [42]. Our high percentages of intergenic SNPs may be partially explained by the incomplete annotation of protein coding genes and exons in the current version of the rainbow trout reference genome sequence [28].

Distribution and density of SNPs in the genome
Chromosome density distribution of the SNPs with allelic imbalances exhibited high density for all five traits in several chromosomes with the three highest peaks in chromosomes 9, 20 and 28 (Fig. 2a). All five traits revealed very similar pattern of distribution with a single exception; shear force exhibited a relative higher density than the other traits on chromosome 9. The similarity in density distribution between traits may be explained at least in part by the positive correlation that we observed between the phenotypes in this population. WBW and thermal growth coefficient were used as selection criterion in this population [11,26], and we found that WBW as an independent variable has significant effects on muscle yield and fat percentage (multivariable regression analysis [P < 0.01], data not shown). However, despite the similarity in SNP density distributions, most of the identified SNPs were unique to each trait. From the 7930 SNPs with allelic imbalances, only 27 were shared by all five traits, 161 were shared by four traits, 680 were shared by three traits and 1783 were shared by two traits. In agreement with our results, a recent GWAS study identified two windows with effect on fillet yield located on chromosome 9 and explaining 1.0-1.5% of genetic variance in the same fish population [6].
As can be expected, the number of SNPs with allelic imbalances per chromosome was strongly correlated with chromosome length (Fig. 2b). In general, numbered unknown chromosomes, which are longer in the current reference genome [28], had more SNPs compared to the known chromosomes (Fig. 2b). Chromosome "Unknown" (1.1 Gb of scaffolds not assigned to chromosomes) had 4086 (49.05%) SNPs (not shown in Fig. 2b). Previous genetic mapping reports showed that the growth-related SNPs/QTL are distributed over~20 chromosomes [11,43,44]. Together with our data, these reports confirm the polygenetic nature of growth/muscle related traits in rainbow trout.

SNP functional annotation
Functional annotation of genes harboring SNPs with allelic imbalances were performed using the Blast2GO suite [45]. The SNP-flanking sequences were searched against the NCBI nr-protein database using BLASTx; then, associated genes and Gene Ontology (GO) terms were acquired. In the biological processes category, SNP-harboring genes were associated with various cellular processes mainly involved in growth-related mechanisms, including regulation of metabolic and oxidation- reduction processes and protein translation (Fig. 3). In the molecular function category, SNP-containing genes were associated with binding metal ions, ATP, nucleic acid, and actin. In addition, a significant number of the genes were associated with transferase, motor, oxidoreductase, and structural molecule activities (Fig. 3). In the cellular component category, many of the genes exhibited association with the cytoplasmic compartment, membranes, myosin complex, and extracellular region compartment (Fig. 3). Genes with similar GO associated terms were previously reported to be involved in rainbow trout muscle growth and quality [11,19,43,[46][47][48]. Additionally, KEGG pathway mapping was used to assign enzyme function to the SNP-containing transcripts [49]. Searching transcripts against the KEGG database yielded 1043 transcripts (13.15%) with significant KEGG hits to 632 KEGG Orthologies (KOs) belonging to different pathways ( KOs), and cofactors and vitamins metabolism (14 transcripts, 11 KOs). These preliminary SNP functional annotations are in agreement with previous reports that showed strong association between 1) mutations and altered expression of glycolytic and oxidative phosphorylation enzymes and 2) rainbow trout growth and muscle degeneration [11,19,43,46,47].
In addition, 176 KEGG annotated sequences were assigned to the genetic information processing category  (Table 4). A significant number of the SNP-harboring genes matched ribosomal (68 sequences, 48 KOs) and RNAtransport proteins (22 sequences, 12 KOs). Previously, we showed that the atrophying muscle and muscle from fast versus slow growing rainbow trout had differentially expressed genes involved in RNA processing, protein synthesis, posttranslational modification, and intracellular protein trafficking [19,43,46]. Moreover, 166 sequences (99 KOs) were classified by KEGG mapping into the environmental information processing category; these sequences were further assigned to signal transduction (147 sequences, 87 KOs) and signaling and interaction molecules (19 sequences, 12 KOs) ( Table 4). The PI3K-Akt signaling, Calcium signaling, MAPK signaling, and cGMP-PKG signaling pathways had the largest numbers of hits: 21, 18, 18, and 16 KOs, respectively. Previous studies indicated involvement of MAPK and Calcium signaling in fish/muscle growth [46,50].
Furthermore, the cellular processes category contained 152 KEGG-annotated sequences matching 85 KOs, which were further classified into cellular community  (Table 4). In the organismal systems category, the most significant subcategories were endocrine (105 transcripts, 53 KOs), circulatory (49 transcripts, 30 KOs), immune (44 transcripts, 28 KOs), and digestive systems (32 transcripts, 16 KOs). Recently, a GWAS study using the same fish population identified a small number of genes involved in muscle development explaining~1.0% of the total genetic variance of the muscle yield and growth rate [6].
Distributions of KEGG matches were generally similar among all five traits. Albeit, we noticed an increased number of hits related to fillet whiteness compared to other traits, for carbohydrate metabolism (47 transcripts, 28 KOs) and amino acid metabolism (32 transcripts, 26 KOs) (Table 4). Similarly, there was a noticeable increase in numbers of hits in whiteness for PI3K-Akt signaling, focal adhesion, gap junction and regulation of actin cytoskeleton ( Table 4). Regulation of focal adhesion and actin cytoskeleton were associated with development of pale, soft, and exudative (PSE) meat in turkey [51]. In addition, the muscle yield trait exhibited an increased number of transcripts for energy metabolism, with 28     Thyroid hormone synthesis     (Table 4). Our KEGG pathway mapping results have linked many of the genes harboring SNPs with allelic imbalances to potential regulation of growth and metabolic pathways, which may support pathway-based GWAS analyses in rainbow trout, similar to what has been recently applied to detect genetic pathways explaining live weight and muscle growth variation in cattle genotypes [52].

Fish population, sampling and sequencing
Phenotypic data and muscle samples were collected from~500 fish representing 98 families (5 fish/family) from the growth-selected line at NCCCWA (year class 2010) as previously described [6,11,26]. Families were produced and reared until~13 months post-hatch as described in reference [26]. Briefly, full-sib families were produced from single-sire × single-dam matings. Eggs were reared in spring water, and water temperatures were manipulated between approximately 7 and 13°C to synchronize hatch times. Each family was stocked separately in 200-L tanks at a density of approximately 600 alevins/tank. Fish were randomly culled every month to maintain stocking densities <50 kg/m3. At about 5months old, fish were anesthetized using 100 mg/L of tricaine methanesulfonate (Tricaine-S, Western Chemical, Ferndale, WA) and uniquely tagged by inserting a passive integrated transponder (Avid Identification Systems Inc., Norco, CA) into the dorsal musculature, and tagged fish were combined and reared in 1000-L communal tanks. Fish were fed a commercial fishmealbased diet (42% protein, 16% fat; Ziegler Bros Inc., Gardners, PA) using automatic feeders (Arvotec, Huutokoski, Finland). Initially, young fish were fed at a daily feeding rate ∼ 2.5% of body weight (BW), which later was gradually reduced to approximately 0.75% of BW.
Fish were sampled as previously described for year class 2010 in Gonzalez-Pena et al., publication [6]. Briefly, WBW was measure in fish belonging to 98 families and families were sorted according to their WBW. The 2nd or 3rd fish from each family was selected for muscle sampling to keep the distribution of WBW consistently adjusted around the median of each family. Selected fish were randomly assigned to one of five harvest groups (~100 fish each) allowing one fish per family per harvest group. The five groups were sampled in five consecutive weeks (one group/week). Fish were samples at about~13-months old (410-437 days post-hatch, mean body weight = 985 g; SD = 239 g). At harvest, fish were anesthetized in approximately 100 mg/L of tricaine methane sulfonate (Tricaine-S, Western Chemical, Ferndale, WA).
At harvest, a muscle sample was excised from the left dorsal musculature and frozen in liquid nitrogen for subsequent RNA sequencing. Fish were slaughtered, and eviscerated then head-on gutted carcasses were packed in ice, transported to the West Virginia University Meats Processing Laboratory (Morgantown, WV), and stored overnight. The next day, carcasses were hand-processed into trimmed, skinless fillets by a trained faculty member and weighed. Muscle yield and quality analyses were conducted as previously described [53]. Briefly, muscle yield was calculated as a percent of muscle weight relative to WBW. A 40 × 80 mm muscle section was separated, parallel to the long axis of the body, from the dorsal musculature for texture analysis [54]. The remaining muscle from the fillets was pulverized with liquid nitrogen in a Waring Blender (Waring, New Hartford, CT) and kept at −25°C for chemical composition analyses. Proximate composition of muscle was determined using AOAC [55] approved methods. Crude fat was analyzed using the Soxhlet solvent extractor with petroleum ether. Texture of fillet sections was determined using a five-blade, Allo-Kramer shear cell attached to a Texture Analyzer (Model TA-HDi®; Texture Technologies Corp., Scarsdale, NY), equipped with a 50kg load cell and at a crosshead speed of 127 mm/min. Force-deformation graphs were recorded and analyzed using the Texture Expert Exceed software (version 2.60; Stable Micro Systems Ltd., Surrey, U.K.). Peak shear force (g/g sample) was recorded.
Fresh fillet surface color was measured with a Chroma meter (Minolta, Model CR-300; Minolta Camera Co., Osaka, Japan) calibrated using a standard white plate No. 21333180 (CIE Y 93.1; × 0.3161; y 0.3326). L* (lightness), a* (redness), and b* (yellowness) values were recorded at three locations above the lateral line along the long axis of the right fillet, and these values were used to calculate a fillet whiteness index according to the following equation: For RNA-Seq analyses, out of 98 families measured for phenotypic data, eight families (5 fish each) showing opposite phenotypes for each of the 5 traits were analyzed (4 high ranked families versus 4 low ranked families on average for each trait). Since some fish families were common between the traits, the total number of selected families for RNA-Seq was 22 families. Total RNA was isolated from each fish muscle sample using TRIzol™ (Invitrogen, Carlsbad, CA). Equal masses of total RNA from 5 samples of each family were pooled and used for RNA-Seq sequencing. cDNA libraries were prepared and sequenced on an Illumina HiSeq (single-end, 100 bp read length) using multiplexing standard protocols as previously described [56]. Briefly, mRNA was selected from one microgram of high quality total RNA. Firststrand synthesis was synthesized with a random hexamer and SuperScript II (Life Technologies). Double stranded DNA was blunt-ended, 3′-end A-tailed and ligated to indexed adaptors. The adaptor-ligated double-stranded cDNA was amplified by PCR for 10 cycles with the Kapa HiFi polymerase (Kapa Biosystems, Woburn, MA) to reduce the likeliness of multiple identical reads due to preferential amplification. The final libraries were quantitated Qubit (Life Technologies, Grand Island, NY) and the average size was determined on an Agilent bioanalyzer DNA7500 DNA chip (Agilent Technologies, Wilmington, DE), diluted to 10 nM and the indexed libraries were pooled in equimolar concentration before sequencing.

SNP detections using SAMtools/Popoolation2
For each trait (WBW, muscle yield, muscle fat content, shear force, and whiteness), sequence reads from each family were aligned to the rainbow trout genome using STAR [29]. After read alignment, the SAMtools view/ sort and mpileup functions were used within the Popoo-lation2 package (version 1.201) to determine the genotype for each variant and calculate allele frequencies [57,58]. Initial SNPs were considered at minimum reads >10 and minor allele count >4 and MAF > 0.05. Putative SNPs associated with each trait were determined by calculating SNP allelic imbalance scores as previously described [11,59]. A SNP allelic imbalance score was determined by assessing the ratio of [frequency of allele A/frequency of allele B in high-end families]/[frequency of allele A/frequency of allele B in the corresponding low-end families]. The allelic imbalances score ranges from zero to infinity. SNPs with allelic imbalance were called if the ratio is more than or equal 2.0 (as an amplification) or less than or equal 0.5 (as loss of heterozygosity. The phase of the alleles could not be determined for families surveyed since the parental genotypes were not known for most of the fish. Allele counts in the divergent families were extracted from the VCF files. Chisquare test of two-by-two Tables [60] was performed with p-value <0.05 to determine if SNPs that are showing allelic imbalances are statistically significant.

SNP detection using GATK tools
For the GATK pipeline [61], reads from each sample were aligned to the rainbow trout genome using STAR [29] as recommended by the GATK practice. Picard tools were used to sort the SAM files and to mark duplicates, a step used by GATK to reduce a false positive due to error in duplicate that could be falsely detected as a SNP. The following steps were performed according to GATK pipeline for RNA-Seq (Split and trim to reassign mapping quality, Indel realignment, local realignment around Indel in order to clean up any mapping artifacts and Base Quality Score Recalibration). After data preparation, variants were called using Haploty-peCaller followed by hard-filtering using the following parameters: Qual By Depth (QD) 2.0, FisherStrand (FS) 60.0: RMS Mapping Quality (MQ) 40.0, MAF > 0.05. Since GATK was not optimized to calculate allelic imbalances in RNA-Seq data, putative SNPs identified in each family were analyzed using an in-house Perl script to determine the allelic imbalances applying the criteria that we used in the SAMtools/Popoolation2 method.

SNP validation
Flanking sequences (up to 250 bp on each side) of putative SNPs were extracted from the reference genome [28]. Some SNPs were removed from SNP assay design because either a sequence gap was located less than 60 bp from the SNP site or a non-target SNP was located less than 30 bp away from the target SNP. A total of 92 SNP assays were developed and evaluated with 282 DNA or cDNA samples. These included 85 DNA samples derived from 19 full-sib families used for RNA-Seq and their parents (38 DNA samples), DNA samples of 2 full-sib mapping families (2 parents and 19 offspring per family), 64 DNA samples from two commercial populations (Troutlodge Inc. and Clear Springs Foods Inc.) and 35 cDNA samples derived from the RNA samples used for RNA-Seq high versus low muscle yield. The SNP genotyping was performed following the instructions of the Fluidigm genotyping user guide. Briefly, DNA and cDNA samples were pre-amplified, diluted and used for genotyping with 96.96 Dynamic Array IFCs (Integrated Fluidic Circuits). The arrays were read using EP1 system, and genotypes were called automatically using Fluidigm SNP genotyping analysis software 4.1 with a confidence threshold of 85. The genotype clusters were examined for each assay and any wrong calls or no calls were corrected manually. The program Pedcheck [62] was used to identify genotypes inconsistent with Mendelian inheritance between parents and offspring. Chi-square goodness of fit tests were performed to identify SNPs with significant segregation distortion (P < 0.01) in the two mapping families. Those SNPs were reported as assay-failed SNPs.
For the Sanger sequencing validation of the SNPs showing potential mon-allelic gene expression, flanking sequences (up to 250 bp on each side) of each SNP were PCR amplified from DNAs and cDNA from the same 35 fish samples that were used for RNA-Seq high versus low muscle analyses. PCR amplicons were Sanger sequenced and manually inspected for consistency between DNA and cDNA genotypes or mono-allele specific gene expression as explained in the results section.