Genomic Predictions for Muscle Yield and Fillet Firmness in Rainbow Trout using Reduced-Density SNP Panels

Background One of the most important goals for the rainbow trout aquaculture industry is to improve muscle yield and llet quality. Previously, we showed that a 50K transcribed-SNP chip can be used to detect quantitative trait loci (QTL) associated with muscle yield and llet rmness. In this study, data from 1,568 sh genotyped for the 50K transcribed-SNP chip and ~774 sh phenotyped for muscle yield and llet rmness were used in a single-step genomic BLUP (ssGBLUP) model to compute the genomic estimated breeding values (GEBV). In addition, pedigree-based best linear unbiased prediction (PBLUP) was used to calculate traditional, family-based estimated breeding values (EBV). Results The genomic predictions outperformed the traditional EBV by 35% for muscle yield and 42% for llet rmness. The predictive ability for muscle yield and llet rmness was 0.19 - 0.20 with PBLUP, and 0.27 with ssGBLUP. Additionally, reducing SNP panel densities indicated that using 500 – 800 SNPs in genomic predictions still provides predictive abilities higher than PBLUP. Conclusion These results suggest that genomic evaluation is a feasible strategy to identify and select sh with superior genetic merit within rainbow trout families, even with low-density SNP panels.


Phenotypes and heritability estimates
The total numbers of phenotyped sh for muscle yield and shear force were 775 and 772, respectively, and varied per year-class (YC) with 471 sh in YC 2012, and 304 sh in YC 2010. Descriptive statistics of the data are provided in Table 1. A slightly higher muscle yield and shear force were observed for sh from the YC 2012 compared to that from YC 2010. A near-zero phenotypic correlation (R 2 = 0.02) was observed between muscle yield and shear force in this study.
A total of 1,572 genotyped sh passed the quality check (QC) and were available for this study (four samples were removed due to sh ID con icts). The total number of genotyped individuals from YC 2010 and 2012 were 400 and 558, respectively (Table 1). In addition, 380 sh were genotyped before YC 2010, whereas 234 were genotyped after YC 2012. No phenotypic data were available for sh produced before YC 2010 or after YC 2012 ( Table 1). All sh had complete pedigree information dating back to YC 2002. In this study, the heritability estimates were calculated based on all data from YC 2010 and YC 2012 using PBLUP and ssGBLUP models and are shown in Table 2. The estimated heritability values using both methods were higher for llet rmness than for muscle yield ( Table 2). Other studies reported a slightly higher heritability (0.34-0.36) of muscle yield in other rainbow trout populations [12,36]. The estimated heritability values indicate a moderate additive genetic component for these traits. Garcia-Ruiz et al. (2016) showed that genomic information can help to increase accuracy even for lowly heritable traits. Additionally, increased accuracies in a trait with h 2 = 0.14 was reported in Atlantic salmon using GBLUP [37], which provides evidence for the feasibility of GS in the current National Center of Cool and Cold Water Aquaculture (NCCCWA) sh population.  Genomic predictions using 50K SNP panel The genetic architecture of muscle yield and rmness has been recently described using a 50K SNP chip, which revealed a polygenic nature of the traits [13,19]. Genomic information has the potential to increase the ability to predict future performance for several traits, including the lethally-measured ones [23].
Therefore, this study expands the utility of the 50K SNP chip in GS models as well as investigates the impact of the SNP density to accurately predict future sh performance for muscle yield and rmness in a rainbow trout genetic line produced at NCCCWA.
The predictive ability of the genomic model was evaluated using a ve-fold cross-validation scheme. Using ve replicates helps to minimize errors that could be generated due to a single-fold sampling [26]. Table 2 shows the average predictive ability for muscle yield and rmness under the ve-fold crossvalidation strategy. For both traits, using genomic information through ssGBLUP enhanced the ability to predict sh performance relative to PBLUP. Genomic information has a greater impact on the predictive ability of lethally-measured traits than traits that can be directly measured on sh [12,22]. In this study, genomic information increased predictive ability by 35% and 42%, compared to PBLUP, for muscle yield and rmness, respectively. Gonzalez-Pena et al. [12] also reported higher accuracy of GEBV compared to EBV; 0.55 and 0.13, respectively. Similar gains through GS have been achieved in other aquaculture species. For instance, a 29% increase in predictive ability, relative to PBLUP, was reported for residual carcass weight in channel cat sh [22].
Variable gains of accuracy using genomic information were reported in Atlantic salmon for lice resistance (52%), llet color (22%) [37], and growth traits (20%) [29]. The considerable variation in the relative improvement of accuracy between the traits is consistent with the difference in predictive ability between muscle yield and rmness in the current study. Even though high predictive abilities have been achieved by applying genomic information in this study, using a higher number of sh in the training population would help to achieve higher prediction accuracies. Low predictive abilities (0.46-0.49) for BCWD phenotypes in trout were observed using ssGBLUP compared to PBLUP (0.50) when small training sample size (n = 583) was used [38]. Incorporation of more genotyped and phenotyped sh in the analysis improved the accuracy of the GEBVs by ~ 80% [25].
The regression coe cients (b1), representing in ation for muscle yield and rmness, are shown in Table 2. For llet rmness, GEBV was less in ated than EBV. These results are consistent with results reported for harvest weight and residual carcass weight in cat sh [22] and BCWD resistance in rainbow trout [25], showing that the genomic information provides not only more accurate but also less biased evaluations. In this study, variance components estimated from the whole dataset were used. Using the same variance components for a ve-fold cross-validation strategy yielded less in ated GEBV for muscle yield, close to 1 (b1 = 0.97), than that of the harvest weight (b1 = 0.92) and residual carcass weight (b1 = 0.91) in cat sh [22], and BCWD survival STATUS in rainbow trout (b1 = 0.86) [25]. On the other hand, breeding values for llet rmness were more in ated (b1 = 0.88) than those for muscle yield. Updating the variance components for the training datasets in cat sh have been suggested to reduce the in ation of the genomic evaluations [22].
Fish used in this study were sampled from a genetic line selected for a high growth rate. Previously, when muscle yield and shear force phenotypes were regressed on body weight, coe cient of determination (R 2 ) values of 0.56 and 0.01 were observed, respectively [39]. Selection for growth in this population might have less representation of sh with low-ranked phenotype for muscle yield in the population. Fish with phenotypes used in the study were sampled in a manner that captures between-and within-family variation for growth performance, although. In an admixed population of Atlantic salmon, sh families produced from selection lines showing extreme phenotypes for lice resistance were over-represented among the genotyped sh leading to in ation in the between-family variation and less reliability of the GS model [37].

Linkage disequilibrium and effective population size
Long-range linkage disequilibrium (LD) and small effective population size (Ne) provided evidence of the possibility to reduce the marker density needed for GS in cat sh and rainbow trout selectively-bred for BCWD resistance [22,24]. When the historical LD is weak, the accuracy of genomic prediction decreases [40]. In addition, higher genomic prediction accuracies are associated with small Ne [22,41,42]. LD and Ne were estimated for the rainbow trout population used in this study ( Table 3). The mean LD per chromosome ranged from 0.21 to 0.34, with an overall genome-wide average r 2 of 0.26. Twenty-two chromosomes showed a mean r 2 ≥ 0.25. The LD (r 2 = 0.26) in the current sh population was consistent with LD (r 2 = 0.27) reported in a Troutlodge, Inc., May-spawning, odd-year line [24]. Chromosomes 5 and 23 had the highest mean LD estimate (r 2 = 0.34) followed by chromosome 1 (r 2 = 0.32). Conversely, chromosomes 6 (r 2 = 0.21) and 19 (r 2 = 0.22) had the lowest mean LD estimates. LD decay with distance was estimated for all the rainbow trout chromosomes using only high-quality anchored SNPs (Fig. 1). The LD decay plots show a long-range LD in all chromosomes. Interestingly, the strongest LD appeared on chromosome 5 and extended over 20 Mb. In agreement with our results, Vallejo et al. [24] identi ed long-range LD spanning over 1 Mb in all chromosomes and over 25 Mb on chromosome 5 in a commercial rainbow trout population. This strong LD on chromosome 5 was likely due to a large chromosomal double inversion, which prevents recombination in heterozygous sh [24,43]. Also, recent admixture events [37] and recombination interference on the male chromosome were shown to contribute to long-range LD in rainbow trout and other salmonids [12]. The population structure analysis suggested nine genetically different groups in the current population (data not shown), thus supporting population admixture as a contributor to the long-range LD observed in this study. In contrast, a Tasmanian Atlantic salmon population that was originated from a single founder strain showed short-range LD [44].
The mean estimated effective population size based on LD was Ne = 113 (Table 3). An average Ne of 145 was reported in the early generations of the NCCCWA rainbow trout selective breeding program and was expected to decline with selection [45]. In a commercial rainbow trout population, an Me = 20,026 was reported [24]. Overall, our results suggest the feasibility of reducing the SNP panel size, which can reduce the genotyping cost. In the next section, further investigation was conducted to determine the impact of reducing the SNP panel size on the predictive ability of GEBV in the current rainbow trout population. There is an interest from the aquaculture industry to implement GS in breeding programs. However, genome-wide genotyping using high-density SNP chips may be cost-prohibitive, especially for small and mid-size hatcheries and companies; therefore, a cost-e cient genotyping is necessary.
In the current study, two approaches have been adopted to evaluate the effect of reducing the density of the SNP panel on the genomic predictive ability for muscle yield and rmness.

Reducing the SNP panel density based on LD
In the rst approach, LD-based SNP pruning was used to reduce the density of SNP panel throughout the genome [48]. Five LD values were used to prune SNPs (r 2 > 0.7, > 0.4, > 0.1, > 0.05, and > 0.01). The predictive ability for muscle yield, based on LD, was reasonably well maintained with the reduction of the SNP panel density down to ~ 11K SNPs (Table 4). However, further reduction of the marker density, below 11K SNPs, led to lower predictive abilities than those estimated using the PBLUP model (0.20). Similar results have been reported in other aquaculture populations/species. For instance, a moderate decrease in genomic accuracy was reported when reducing the SNP density down to 10K SNPs in a trout population selected for BCWD resistance [24]. Also, increasing SNP density above ~ 10K SNPs in Atlantic salmon showed no improvement in genomic accuracy for growth traits [29]. This can be related to the theory of dimensionality of genomic information, where the number of animals and SNP needed to achieve a reasonable level of accuracy approaches 4NeL [49,50], which was 14,600 in our study.
In the case of llet rmness, a reduction of the SNP density down to 1,153K SNPs yielded the same predictive ability (0.27) as the full 50K SNP panel; however, GEBV were more in ated. Interestingly, further reduction of the SNP density down to 497 SNPs yielded predictive abilities (0.24) that, although were lower than the 50K panel, still outperformed the traditional PBLUP model (0.19) (Table 5). Recently, gains of accuracy relative to traditional PBLUP, using 500 SNP panel, were reported for BCWD in rainbow trout. Such gains in accuracy with a signi cantly reduced SNP panel was attributed to the long-range LD in the studied population [24].  Reducing SNP density based on percentage of additive genetic variance explained by SNPs The second approach involved reducing the density of the panel based on the percentage of additive genetic variance explained by SNPs for each trait, which were already determined in our previous publications using weighted single-step genomic best linear unbiased prediction (wssGBLUP) analysis [13,19]. Reduced SNP panels with the percentage of additive genetic variance between > 0.05% and > 1.8% were used to evaluate the predictive ability using the ve-fold cross-validation strategy. Tables 6 and  7 show the predictive ability for muscle yield and rmness using those reduced SNP panels. For muscle yield, reduction of the SNP marker density down to ~ 9K SNPs yielded the same predictive ability (0.27) that was obtained using the 50K panel. Vallejo et al. [24] reported that SNP panel densities less than 20K SNPs are suitable for GS in rainbow trout. Simulation studies in plants and livestock showed that marker Loading [MathJax]/jax/output/CommonHTML/jax.js densities higher than ~ 10K SNPs had little or no improvement in genomic accuracy [51,52]. Interestingly, further reduction of the number of SNPs down to ~ 800 SNPs caused a minimal decrease in the predictive ability for muscle yield (0.26). For llet rmness, reducing the number of SNPs down to 10K SNPs had a small increase in predictive ability (0.30) compared to the 50K SNP panel (0.27). Interestingly, the in ation of GEBV greatly improved for rmness without a signi cant change in muscle yield. For both traits, the predictive ability of GEBV for all SNP panels down to ~ 800 SNPs was higher than those estimated using the pedigree-based model. Overall, prioritizing SNPs based on their percentage of additive genetic variance explained allowed a high reduction of the SNP density, down to ~ 800 SNPs, while maintaining the potential to enhance the accuracy of genomic predictions in this population, but with a trade-off of increasing in ation. Altogether, reducing the SNP density to about 10,000 SNPs would likely help to reduce the genotyping cost of implementing GS in this population with no loss in predictive ability and no bias. A further reduction to around 1000 SNP would have a small impact on predictive ability, but GEBV may have an increased level of bias depending on which method was used to reduce SNP density.

Conclusions
This study reveals the impact of using genomic information on progressing the rainbow trout breeding programs for muscle yield and rmness. Using genomic information improves the ability to predict future performance and reduces the in ation of breeding values. The long-range LD detected in this study, partially due to population admixture, is likely contributing to the high genomic predictive ability in case of the reduced density SNP panels. Reducing the SNP panel density to approximately 10,000 SNPs is a feasible strategy to help to reduce the cost of implementing GS in rainbow trout.

Fish population and phenotypes
The sh population and sample collection were previously described in detail [53]. The selective breeding program has been established in 2004 and underwent ve generations of selection for body weight.
Details on how the population was formed and selected for growth can be found in [8,54]. Phenotypic records on muscle yield and shear force were collected from the third and fourth generations, comprising For phenotyping, sh were harvested over ve consecutive weeks (one sh/family/week). The YC 2010 sh were 410 to 437 days old at harvest with a mean body weight of 985 ± 239 (g). Fish from the YC 2012 were 446 to 481 days old, with a mean body weight of 1,803 ± 305 (g). Head-on gutted carcasses were manually processed into skinless llet. The trimmed llet was weighed, and muscle yield was calculated as a percent of whole-body weight. The sh (YC 2010 & YC 2012) had an average muscle yield of 48.91 ± 2.42 (%). The shear force of a cooked llet section was used to measure the llet rmness as fully described in a previous publication [55].
Genotyping data and quality control check In the current study, we used a 50K gene-transcribed SNP chip that has recently been developed for rainbow trout [19]. A total of 1,728 sh were genotyped as previously described [13,19]. Affymetrix SNPolisher software was used at the default parameters to perform quality control and lter out samples that did not meet the threshold ltration criteria. In addition, about 5K SNPs that were not anchored to the newly assembled rainbow trout genome were ltered out because LD calculation (explained in the next Loading [MathJax]/jax/output/CommonHTML/jax.js sections) requires physically mapped SNPs. Anchored SNPs were subjected to QC analysis using PREGSF90 software, which belongs to the BLUPF90 family [56]. After QC, 35,303 SNPs (70.6%) were retained, those SNPs had minor allele frequency (MAF) > 0.05, call rate for SNP > 0.90, and deviation from the Hardy-Weinberg equilibrium (HWE) < 0.15. The ltered SNPs were used both for GS and LD analysis.

Model and analysis
A single-trait mixed model was used for muscle yield and shear force ( llet rmness) as follows: y = Xb + Z 1 a + Z 2 c + e where y is a vector of phenotypes (muscle yield or llet rmness), b is a vector of xed effects of hatchyear and harvest group, a is a vector of additive direct genetic effect, c is a vector of random of common environmental effect (i.e., family effect), and e is the vector of residuals. Incidence matrices for effect contained in b, a, and c are represented by X, Z 1 , and Z 2 , respectively. where G − 1 is the inverse of the genomic relationship matrix, constructed as in [59] and A -1 22 is the inverse of the pedigree relationship matrix for genotyped animals.
Variance components were estimated using all data, with and without genomic information for both traits using AIREMLF90 [57]. However, pedigree-based variance components, estimated with all the data, were used in the validation study to have fair comparisons between PBLUP and ssGBLUP. Heritabilities were calculated as:

Cross-validation
To evaluate predictive abilities of both traditional pedigree-based and genomic evaluations, as well as the impact of different SNP densities, ve-fold cross-validation was conducted [ animals with phenotypes (muscle yield N = 775; shear force N = 772) were randomly split into ve mutually exclusive folds/groups. Then, phenotypes were removed from the data, one group at a time to form the validation groups (muscle yield N = 155; shear force N = ~ 154). The remaining animals (i.e., training group) were used to predict the future performance of the validation group. Results were presented as the overall mean of the 5 replicates. This cross-validation approach was chosen given the small number of genotyped animals with phenotypes.
Predictive ability was de ned as the Pearson's correlation between phenotypes adjusted for xed effects (y*) and (G)EBV.

Linkage disequilibrium and effective population size
The LD was calculated using PREGSF90 [57] according to the following equation: r 2 = D 2 p 1 p 2 q 1 q 2 where p 1 , p 2 , q 1 , and q 2 represent the allele frequencies; D = p 11 p 22 − p 12 p 21 . The mean LD was estimated for each chromosome as the average estimate of r 2 from all pairwise SNPs [24].
For each chromosome, LD decay with the distance between SNP markers was calculated by tting Sved's equation [61] as follows: where d t is the average length of chromosome t in Morgan, r 2 is the average LD of chromosome t, (n) −1 represents the adjustment term for the sample size, and α is a xed parameter that equals to 1 in the absence of mutations or to 2 in the presence of mutations. In this study, α = 2 was used.
Reducing SNP density based on LD [ ] ) PLINK [48] was used to generate reduced/pruned SNP subsets based on variable LD values of 0.7, 0.4, 0.1, 0.05, and 0.01. SNP pruning was achieved using the command line (plink -le data -indep-pairwise 50 5 r 2 ). In brief, LD was calculated between each pair of SNPs within a genomic window of 50 SNPs. A pair of SNPs was removed if the LD between the two pairs of SNPs exceeded the LD value, e.g., 0.7. The window was shifted 5 SNPs forward, and the procedure was repeated. The command line was reexecuted using the four other r 2 values. Each one of the ve resulting pruned SNP subsets with 20042, 11433, 1153, 497, and 128 SNPs were used in genomic predictions.
Reducing SNP density based on percentage of additive genetic variance The second approach used to reduce the density of the SNP panel was based on the percentage of additive genetic variance explained by SNPs in each trait. The SNP variances were previously computed using wssGBLUP for muscle yield and llet rmness [13,19]. In this approach, SNPs were clustered into ve to six groups based on the percentage of additive variance explained, with clusters ranging from > 0.05% to > 1.0%. Accordingly, the reduced number of SNPs of each group was used to evaluate the predictive ability using the ve-fold cross-validation strategy.