Accuracy of genomic selection for growth and wood quality traits in two control-pollinated progeny trials using exome capture as the genotyping platform in Norway spruce

Background Genomic selection (GS) can increase genetic gain by reducing the length of breeding cycle in forest trees. Here we genotyped 1370 control-pollinated progeny trees from 128 full-sib families in Norway spruce (Picea abies (L.) Karst.), using exome capture as genotyping platform. We used 116,765 high-quality SNPs to develop genomic prediction models for tree height and wood quality traits. We assessed the impact of different genomic prediction methods, genotype-by-environment interaction (G × E), genetic composition, size of the training and validation set, relatedness, and number of SNPs on accuracy and predictive ability (PA) of GS. Results Using G matrix slightly altered heritability estimates relative to pedigree-based method. GS accuracies were about 11–14% lower than those based on pedigree-based selection. The efficiency of GS per year varied from 1.71 to 1.78, compared to that of the pedigree-based model if breeding cycle length was halved using GS. Height GS accuracy decreased to more than 30% while using one site as training for GS prediction and using this model to predict the second site, indicating that G × E for tree height should be accommodated in model fitting. Using a half-sib family structure instead of full-sib structure led to a significant reduction in GS accuracy and PA. The full-sib family structure needed only 750 markers to reach similar accuracy and PA, as compared to 100,000 markers required for the half-sib family, indicating that maintaining the high relatedness in the model improves accuracy and PA. Using 4000–8000 markers in full-sib family structure was sufficient to obtain GS model accuracy and PA for tree height and wood quality traits, almost equivalent to that obtained with all markers. Conclusions The study indicates that GS would be efficient in reducing generation time of breeding cycle in conifer tree breeding program that requires long-term progeny testing. The sufficient number of trees within-family (16 for growth and 12 for wood quality traits) and number of SNPs (8000) are required for GS with full-sib family relationship. GS methods had little impact on GS efficiency for growth and wood quality traits. GS model should incorporate G × E effect when a strong G × E is detected.

Hayes et al. [14] considered four major factors affecting the accuracy of GS: 1) heritability of the target trait; 2) the extent of linkage disequilibrium (LD) between the marker and the quantitative trait locus (QTL); 3) the size of training population and the degree of relationship between training set (TS) and validation set (VS); and 4) the genetic architecture of the target trait. The genetic architecture and heritability of the target/breeding traits are intrinsic of the nature of the traits and environment in the test trials. Thus it is difficult to change in a practical breeding process. However, some factors such as LD between the marker and the QTL and the size of training population are relatively easy to be managed by increasing the number of markers and relationship between training and prediction (or validation) populations, reducing effective population size (N e ), and increasing training population size [15].
Model selection in GS is quite important for the prediction of genomic estimated breeding values (GEBVs) [14]. In GS, model adequacy is more related to the genetic architecture of the target trait. Based on genome-wide association studies (GWAS), most growth and wood quality traits for conifer species have polygenic inheritance with a gamma or exponential distribution of allelic effects [16]. To account for these skewed distributions of a few genes with large effects and most of genes with small effects, Bayes A, B, and Cπ and Bayesian Least Absolute Shrinkage and Selection Operator (BLASSO) were developed to fit the models more accurately, in contrast to Genomic best linear unbiased prediction (GBLUP) model that assumes a normal distribution of allelic effect. In most studies for growth and wood quality traits, the results were similar regardless of models used [4,9]. Resende et al. [9] reported that fusiform rust in loblolly pine may be controlled by a few genes with large effects and Bayesian-based models had a higher predictive ability (PA), which is defined as the correlation between adjusted phenotype values and GEBVs. Thus, it is worthwhile to test different models on traits that may have different genetic architectures for specific tree species.
So far, several genotyping technologies have been employed in GS, such as diversity array technology (DArT) array [5,17], SNP chip/array [4-6, 9, 10, 13], genotyping by sequencing (GBS) [10,11], and exome capture [18]. Those technologies were developed to genotype a subset of a whole genome, especially for conifer species with a large genome size. From the published papers, the number of used markers in tree species varies from 2500 to 69,511 single nucleotide polymorphisms (SNPs), most with a few thousands of SNPs. With the large genome size in most commercial conifer species (~20Gb) [19], for example, 20 Gb in Norway spruce [20], such small number of markers may not be able to capture most of QTL effects with short-range marker-QTLs LD in undomesticated populations or large breeding populations. Thus, such studies mostly capture those QTL effects with long-range marker-QTLs LD and relationships in highly related populations, such as full-sib families in tree breeding programs with a small N e .
Evaluations of accuracy in GS have been performed with phenotypic and dense marker data from a single site and multiple sites in several tree species [8,11,17]. Substantial G × E effects for growth traits have been found in conifer species [21][22][23]. However, wood quality traits usually have a low or non-significant G × E [24][25][26]. Thus, a GS model for growth traits using data from a single site and used to predict genomic breeding values in another site, may produce a low accuracy.
The aims of this study were to 1) evaluate the accuracy of GS on tree height and wood quality traits; 2) assess the GS accuracy for single site, cross-site, and joint-sites (e.g. G × E effect on GS selection); 3) examine effect of different statistical models and ratios between TS and VS for GS; 4) explore the roles of relatedness (full-sib, half-sib and unrelated) on accuracy of GS; 5) test the accuracy using subsets of random markers and markers with the largest positive effects; 6) estimate number of trees within-family and number of families required for effective GS for tree height and wood quality traits.

Sampling of plant material
In this study, 1370 individuals were selected from two 28-year-old control-pollinated progeny trials with the same 128 families from a partial diallel mating design that consisted of 55  A completely randomized design without designed pre-block was used in the Vindeln trial (site 1), which was divided into 44 post-blocks. Single-tree plot with a spacing of 1.5 m × 2 m was used in each rectangular block with 60 trees (6 × 10). The same design was also used in the Hädanberg trial (site 2) with 44 post-blocks, but for the purpose of demonstration, there was an extra design with 47 extra plots, each plot with 16 trees (4 × 4). Based on the spatial analysis, in the final model, 47 plots were combined into two big post-blocks.

Phenotyping
Tree height was measured in 2003 at the age of 17 years. Solid-wood quality traits including Pilodyn penetration (Pilodyn) and acoustic velocity (velocity) were measured in October 2016. Surrogate wood density trait was measured using Pilodyn 6 J Forest (PROCEQ, Zurich, Switzerland) with a 2.0 mm diameter pin, without removing the bark. Velocity is highly related to microfibril angle (MFA) in Norway spruce [27] and was determined using Hitman ST300 (Fiber-gen, Christchurch, New Zealand). By combining the Pilodyn penetration and acoustic velocity, indirect modulus of elasticity (MOE) was estimated using the equation developed by Chen et al. [27].

Genotyping
Total genomic DNA was extracted from 1, 370 control-pollinated progeny and their 46 unrelated parents using the Qiagen Plant DNA extraction protocol with DNA quantification performed using the Qubit® ds DNA Broad Range Assay Kit, Oregon, USA. Probe design and evaluation were described in Vidalis et al. [28]. Sequence capture was performed using the 40,018 probes previously designed and evaluated for the materials [28] and samples were sequenced to an average depth of 15x at an Illumina HiSeq 2500 platform. Raw reads were mapped against the P. abies reference genome v1.0 using BWA-mem [29,30]. SAMTools [31] and Picard [32] were used for sorting and removal of PCR duplicates and the resulting BAM files were subsequently reduced to containing the probe only bearing scaffolds (24,919) before variant calling. Variant calling was performed using the Genome Analysis Toolkit (GATK) HaplotypeCaller [32] in Genome Variant Call Format (gVCF) output format. Samples were then merged into batches of~200 before all samples were jointly called.
As per the recommendations from GATK's best practices, Variant Quality Score Recalibration (VQSR) method was performed in order to avoid the use of hard filtering for exome/sequence capture data. The VQSR method utilizes machine-learning algorithms to learn from a clean dataset to distinguish what a good versus bad annotation profile of variants for a particular species should be like. For the VQSR analysis two datasets were created, a training subset and the final input file. The training dataset was derived from a Norway spruce genetic mapping population with loci showing expected segregation patterns [33]. The training dataset was designated as true SNPs and assigned a prior value of 12.0. The final input file was derived from the raw sequence data using GATK best practices with the following parameters: extended probe coordinates by + 100 excluding INDELS, excluding LowQual sites, and keeping only bi-allelic sites. The following annotation parameters QualByDepth, MappingQuality and Base-QRankSum, with tranches 100, 99.9, 99.0 and 90.0 were then applied to the two files for the determination of the good versus bad variant annotation profiles.
The recalibrated Variant Call Format was filtered based on the following steps: (1) removing indels; (2) keeping only biallelic loci; (3) treating genotype with a genotype quality (GQ) < 6 as missing; (4) filtering read depth (DP) < 2; (5) removing individual call rate < 50%; (6) removing variant call rate ("missingness") < 90%; (7) minor allele frequency (MAF) < 0.01. After steps 1-4, we calculated discordance between 148 pairs technique replicates, the average discordance was less 1%. Thus, conditions of GQ < 6 and DP < 2 as missing are sufficient to do downstream analysis. After all filtering, 116,765 SNPs were kept for downstream analysis and 77,116 SNPs were independent based on LD (r 2 < 0.2) calculated in PLINK [34]. The resultant SNPs were annotated using the default parameters for snpEff 4. The Ensembl general feature format (GTF, gene sets) information for the P. abies genome was utilized to build an annotation database. This analysis revealed that 90% of the variants were located within gene coding regions, with only 10% variants in intronic regions.
LD K-nearest neighbour genotype imputation approach [35] was used to impute missing genotypes in TASSEL 5 [36]. After several rounds of imputation, a few of missing genotypes were imputed using random imputation of the codeGeno function in the synbreed package in R [37]. In total, 5.9% of missing genotypes were imputed.

Estimating breeding values
Breeding values for the genotypes in the trials were predicted using the following model: Where y is a vector of phenotypic observations of a single trait; β is a vector of fixed effects, including a grand mean and site effects, b(s) is a vector of post-block within site effects, a is a vector of site by additive effects of individuals. X, W, and Z 1 are incidence matrices for β, b(s), and a, respectively. For join-site cross-validation, the average of breeding values was assumed as estimated (true) breeding values (EBVs) when unstructured variance and covariance were used for additive effects. Otherwise, EBVs in the single site were assumed as true or reference breeding values. The random additive effects (a) in equation (1) were assumed to follow a∼N ð 0; A σ 2 a1 σ a12 σ a12 σ 2 a2 " # Þ , where A is the additive genetic relationship matrix, σ 2 a1 and σ 2 a2 are the additive genetic variances for site 1 and site 2, respectively, σ a12 is additive genetic covariance between site 1 and site 2. The residual e was assumed to follow e $ Nð0; Þ , where σ 2 e1 and σ 2 e2 are the residual variances for site 1 and site 2, I n1 and I n2 are identity matrices, n1 and n2 are the number of individuals in each site, 0 is the zero matrix.
To obtain accurate heritability estimates for joint-site models, we used the equation as following: where sa is a vector of site-by-additive interaction effects, a, sa, and e were assumed to be homogenous between the two sites.
Statistical analyses for genomic predictions GBLUP, Bayesian ridge regression (BRR), BLASSO, and reproducing kernel Hilbert space (RKHS) were used to estimate GEBVs. We implemented GBLUP calculations using ASReml R [38]. And we implemented the BRR, BLASSO, and RKHS methods using BGLR function from the BGLR package in R [39]. The details of these statistical methods will be defined later. The GEBVs were estimated using the following mixed linear model: Where y' is a vector of adjusted phenotypic observations by post-block effects and standardized site effect (transforming it to have zero mean and unit variance for each site), β is a vector of fixed effect, including a grand mean), a and e are vectors of random additive and random error effects, respectively, and X and Z are the incidence matrices.
The four genomic-based best linear unbiased prediction methods were compared with the traditional pedigree-based best linear unbiased prediction (ABLUP)

1) ABLUP
The ABLUP is the traditional method that utilizes a pedigree relationship matrix (A) to predict the EBVs. For ABLUP the vector of random additive effect (a) in equation (3) is assumed to follow a normal distribution a $ Nð0; Aσ 2 a Þ, where σ 2 a is the additive genetic variance. The residual vector e is assumed as e $ Nð0; Iσ 2 e Þ, where I is the identity matrix. The mixed model equation (3) was solved to obtain EBVs as: The scalar α is defined as α ¼ σ 2 e =σ 2 a , where σ 2 e is the residual variance, σ 2 a is the additive genetic variance.

2) GBLUP
The GBLUP model is the same as ABLUP, with the only difference being that the genomic relationship matrix (G) replaces the A matrix. The G matrix is calcu- where M is the matrix of samples with SNPs encoded as 0, 1, 2 (i.e. the number of minor alleles), P is the matrix of allele frequencies with the i th column given by 2(p i − 0.5), where p i is the observed allele frequency of all genotyped samples. In GBLUP, the random additive effect (a) in equation (3) is assumed to follow a $ Nð0; Gσ 2 g ), where σ 2 g is the genomic-based genetic variance and GEBVs (â) are predicted from equation (4), but with A − 1 replaced by G − 1 and σ 2 a replaced by σ 2 g . The inverse of G matrix was estimated using write.realtionshipMatrix function in the synbreed package in R [40].

3) Bayesian ridge regression (BRR)
BRR is a Bayesian version of ridge regression with the shrinkage merit that was originally intended to deal with the problem of high correlation among predictors in linear regression models [41]. The random additive vector a is assigned a multivariate normal prior distribution with a common variance to all marker effects, that is a $ Nð0; I p σ 2 m ), where p is the number of markers. Parameter σ 2 m denotes the unknown genetic variance contributed by each individual marker and is assigned as where df a is degrees of freedom, S m is the scale parameter. Finally, the residual variance is assigned as σ 2 e $ χ −2 ðdf e ; S e Þ , where df e is degrees of freedom for residual variance, S e is the scale parameter for residual variance.

4) Bayesian LASSO (BLASSO)
BLASSO is a Bayesian version of LASSO regression with two properties of LASSO: 1) shrinkage and 2) variable selection. BLASSO assumes that the random additive effects in equation (3) is given a multivariate normal distribution with marker specific prior variance, which is assigned as where df e is degrees of freedom, S e is the scale parameter.

5) RKHS
RKHS assumes that the random additive marker effects in equation (3) are distributed as a~N(0, K σ 2 a ), where K is computed by means of a Gaussian Kernel that is given by K ij = exp(−hd ij ), where h is a semi-parameter that controls how fast the prior covariance function declines as genetic distance increase and d ij is the genetic distance between two samples computed as where x ik and x jk are the k th SNPs for the i th and j th samples, respectively [42]. RKHS method uses a Gibbs sampler for the Bayesian framework and assigned the prior distribution of σ 2 a and σ 2 e as σ 2 a $ χ −2 ðdf a ; S a Þ and σ 2 e $ χ −2 ðdf e ; S e Þ, respectively. Here we chose a Single-Kernel model as suggested by Perez and de Los Campos [39], where h value was defined as h = 0.25.

Model convergence and prior sensitivity analysis
The algorithm is extended by Gibbs sampling for estimation of variance components. The Gibbs sampler was run for 150,000 iterations with a burn-in of 50,000 iterations. A thinning interval was set to 1000. The convergence of the posterior distribution was verified using trace plots. Flat priors were given to all the models.

Heritability and type-B genetic correlation estimates
Pedigree-based individual narrow-sense heritabilities ( h 2 a ) and marker-based individual narrow-sense heritabilities (h 2 g Þ were calculated as respectively, where σ 2 a is the pedigree-based additive variance estimated from ABLUP, while σ 2 g is the marker-based additive variance estimated from GBLUP. σ 2 pa and σ 2 pg are phenotypic variances for pedigree-based and marker-based models, respectively. Type-B genetic correlation was calculated as r 12 , where σ a12 is covariance between additive effects of the same traits in different sites and σ 2 a1 and σ 2 a2 are estimated additive variances for the same traits in different sites, respectively [43]. One-tailed likelihood ratio test (LRT) was used to check the significance of the Type-B genetic correlation against one.

Model validation and estimation of GS accuracy
Tenfold cross-validation (90% of training and 10% validation) was performed for all the models, except in testing the various sizes of training data sets and the number of trees per family. GEBVs in VS for Bayesian and RKHS methods were estimated as Where Z 0 ij is the indicator covariate (− 1, 0, or 1) for the i th tree at the jth locus andâ j is the estimated effect at the j th locus. In this study, prediction accuracy (accuracy) was defined as the Pearson correlation between cross-validated GEBVs and EBVs as "true" or reference breeding values estimated from ABLUP using all the phenotypic data (y). Predictive ability (PA) was defined as the Pearson correlation between GEBVs and adjusted phenotypic values y' in equation (3).

Testing different statistical models and the size of training set on GS accuracy
To test the effect of different statistical models, we used the five models (ABLUP, GBLUP, BRR, BLASSO, and RKHS) and five TS/VS sizes (ratio of 1:1, 3:1, 5:1, 7:1, and 9:1 of training/validation population size, respectively) to evaluate the accuracies of different models. For each of 25 models, 10 replicate runs were carried out for each scenario of four traits.

Testing the number of families on GS accuracy
To test the effect of the number of families on GS, we randomly selected family numbers from 10 to 120 to test the efficiency of GS. The purpose is to examine the efficiency of using a small subset of families in clonal selection.

Testing the number of trees per family on GS accuracy
To test the effect of the number of trees per family on GS, we randomly selected 1 to 20 trees per family as TS, remaining trees as VS.

Testing site effect on GS accuracy
In order to consider genotype and environment interaction (G × E) effect on GS, different GS scenarios were tested: 1) within-site GS, both training and validation sets are from one single site, where EBVs from single site model including G × E term was assumed as true breeding values of the site; 2) cross-site GS, using one site data as TS to predict GEBVs in another site; 3) joint-site GS, average EBVs were assumed as true breeding values when unstructured additive variance-covariance were used in equation (1) .

Testing the effect of relatedness on GS accuracy
In order to test whether the different relatedness (family structures) affect the accuracy and PA, three different scenarios were used in this study. There were: 1) TS and VS selected based on random sampling, but more likely from the same full-sib families; 2) for comparison purpose, TS and VS selected based on half-sib family structure in which the TS and VS shared female parents, but from different families; 3) for comparison purpose, TS and VS selected based on unrelated family structure in which the TS and VS shared different female and male parents.

Testing subset of markers (SNPs) on GS accuracy
To test the impact of the number of SNPs on the accuracy of genomic prediction models, 15 subsets of SNPs (10, 25, 50, 100, 250, 500, 750, 1 K, 2 K, 4 K, 6 K, 8 K, 10 K, 50 K, and 100 K) and two different types of sampling strategies: 1) randomly selected SNP subsets and 2) SNP subsets selected with the largest positive effects were implemented. The single marker effects were estimated using single-marker regression for association testing in training data set. To examine the impact of relatedness on the number of markers, we used full-sib and half-sib family structures between training and validation populations.

Selection response with genomic selection
Selection response could be calculated as the ratio between selection accuracy and breeding cycle length in years.
The relative efficiency of GS to traditional BLUP-based selection (TBS) is Where EBVs are the estimated breeding values using the full data from equation (1), and T TBS and T GS are the breeding cycle lengths under TBS and GS, respectively [15]. We assumed that the T GS is reduced to 12.5 years from the current approximately 25 years of a breeding cycle by omitting or reducing progeny testing time about 10-15 years.

Heritability and type-B genetic correlation
Heritabilities of the tree height based on GBLUP from two single sites and the joint sites were higher than those from ABLUP models (Table 1). This is in contrast with the wood quality traits (Pilodyn, velocity, and MOE) where all heritabilities from the GBLUP models were lower than those obtained from the ABLUP models. For example, heritability of Pilodyn (0.34) from the GBLUP model was smaller than that from the ABLUP model (0.41).
The type-B genetic correlations for tree height estimated from ABLUP and GBLUP models (as 0.48 and 0.41, respectively), were significantly different from 1, however, the type-B genetic correlations of wood quality traits (0.80-0.94) were higher and statistically were not significantly different from 1.

Accuracy of different statistical methods and the size of training set
Estimates of accuracy were obtained using different statistical methods and for different ratios of TS/VS for each of four traits (Fig. 1). It was observed that ABLUP had higher accuracy than that using the four genomic selection methods (GBLUP, BRR, BLASSO, and RKHS) for tree height, Pilodyn, velocity, and MOE. Tree height had higher accuracy than the three wood quality traits using ABLUP and GS ( Fig. 1 and Table 2). Among the four GS methods, GBLUP, BLASSO, and RKHS had approximately similar accuracies, but a slightly higher accuracy was observed when BRR was implemented for tree height. Nevertheless, these four GS methods had little differences on accuracy for the three wood quality traits.
GS accuracy increased as the ratio of training to validation population size increased. However, for GS methods, the maximum accuracy was reached at the ratio of 5:1 between training to validation populations for tree height while maximum accuracy seemed to change minimally after the 3:1 ratio for wood quality traits.

Impact of the number of families on GS accuracy
We estimated the effect of the number of families on GS. The accuracies of all four traits increased when the number of families increased from 10 to 120 families in both the ABLUP and GBLUP model building (Fig. 2). PA had a similar trend, except for tree height. PA of tree height increased from 10 to 30 families and then had similar values up to 120 families in the model building.

Impact of the number of trees per family on GS accuracy
In this study, all 128 full-sib families planted in both progeny trials were selected. Five trees per family in site 1 (Vindeln) and 20 trees per family were selected if there were enough trees for some families in site 2 (Hädanberg).
Thus, in joint-site cross-validation model, maximum 20 trees per family were tested. Accuracies and PA from ABLUP for all the traits were higher than those from GBLUP when we randomly selected a subset trees per family as TS (Fig. 3). It was also observed that accuracy and PA had similar increased trends as the number of trees within-family increased, but it flattened (stabilized) as the tree numbers reached between 6 and 14, depending on method (e.g. ABLUP and GBLUP) and traits. The accuracies of tree height, Pilodyn, velocity, and MOE increased initially from 0.48, 0.39, 0.52, and 0.41 to 0.84, 0.64, 0.78, and 0.72, respectively, and then stabilized after tree number reached 14, 6, 12, and 12 per family for ABLUP, respectively. The GS accuracies of tree height, Pilodyn, velocity, and MOE also stabilized after the tree number reached 18, 6, 10 and 10 per families for GBLUP, respectively. This may indicate that more trees within a family (16)(17)(18)(19) are required for a reliable training set in GS for growth trait than for wood quality traits (6)(7)(8)(9)(10)(11)(12).

Testing site effect on GS accuracy
Accuracy and PA of ABLUP and GBLUP were listed in Table 2 for three selection scenarios: Within-site training and selection, cross-site training and selection (e.g. training based on one site while selection for the second site) and joint-site training and selection using full-sib family structure. For all four traits, accuracy of within-site training and selection were always higher than the second scenario of cross-site training and selection. However, the accuracy differences between within-site and cross-site were larger for tree height than for three wood quality traits. For example, the average accuracy of two within-site selections is 0.76 relative to 0.44 for average across-site for tree height from GBLUP. However, for MOE, the average accuracy of two within-site selection is 0.64 relative to 0.59 for average cross-site selection. PA had a similar pattern. The joint-site model accuracies from both ABLUP and GBLUP were the higher than those of within-site and cross-site training and selection. Especially, for instance, tree height accuracy in the joint-site (0.81) was higher than average of within-site (0.76) and average of cross-site (0.44). Similarly, PAs from the joint-site model for all four traits were higher than cross-site training and selection model, but were not always higher than within-site training and selection model. For instance, PAs from ABLUP (0.26) and GBLUP (0.23) models in site 2 were higher than those from the joint-site (0.20 in both models) for tree height. Relative efficiency (RE) of GBLUP to ABLUP was lower than 1 for all the selection scenarios and traits, ranging from 0.80 to 0.95 with an average of 0.88. However, RE per year (assuming halving a breeding cycle time) reached from 1.60 to 1.89 with an average of 1.76 and were not related to any traits and selection scenarios.

Relatedness
Compared with the full-sib family structure, GS models built with a half-sib family structure led to a considerable decrease in accuracy and PA (Table 3). For instance, in the half-sib family structure, GS accuracy and PA from GBLUP model decreased from 0.81 and 0.20 to 0.55 and 0.11, respectively for tree height and from 0.69 and 0.36 to 0.50 and 0.26, respectively for MOE. However, both RE and RE per year changed little between half-sib and full-sib structure in GS selection. For example, RE and RE per year increased slightly from 0.89 and 1.78 to 0.98 and 1.97, respectively for velocity from full-sib to half-sib population while RE and REs per year decreased slightly from 0.89 and 1.78 to 0.83 and 1.67, respectively from full-sib to half-sib population for tree height.
Compared with half-sib family structure, GS models built with an unrelated family between TS and VS had a considerable decrease in accuracy and PA from GBLUP (Table 3). For example, in the unrelated family structure, GS accuracy and PA from GBLUP model decreased from 0.55 and 0.11 to 0.24 and 0.06, respectively for tree height and from 0.50 and 0.26 to 0.19 and 0.10, respectively for MOE. Especially for velocity, GS accuracy and PA from GBLUP model with a marked decrease from 0.62 and 0.32 to 0.09 and 0.02, respectively, was observed. However, it is worth to note that ABLUP models built with unrelated family had zero accuracies between training and validation populations.

Impact of the number of SNPs on GS accuracy
Accuracy and PA using subsets of markers with the largest positive effects were slightly higher than those using subsets of random markers until the subset of random markers reached 100 K SNPs (Fig. 4). Accuracy and PA using a subset of random markers increased with the increase in the number of markers, until all the markers were included, except that tree height accuracy stabilized with 4 K SNPs. However, accuracy and PA using the subset of markers with the largest positive effects showed different trends. It increased initially, then stabilized and finally decreased until it got to the same level as the random markers selection at the highest number of markers of 100 K SNPs. The use of subsets of markers with the largest positive effects had higher accuracy and PA than random markers until most of the markers were used. For example, PA using a subset of markers with the largest positive effects increased initially from 0.10 to 0.23 with 1 K SNPs and then decreased to 0.19 with 10 K SNPs for tree height. Trait had a similar influence on the number of markers to reach the plateau of the accuracy and PA using a subset of markers with the highest positive effects. The accuracy and PA reached a plateau when the number of markers with the highest positive effects increased to 4 K-6 K for all the traits.

Impact of the number of SNPs and relatedness on GS accuracy
It required fewer number of markers in the full-sib family structure than the half-sib family structure to reach the same accuracy and PA (Fig. 5). For example, the accuracies observed with 250 markers in the full-sib family structure were similar as using all 100 K markers in the half-sib family structure for tree height, Pilodyn, velocity, and MOE. The same PA with random 750 markers used in the full-sib family structure for all four traits required at least 100 K markers in half-sib family structure.

Heritability estimates
Heritability is an essential genetic parameter in selective breeding and its values are dependent on the relative contributions of genetic and environmental variations, and vary among traits [43]. In our study, heritabilities for wood quality traits were higher than those for tree height, which is expected and agrees with previous reports for Norway spruce [1,27]. Tan et al. [4] reported that heritability estimates obtained from GBLUP were higher than those from ABLUP for growth and wood quality traits in Eucalyptus. In contrast, Lenz et al. [13] and Gamal EL-Dien et al. [11] reported that heritability estimates obtained from GBLUP were lower than those from ABLUP for similar growth and wood quality traits in black spruce and interior spruce (Picea glauca [Moench] Voss × Picea engelmannii Parry ex Engelm.). In this study, the heritability estimates for tree height obtained from GBLUP were slightly higher than those from ABLUP, but there is no significant difference from ABLUP considering the estimated standard errors. The heritability estimates for wood quality traits obtained from GBLUP were slightly lower than those from ABLUP. The above two opposite situations indicate that pedigree-based ABLUP model may inflate heritability estimates for tree height and deflate heritability estimates for wood quality traits if estimates from GBLUP reflect true genetic relationships among families and account for Mendelian segregation within families. The impact of heritability on the accuracy seems to be low in this study, in line with the report in Douglas-fir [18] and interior spruce [11]. In our joint-site analyses, the heritabilities of tree height, Pilodyn, velocity, and MOE were low to moderate (0.15, 0.34, 0.37, and 0.36, respectively), but the accuracies were high (0.81, 0.66, 0.74, and 0.83, respectively). Several factors may explain this: (1) the large sample size of 1370 and the relatively small effective population size (N e = 55) likely negate the effect of low trait heritability on prediction accuracy [18]. Märtens et al. [44] demonstrated that increasing relatedness between training and validation populations leads to high prediction accuracy on yeast; (2) the accuracy is the correlation between EBVs assuming as true breeding values from ABLUP and GEBVs from GBLUP in VS in that heritability may affect PAs of ABLUP and GBLUP, but may have little influence on the correlation; (3) the accuracy estimate only represents the additive genetic effect. We found that PA is more similar to the narrow-sense heritability because PA involved both phenotypic and genetic values. For example, heritability of MOE from GBLUP model (0.36) is the same as from ABLUP model (0.36).
In the present study, tree height accuracy (0.81 from GBLUP in Table 2) in full-sib family structure with 1233 individuals in TS was similar as in the deterministic simulation with 50 QTLs and heritability of 0.2 [15]. Wood quality traits were similar to that in the simulation with 100 QTLs and a heritability of 0.4.

Different GS methods show similar results
As expected, the accuracies of four different genomic statistical methods did not clearly outperform each other. This contrasts with previous evaluation of the RKHS method that was reported to outperform other GS methods for low heritability growth traits [4]. It was usually observed that different genomic statistical methods produce similar results for growth and wood quality traits in other forest trees species [10,18,45]. With an exception, Resende et al. [9] compared Ridge Regression-BLUP (RR-BLUP), Bayes A, Cπ, and BLASSO for 17 traits in loblolly pine, and found that Bayes A and Bayes Cπ have higher PA than RR-BLUP and BLASSO for fusiform rust disease-resistance trait. They attribute this to a few genes with large effects that control disease resistance. When the number of markers increases the computational time for Bayesian methods take longer time to converge. Therefore, we also support the proposal that GBLUP is an effective method in providing the best compromise between computational time and prediction efficiency if there are no major gene effects [4,46].  The effect of training dataset and the number of trees per family on GS accuracy In this study, ratio of TS/VS varied from 1:1 to 9:1. We found that the accuracy improved from the TS/VS ratios of 1:1 to 3:1, but accuracy only improved a little after the TS/VS ratio of 3:1. This is different from other studies [4,13] showing that an increased ratio of the TS/VS beyond 3:1 still increases the accuracy. However, we found that our result concurs with other reports when the ratio of TS/VS is related to the number of trees per family [13]. In our case, each family  [47] reported that the number of trees per family has an important effect on the magnitude and precision of genetic parameter estimate. For tree height in that study, at least four trees per family at each site should be included in a half-sib family in order to estimate a more accurate heritability and 4-8 trees/per family per site still could improve the accuracy of heritability. However, further increase in the number of tree trees per family had little contribution toward increase of accuracy. Wood quality traits usually have higher heritabilities than those of growth traits and also have less G × E. Therefore, wood quality traits may need a lower number of trees than growth traits for obtaining a similar accurate estimate for genetic parameters. Such calculations could guide us to make a more accurate estimate on the number of trees per family required for phenotyping and genotyping.

The effect of the number of families on GS accuracy
The effect of the number of families used in cross-validation test was found important in this study with full-sib family structure. We found that the PA and accuracy increased greatly for all the traits from 10 to 120 families, except the PA for tree height, which stabilized after 30 families for cross-validation test. Based on resampling technique, Perron et al. [47] reported that the number of families has a less important effect on the magnitude and precision of genetic parameter estimate in a half-sib family structure. However, in this study, we found that the number of families is also important for estimates of GS accuracy and PA. It may be due to the small effective population size (55 unrelated parents) as compared to the study in Perron et al. [47]. One application of GS is in clonal forestry to select the best clones after selection and mating of several best parents (5-10). One question is how to build the training equation for such clonal selection. Should we use progenies of the selected parents or all parents fromthe trial? From this study, it seems that it is more efficient involving progenies from a larger number of parents (families). Model building and selection using 10 families seem to have lower accuracy than using a larger number of families. It must be tested whether it is due to the small size of families (10-20 trees per family) used in this study, and whether increasing the family size (for example 40 trees per family) as used in clonal progeny testing in Norway spruce in Swedish tree breeding program [2] would increase GS accuracy in a small group of elite families.

Genotype-by-environment interaction
G × E is usually important for growth traits when the seedlings are planted in different environments. We found that tree height in these two northern trials had a significantly strong G × E, indicated by the type-B genetic correlations (0.48 and 0.41) from ABLUP and GBLUP, respectively (Table 1). Such strong G × E resulted in a low accuracy and PA when one site is used as TS to predict BVs in another site as VS. A moderate to strong G × E for growth traits has been reported in several studies in southern and central Sweden [21,48,49], but not documented in northern Sweden. Chen et al. [21] reported that within a test series in southern and central Sweden, the averages of type-B genetic correlations varied from 0.60 to 0.89 in 6 test series and their type-B correlations were higher than those in this study. Such strong G × E in the present study should be considered in model fitting in order to improve PA and accuracy. Several advanced models have been built and tested in crops [50,51] and one study in tree species [52]. For instance, Oakey et al. [50] used marker and marker by environment interaction in RR-BLUP method to extend the genomic selection to multiple environments. As expected, we found that wood quality traits have no significant type-B genetic correlations and negligible change of accuracy and PA. A similar result has been reported in two southern Norway spruce open-pollinated progeny trials [27]. All these indicate that we could use a genomic model in one site to predict GEBV in another site in the same test series for wood quality traits and use of joint-site models could slightly improve the accuracy as in Table 2.

Effect of different family structures
We found that our genomic accuracy for tree height and Pilodyn using half-sib family structure was lower than that reported by Lenz et al. [13] in black spruce, even though more SNPs were used in this study (116 K in 20,695 contigs vs 0.49 K from SNP chip). The difference is likely due to lower heritabilities (i.e. 0.15 vs 0.42 for tree height in GBLUP) and the larger N e in this study (55 vs 27 unrelated parents). Accuracy and PA decreased from full-sib to half-sib family structure and from half-sib to unrelated family structure, indicating that GS model is more efficient in strongly structured populations where relatedness and LD are higher, and full-sib families also needed less number of markers than half-sib families for obtaining similar accuracy (Fig. 5). Similar results were obtained by other studies [12,13,45,53]. However, the relative efficiency between ABLUP and GBLUP (the accuracy ratio) is more or less similar in both full-sib and half-sib populations, which indicates that GS could be used in both half-sib and full-sib populations. The lower estimates of accuracy and PA (Table 3) obtained from GBLUP for unrelated family structure may be due to the low LD between marker and QTL in the unrelated population, which indicates that GS may not be used in unrelated family structure based on the current exome capture data.

The effect of the number of SNPs and LD in genomic prediction
To our knowledge, this study has used the largest number of SNPs (116,765 SNPs from 20,695 contigs) for GS in tree species [3,7,18]. When we used GS in the half-sib family structure, the accuracy and PA reduced about 20%. Moreover, the values of square of PA in half-sib and full-sib family structure accounted for less than 50% of heritability for all traits, which may indicate that even with such a large number of SNPs, we may still have not captured and explained most of the QTL effects. This may be attributed to the low LD in spruce in Fig. 6 (approximate 84 base pairs based on 517 unrelated individuals in Baison et al. [54]) and only exome regions being used. In humans, the exome constitutes a mere 1% of the whole genome (3Gb) [55]. For Norway spruce, however, the genome size is ca. 20 Gb and LD is lower than Humans [19]. Norway spruce has a mapped genome size of 3326.3 centiMorgan (cM) [33], which is larger compared to~2100 cM in white spruce (Picea glauca (Moench) Voss) [56]. There were about 5.6 SNPs per contig/gene on average based on 20,695 contigs/ genes that our SNPs come from. This translates to an average genome coverage of~6.2 markers/cM. If we use all independent 77,116 SNPs, then, it amounts to 3.7 SNPs per contig used.
Accuracy and PA obtained from the subset of markers with the largest positive effects were slightly higher than those from a subset of markers based on random selection, implying that using the subset of markers with the largest effects in genomic regions with small LD decay could track relationship more effectively than random markers. It also implies that using the subset of markers with the largest positive effect could also obtain some effects based on part of the short-range LD [13,45]. Thus, this factor could also be potentially useful to reduce genotyping the number of SNPs and GS cost, in highly structured population when the genome locations of markers are known.

Efficiency of genomic selection
In the present study, we observed that the efficiency of GS per year is greater than that based on traditional progeny test selection if GS is used to half generation time. In traditional pedigree-based progeny selection, the generation time for Norway spruce in Northern Sweden is at least 25 years, which is based on clonal replicated progeny trial of selected breeding trees. The clonal based progeny test procedure includes: seed sowing and growing to a sufficient size (2 yrs), vegetative propagation of seed plants by rooted cuttings (2 yrs), testing in field trials (15 yrs), and assessment of trials (1 yr). The final stage, the completion of a crossing scheme to create the next generation, is about 5 yrs. If we could omit the progeny testing of the first three stages (a total of 19 yrs) and complete flowering induction and mating within 15 years of GS selection, the time for a breeding cycle could be halved for Norway spruce. In this study, we only considered the efficiency of GS based on the timing of the Fig. 6 Within contigs LD decay estimated from 517 related individuals in Baison et al. [54] breeding cycle. Considering the cost for field testing, the benefit of GS might be higher if the cost for establishing and maintaining 3-4 progeny trials in each breeding population is more than genotyping.

Conclusions
As expected, the main advantage of genomic selection is the potential to shorten the breeding cycle. We observed that: 1. Using G matrix slightly altered heritability relative to A matrix with a slight increase for tree height and a decrease for wood quality traits. 2. ABLUP is about 11-14% more efficient than GBLUP for tree height and wood quality traits, while the four GS methods (GBLUP, BRR, BLASSO, RKHS) had a similar accuracy. 3. Efficiency of GS increased from 49 to 97% among four growth and wood quality traits if GS could reduce the generation time for a breeding cycle to half. 4. The GS accuracy improved from the TS/VS ratios of 1:1 to 3:1, but accuracy only improved marginally after a TS/VS ratio of 3:1. 5. The number of families and the number of trees per family also had an effect on GS efficiency. Wood quality traits need fewer number of trees withinfamily than tree height for a similar GS efficiency. 6. GS accuracy decreased from full-sib to half-sib family structure and from half-sib to unrelated family structure. The number of markers need to be increased greatly for half-sib family structure to have a similar efficiency to the full-sib family structure. 7. GS accuracy increased as the number of markers increased. The accuracy almost reached a plateau when the number of markers increased to 4 K-8 K for all the traits.