Sampling of plant material
In this study, 1370 individuals were selected from two 28yearold controlpollinated progeny trials with the same 128 families from a partial diallel mating design that consisted of 55 parents originating from Northern Sweden. 5–20 trees per family per site were selected in Vindeln (64.30°N, 19.67°E, altitude: 325 m) and Hädanberg (63.58°N, 18.19°E, altitude, 240 m). Buds and the first year fresh needles from 46 parents were sampled in a grafted archive at Skogforsk, Sävar (63.89°N, 20.54°E) and in a grafted seed orchard at Hjssjö (63.93°N, 20.15°E). Progenies were raised in the nursery at Sävar, and the trials were established in 1988 by Skogforsk in Vindeln and Hädanberg.
A completely randomized design without designed preblock was used in the Vindeln trial (site 1), which was divided into 44 postblocks. Singletree plot with a spacing of 1.5 m × 2 m was used in each rectangular block with 60 trees (6 × 10). The same design was also used in the Hädanberg trial (site 2) with 44 postblocks, but for the purpose of demonstration, there was an extra design with 47 extra plots, each plot with 16 trees (4 × 4). Based on the spatial analysis, in the final model, 47 plots were combined into two big postblocks.
Phenotyping
Tree height was measured in 2003 at the age of 17 years. Solidwood quality traits including Pilodyn penetration (Pilodyn) and acoustic velocity (velocity) were measured in October 2016. Surrogate wood density trait was measured using Pilodyn 6 J Forest (PROCEQ, Zurich, Switzerland) with a 2.0 mm diameter pin, without removing the bark. Velocity is highly related to microfibril angle (MFA) in Norway spruce [27] and was determined using Hitman ST300 (Fibergen, Christchurch, New Zealand). By combining the Pilodyn penetration and acoustic velocity, indirect modulus of elasticity (MOE) was estimated using the equation developed by Chen et al. [27].
Genotyping
Total genomic DNA was extracted from 1, 370 controlpollinated progeny and their 46 unrelated parents using the Qiagen Plant DNA extraction protocol with DNA quantification performed using the Qubit® ds DNA Broad Range Assay Kit, Oregon, USA. Probe design and evaluation were described in Vidalis et al. [28]. Sequence capture was performed using the 40,018 probes previously designed and evaluated for the materials [28] and samples were sequenced to an average depth of 15x at an Illumina HiSeq 2500 platform. Raw reads were mapped against the P. abies reference genome v1.0 using BWAmem [29, 30]. SAMTools [31] and Picard [32] were used for sorting and removal of PCR duplicates and the resulting BAM files were subsequently reduced to containing the probe only bearing scaffolds (24,919) before variant calling. Variant calling was performed using the Genome Analysis Toolkit (GATK) HaplotypeCaller [32] in Genome Variant Call Format (gVCF) output format. Samples were then merged into batches of ~ 200 before all samples were jointly called.
As per the recommendations from GATK’s best practices, Variant Quality Score Recalibration (VQSR) method was performed in order to avoid the use of hard filtering for exome/sequence capture data. The VQSR method utilizes machinelearning algorithms to learn from a clean dataset to distinguish what a good versus bad annotation profile of variants for a particular species should be like. For the VQSR analysis two datasets were created, a training subset and the final input file. The training dataset was derived from a Norway spruce genetic mapping population with loci showing expected segregation patterns [33]. The training dataset was designated as true SNPs and assigned a prior value of 12.0. The final input file was derived from the raw sequence data using GATK best practices with the following parameters: extended probe coordinates by + 100 excluding INDELS, excluding LowQual sites, and keeping only biallelic sites. The following annotation parameters QualByDepth, MappingQuality and BaseQRankSum, with tranches 100, 99.9, 99.0 and 90.0 were then applied to the two files for the determination of the good versus bad variant annotation profiles.
The recalibrated Variant Call Format was filtered based on the following steps: (1) removing indels; (2) keeping only biallelic loci; (3) treating genotype with a genotype quality (GQ) < 6 as missing; (4) filtering read depth (DP) < 2; (5) removing individual call rate < 50%; (6) removing variant call rate (“missingness”) < 90%; (7) minor allele frequency (MAF) < 0.01. After steps 1–4, we calculated discordance between 148 pairs technique replicates, the average discordance was less 1%. Thus, conditions of GQ < 6 and DP < 2 as missing are sufficient to do downstream analysis. After all filtering, 116,765 SNPs were kept for downstream analysis and 77,116 SNPs were independent based on LD (r^{2} < 0.2) calculated in PLINK [34]. The resultant SNPs were annotated using the default parameters for snpEff 4. The Ensembl general feature format (GTF, gene sets) information for the P. abies genome was utilized to build an annotation database. This analysis revealed that 90% of the variants were located within gene coding regions, with only 10% variants in intronic regions.
LD Knearest neighbour genotype imputation approach [35] was used to impute missing genotypes in TASSEL 5 [36]. After several rounds of imputation, a few of missing genotypes were imputed using random imputation of the codeGeno function in the synbreed package in R [37]. In total, 5.9% of missing genotypes were imputed.
Estimating breeding values
Breeding values for the genotypes in the trials were predicted using the following model:
$$ y= X\beta + Wb(s)+{Z}_1a+e $$
(1)
Where y is a vector of phenotypic observations of a single trait; β is a vector of fixed effects, including a grand mean and site effects, b(s) is a vector of postblock within site effects, a is a vector of site by additive effects of individuals. X, W, and Z_{1} are incidence matrices for β, b(s), and a, respectively. For joinsite crossvalidation, the average of breeding values was assumed as estimated (true) breeding values (EBVs) when unstructured variance and covariance were used for additive effects. Otherwise, EBVs in the single site were assumed as true or reference breeding values. The random additive effects (a) in equation (1) were assumed to follow \( a\sim N\left(0,A\left[\begin{array}{cc}{\sigma}_{a1}^2& {\sigma}_{a12}\\ {}{\sigma}_{a12}& {\sigma}_{a2}^2\end{array}\right]\right) \), where A is the additive genetic relationship matrix, \( {\sigma}_{a1}^2 \) and \( {\sigma}_{a2}^2 \) are the additive genetic variances for site 1 and site 2, respectively, \( {\sigma}_{a12} \) is additive genetic covariance between site 1 and site 2. The residual e was assumed to follow \( e\sim N\left(0,\left[\begin{array}{cc}{I}_{n1}{\sigma}_{e1}^2& 0\\ {}0& {I}_{n2}{\sigma}_{e2}^2\end{array}\right]\right) \), where \( {\sigma}_{e1}^2 \) and \( {\sigma}_{e2}^2 \) are the residual variances for site 1 and site 2, I_{n1} and I_{n2} are identity matrices, n1 and n2 are the number of individuals in each site, 0 is the zero matrix.
To obtain accurate heritability estimates for jointsite models, we used the equation as following:
$$ y= X\beta +\mathrm{W}b(s)+{Z}_1a+{Z}_2 sa+e $$
(2)
where sa is a vector of sitebyadditive interaction effects, a, sa, and e were assumed to be homogenous between the two sites.
Statistical analyses for genomic predictions
GBLUP, Bayesian ridge regression (BRR), BLASSO, and reproducing kernel Hilbert space (RKHS) were used to estimate GEBVs. We implemented GBLUP calculations using ASReml R [38]. And we implemented the BRR, BLASSO, and RKHS methods using BGLR function from the BGLR package in R [39]. The details of these statistical methods will be defined later. The GEBVs were estimated using the following mixed linear model:
$$ {y}^{\prime }= X\beta + Za+e $$
(3)
Where y’ is a vector of adjusted phenotypic observations by postblock effects and standardized site effect (transforming it to have zero mean and unit variance for each site), β is a vector of fixed effect, including a grand mean), a and e are vectors of random additive and random error effects, respectively, and X and Z are the incidence matrices.
The four genomicbased best linear unbiased prediction methods were compared with the traditional pedigreebased best linear unbiased prediction (ABLUP)

1)
ABLUP
The ABLUP is the traditional method that utilizes a pedigree relationship matrix (A) to predict the EBVs. For ABLUP the vector of random additive effect (a) in equation (3) is assumed to follow a normal distribution \( a\sim N\left(0,A{\sigma}_a^2\right) \), where \( {\sigma}_a^2 \) is the additive genetic variance. The residual vector e is assumed as \( e\sim N\left(0,I{\sigma}_e^2\right) \), where I is the identity matrix. The mixed model equation (3) was solved to obtain EBVs as:
$$ \left[\begin{array}{cc}{X}^{\prime }X& {X}^{\prime }Z\\ {}{Z}^{\prime }X& {Z}^{\prime }Z+{A}^{1}\alpha \end{array}\right]\left[\begin{array}{c}b\\ {}u\end{array}\right]=\left[\begin{array}{c}{X}^{\prime }y\\ {}{Z}^{\prime }y\end{array}\right] $$
(4)
The scalar α is defined as \( \alpha ={\sigma}_e^2/{\sigma}_a^2 \), where \( {\sigma}_e^2 \) is the residual variance, \( {\sigma}_a^2 \) is the additive genetic variance.

2)
GBLUP
The GBLUP model is the same as ABLUP, with the only difference being that the genomic relationship matrix (G) replaces the A matrix. The G matrix is calculated as \( G=\frac{\left(MP\right){\left(MP\right)}^T}{2{\sum}_{i=1}^q{p}_i\left(1{p}_i\right)} \), where M is the matrix of samples with SNPs encoded as 0, 1, 2 (i.e. the number of minor alleles), P is the matrix of allele frequencies with the i^{th} column given by 2(p_{i} − 0.5), where p_{i} is the observed allele frequency of all genotyped samples. In GBLUP, the random additive effect (a) in equation (3) is assumed to follow \( a\sim N\Big(0,G{\sigma}_g^2 \)), where \( {\sigma}_g^2 \) is the genomicbased genetic variance and GEBVs (\( \widehat{a} \)) are predicted from equation (4), but with A^{− 1} replaced by G^{− 1} and \( {\sigma}_a^2 \) replaced by \( {\sigma}_g^2 \). The inverse of G matrix was estimated using write.realtionshipMatrix function in the synbreed package in R [40].

3)
Bayesian ridge regression (BRR)
BRR is a Bayesian version of ridge regression with the shrinkage merit that was originally intended to deal with the problem of high correlation among predictors in linear regression models [41]. The random additive vector a is assigned a multivariate normal prior distribution with a common variance to all marker effects, that is \( a\sim N\Big(0,{I}_p{\sigma}_m^2 \)), where p is the number of markers. Parameter \( {\sigma}_m^2 \) denotes the unknown genetic variance contributed by each individual marker and is assigned as \( {\sigma}_a^2\sim {\chi}^{2}\left({df}_a,{S}_m\right) \), where df_{a} is degrees of freedom, S_{m} is the scale parameter. Finally, the residual variance is assigned as \( {\sigma}_e^2\sim {\chi}^{2}\left({df}_e,{S}_e\right) \), where df_{e} is degrees of freedom for residual variance, S_{e} is the scale parameter for residual variance.

4)
Bayesian LASSO (BLASSO)
BLASSO is a Bayesian version of LASSO regression with two properties of LASSO: 1) shrinkage and 2) variable selection. BLASSO assumes that the random additive effects in equation (3) is given a multivariate normal distribution with marker specific prior variance, which is assigned as \( a\sim N\left(0,T{\sigma}_m^2\right) \), where \( T=\mathit{\operatorname{diag}}\left({\tau}_1^2,\dots, {\tau}_p^2\right) \), Parameter \( {\tau}_j^2 \) is assigned as \( {\tau}_j^2\sim Exp\left({\lambda}^2\right) \) for j = 1,…, p, where λ^{2} is assigned as λ^{2}~Gamma(r, δ). The residual variance is assigned as \( {\sigma}_e^2\sim {\chi}^{2}\ \left({df}_e,{S}_e\right) \), where df_{e} is degrees of freedom, S_{e} is the scale parameter.

5)
RKHS
RKHS assumes that the random additive marker effects in equation (3) are distributed as a~N(0, \( \overline{K}{\sigma}_a^2 \)), where \( \overline{K} \) is computed by means of a Gaussian Kernel that is given by K_{ij} = exp(−hd_{ij}), where h is a semiparameter that controls how fast the prior covariance function declines as genetic distance increase and d_{ij} is the genetic distance between two samples computed as \( {d}_{ij}={\sum}_{k=1}^p{\left({x}_{ik}{x}_{jk}\right)}^2 \), where x_{ik} and x_{jk} are the k^{th} SNPs for the i^{th} and j^{th} samples, respectively [42]. RKHS method uses a Gibbs sampler for the Bayesian framework and assigned the prior distribution of \( {\sigma}_a^2 \) and \( {\sigma}_e^2 \) as \( {\sigma}_a^2\sim {\chi}^{2}\ \left({df}_a,{S}_a\right) \) and \( {\sigma}_e^2\sim {\chi}^{2}\ \left({df}_e,{S}_e\right) \), respectively. Here we chose a SingleKernel model as suggested by Perez and de Los Campos [39], where h value was defined as h = 0.25.
Model convergence and prior sensitivity analysis
The algorithm is extended by Gibbs sampling for estimation of variance components. The Gibbs sampler was run for 150,000 iterations with a burnin of 50,000 iterations. A thinning interval was set to 1000. The convergence of the posterior distribution was verified using trace plots. Flat priors were given to all the models.
Heritability and typeB genetic correlation estimates
Pedigreebased individual narrowsense heritabilities (\( {h}_a^2 \)) and markerbased individual narrowsense heritabilities (\( {h}_g^2\Big) \) were calculated as
$$ {h}_a^2=\frac{\sigma_a^2}{\sigma_{pa}^2},{h}_g^2=\frac{\sigma_g^2}{\sigma_{pg}^2} $$
respectively, where \( {\sigma}_a^2 \) is the pedigreebased additive variance estimated from ABLUP, while \( {\sigma}_g^2 \) is the markerbased additive variance estimated from GBLUP. \( {\sigma}_{pa}^2 \) and \( {\sigma}_{pg}^2 \) are phenotypic variances for pedigreebased and markerbased models, respectively. TypeB genetic correlation was calculated as \( {r}_{12}={\sigma}_{a12}/\sqrt{\sigma_{a1}^2{\sigma}_{a2}^2} \), where σ_{a12} is covariance between additive effects of the same traits in different sites and \( {\sigma}_{a1}^2\ \mathrm{and}\ {\sigma}_{a2}^2 \) are estimated additive variances for the same traits in different sites, respectively [43]. Onetailed likelihood ratio test (LRT) was used to check the significance of the TypeB genetic correlation against one.
Model validation and estimation of GS accuracy
Tenfold crossvalidation (90% of training and 10% validation) was performed for all the models, except in testing the various sizes of training data sets and the number of trees per family. GEBVs in VS for Bayesian and RKHS methods were estimated as
$$ \widehat{g_i}={\sum}_{j=1}^n{Z}_{ij}^{\prime }{\widehat{a}}_j $$
Where \( {Z}_{ij}^{\prime } \) is the indicator covariate (− 1, 0, or 1) for the i^{th} tree at the jth locus and \( {\widehat{a}}_j \) is the estimated effect at the j^{th} locus. In this study, prediction accuracy (accuracy) was defined as the Pearson correlation between crossvalidated GEBVs and EBVs as “true” or reference breeding values estimated from ABLUP using all the phenotypic data (y). Predictive ability (PA) was defined as the Pearson correlation between GEBVs and adjusted phenotypic values y’ in equation (3).
Testing different statistical models and the size of training set on GS accuracy
To test the effect of different statistical models, we used the five models (ABLUP, GBLUP, BRR, BLASSO, and RKHS) and five TS/VS sizes (ratio of 1:1, 3:1, 5:1, 7:1, and 9:1 of training/validation population size, respectively) to evaluate the accuracies of different models. For each of 25 models, 10 replicate runs were carried out for each scenario of four traits.
Testing the number of families on GS accuracy
To test the effect of the number of families on GS, we randomly selected family numbers from 10 to 120 to test the efficiency of GS. The purpose is to examine the efficiency of using a small subset of families in clonal selection.
Testing the number of trees per family on GS accuracy
To test the effect of the number of trees per family on GS, we randomly selected 1 to 20 trees per family as TS, remaining trees as VS.
Testing site effect on GS accuracy
In order to consider genotype and environment interaction (G × E) effect on GS, different GS scenarios were tested: 1) withinsite GS, both training and validation sets are from one single site, where EBVs from single site model including G × E term was assumed as true breeding values of the site; 2) crosssite GS, using one site data as TS to predict GEBVs in another site; 3) jointsite GS, average EBVs were assumed as true breeding values when unstructured additive variancecovariance were used in equation (1) .
Testing the effect of relatedness on GS accuracy
In order to test whether the different relatedness (family structures) affect the accuracy and PA, three different scenarios were used in this study. There were: 1) TS and VS selected based on random sampling, but more likely from the same fullsib families; 2) for comparison purpose, TS and VS selected based on halfsib family structure in which the TS and VS shared female parents, but from different families; 3) for comparison purpose, TS and VS selected based on unrelated family structure in which the TS and VS shared different female and male parents.
Testing subset of markers (SNPs) on GS accuracy
To test the impact of the number of SNPs on the accuracy of genomic prediction models, 15 subsets of SNPs (10, 25, 50, 100, 250, 500, 750, 1 K, 2 K, 4 K, 6 K, 8 K, 10 K, 50 K, and 100 K) and two different types of sampling strategies: 1) randomly selected SNP subsets and 2) SNP subsets selected with the largest positive effects were implemented. The single marker effects were estimated using singlemarker regression for association testing in training data set. To examine the impact of relatedness on the number of markers, we used fullsib and halfsib family structures between training and validation populations.
Selection response with genomic selection
Selection response could be calculated as the ratio between selection accuracy and breeding cycle length in years.
The relative efficiency of GS to traditional BLUPbased selection (TBS) is
$$ \mathrm{RE}=\frac{\mathrm{r}\left({\mathrm{GEBVs}}_{\mathrm{GS}},\mathrm{EBVs}\right)}{\mathrm{r}\left({\mathrm{EBVs}}_{\mathrm{TBS}},\mathrm{EBVs}\right)} $$
Thus, the relative efficiency of GS to TBS per year is
$$ \mathrm{RE}/\mathrm{year}=\frac{\mathrm{r}\left({\mathrm{GEBVs}}_{\mathrm{GS}},\mathrm{EBVs}\right)}{\mathrm{r}\left({\mathrm{EBVs}}_{\mathrm{T}\mathrm{BS}},\mathrm{EBVs}\right)}\times \frac{{\mathrm{T}}_{\mathrm{T}\mathrm{S}}}{{\mathrm{T}}_{\mathrm{GS}}} $$
Where EBVs are the estimated breeding values using the full data from equation (1), and T_{TBS} and T_{GS} are the breeding cycle lengths under TBS and GS, respectively [15]. We assumed that the T_{GS} is reduced to 12.5 years from the current approximately 25 years of a breeding cycle by omitting or reducing progeny testing time about 10–15 years.