Evaluation of the efficiency of genomic versus pedigree predictions for growth and wood quality traits in Scots pine

Genomic selection (GS) or genomic prediction is a promising approach for tree breeding to obtain higher genetic gains by shortening time of progeny testing in breeding programs. As proof-of-concept for Scots pine (Pinus sylvestris L.), a genomic prediction study was conducted with 694 individuals representing 183 full-sib families that were genotyped with genotyping-by-sequencing (GBS) and phenotyped for growth and wood quality traits. 8719 SNPs were used to compare different genomic with pedigree prediction models. Additionally, four prediction efficiency methods were used to evaluate the impact of genomic breeding value estimations by assigning diverse ratios of training and validation sets, as well as several subsets of SNP markers. Genomic Best Linear Unbiased Prediction (GBLUP) and Bayesian Ridge Regression (BRR) combined with expectation maximization (EM) imputation algorithm showed slightly higher prediction efficiencies than Pedigree Best Linear Unbiased Prediction (PBLUP) and Bayesian LASSO, with some exceptions. A subset of approximately 6000 SNP markers, was enough to provide similar prediction efficiencies as the full set of 8719 markers. Additionally, prediction efficiencies of genomic models were enough to achieve a higher selection response, that varied between 50-143% higher than the traditional pedigree-based selection. Although prediction efficiencies were similar for genomic and pedigree models, the relative selection response was doubled for genomic models by assuming that earlier selections can be done at the seedling stage, reducing the progeny testing time, thus shortening the breeding cycle length roughly by 50%.


Background
Genomic prediction or genomic selection (GS) was proposed by Meuwissen et al. [1] as a methodology to use genome-wide dense marker information to estimate genetic values for selection of breeding populations. The main difference between GS and previous approaches, such as marker-assisted selection (MAS), is that in MAS a requirement is to identify quantitative trait loci (QTL) first by linkage disequilibrium (LD) in breeding varieties and then use them as candidate genes for selection, whereas in GS it is not necessary to detect QTL and their significance, using markers prior to selection [2].
The application of GS requires phenotypic data and marker information of a training population (TP) of individuals that are used to develop prediction models to calculate genomic estimated breeding values (GEBV). GEBV are then validated through a validation population (VP) of individuals, or selection candidates, which are genetically related to the TP, and for which only marker data is available to predict their own GEBV [3][4][5]. Since the introduction of GS, many simulations and experimental results in animal-, crop-and tree breeding have shown the potential of GS to estimate genetic values, to shorten breeding cycles, to increase selection intensities and capture Mendelian segregation effects in order to increase genetic gains [6][7][8].
Practical application of GS in animal and crop breeding programs would not have been possible without the rapid and cost-effective development of next-generation sequencing (NGS) technologies which, consequently, have accelerated the discovery of thousands of single nucleotide polymorphisms (SNP) markers [9][10][11]. Among all the NGS technologies nowadays available, genomewide SNP arrays had been shown as preferable for their reproducibility, manageability and storage logistics, as well as their cost efficiency for breeding programs [2]. However, genome-wide SNP arrays require the availability of a reference genome to compare, contrast and detect SNP markers from the genome of the population of interest.
Scots pine (Pinus sylvestris L.) is the most widely distributed pine in the world [23,24]. It is also a highly impor-tant commercial species in Europe, particularly in northern countries [25], being the second foremost species for wood production in Sweden [26]. Despite its importance, neither a reference genome, nor a SNP array, are at present available for the species. While the first draft of the Scots pine genome is currently ongoing [27], and until a SNPchip is developed for the species, it is necessary to use other NGS methodologies in the meantime.
GBS uses restriction enzymes based complexity reduction sequencing method suited for complex, large genomes. GBS utilizes a barcoding system for multiplex sequencing, which increases its efficiency and reduces the genotyping costs [28,29]. GBS can generate a very large number of SNPs but also produces significant amount of missing data. The latter can be solved with the aid of different imputation methods, such as mean imputation (MI), expectation maximization (EM), family-based k-nearest neighbor (kNN-Fam) or singular value decomposition (SVD) [30,31]. The imputation with EM algorithm developed at the R package rrBLUP, was specially designed for GBS data assuming that markers follow a multivariate normal distribution and are imputed based on a realized relationship (averaged over all markers), resulting in higher accuracies of GEBV [32,33]. GBS marker information has been successfully used for parentage reconstruction in Scots pine [34], as well as to perform genomic predictions studies in livestock [35,36], maize [37], wheat [33], soybean [38], Picea glauca (Moench) Voss × Picea engelmannii Parry ex Engelm. [39] and radiata pine [40]. Therefore, GBS is an attractive technology that can be used to perform GS and genome-wide association studies (GWAS) for Scots pine [41,42].
A typical Scots pine breeding program consists of a combination of several selection strategies, essentially conventional progeny testing and breeding value prediction based on pedigree information and reliable phenotypic assessments, at age of 10-15 years. Usually the breeding cycle takes roughly 36 years when the selection strategy is backward selection based on polycross progeny tests of full-sibs, or 21 years when the strategy is forward selection [43]. One of the greatest advantages of GS in conifers is the potential to reduce the length of the breeding cycle, for example by shortening field progeny test time through early evaluation of greenhouse seedlings, based on molecular marker information. Furthermore, selection intensities can increase and therefore higher genetic gains per unit time could be achieved [2,44,45].
Traditional breeding value predictions consist on generating kinship coefficients between relatives to estimate the numerator relationship matrix (NRM), based on pedigree, i.e., a relationship matrix based on the expected proportion of the genome shared by two individuals [46,47]. The NRM is then used in a Best Linear Unbiased Prediction (BLUP) analysis [48] to calculate Estimated Breeding Values (EBV). On the other hand, EBV can be estimated by replacing the NRM in the BLUP analysis with a genomic realized relationship matrix (GRM or RRM), generated with the kinship coefficients or realized proportion of the genome shared between individuals, computed based on the marker information, i.e., the number of loci shared between individuals [45,49]. Hence, relationships between individuals are more accurately estimated given that the markers can account for Mendelian inconsistencies and for the contemporary and historical pedigree [2,50,51], if the number of markers is enough to path the identical-by-descent (IBD) status across the genome [52].
Multiple statistical methods are available to estimate GEBV. Genomic Best Linear Unbiased Prediction (GBLUP) is based on coancestry and the infinitesimal model in quantitative genetics, assuming that QTL allelic effects are normally distributed, and all have a similar contribution to the genetic variance. Conversely, most of the Bayesian approaches presume a prior non-normal distribution of QTL allelic effects (gamma or exponential distribution), thus the variance at each locus can vary [1,49,53]. For instance, Bayesian LASSO (BL) assumes that QTL effects follow a Laplace (or double exponential) distribution [54]. Nevertheless, Bayesian ridge regression (BRR) assigns QTL effects to a multivariate normal prior distribution with a common variance, which is modelled hierarchically through a scaled inverted chi-squared distribution [53,55,56].
The accuracy of GS predictions depends on the model selected, but also on other factors such as the level of LD, heritability of the trait, effective population size (Ne), TP size, density and amount of the SNP markers, and distribution of QTL effects [3,7,57]. Generally large Ne (i.e., low LD between SNP markers and QTL) normally decreases the precision of the GS models, as well as a small number of individuals in the TP or the low heritability of the trait of interest [45,58,59]. Increasing the number of high density markers and the size of TP can improve the efficiency of the GS models to a certain extent [60][61][62].
Despite the great number of articles published during the recent years on genomic prediction on forest species, different methodologies have been used to assess the effectiveness or accuracy of predictions, which complicates the comparison of different models and reliabilities between them. By definition, accuracy is the correlation between true breeding values (TBV) and EBV [63], but TBV are never known, therefore approximations to TBV need to be used. The most common methods used in tree breeding to estimate efficiency of genomic prediction models are, 1) the predictive ability (r 1 ) estimated as the Pearson product-moment correlation between the cross-validated GEBV and phenotypes, 2) predictive accuracy (r 2 ) estimated as r 1 scaled by the square root of heri-tability, 3) theoretical accuracy (r 3 ) which is the square root of reliability (i.e., squared correlation between TBV and EBV) [63,64], and 4) the Accuracy or prediction accuracy (r 4 ) defined as the Pearson product-moment correlation between the cross-validated GEBV and the pedigree based EBV (PEBV) estimated from PBLUP (pedigree based Best Linear Unbiased Prediction). Generally, r 4 showed the highest values whereas r 1 showed the lowest ones, when compared with the remaining methods, for instance in eucalypt hybrids (Eucalyptus urophylla × Eucalyptus grandis) [65], maritime pine (Pinus pinaster Ait.) [66,67], Norway spruce [68], or Eucalyptus nitens [69].
The genomic model may influence the effectiveness of the estimates and Bayesian approaches may seem more appropriate as they can accommodate different distributions of the allelic effects, however the literature on GS in forest trees showed similar results for most models. For instance, Chen et al. [68] observed similar r 1 and r 4 among four genomic prediction models (GBLUP, BRR, BL and reproducing kernel hilbert space (RKHS) in Norway spruce. Isik et al. [66] detected similar r 1 in maritime pine comparing GBLUP, BRR and BL prediction models. Although GBLUP and ridge regression BLUP (rrBLUP) were recommended by Tan et al. [61] for their computational advantages in a eucalypt hybrid study, similar r 1 were noted for GBLUP, rrBLUP, BL and RKHS. In an interior spruce study, Ratcliffe et al. [70] stated similar r 4 for rrBLUP and BayesCπ, which in turn performed better than the generalized ridge regression (GRR), whereas Thistlethwaite et al. [71] observed almost identical predictions with rrBLUP and GRR in Douglas-fir (Pseudotsuga menziensii Mirb. (Franco)). On the contrary, Resende et al. [58] observed better r 1 for disease resistance in a loblolly pine study with Bayesian methods when compared with BLUP-based methods.
The objective of this study was to assess, as proof-ofconcept for Scots pine, the effectiveness (or efficiency) of genomic versus pedigree predictions for growth and wood quality traits, using two imputation algorithms combined with four prediction models (GBLUP, BL, BRR and PBLUP) and comparing four methods to assess efficiencies (r 1 , r 2 , r 3 and r 4 ) under several training and validation population scenarios as well as with different numbers of SNPs.

Heritabilities
Narrow sense heritability estimates based on PBLUP were slightly higher than those based on GBLUP, excluding DBH2 (diameter at breast height assessed at 36 years old) which was higher for GBLUP-EM (Table 1). MOEs showed the same heritability for PBLUP and GBLUP-EM. GBLUP heritability estimates calculated from the RRM derived from EM imputation method were higher than those derived from the RND imputation method for almost all traits, except Ht1 (tree height measured at 10 years old) and MOEd. Standard errors were similar for growth traits regardless of the BLUP method used but they were always lower when derived from GBLUP methods. Among the genomic models, r 1 , r 2 and r 3 were larger for traits with higher narrow sense heritabilities (Ht2, DEN, MOEd and MOEs) than for traits with low narrow sense heritabilities (Ht1, DBH1, DBH2 or MFA microfibril angle) ( Table 2). Moreover, across all genomic models and imputation methods, a positive linear correlation between r 1 , r 2 , r 3 and trait heritabilities was observed (respectively r=0.97, p<0.0001; r=0.77, p<0.0001 and r=0.78, p<0.0001), but not between r 4 and heritabilities (r=0.15, p=0.3) (Fig. 1).

Prediction efficiency of the different models
Through 10-fold cross-validation, r 1 , r 2 and r 4 were estimated for all models and imputation methods, and addi-tionally r 3 was also estimated for GBLUP and PBLUP ( Table 2). The h (square root of heritability) estimated from the GBLUP-EM using the full data, was used to calculate r 2 , since the RRM captures IBD and identicalby-state (IBS) status between the individuals and can be considered a better estimation than the one from PBLUP.
The lowest prediction efficiency estimates were obtained for r 1 (0.19-0.44) and the highest for r 4 (0.66-0.84) for all traits, regardless of the model and imputation method used ( Table 2). The genomic prediction models performed similarly for all the different calculations of the efficiency (r 1 to r 4 ) for most traits, except in terms of r 2 for which BRR-EM showed slightly higher estimations for wood traits and diameter, compared with the other genomic prediction methods. Concerning r 3 , GBLUP-EM showed higher estimations than GBLUP-RND and PBLUP for all traits. Similarly, among all the genomic models r 4 showed higher values for GBLUP-EM for all traits. In summary, although the best r 4 were observed with PBLUP for all traits, genomic prediction models performed similarly or slightly better than PBLUP regarding Table 2 Prediction efficiencies of genetic models for eight phenotypic traits. Four prediction efficiencies (r 1 -predictive ability, r 2predictive accuracy, r 3 -theoretical accuracy, and r 4 -prediction accuracy, and their standard errors) for eight traits based on pedigree (PBLUP), and three genomic models (GBLUP, BL, and BRR) combined with two imputation methods (EM and RND)

Relative size effect of the training and validation populations
All models showed similar increasing patterns of r 1 , r 2 , r 3 and r 4 as the number of individuals in the TP increased, for all traits (Fig. 2). The lowest r 1 , r 2 , r 3 and r 4 were observed when half of the individuals were assigned to the TP. However, BRR-EM model only reached its highest r 1 , r 2 and r 4 when 90% of individuals were assigned at the TP, for all traits except Ht2 (height measured at 30 years old). In terms of r 1 all the remaining models perform similarly when TP size was between 70-90% of individuals, for almost all traits (Fig. 2a).
No clear pattern was observed regarding r 2 , with the peak for genomic models when 90% of individuals were allocated to the TP, whereas PBLUP for wood traits, by contrast, showed a plateau for a TP containing 70-90% of the individuals (Fig. 2b). Concerning r 3 , similar efficiencies were seen when TP included 70 to 90% of individuals for GBLUP-EM, performing better than GBLUP-RND and PBLUP (Fig. 2c).
Among all methods, PBLUP had the highest r 4 for all eight traits regardless of the TP ratio (Fig. 2d), whereas genomic models performed similarly. Nevertheless, unlike the Bayesian models and GBLUP-RND that required 80-90% of individuals to be allocated to the TP to reach the highest r 4 , GBLUP-EM needed a subsample of 70% or 80% individuals as TP for almost all traits.

Effect of increasing number of markers on predictions
The impact of the different subsets of SNPs was tested on BRR-EM and BL-EM models since such models consider different distribution of the QTL allelic effects. For instance, the variance at each locus can change, thus the effect of the SNP subsets on the model and its ability a b c d Fig. 1 Regression plots between all genomic models prediction efficiencies and narrow sense heritability ( h 2 ). Prediction efficiencies: a) r 1predictive ability, b) r 2 -predictive accuracy, c) r 3 -theoretical accuracy, and d) r 4 -prediction accuracy to predict BVs can be easily observed. The EM imputation method was selected due to the slightly higher values showed in the previous sections. The r 1 , r 2 and r 4 increased for all traits as the number of SNPs rose (Fig. 3). However, for almost all traits, the greatest increase on r 1 and r 2 was attained when the subset of markers was 1000 SNPs, yet some oscillations were observed for both models at SNPs subsets of 500 to 6K for almost all traits ( Fig. 3a and b). Although BRR and BL models had different patterns for the different number of SNPs, they showed similar r 1 and r 2 between 6K and 8K SNPs for most of the traits. Ht2, MOEs (static modulus of elasticity), DEN (density) and MOEd (dynamic modulus of elasticity) had the highest r 1 and r 2 .
Conversely, r 4 nearly followed an identical pattern of ascent for both models, as the number of SNPs increased (Fig. 3c). However, r 4 kept almost constant around the same value at a range of 3K to 8k SNPs, for both models and all traits.
In short, for both models, the highest values of r 4 and r 2 were observed at SNP subsets that varied between 6K 8K, while within the range 3K 8K SNPs, no substantial increase was identified in r 4 for any of the traits.

Relative selection response of GS
The relative genomic selection response, RSR GS:PS , was estimated for each genomic selection model (GBLUP, BRR and BL) considering only the EM imputation method since this method showed equal or slightly higher efficiencies (between 0.00-0.07) than RND method. The Swedish Scots pine breeding cycle combines several selection strategies sorted in two groups, according to their cycle lengths [43]. For the first group of strategies (i.e., backward selection) the cycle length takes up to 36 years, whereas for the second group of strategies (i.e., forward selection) it takes up to 21 years. In order to estimate RSR GS:PS , it was assumed that GS could help to reduce the cycle lengths to 18 and 11 years following two approaches. The first approach assumed that the cycle could be reduced to 18 years, shortening the progeny test time but considering that female flowering starts at 15-18 years in Scots pine [24]. The second approach presupposed that earlier flowering greenhouse stimulation [72] would produce female flowering at an earlier age in Scots pine, thus the breeding cycle could be reduced to about 11 years. In addition and for both approaches a reduction in the progeny test time was also assumed. The percentage of increase in selection efficiency for all traits and models showed the potential of GS when reducing the breeding cycle by 50% or more (Table 3). For backward selection strategy and aided by GS, a reduction of 50% in the breeding cycle length (i.e., from 36 to 18 years) resulted in percentages of increased selection efficiencies between 57.1%-143.5%. Moreover, a further reduction in the breeding cycle length of more than 50% (i.e., from 36 to 11 years) assisted by GS and flowering greenhouse stimulation, increased considerably the selection efficiency, being greater for prediction efficiencies r 1 , r 2 and r 3 (195.6%-298.4%) than for r 4 (157.1%-206.8%). On the other hand, for forward selection, a reduction of 3 years in the breeding cycle aided by GS showed small percentages of increase in selection efficiencies for r 1 , r 2 and r 2 ((5.4%-42.0%). Furthermore, this small reduction in the cycle length, showed that traditional phenotypic selection would be more effective, for some traits, in terms of r 4 given the low percentages of selection efficiencies showed (-8.3%-9.4%). Nevertheless, for forward selection strategy a reduction of 50% in the breeding cycle length (i.e. from 21 to 11 years) assisted by GS and flowering stimulation, increased the selection efficiencies for all prediction efficiency methods varying between 50%-126.9%. In summary, for all traits and genomic prediction models, the percentage of increase in selection efficiency exceeded 50% when the breeding cycle was reduced by 50%, reaching in many cases percentages that varied between 50%-143.5%.
Predictive ability (r 1 ) has been widely used but normally shows lower values than r 2 , r 3 and r 4 for different traits Table 3 Percentage of increase in selection efficiency of GS for each phenotypic trait, estimated for both selection strategies (Strategy 1 and 2), model ratio (i.e., GBLUP/PBLUP, BRR/PBLUP and BL/PBLUP) and prediction efficiency (r 1 -predictive ability, r 2 -predictive accuracy, r 3 -theoretical accuracy, and r 4 -prediction accuracy). Approach 1 and 2 are respectively the breeding cycle length assumptions without and with flowering greenhouse stimulation (i.e. 18 (Table S1) as it uses the individual phenotypes as approximation to TBV and could be comparable to heritability [62,68,78,87,88]. Predictive accuracy (r 2 ) and theoretical accuracy (r 3 ) have been used less frequently [40,74,78]; the former one is considered as an unbiased estimate of accuracy of selection from n-fold cross-validation, since the correlation between an individual phenotype and its TBV cannot be higher than the square root of heritability [52,89]. The r 3 [63] was usually used to evaluate the models with full datasets, however in the current study it was used in the cross-validation analyses as well, since different PEV were obtained for each fold, thus estimations can be evaluated in the same way as the remaining methods. Another method extensively used has been the prediction accuracy (r 4 ) which generally showed higher values than r 1 , r 2 , r 3 (Table S1). Congruent with previous studies [60,65,71,80], we observed that all genomic models showed the highest values for r 4 (0.65-0.75), followed by r 3 (0.59-0.68), r 2 (0.49-0.70), and r 1 (0.19-0.43) that had the lowest values as expected. Nevertheless, using r 4 (i.e., corr(EBV VP , PEBV y )) may inflate the prediction efficiency due to that the individuals in the validation population used to estimate EBV (EBV VP ) were a proportion of the individuals in the full dataset (y) used to estimate PEBV (PEBV y ) and therefore the correlation between them was generally higher [52,68,88].

Heritabilities
No clear pattern was detected between r 4 and heritability estimates for maritime pine [67] and norway spruce [68]. Additionally, Grattapaglia and Resende [3] noticed that r 3 did not significantly change under different simulated heritability scenarios. Whereas no trend was detected among r 4 and trait heritabilities, a positive and strong linear trend between r 1 , r 2 , r 3 and heritabilities was observed in the current study, i.e., traits with lower heritabilities (below 0.25) exhibited the lowest r 1 , r 2 and r 3 (Fig. 1). Higher prediction efficiencies were obtained for traits with moderate heritabilities (above 0.30) which is in line with the positive correlation between trait heritabilities and r 1 reported in loblolly pine [58] and maritime pine [66]. Chen et al. [68] concluded that values of narrow-sense heritability were more similar to values of r 1 than to r 4 , as r 1 involves both phenotypic and genetic values, however using r 2 instead of r 1 could remove influence of heritability since it is considered an unbiased estimation of the accuracy of selection from n-fold cross-validation [52,78,89].

Effect of the imputation method on the genomic prediction efficiencies
For species such as Scots pine with large and complex genomes [90] but without a reference genome, and with no SNP chips or exome panels developed, genotyping-bysequencing (GBS) method is considered as an attractive alternative to perform GS studies. When using GBS data, large amounts of missing data are produced, thus filtering and imputation of SNP markers are critical steps [42]. In an interior spruce genomic prediction study with GBS data [39], it was observed that the imputation method used had influence in the quality of genomic predictions and concluded that EM and kNN-Fam imputation methods, provided the highest genomic prediction accuracies (r 4 ). EM was the most efficient imputation method in a wheat breeding GS study [33] with GBS data. Our study partially supports those findings, since among the genomic prediction models used, slightly higher predictions were observed in terms of r 2 , r 3 , and r 4 , when EM imputation algorithm was combined with GBLUP and BRR. In contrast, r 1 was almost equal for each trait when BL model was used regardless of the imputation method used. We speculate that the slighly better performance of GBLUP-EM and BRR-EM could be due to that GBLUP and BRR respectively assume that QTL allelic effects are normally and multivariate normally distributed, and in addition the EM imputation method uses a kinship-based imputation algorithm which also assumes that marker genotypes follow a multivariate normal distribution [22,33]. Thus, the combination of the same assumptions during the imputation and breeding value estimations can result in higher prediction efficiencies.

Effect of the model on the prediction efficiencies
Traits of interest in tree breeding programs have different genetic architecture; thus, different genomic prediction models to evaluate prediction efficiencies may be used [44]. In a two generations maritime pine genomic selection prediction study [66] was observed similar r 1 among GBLUP, BRR and BL for growth and stem straightness traits, but with larger prediction bias (estimated by regressing the EBV on the GEBV in the validation set) when BL was used. In a different maritime pine study with three generations, larger prediction bias were detected for PBLUP than for GBLUP or BL [67]. Several statistical methods, namely, GBLUP, BRR, BL and RKHS, were compared in a Norway spruce study with relatively similar r 1 (0.16-0.44) and r 4 values (0.58-0.77) observed for all of them [68]. However in the same study PBLUP outperformed the genomic models in terms of r 4 . On the contrary, genomic models (GBLUP, rrBLUP, BL, and RKHS) performed better than PBLUP in terms r 1 (0.27 and 0.12, respectively) in eucalyptus hybrids, yet pedigree errors were observed in the populations studied, resulting in the underestimation PBLUP estimates [61]. The authors contemplated the possibility that the marker data captured precisely the Mendelian sampling variation, therefore the genetic variation was based on the true proportion of the genome that was IBD or IBS among individuals.
Our study is in line with the studies mentioned above, since similar prediction efficiencies were observed regardless the genomic model used. PBLUP outperformed the genomic models in terms of r 4 (0.75-0.84) for all traits, however in terms of r 1 , r 2 and r 3 GBLUP, BRR, BL and PBLUP showed similar prediction efficiencies for growth and wood quality traits (Table 2). In short, either GBLUP, BRR or BL provided similar prediction efficiencies for growth and wood quality traits in Scots pine.

Effects of the training and validation populations sizes on prediction efficiencies
Previous studies stated that r 1 and r 2 increased as the size of the training set increased without reaching a plateau which differed from our findings. For instance, Tan et al. [61] detected that r 1 ascended as the TP size rose for all models and traits evaluated in eucalypt hybrids. Similarly, Lenz et al. [60] asserted that r 4 increased as the TP size augmented, however after assigning TP of 67% of individuals the increase of r 4 was negligible. Nevertheless, some similarities were found with other studies, especially when utilizing GBLUP, for which r 4 rose as the TP size increased, achieving similar r 4 values for height when TP reached 80-90% of individuals, and 75-90% of individuals for wood quality traits [68]. In the current study, a TP size of 70-80% was enough to obtain similar values as the full TP size in terms of r 1 , r 2 , r 3 and r 4 , depending on the trait (Fig. 2). In the studies cited above [60,61,68], it was observed that the number of trees per family had an effect on the GS efficiency; however, in the current study in which the number of trees per family was very low, it was still observed the advantage of applying GS prediction methods in Scots pine.

Effect of the number of SNPs
In a general conifer breeding program simulation study Li et al. [57] detected an increase in the accuracy (correlation between GEBV and simulated TBV) of GEBV for traits with low and high heritability when the subset of SNP markers increased from 7K to 90K, for a TP with 1000 clones from five simulated generations. Moreover, the same pattern was observed for GBLUP, BRR, BL and RKHS models in Norway spruce [68], where r 1 and r 4 increased with number of markers reaching almost a plateau between 4K and 8K SNP markers, regardless of the model used. Similarly, in eucalypt hybrids [61], when the subset of SNP markers dropped below 5K larger reduction in the r 1 was observed for GBLUP and RKHS models; further, traits with lower heritabilities were observed to be more sensitive to the reduction in the number of SNP markers. On the contrary, in black spruce when markers were reduced randomly from 5K to 1K no noticeable decrease was found in r 4 for GBLUP and Bayesian framwork models [60]; nonetheless, when markers were further reduced to 500, the r 4 decreased dramatically.
Our results were in accordance with those studies, reaching similar efficiencies in terms r 1 and r 2 when the number of SNPs reached 6K-7K, or 3K-7K for r 4 (Fig. 3), to those achieved when using all 8719 SNPs and regardless of the genomic model used, therefore the number of SNPs had more influence on the prediction efficiency than the genomic model used.

Relative selection response of GS
A simulation study showed that when the breeding cycle length was reduced by 50% the RSR GS:PS doubled, and that when the cycle length was reduced by 75% the RSR GS:PS tripled at high marker levels [3]. This theory was confirmed by Resende et al. [91] that by reducing 50% the loblolly pine breeding cycle, reported a percentage of increase in selection efficiency of GS between 53-92% for DBH and 58-112% for Ht, compared to the traditional pedigree-based selection. Similarly, percentages of increase in GS efficiency varied between 106% to 139% for Ht when the breeding cycle length of interior spruce was reduced by 25% [70]. In Norway spruce, the percentages of increase in GS efficiency of MOE were between 69-83% when the cycle length was also shortened by 50% [68].
The results of the current study exhibited that a reduction of the cycle length by 50% increased the percentage GS efficiency to double for almost all traits, regardless the selection strategy (Table 3). Such reduction in the breeding cycle length of Scots pine could only be possible by shortening field-testing periods aided by the use genomic prediction at young ages, and that female flowering can start at earlier ages after greenhouse flowering stimulation [72]. Moreover, if cycles could even be shortened more than 50%, higher percentages of increase in GS efficiency could be reached which in the case of this study were almost triple than traditional pedigree-based selection (Table 3).

Conclusions
Our results provide an initial perspective in the use of genomic prediction in Scots pine and are encouraging to develop GS strategies for the species. Similar prediction efficiencies were observed among pedigree and all genomic prediction models for growth and wood quality traits, suggesting that genomic prediction methods can be applied as an alternative to traditional pedigree predictions for Scots pine.
Our study showed that GS could potentially reduce the breeding cycle by half, and under that assumption, the relative genomic selection efficiency could double depending on the selection strategy and the trait.
The results presented here are based on a relatively small population with a shallow pedigree, for which 8K SNPs were sufficient to reach high GS prediction efficiencies. More studies using different populations, preferably populations with deeper pedigrees should be carried out to better understand the predictive power of SNP markers for traits with complex inheritance patterns in the species. The predictive power of SNP markers should be tested over at least two generations because the marker-QTL phase is expected to change once the population undergoes through breeding, due to recombination of homologous chromosomes during the meiosis.

Plant material
In this study a Scots pine full-sib progeny trial (identified as F261-Grundtjärn), belonging to the Swedish tree improvement program at Skogforsk (The Forestry Research Institute of Sweden) was used. The trial consists of 184 full-sib families and 7240 trees (F1-generation), generated from a partial diallel mating design of 40 plustrees (F0-generation). The progeny trial was established in 1971 by Skogforsk as a randomized single tree plot design, divided into 14 post-blocks [92]. A more detailed description of the trial can be found in Fries et al. [93]. A number of 694 progeny trees (F1) from 183 families were selected for this study, such that the number of trees per family varied from one to seven with an average of four individuals per family.

Phenotypic data
Growth traits were measured in the 7240 progeny trees whereas wood properties were estimated in a subset of 694 progeny trees. Height (Ht) was measured when the progeny trees were 10 (Ht1) and 30 (Ht2) years old. Diameter at breast height (DBH) was also measured twice, at ages 30 (DBH1) and 36 (DBH2). In 2011, increment cores at breast height were extracted and processed by Silviscan (Innventia AB, Stockholm, Sweden). From the Silviscan analysis, three traits were used in this study: microfibril angle (MFA), static modulus of elasticity (MOEs) and wood mean density (DEN). In addition, dynamic modulus of elasticity (MOEd) predicted by Hitman ST300 (Fibergen, Christchurch, New Zealand) was also used in the current study. Wood traits are further described in Hong et al. [94].

DNA extraction and genotyping
The commercial NucleoSpin Plant II kit (Machery-Nagel, Dren, Germany) was used to extract genomic DNA from vegetative buds or needles from the 694 progeny trees and 46 parents. DNA concentration was determined with Qubit 2.0 fluorometer (Invitrogen, Carlsbad, CA, USA). Then, three genomic libraries for GBS were prepared following the procedure described in Pan et al. [29] by using 827 samples (replicates included) and PstI high fidelity restriction enzyme (New England Biolabs, MA, USA). The libraries were sequenced on an Illumina HiSeq 2000 platform at SciLifeLab, Sweden.
Thereafter, paired-end raw reads of each GBS library were cleaned and demultiplexed by the process_radtags module of Stacks v.1.40 [95] on the basis of 300 barcodes with 48 bp. Cleaned reads of each sample were aligned to the Pinus taeda v1.0 [96] reference genome, using BWA mem v0.7.15 [97] with default parameters. Alignments were coordinate-sorted and indexed using Samtools v1.5 [98]. SNP markers were called using the mpileup command of SAMtools over all the samples simultaneously, with default parameters, and converted into VCF matrix using BCFtools v0.1.19 [99]. Furthermore, these variants were sorted to keep only high-quality SNPs. Using vcfutils in BCFtools with default parameters, the SNPs within 3bp around an indel or with mapping quality <20 were filtered out; using Vcftools v.0.1.12b [100], only SNPs with coverage ≥5x, genotype quality (GQ) ≥30, genotype calling rate >20% were retained. Using the custom Perl program (ReplicateErrfilter.pl), discordant genotypes of 66 replicated samples were detected and the SNP sites with ≥3 replicate errors were filtered out. After this step, 24,152 informative SNP markers were retained.
Finally missing genotypic data were imputed using two imputation methods to compare their prediction efficiencies. Random (RND) imputation with the codeGeno function in synbreed package [101] in R (R Core Team 2016) and imputation with the expectation maximization (EM) algorithm by the A.mat function implemented in rrBLUP package [32] in R. A total of 15,537 and 15,433 SNPs with minor allele frequency (MAF) lower than 1% and with a missing data threshold lower than 10% were removed using RND and EM imputation methods, respectively.

Statistical analysis
Initial analysis. Growth traits (Ht1, Ht2, DBH1 and DBH2) were available for all progeny trees in the trial, therefore univariate single site spatial analysis were performed in ASReml 4 standalone [64], with the objective to reduce the within-trial micro-environmental effects prior to any other analysis (see supplementary information S1). Briefly, diagnostic tools, variograms and plots of spatial residuals were used to detect design, treatment, local and extraneous effects. The predicted design effects and spatial residuals were extracted from the ASReml output files and used to remove micro-environmental effects from the raw data [102,103]. Wood properties were assessed for a subset of 694 progeny trees and the micro-environmental effects were scaled for the raw data by removing the variation of the experimental design features and post-block effects. The environmentally adjusted phenotypic data (predicted values of each tree) were used for the genetic analysis [104][105][106][107].

Best Linear Unbiased Prediction (BLUP).
The following model was used for PBLUP and GBLUP: where y is the vector of the adjusted phenotypic data for each trait, b is the vector of fixed effects (intercept), a is the vector of random effects and e is the vector of residual effects, which is assumed to follow a normal distribution as var(e) ∼ N 0, Iσ 2 e , where σ 2 e is the residual variance and I is the identity matrix. X and Z are the incident matrices of b and a.
In the PBLUP, the vector a (additive genetic effects) from Eq. 1 is assumed to follow a normal distribution with expectations of ∼ N 0, Aσ 2 a , where σ 2 a is the additive genetic variance and A is the numerator relationship matrix (NRM). Briefly, the diagonal elements (i) of A were estimated according to Lynch and Walsh [108] as: where g and h are the parent of individual i. The off-diagonal elements are the relationship between individuals i and j and were estimated as: For the GBLUP, the vector a is assumed to follow a normal distribution with expectations of ∼ N 0, Gσ 2 a ,where G is the genomic realized relationship matrix (RRM) estimated according to VanRaden [49] as: where M is the matrix of genotyped samples, P is the matrix of allele frequencies with the jth column given by 2(p j − 0.5), where p j is the observed allele frequencies of the genotyped samples. The elements of M were coded as 0, 1 and 2 (i.e., the number of minor alleles) for the estimation of the G matrix with function kin from the synbreed package in R in the case of RND imputed data, and with the function A.mat from the rrBLUP package in R, for the EM imputed data. PBLUP and GBLUP analyses were conducted in ASReml-R version 4.1.0.106.
Bayesian models. BRR and BL were implemented using the BGLR function from the BGLR package in R [109]. In brief, the following model was used: where y is the vector of n adjusted phenotypes, 1 n is the vector of ones, μ is a scalar denoting the intercept, W is the incidence matrix for the m vector of marker effects, and e is the vector of residual effects that follow a multivariate normal distribution e ∼ N 0, I n σ 2 e . In BRR, vector m from Eq. 5 is assigned a multivariate normal prior distribution with a common variance to all marker effects, that is m ∼ N 0, I p σ 2 m , where p is the number of markers, σ 2 m is the unknown genetic variance which is contributed by each marker and assigned as σ 2 m χ − 2(df m , S m ), where df m is degrees of freedom and S m is the scale parameter. Residual variance is assigned as σ 2 e χ − 2 df e , S e , with df e degrees of freedom and scale parameter for residual variance S e [55]. For the BL method assumes that vector m from Eq. 3 follows a hierarchical prior distribution with m ∼ N 0, Tσ 2 m , where T = diag τ 2 1 , . . . , τ 2 p . τ 2 j is assigned as τ 2 j Exp λ 2 , j = 1, . . . , p. λ 2 is assigned as λ 2 ∼  Gamma(r, δ). Finally, the residual variance is assigned as σ 2 e ∼ χ − 2 df e , S e , where df e is degrees of freedom and S e is the scale parameter for residual variance [54].
For the Bayesian methods, GEBV in the VP were estimated as, where Z ij is the indicator covariate (-1, 0, 1) for the i th tree at the j th locus and a j is the estimated effect at the j th locus.

Model convergence and prior sensitivity analysis.
Bayesian algorithms were extended using Gibbs sampling for estimation of variance components. The Gibbs sampler was run for 20,000 iterations with a burn-in of 1,000 iterations and a thinning interval of 100. The convergence of the posterior distribution was verified using trace plots.

Validation and evaluation methods
Cross validation. For all traits, pedigree based (PBLUP), genomic models (GBLUP, BRR and BL) and imputation method (EM and RND), a 10-fold cross-validation analysis was implemented, i.e., 90% of individuals randomly selected for the TP and 10% in the VP. BRR and BL were tested with eleven different sets of SNP markers randomly selected (i.e., 100, 200, 500, 1K, 2K, 3K, 4K, 5K, 6K, 7K and 8719). Additionally, to evaluate the performance of the pedigree versus genomic prediction models, different sizes of TP and VP were used. All individuals were randomly split into four different proportions of TP/VP, 80%, 70%, 60% and 50% (i.e., 555, 486, 417 and 347 individuals, respectively) for TP and the rest as VP. Each analysis was replicated 10 times.
Prediction efficiency of traditional and genomic genetic evaluations. Prediction efficiencies of pedigree and genomic models were evaluated and compared based on the predictive ability, predictive accuracy, theoretical accuracy, and prediction accuracy.
1) The predictive ability (r 1 ) was defined as the Pearson product-moment correlation between the EBV of the