We found in a diverse panel of elite maize inbred lines that prediction accuracies obtained with five different WGP models were remarkably similar, even for traits with drastically different genetic architectures. Our results suggest that small gains in accuracy (up to 0.14) can be achieved if the WGP model is selected according to the genetic architecture underlying the trait.
Recently, Heslot et al. reported similarly small differences among seven parametric WGP models when comparing them for different, presumably highly polygenic, agronomic traits over eight datasets of barley, Arabidopsis thaliana, maize, and wheat. For the metabolites, however, our results differ from those obtained by Clark et al., who investigated the influence of genetic architecture on prediction accuracies achieved by RR-BLUP or BayesB. Whereas these authors found only slight differences for simulated traits with a genetic architecture close to the infinitesimal genetic model, BayesB outperformed RR-BLUP by an increase in prediction accuracy of ≈0.4 if the trait was controlled by either a few common or a few rare QTL. Simulations also predicted a drop in prediction accuracy of RR-BLUP for traits controlled by a small number of QTL. Although LASSO, elastic net, and BayesB showed higher accuracies than RR-BLUP for the metabolites, we found the differences to be remarkably small in the case of LASSO or elastic net and negligible in the case of BayesB.
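To illustrate why sparse models are expected to capture a few large QTL more precisely than RR-BLUP, the following minimal sketch contrasts the two kinds of shrinkage in the idealized case of an orthonormal marker design. This is an assumption made purely for illustration: markers in real germplasm are correlated through LD, and the effect values and the penalty `lam` are hypothetical, not taken from this study. Under that assumption, the ridge (L2) estimate shrinks every least-squares effect by the same factor, whereas the LASSO (L1) estimate subtracts a constant and sets small effects to zero.

```python
def ridge_shrink(b, lam):
    """Ridge (L2) estimate for an orthonormal design: proportional shrinkage."""
    return b / (1.0 + lam)


def lasso_shrink(b, lam):
    """LASSO (L1) estimate for an orthonormal design: soft-thresholding."""
    magnitude = max(abs(b) - lam, 0.0)
    if magnitude == 0.0:
        return 0.0
    return magnitude if b > 0 else -magnitude


# Hypothetical least-squares marker effects: one large QTL and several small ones.
ols_effects = [2.0, 0.3, -0.2, 0.1, -0.05]
lam = 0.25  # hypothetical penalty, chosen only for illustration

ridge_effects = [ridge_shrink(b, lam) for b in ols_effects]
lasso_effects = [lasso_shrink(b, lam) for b in ols_effects]

for b, r, l in zip(ols_effects, ridge_effects, lasso_effects):
    print(f"OLS {b:+.2f}  ridge {r:+.3f}  lasso {l:+.3f}")
```

In this toy setting the large effect loses 20% of its size under ridge but only the constant `lam` under LASSO, while the small effects survive (shrunken) under ridge and are mostly removed under LASSO.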
One major reason for the minor differences in prediction accuracies among the different models lies in the high level of LD found in elite breeding germplasm of maize. Our results suggest that with this level of LD (r² = 0.1 at ≈500 kb), accuracies are quite similar irrespective of whether the effects of large QTL are precisely captured (as in the case of LASSO, elastic net, or BayesB) or spread over a larger region (as in the case of RR-BLUP and RKHS). Since our population was highly diverse for elite maize germplasm in Europe, it is unlikely that breeders are confronted with lower levels of LD unless they work with highly exotic germplasm, for which LD has been reported to decline within 5-10 kb.
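As a reminder of what the r² statistic measures, the sketch below computes pairwise LD between two biallelic markers. It assumes fully homozygous inbred lines with genotypes coded 0/1, so that r² = D² / (p_A(1 − p_A) p_B(1 − p_B)); the genotype vectors are hypothetical, and the procedure used to estimate LD decay in this study is not described in this section.

```python
def ld_r2(marker_a, marker_b):
    """Squared correlation (r^2) between two biallelic markers coded 0/1.

    Assumes one allele per line (fully homozygous inbreds), so allele and
    haplotype frequencies can be counted directly from the genotype codes.
    """
    n = len(marker_a)
    if n == 0 or n != len(marker_b):
        raise ValueError("markers must be non-empty and of equal length")
    p_a = sum(marker_a) / n
    p_b = sum(marker_b) / n
    p_ab = sum(a * b for a, b in zip(marker_a, marker_b)) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic marker")
    return d * d / denom


# Hypothetical genotypes of eight inbred lines at two markers.
marker_1 = [0, 0, 1, 1, 1, 0, 1, 0]
marker_2 = [0, 0, 1, 1, 0, 0, 1, 1]
print(ld_r2(marker_1, marker_2))
```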
Moreover, the high similarity of RKHS and RR-BLUP suggests that either (i) non-additive, epistatic genetic effects are not present, (ii) these are so small that they are negligible in WGP for the investigated traits, or (iii) RKHS regression is unable to capture them. In any case, for prediction purposes RKHS does not seem to yield any advantage over RR-BLUP for situations comparable to our germplasm and traits. Dominance, as another source of non-additive genetic effects, cannot be present in the inbred lines investigated in this study. For predicting heterozygous F1 maize hybrids, however, it has been shown that modeling dominance effects can result in higher prediction accuracies.
Although BayesB reached a higher prediction accuracy than the worst model for 5 of the 6 traits, we cannot recommend it because of its much longer computation time and because its differences in prediction accuracy from RR-BLUP for the metabolites are negligible and probably reflect only sampling error.
We found the approach of partitioning genetic variance over chromosomes useful for guiding the breeder on which WGP model to prefer when there is little or no prior knowledge of the genetic architecture. Whereas for the agronomic traits an approximately linear increase of cumulative explained genetic variance coincided with a superiority of the L2 penalty (RR-BLUP), the L1 penalty (LASSO) or a mixture of both penalties (elastic net) performed better in the case of the metabolites, whose curves were strongly convex (Figure 2A). Although for dry matter yield and plant height barely significant association signals with a proportion of explained genetic variance <9% led to a chromosomal genetic variance slightly above the range expected from the length of the chromosome (Figure 2B), these effects were too small to justify the use of the elastic net or LASSO.
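The comparison between chromosomal genetic variance and chromosome length can be sketched as follows. This is an illustrative summary only, not the procedure used to produce Figure 2: the function names, the chromosome lengths, and the variance components are hypothetical, and the gap statistic is simply one convenient way to quantify departure from the linear increase expected under a highly polygenic architecture.

```python
from itertools import accumulate


def shares(values):
    """Convert non-negative values to proportions of their total."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]


def max_cumulative_gap(chrom_variances, chrom_lengths):
    """Largest absolute gap between the cumulative share of genetic variance
    and the cumulative share of genome length, taken in chromosome order."""
    if len(chrom_variances) != len(chrom_lengths):
        raise ValueError("need one variance and one length per chromosome")
    observed = accumulate(shares(chrom_variances))
    expected = accumulate(shares(chrom_lengths))
    return max(abs(o - e) for o, e in zip(observed, expected))


# Hypothetical lengths (Mb) for the ten maize chromosomes.
chrom_lengths = [300, 240, 230, 240, 220, 170, 180, 180, 160, 150]

# Hypothetical variance components: one trait whose variance is proportional
# to chromosome length, and one dominated by a single chromosome.
polygenic_trait = list(chrom_lengths)
oligogenic_trait = [0.2, 0.1, 0.1, 0.1, 6.0, 0.1, 0.1, 0.1, 0.1, 0.1]

print(max_cumulative_gap(polygenic_trait, chrom_lengths))
print(max_cumulative_gap(oligogenic_trait, chrom_lengths))
```

A gap near zero corresponds to the approximately linear pattern seen for the agronomic traits, whereas a large gap indicates that one or a few chromosomes carry most of the genetic variance. Any threshold for switching between models would have to be chosen by the user.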
As an alternative to this approach, Hayes et al. successively estimated the genetic variance explained by each chromosome segment and compared it with the genetic variance captured by the remaining part of the genome. To correct for the non-independence of neighbouring segments, they applied a bias correction using an expectation maximization (EM) algorithm. Such a correction is not necessary if the variance components for all chromosomes are estimated simultaneously, as was done in this study; this is a further advantage of our approach, besides its straightforward implementation in standard mixed model software packages such as ASReml.