 Research article
 Open Access
 Published:
The importance of phenotypic data analysis for genomic prediction  a case study comparing different spatial models in rye
BMC Genomics volume 15, Article number: 646 (2014)
Abstract
Background
Genomic prediction is becoming a daily tool for plant breeders. It makes use of genotypic information to make predictions used for selection decisions. The accuracy of the predictions depends on the number of genotypes used in the calibration; hence, there is a need of combining data across years. A proper phenotypic analysis is a crucial prerequisite for accurate calibration of genomic prediction procedures. We compared stagewise approaches to analyse a real dataset of a multienvironment trial (MET) in rye, which was connected between years only through one check, and used different spatial models to obtain better estimates, and thus, improved predictive abilities for genomic prediction. The aims of this study were to assess the advantage of using spatial models for the predictive abilities of genomic prediction, to identify suitable procedures to analyse a MET weakly connected across years using different stagewise approaches, and to explore genomic prediction as a tool for selection of models for phenotypic data analysis.
Results
Using complex spatial models did not significantly improve the predictive ability of genomic prediction, but using row and column effects yielded the highest predictive abilities of all models. In the case of MET poorly connected between years, analysing each year separately and fitting year as a fixed effect in the genomic prediction stage yielded the most realistic predictive abilities. Predictive abilities can also be used to select models for phenotypic data analysis. The trend of the predictive abilities was not the same as the traditionally used Akaike information criterion, but favoured in the end the same models.
Conclusions
Making predictions using weakly linked datasets is of utmost interest for plant breeders. We provide an example with suggestions on how to handle such cases. Rather than relying on checks we show how to use year means across all entries for integrating data across years. It is further shown that fitting of row and column effects captures most of the heterogeneity in the field trials analysed.
Background
Genomic prediction (GP) was first introduced in 2001 [1] as a method that allows the prediction of genomic estimated breeding values (GEBV) for plants and animals by using information of genetic markers. In plant breeding, GP has been adopted as another stage of the breeding scheme [2], not diminishing the importance of the phenotypic analysis usually carried out in several environments. Merging the phenotype and the genotype analyses has been addressed through the socalled stagewise analysis [3]. In the first stage environments are analysed separately and genotype means are computed and then submitted in the GP stage to predict GEBV based on dense genetic markers such as single nucleotide polymorphisms (SNPs).
In plant breeding, assessing genotypic adaptability and stability, and predicting breeding values of the genotypes in other environments and other years, makes use of multienvironment trials (METs), which aim to evaluate as many genotypes as possible in as many as possible locations [4–7]. These METs are typically laid out as generalised lattice designs testing a large number of different genotypes per trial. The number of tested genotypes is limited by factors such as seed production, production cycle length and availability of physical resources, e.g. land and budget [8].
Within years, genotypes are tested in series of trials, which are connected by checks. Checks are lines grown in every trial as controls because their performance is known and/or they are already commercial material. Checks can be also used to connect years. In the rye breeding program considered in this paper, a completely different set of genotypes is tested in each year, but these genotypes are from the same breeding population. The accuracy of a genomic prediction model depends on the number of genotypes used for calibration. So there is definitely a need to combine data across years. Low connectivity across years is a challenge when trying to combine data across years, and this is one main motivation for this paper. Furthermore, the unbalancedness due to the design layout and the different and large number of evaluated genotypes increases the heterogeneity introducing high complexity to the variancecovariance structure among adjusted genotype means [3].
Analysis of METs could be done as singlestage analysis, modelling the complete observed data at the level of individual plots, or using a stagewise approach, where experiments are analysed first at the level of environments (or trials), obtaining adjusted means per genotype, which are then summarised across environments (or trials) in the next stage [3]. A singlestage analysis accounts entirely for the variancecovariance structure of the recorded observations [6], therefore it is regarded as the gold standard. However, it has been shown that in a stagewise analysis, a loss of information occurring in the transition through stages can be minimized by an appropriate weighting scheme [9].
If feasible, a singlestage approach is preferable to a stagewise analysis [10]. Nevertheless, the latter is acceptable for GP, since it is simple, computationally more efficient and also allows to easily account for any specifics of randomisation layout and error modelling for each environment [3]. It should be stressed, however, that in a stagewise analysis the weights are chosen to approximate the variancecovariance matrix of adjusted means from previous stages. We used here a threestage approach and compared different spatial correlation structures in the first stage to correct field heterogeneity at the trial level.
Spatial error models may provide more accurate estimates of genotype effects than models not accounting for spatial adjustment [11, 12] but they are computationally more demanding and convergence may be difficult to reach. Any effort in terms of improving the genomic predictions would include checking if these improved estimates have an effect on the predictive ability when markers are added to the model. The performance of alternative spatial models can be assessed by kfold cross validation (CV).
Similarly, the merits of different spatial models used to compute adjusted means in the first stage can be compared by the same CV procedure, if the same GP procedure is used for each analysis. This suggests that genomic predictioncross validation (GPCV) can be used to identify the bestfitting mixed model in stage one. The common method of model selection makes use of information criteria based on the log likelihood, e.g. the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) [13]. When the restricted maximum likelihood (REML) method is used, models can only be compared by information criteria if they have the same fixed effects; otherwise, the maximum likelihood (ML) method should be used [13]. CV is, in this sense, not used to tune parameters as in many penalization methods (e.g. adaptive Lasso, SCAD (Smoothly Clipped Absolute Deviation), machine learning methods) but only as a tool to compare models that use REML. REML is considered the best available method of variance parameter estimation, preferable to ML [14]. Consequently, it is of interest to devise model selection procedures that can use REML and also can compare models with different fixed effects. GPCV has already been used to judge environments in order to optimise the accuracy in GP [15]. We used this tool here as model selection method in comparison to the traditional use of AIC.
The aims of this work were: i) to assess the advantage for the predictive ability when using a spatial model for phenotypic analysis, ii) to compare stagewise approaches for GP when the data are weakly connected across years, and iii) to compare AIC and GPCV as methods of selection of models for phenotypic data analysis towards GP in rye.
Methods
Field layout and data set
A commercial rye breeding program by KWSLOCHOW established in Poland and Germany aims to develop superior hybrid varieties for the seed market. The implementation of GP within the breeding program makes use of the measurements of hybrid performance of the first cycles of phenotypic evaluation of the material (Cycle1). Selections made in Cycle1 are intensively evaluated in further cycles, aiming to doublecheck the selection decisions. For our purposes, these additional cycles do not add much useful information. Hence, we used only the first cycles of the program. The populations tested in each year consist of S _{2} genotypes, which display genetic relatedness and population stratification due to complex genealogical history [16].
Besides the phenotypic data, a 16K Infinium iSelect HD Custom BeadChip was used to characterise 1610 individuals from Cycle12009 and Cycle12010 and 6 checks. Several traits were evaluated during this project: grain dry matter yield, plant height and thousand kernel weight, as well as ordinal scores of rust, mildew and lodging among others. In this work we used grain dry matter yield measurements of the phases of selection Cycle12009, Cycle12010 and Cycle12012, and marker information for the genotypes of 2009 and 2010. Although no marker information of year 2012 was available, it makes sense to use this dataset to observe the trend in one additional year and in this way, support the results of the phenotypic analysis of previous years.
A Cycle1 experiment consists of subsets of 320 genotypes from the S _{2} populations tested in several locations within each of the two countries involving two testers (Tables 1 and 2). We define a trial as the physical unit within a location, where a subset of genotypes that were testcrossed to the same tester is evaluated. Trials at a location were laid out as αdesigns with two replicates. Each trial was randomized independently from the others using the software CycDesign (VSN International; http://www.vsni.co.uk/). (However, we are aware that some breeders tend to use the same randomization layout in several locations. Ideally, each trial should have a different randomization). In our notation, trials of a Cycle1 experiment are labelled as S1, S2, …, S24. Row and column coordinates of the plots to account for spatial variation are available.
Normally throughout the program, only a single tester was used per location and year, but in some locations, some subsets of genotypes were testcrossed with the two available testers. This is the case, for example, for location GL4 in Cycle12009, where the genotypes evaluated in the trials S1, S2 and S3 were testcrossed with both Tester1 and Tester2, and it is also the case of locations PL1, PL2, PL3 and PL4 evaluating genotypes of trials S7, S8 and S9 with both testers. In each year, four common checks were testcrossed with the testers and grown twice in each trial. Over the years 2009 and 2010 one check was in common and none was shared with 2012 (Table 3).
The field layout of some trials was not perfectly rectangular. Some trials at a given location and year had fewer blocks but larger size, i.e., there were two different block sizes within a few trials. Blocks were nested within rows of the field layout.
In the genetic dataset, homozygous marker genotypes were coded as 1 and 1, and the heterozygous type, missing values and technical failures were coded as 0. 58.7% of the markers corresponded to homozygous alleles and 16.1% were heterozygous. Only a 0.03% of the markers were recorded as missing values or technical failures; therefore, an imputation method would not have a strong impact on the subsequent analyses. Monomorphic markers and markers with minor allele frequency (MAF) less than 1% or missing information of more than 10% per marker were dropped. A total of 11285 markers passed the quality test and were used for GP.
Models
In this section we present the models used in the first stage of the analysis and the models of the approaches followed to adjust the year effect either in the second or the third stage. Figures 1 and 2 depict a general scheme that helps visualizing the methodology.
First stage
In the first stage we computed adjusted genotype means by location and year. The factors used for the analysis were genotypes (G), testers (T), trials (S), replicates (R) nested within trials and blocks (B) nested within replicates. We defined a baseline model as
where Y_{ h i j k v } is the observed grain dry matter yield of the hth genotype testcrossed with the vth tester in the kth block within the jth replicate of the ith trial, (G T)_{ h v } is the effect of the hth genotype testcrossed with the vth tester, S_{ i } is the effect of the ith trial $\left[\phantom{\rule{0.3em}{0ex}}{S}_{i}\sim N\left(0,{\sigma}_{S}^{2}\right)\right]$, R_{ i j } is the effect of the jth replicate nested within the ith trial $[\phantom{\rule{0.3em}{0ex}}{R}_{\mathit{\text{ij}}}\sim N(0,{\sigma}_{R}^{2}\left)\right]$, B_{ i j k } is the effect of the kth block nested within the jth replicate of the ith trial $[\phantom{\rule{0.3em}{0ex}}{B}_{\mathit{\text{ijk}}}\sim N(0,{\sigma}_{B}^{2}\left)\right]$ and e_{ h i j k v } is the plot error associated with the Y_{ h i j k v } observation $[\phantom{\rule{0.3em}{0ex}}{e}_{\mathit{\text{hijkv}}}\sim N(0,{\sigma}_{e}^{2}\left)\right]$. In model equation (1) we assumed genotypes crossed with testers as a fixed effect to be able to compute genotype adjusted means per tester, whereas the other effects were considered as random effects due to the nested design structure [17].
Table 4 summarises the further models. Some SAS code to fit the first stage models is provided in the supplementary material (Additional file 1). The first model (M1) will be referred to as the baseline model because it was the simplest model and represented the randomisation structure. In the second model (M2) we considered additionally the effects of the oth row (W_{ i j o }) and the qth column (V_{ i j q }) both within the jth replicate of the ith trial. Subsequently, we added a spatially correlated residual plot effect different from the baseline model, which uses the independent model (ID) with homogeneous variances. We fitted one and twodimensional spatial models with and without the socalled nugget, a geostatistical term to designate an independent error effect. As onedimensional models we used the autoregressive AR(1) variancecovariance nested within blocks without nugget (M3) and with nugget (M7), and linear variance LV within blocks with nugget (M4). In the AR(1) we accounted for the correlation between plots in the same block assuming an exponential decay of correlation with distance, whereas by using LV, it is assumed that the covariance among plots in the same block decays linearly with spatial distance [18, 19]. The most common extension of the spatial model in two dimensions is the direct product structure AR(1) × AR(1), which assumes that an AR(1) model holds both along rows and along columns [20]. The twodimensional models were fitted along rows and columns within replicates without nugget (M5), with nugget (M8), adding rows and columns as effects without nugget (M6) and with nugget (M9). The LV model can also be extended in two dimensions [21]; however, for METs, where the arrangement of the plots might not be perfectly rectangular, this LV × LV model was cumbersome to fit with the software we used, thus we did not consider this model.
Note that we use (G T)_{ h v } as fixed effect, which is necessary to obtain the genotype by tester means. The purpose is also to recover the information of the entries that are grown in the same locations but using different testers (e.g. in Cycle1 location GL4 and the Polish locations PL1 to PL4), so that we captured the effect of the tester in the shared locations.
Second stage
In the second stage we computed genotype means across locations and testers. This was done either separately for each year (Approach 1) or also averaging across years (Approach 2). The years 2009 and 2010, where molecular marker data were available, were connected through only one check. The resulting fundamental question is then how to fit the year effect. Either the year effect is estimated by the mean of all tested entries (Approach 1) or we rely on the adjustment by the one single check (Approach 2). We assume that genotypes tested in each year can be regarded as a random sample from the same parent population. Based on the structure of the breeding program, this is a realistic assumption that motivates the approaches described in the following.
Both approaches were compared using the M^{(1)} resulting from the analysis of the baseline model in the first stage.
Approach 1: Yearwise analysis
Each year was analysed in the second stage using a threeway interaction model of genotypes (G), locations (L) and testers (T) as factors to obtain adjusted genotypes means of each year. The model was
where ${M}_{\mathit{\text{hsv}}}^{\left(1\right)}$ represents the adjusted mean of grain dry matter yield of the hth genotype, testcrossed with the vth tester in the sth location, G_{ h }, L_{ s } and T_{ v } are the main effects of the hth genotype, the sth location and the vth tester, respectively, (G L)_{ h s }, (G T)_{ h v } and (L T)_{ s v } are the twoway interaction effects, (G L T)_{ h s v } is the effect of the threeway interaction and e_{ h s v } is the residual error associated with ${M}_{\mathit{\text{hsv}}}^{\left(1\right)}$$\left[\phantom{\rule{0.3em}{0ex}}{e}_{\mathit{\text{hsv}}}\sim N\left(0,{\sigma}_{{e}_{\left[\mathit{\text{hsv}}\right]}}^{2}\right)\right]$, with ${\sigma}_{{e}_{\left[\mathit{\text{hsv}}\right]}}^{2}$ the variance of the hsvth adjusted mean $\left({M}_{\mathit{\text{hsv}}}^{\left(1\right)}\right)$ obtained in the first stage.
Location was considered as random effect $\left[{L}_{s}\sim N\left(0,{\sigma}_{L}^{2}\right)\right]$ and hence, all the interactions containing this factor are random [17]. The crossed effect of genotypes and testers [ (G T)_{ h v }] could have been a fixed effect since genotypes and testers are taken as fixed factors in this stage. However, the crossed effects that include G were taken as random here because the factor genotype was used as random in the GP stage. But note that in the first and the second stage we needed to take genotype main effects as fixed in order to compute adjusted means [3]. Besides, since not every genotype was tested with every tester (e.g. in Cycle1 locations GL1 to GL3 and GL5 to GL8), we needed to take (G T)_{ h v } random to be able to estimate genotype means across levels of testers.
In this approach, the year effect was adjusted in two ways, hereafter referred as to Approach 1a and Approach 1b. Approach 1a used years as fixed factors in the GP stage and Approach 1b used a manual adjustment after the second stage by simply calculating the mean of the genotypes by year $\left({\stackrel{\u0304}{M}}_{r}^{\left(1\right)}\right)$ and subtracting it to each genotype adjusted mean of the corresponding year (Figure 1). The rationale behind the latter approach is the assumption that the correction for the year effect is better represented by the simple mean of the complete sample of genotypes per year than by just a few checks. The resulting year effectcorrected genotype means $\left({M}_{\mathit{\text{hsv}}}^{(1\ast )}\right)$ are forwarded to the GP stage, and through CV are evaluated as predictors.
As in the transition from the first to the second stage, there is a loss of information in passing on from the second to the third stage because the (G L T)_{ h s v } effect is confounded with the residual error term. This loss can be minimized by weighting the adjusted means [3]. We used the Smith et al. scheme [6], where adjusted means are weighted by the diagonal elements of the inverse of their variancecovariance matrix computed in the first stage.
At this stage, we computed the heritability for each year using the ad hoc method described in Piepho and Möhring [22] as
where ${\sigma}_{G}^{2}$ is the genetic variance and $\stackrel{\u0304}{v}$ is the mean variance of a difference of two adjusted genotype means, corresponding to the best linear unbiased estimators (BLUE). Even though this is not the best method to estimate heritability [23], the square root of this heritability estimate gives a rough idea of an upper limit for the predictive abilities.
Approach 2: Across years analysis
The model to account for the year effect in the second stage through the shared check was
where ${M}_{\mathit{\text{hrsv}}}^{\left(1\right)}$ represents the adjusted mean of grain dry matter yield of the hth genotype, testcrossed with the vth tester, in the sth location and rth year, G_{ h } is the main effect of the hth genotype, L_{ s } is the main effect of the sth location and D_{ r v } the main effect of the vth tester within the rth year, which can be extended as D_{ r v }=A_{ r }+(A T)_{ r v }, with A_{ r } the effect of the year and T denoting the tester [17]. (G D)_{ h r v }, (G L)_{ h s } and (L D)_{ r s v } are the twoway interaction effects, (G L D)_{ h r s v } is the effect of the threeway interaction and e_{ h r s v } is the residual error associated to ${M}_{\mathit{\text{hrsv}}}^{\left(1\right)}\left[\phantom{\rule{0.3em}{0ex}}{e}_{\mathit{\text{hrsv}}}\sim N\left(0,{\sigma}_{{e}_{\left[\mathit{\text{hrsv}}\right]}}^{2}\right)\right]$, with ${\sigma}_{{e}_{\left[\mathit{\text{hrsv}}\right]}}^{2}$ the variance of the hrsvth adjusted mean $\left({M}_{\mathit{\text{hrsv}}}^{\left(1\right)}\right)$ obtained in the first stage. The effects containing D_{ r v } can be extended as (G D)_{ h r v }=(G A)_{ h r }+(G A T)_{ h r v }, (L D)_{ r s v }=(L A)_{ r s }+(L A T)_{ r s v } and (G L D)_{ h r s v }=(G L A)_{ h r s }+(G L A T)_{ h r s v }.
We considered genotypes and testers as fixed factors and location and year as random factors $\left[{L}_{s}\sim N\left(0,{\sigma}_{L}^{2}\right)\right.$ and $\left(\right)close="]">{A}_{r}\sim N\left(0,{\sigma}_{A}^{2}\right)$. All effects involving A_{ r } are random except (A T)_{ r v } because we do not want to recover interyear information since there are only two years and the year by tester classification is very disconnected (years do not share testers). Moreover, the (A T)_{ r v } term is analogous to a block factor in an incomplete block design because it is free of G_{ h }; therefore, due to the unbalancedness and the small number of years, we can use it as a fixed effect. Furthermore, the main year effect (A_{ r }) can be dropped considering that the adjustment of the genotype means is the same for A_{ r }+(A T)_{ r v } as for only (A T)_{ r v }.
Including all the effects, the final model (4) is
To minimise the loss of information in the transition to the GP stage, we weighted the adjusted means using the inverse of the squared standard errors, which is also appropriate since we are not fitting random block effects [9].
Third stage: Genomic prediction
At the third stage, the dataset of p markers was merged with the n grain dry matter yield adjusted means by years of evaluated models. GP was performed using ridgeregression best linear unbiased prediction (RRBLUP), where the genotypic values are predicted using the marker information by regressing each SNP on the phenotype [24].
The model was
where, M^{(2)} is the n×1 vector of phenotypic records, here, containing the adjusted means calculated from the second stage, X and β are, respectively, the design matrix and parameter vector of fixed effects, Z is the n×p marker matrix, whose elements z_{ h m } represent the SNP genotype of the mth marker of the hth genotype entry and take the values 1, 0, or +1 for the aa, Aa, and AA genotypes [24], u is the pdimensional vector of SNP effects and e is the error vector. The term Z u is interpreted as the genetic effect and its estimate $\mathbf{Z}\widehat{\mathbf{u}}$ as the GEBV. The GEBV of the hth genotype corresponds to $\mathit{\text{GEB}}{V}_{h}=\sum _{m=1}^{p}{\xfb}_{m}{z}_{\mathit{\text{hm}}}$, with m=1,⋯,p the number of markers, ${\xfb}_{m}$ is the estimated effect of the mth marker and z_{ h m } the SNP genotype of the mth marker for the hth genotype entry. The assumptions of the model are that the error is normally distributed with zero mean and variance R[ e∼N(0,R)] and that u has a normal distribution with zero mean and variance ${\mathbf{I}}_{p}{\sigma}_{u}^{2}[\phantom{\rule{0.3em}{0ex}}u\sim N(0,{\mathbf{I}}_{p}{\sigma}_{u}^{2}\left)\right]$. R is a diagonal matrix with diagonal elements equal to the inverses of the diagonal elements of the inverse of the original variancecovariance matrix of the adjusted means of the second stage [6]. I_{ p } is the pdimensional identity matrix and ${\sigma}_{u}^{2}$ represents the proportion of the genetic variance contributed by each individual SNP.
Under the model equation (5) the variance of the observed data is $\mathit{\text{var}}\left({M}^{\left(2\right)}\right)=\mathbf{\Gamma}{\sigma}_{u}^{2}+\mathbf{R}$, in which Γ=Z Z^{T} and Z^{T} denotes the transpose of Z[24]. To speed up the computation, Γ was rescaled by replacing Z with $\mathbf{Z}/\sqrt{p}$, with p the number of markers [25].
In the yearwise analysis (Approach 1a), the genotype adjusted means by year are merged in the M^{(2)} vector, and vector β contains the intercept and the year effect. In the acrossyears analysis (Approach 2), where year effect was already accounted for, M^{(2)} contains the genotype adjusted means and vector β contains only the intercept. In the yearwise analysis correcting genotype adjusted means for year effects (Approach 1b), the model used did not include a fixed year factor (since we had already adjusted for it) but a common intercept, thus the model was the same as for acrossyears analysis.
To measure the influence of the relationship among the genotypes on the predictions, we used the adjusted means obtained in the second stage and the pedigree information of the entries in a mixed model testing genotypes and crosses as random effects, so that the variances of both effects would give us an estimation of how much the variation is attributed to the pedigree, e.g. the crosses. The model was
where ${M}_{\mathit{\text{ah}}}^{\left(2\right)}$ is the adjusted mean of the hth genotype obtained in the second stage, G_{ h } is the effect of the hth genotype, C_{ a } is the effect of the ath grand parent (gp) cross, e.g. (gp1 × gp2) × (gp3 × gp4), and e_{ a h } the associated error. Additionally, we plotted the relationship heatmap of estimated coefficients of relatedness for individuals based on marker data computed according to Wimmer et al. [26].
Cross validation for model comparison
To evaluate model performance, kfold CV was carried out. In CV, the data is split into k subsets t times. k1 subsets are used as the training set (TS) and the one other subset is the validation set (VS). The TS is used to estimate the parameters that then are used to predict the observations in the VS. The performance of the model was assessed by the Pearson correlation coefficient between the predicted GEBV and the corresponding observations of the VS. This correlation is referred to as predictive ability [23]. As in the first stage, the predictive ability was not adjusted by the square root of the heritability. Although breeding programs are most of the time operating with closely related genotypes, breeders are also interested in knowing the results in a scenario with more distantly related genotypes, for example, using genotypes that share the same grandparents either in the TS or in the VS but not in both. Hence, we wanted to check if accounting for the effect of population structure in the randomisation of CV would make the spatial error models improve the predictive abilities. We chose two scenarios given the relatedness level of the entries and followed the suggested sampling schemes from Albrecht et al. [27], which takes into consideration this fact in the CV procedure. In the first sampling scheme, hereafter called “within crosses” (WC), random sampling is done using all genotypes in the dataset; in the second scheme, hereafter referred to as “across crosses” (AC), genotypes were clustered by cross, so that complete crossgroups were used randomly either in the VS or the TS. There were 349 crosses of different sizes, sharing none, one or two grand parents. The general overview of the methodology is depicted in Figure 2.
Model selection
Two strategies for selecting the best phenotypic model were used in the first stage. In strategy one the best model for all locations is selected, that is, there is no model selection per location but across locations. In strategy two, model selection is locationspecific (Figure 3). For both strategies we computed the AIC and performed genomic predictioncross validation (GPCV), both per locationyear combination. To accomplish the GPCV approach, we used the adjusted means per location and year of all spatial and nonspatial models. Then, means of genotypes by yearlocation combination were joined with the molecular marker data to perform GPCV, in which genetic values were regressed on markers and validation of the model was done using kfold CV. Predictions of unobserved records and predictive abilities of each model were obtained for each yearlocation combination. We assessed the predictive ability of the models using the Pearson correlation coefficient (ρ) between the predicted GEBV and the observed phenotypic value. Hereafter we denote this predictive ability as ρGPCV. Predictive abilities were not adjusted with the square root of the heritability, as suggested by Dekkers [28], since this adds an extra error due to heritability computation [15, 23].
For strategy one (across locations model selection), the number of locations with the best fits (either AIC or ρGPCV) was counted, so that the model with the best fits in the majority of locations was identified as the best model. For strategy two (locationspecific model selection), two datasets were built: “Mix 1”, containing the adjusted means of the locations with the best fit according to the AIC and “Mix 2”, containing the adjusted means of the locations with the highest ρGPCV. Thus, after the first stage we had in total eleven data sets of adjusted means, nine corresponding to each tested model from strategy one, plus two more datasets from strategy two: A mixed data set (Mix 1) with the best models per locationyear according to the AIC, and another mixed set (Mix 2) with best models per locationyear according to the ρGPCV.
Softwares
All analyses were performed using SAS. Stage 1 and 3 used the MIXED procedure and Stage 2 used PROC HPMIXED. Relationship matrix was calculated using the Synbreed Package [29] for R 2.15.
Results
First stage  strategy 1: Model selection across locations
In the first stage  strategy 1, we did model selection across locations using AIC and predictive abilities (ρGPCV) per locationyear combination. According to the AIC, the results favoured the twodimensional models (Table 5). To do a fair comparison between selection methods using AIC and ρGPCV, we first describe AIC for years 2009 and 2010, for which ρGPCV were also available and then, as additional information, for year 2012, for which ρGPCV was not available since the marker information was missing.
For years 2009 and 2010, M9 and M8 had the majority of best fits across locations. M9 (Baseline + row + column and AR(1) × AR(1) + nugget) resulted in 12 out of 22 cases as the best model. M8 (Baseline and AR(1) × AR(1) + nugget) was best in 7 out of 22 cases. The baseline model + row + column (M2) fitted the best 9% of the times and M6 5% of the times.
A similar tendency was observed in 2012, where 43% of the times (6 out of 14) M9 had the best fit and M8 was best 29% of the times. For this year 2012, models M7, M8 and M9 could not be fitted in some locations. Another third of the times (29%), M2 had best fits. Interestingly, M2 had the best fits in the locations that had convergence problems for models M8 and M9. M1, M3, M4, M5 and M7 never had best fits in any of both groups of years.
The predictive abilities (ρGPCV) per locationyear combination showed a rather different pattern for best models within locations; however, the twodimensional models were also more frequently selected than onedimensional models (Table 6). M8 (Baseline and AR(1) × AR(1) + nugget) showed in seven of 22 settings the highest ρGPCV per locationyear combination followed by M9 (Baseline + row + column and AR(1) × AR(1) + nugget) with six out of 22 times. The baseline model + row + column (M2) was selected twice and models M3, M4 and M6 had also one, three and three selections out of 22, respectively. M1, M5 and M7 had no best fits at all.
One location of 2009 (PL3) produced a negative predictive ability for all models. We did not consider this location in the counting of best fits, since a higher negative number is actually a worse fit in regard to predictions, but low or high negative are both interpreted as zero prediction. Despite the negative correlations, this location was included in the mixed datasets produced from the sitespecific model selection. We used the adjusted means produced from the baseline model. Another location (GL1 2009) showed way lower predictive abilities than the rest of the locations. To understand these two situations, we calculated the repeatability of the trait in each location for the baseline model. The repeatability R is defined as the ratio of the betweenindividual component to the total phenotypic variance [30], which in our case, and following the methodology described by Nakagawa and Schielzeth [31], corresponds to
where ${\sigma}_{\mathit{\text{GT}}}^{2}$ is the betweengroups variance and corresponds to the variance of the effect (G T)_{ h v } fitted as random effect, and in the denominator, the total phenotypic variance given by the sum of the betweengroups variance ${\sigma}_{\mathit{\text{GT}}}^{2}$ and the withingroups variances, i.e. replicates within trials $\left({\sigma}_{S}^{2}+{\sigma}_{R}^{2}\right)$ and blocks within replicates $\left({\sigma}_{B}^{2}\right)$ plus the residual variance $\left({\sigma}_{e}^{2}\right)$. The interpretation of this repeatability strictly refers to the expected withingroup correlations among measurements, i.e. the agreement among measurements; thus, the gist of the definition of repeatability is related to the reproducibility of the absolute values of measurements. A slightly higher repeatability in Cycle12009 was observed for location GL4 (Table 6), which involved more trials, i.e. more genotypes, in comparison with other locations in Germany. The trend in Cycle12010 was in favour of the Polish locations, which overal l had more homogeneous and higher repeatabilities. We discuss the relation between repeatabilities and predictive abilities in the next section.
Second stage: fitting genotypes by year vs. across years
From a methodological point of view, fitting the year effect in the GP stage was easier and more direct than accounting for the year effect in the second stage, in the sense that the model for the latter approach became too complex and the variance covariance matrix of adjusted means was not possible to be produced using the procedure HPMIXED of SAS given the high computer power required. Instead, we computed the adjusted means with corresponding standard errors, which were then used to do the weighting to pass on from the second to the third stage.
The adjusted means obtained from the acrossyears analysis (Approach 2) were plotted against the year effectcorrected genotype adjusted means (from Approach 1b) to compare the difference of adjustments, in the former case based on one single check against the adjustment given the simple mean of the genotypes in each year (Figure 4). Below the two principal lines, an observation corresponding to the shared check across years stood out from the others, reflecting the year adjustment. At first glance, it is clear that the check was the only observation pulled down implying that the year adjustment of this check was not strong enough to pull down the observations of the whole year. Both approaches were examined later using the predictive abilities obtained in the GP stage.
Third stage: genomic prediction
The predictive abilities of the GP stage were taken as the definitive decision criterion for identifying the best strategy for model selection, the best model, and the most reliable approach to account for year effects, and to identify the consequences of population stratification in GP. We start by presenting results of the comparison of the approaches used for fitting the year effect, since with these we only used the baseline model. Then we present the differences between sampling methods for CV together with the comparison of the models and the model selection strategies.
Comparison of approaches to account for year effect in GP
The GPCV for the approach using the year as a fixed term in the third stage (Approach 1a) yielded a predictive ability of 0.70 (Table 7), whereas predictive ability for the approach accounting for a fixed year effect in the second stage (Approach 2) was 0.74. The predictive ability reached 0.68, using the yeareffectcorrected adjusted means in the GPCV (Approach 1b). The scatter plots of GEBV ($\mathbf{Z}\widehat{\mathbf{u}}$) against the observed phenotypic values (adjusted means) in the three cases are depicted in Figure 5. In Approach 1a, we plotted the GEBV against the corrected observed phenotypic values, calculated as ${M}^{\left(2\right)}X\widehat{\mathbf{\beta}}$, where M^{(2)} is the vector of genotype adjusted means obtained in the second stage and $\mathbf{X}\widehat{\mathbf{\beta}}$ the predicted year effect (Figure 5A). For Approach 2, the observed phenotypic values M^{(2)} against $\mathbf{Z}\widehat{\mathbf{u}}$ are shown (Figure 5B). For Approach 1b, M^{(2∗)} against $\mathbf{Z}\widehat{\mathbf{u}}$ are plotted, with M^{(2∗)} the yeareffectcorrected adjusted means of genotypes (Figure 5C).
Comparison of model selection strategies using different sampling methods in cross validation
Fitting model (6) to measure the influence of the relationship among genotypes on predictions yielded variance components for genotypes, crosses and error for year 2009 of 4.03, 3.67 and 1.66, respectively, and for year 2010 of 4.72, 10.70 and 1.32, respectively. Thus, the cross effect in 2009 is contributing in about 40% and in the next year more than 60% to the total variation explained by the data.
The markerbased relationship heatmap (Figure 6) shows some clusters among genotypes of the same cross indicating genetic relatedness. The predictive abilities using five times 5fold CV of datasets resulting from first stage analysis of all spatial and nonspatial models plus the mixed datasets were in general very similar within sampling strategies (Table 7). For the acrosscrosses (AC) sampling scheme, the predictive abilities were lower than the ones obtained with the withincrosses (WC) sampling scheme. In the AC sampling, we fixed the initial seed of the random number generator used for randomization in the CV procedure at the same value for all models to be able to compare the models when the same crosses were used in the training set.
We compared the models and the sampling methods using a paired ttest (α=5%) by resembling a randomized complete block design, where the predictive ability of each repetition of the CV was taken as a block, thus accounting for the dependence among observations from the same samples (Table 7). For the first sampling method (WC), three groups were identified with some overlaps, but showing not much of a difference among models. From the acrosscrosses sampling strategy (AC), five groups were distinguished with some overlaps: M2 had the highest predictive ability and models M4, M6 and M7 had the worst predictive abilities.
Potential bias of GP is another important element that could be used to compare models. We computed the bias as suggested by [32, 33]. The comparison of the biases of all models followed a rather similar trend as the predictive abilities showed in Table 7. We present the analysis of bias as supplementary material (Additional file 2).
The heritability (square root of heritability) for the baseline model was estimated as 0.68 (0.82) for year 2009, 0.73 (0.85) for year 2010 and 0.69 (0.83) for 2012 using the equation (3). In principle, the ad hoc method may approximate the true value of heritability but making the unrealistic assumption of uncorrelated genotypes [23]. We computed the heritability to have a rough idea of how much could we expect from the predictive abilities. The predictive ability divided by square root of heritability is an estimate of the accuracy of GP [23], and the square root of the heritability provides the upper bound for the predictive ability [30], thus one expects that the predictive abilities are not very far from the square root of heritability. In this case, the square roots of the heritabilities are somewhat larger than the corresponding predictive abilities, indicating that the predictions are not sufficiently accurate due to limited data size, thus not exhausting completely the genetic variance. To explore in which extend could have our models explained the variance not captured by the markers, we fitted an additional component accounting for the polygenic effect in the GP stage [24]. The baseline model (M1) yielded a genotypic variance of 2.99; when we incorporated the polygenic effect, the genotypic variance was 2.72 and polygenic variance was 0.36, indicating that about 88% of the total genetic variance was captured by the RRBLUP model.
Discussion
Selecting the models at the first stage produced different results than assessing them in the third stage. AIC had better scores for the models that used row and column effects, e.g. Models M9, M6 and M2 (Table 5) or M8 that had a 2dimensional variancecovariance error structure. ρGPCV also picked M8 and M9 (Table 6) but the choices were more spread over the models covering even the baseline model. In general, in the first stage, both AIC and ρGPCV produced better scores for the twodimensional models, whereas in the third stage the baseline and onedimensional models seemed to be better than the more complex models (Table 7). The explanation of this pattern may be related to the second stage, where the interaction genotype × location played a role. The twodimensional models performed very well in modelling heterogeneity within field, but when the means were integrated across the whole experiment, including all locations and years, the twodimensional spatial error models seemed to overadjust the means, yielding a poorer predictive ability in the GP stage. The onedimensional spatial error models and the twodimensional model without spatial error structure were sufficient to estimate appropriately adjusted means. This corroborates Piepho and Williams [21] who concluded that for small portions of a field, a particular spatial model may hold well but if fitted all across the field it may fail. In a wheat experiment, Lado et al. [34] found that using moving averages as covariable significantly improved the predictive abilities of GP. They recognised strong heterogeneous patterns of irrigation in the field, that were not controlled with a single blocking system.
Models M1, M3 and M7 were never selected as having the best fits either by AIC or ρGPCV. These models had in common that none of them used rows and columns as additional factors, strengthening the conclusion that rowcolumn designs may have the potential to correctly control field heterogeneity and thus enhance predictive ability of genomic prediction.
Fitting a locationspecific error model did not have an advantage over fitting a common model across locations. Neither did the dataset composed of means computed using models have best AIC fits (Mix 1) nor the second dataset containing the means computed using models with highest ρGPCV (Mix 2) produce better predictive abilities in the GP stage.
The models with nugget had better fits than the corresponding baseline model without the nugget. The drawback was that fitting those models was not straightforward, since almost every location required a separate coding specifying initial values and lower boundary constraints on the covariance parameters. Good statistical and biological reasons have been presented of why including a nugget to analysis of field experiment is beneficial [35].
If we ignore the twodimensional spatial models (M5, M6, M8 and M9), the AIC privileges M2 and ρGPCV yields more diverse results with the majority of choices for M2 and M4. In fact, when the spatial component of a resolvable rowcolumn design based on linear variance (LV) does not lead to an improved fit, returning to classical rowcolumn design provides randomisation protection [36].
Williams and Luckett [37] performed studies aiming to find the optimal plot size, the optimal plot arrangements and the best spatial model (the socalled uniformity trials) and showed that in cotton and barley row and column designs are well suited for variety testing in plant breeding trials. Moreover, recent simulation studies from Möhring et al. [38] showed that designs including rows and columns outperformed onedimensional blocking. In the same work, the authors mention that blocking in the direction of plots with common long sides is preferable, which is common in cereal breeding [39].
We cannot affirm that ρGPCV was better than AIC for model selection or vice versa, nor that the results showed the same trend; but if we would have used either of these two strategies to select the best model, we would have selected the M9 with AIC or M8 with ρGPCV. The GP predictive ability obtained by M2 (Table 7) was slightly better than M8 and M9 (specifically AC sampling method); however, this model (M2) was not highlighted by either of the two selection criteria (AIC or ρGPCV).
In practice, the fact that there were no large statistical differences is good news for the breeders because the baseline model (M1), or even better, the simplest model with rowcolumn adjustment (M2), are appropriate for phenotypic analysis towards GP.
As a model selection method, GPCV is of interest because it may allow to compare models with different fixed effects, even when REML is used for estimating the variance parameters. No simple recommendation has been reported concerning the best model selection criterion in the case of spatial models [13, 40]. Predictive abilities have been used between environments as similarity measure and then to join similar environments into clusters [15]. Thus, in a sense ρGPCV allows giving an interpretation to the environment under scrutiny and the displayed trend do not depart far from the classical AIC. The repeatabilities (R) presented in parallel to the ρGPCV (Table 6) show a low correlation (ρ=0.36, pvalue = 0.0965) with the predictive abilities from the baseline model. In fact, we expected that for location PL3 of 2009, which had a negative predictability, the R was very low almost zero, but this was not the case; hence we could not conclude that the low predictive ability is mainly due to environmental effects. Riedelsheimer et al. [41] also reported negative predictive accuracies when testing unrelated crosses in the CV procedure and observed that using unrelated crosses could have provided a negative prediction signal due to opposite linkage phases with important QTL displayed in the TS, suggesting that the negative predictive accuracies are associated with the marker pattern.
In this study we explored three ways to adjust the year effect given the weak connectivity across years. Using the single check (Approach 2) to make the year adjustment was not a better choice than adjusting by the simple year mean (Approach 1b) or accounting for the year effect in the GP stage (Approach 1a), even though the estimated predictive ability was the highest. The “year clouds” produced using Approach 2 (Figure 5B) did not overlap perfectly, from which we concluded that the correction was not appropriate and generated an overfitting of the markers in the GPCV procedure due to the fact that markers also predicted the year effect and not the SNPeffects alone. Using the yearmean correction for adjusted means in the second stage (Approach 1b) produced a lower ρGPCV, that, given the overlay of the clouds of predicted vs. observed values, seems to be more realistic. However, fitting the year effect manually, i.e. using ordinary least squares estimation (OLSE) vs. fitting it as a fixed effect in the GP stage, i.e. using generalised least squares estimation (GLSE) can definitively yield a more precise estimate. Indeed, the residual variance in Approach 1b using year effectcorrected adjusted means was around 3.9 (in average for the five replicates) and in Approach 1a using the year fixed effect in the GP stage yielded residual variance of 3.0 (in average for the five replicates). In Approach 1a, where we fitted the year in the GP stage, we removed the year effect from the observed adjusted means derived from the second stage $\left({M}^{\left(2\right)}X\widehat{\mathbf{\beta}}\right)$ to avoid bias of the predictive abilities; however, there would still be some bias because the subtracted year effect was not the true effect but an estimate of the year effect.
Models were eventually assessed and compared using the ρGPCV in the third stage. The two sampling scenarios to perform the CV procedure aimed to recreate the cases where the material was genetically close, with some individuals coming from the same parental cross, and more distantly related to avoid individuals from the same parental cross in the randomisation procedure of CV. This more distantly related material shows some identicalbystate (IBS) similarity, therefore it was not unrelated in the theoretical sense of population genetics. This more distantly related scenario may be seen also as a case where one tries to predict a scenario whose linking information is weak or lacking, e.g. different genotypes and/or locations in the TS and VS [42–44].
The predictive abilities obtained for GP using WC sampling were located in the middlehigh range and using AC sampling, predictive abilities were placed in the middle range. The predictive ability of the AC sampling was significantly lower than WC, as expected for GP of a dataset showing population structure. Riedelsheimer et al. [41] drew similar conclusions using unrelated biparental maize families. They concluded that predictive accuracy could be increased by adding crosses (families) sharing both parents to the TS. In this respect, the use of pedigree and marker information to borrow information from both sources is suggested [44].
Conclusions
The main conclusions of this study are: (i) Fitting a traditional model including row and column factors across all locations was good enough to account for field heterogeneity in the first stage under GP frame. This also suggests that rowcolumn designs may be preferable to designs with a single blocking factor; (ii) AIC and ρGPCV did not have the same trend in selecting across models, but both favoured in the end models M8 and M9; however, none of the methods picked the model with highest predictive ability. Fitting a locationspecific error model did not produce an advantage over fitting a common model across locations; (iii) the baseline model (M1) and the simplest rowcolumn adjustment (M2) had in overall the best results, which is very good news since in routine analysis complex models may require much programming expertise and powerful computers; (iv) in a dataset weakly connected across years, a more reasonable modelwise structure is to account for the year factor in the genomic prediction stage rather than in a previous stage, to ensure that the effect is not confounded with the markers adjustment, and (v) datasets of distantly related genotypes may have a poor performance for GP purposes; however, increasing the size of the crosses may be an opportunity to enhance predictive ability in these cases of disconnected datasets on related sets of genotypes.
Abbreviations
 GP:

Genomic prediction
 GEBV:

Genomic estimated breeding value
 SNP:

Single nucleotide polymorphism
 MET:

Multienvironment trial
 CV:

Cross validation
 AIC:

Akaike information criterion
 BIC:

Bayesian information criterion
 REML:

Restricted maximum likelihood
 ML:

Maximum likelihood
 MAF:

Minor allele frequency
 ρGPCV:

Predictive ability
 BLUE:

Best linear unbiased estimator
 RRBLUP:

Ridge regressionbest linear unbiased predictor
 VS:

Validation set
 TS:

Training set
 WC:

Withincrosses
 AC:

Acrosscrosses
 OLSE:

Ordinary least squares estimation
 GLSE:

Generalised least squares estimation
 IBS:

Identicalbystate.
References
 1.
Meuwissen TH, Hayes BJ, Goddard ME: Prediction of total genetic value using genomewide dense marker maps. Genetics. 2001, 157: 18191829.
 2.
SchulzStreeck T, Ogutu JO, Piepho HP: Comparisons of singlestage and twostage approaches to genomic selection. Theor Appl Genet. 2013, 126: 6982.
 3.
Piepho HP, Möhring J, SchulzStreeck T, Ogutu JO: A stagewise approach for the analysis of multienvironment trials. Biom J. 2012, 54: 844860.
 4.
Burgueño J, Crossa J, Cotes JM, San Vicente F, Das B: Prediction assessment of linear mixed models for multienvironment trials. Crop Sci. 2011, 51: 944954.
 5.
Piepho HP, Möhring J, Melchinger AE, Büchse A: Blup for phenotypic selection in plant breeding and variety testing. Euphytica. 2008, 161: 209228.
 6.
Smith A, Cullis B, Gilmour A: The analysis of crop variety evaluation data in Australia. Aust NZ J Stat. 2001, 43: 129145.
 7.
Crossa J, Burgueño J, Cornelius PL, McLaren G, Trethowan R, Krishnamachari A: Modelling genotype x environment interaction using additive genetic covariances of relatives for predicting breeding values of wheat genotypes. Crop Sci. 2006, 46: 17221733.
 8.
Besag J, Kempton R: Statistical analysis of field experiments using neighbouring plots. Biometrics. 1986, 42: 231251.
 9.
Möhring J, Piepho HP: Comparison of weighting in twostage analysis of plant breeding trials. Crop Sci. 2009, 49: 19771988.
 10.
Cullis B, Gogel B, Verbyla A, Thompson R: Spatial analysis of multienvironment early generation variety trials. Biometrics. 1998, 54: 118.
 11.
Duarte JB, Vencovsky R: Spatial statistical analysis and selection of genotypes in plant breeding. Pesqui Agropecu Bras. 2005, 40: 107114.
 12.
Zimmerman DL, Harville DA: A random field approach to the analysis of fieldplot experiments and other spatial experiments. Biometrics. 1991, 47: 223239.
 13.
Spilke J, Richter C, Piepho HP: Model selection and its consequences for different splitplot designs with spatial covariance and trend. Plant Breed. 2010, 129: 590598.
 14.
Searle SR, Casella G, McCulloch CE: Variance Components. 1992, Hoboken: John Wiley & Sons
 15.
Heslot N, Jannink JL, Sorrells ME: Using genomic prediction to characterize environments and optimize prediction accuracy in applied breeding data. Crop Sci. 2013, 53: 921933.
 16.
Kang HM, Zaitlen NA, Wade CM, Kirby A, Heckerman D: Efficient control of population structure in model organism association mapping. Genetics. 2008, 178: 17091723.
 17.
Piepho HP, Buechse A, Emrich K: A hitchhiker’s guide to mixed models for randomized experiments. J Agron Crop Sci. 2003, 189: 310322.
 18.
Williams ER: A neighbour model for field experiments. Biometrika. 1986, 73: 279287.
 19.
Piepho HP, Richter C, Williams E: Nearest neighbour adjustment and linear variance models in plant breeding trials. Biom J. 2008, 50: 164189.
 20.
Gilmour A, Cullis B, Verbyla AP: Accounting for natural and extraneous variation in the analysis of field experiments. J Agric Biol Environ Stat. 1997, 2: 269293.
 21.
Piepho HP, Williams ER: Linear variance models for plant breeding trials. Plant Breed. 2010, 129: 18.
 22.
Piepho HP, Möhring J: Computing heritability and selection response from unbalanced plant breeding trials. Genetics. 2007, 177: 18811888.
 23.
Ould Estaghvirou SB, Ogutu JO, SchulzStreeck T, Knaak C, Ouzunova M, Gordillo A, Piepho HP: Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding. BMC Genomics. 2013, 14: 860
 24.
Piepho HP: Ridge regression and extensions for genomewide selection in maize. Crop Sci. 2009, 49: 11651176.
 25.
Piepho HP, Ogutu JO, SchulzStreeck T, Estaghvirou B, Gordillo A, Technow F: Efficient computation of ridgeregression best linear unbiased prediction in genomic selection in plant breeding. Crop Sci. 2012, 52: 10931104.
 26.
VanRaden PM: Efficient methods to compute genomic predictions. J Dairy Sci. 2008, 91: 44144423.
 27.
Albrecht T, Wimmer V, Auinger HJ, Erbe M, Knaak C, Ouzunova M, Simianer H, Schön CC: Genomebased prediction of testcross values in maize. Theor Appl Genet. 2011, 123: 339350.
 28.
Dekkers JCM: Prediction of response to markerassisted and genomic selection using selection index theory. J Anim Breed Genet. 2007, 124: 331341.
 29.
Wimmer V, Albrecht T, Auinger HJ, Schön CC: synbreed: a framework for the analysis of genomic prediction data using R. Bioinformatics. 2012, 28: 129.
 30.
Falconer DS, Mackay TFC: Introduction to Quantitative Genetics, 4th edn. 1996, Harlow: Pearson Prentice Hall
 31.
Nakagawa S, Schielzeth H: Repeatability for gaussian and nongaussian data: a practical guide for biologists. Biol Rev. 2010, 85: 935956.
 32.
Le Roy P, Filangi O, Demeure O, Elsen JM: Comparison of analyses of the XVth QTLMAS common dataset III: genomic estimations of breeding values. BMC Proc. 2012, 6 (Suppl 2): 3
 33.
Wang CL, Ma PP, Zhang Z, Ding XD, Liu JF, Fu WX, Weng ZQ, Zhang Q: Comparison of five methods for genomic breeding value estimation for the common dataset of the 15th QTLMAS Workshop. BMC Proc. 2012, 6 (Suppl 2): 13
 34.
Lado B, Matus I, Rodríguez A, Inostroza L, Poland J, Belzile F, Del Pozo A, Quincke M, Castro M, von Zitzewitz J: Increased genomic prediction accuracy in wheat breeding through spatial adjustment of field trial data. G3. 2013, 3: 21052114.
 35.
Wilkinson GN, Eckert SR, Hancock TW, Mayo O: Nearest neighbour (nn) analysis of field experiments. J R Stat Soc Ser B Stat Methodol. 1983, 45: 151211.
 36.
Williams ER, John JA, Whitaker D: Construction of resolvable spatial rowcolumn designs. Biometrics. 2006, 62: 103108.
 37.
Williams ER, Luckett DJ: The use of uniformity data in the design and analysis of cotton and barley variety trials. Aust J Agric Res. 1988, 39: 339350.
 38.
Möhring J, Williams ER, Piepho HP: Efficiency of augmented prep designs in multienvironmental trials. Theor Appl Genet. 2014, 127: 10491060.
 39.
Patterson HD, Hunter EA: The efficiency of incomplete block designs in national list and recommended list cereal variety trials. J Agric Sci. 1983, 101: 427433.
 40.
Lee H, Ghosh SK: Performance of information criteria for spatial models. JSCS. 2009, 79: 93106.
 41.
Riedelsheimer C, Endelman JB, Stange M, Sorrells ME, Jannink JL, Melchinger AE: Genomic predictability of interconnected biparental maize populations. Genetics. 2013, 194: 493503.
 42.
SchulzStreeck T, Ogutu JO, Gordillo A, Karaman Z, Knaak C, Piepho HP: Genomic selection allowing for markerbyenvironment interaction. Plant Breed. 2013, 132: 532538.
 43.
Windhausen VS, Atlin GN, Hickey JM, Crossa J, Jannink JL, Sorrells ME, Raman B, Cairns JE, Tarekegne A, Semagn K, Beyene Y, Grudloyma P, Technow F, Riedelsheimer C, Melchinger AE: Effectiveness of genomic prediction of maize hybrid performance in different breeding populations and environments. G3. 2012, 2: 14271436.
 44.
Burgueño J, de los Campos G, Weigel K, Crossa J: Genomic prediction of breeding values when modeling genotype x environment interaction using pedigree and dense molecular markers. Crop Sci. 2012, 52: 707719.
Acknowledgements
We thank KWSLOCHOW for providing the datasets used in this study. We are grateful to the Synbreed and RyeSelect project members for their helpful and constructive comments during the discussion sessions. This research was funded by KWSLOCHOW GMBH and the German Federal Ministry of Education and Research (Bonn, Germany) within the AgroClusterEr “RyeSelect: Genomebased precision breeding strategies for rye” (Grant ID: 0315946A);
Author information
Affiliations
Corresponding author
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
AMBV participated in the design of the study, conducting the analysis, interpreting the results, writing and editing the manuscript. JM participated in the design of the study, conducting the analysis and drafting the manuscript. MSchm supervised the collection of the KWSLOCHOW data set and implemented the models for the whole dataset. MSchö and CCS were responsible for data controlling and participated in the conception of the study. HPP conceived the study, participated in its design, writing and editing the manuscript and oversaw the project. All authors read and approved the final manuscript.
Electronic supplementary material
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
About this article
Cite this article
BernalVasquez, A., Möhring, J., Schmidt, M. et al. The importance of phenotypic data analysis for genomic prediction  a case study comparing different spatial models in rye. BMC Genomics 15, 646 (2014). https://doi.org/10.1186/1471216415646
Received:
Accepted:
Published:
Keywords
 Stagewise analysis
 Genomic prediction
 Cross validation
 Spatial models
 Multienvironment trials (MET)
 Restricted maximum likelihood (REML)