
New cycle, same old mistakes? Overlapping vs. discrete generations in long-term recurrent selection

Abstract

Background

Recurrent selection is a foundational breeding method for quantitative trait improvement. It typically features rapid breeding cycles that can lead to high rates of genetic gain. Usually, generations are discrete in recurrent selection, which means that breeding candidates are evaluated and considered for selection for only one cycle. Alternatively, generations can overlap, with breeding candidates considered for selection as parents for multiple cycles. With recurrent genomic selection but not phenotypic selection, candidates can be re-evaluated by using genomic estimated breeding values without additional phenotyping of the candidates themselves. Therefore, it may be that candidates with true high breeding values discarded in one cycle due to underestimation of breeding value could be identified and selected in subsequent cycles. The consequences of allowing generations to overlap in recurrent selection are unknown. We assessed whether maintaining overlapping and discrete generations led to differences in genetic gain for phenotypic, genomic truncation, and genomic optimum contribution recurrent selection by stochastic simulation.

Results

With phenotypic selection, overlapping generations led to decreased genetic gain compared to discrete generations due to increased selection error bias. Selected individuals, which were in the upper tail of the distribution of phenotypic values, tended to also have high absolute error relative to their true breeding value compared to the overall population. Without repeated phenotyping, these individuals erroneously believed to have high value were repeatedly selected across cycles, leading to decreased genetic gain. With genomic truncation selection, overlapping and discrete generations performed similarly as updating breeding values precluded repeatedly selecting individuals with inaccurately high estimates of breeding values in subsequent cycles. Overlapping generations did not outperform discrete generations in the presence of a positive genetic trend with genomic truncation selection, as individuals from previous breeding cycles typically had truly lower breeding values than candidates from the current generation. With genomic optimum contribution selection, overlapping and discrete generations performed similarly, but overlapping generations slightly outperformed discrete generations in the long term if the targeted inbreeding rate was extremely low.

Conclusion

Maintaining discrete generations in recurrent phenotypic selection leads to increased genetic gain, especially at low heritabilities, by preventing selection error bias. With genomic truncation selection and genomic optimum contribution selection, genetic gain does not differ between discrete and overlapping generations assuming non-genetic effects are not present. Overlapping generations may increase genetic gain in the long term with very low targeted rates of inbreeding in genomic optimum contribution selection.


Background

Quantitative trait improvement is achieved by cyclically increasing mean genetic value of breeding populations via recurrent selection. Recurrent phenotypic selection, reviewed by Hallauer & Darrah [1], is a breeding strategy in which top-performing individuals are selected from a population and crossed to generate a new population for selection in the subsequent breeding cycle [1,2,3]. Recurrent phenotypic selection likely began with the invention of agriculture and is widely used to this day for quantitative trait improvement [1, 4,5,6,7]. The advantage of this breeding strategy is that the breeding cycle length is short, as individuals can be selected as parents soon after they are born. Shorter cycle length leads to faster genetic gain, which is the rate of increase in mean genetic value due to selection in a population over time [8].

The main disadvantage of phenotypic selection is that selection accuracy tends to be low, because individuals are selected based on a single phenotypic observation, and selection accuracy directly impacts the rate of genetic gain [1]. This disadvantage is exacerbated at low trait heritabilities, as phenotypes are less indicative of true breeding values [5]. Different breeding schemes to improve the accuracy of phenotypic selection have been developed which involve testing families of progeny of selection candidates (e.g. half-sibs, full-sibs, or inbred lines) across multiple replicates or environments [1]. Most applied breeding programs of cereal crops are currently practicing some form of recurrent selection among families, especially inbred families. While recurrent selection by family improves accuracy, it also increases the breeding cycle length, which limits the rate of genetic gain that can be realized.

With the availability of genomic selection, recurrent selection schemes are being modified to use genomic estimated breeding values (GEBVs) rather than single phenotypic observations for parent selection [9,10,11,12,13,14]. This is often referred to as “rapid-cycling genomic selection” [14]. This approach can improve selection accuracy without increasing the breeding cycle length, thus increasing the rate of genetic gain. Recurrent phenotypic and genomic selection fundamentally differ in that estimates of breeding value based on phenotype are defined at the individual level, whereas GEBVs are defined at the marker or population level [10]. In recurrent phenotypic selection, individuals are phenotyped once prior to selection, and this comprises the only assessment of the individuals’ breeding values. In genomic selection, observations of marker effects or genetic relationships increase in number as new relatives are phenotyped. Thus, the accuracy of estimates of individual breeding values increases with genomic prediction even in absence of additional phenotypic data for evaluated individuals [10]. For example, an individual with a high true breeding value may have a low estimated breeding value in a given genomic selection cycle due to error, but in a subsequent cycle its breeding value estimate may be higher—in better agreement with its true breeding value—as the prediction model is updated with information from relatives.

This raises the question: if possible, should individuals from previous selection cycles be considered again as selection candidates in subsequent cycles? Or, in other words, should generations be allowed to overlap in phenotypic and genomic recurrent selection programs? Conventionally, individuals are only considered as candidates for selection during the cycle when they are evaluated. However, in clonally propagated or perennial species, non-inbred individuals could be selected directly as parents for multiple seasons. In self-compatible species with multiple inflorescences, selected individuals could be self-pollinated and the resultant seed could be used for crossing in multiple selection cycles, even though the selfed progeny would not be identical to the parent genotype. In line breeding, inbred lines can be re-used indefinitely. In practice, it is common for plant breeders to recycle favored parents across cycles of selection, leading to overlap, even if the parent has not been phenotyped and statistically evaluated alongside the current selection candidates. The effect on genetic gain of maintaining discrete or overlapping selection generations has not been formally evaluated or reported. Given that selection accuracy may vary with cycle in breeding individuals from previous generations in genomic but not phenotypic selection, we hypothesized that allowing overlapping generations may be more favorable for rapid recurrent genomic selection compared to rapid recurrent phenotypic selection. Unexpectedly, we found that overlapping generations decreased the rate of genetic gain under phenotypic selection compared to discrete generations.

This study had two primary objectives: (1) to determine if generations should be overlapping or discrete in phenotypic and genomic recurrent selection programs, and (2) to determine in what selection scenarios overlapping and discrete generations can be recommended for recurrent selection. The effects of overlapping and discrete generations on average parental age, genomic inbreeding, genetic variance, and the selection accuracy were also examined.

Methods

Stochastic simulations in the R package AlphaSimR were conducted to examine various recurrent selection scenarios [15]. All simulations were run on the Biocluster High Performance Computing system housed in the Carl R. Woese Institute for Genomic Biology at the University of Illinois at Urbana-Champaign and maintained by the Computer Network Resource Group. Two main trait and pipeline architectures were considered: (1) recurrent selection on a purely additive trait in a single cohort per breeding cycle (RS-A), and (2) recurrent selection on a trait with additive, year, and additive x year effects with multiple cohorts per breeding cycle (RS-AY). For both architectures, an outbred, diploid, hermaphroditic founder population was generated with the runMacs function. Individuals had ten chromosomes with 1000 segregating sites per chromosome.

RS-A scenarios

For the RS-A scenarios, with the purely additive trait, 100 sites per chromosome were assigned additive effects and 50 sites per chromosome were genotyped by a simulated SNP-chip. As such, the same 50 sites per chromosome were genotyped every cycle, with random distribution of the sites on the chromosome. The sites were assigned additive effects by drawing effects from a normal distribution and scaling the effects to achieve an additive genetic variance of 1; for further details, please see Gaynor, 2021 [15, 16]. All remaining segregating sites were neutral and ungenotyped. Supplementary File 1 contains the script used to generate the base founder population. To start each simulation replicate, 100 individuals were drawn from the founder population. Starting mean genetic value was 0, genetic variance was 1, and starting narrow-sense heritabilities were either 0.1, 0.5, or 0.9. To form phenotypes for a given genotype, random error was added to the true breeding value. The random error was drawn from a normal distribution with the appropriate error variance to achieve the scenario narrow-sense heritability. In the first year, 20 parents were selected phenotypically. See Supplementary File 2 for the script used to start each simulation. After the first year, a breeding cycle consisted of crossing the selected parents, phenotypic evaluation and parent selection before flowering, then restarting the cycle by making 100 random crosses of the selected parents which produced 1 progeny per cross (Fig. 1). The number of crosses and progeny per cross were arbitrary and unlikely to affect the comparison of overlapping and discrete generations.

Fig. 1

Overview of recurrent mass selection scheme for RS-A scenarios. For the RS-A scenarios, only the parental selection units varied in this study. For an overview of the RS-AY scenarios, see the Conventional scenario in Gaynor et al., 2017 [18]
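The phenotype construction described for the RS-A scenarios can be sketched in a few lines. This is a minimal illustration in Python rather than the authors' AlphaSimR script. It assumes the usual definition of narrow-sense heritability, h2 = Vg / (Vg + Ve). The function names `error_variance` and `simulate_phenotypes` are hypothetical.

```python
import random


def error_variance(genetic_variance, h2):
    """Error variance Ve such that h2 = Vg / (Vg + Ve) (assumed definition)."""
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must be in (0, 1]")
    return genetic_variance * (1.0 - h2) / h2


def simulate_phenotypes(true_breeding_values, h2, genetic_variance=1.0, rng=None):
    """Add normally distributed random error to each true breeding value."""
    rng = rng or random.Random()
    error_sd = error_variance(genetic_variance, h2) ** 0.5
    return [tbv + rng.gauss(0.0, error_sd) for tbv in true_breeding_values]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical starting population: 100 individuals, genetic variance 1.
    tbvs = [rng.gauss(0.0, 1.0) for _ in range(100)]
    for h2 in (0.1, 0.5, 0.9):
        phenotypes = simulate_phenotypes(tbvs, h2, rng=rng)
        print(h2, round(error_variance(1.0, h2), 3), len(phenotypes))
```

With a genetic variance of 1, the three starting heritabilities of 0.1, 0.5, and 0.9 correspond under this definition to error variances of about 9, 1, and 0.11.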

Several factors were considered in the RS-A scenario (Fig. 2). Parents were selected from either discrete or overlapping generations. For discrete generations, parents were only selected from the current breeding cycle. For overlapping generations, parents were selected from any breeding cycle. Selection was then based on phenotypic value, true breeding value, or GEBV as estimated by ridge regression best linear unbiased prediction (RR-BLUP). In phenotypic selection only, selection on either unreplicated phenotypes or thrice-replicated phenotypes was considered; in all other cases, phenotypes were unreplicated. To replicate phenotypes, the cycle error variance used to draw error values was divided by the number of replicates (3), and the phenotype was created by adding the true breeding value and the error value. For genomic selection only, two further factors were considered: truncation vs. optimum contribution selection (OCS), and training the model on all generations (allGen) vs. training on only the five most recent generations (fiveGen) to mimic what may occur in practice (Fig. 2). If selection occurred on phenotype or true breeding value, truncation selection of the top 20 individuals was always used. In the genomic selection scenarios, either truncation selection of the top 20 individuals was used or OCS was used with minimum effective population sizes (Ne) of 10, 45, and 100. Higher minimum effective population size implied stricter control of inbreeding. OCS was implemented with the R package optiSel [17]. All RS-A scenarios were run for 50 breeding cycles and replicated across 10 simulations. See Supplementary File 1 for custom optiSel functions used in the study, and see Supplementary File 3 for the core script used to run the RS-A simulations.

Fig. 2

Overview of the RS-A scenario factors. Shaded boxes indicate factors and unshaded boxes indicate levels of factors. Solid lines connecting shaded boxes indicate that all combinations of factor levels were tested, while solid lines connecting unshaded factor levels to shaded factors indicate the subsequent shaded factors only apply to the connected factor level
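The difference between discrete and overlapping generations comes down to which individuals enter the candidate pool before truncation selection. The sketch below illustrates that distinction in Python. It is not the authors' code: the dictionary layout, the `criterion` key, and all function names are hypothetical, and parental age is assumed to be counted so that a parent from the current cycle has age one.

```python
def candidate_pool(cycles, overlapping):
    """Return selection candidates from a list of cycles (oldest first).

    Discrete generations: only the most recent cycle is eligible.
    Overlapping generations: individuals from every cycle are eligible.
    """
    if not cycles:
        return []
    if overlapping:
        return [ind for cycle in cycles for ind in cycle]
    return list(cycles[-1])


def truncation_select(candidates, n_parents):
    """Keep the n_parents candidates with the highest selection criterion."""
    ranked = sorted(candidates, key=lambda ind: ind["criterion"], reverse=True)
    return ranked[:n_parents]


def mean_parental_age(parents, current_cycle):
    """Average age in cycles, counting the current cycle as age one (assumption)."""
    return sum(current_cycle - p["cycle"] + 1 for p in parents) / len(parents)


if __name__ == "__main__":
    # Two toy cycles with two candidates each; the criterion could be a
    # phenotype, a GEBV, or a true breeding value.
    cycles = [
        [{"id": "a", "cycle": 1, "criterion": 2.5},
         {"id": "b", "cycle": 1, "criterion": 0.1}],
        [{"id": "c", "cycle": 2, "criterion": 1.0},
         {"id": "d", "cycle": 2, "criterion": 0.7}],
    ]
    for overlapping in (False, True):
        parents = truncation_select(candidate_pool(cycles, overlapping), 2)
        print(overlapping, [p["id"] for p in parents], mean_parental_age(parents, 2))
```

In the toy data, individual `a` from the first cycle is re-selected only when generations overlap. Under phenotypic selection its criterion is never updated, whereas under genomic selection it would be re-estimated each cycle.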

RS-AY scenarios

For the RS-AY scenarios, with selection on an additive, year, and additive x year trait and multiple cohorts per cycle, a modification of the general breeding scheme of the Conventional Program described in Gaynor et al., 2017, was used [18]. As in the RS-A scenarios, 100 segregating sites per chromosome were assigned additive effects, and 50 sites per chromosome were genotyped by a simulated SNP-chip; all other sites were neutral and ungenotyped. To start each simulation replicate, 100 individuals were drawn from the founder population. Starting mean genetic value was 0, and genetic variance was 1. Supplementary File 4 contains the script used to start the RS-AY scenarios, and Supplementary File 5 contains a script to store the year effects. Phenotypes in subsequent stages were simulated using a custom R script according to the assumptions of a compound symmetry model, i.e. uniform genetic variances and covariances across environments (Supplementary File 4). Compound symmetry was assumed rather than the default AlphaSimR Finlay-Wilkinson model because compound symmetry allowed perhaps more intuitive tracking of year and additive x year effects. The Finlay-Wilkinson additive x year effect of a genotype is considered the product of the mean value of genotypes in a given year and a genotype-specific slope, where the genotype-specific slope is essentially another additive trait value. Therefore, error bias due to additive x year and year effects under the Finlay-Wilkinson model would be tracked as an overall genotype x year error bias. This is acceptable, but it is less clear to see the separate impacts of bias due to phenotyping in a good year and bias due to a genotype’s comparative advantage in a given year. Year effects were drawn from a normal distribution with mean 0 and variance 0.2. Additive x year effects for each site were drawn from a normal distribution with mean 0 and variance scaled to achieve the targeted total additive x year variance of 0.2. 
As such, the variance of the distribution from which the additive x year effects were drawn was the variance of the additive marker effects (\(\sigma_{a}^{2}\)) times the targeted additive x year variance (\(\sigma_{ay}^{2}\)) of 0.2 divided by the genetic variance of 1 (\(\sigma_{G}^{2}\)) in the base population, or \(\frac{\sigma_{a}^{2}\,\sigma_{ay}^{2}}{\sigma_{G}^{2}}\). Plot error effects were drawn from a normal distribution with mean 0 and variance scaled to achieve variable broad-sense heritabilities (H2) at each stage in the breeding cycle. Phenotypes were the sum of the additive, year, additive x year, and plot error effects.
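The variance scaling and the additive phenotype model described above can be written out as a short sketch. It is illustrative only and is not the custom R script in Supplementary File 4. The function names and the numeric marker-effect variance used in the example are hypothetical.

```python
import random


def axy_site_variance(additive_effect_variance,
                      target_axy_variance=0.2,
                      genetic_variance=1.0):
    """Variance of the per-site additive x year effect distribution.

    Implements sigma_a^2 * sigma_ay^2 / sigma_G^2 as described in the text.
    """
    return additive_effect_variance * target_axy_variance / genetic_variance


def phenotype(additive, year, additive_x_year, plot_error):
    """Phenotype as the sum of the four simulated components."""
    return additive + year + additive_x_year + plot_error


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical marker-effect variance; the real value depends on the
    # simulated founder population.
    site_var = axy_site_variance(additive_effect_variance=0.004)
    year_effect = rng.gauss(0.0, 0.2 ** 0.5)       # year variance of 0.2
    site_effect = rng.gauss(0.0, site_var ** 0.5)  # one site's additive x year effect
    print(site_var, year_effect, site_effect)
```

Because every genotype evaluated in the same year shares the same year effect, that term cancels when candidates from a single cohort are ranked against one another. It only matters when candidates phenotyped in different years are compared.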

For the RS-AY scenario, 30 selected parents entered the breeding pipeline at stage 1 and were crossed randomly into 100 biparental crosses with 97 progeny each. In stage 2, doubled haploid lines were produced from each of the year 1 progeny. In stage 3, the doubled haploid lines were phenotyped in headrows at initial H2 = 0.1, from which 500 individuals were advanced. In stage 4, the 500 individuals advanced from stage 3 were then phenotyped at initial H2 = 0.2 in a preliminary yield trial, and 50 individuals were advanced. In stage 5, the 50 individuals advanced from stage 4 entered an advanced yield trial with phenotyping at initial H2 = 0.5, from which 10 individuals were advanced. In stage 6, the 10 individuals advanced from stage 5 were phenotyped in an elite yield trial at initial H2 = 0.8, and all individuals were advanced. In stage 7, all individuals from stage 6 were reevaluated in the second year of the elite yield trial at initial H2 = 0.8. In stage 8, a single variety was chosen from the varietal means of the elite yield trials. In RS-AY scenarios with discrete generations, the 20 top-ranked individuals from stage 4 and all individuals from stage 5 of the most recent cycle were selected as parents (modified from the scheme in Gaynor et al., 2017, which implicitly allowed overlapping generations) [18]. In scenarios with overlapping generations, the 20 top-ranked individuals from stage 4 and the 10 top-ranked individuals from stage 5 were selected as parents from all cycles conducted in the breeding program. In the genomic selection scenarios, all records from stages 4–7 from all cycles conducted in the breeding program comprised the training set, regardless of whether generations were overlapping or discrete. Each stage was assumed to take one year. The breeding program was run for 40 years. The scripts to run each RS-AY scenario are located in Supplementary Files 6–9.

Responses and statistical analysis

For each parent selection scenario in RS-A, mean genetic value was always recorded in the current generation of individuals in a given cycle to examine the genetic trend due to selection. For RS-AY, mean genetic value was recorded in the current generation of parents in a given year. For both situations, selection error bias, mean genomic inbreeding, selection accuracy, and average parental age were also recorded in the selected parents of the current generation only. Selection error bias per cycle was the ratio of absolute error in the selected parents to absolute error in all selection candidates, where error was the deviation of the phenotype or GEBV from the true breeding value. For RS-AY, selection error bias was decomposed into component error due to year, additive x year, and plot error. The ratio of each absolute component error in the selected parents to absolute component error in all selection candidates was the selection error bias for the component. Mean genomic inbreeding per cycle was the average probability of allelic identity-by-descent between pairs of individuals, where identity-by-descent was tracked directly via the setTrackRec() function and pullIbdHaplo() function rather than estimated. Mean genomic inbreeding was estimated using all of the 1000 segregating sites per chromosome. Selection accuracy was Pearson’s correlation of GEBV or phenotype and the simulated true breeding value. By definition, selection accuracy was one for scenarios with selection on true breeding value. See Supplementary File 10, Table S1 and Supplementary File 10, Table S2 for the raw response variables from each simulation replicate and cycle (for RS-A) or year (for RS-AY).
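The two error-related responses defined above, selection error bias and selection accuracy, can be computed as in the following sketch. It is a Python illustration under stated assumptions, not the authors' implementation. "Absolute error" is taken here to mean the mean absolute deviation of the estimate from the true breeding value, and all function names are hypothetical.

```python
import math
import random


def mean_abs_error(estimates, true_values):
    """Mean absolute deviation of estimates from true breeding values."""
    return sum(abs(e - t) for e, t in zip(estimates, true_values)) / len(estimates)


def selection_error_bias(estimates, true_values, selected_idx):
    """Ratio of mean absolute error in selected parents to that in all candidates."""
    all_err = mean_abs_error(estimates, true_values)
    if all_err == 0.0:
        # Selection on true breeding value: every error is zero, so the
        # ratio is undefined.
        return float("nan")
    sel_err = mean_abs_error([estimates[i] for i in selected_idx],
                             [true_values[i] for i in selected_idx])
    return sel_err / all_err


def pearson(x, y):
    """Pearson correlation, used here as selection accuracy."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def top_indices(values, n):
    """Indices of the n largest values (truncation selection)."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:n]


if __name__ == "__main__":
    rng = random.Random(42)
    tbv = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    phen = [t + rng.gauss(0.0, 3.0) for t in tbv]  # error variance 9, h2 = 0.1
    selected = top_indices(phen, 400)
    print(selection_error_bias(phen, tbv, selected), pearson(phen, tbv))
```

A ratio above one indicates that the selected parents carry larger errors than the candidate pool as a whole, which is what truncation on a noisy criterion tends to produce.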

To test for differences in responses by scenario for RS-A, time points representing the short-term, medium-term, and long-term were chosen as cycle 5, cycle 25, and cycle 45 respectively. For RS-AY, differences in responses were only interrogated at the terminal year 40. The RS-A and RS-AY scenarios were considered separate experiments. The RS-AY experiment was conceived subsequently to RS-A in order to explore additional sources of selection error bias (i.e. year and genotype x year effects).

For each time point, and for all responses studied except mean parental age and year error bias, the following linear model was constructed with the R package nlme:

\(Y_{ij} = \mu + S_{i} + R_{j(i)} + \varepsilon_{ij}\).

\(Y_{ij}\) was the response of interest for the ith scenario and the jth simulation replicate, \(\mu\) was the grand mean, \(S_{i}\) was the fixed effect of the ith scenario, \(R_{j(i)}\) was the random effect of the jth simulation replicate nested in the ith scenario, distributed \(N(0, \sigma_{j(i)}^{2})\), and \(\varepsilon_{ij}\) was the random residual error, distributed \(N(0, \mathbf{R}\sigma_{\varepsilon}^{2})\), where \(\sigma_{\varepsilon}^{2}\) was the error variance and \(\mathbf{R}\) was a matrix whose diagonal was a weighting factor used to model unique error variances for each scenario [19]. Differences in means by scenario were tested by the anova.lme function in nlme [19]. Pre-planned contrasts of differences in responses by scenario were made at α = 0.05 with the pairs function in the R package emmeans for the discrete vs. overlapping variations of otherwise identical scenarios [20, 21]. Contrasts for OCS at Ne = 10 were not possible in the long term because the optimization of GEBV and mean genomic inbreeding ceased to solve around cycle 35.

Because mean parental age in the selected individuals was uniformly one with no variance in the RS-A discrete scenarios, Student’s t test was conducted with the t.test function in R to test whether mean parental age at each timepoint significantly differed from µ = 1 for each overlapping scenario at α = 0.05 subject to Bonferroni correction given the number of tests in the family. Because mean parental age for the RS-AY discrete scenarios was uniformly 3.67, Student’s t test was conducted as above to test whether mean parental age significantly differed from µ = 3.67. Similarly, because year error bias was uniformly one with no variance in the RS-AY discrete scenarios, Student’s t test was used to examine whether mean year error bias significantly differed from µ = 1 for the RS-AY overlapping scenarios at α = 0.05 subject to Bonferroni correction given the number of tests in the family. (Year error bias was 1 in the discrete scenarios because all candidates were evaluated in the same year and therefore had the same year value.)
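The one-sample comparison against a fixed reference value can be sketched as follows. This Python fragment only computes the t statistic, its degrees of freedom, and a Bonferroni-adjusted significance threshold. It does not reproduce R's t.test output, and obtaining a p-value would additionally require the t distribution, which is not in the Python standard library. The sample values and the number of tests are invented for illustration.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / standard_error, n - 1


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold for a family of n_tests tests."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical mean parental ages from five replicates of one
    # overlapping scenario, tested against the discrete value of 1.
    ages = [1.2, 1.4, 1.1, 1.3, 1.5]
    t_stat, df = one_sample_t(ages, mu0=1.0)
    print(round(t_stat, 3), df, bonferroni_alpha(0.05, n_tests=10))
```

A one-sample test is needed here because a reference group with zero variance, such as the discrete scenarios, cannot serve as a second sample in a two-sample comparison.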

Results

Genetic trends

In the RS-A case, significant differences in mean genetic value by scenario were observed (see Supplementary File 11, Table S3). In terms of mean genetic value, unreplicated discrete phenotypic selection outperformed unreplicated overlapping phenotypic selection in the long term for all heritabilities, and in the medium term if h2 = 0.1 or 0.5 (Fig. 3; see Supplementary File 12, Table S4). Performance of unreplicated discrete and overlapping phenotypic selection did not significantly differ in the short term (Fig. 3; see Supplementary File 12, Table S4). If phenotyping was replicated three times, then discrete phenotypic selection outperformed overlapping in the long and medium term if h2 = 0.1 or 0.5, and in the short term if h2 = 0.1 only (Fig. 3; see Supplementary File 12, Table S4). In contrast, if true breeding value was used for selection, then mean genetic value of discrete vs. overlapped selection did not differ significantly at any timepoint.

Discrete and overlapping generations appeared to perform similarly with genomic selection in the RS-A scenarios (Fig. 3; see Supplementary File 12, Table S4 and Supplementary File 13, Figure S1). The exceptions were that overlapping generations always outperformed discrete generations with OCS at Ne = 100 and h2 = 0.5 or 0.9 regardless of training set used, and in the long term discrete generations outperformed overlapping with OCS at Ne = 45 and h2 = 0.9 with training on the previous five generations (see Supplementary File 12, Table S4 and Supplementary File 13, Figure S1). Also, in the short term, overlapping generations outperformed discrete with OCS at Ne = 100 at h2 = 0.5 or 0.9 with training on the previous five generations as well as training on all generations (see Supplementary File 12, Table S4 and Supplementary File 13, Figure S1).

In the RS-AY case, significant differences in mean genetic value by scenario were observed at year 40 (see Supplementary File 11, Table S3). Discrete genomic selection outperformed overlapping genomic selection, and discrete phenotypic selection outperformed overlapping phenotypic selection (Fig. 4; see Supplementary File 12, Table S5).

Fig. 3

Mean genetic value for selected RS-A scenarios. Mean genetic value per cycle for the RS-A scenarios of phenotypic selection, thrice-replicated phenotypic selection, genomic truncation selection with all generations used in the training set (allGen truncation), and selection on true breeding value. Values are surrounded by the 95% confidence interval of the cycle mean

Fig. 4

Mean genetic value for RS-AY scenarios. Mean genetic value per cycle for the RS-AY scenarios of phenotypic selection and genomic selection surrounded by the 95% confidence interval of the cycle mean

Selection error bias

For the RS-A cases, significant differences in mean selection error bias by scenario were observed (see Supplementary File 11, Table S3). For unreplicated phenotypic selection, selection error bias was always higher in overlapping selection scenarios, except in the short- and medium-term for h2 = 0.9 (Fig. 5; see Supplementary File 12, Table S4). Notably, this pattern mirrors the observed trend in mean genetic value. If phenotyping was replicated three times, selection error bias remained higher in overlapping generations in the same scenarios as unreplicated phenotypic selection (Fig. 5; see Supplementary File 12, Table S4). With selection on true breeding value, by definition selection error bias did not differ between overlapping and discrete generations, as error for all candidates was zero (Fig. 5). For genomic truncation selection, selection error bias also did not differ between overlapping and discrete scenarios at any point if the training set was composed of all generations (Fig. 5; see Supplementary File 12, Table S4). However, if the training set was composed of the previous five generations, then selection error bias in overlapping scenarios was significantly higher than discrete in the long-term with genomic truncation selection (see Supplementary File 12, Table S4, and Supplementary File 14, Figure S2).

Fig. 5

Selection error bias for selected RS-A scenarios. Selection error bias per cycle for the RS-A scenarios of phenotypic selection, thrice-replicated phenotypic selection, genomic truncation selection with all generations used in the training set (allGen truncation), and selection on true breeding value. Values are surrounded by the 95% confidence interval of the cycle mean

For genomic OCS with the training set composed of all generations, discrete and overlapping selection error bias did not significantly differ except in the short and medium term if Ne = 100 and h2 = 0.5 or 0.9, in which case overlapping selection error bias was significantly higher (see Supplementary File 12, Table S4 and Supplementary File 14, Figure S2). If the training set was composed of the previous five generations, then in the short term selection error bias did not significantly differ except at Ne = 100 for h2 = 0.5 or 0.9, in which case overlapping selection had a higher selection error bias (see Supplementary File 12, Table S4 and Supplementary File 14, Figure S2). In the medium and long term with training on the previous five generations, discrete always had higher selection error bias than overlapping (see Supplementary File 12, Table S4 and Supplementary File 14, Figure S2).

For the RS-AY cases, significant differences in mean selection error bias by scenario were observed (see Supplementary File 11, Table S3). Discrete phenotypic selection had significantly lower selection error bias than overlapping phenotypic selection, but no significant difference was observed for discrete vs. overlapping genomic selection (Fig. 6; see Supplementary File 12, Table S5). Significant differences in additive x year error bias and plot error bias were also observed (Fig. 6; see Supplementary File 11, Table S3). Discrete phenotypic selection had significantly lower additive x year error bias than overlapping phenotypic selection, but no significant difference was observed for discrete vs. overlapping genomic selection (Fig. 6; see Supplementary File 12, Table S5). On the other hand, plot error bias was significantly lower for discrete vs. overlapping phenotypic selection and discrete vs. overlapping genomic selection (Fig. 6; see Supplementary File 12, Table S5). Year error bias significantly differed from 1 with overlapping phenotypic selection, but did not significantly differ from 1 with overlapping genomic selection (Fig. 6; see Supplementary File 13, Figure S1).

Fig. 6

Selection error bias for RS-AY scenarios. Selection error bias per cycle for the RS-AY scenarios of phenotypic selection and genomic selection surrounded by the 95% confidence interval of the cycle mean. Overall selection error bias is shown, as well as error bias due to year, additive x year, and plot error

Mean genomic inbreeding

Significant differences in mean genomic inbreeding by scenario were observed in the RS-A cases (see Supplementary File 11, Table S3). For unreplicated and thrice-replicated phenotypic selection, mean genomic inbreeding was significantly higher with discrete selection at h2 = 0.1 at all time points but did not significantly differ for other heritabilities (see Supplementary File 12, Table S4; Supplementary File 15, Figure S3; Supplementary File 16, Figure S4). Mean genomic inbreeding did not significantly differ with selection on true breeding value (see Supplementary File 12, Table S4 and Supplementary File 17, Figure S5). For genomic truncation selection with training on all generations, no significant differences in mean inbreeding were observed between discrete and overlapping scenarios except in the long term at h2 = 0.9, for which overlapping generations led to higher inbreeding than discrete (see Supplementary File 12, Table S4 and Supplementary File 18, Figure S6). With training on the previous five generations, overlapping truncation genomic selection led to higher inbreeding in the short term at h2 = 0.1 and in the medium term at h2 = 0.5 and 0.9, but with no significant differences in the long term (see Supplementary File 12, Table S4 and Supplementary File 19, Figure S7).

With genomic OCS, discrete selection sometimes led to higher inbreeding than overlapping selection despite optimization of the inbreeding rate. With training on all generations, this occurred for h2 = 0.1 in the medium term for Ne = 10 and the short and medium terms for Ne = 45, but did not occur for Ne = 100 (see Supplementary File 12, Table S4; Supplementary File 20, Figure S8; Supplementary File 21, Figure S9; Supplementary File 22, Figure S10). For h2 = 0.5, this occurred in the medium term for Ne = 10, and the medium and long term for Ne = 45 (see Supplementary File 12, Table S4; Supplementary File 20, Figure S8; Supplementary File 21, Figure S9). However, in the short and medium term at h2 = 0.5 with training on all generations, overlapping led to higher inbreeding than discrete at Ne = 100 (see Supplementary File 12, Table S4 and Supplementary File 22, Figure S10). For h2 = 0.9, discrete selection led to higher inbreeding in the short and medium term at Ne = 10, the medium term at Ne = 45, and the short term only at Ne = 100 (see Supplementary File 12, Table S4; Supplementary File 20, Figure S8; Supplementary File 21, Figure S9; Supplementary File 22, Figure S10).

With genomic OCS and training on the previous five generations, discrete selection led to higher rates of inbreeding in the medium and long term at h2 = 0.1 for all levels of Ne, and additionally in the short term for Ne = 45 (see Supplementary File 12, Table S4; Supplementary File 23, Figure S11; Supplementary File 24, Figure S12; Supplementary File 25, Figure S13). At h2 = 0.5, discrete selection again led to higher inbreeding in the short term if Ne = 45 and the medium term for Ne = 45 and 100 only (see Supplementary File 12, Table S4; Supplementary File 24, Figure S12; Supplementary File 25, Figure S13). At h2 = 0.9, discrete selection led to higher inbreeding rates in the short term for Ne = 10, lower inbreeding rates in the short term if Ne = 100, higher inbreeding rates in the medium and long term for Ne = 45, and higher inbreeding rates in the short and long term for Ne = 100 (see Supplementary File 12, Table S4; Supplementary File 23, Figure S11; Supplementary File 24, Figure S12; Supplementary File 25, Figure S13).

With RS-AY, significant differences in mean genomic inbreeding by scenario were also present at year 40 (see Supplementary File 11, Table S3). Discrete phenotypic selection led to significantly higher inbreeding than overlapping phenotypic selection, and discrete genomic selection also led to significantly higher inbreeding than overlapping genomic selection (see Supplementary File 12, Table S5 and Supplementary File 26, Figure S14).

Genetic variance

Significant differences in mean genetic variance by scenario were observed in RS-A (see Supplementary File 11, Table S3). For unreplicated phenotypic selection, significant differences in genetic variance in the current generation were only observed at h2 = 0.1 in the medium and long term, with overlapping selection maintaining higher genetic variance (see Supplementary File 12, Table S4 and Supplementary File 15, Figure S3). For replicated phenotypic selection, genetic variance was significantly lower with overlapping selection only in the long term at h2 = 0.1 (see Supplementary File 12, Table S4 and Supplementary File 16, Figure S4). No significant differences in genetic variance were observed for selection on true breeding value (see Supplementary File 12, Table S4 and Supplementary File 17, Figure S5). For genomic truncation selection, no significant differences in genetic variance were observed regardless of training set or heritability (see Supplementary File 12, Table S4; Supplementary File 18, Figure S6; Supplementary File 19, Figure S7).

For genomic OCS, no significant differences in genetic variance were observed if all generations were used in the training set (see Supplementary File 12, Table S4; Supplementary File 20, Figure S8; Supplementary File 21, Figure S9; Supplementary File 22, Figure S10). If the previous five generations were used in the training set, then at all heritabilities overlapping selection maintained greater genetic variance than discrete in the medium term if Ne = 100 only, while if Ne = 45 overlapping had higher genetic variance only if h2 = 0.5 or 0.9 (see Supplementary File 12, Table S4; Supplementary File 23, Figure S11; Supplementary File 24, Figure S12; Supplementary File 25, Figure S13). In the long term, overlapping selection maintained greater genetic variance if Ne = 45 at h2 = 0.1 or 0.9, and if Ne = 100 at all heritabilities (see Supplementary File 12, Table S4; Supplementary File 23, Figure S11; Supplementary File 24, Figure S12; Supplementary File 25, Figure S13).

For the RS-AY scenarios, significant differences in genetic variance were observed among scenarios (see Supplementary File 11, Table S3). Discrete genomic selection had significantly higher genetic variance than overlapping genomic selection, whereas discrete phenotypic selection led to significantly lower genetic variance than overlapping phenotypic selection (see Supplementary File 12, Table S5 and Supplementary File 26, Figure S14).

Selection accuracy

Significant differences in mean selection accuracy by scenario were observed in the RS-A cases (see Supplementary File 11, Table S3). Selection accuracy, as measured in the selected parents of the current generation per cycle, did not significantly differ between overlapping and discrete generations with replicated or unreplicated phenotypic selection (see Supplementary File 12, Table S4; Supplementary File 15, Figure S3; Supplementary File 16, Figure S4). For selection on true breeding value, selection accuracy was by definition 1 for both discrete and overlapping generations. For genomic truncation selection, no differences in accuracy between overlapping and discrete generations were observed regardless of training set (see Supplementary File 12, Table S4; Supplementary File 18, Figure S6; Supplementary File 19, Figure S7).

In genomic OCS, with the training set composed of all generations, selection accuracy was higher for overlapping generations in the short term if Ne = 100 and h2 = 0.5 (see Supplementary File 12, Table S4 and Supplementary File 22, Figure S10). Overlapping generations also had higher accuracies in the medium term if h2 = 0.5 and Ne = 45 (see Supplementary File 12, Table S4 and Supplementary File 21, Figure S9). No significant differences were observed in the long term for OCS with training on all generations (see Supplementary File 20, Figure S8; Supplementary File 21, Figure S9; Supplementary File 22, Figure S10). In genomic OCS with training on the previous five generations only, overlapping selection had higher selection accuracy in the short term only if h2 = 0.5 or 0.9 and Ne = 100 (see Supplementary File 12, Table S4 and Supplementary File 25, Figure S13). In the medium term, overlapping selection had higher accuracies at all levels of Ne for h2 = 0.1, but only at Ne = 45 or 100 for h2 = 0.5 or 0.9 (see Supplementary File 12, Table S4; Supplementary File 23, Figure S11; Supplementary File 24, Figure S12; Supplementary File 25, Figure S13). In the long term, overlapping selection had higher accuracies at all levels of h2 and Ne observed with OCS and training on the previous five generations (see Supplementary File 12, Table S4; Supplementary File 23, Figure S11; Supplementary File 24, Figure S12; Supplementary File 25, Figure S13).

In the RS-AY cases, significant differences in mean selection accuracy were observed by scenario (see Supplementary File 11, Table S3). Discrete phenotypic selection produced higher selection accuracy than overlapping phenotypic selection, and discrete genomic selection produced higher selection accuracy than overlapping genomic selection (see Supplementary File 12, Table S5 and Supplementary File 26, Figure S14).

Mean parental age

By definition, the age of the selected parents under discrete generations was always one in the RS-A scenarios. Both thrice-replicated and unreplicated overlapping phenotypic truncation selection always resulted in mean parental age significantly greater than 1 for overlapping relative to discrete generations (see Supplementary File 15, Figure S3; Supplementary File 16, Figure S4; Supplementary File 28, Table S8). Interestingly, selection on true breeding value always resulted in mean parental age significantly greater than 1 with overlapping generations in the medium and long term (see Supplementary File 17, Figure S5 and Supplementary File 28, Table S8). With genomic truncation selection and training on all generations, mean parental age was always higher with overlapping generations (see Supplementary File 18, Figure S6 and Supplementary File 28, Table S8). With truncation selection and training on the previous five generations, overlapping generations had significantly higher mean parental age except in the medium term at h2 = 0.1 (see Supplementary File 19, Figure S7 and Supplementary File 28, Table S8). With genomic OCS and training on all generations, mean parental age in overlapping scenarios was not significantly different from discrete at Ne = 10 in the medium term only, but was significantly higher in the short and long terms (see Supplementary File 20, Figure S8; Supplementary File 21, Figure S9; Supplementary File 22, Figure S10; Supplementary File 28, Table S8). Mean parental age was always significantly higher than discrete for Ne = 45 and 100 with genomic OCS and training on all generations (see Supplementary File 21, Figure S9; Supplementary File 22, Figure S10; Supplementary File 28, Table S8). With genomic OCS and training on the previous five generations, mean parental age did not significantly differ between overlapping and discrete generations if Ne = 10 in the short term (see Supplementary File 23, Figure S11 and Supplementary File 28, Table S8). However, at all other timepoints and levels of Ne overlapping selection led to significantly higher mean parental age than discrete (see Supplementary File 23, Figure S11; Supplementary File 24, Figure S12; Supplementary File 25, Figure S13; Supplementary File 28, Table S8).

In the RS-AY scenarios, mean parental age was 3.67 years under discrete selection. For the overlapping scenarios, mean parental age was significantly greater than 3.67 years with both phenotypic and genomic selection (see Supplementary File 26, Figure S14 and Supplementary File 27, Table S7).

Discussion

The possibility of allowing generations to overlap in recurrent selection is not often considered. Although recycling a preferred parent across generations is common in applied breeding programs, nonpreferred individuals are generally discarded permanently. Here, the underlying theoretical basis for practicing discrete as opposed to overlapping recurrent phenotypic selection is demonstrated. Mean magnitude of error in selected individuals is larger than mean magnitude of error in the overall population, creating selection error bias. Over breeding cycles, selection error bias causes the magnitude of selection error to increase in phenotypically selected populations with overlapping generations. This propagation of selection error results in decreased genetic gain, whereas with discrete phenotypic selection the population recovers each cycle because the magnitude of the deviation of observed phenotypic value from true breeding value remains random in the selected individuals. Maintaining discrete generations in phenotypic selection prevents making the “same old mistakes” of selecting individuals erroneously believed to be exceptional repeatedly across cycles.
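The mechanism described above can be explored with a toy simulation. The following is a minimal sketch, not the AlphaSimR pipeline used in this study: it assumes an infinitesimal-style model with constant Mendelian sampling variance, random mating among selected parents, a single unreplicated phenotype per individual, and phenotypic variance of 1 in the founders. The function name `recurrent_selection` and all parameter defaults are hypothetical and chosen only for illustration.

```python
import random
import statistics


def recurrent_selection(overlapping, n_cycles=20, n_cand=200, n_par=20,
                        h2=0.1, seed=1):
    """Toy recurrent phenotypic truncation selection.

    Returns the mean true breeding value of each newly created cohort.
    """
    rng = random.Random(seed)
    sd_g = h2 ** 0.5                # founder breeding value SD
    sd_e = (1.0 - h2) ** 0.5        # error SD; each individual is phenotyped once
    sd_mend = (h2 / 2.0) ** 0.5     # Mendelian sampling SD (assumed constant)

    def new_individual(g):
        # Store (phenotype, true breeding value); the phenotype is never updated.
        return (g + rng.gauss(0.0, sd_e), g)

    cohort = [new_individual(rng.gauss(0.0, sd_g)) for _ in range(n_cand)]
    pool = list(cohort)             # every candidate ever created
    cohort_means = []
    for _ in range(n_cycles):
        # Overlapping: all past candidates compete; discrete: current cohort only.
        candidates = pool if overlapping else cohort
        parents = sorted(candidates, key=lambda c: c[0], reverse=True)[:n_par]
        cohort = []
        for _ in range(n_cand):
            p1, p2 = rng.sample(parents, 2)
            g = 0.5 * (p1[1] + p2[1]) + rng.gauss(0.0, sd_mend)
            cohort.append(new_individual(g))
        pool.extend(cohort)
        cohort_means.append(statistics.fmean(g for _, g in cohort))
    return cohort_means


discrete = recurrent_selection(overlapping=False)
overlap = recurrent_selection(overlapping=True)
print(f"final cohort mean, discrete: {discrete[-1]:.2f}, overlapping: {overlap[-1]:.2f}")
```

Because old candidates keep their original phenotype in the pool, any individual selected on an inflated phenotype remains eligible in later cycles, which is the situation the paragraph above describes. Results from a single seed of such a toy model should not be read as a replication of the reported simulations; averaging over many seeds and heritabilities would be needed for any comparison.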

Notably, at higher heritabilities, the propagation of error takes more cycles to affect gain because the phenotypes of selected individuals deviate less from their true breeding value compared to lower heritabilities. Discrete generations still outperformed overlapping generations if phenotypic observations were replicated three times, though the relative outperformance was slightly less than without replication as phenotypic value deviated less from true breeding value. However, with selection on true breeding value, no differences in mean genetic value were observed between discrete and overlapping generations, as is expected in absence of selection error.
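The reduced deviation under replication can be made explicit with the standard entry-mean heritability. As a sketch, assuming that the stated h2 refers to a single unreplicated observation, that errors are independent across the r replicates, and writing \sigma_g^2 and \sigma_e^2 for the genetic and error variances (symbols introduced here for illustration):

```latex
h^2_r = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2 / r},
\qquad
\text{e.g. } h^2 = 0.1,\ r = 3:\quad
h^2_r = \frac{0.1}{0.1 + 0.9/3} = 0.25 .
```

Under these assumptions, three replicates raise the heritability of the selection criterion, yet most of the variation in the mean phenotype at h2 = 0.1 is still error, which is consistent with discrete generations continuing to outperform overlapping generations with replication.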

The propagation of error under overlapping phenotypic selection can be thought of as failure to observe regression to the mean when individuals are not adequately evaluated; phenotypes at the tails of a distribution, far from the mean, are on average more likely to have larger magnitudes of error (Fig. 7). In breeding for population improvement, individuals in the upper tail of the phenotypic distribution, and outliers beyond the upper tail of the distribution, are inherently of interest. Many phenotypes are in the tails of the distribution due to error. In selection from discrete generations the total number of outliers is small, whereas in selection from overlapping generations the total number of outliers grows as breeding cycles are completed and the total number of selection candidates grows. Thus, the number of highly erroneous phenotypes selected as parents is limited under discrete selection, and this restriction causes discrete phenotypic selection to outperform overlapping phenotypic selection, particularly at low heritabilities.

Fig. 7

Selection error bias illustration. Phenotypic values, true breeding values, and errors of selected and unselected individual candidates at h2 = 0.1 in the first cycle of overlapping phenotypic selection for the RS-A pipeline. The magnitude of error is greater at the tails of the phenotypic values, including the upper tail from which individuals are selected
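The pattern in Fig. 7 can be reproduced in a few lines. This is a minimal sketch under assumed conditions, not the code used to produce the figure: true breeding values and errors are drawn as independent normal deviates with variances h2 and 1 - h2, and the top 10% of phenotypes are selected. The function name `simulate_selection_error` and its defaults are hypothetical.

```python
import random
import statistics


def simulate_selection_error(n=10_000, h2=0.1, frac_selected=0.1, seed=1):
    """Return (mean |error| over all candidates, mean |error| over selected)."""
    rng = random.Random(seed)
    sd_g = h2 ** 0.5          # true breeding value SD (phenotypic variance = 1)
    sd_e = (1.0 - h2) ** 0.5  # error SD
    candidates = []
    for _ in range(n):
        g = rng.gauss(0.0, sd_g)
        e = rng.gauss(0.0, sd_e)
        candidates.append((g + e, g, e))  # (phenotype, breeding value, error)

    # Truncation selection on phenotype: keep the upper tail.
    candidates.sort(key=lambda c: c[0], reverse=True)
    selected = candidates[: int(n * frac_selected)]

    mean_abs_all = statistics.fmean(abs(e) for _, _, e in candidates)
    mean_abs_sel = statistics.fmean(abs(e) for _, _, e in selected)
    return mean_abs_all, mean_abs_sel


mean_abs_all, mean_abs_sel = simulate_selection_error()
print(f"mean |error|, all candidates: {mean_abs_all:.2f}; selected: {mean_abs_sel:.2f}")
```

Raising `h2` in this sketch narrows the gap between the two means, in line with the observation that error propagation takes longer to affect gain at higher heritabilities.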

To the authors’ knowledge, the effect of overlapping vs. discrete generations in genomic truncation selection has not been previously evaluated. Mean genetic value does not significantly differ between discrete and overlapping genomic truncation selection, in contrast to phenotypic selection. Addition of new data to the model with each generation of genomic selection eliminates the problem of error propagation observed in phenotypic selection, as estimates of breeding value are improved by replicated observations of allele-phenotype combinations (which is synonymous with observations of more relatives). Though we hypothesized that overlapping generations might lead to more genetic gain than discrete as accuracy of GEBVs increased in older individuals with phenotyping of progeny, this was not the case due to the positive genetic trend from selection [22]. In other words, older individuals tended to have lower true breeding values than younger individuals in the presence of effective selection, so any increase in accuracy did not result in selection of older individuals due to their truly lower values. Generally, the mean parental age did not substantially increase in overlapping genomic truncation selection compared to discrete (although the small increase observed was significant), indicating that parents with the best GEBVs were usually from the current generation or the most recent past generations.

It is worth clarifying that this study does not directly explore the optimal generation interval or optimal introgression of older material. Although we observed that re-use of old, inadequately evaluated material as parents decreased genetic gain, this does not imply that use of old materials always decreases genetic gain (although this would certainly be expected in most cases). For example, with use of true breeding values and allowance of overlapping generations, the observed mean parental age was slightly greater than one. This means that some parents from recently past breeding cycles were truly competitive with candidates from the current breeding cycle. However, our study does not fully explore variables (e.g. selection intensity and genetic variance) which affect this observation. Our findings are more indicative of the consequences of inadequate evaluation than the consequences of selecting old material, because if the old material in our study had been adequately evaluated, it would never have been selected.

Because we observed in previous simulations that overlapping truncation selection underperformed discrete selection at high heritabilities in the long term due to inbreeding, we tested whether controlling genomic inbreeding by OCS led to greater mean genetic values in overlapping than discrete OCS scenarios. It is also well-established that genomic selection requires genomic control of inbreeding for maximal long-term gain, and at times genomic control of inbreeding can increase short-term gain relative to truncation selection [23,24,25,26]. However, we did not generally observe that overlapping selection outperformed discrete selection in OCS scenarios except at relatively high effective population size and high heritability. Interestingly, there is an explicit penalty to use of individuals from past generations in OCS due not to their genetic values but rather their addition to the rate of inbreeding [25]. If overlapping generations are allowed, control of inbreeding generally results from increasing the number of parents selected and not from increasing the generation interval in canonical OCS [22]. Thus, in contrast to genomic truncation selection, the relatively similar performance of overlapping and discrete OCS is likely due to the control of inbreeding as well as balance of gain per cycle and increased selection accuracy per cycle. With OCS at high Ne and h2 = 0.5 or 0.9, overlapping generations always had higher mean genetic values than discrete. This may indicate that overlapping generations allow more flexibility than discrete in balancing increases in inbreeding and genetic gain when inbreeding was more strictly constrained, as more individuals with more combinations of genetic value and relatedness were available to meet the constraints imposed. This is in agreement with the observation of Villanueva et al. (2000) that the optimal generation interval was higher with more stringent restrictions on inbreeding, as well as use of fewer parents [22].

As demonstrated in the RS-AY scenarios, error can propagate from any source with overlapping phenotypic selection: year error, genotype x year interaction error, or random plot error. Because we simulated greater plot error variance than year or genotype x year variance in stages from which parents were selected, we observed relatively more selection error bias due to plot error than other sources with overlapping phenotypic selection. Increasing the variance of the year or genotype x year values would likely increase their relative contributions to overall selection error; in applied breeding programs, the relative contribution of each source of error depends on the program. Additionally, we expect that selection error bias is not specific to plant breeding and can occur in other cyclical systems in which repeated selection occurs in the presence of random observational error.

The propagation of error was not restricted by movement of cohorts through advancement stages alone in the RS-AY scenario; restriction of propagation of error was accomplished by use of a statistical method to estimate breeding value. In the RS-AY scenarios, we only tested use of RR-BLUP to estimate breeding value, which is equivalent to genomic best linear unbiased prediction. We expect that methods which use other relationship matrices for the random genotypic effect, such as pedigree BLUP, should also restrict propagation of error. Even BLUP with an identity relationship matrix for the random genotypic effects and unreplicated phenotypes should restrict propagation of error, since unreplicated phenotypes would be shrunken to the mean by the heritability. This highlights the general utility of BLUPs in preventing selection error bias.
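The shrinkage referred to above can be written out for the simplest case. As a sketch, assuming the model y_i = \mu + g_i + e_i with independent g_i \sim N(0, \sigma_g^2) and e_i \sim N(0, \sigma_e^2), one observation per individual, and known variance components (notation introduced here for illustration, not taken from the simulation code):

```latex
\hat{g}_i = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}\,(y_i - \hat{\mu})
          = h^2\,(y_i - \hat{\mu}),
\qquad \hat{\mu} = \bar{y} .
```

At h2 = 0.1, for example, a phenotype 2 units above the mean gives a predicted breeding value only 0.2 units above the mean under this model.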

To build on the conclusions of this study, it would be useful to test relative performance of overlapping and discrete generations under different genomic selection schemes, such as the modified reciprocal recurrent selection practiced in commercial hybrid breeding programs. Testing non-additive genetic architectures may also be relevant. Though speculative, it would also be interesting to test discrete and overlapping generations with multi-trait genomic selection. We hypothesize that in cases where multiple objectives are to be optimized (e.g. multiple phenotypic traits with different trait architectures), overlapping generations may provide more combinations of traits within genomic selection candidates and increase multi-trait gain.

Conclusion

Based on the trends observed, generations should be kept discrete under recurrent mass phenotypic selection to avoid decreased genetic gain due to selection error bias. With genomic truncation selection, we observed no advantage to allowing overlapping generations under the assumptions used, though with genomic OCS it appeared that overlapping generations allowed more effective control of inbreeding than discrete generations at high effective population sizes with low targeted inbreeding rates.

Data Availability

All data generated or analysed during this study are included in this published article and its supplementary information files.

References

  1. Hallauer AR, Darrah LL. Compendium of recurrent selection methods and their application. CRC Crit Rev Plant Sci. 1985;3:1–33.

  2. Harlan JR, De Wet JMJ, Price EG. Comparative evolution of cereals. Evolution. 1973;27:311–25.

  3. Duvick DN. Plant breeding, an evolutionary concept. Crop Sci. 1996;36:539–48.

  4. Lewers KS, Palmer RG. Recurrent selection in soybean. Plant Breed Rev. 2010:275–313.

  5. Rutkoski JE. A practical guide to genetic gain. Adv Agron. 2019;157:217–49.

  6. Zhang L, Richards RA, Condon AG, Liu DC, Rebetzke GJ. Recurrent selection for wider seedling leaves increases early biomass and leaf area in wheat (Triticum aestivum L.). J Exp Bot. 2015;66(5):1215–26.

  7. Ceballos H, Morante N, Sanchez T, Ortiz D, Aragon I, Chávez AL, … Dufour D. Rapid cycling recurrent selection for increased carotenoids content in cassava roots. Crop Sci. 2013;53(6):2342–51.

  8. Eberhart SA. Factors affecting efficiencies of breeding methods. African Soils. 1970;15:655–680.

  9. Dudley JW. From means to QTL: The Illinois long-term selection experiment as a case study in quantitative genetics. Crop Sci. 2007;47:1–20.

  10. Lorenz AJ, Chao S, Asoro FG, Heffner EL, Hayashi T, Iwata H, et al. Genomic selection in plant breeding: knowledge and prospects. Adv Agron. 2011;110:77–123.

  11. Goddard ME, Hayes BJ. Genomic selection. J Anim Breed Genet. 2007;124:323–330.

  12. Heffner EL, Sorrells ME, Jannink JL. Genomic selection for crop improvement. Crop Sci. 2009;49:1–12.

  13. Jannink JL, Lorenz AJ, Iwata H. Genomic selection in plant breeding: from theory to practice. Brief Funct Genomics. 2010;9:166–177.

  14. Heslot N, Jannink JL, Sorrells ME. Perspectives for genomic selection applications and research in plants. Crop Sci. 2015;55:1–12.

  15. Gaynor RC, Gorjanc G, Hickey JM. AlphaSimR: an R package for breeding program simulations. G3. 2021;11(2):jkaa017.

  16. Gaynor RC. Traits in AlphaSimR. 2021. https://cran.r-project.org/web/packages/AlphaSimR/vignettes/traits.pdf.

  17. Wellmann R. Optimum contribution selection for animal breeding and conservation: the R package optiSel. BMC Bioinformatics. 2019;20:1–13.

  18. Gaynor RC, Gorjanc G, Bentley AR, Ober ES, Howell P, Jackson R, et al. A two-part strategy for using genomic selection to develop inbred lines. Crop Sci. 2017;57:2372–86.

  19. Pinheiro J, Bates D, DebRoy S, Sarkar D. nlme: linear and nonlinear mixed effects models. R Core Team. R package version 3.1–131. 2017.

  20. Lenth R, Singmann H, Love J, Buerkner P, Herve M. Emmeans: Estimated marginal means, aka least-squares means. R Core Team R package version. 2018;1(1):3.

  21. Hothorn T, Bretz F, Westfall P, Heiberger RM, Schuetzenmeister A, Scheibe S, et al. Package ‘multcomp’. Simultaneous inference in general parametric models. Project for Statistical Computing, Vienna, Austria. 2016.

  22. Villanueva B, Bijma P, Woolliams JA. Optimal mass selection policies for schemes with overlapping generations and restricted inbreeding. Genet Sel Evol. 2000;32:1–17.

  23. Meuwissen THE. Maximizing the response of selection with a predefined rate of inbreeding. J Anim Sci. 1997;75:934–40.

  24. Jannink JL. Dynamics of long-term genomic selection. Genet Sel Evol. 2010;42:1–35.

  25. Meuwissen THE, Sonesson AK. Maximizing the response of selection with a predefined rate of inbreeding: overlapping generations. J Anim Sci. 1998;76:2575–83.

  26. Woolliams JA, Berg P, Dagnachew BS, Meuwissen THE. Genetic contributions and their optimization. J Anim Breed Genet. 2015;132:89–99.

Acknowledgements

We thank Anthony J. Studer for advising and supporting ML throughout the course of this work. We thank Stephen P. Moose, Daniel Davidson, and David Slater for providing computational resources which enabled the study. We thank R. Chris Gaynor for developing AlphaSimR and providing code to model compound symmetry.

Funding

This work was supported by the Jonathan Baldwin Turner fellowship of the University of Illinois College of ACES and the Crop Sciences Department.

Author information

Contributions

ML executed the study, wrote code for analysis, interpreted data, and drafted the manuscript. JR discovered selection error bias, wrote code for analysis, interpreted data, and edited the manuscript. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Jessica E. Rutkoski.

Ethics declarations

Ethics approval

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Labroo, M.R., Rutkoski, J.E. New cycle, same old mistakes? Overlapping vs. discrete generations in long-term recurrent selection. BMC Genomics 23, 736 (2022). https://doi.org/10.1186/s12864-022-08929-3

  • DOI: https://doi.org/10.1186/s12864-022-08929-3