 Research article
 Open Access
 Published:
Bias in estimates of variance components in populations undergoing genomic selection: a simulation study
BMC Genomics volume 20, Article number: 956 (2019)
Abstract
Background
After the extensive implementation of genomic selection (GS), the choice of the statistical model and data used to estimate variance components (VCs) remains unclear. A primary concern is that VCs estimated from a traditional pedigreebased animal model (PAM) will be biased due to ignoring the impact of GS. The objectives of this study were to examine the effects of GS on estimates of VC in the analysis of different sets of phenotypes and to investigate VC estimation using different methods. Data were simulated to resemble the Danish Jersey population. The simulation included three phases: (1) a historical phase; (2) 20 years of conventional breeding; and (3) 15 years of GS. The three scenarios based on different sets of phenotypes for VC estimation were as follows: (1) Pheno_{1}: phenotypes from only the conventional phase (1–20 years); (2) Pheno_{1 + 2}: phenotypes from both the conventional phase and GS phase (1–35 years); (3) Pheno_{2}: phenotypes from only the GS phase (21–35 years). Singlestep genomic BLUP (ssGBLUP), a singlestep Bayesian regression model (ssBR), and PAM were applied. Two base populations were defined: the first was the founder population referred to by the pedigreebased relationship (Pbase); the second was the base population referred to by the current genotyped population (Gbase).
Results
In general, both the ssGBLUP and ssBR models with all the phenotypic and genotypic information (Pheno_{1 + 2}) yielded biased estimates of additive genetic variance compared to the Pbase model. When the phenotypes from the conventional breeding phase were excluded (Pheno_{2}), PAM led to underestimation of the genetic variance of Pbase. Compared to the VCs of Gbase, when phenotypes from the conventional breeding phase (Pheno_{2}) were ignored, the ssBR model yielded unbiased estimates of the total genetic variance and markerbased genetic variance, whereas the residual variance was overestimated.
Conclusions
The results show that neither of the singlestep models (ssGBLUP and ssBR) can precisely estimate the VCs for populations undergoing GS. Overall, the best solution for obtaining unbiased estimates of VCs is to use PAM with phenotypes from the conventional phase or phenotypes from both the conventional and GS phases.
Background
In animal breeding, the prediction of breeding values requires accurate and unbiased estimates of variance components (VCs). There are several approaches available for VC estimation depending on the practical conditions (for instance, population structure) [1]. Among these methods, restricted maximum likelihood (REML) [2] is a wellknown method that has been widely applied under animal models to obtain genetic parameters for traits of interest [3,4,5,6,7,8]. Alternatively, Bayesian methods with Markov chain MonteCarlo (MCMC) procedures such as Gibbs sampling can be adopted to obtain marginal inferences for VCs [9,10,11,12].
Before genomic information was available, Sorensen and Kennedy [13] showed that unbiased estimation of VCs in a base population could be obtained by an animal model when all data leading to selection decisions and complete relationships were incorporated, while ignoring data from selected ancestors led to biased estimation due to not accounting for gametic disequilibrium. Moreover, in small populations with few generations, an animal model with pedigree information can account for bias due to inbreeding and the Bulmer effect [14]. However, these results were obtained assuming the infinitesimal model. To date, although there are currently several genomic prediction models available, the choice of statistical model and which data should be used to estimate VCs in the genomics era remain unclear.
The use of genomic selection (GS) has greatly enhanced genetic gains due to improved selection accuracy and shortened generation intervals (e.g., dairy cattle breeding); this has led to faster changes in gene frequencies compared to traditional phenotype and pedigreebased genetic improvement programmes. In such situations, the extra selection acting on genomic information increases the challenges of obtaining unbiased VC estimates. Concerns have been raised about biased VC estimates caused by ignoring genomic information and selective genotyping [15]. Hence, it can be expected that after performing GS for several generations, the estimates of VC from a pedigreebased animal model (PAM) would be biased due to not accounting for the impact of GS.
Singlestep genomic BLUP (ssGBLUP) derived by [16,17,18] has provided a way to predict breeding values using phenotype, pedigree and genomic information simultaneously for both genotyped and nongenotyped individuals via a combined relationship matrix (H). Thus, in contrast to typical genomic BLUP (GBLUP) [19, 20] using phenotypes only from genotyped individuals, a large number of historical phenotypes from nongenotyped individuals can also be used for analyses. Alternatively, a singlestep Bayesian regression model (ssBR) was proposed by [21, 22]. In contrast to ssGBLUP, in which markers for nongenotyped individuals are implicitly imputed, ssBR requires the explicit imputation of the markers for nongenotyped individuals, followed by fitting of the marker effects in the model. Previous studies have reported that ssGBLUP and the original ssBR model perform equally in terms of accuracy of prediction when assuming that VCs are known [23, 24]. However, an attractive feature of ssBR that has been neglected is that ssBR offers separate estimation of marker variance (\( {\sigma}_{\alpha}^2 \)) and total genetic variance (\( {\sigma}_g^2 \), denoted as \( {\sigma}_{\varepsilon}^2 \) in ssBR). Therefore, ssBR could be employed as a variance component model for VC estimation that includes two additive genetic components.
The compatibility between pedigreebased relationships and markerbased relationships is a crucial factor for ensuring unbiased estimation when using singlestep methods [19, 25, 26]. This concern is due to the different base populations to which these two relationships refer. More specifically, the pedigreebased relationship refers to a founder population in which individuals are assumed to be unrelated (denoted as Pbase hereafter). The base population of markerbased relationships can be defined as the population from which the allele frequencies were obtained to compute the relationships (denoted as Gbase hereafter) [27, 28]. Therefore, the VCs estimated using singlestep methods with phenotypes from a population undergoing GS may not be directly comparable with the VCs estimated using pedigreebased methods [29]. The objectives of this study were (1) to examine the effects of genomic selection on estimates of VC based on the scenario of choosing phenotypes from different phases of breeding programme and (2) to investigate the estimation of VCs using different methods in a population undergoing GS.
Results
To determine the effects of GS on the estimation of VCs, we analysed data from three scenarios reflecting no application of GS (Pheno_{1}), application of GS (Pheno_{1 + 2}), and a lack of information from the previous conventional breeding scheme (Pheno_{2}). Table 2 presents the means and standard deviations (SDs) of the estimated VCs and heritabilities over replicates. In general, the use of singlestep methods with all the phenotypes and genotypes (Pheno_{1 + 2}) yielded biased estimates of the total genetic variance of Pbase. Using PAM with all the phenotypes (Pheno_{1 + 2}) produced unbiased VC estimates and heritability of Pbase. When the phenotypes from the conventional breeding phase were excluded (Pheno_{2}), PAM led to the underestimation of genetic variance (P < 0.01) of Pbase. As expected, we obtained unbiased VC estimates and heritability when using PAM with phenotypes only from the conventional phase (Pheno_{1}).
For ssGBLUP, in general, the estimates of genetic variance were all biased (P < 0.005) across all the scenarios (Table 2). In contrast to the VCs from Pbase, when using phenotypes only from the conventional phase (Pheno_{1}), the genetic variance was significantly underestimated, and the residual variance was significantly overestimated. Furthermore, when including phenotypes from the GS phase in the model (Pheno_{1 + 2}), the genetic variance was significantly underestimated, although unbiased estimated residual variance was observed. In contrast to the VCs from Gbase, when ignoring data from the conventional breeding phase (Pheno_{2}), ssGBLUP significantly overestimated genetic variance, although an unbiased estimate of residual variance was obtained.
For ssBR, the convergence of the Gibbs sampler was assessed by estimating Monte Carlo error (MCE) (via batch means). The MCEs for the estimates of the total genetic variance and residual variance were at the level of 10^{− 3}, and for the estimates of marker variances, they were at the level of 10^{− 6}. Two estimated genetic variances are reported (Table 2). In contrast to VCs from Pbase, when using phenotypes only from the conventional phase (Pheno_{1}), unbiased total genetic variance and residual variance were obtained. This led to an unbiased estimate of heritability. However, the markerbased genetic variance was significantly overestimated (P < 0.001) compared to the variance in Gbase, resulting in a biased estimate of heritability. Conversely, when including phenotypes from the GS phase in the model (Pheno_{1 + 2}), the markerbased genetic variance was unbiased, but the total genetic variance was significantly underestimated (P < 0.001). In contrast to the VCs from Gbase, when ignoring data from the previous conventional breeding phase (Pheno_{2}), ssBR yielded unbiased estimates of the total genetic variance and markerbased genetic variance, whereas the residual variance was overestimated.
Discussion
In this study, we addressed the effects of GS on estimates of VCs. Two singlestep methods (ssGBLUP and ssBR), together with the traditional pedigreebased animal model (PAM), were used and compared based on simulated datasets that mimic dairy cattle populations. The first question addressed in this study was aimed at determining the impact of the choice of phenotypes from different phases of a breeding program on the estimation of VCs. The results showed that both the ssGBLUP and ssBR models led to biased VC estimates across all scenarios.
Selection of data to be included in the estimation of genetic variance
We showed that PAM yielded unbiased estimates of VC when including the phenotypes from the GS phase (Pheno_{1 + 2}). We also demonstrated that when using phenotypes only from the conventional selection phase (Pheno_{1}), PAM produced unbiased VC estimates. The scenarios of Pheno_{2}, which reflects current breeding programmes using GS, led to biased estimates of VC by using PAM. This was mainly caused by ignoring the information from the selection decisions. Another possible cause of this bias might be explained by the negative LD between QTLs across the genome [30, 31]. In particular, this negative LD is stronger in a population selected according to GS information than in a population selected based on only the pedigree relationships [the mean (SD) variance of TBV for animals from generation 35 decreased to 2.46 (0.04)].
Reductions in genotyping costs have contributed to the comprehensive implementation of GS in many genetic selection programmes. This offers the opportunity to select for new traits that are difficult to measure or not yet among the breeding goals. In this situation, a new phenotyping strategy might be used, or new phenotypes may be more likely to be collected from populations under intense GS. In the present study, based on the scenario of Pheno_{2} with PAM, our results confirmed the reduction of genetic variance due to ignoring the information from the conventional breeding phase; i.e., there are no phenotypes to account for the selection conducted in the previous period, and previous selection cannot be properly handled in the current model. Therefore, the use of PAM including only the phenotypes from the GS phase (Pheno_{2}) resulted in biased estimates of VC.
Changed base population
With respect to the genomic relationship, an arbitrary base population may exist close to the current population undergoing GS. It can be expected that using such a base population (more recent) would lead to smaller estimates of genetic variance than using Pbase. Therefore, in this study, the Gbase corresponding to the current GS phase was defined separately from Pbase. Consequently, it may be improper to compare VCs estimated in the GS phase (e.g., Pheno2) with the VCs in Pbase, which is generally referred to by the pedigree. This concern was initially raised by Powell et al. [25]; that is, VCs estimated using genetic markers can be erroneous since an inconsistent base population is defined compared to the base population with common founders. A similar study was conducted by Veerkamp et al. [32] based on a small dataset from Holsteins, where VCs for milk yield, dry matter intake, and body weight were estimated with PAM, GBLUP, and ssGBLUP. Their results showed that ssGBLUP produced the most precise estimates of VCs; however, they questioned whether the VCs obtained using genomic information might not be comparable with the VCs obtained using only pedigree information since these relationship matrices might refer to different base populations. Our results confirmed that when phenotypes only from the GS phase (Pheno_{2}) were used, both ssGBLUP and ssBR yielded smaller genetic variance estimates compared with the VCs in Pbase (Table 2).
In the present simulation study, we directly used the allele frequencies calculated from Pbase to avoid the compatibility issue between the G and A_{22} matrices in ssGBLUP [26, 27, 33]. In practice, the allele frequencies of Pbase are unknown, but they can be estimated using the approach proposed by Gengler et al. [34] by regressing the gene contents of ancestors on the genotypes of the progenies.
Two estimates of genetic variance in ssBR
As originally introduced by Fernando et al. [21, 22], the ssBR model is essentially a marker effect model with all markers fitted in the model. Consequently, this feature results in a model with two estimates of additive genetic variances, i.e., the total genetic variance (\( {\sigma}_{\varepsilon}^2 \)) and the markerbased genetic variance, which can be obtained by multiplying \( \sum \limits_{j=1}^m2{p}_j\left(1{p}_j\right) \) by the estimated marker variance (\( {\sigma}_{\alpha}^2 \)). When using phenotypes only from the conventional phase (Pheno_{1}), the estimated total genetic variance was unbiased; however, the markerbased genetic variance was biased upwards, although the allele frequencies from Pbase were used. This result can be explained by the fact that a small proportion (1.2%) of animals (only progenytested bulls) in the pedigree were genotyped, resulting in poor imputation for nongenotyped animals and biased estimation of marker variance.
Apart from the estimated marker variance, the total genetic variance shows a relationship with the conditional variance of the breeding values of nongenotyped individuals (g_{1}) given the breeding values of genotyped individuals (g_{2}), i.e., \( Var\left({\mathbf{g}}_1{\mathbf{g}}_2\right)=\left({\mathbf{A}}_{11}{\mathbf{A}}_{12}{\mathbf{A}}_{22}^{1}{\mathbf{A}}_{21}\right){\sigma}_g^2 \) [16]. With the assumption of multivariate normality, the markers of nongenotyped individuals were imputed via a pedigreebased linear system. A residual vector (ε) accounting for the remaining portion of the breeding values that could not be modelled by the imputed markers was added to the markerbased breeding values to obtain the final g_{1}. The accuracy of this imputation quality depends on the genetic relationships between genotyped and nongenotyped animals; i.e., it is expected that less precise imputation will be achieved for old ancestors than for younger ones. More specifically, the imputation is based on a linear relationship between genotyped and nongenotyped individuals, which may not approximate the distribution of marker genotypes very well [21]. Thus, better methods (e.g., an imputation method based on a peeling algorithm [35]) for imputing the genotypes of nongenotyped individuals conditional on the genotyped individuals would be expected to yield more accurate results. In addition, as pointed out by [21], in the singlestep method, we do not observe g_{2}, but M_{2}; this indicates that the conditioning is on the observed marker information, and the conditional genetic variance estimated in ssBR is therefore actually only an approximation of the genetic variance.
Conclusions
This study contributes to a better understanding of the effects of GS on VC estimation. The results show that neither of the singlestep models (ssGBLUP and ssBR) can precisely estimate the VCs for populations undergoing GS. Furthermore, this study has demonstrated that when the complete data are analysed with both preGS data and data from the GS phase, the classic PAM can yield unbiased estimates of VC. Therefore, an implication of these findings is that the best solution for obtaining unbiased estimates of VC is to use PAM with phenotypes from the conventional phase or phenotypes from both conventional and GS phases.
Methods
Simulation of data
Populations that were similar to the Danish Jersey dairy cattle population in terms of the breeding scheme and population structure were simulated over a 35year period with 5 replicates for each scenario. The simulation was conducted with the following three phases: (1) The historical phase, covering 3000 nonoverlapping generations, was run to generate an initial linkage disequilibrium (LD) structure. The simulated genome consisted of 30 chromosomes of 100 centiMorgans (cM) each with 100 QTLs and 10,000 biallelic SNP markers. The QTLs and markers were uniformly distributed within each chromosome. This resulted in a total of 3000 QTLs and 300,000 markers across the whole genome. The offspring inherited alleles at these loci from their parents following Mendel’s rules allowing for mutation (assumed only to happen in the historical phase) and recombination. A recurrent mutation rate of 2.5 × 10^{− 5} for both markers and QTLs was set to establish mutationdrift equilibrium in the historical generations. The number of recombination per chromosome (per Morgan) was sampled from a Poisson distribution with a mean equal to the length of the chromosome, and crossovers were uniformly located along the chromosome. This part of the simulation was implemented with QMSim software [36]. Generation 3000 was used as the base population, in which 40,000 SNPs were randomly chosen from the pool of 300,000 markers, and 2000 QTLs were randomly chosen from the pool of 3000 QTLs. (2) In the next phase, 20 years of conventional breeding were simulated. Each year, 50 young bulls were selected based on their parent average (PA) and the progeny tested, and 10,000 cows in different age groups were maintained. Only cows in the first lactation, however, were assumed to have phenotypes. At the end of phase (2), all proven bulls were genotyped. (3) In the last phase, 15 years of genomic selection were simulated. Each year, 500 bulls and 2000 heifers were selected for genotyping at one year of age based on their PA genomic estimated breeding values (GEBV) computed by ssGBLUP. After genomic evaluation, 50 of these 500 bulls were selected for breeding. The simulations of phase (2) and (3) were performed with ADAM software [37]. For the herdyearseason (HYS) effect (contemporary group effect), individuals were allocated to 100 herds, 35 years and 4 seasons.
Scenarios
Three scenarios based on the use of phenotypic information from different phases of breeding programs for the estimation of VCs were explored: (1) Pheno_{1}: phenotypes only from conventional phase (1–20 years) were used; (2) Pheno_{1 + 2}: phenotypes from both the conventional phase and genomic selection phase (1–35 years) were used; (3) Pheno_{2}: phenotypes from only the genomic selection phase were used (21–35 years). Within each scenario, the three singletrait models (PAM, ssGBLUP, and ssBR) were applied. An overview of the subsets used and the average number of individuals in the pedigree, phenotypes, and genotypes for each scenario over 5 replicates are shown in Table 1. For validation, VCs and heritabilities from two different base populations (Pbase and Gbase) are presented in Table 2. The genetic variance of Pbase was calculated from the variance of “true” breeding values (TBVs) based on animals from the founder population, while the genetic variance of Gbase was calculated from the variance of TBVs based on animals from years 18, 19, and 20, i.e., the last three years before the start of GS (animals in Gbase were related). Hence, when genomic information was included and phenotypes were only collected from GS phase (Pheno_{2}), the VC estimates needed to be compared with the VCs from Gbase.
Statistical models for VC estimation
PAM
The classical animal model [38] using pedigreebased relationships can be written as follows:
where y represents the vector of phenotypes; β is a vector of fixed effects, i.e., herdyearseason effects: a is a vector of additive genetic effects following \( N\left(\mathbf{0},\mathbf{A}{\sigma}_g^2\right) \), where A is the numerator relationship matrix, and \( {\sigma}_g^2 \) is the additive genetic variance; X and Z are design matrices relating phenotypes to fixed effects and random animal effects, respectively; and e is a vector of residuals following \( N\left(\mathbf{0},\mathbf{I}{\sigma}_e^2\right) \), where \( {\sigma}_e^2 \) is the residual variance.
ssGBLUP
The model equation of the regular ssGBLUP model [16, 17] was the same as model (1) but used an H matrix that combines the markerbased (G) and pedigreebased (A) relationship matrices to replace the numerator relationship matrix (A) in the classical animal model. Therefore, the vector of GEBVs is assumed to be distributed as \( N\left(\mathbf{0},\mathbf{H}{\sigma}_g^2\right) \). The mixed model equations (MME) require the inverse of the H matrix [18]:
where \( {\mathbf{A}}_{22}^{1} \) is the inverse of the pedigreebased relationship matrix for the genotyped individuals, and G is constructed according to [19]:
where Z is a centred marker covariate matrix containing 0 − 2p_{j}, 1 − 2p_{j}, and 2 − 2p_{j} for genotypes of AA, AB, and BB, respectively; p_{j} is the allele frequency at locus j, and m is the total number of markers. For the scenarios of Pheno_{1} and Pheno_{1 + 2}, the allele frequencies were estimated from the animals in Pbase, whereas for the scenario of Pheno_{2}, the allele frequencies were estimated from the animals in Gbase. The variance components of PAM and ssGBLUP were estimated with REML [2] using the average information algorithm (AIREML) [8] as implemented in the DMU package [39].
ssBR
An ssBR model [21] based on BayesC [40, 41] with the assumption of all markers having nonzero effects (i.e., π = 0) and a common variance for all markers can be specified as follows:
where subscript 1 denotes nongenotyped individuals, and subscript 2 denotes genotyped individuals. Thus, y represents the vector of phenotypes; β is a vector including elements of herdyearseason effects; X and Z are design matrices; M_{2} contains observed marker covariates for genotyped individuals; M_{1} contains imputed marker covariates for nongenotyped individuals; the imputation is conducted from the following linear relationship: \( {\mathbf{M}}_1={\mathbf{A}}_{12}{\mathbf{A}}_{22}^{1}{\mathbf{M}}_2 \) based on the assumption of multivariate normality, where A_{12} and A_{22} are the submatrices of A; α is the vector of marker effects \( \boldsymbol{\upalpha} \sim N\left(\mathbf{0},\mathbf{I}{\sigma}_{\alpha}^2\right) \); ε is the vector of imputation residual deviations, \( \boldsymbol{\upvarepsilon} \sim N\left(\mathbf{0},\left({\mathbf{A}}_{11}{\mathbf{A}}_{12}{\mathbf{A}}_{22}^{1}{\mathbf{A}}_{21}\right){\sigma}_{\upvarepsilon}^2\right) \), due to the inaccuracy of imputation, where A_{11}, A_{12}, A_{22} and A_{21} are the submatrices of A; \( {\sigma}_{\alpha}^2 \) is the variance of the marker effects under the assumption that all markers exhibit common genetic variance and can explain all additive genetic variance; \( {\sigma}_{\varepsilon}^2 \) is the imputation residual variance; and e is a vector of residuals, \( \mathbf{e}\sim N\left(\mathbf{0},\mathbf{I}{\sigma}_e^2\right) \). For location parameters, a flat prior is assigned for β, and normal priors are specified for α, ε, and e, with null mean and variance of \( {\sigma}_{\alpha}^2 \), \( {\sigma}_{\varepsilon}^2 \), and \( {\sigma}_e^2 \), respectively. For dispersion parameters, scaled inverse chisquared distributions with scale factors, \( {S}_{\alpha}^2=0.0002 \), \( {S}_{\varepsilon}^2=2 \), \( {S}_e^2=2.5 \), and degree of freedom, ν_{α(ε, e)} = 4, are assumed to be the prior distributions for \( {\sigma}_{\alpha}^2 \), \( {\sigma}_{\upvarepsilon}^2 \), and \( {\sigma}_e^2 \), respectively. The final GEBV for individuals can be written as follows:
where g_{1} and g_{2} represent the vectors of GEBV for nongenotyped and genotyped individuals, respectively.
The use of ssBR allows the inference of the additive genetic variance from two sources of information: first, the total genetic variance, approximated by the estimated imputation residual variance, \( {\sigma}_{\varepsilon}^2 \); second, the markerbased genetic variance, computed as the estimated marker variance, \( {\sigma}_{\alpha}^2 \), multiplied by \( \sum \limits_{j=1}^m2{p}_j\left(1{p}_j\right) \), where p_{j} is the allele frequency at locus j and m is the total number of markers. Similar to ssGBLUP, for the scenarios of Pheno_{1} and Pheno_{1 + 2}, allele frequencies were calculated based on the stored genotypes in the base population (Pbase). For the of Pheno_{2} scenario, the allele frequencies were calculated based on the current genotyped population. In addition, an extra fixed effect (μ_{g}) was fitted in the model to account for the unknown expectation for genotyped individuals [21]. The ssBR program was written in Fortran 95. A Gibbs sampler was used to draw inferences for all model parameters from their posterior distributions. The length of the chain was set to 50,000, with a burnin of 20,000 iterations. The convergence of the posterior distribution for each parameter investigated was assessed using the boa and coda packages [42, 43].
Availability of data and materials
The datasets analysed during the current study are available in the figshare repository (https://doi.org/10.6084/m9.figshare.10547921.v3).
Abbreviations
 cM:

Centimorgan
 GEBV:

Genomic estimated breeding values
 GS:

Genomic selection
 HYS:

Herdyearseason
 MCMC:

Markov chain MonteCarlo
 PA:

Parent average
 PAM:

Pedigreebased animal model
 REML:

Restricted maximum likelihood
 ssBR:

Singlestep Bayesian regression
 ssGBLUP:

Singlestep genomic BLUP
 VC:

Variance components
References
 1.
Hofer A. Variance component estimation in animal breeding: a review. J Anim Breed Genet. 1998;115:247–65.
 2.
Patterson HD, Thompson R. Recovery of interblock information when block sizes are unequal. Biometrika. 1971;58:545–54.
 3.
Meyer K. Present status of knowledge about statistical procedures and algorithms to estimate variance and covariance components, 4th world Congr. Edinburgh: Genet. Appl. Livest. Prod; 1990. p. 407–18.
 4.
Smith SP, Graser HU. Estimating variancecomponents in a class of mixed models by restricted maximumlikelihood. J Dairy Sci. 1986;69:1156–65.
 5.
Gilmour AR, Thompson R, Cullis BR. Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics. 1995;51:1440–50.
 6.
Johnson DL, Thompson R. Restricted maximumlikelihoodestimation of variancecomponents for Univariate animalmodels using sparsematrix techniques and average information. J Dairy Sci. 1995;78:449–56.
 7.
Madsen P, Jensen J, Thompson R. Estimation of (co)variance components by REML in multivariate mixed linear models using average of observed and expected information, 5th world Congr. Guelph: Genet. Appl. Livest. Prod; 1994. p. 19–22.
 8.
Jensen J, Mäntysaari EA, Madsen P, Thompson R. Residual maximum likelihood estimation of (co) variance components in multivariate mixed linear models using average information. J Indian Soc Agric Stat. 1997;49:215–36.
 9.
Ducrocq V. Estimation of genetic parameters arising in nonlinear models, 4th world Congr. Edinburgh: Genet. Appl. Livest. Prod; 1990. p. 419–28.
 10.
Gianola D, Fernando RL. Bayesian methods in animal breeding theory. J Anim Sci. 1986;63:217–44.
 11.
Gianola D, Foulley JL. Varianceestimation from integrated likelihoods (veil). Genet Sel Evol. 1990;22:403–17.
 12.
Gianola D, Foulley J, Fernando R. Prediction of breeding values when variances are not known. Genet Sel Evol. 1986;18:485–98.
 13.
Sorensen DA, Kennedy BW. Estimation of genetic variances from unselected and selected populations. J Anim Sci. 1984;59:1213–23.
 14.
Martinez V, Bunger L, Hill WG. Analysis of response to 20 generations of selection for body composition in mice: fit to infinitesimal model assumptions. Genet Sel Evol. 2000;32:3–21.
 15.
Jensen J. Estimation of genetic variance in the age of genomics. J Anim Breed Genet. 2016;133:333–3.
 16.
Legarra A, Aguilar I, Misztal I. A relationship matrix including full pedigree and genomic information. J Dairy Sci. 2009;92:4656–63.
 17.
Christensen OF, Lund MS. Genomic prediction when some animals are not genotyped. Genet Sel Evol. 2010;42:2.
 18.
Aguilar I, Misztal I, Johnson DL, Legarra A, Tsuruta S, Lawlor TJ. Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J Dairy Sci. 2010;93:743–52.
 19.
VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–23.
 20.
Hayes BJ, Visscher PM, Goddard ME. Increased accuracy of artificial selection by using the realized relationship matrix. Genet Res. 2009;91:47–60.
 21.
Fernando RL, Dekkers JCM, Garrick DJ. A class of Bayesian methods to combine large numbers of genotyped and nongenotyped animals for wholegenome analyses. Genet Sel Evol. 2014;46:50.
 22.
Fernando RL, Cheng H, Golden BL, Garrick DJ. Computational strategies for alternative singlestep Bayesian regression models with large numbers of genotyped and nongenotyped animals. Genet Sel Evol. 2016;48:96.
 23.
Gao H, Koivula M, Jensen J, Stranden I, Madsen P, Pitkanen T, Aamand GP, Mantysaari EA. Short communication: genomic prediction using different singlestep methods in the Finnish red dairy cattle population. J Dairy Sci. 2018;101:10082–8.
 24.
Lee J, Cheng H, Garrick D, Golden B, Dekkers J, Park K, Lee D, Fernando R. Comparison of alternative approaches to singletrait genomic prediction using genotyped and nongenotyped Hanwoo beef cattle. Genet Sel Evol. 2017;49:2.
 25.
Powell JE, Visscher PM, Goddard ME. Reconciling the analysis of IBD and IBS in complex trait studies. Nat Rev Genet. 2010;11:800–5.
 26.
Vitezica Z, Aguilar I, Misztal I, Legarra A. Bias in genomic predictions for populations under selection. Genet Res. 2011;93:357–66.
 27.
Christensen OF, Madsen P, Nielsen B, Ostersen T, Su G. Singlestep methods for genomic evaluation in pigs. Animal. 2012;6:1565–71.
 28.
Legarra A, Christensen OF, Vitezica ZG, Aguilar I, Misztal I. Ancestral Relationships Using Metafounders: Finite Ancestral Populations and Across Population Relationships. Genetics. 2015;200:455.
 29.
Legarra A. Comparing estimates of genetic variance across different relationship models. Theor Popul Biol. 2016;107:26–30.
 30.
Sorensen D, Fernando R, Gianola D. Inferring the trajectory of genetic variance in the course of artificial selection. Genet Res. 2001;77:83–94.
 31.
Lehermeier C, de los Campos G, Wimmer V, Schon CC. Genomic variance estimates: With or without disequilibrium covariances? J Anim Breed Genet. 2017;134:232–41.
 32.
Veerkamp RF, Mulder HA, Thompson R, Calus MPL. Genomic and pedigreebased genetic parameters for scarcely recorded traits when some animals are genotyped. J Dairy Sci. 2011;94:4189–97.
 33.
Christensen OF. Compatibility of pedigreebased and markerbased relationship matrices for singlestep genetic evaluation. Genet Sel Evol. 2012;44:37.
 34.
Gengler N, Mayeres P, Szydlowski M. A simple method to approximate gene content in large pedigree populations: application to the myostatin gene in dualpurpose Belgian blue cattle. Animal. 2007;1:21–8.
 35.
Meuwissen THE, Svendsen M, Solberg T, Odegard J. Genomic predictions based on animal models using genotype imputation on a national scale in Norwegian Red cattle. Genet Sel Evol. 2015;47:79.
 36.
Sargolzaei M, Schenkel FS. QMSim: a largescale genome simulator for livestock. Bioinformatics. 2009;25:680–1.
 37.
Pedersen LD, Sorensen AC, Henryon M, AnsariMahyari S, Berg P. ADAM: a computer program to simulate selective breeding schemes for animals. Livest Sci. 2009;121:343–4.
 38.
Henderson CR. Applications of linear models in animal breeding, University of Guelph, [Guelph, Ont.]; 1984.
 39.
Madsen P., Jensen J., A User's Guide to DMU  A Package for Analysing Multivariate Mixed Models. Version 6, Release 5.2: http://dmu.agrsci.dk/DMU/Doc/Current/dmuv6_guide.5.2.pdf, 2013.
 40.
Habier D, Fernando RL, Kizilkaya K, Garrick DJ. Extension of the bayesian alphabet for genomic selection. BMC Bioinformatics. 2011;12:186.
 41.
Kizilkaya K, Fernando RL, Garrick DJ. Genomic prediction of simulated multibreed and purebred performance using observed fifty thousand single nucleotide polymorphism genotypes. J Anim Sci. 2010;88:544–51.
 42.
Smith BJ. boa: An R package for MCMC output convergence assessment and posterior inference. J Stat Softw. 2007;21:1–37.
 43.
Plummer M, Best N, Cowles K, Vines K. CODA: convergence diagnosis and output analysis for MCMC. R News. 2006;6:7–11.
Acknowledgements
The first author is grateful for Rohan L. Fernando and Dorian J. Garrick’s willingness to share their pearls of wisdom during a visit at Iowa State University. Discussions with Hao Cheng and Hailin Su are also greatly appreciated. The first author acknowledges the funding from Innovation Fund Denmark (grant number 14020136), Nordic Cattle Genetic Evaluation (Aarhus, Denmark), and Aarhus University.
Funding
This research was funded by Innovation Fund Denmark (grant number 140–20136), Nordic.
Cattle Genetic Evaluation (Aarhus, Denmark), and Aarhus University. The funding bodies played no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Author information
Affiliations
Contributions
HG, PM, GPA, and JJ conceived and designed the study. HG developed the ssBR software package, conducted all analyses and wrote the draft manuscript. PM maintained the DMU software package used in the statistical analysis. JRT helped with the design of the simulation. ACS maintained the ADAM software package used in the simulation. GPA and JJ contributed to the interpretation of the results and helped coordinate the project. All authors provided critical feedback and helped shape the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Gao, H., Madsen, P., Aamand, G.P. et al. Bias in estimates of variance components in populations undergoing genomic selection: a simulation study. BMC Genomics 20, 956 (2019). https://doi.org/10.1186/s1286401963238
Received:
Accepted:
Published:
Keywords
 Singlestep
 Imputation error
 Genetic variance
 Marker variance
 Bayesian regression