Prospects and limits of marker imputation in quantitative genetic studies in European elite wheat (Triticum aestivum L.)

Table 1 Imputation accuracies, measured as the average correlation (cor) between observed and imputed marker genotypes

Algorithm	Ref 50* (cor)	Ref 100* (cor)	Ref 200* (cor)	Ref 300* (cor)
Low to high marker density
Beagle	0.61	0.70	0.75	0.78
FImpute	0.68	0.73	0.77	0.80
IMPUTE2	0.74	0.77	0.81	0.84
Random Forest	0.56	0.61	0.66	0.69
Genotyping-by-sequencing-like
Beagle	0.76	0.85	0.92	0.95
FImpute	0.59	0.79	0.91	0.95
IMPUTE2	0.68	0.82	0.91	0.95
Random Forest	0.54	0.64	0.75	0.83

Map-dependent (Beagle, FImpute, and IMPUTE2) and map-independent (Random Forest) algorithms were applied with reference population sizes of 50, 100, 200, and 300 lines out of 371. Imputation was performed for a low-to-high marker density scenario and for a genotyping-by-sequencing (GBS)-like data scenario.
*For the GBS-like imputation scenarios, Ref 50, Ref 100, Ref 200, and Ref 300 refer to missing value rates of 72.8%, 61.5%, 38.8%, and 16.1%, respectively, across all lines of the population; these correspond to the scenarios with reference population sizes of 50, 100, 200, and 300 out of the total of 371 lines.
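
The accuracy measure reported in the table can be illustrated with a minimal sketch. It assumes, as the source does not specify this, that the correlation is computed per marker over the masked entries only and then averaged across markers, and that genotypes are coded numerically (here 0 and 2 for the two homozygous classes). The function names, the dictionary layout, and the toy data are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(x)
    if n < 2:
        return None
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # No variation among the masked entries: correlation is undefined.
        return None
    return sxy / sqrt(sxx * syy)


def mean_imputation_correlation(observed, imputed, masked):
    """Average the per-marker correlation between observed and imputed genotypes.

    observed, imputed: dict mapping marker name -> list of genotype codes per line
    masked: dict mapping marker name -> indices of lines whose genotype was
            masked and then imputed (assumption: only these are evaluated)
    """
    correlations = []
    for marker, line_indices in masked.items():
        obs = [observed[marker][i] for i in line_indices]
        imp = [imputed[marker][i] for i in line_indices]
        r = pearson(obs, imp)
        if r is not None:  # skip markers with undefined correlation
            correlations.append(r)
    if not correlations:
        return float("nan")
    return sum(correlations) / len(correlations)


# Toy data (hypothetical): two markers, five lines, four masked entries each.
observed = {"m1": [0, 2, 2, 0, 2], "m2": [0, 0, 2, 2, 0]}
imputed = {"m1": [0, 2, 2, 0, 2], "m2": [0, 2, 2, 2, 0]}
masked = {"m1": [0, 1, 2, 3], "m2": [0, 1, 2, 3]}

accuracy = mean_imputation_correlation(observed, imputed, masked)
print(round(accuracy, 2))
```

Markers whose masked entries show no variation are skipped here because their correlation is undefined; a different convention, such as counting them as zero, would lower the average.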

ISSN: 1471-2164