Genome-wide distribution of genetic diversity and linkage disequilibrium in elite sugar beet germplasm

Background Characterization of population structure and genetic diversity of germplasm is essential for the efficient organization and utilization of breeding material. The objectives of this study were to (i) explore the patterns of population structure in the pollen parent heterotic pool using different methods, (ii) investigate the genome-wide distribution of genetic diversity, and (iii) assess the extent and genome-wide distribution of linkage disequilibrium (LD) in elite sugar beet germplasm.
Results A total of 264 yield type and 238 sugar type inbred lines of the pollen parent heterotic gene pool, which had been genotyped with 328 SNP markers, were used in this study. Two distinct subgroups were detected within the elite sugar beet germplasm set based on different statistical methods, which was in accordance with its breeding history. MCLUST based on principal components, principal coordinates, or lapvectors had high correspondence with the germplasm type information as well as with the assignment by STRUCTURE, which indicated that these methods might be alternatives to STRUCTURE for population structure analysis. Gene diversity and modified Rogers distance between the examined germplasm types varied considerably across the genome, which might be due to artificial selection. This observation indicates that population genetic approaches could be used to identify candidate genes for the traits under selection. Because r² > 0.8 is required to detect marker-phenotype associations explaining less than 1% of the phenotypic variance, our observation of a low proportion of SNP loci pairs showing such levels of LD suggests that the number of markers has to be dramatically increased for powerful genome-wide association mapping.
Conclusions We provided a genome-wide distribution map of genetic diversity and linkage disequilibrium for the elite sugar beet germplasm, which is useful for the application of genome-wide association mapping in sugar beet as well as the efficient organization of germplasm.


Background
Sugar beet (Beta vulgaris subsp. vulgaris) is a member of the family Amaranthaceae [1]. It is an important crop for sucrose production in the temperate climate zone, which accounts for about one quarter to one third of the worldwide sugar production [2]. Sugar beet is a diploid species with n = 9 chromosomes and a haploid genome size of 758 Mb [3]. Physical mapping and sequencing of the sugar beet genome are in progress [4].
At present, hybrid varieties account for most of the sugar beet production. Seed and pollen parent heterotic pools are the basic material for hybrid breeding [5], where the former consists of monogerm germplasm and the latter of multigerm germplasm (e.g. [6]). Due to the strong negative correlation between root yield and sugar content in sugar beet [7], the germplasm of the individual heterotic pools is usually classified as yield type (with emphasis on root yield), sugar type (with emphasis on sugar content), or normal type (intermediate in both characters) [8]. The relatively independent development of these different types of germplasm over decades might have resulted in divergent populations. Information on such divergence, however, is not available for sugar beet. Molecular markers reflect the actual level of genetic variation existing among genotypes at the DNA level and have therefore been widely applied in population genetics research. In beets, the most frequently used class of molecular markers is microsatellites or simple sequence repeat (SSR) markers, as they are highly polymorphic and co-dominantly inherited (e.g. [9]). Recent advances in genomic technologies, however, have made single nucleotide polymorphism (SNP) markers available as a powerful tool for a more direct analysis of sequence-based polymorphisms [10]. SNPs are the most abundant class of sequence variability in the genome, co-dominantly inherited, easily automated and, thus, appropriate for high-throughput analyses [11]. Therefore, they are now the marker system of choice for various crop species such as maize [12], rice [13], barley [14], and soybean [15]. For sugar beet, a few studies have been carried out on the identification of SNPs [16,1]. No earlier study, however, evaluated the usefulness of SNP markers for characterizing genetic diversity and population structure in elite sugar beet germplasm.
Furthermore, no information is available on the number of SNPs required for such analyses.
Various methods have been proposed for examining population structure. One of the most frequently used methods is STRUCTURE, a model-based approach to assign individuals to subgroups [17]. Furthermore, principal component analysis (PCA) and principal coordinate analysis (PCoA) are considered favourable for uncovering population structure [18,19]. Laplacian eigenfunctions (LAP), as a weighted PCA, were recently reported to describe population structure [20]. Another model-based approach, MCLUST, was reported to be appropriate for determining the clusters and membership simultaneously without genetic assumptions [21]. Although the advantages and disadvantages of the different methods are known, few empirical comparisons are available in a plant genetics context.
The identification of genes underlying phenotypic variation can be performed in two different directions: (i) from phenotype to genotype, which is used in quantitative genetics approaches and (ii) from genotype to phenotype, which evaluates signatures of selection [22]. High-density SNP markers make it possible to evaluate the genomic changes that occurred through artificial selection during breeding and have the potential to help identify likely targets of past selection. To our knowledge, however, such analyses have not yet been performed for sugar beet.
The potential of using association mapping approaches in sugar beet has come to the forefront (e.g. [23,24]). This approach depends on the extent and distribution of linkage disequilibrium (LD). Several studies examining LD in beets are available, but these were based on relatively few RFLP, SSR, RAPD or AFLP markers [25-27,9,6]. However, to the best of our knowledge, no earlier study examined the extent and genome-wide distribution of LD in elite sugar beet germplasm with a high number of genome-wide distributed markers.
The objectives of this study were to (i) explore the patterns of population structure in the pollen parent heterotic pool using different methods, (ii) investigate the genome-wide distribution of genetic diversity, and (iii) assess the extent and genome-wide distribution of LD in elite sugar beet germplasm.

Plant materials and molecular markers
A total of 502 diploid sugar beet inbreds from the pollen parent heterotic pool were examined in this study. Among them, 264 accessions were yield types and 238 were sugar types. All plant materials used in this study are proprietary to KWS SAAT AG (Einbeck, Germany). All 502 sugar beet inbreds were genotyped by KWS SAAT AG, following standard protocols, with 328 SNP markers, which were distributed across the genome. A total of 26, 33, 41, 35, 40, 42, 39, 32, and 40 of these markers map to linkage groups A to I, respectively (unpublished data). This data set comprises no inbreds or markers with more than 20% missing data.

Statistical analyses
The model-based approach implemented in the software package STRUCTURE [17] was used to examine population structure. STRUCTURE was run for K = 1-10 subgroups using the linkage model without prior information. Each run consisted of a burn-in period of 100,000 steps followed by 100,000 Markov chain Monte Carlo (MCMC) replicates, assuming that allele frequencies are uncorrelated across clusters. Five replications were performed for each K value. To determine the most probable value of K, an ad hoc criterion was used [28]. For the estimated number of subgroups, the run showing the maximum likelihood was used to assign inbreds to subgroups. Inbreds were assigned either to the subgroup for which they had the maximum membership probability or only if their membership probability surpassed a threshold of 0.60, 0.70, or 0.80. The results from STRUCTURE were displayed with the DISTRUCT software [29].
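The ad hoc criterion for choosing K can be illustrated with a short sketch. It assumes that the criterion is the ΔK statistic reported in the Results, computed from the replicate log likelihoods as the absolute second-order difference of the mean log likelihood divided by the standard deviation across replicates. The function name `delta_k` and the log-likelihood values are hypothetical and serve only as an illustration; they are not data from this study.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    log_likelihoods maps K to the list of log likelihoods from replicate runs.
    Assumption: delta K = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)).
    A standard deviation of zero would raise ZeroDivisionError.
    """
    avg = {k: mean(values) for k, values in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 in avg and k + 1 in avg:
            second_difference = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            result[k] = second_difference / stdev(log_likelihoods[k])
    return result


# Hypothetical replicate log likelihoods (three runs per K), for illustration only.
example_runs = {
    1: [-1000, -1002, -998],
    2: [-800, -801, -799],
    3: [-780, -790, -770],
    4: [-765, -775, -760],
}
example_delta_k = delta_k(example_runs)
most_probable_k = max(example_delta_k, key=example_delta_k.get)
```

In this toy example the log likelihood keeps rising with K, yet ΔK peaks where the rate of increase changes most sharply, which is the situation the criterion is meant to handle.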
The allele frequencies at each marker and for each inbred were calculated and used for PCA [18]. The number of significant PCA eigenvalues was tested by Eigenanalysis (cf. [30]). Furthermore, the modified Rogers distance (MRD) was calculated [31]. PCoA [19] based on MRD estimates between pairs of inbred lines was performed. In addition, we used LAP [20] to reveal the population structure, where the threshold of correlation coefficients eps was set to 0.8. Finally, the model-based approach MCLUST was used to determine the number of subgroups as well as to provide the membership probabilities [21]. Due to the large number of dimensions (328 markers), MCLUST analysis was performed on 1-150 PCA components, PCoA coordinates, or LAP lapvectors, respectively. Models for 1 to 15 subgroups were examined. The correspondence among the inbreds' assignment by MCLUST, their assignment by STRUCTURE, and the germplasm type information was evaluated.
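How such a correspondence can be quantified is sketched below. The sketch assumes that correspondence is the proportion of inbreds placed in matching groups after the arbitrary cluster labels have been mapped onto the reference labels in the most favourable way. The function `correspondence` and the example labels are hypothetical and are not taken from the original analysis.

```python
from itertools import permutations


def correspondence(reference, clusters):
    """Proportion of individuals whose cluster matches the reference group,
    maximised over all one-to-one mappings of cluster labels to reference labels.

    Intended for a small number of groups, since all mappings are enumerated.
    """
    if len(reference) != len(clusters):
        raise ValueError("both assignments must cover the same individuals")
    reference_labels = sorted(set(reference))
    cluster_labels = sorted(set(clusters))
    # Pad with None so that surplus clusters can remain unmatched.
    surplus = max(0, len(cluster_labels) - len(reference_labels))
    targets = reference_labels + [None] * surplus
    best_hits = 0
    for mapped in permutations(targets, len(cluster_labels)):
        mapping = dict(zip(cluster_labels, mapped))
        hits = sum(1 for ref, clu in zip(reference, clusters) if mapping[clu] == ref)
        best_hits = max(best_hits, hits)
    return best_hits / len(reference)


# Hypothetical example: six inbreds, one of which is placed in the "wrong" cluster.
germplasm_type = ["sugar", "sugar", "sugar", "yield", "yield", "yield"]
cluster_id = [1, 1, 2, 2, 2, 2]
agreement = correspondence(germplasm_type, cluster_id)
```

Because cluster numbers returned by a clustering method carry no meaning of their own, the relabelling step is what makes assignments from different methods comparable.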
In order to determine the number of SNPs required to detect the underlying population structure, a resampling analysis was performed. In each of 100 repetitions, subsets of the markers (9 to 252 in steps of 9) were either randomly selected (random sampling) or sampled in such a way that the selected markers were equally distributed across the genome (stratified sampling) [12]. Based on the selected markers, PCA was performed for all the inbreds and 10 PCA components were used for MCLUST analysis. The correspondence between the inbreds' assignment by MCLUST based on the entire set of 328 SNPs and that based on the different resampling subsets was calculated. The MRD was calculated for each pair of inbreds based on the selected SNP markers and the coefficient of variation (CV) across all 100 repetitions was calculated. Furthermore, subsets of the markers (9 to 252 in steps of 9) showing the highest polymorphic information content (PIC) or the highest MRD between the two germplasm types were selected. Based on the selected markers, PCA was performed as described above. The correspondence between the inbreds' assignment by MCLUST based on the entire set of 328 SNPs and that based on the SNP subsets was calculated.
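The two resampling schemes can be sketched as follows. This is an illustration under stated assumptions, not the procedure of [12]: stratified sampling is approximated by giving every linkage group the same quota (to within one marker, capped at the group size), and the marker identifiers such as "A1" are hypothetical. Only the numbers of markers per linkage group are taken from the text.

```python
import random
from statistics import mean, stdev

# Markers per linkage group A to I as stated in the text (328 in total).
MARKERS_PER_GROUP = dict(zip("ABCDEFGHI", [26, 33, 41, 35, 40, 42, 39, 32, 40]))


def marker_ids():
    """Hypothetical marker names, e.g. 'A1' ... 'A26', grouped by linkage group."""
    return {g: [f"{g}{i + 1}" for i in range(n)] for g, n in MARKERS_PER_GROUP.items()}


def random_sample(markers, k, rng):
    """Draw k markers from the whole genome without regard to linkage group."""
    pool = [m for group in markers.values() for m in group]
    return rng.sample(pool, k)


def stratified_sample(markers, k, rng):
    """Draw k markers so that linkage groups contribute as equally as possible."""
    quota = {g: 0 for g in markers}
    remaining = k
    while remaining > 0:
        progressed = False
        for g in markers:  # round-robin over linkage groups
            if remaining == 0:
                break
            if quota[g] < len(markers[g]):
                quota[g] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            raise ValueError("k exceeds the number of available markers")
    chosen = []
    for g, q in quota.items():
        chosen.extend(rng.sample(markers[g], q))
    return chosen


def coefficient_of_variation(values):
    """CV of an estimate (e.g. the MRD of one inbred pair) across repetitions."""
    return stdev(values) / mean(values)


rng = random.Random(2011)  # fixed seed only to make the sketch reproducible
subset = stratified_sample(marker_ids(), 27, rng)
```

With nine linkage groups, subset sizes that are multiples of nine translate into equal quotas per group until the smallest group (26 markers) is exhausted, after which the remaining groups absorb the shortfall.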
Gene diversity was calculated for the yield type as well as sugar type inbreds for each marker separately. Similarly, MRD between yield type and sugar type inbreds was calculated on an individual marker basis.
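Both per-marker statistics can be written down compactly for bi-allelic SNPs. The sketch below assumes that gene diversity is the expected heterozygosity, 1 minus the sum of squared allele frequencies, without a small-sample correction, and that MRD follows the usual definition, the square root of the summed squared allele frequency differences divided by twice the number of loci. Whether the original analysis applied additional corrections is not stated, and the function names are hypothetical.

```python
from math import sqrt


def allele_frequency(alleles):
    """Frequency of the allele coded 1 among homozygous inbreds.

    alleles: list with 0 or 1 per inbred, None for missing data.
    """
    called = [a for a in alleles if a is not None]
    return sum(called) / len(called)


def gene_diversity(p):
    """Gene diversity of a bi-allelic marker with allele frequencies p and 1 - p."""
    return 1.0 - (p ** 2 + (1.0 - p) ** 2)


def modified_rogers_distance(freqs_a, freqs_b):
    """MRD between two groups from the frequency of allele 1 at each of m loci.

    For a single bi-allelic locus this reduces to |p - q|.
    """
    m = len(freqs_a)
    total = sum(
        (p - q) ** 2 + ((1.0 - p) - (1.0 - q)) ** 2
        for p, q in zip(freqs_a, freqs_b)
    )
    return sqrt(total / (2.0 * m))


# Hypothetical genotypes at one marker for a few yield type and sugar type inbreds.
p_yield = allele_frequency([1, 1, 1, 0, None])
p_sugar = allele_frequency([0, 1, 0, 1])
marker_mrd = modified_rogers_distance([p_yield], [p_sugar])
```

Computing the statistics marker by marker, as in the last line, is what allows diversity and divergence to be plotted along the linkage groups.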
The squared correlation of allele frequencies (r²) at two SNP loci was calculated to measure the LD level. This measure was chosen because it can be interpreted as the proportion of the variance in the allele frequency of the second marker that is explained by the allele frequency of the first marker [32]. The 95% quantile of r² for unlinked loci pairs (r² Q95) was used as the significance threshold for the linked loci pairs. A nonlinear regression of r² vs. the genetic map distance (cM) was performed according to [33]. The expectation of r² between adjacent sites is:

$$E(r^2) = \left[\frac{10 + C}{(2 + C)(11 + C)}\right]\left[1 + \frac{(3 + C)(12 + 12C + C^2)}{n(2 + C)(11 + C)}\right]$$

where C = 4Ne × r, r is the recombination rate, n is the sample size, and Ne is the effective population size. The average r² (r̄²) at binned genetic distances was calculated. Furthermore, the r̄² for all linked loci pairs within 5 cM segments across the genome was calculated. All LD analyses were performed for the entire germplasm set, yield type, and sugar type inbreds.
Unless stated otherwise, all analyses were performed with the statistical software R [35].
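The two quantities used in the LD analysis can be sketched in a few lines. The first function computes r² for a pair of bi-allelic loci in homozygous inbreds. The second evaluates an expectation of r² under drift-recombination equilibrium with a sample size adjustment; its algebraic form is an assumption based on the symbols C, n, and Ne defined above and is not confirmed by the text. Converting map distance to recombination rate as 1 cM = 0.01 is a further simplifying assumption, and both function names are hypothetical.

```python
def r_squared(locus_a, locus_b):
    """Squared allele frequency correlation between two bi-allelic loci.

    locus_a, locus_b: lists with 0 or 1 per inbred line, None for missing data.
    """
    pairs = [(a, b) for a, b in zip(locus_a, locus_b) if a is not None and b is not None]
    n = len(pairs)
    p_a = sum(a for a, _ in pairs) / n
    p_b = sum(b for _, b in pairs) / n
    p_ab = sum(1 for a, b in pairs if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined for a monomorphic locus")
    return d * d / denominator


def expected_r2(distance_cm, ne, n):
    """Assumed expectation of r^2 at a given map distance.

    C = 4 * Ne * r, with r approximated as distance_cm / 100 (an assumption
    that is only reasonable for short distances).
    """
    c = 4.0 * ne * (distance_cm / 100.0)
    base = (10.0 + c) / ((2.0 + c) * (11.0 + c))
    sample_adjustment = 1.0 + ((3.0 + c) * (12.0 + 12.0 * c + c * c)) / (
        n * (2.0 + c) * (11.0 + c)
    )
    return base * sample_adjustment


# Hypothetical genotypes of four inbreds at two perfectly correlated loci.
perfect_ld = r_squared([0, 0, 1, 1], [0, 0, 1, 1])
```

In a nonlinear regression, Ne would be the free parameter: it is adjusted until `expected_r2` evaluated at the observed map distances best fits the observed r² values.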

Results
The log likelihood revealed by STRUCTURE increased gradually from K = 1 to K = 10 and showed no obvious optimum (Additional file 1). In contrast, the maximum of the ad hoc measure ΔK was observed for K = 2. Based on the membership probability thresholds of 0.80, 0.70, and 0.60, 36%, 60%, and 84% of the inbreds of the entire germplasm set could be assigned to two subgroups, respectively. With the maximum membership probability criterion, the assignment by STRUCTURE corresponded with the germplasm type information for 94.4% of the inbreds (Figure 1, Additional file 2). PCA, PCoA, as well as LAP revealed two distinct clusters for the entire germplasm set (Additional file 2). The first and second principal components explained 22.7% and 5.4% of the molecular variance, respectively. In PCoA based on MRD estimates between all pairs of sugar beet inbreds, the first two principal coordinates explained 23.2% and 5.5% of the molecular variance. In addition, the first and second lapvectors of LAP explained 14.6% and 3.5% of the molecular variance, respectively.
The number of subgroups identified by MCLUST based on 1-150 PCA components varied from 1 to 9, while the number for 1-150 PCoA coordinates or LAP lapvectors varied from 2 to 9 (Additional file 3). When the number of subgroups was set to two, MCLUST analysis based on 8-50 PCA components, 8-50 PCoA coordinates, and 1-100 LAP lapvectors showed a high correspondence of assignment (>90%) with the germplasm type information (Figure 2, Additional file 4). MCLUST was used to assign inbreds to clusters based on different resampling subsets of all SNPs, where the correspondence to the clustering using all SNPs improved with an increasing number of SNP markers. Once the number of SNP markers reached about 100, little additional correspondence was gained by further increasing the number of SNPs (Figure 3). Similarly, the CV of MRD among all pairs of inbreds decreased as the number of SNP markers increased (Figure 4). Once the number of SNP markers reached about 100, the CV of MRD decreased only slightly with a further increase in the number of SNPs. The stratified resampling strategy revealed a slightly higher correspondence and lower CV compared to the random resampling strategy. Furthermore, MCLUST analysis based on SNP markers selected for their high PIC values revealed a higher correspondence to the clustering using all SNPs than analyses based on SNP markers selected for a high MRD between yield and sugar types or on the above-mentioned stratified and random resampling strategies (Figure 3).
The average gene diversities of the entire germplasm set, yield type, and sugar type inbreds were 0.338, 0.199, and 0.365, respectively. Gene diversity for yield type and sugar type inbreds varied across the genome (Additional file 5). For most genome regions, the sugar type inbreds showed a higher gene diversity than the yield type inbreds. However, for a few regions, the opposite was true. The average MRD among all inbreds was 0.562, and the MRD between yield type and sugar type inbreds was 0.311. A different degree of divergence between these two germplasm types was observed across the genome (Additional file 6).
The 95% quantile of r² values (r² Q95) for unlinked loci pairs in the entire germplasm set, yield type, and sugar type inbreds was 0.167, 0.117, and 0.071, respectively (Table 1). A total of 18.97%, 31.84%, and 32.02% of linked loci pairs in the entire germplasm set, yield type and sugar type inbreds, respectively, showed an r² level higher than the r² Q95 of unlinked loci pairs. A total of 0.93%, 6.22%, and 0.74% of r² values between linked loci pairs in the germplasm sets were larger than 0.8. LD decayed to the r² Q95 of unlinked loci pairs within 7.4 cM, 45.1 cM, and 20.6 cM for the entire germplasm set, yield type, and sugar type inbreds, respectively (Figure 5, Additional file 7). The average r² between marker loci within binned genetic distances decreased as the genetic distance intervals increased (Figure 6). When the intervals reached 15-20 cM, the average r² reached a plateau. For all intervals, the yield type inbreds showed higher r² values than the entire germplasm set and sugar type inbreds, while the latter two showed similar trends. The average r² for all linked loci pairs within 5 cM segments varied considerably across the genome (Additional file 8). The effective population sizes for the entire germplasm set, yield type, and sugar type inbreds were 52.7, 21.2, and 72.7, respectively, and these values varied considerably between the different linkage groups (Table 2).

Discussion
Comparison of different approaches for detecting population structure Knowledge about the patterns of population structure is essential for efficient germplasm organization. Therefore, various approaches have been developed for this purpose. The method implemented in the software STRUCTURE is one of the most frequently used approaches. However, when dealing with thousands of individuals and markers, the high computational requirements of STRUCTURE analyses make it impractical [36]. Instead, PCA, PCoA, as well as LAP have the potential to extract the fundamental structure of a dataset without assuming any population genetic model [18,19]. Furthermore, as these methods are not computationally intensive, they might be possible alternatives for detecting population structure. These approaches, however, do not allow direct statistical inferences about the number of subgroups. Furthermore, the assignment of inbreds to subgroups is not defined. MCLUST, however, can determine the number of subgroups as well as the cluster membership probabilities simultaneously without genetic assumptions [21]. Nevertheless, MCLUST applied directly to the raw marker data had only a low power to identify population structure in our study (data not shown). This might be due to the fact that many markers each explain only a small part of the population structure information. To overcome this problem, MCLUST was applied in our study to principal components (PC), principal coordinates (PCo), or lapvectors.
Models with 1 to 15 subgroups were examined by MCLUST based on 1-150 PC, PCo, and lapvectors. Our results suggested that the number of subgroups varied between one and nine (Additional file 3). The number of subgroups showed a high variability if fewer than 20 PC, PCo, or lapvectors were used, which together explained less than 75% of the variance. However, when the number of PC was higher than 50, the number of subgroups started to vary again (Additional file 4). The explanation for this observation is unclear and requires further research. These findings suggested that determining the number of subgroups using MCLUST applied to PC, PCo, or lapvectors is not straightforward and requires careful consideration of the number of dimensions used for the analyses.
When the number of subgroups was set to two according to the results of PCA, PCoA, and LAP, we observed for 10-40 PC, 10-50 PCo, and 1-100 lapvectors >95% correspondence with the germplasm type information (Additional file 4) and >90% correspondence with the assignment by STRUCTURE (data not shown). The above-mentioned methods also showed a high correspondence of assignment with each other (>85%; data not shown). These findings suggested that these methods might be time-saving alternatives to STRUCTURE analyses if the assignment of genotypes to subgroups is of interest and the number of subgroups is known.

Population structure of the elite sugar beet germplasm
Results of earlier studies revealed that cultivated sugar beet genotypes are genetically distinct from wild beet genotypes [37,9]. Moreover, the results of [6] indicated that the seed and pollen parent heterotic pools of cultivated sugar beet showed two distinct clusters after 40 years of recurrent reciprocal selection. Therefore, in our study, the population structure of one of these two heterotic pools, namely the pollen parent heterotic pool, was examined in further detail. The results of the STRUCTURE analysis revealed the presence of two subgroups in the entire pollen parent germplasm set (Additional file 1). This observation was in accordance with the clustering observed in the PCA, PCoA and LAP analyses as well as with the MCLUST analysis and with the number of examined germplasm types (Figure 2, Additional file 2). Furthermore, 99.6% of the inbreds in subgroup 1 based on the MCLUST analysis with 10 PCs were sugar types and 98.5% of the inbreds in subgroup 2 were yield types. The observed pattern of population structure might be explained by the fact that, due to a negative correlation between root yield and sugar content [7], selection on both traits in an originally undifferentiated population could lead to differentiated populations. The observation of distinct subgroups was further made possible by the occurrence of only few recombination events between the two germplasm types [8]. Nevertheless, we observed a higher average MRD among all the inbreds than between the two germplasm types. This observation indicated that higher variation existed within the populations than between the populations.
Our explanation is in accordance with the observation that the Illinois long-term selection experiment for grain protein (high vs. low protein) and oil concentration (high vs. low oil) in maize had led to phenotypically but also genotypically divergent populations [38]. Because the germplasm type information was in very good agreement with the molecular marker information, sugar type and yield type inbreds were the basis for all further analyses.

Comparison of different numbers of SNPs for detecting population structure
As the SNP number and selection strategy are expected to affect the estimates of population structure (cf. [14]), we examined these aspects in our study. The correspondence of assignment by MCLUST based on subsets of 9-252 SNPs vs. the whole SNP set improved with an increasing number of SNPs (Figure 3). Similarly, the CV of MRD estimates among all pairs of inbreds decreased with an increasing number of SNPs (Figure 4). This is due to the fact that a high number of SNPs provides a high precision for determining population structure as well as for measuring the genetic distance between inbreds. When the number of SNPs selected at random or in a stratified fashion reached about 100, the aforementioned trends of the correspondence as well as the CV reached a plateau and not much further improvement could be obtained by further increasing the number of SNPs. As the costs for genotyping also increase with an increasing number of SNPs, our results indicated that in the examined sugar beet germplasm about 100 SNPs would be required to determine the same population structure as the whole SNP set did, with a similar precision.
We observed a slightly higher correspondence (Figure 3) as well as a lower CV of MRD (Figure 4) for the stratified than for the random resampling strategy. This observation suggested that by choosing markers that are equally distributed across the genome, it is possible to reduce their number compared to randomly distributed markers while achieving the same level of precision in assigning inbreds to subgroups as well as in estimating MRD. An even higher correspondence can be obtained with the same number of markers if they are selected with respect to their PIC values (Figure 3). This observation suggested that with SNPs selected for a high PIC value, the number of SNP markers required to determine the same population structure could be further reduced.
The number of SNPs predicted in our study to be required for MRD estimates is considerably lower than that calculated for maize [12]. This observation might be explained by differences in the number of genotypes studied. [12] examined three times more genotypes than we did, which increases the number of markers required to unambiguously identify each genotype. Furthermore, [12] examined 25 times more SNPs than we did, which also increases the number of markers required to achieve a precision similar to that of the whole SNP set.
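Selecting markers by PIC can be illustrated as follows. The sketch assumes the standard definition of PIC, which for a bi-allelic SNP with allele frequencies p and q = 1 - p reduces to 1 - p² - q² - 2p²q². The marker names, the allele frequencies, and the helper `top_markers_by_pic` are hypothetical.

```python
def pic_biallelic(p):
    """Polymorphic information content of a bi-allelic marker (standard definition)."""
    q = 1.0 - p
    return 1.0 - p * p - q * q - 2.0 * p * p * q * q


def top_markers_by_pic(allele_frequencies, k):
    """Return the names of the k markers with the highest PIC.

    allele_frequencies: dict mapping marker name to the frequency of one allele.
    """
    ranked = sorted(
        allele_frequencies,
        key=lambda marker: pic_biallelic(allele_frequencies[marker]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical allele frequencies for three markers.
selected = top_markers_by_pic({"m1": 0.5, "m2": 0.9, "m3": 0.3}, 2)
```

PIC is largest for intermediate allele frequencies, so this strategy favours markers that split the germplasm into groups of similar size, which is consistent with such markers being more informative for clustering than an equally large random subset.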

Genome-wide distribution of genetic diversity
Elite sugar beet germplasm has been intensively selected since the middle of the last century [8]. Consequently, the genomic regions controlling traits of economic importance are expected to be shaped by this selection. Therefore, characterizing the genome-wide distribution of genetic diversity of elite sugar beet germplasm that has been selected for different traits, such as sugar content vs. root yield, might help to identify the genes controlling these traits. A similar approach has been successfully applied to identify a panel of known genes as well as some interesting candidate genes and QTLs in Holstein cattle [22].
We observed an average gene diversity of 0.338 for the entire germplasm set. This finding is in good accordance with the results of [37], where a gene diversity of 0.31 was observed in USDA sugar beet gene bank materials assessed with RAPD markers. In contrast, the gene diversity observed in our study was lower than the values reported earlier [26,9,6], where an average gene diversity of 0.51-0.62 was observed in weed beet and sugar beet populations using SSR markers. This difference might be explained by the examined marker types. SNP and RAPD markers are typically bi-allelic, whereas SSR markers are multi-allelic, which has the potential to increase gene diversity (cf. [12]).
The average gene diversity of the sugar type inbreds was higher than that of the yield type inbreds (Additional file 5). This observation might be explained by ascertainment bias during SNP development or by a higher selection intensity applied during breeding of yield type inbreds compared to sugar type inbreds. The latter explanation was supported by the fact that the effective population size Ne of the yield type inbreds was considerably lower than that of the sugar type inbreds (Table 2), which indicated stronger bottleneck effects for the yield types than for the sugar types. However, it should be noted that the calculation of Ne assumes idealized populations [34], and that when these idealizations are violated, as with selected populations or selected SNPs, the calculated Ne will deviate from the true value. Another reason for our finding of a higher gene diversity of the sugar type inbreds compared to the yield type inbreds might be that it is more difficult to introduce new germplasm from exotic sources into the yield types than into the sugar types.
The unequal distribution of genetic diversity across the genome could be explained by ascertainment bias during SNP development. More likely, however, this observation is due to the selection history of the different genome regions. Thus, the genome-wide distribution maps of genetic diversity (Additional files 5 and 6) might be a first step towards identifying the target genes or regions selected during breeding history. For example, genes related to sugar content and root yield might be present in the most divergent genomic regions between these two germplasm types. Common genes under selection in the breeding programs of both germplasm types (e.g. disease resistance genes) might be present in the genomic regions showing the same level of gene diversity and low MRD (Additional files 5 and 6).

Genome-wide distribution of LD and consequences for association mapping
The power and resolution of association mapping depend greatly on the genome-wide distribution of LD assessed with a high number of markers [39]. We observed that a total of 18.97%, 31.84%, and 32.01% of the linked loci pairs in the entire germplasm set, yield and sugar type inbreds, respectively, showed r² values higher than the significance threshold (Table 1). The percentages observed in our study were lower than those reported earlier [6]. In contrast, the values of our study were higher than those of earlier studies [26,27,9], where 1.1%-14.3% of the loci pairs were observed to be in significant LD. These differences might be explained by the facts that (i) different significance thresholds were used, (ii) a rather high marker density was applied in our study compared to earlier studies, (iii) different marker types were used in these studies, i.e. SNPs in our study vs. SSRs or RAPDs in other studies, and (iv) different plant materials were examined, i.e. homozygous elite inbreds of sugar beet in our study and [6] vs. random mating wild beets in other studies.
As r² between SNPs decayed with genetic map distance, we suggest that linkage between SNPs is an important factor influencing the patterns of LD in the studied germplasm. The r² reached the threshold of significant LD within 7.4 cM, 45.1 cM, and 20.6 cM for the entire germplasm set, yield type and sugar type inbreds, respectively. In addition, the average r² at binned genetic map distances reached a plateau at 15-20 cM for the entire germplasm set and the two germplasm types. The decay distance we observed was longer than that reported by [6], where r² declined to 0.1 at 10 cM, and that of [25], where only marker pairs <3 cM apart showed a high extent of LD. The difference might be due to (i) the rather high density of markers examined in our study compared with earlier studies and (ii) different regression methods used to measure the decay of LD. The slower LD decay observed for yield type inbreds than for sugar type inbreds, which might be due to the different selection history as outlined above, resulted in a smaller effective population size Ne calculated for the yield type inbreds than for the sugar type inbreds (Table 2). The results indicated that different numbers of markers are required for genome-wide association mapping in the different types of germplasm.
The high proportion of SNP loci pairs in significant LD as well as the decay of LD with distance suggested that association mapping is a tool applicable in the context of sugar beet breeding. However, both in the entire germplasm set and in the two germplasm types, we observed r² values >0.8 for only very few (0.74-6.22%) linked SNP pairs (Table 1). Such high r² values are required in order to allow the detection of marker-phenotype associations explaining less than 1% of the phenotypic variance [32]. This in turn indicates that for genome-wide association mapping in sugar beet, the number of markers has to be dramatically increased compared to the number applied in our study.
We observed different LD levels along the linkage groups of sugar beet (Additional file 8). This observation suggests that estimating the number of markers required for genome-wide association mapping from the genome-wide average of LD is dubious. In this case, important QTL might not be detected, as locally occurring low levels of LD decrease the power to detect them. Therefore, the genome-wide distribution of LD has to be considered when designing SNP genotyping arrays in the context of genome-wide association mapping. Furthermore, the LD patterns found in the pollen parent heterotic pool might not be the right information source for designing SNP genotyping arrays for other germplasm.

Conclusions
Based on different statistical methods, we identified two distinct subgroups in the elite sugar beet germplasm of the pollen parent heterotic pool, which is in accordance with its breeding history. MCLUST based on principal components, principal coordinates, or lapvectors might be an alternative method to STRUCTURE for population structure analysis. Gene diversity and MRD between the examined germplasm types varied considerably across the genome, which might be due to artificial selection. This observation could be used to identify candidate genes for the traits under selection using population genetics tools. Furthermore, similar approaches using sequences of wild and cultivated sugar beet genotypes might be used to identify the domestication genes. Because r² >0.8 is required to detect marker-phenotype associations explaining less than 1% of the phenotypic variance, our observation of a low proportion of SNP loci pairs fulfilling this criterion suggests that the number of markers has to be dramatically increased for genome-wide association mapping.