Genetic diversity and population structure
Our study provides an overview of genetic variation in US and CIMMYT spring and winter wheat cultivars using genome-wide distributed SNP markers. Here we confirmed the utility of the wheat OPA for genotyping large populations of hexaploid wheat lines. Most of the SNPs that were polymorphic within the complete set of 478 cultivars were also polymorphic in all subpopulations. Of the 849 polymorphic SNPs, 89% were polymorphic in both spring and winter wheat populations and from 70% to 85% were polymorphic across populations. Such a widespread distribution of polymorphic loci among populations suggests that the SNP discovery performed in a set of genetically diverse wheat landraces and wild emmer wheat [19, 36] was successful in recovering alleles represented in both growth habit groups. However, the distribution of MAF showed a higher proportion of medium-frequency alleles in the spring wheat than in the winter wheat population. Currently, it is not clear whether this observed bias is caused by historical events, such as demography or selection, or whether it is the result of the ascertainment schemes applied during the SNP discovery process. If the latter is true, the bias is likely small given the high proportion of polymorphic SNPs shared between spring and winter populations.
The proportion of genetic differentiation explained by growth habit (9.7%) was only slightly lower than the proportion of variation among subpopulations within the growth habit groups (12.9%). Historical gene flow between the spring and winter wheat groups during crop improvement and breeding may be responsible for the low level of genetic differentiation between these populations. The proportions of variance between growth habit groups and among populations were significantly lower than the within-population genetic variance, indicating that each of the breeding programs included in our study employs genetically diverse lines. These results also indicate that the polymorphic SNPs included in the wheat OPA are represented in most populations and, therefore, will be useful for genotyping diverse collections of wheat cultivars.
In spite of the high proportion of shared SNPs among populations and the small among-population genetic variance components, the model-based clustering approach was able to successfully assign cultivars to clusters. The clustering analysis performed using the whole-genome SNP set produced more genetically distinct clusters than clustering obtained with smaller sets of SNPs from the A-, B- or D-genomes. Although cultivars can be optimally clustered at the same value of K using the A- and D-genome SNP sets, the proportion of genetic ancestry of cultivars in these clusters varied among SNP sets, implying that the three wheat genomes have different degrees of genetic differentiation among breeding programs. This outcome may be a consequence of inadequate representation of SNP alleles within a particular genome in different populations or, alternatively, may reflect the different impact that demography, population and breeding history had on the genomes of wheat lines. Strong selection for adaptation to diverse environmental conditions, together with different founders and introgression histories, can modulate the differentiation of allelic frequencies among breeding populations and genomes and result in the slightly different clustering patterns obtained here using the three genome-specific SNP sets. A similar trend was documented for US, Australian and UK varieties using DArT markers, which showed that wheat genomes are differentiated in allelic frequency among national breeding programs.
Even when the full SNP data set was used, wheat cultivars in the 17 wheat populations rarely shared the same membership coefficient in the inferred clusters, reflecting the complexity of the breeding histories of the lines included in this analysis. Most wheat lines showed evidence of admixture, with portions of their genomes assigned to two to four different inferred clusters, which is an expected result of the frequent crosses used in wheat breeding programs between adapted germplasm and the donors of different traits. When the whole population was forced to divide into two groups (K = 2), the clusters aligned mainly by growth habit, with most spring and winter wheat cultivars being assigned to separate clusters. The same grouping by growth habit was also apparent when the analyses were performed separately for the A- and B-genome SNPs. However, the informativeness of D-genome SNPs for the separation of spring and winter varieties was low, likely a consequence of low allelic frequency differentiation between these two wheat groups, as evidenced by the low inter-population FST obtained for the D-genome.
Clustering using the combined SNP set showed that the inferred number of clusters in our population is smaller than the number of pre-defined breeding populations, largely because breeding programs from the same region tend to use cultivars of common ancestry. The spring wheat population included cultivars from breeding programs targeting more geographically separated areas, and these cultivars were also more genetically differentiated than those in the winter wheat population. A high level of genetic differentiation was observed between the populations originating from two major geographical locations: one including the northern states SD, MN and MT, and the other including Mexico and the western states WA, ID and CA. In contrast, the winter wheat populations, largely originating from the central states, showed higher levels of admixture and a lower extent of genetic differentiation.
Genetic differentiation of spring and winter wheat
The characterization of FST across chromosomes provided additional insights into the structure of genetic variation between the spring and winter wheat populations. Assuming the same evolutionary processes affect neutral loci, identifying genomic regions showing elevated FST between spring and winter wheat populations should make it possible to localize the targets of selection controlling the growth habit phenotype. However, the substantial heterogeneity of FST estimates for SNP loci across the wheat genome makes it impossible to use single-locus FST values for detecting past selection events. This problem was circumvented by calculating FST for a group of sequential SNP loci, which was shown to be an efficient strategy to reduce variation in FST estimates relative to estimates based on individual loci. The highest degree of genetic differentiation was identified for the loci mapped to wheat chromosome 5A, which probably results from the presence of the Vrn1 gene locus, the major gene involved in regulation of flowering time in wheat. This locus is responsible for most of the natural variation in growth habit in hexaploid wheat [65, 66]. Additional regions showing an unusually high level of genetic differentiation between spring and winter wheat lines were detected on chromosomes 2A, 2B, 6B and 7B (Table 5). Three out of seven regions with elevated FST were co-localized with previously mapped genes known to be involved in flowering time regulation. Some wheat chromosome 6B substitution lines are known to affect flowering time in the absence of vernalization, but since the responsible gene has not been mapped, it is not possible to determine whether this gene locus overlaps with the high-FST region identified on chromosome 6B in this study.
Although the distribution of empirical FST estimates cannot serve as a formal test for selection, this finding suggests that high FST genomic regions can harbor genes subject to diversifying selection providing good targets for further studies.
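The idea of computing FST over groups of sequential SNP loci, rather than locus by locus, can be illustrated with a minimal Python sketch. The estimator used here (a simple heterozygosity-based FST for two equally weighted populations, combined as a ratio of sums within each window), the window size and the function names are assumptions chosen for illustration; they are not necessarily the estimator or window definition used in this study.

```python
def fst_components(p1, p2):
    """Return (numerator, denominator) of a heterozygosity-based FST
    for one biallelic locus, given the allele frequency in each of
    two populations (equal population weights are assumed)."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)                             # expected heterozygosity, pooled
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population value
    return h_total - h_within, h_total


def windowed_fst(freqs_pop1, freqs_pop2, window=5):
    """Slide a window of `window` sequential loci (ordered by map
    position) along the chromosome and return one FST per window,
    computed as sum(numerators) / sum(denominators)."""
    comps = [fst_components(a, b) for a, b in zip(freqs_pop1, freqs_pop2)]
    values = []
    for start in range(len(comps) - window + 1):
        chunk = comps[start:start + window]
        num = sum(c[0] for c in chunk)
        den = sum(c[1] for c in chunk)
        # A window of monomorphic loci carries no information.
        values.append(num / den if den > 0 else float("nan"))
    return values


if __name__ == "__main__":
    # Hypothetical allele frequencies for six ordered loci.
    spring = [0.10, 0.15, 0.90, 0.85, 0.50, 0.45]
    winter = [0.12, 0.20, 0.10, 0.20, 0.55, 0.50]
    print(windowed_fst(spring, winter, window=3))
```

Pooling numerators and denominators within a window, instead of averaging per-locus ratios, keeps loci with very low heterozygosity from dominating the window estimate, which is one way to reduce the locus-to-locus variance mentioned above.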
The genetic differentiation of some of the genomic regions can also be due to structural rearrangements abundant in one of the populations. For example, chromosomal inversions are known to be a major barrier to gene flow between populations because recombination is limited near the affected genomic regions, and they are also one of the mechanisms facilitating reproductive isolation and species formation. Previously it was demonstrated that pericentromeric inversion polymorphisms are widespread in wheat. We found that one of these inversions overlaps with one of the regions with elevated FST detected on chromosome 6B. This structural rearrangement can potentially impact the frequency of allele exchange between the spring and winter wheat populations and contribute to the genetic differentiation of this genomic region.
Using genome-wide SNP data, we demonstrated an extensive amount of LD in the populations of wheat cultivars. The variation in the patterns of LD among the populations and wheat genomes reflects the complexity of the evolutionary and breeding history of wheat. The extent of LD and LD decay estimated using SNP loci combined from all three wheat genomes was similar in both spring and winter wheat populations. In the analyses using the individual genomes, the differences between spring and winter lines in LD decay to 50% were also very small, varying from no difference in the B-genome (7 cM in both) to 1.3 cM in the A-genome (6.3 cM in spring and 5 cM in winter).
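For readers who want to see what "LD decay to 50%" means operationally, the following minimal Python sketch finds the genetic distance at which a decay curve first falls to half of its initial value by linear interpolation. The input format (a pre-computed curve of mean LD per distance bin), the function name and the example values are assumptions for illustration; the curve-fitting procedure actually used in this study is not reproduced here.

```python
def half_decay_distance(distances_cm, mean_ld):
    """Return the distance (cM) at which LD first drops to 50% of its
    value at the smallest distance, using linear interpolation between
    adjacent points of the decay curve. Returns None if the curve never
    reaches the 50% level within the supplied range."""
    if len(distances_cm) != len(mean_ld) or len(mean_ld) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    target = 0.5 * mean_ld[0]
    for i in range(1, len(mean_ld)):
        d0, d1 = distances_cm[i - 1], distances_cm[i]
        r0, r1 = mean_ld[i - 1], mean_ld[i]
        if r1 <= target <= r0:
            if r0 == r1:
                return d0
            # Fraction of the interval travelled before hitting the target.
            return d0 + (r0 - target) / (r0 - r1) * (d1 - d0)
    return None


if __name__ == "__main__":
    # Hypothetical decay curve: mean LD at increasing genetic distances.
    distances = [0.0, 2.0, 5.0, 10.0, 20.0]
    ld_values = [0.80, 0.62, 0.45, 0.30, 0.15]
    print(half_decay_distance(distances, ld_values))
```

A smoothed or model-fitted curve could be passed in place of the binned means; the interpolation step is unchanged.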
Analyses of LD decay by breeding population showed similar profiles among populations except for the CIMMYT population, which had the lowest LD among completely linked loci and the slowest rate of LD decay. A possible explanation for this observation is the intensive usage of synthetic wheat lines in the CIMMYT program. Synthetic wheats are generated by hybridization of diverse tetraploid (A- and B-genomes) and Ae. tauschii (D-genome) accessions followed by chromosome duplication using colchicine. The synthetic wheats and their derivatives have greatly increased genetic diversity in hexaploid wheat, particularly in the D-genome [71–73]. It is well known that the introduction of new haplotypes from divergent populations can increase the extent of LD.
Depending on the genomic location of genes controlling important adaptive traits, these broad crosses can have a differential impact on LD in different genomes. For example, because the Vrn-A1 gene has a stronger effect than the Vrn-B1 gene, it has a higher number of widely distributed haplotypes and is thus more likely to have a stronger effect on LD. Therefore, the divergence in the extent of LD between wheat populations is probably related to unique breeding histories and selection pressures applied to genes located in the different genomes during the process of cultivar development.
A genetic bottleneck may also increase the level of LD [2, 74]. The last polyploidization event, which resulted in the origin of hexaploid bread wheat approximately 8,000-10,000 years ago, had a dramatic impact on the level of genetic diversity in the D-genome [19, 75], suggestive of a strong population bottleneck. We hypothesize that the longer extent of significant LD in the D-genome compared to that in the A- and B-genomes in both spring and winter wheat populations can mostly be explained by this polyploidization event. However, the difference in LD between the D-genome and the A- and B-genomes was not as large in spring wheat as in winter wheat. This result can probably be explained by 1) the larger number of breeding cycles involved in the development of spring wheat cultivars than in the development of winter wheat cultivars, and/or by 2) the inclusion of synthetic-derived wheat cultivars in the CIMMYT spring population.
Rates of LD decay varied among populations, but as expected, individual populations showed higher overall levels of LD than the combined datasets. These higher LD levels were also reflected in elevated levels of long-range LD extending above 10 cM. Interestingly, across all populations, LD decayed to 50% of its initial value within relatively narrow genetic intervals ranging from 6 to 9 cM. This rate of LD decay is probably associated with the high level of genetic diversity used in the individual breeding programs. The cultivars in all of these programs captured a comparable number of recombination events, resulting in fast erosion of LD. However, each population showed variation in the extent of long-range LD, which was highest in the SD winter wheat population and the WB spring wheat population. As pointed out earlier, these differences are probably the consequence of breeding history and selection specific to each breeding program.
The comparison of the LD levels obtained in our study with results obtained in other studies dealing with wheat and other inbreeding crops was complicated by differences in the type of markers used for genotyping and by sample size variation among studies. Both factors can impact LD estimates. Previously reported LD estimates in wheat were obtained using more polymorphic SSR markers [22, 24, 25]. In a sample of 43 U.S. spring and winter wheat cultivars it was shown that 70 out of 123 SSR loci (57%) with significant LD were linked at <10 cM. In our study 86% of SNP loci (211/246) showing significant LD in the combined population of spring and winter wheat were located at less than 10 cM. The larger proportion of alleles with significant LD at less than 10 cM detected in our study is most likely due to the difference in the sample sizes used to estimate significant LD in the two studies (478 vs. 43 lines). The extent of significant LD in our population was more than 4 times higher than the SNP-based estimates obtained for a population of 91 European spring and winter cultivated barley. These results suggest that the genetic diversity and number of recombination events in European barley germplasm are significantly higher than in the sample of U.S. and CIMMYT wheat cultivars. Therefore, association mapping studies in wheat would require a smaller number of markers per unit of genetic distance than needed in cultivated barley.
Variation in the extent of LD along the chromosome affects the number of tagSNPs (a subset of SNPs that capture a large fraction of the allelic variation of all SNP loci) required in each genomic region to ensure that causal mutations are in LD with neighboring SNPs. The interaction of many factors affecting the rate of LD decay in the different parts of the genome complicates the determination of the number of tagSNPs required to gain sufficient power for genome-wide association mapping. Estimates of this number for autogamous plant species varied from 9,600 to 29,400 SNPs for soybean cultivars to 250,000 for the more diverse Arabidopsis natural populations. LD values of 0.8 or higher have been recommended as an acceptable threshold for tagSNP selection. In our study, for loci located from 0.0 to 0.2 cM apart, the median LD was approximately 0.8. If markers are evenly distributed at 0.2 cM intervals, a causative mutation would lie within about 0.1 cM of one of the flanking markers and have an approximate LD of 0.8 with it. In a 3,500 cM hexaploid wheat map, placing markers at 0.2 cM intervals will require at least 17,500 markers. This number would vary depending on whether more liberal or more conservative LD thresholds were selected.
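The marker-number estimate above is simple arithmetic (map length divided by marker spacing), and a short Python sketch makes it easy to see how the requirement changes with the chosen spacing. The function names are hypothetical, and the sketch assumes perfectly even marker spacing across the whole map, which is an idealization.

```python
import math


def markers_needed(map_length_cm, spacing_cm):
    """Minimum number of evenly spaced markers needed to cover a genetic
    map of the given length at the given spacing (both in cM)."""
    if map_length_cm <= 0 or spacing_cm <= 0:
        raise ValueError("map length and spacing must be positive")
    # Round before taking the ceiling to absorb floating-point noise.
    return math.ceil(round(map_length_cm / spacing_cm, 6))


def max_distance_to_marker(spacing_cm):
    """Largest possible distance (cM) from any point on the map to its
    nearest marker when markers are evenly spaced."""
    return spacing_cm / 2.0


if __name__ == "__main__":
    # Values from the text: 3,500 cM map, markers every 0.2 cM.
    print(markers_needed(3500, 0.2))      # 17500
    print(max_distance_to_marker(0.2))    # 0.1
    # Hypothetical alternative spacings, for comparison only.
    for spacing in (0.1, 0.5, 1.0):
        print(spacing, markers_needed(3500, spacing))
```

Halving the spacing doubles the marker requirement, so the choice of LD threshold, which determines the acceptable spacing, dominates the cost of a genome-wide panel.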
The evolutionary history of an allele also has a strong impact on the probability of detecting marker-trait associations. Alleles of loci that are involved in local adaptation and subjected to recent selection can be more readily detected using an even more sparsely distributed set of markers. For example, marker-trait associations of alleles involved in the regulation of flowering time in Arabidopsis and cultivated barley were detected using a relatively low number of SNP markers. Genome-wide re-sequencing efforts similar to the ones performed for Arabidopsis, rice and maize will be required to provide comprehensive information for tagSNP selection in wheat. These efforts will also need to be complemented by assessment of the portability of selected tagSNPs to other populations. Otherwise, inadequate genome coverage may result in failure to identify critical associations [78, 80]. The possibility of performing GWAM in the large polyploid wheat genome will be tested in the future using a larger panel of up to 9,000 genome-wide distributed SNP markers currently under development.