
Genetic diversity and population structure of wheat landraces in Southern Winter Wheat Region of China



Wheat landraces are considered a valuable source of genetic diversity for breeding programs. Evaluating this genetic diversity is useful for breeding approaches such as marker-assisted selection (MAS), genome-wide association studies (GWAS), and genomic selection. In addition, constructing a core germplasm set that represents the genetic diversity of the entire collection is of great significance for the efficient conservation and utilization of wheat landrace germplasm.


To understand the genetic diversity in wheat landraces, 2,023 accessions from the Jiangsu Provincial Crop Germplasm Resource Bank were genotyped with the Illumina 15K single nucleotide polymorphism (SNP) chip to explore their molecular diversity and population structure. These accessions were divided into five subpopulations based on population structure, principal component and kinship analyses. Analysis of molecular variance (AMOVA) revealed significant variation both within and among the subpopulations. Subpopulation 3 showed the greatest genetic variability based on the allelic pattern indices (Na, Ne and I). The M strategy as implemented in MStrat v4.1 software was used to construct a representative core collection. A core collection of 311 accessions (15.37%) was selected from the entire landrace germplasm based on genotype and 12 phenotypic traits. Compared with the initial landrace collection, the core collection displayed higher gene diversity (0.31) and polymorphism information content (PIC) (0.25), and represented almost all of the phenotypic variation.


A core collection comprising 311 accessions containing 100% of the genetic variation in the initial population was developed. This collection provides a germplasm base for effective management, conservation, and utilization of the variation in the original set.


Background

Wheat is one of the most important staple crops for more than one-third of the human population, providing about 19% of the calories and 21% of the protein [1]. Approximately 90 to 95% of the wheat grown worldwide is bread wheat (Triticum aestivum L.; 2n = 6x = 42, AABBDD) [2]. Multiple rounds of rare natural hybridization between different wheat species and relatives gave rise to the currently cultivated wheat, but also caused genetic bottlenecks through the exclusion of adaptive alleles [3, 4]. Modern cultural practices, together with improved cultivars able to take advantage of them, have significantly increased wheat production. However, the development of high-yielding modern cultivars has come at the expense of much of the diversity present in landraces and older varieties. In the last century, wheat landraces were almost completely replaced by modern cultivars, reducing the overall diversity of the species [5].

Wheat landraces show much higher genetic diversity than elite varieties [6]. Potentially valuable traits in landraces include early growth vigour [7], cold, heat or drought tolerance [8,9,10], disease resistance, water use efficiency [11], and quality traits suited to local food preferences. Using landrace populations in breeding programs to develop new cultivars is a feasible strategy to improve wheat productivity and stability, especially in vulnerable environments.

Scientists have long been conscientious in conserving wheat landraces. Large numbers of landraces have been collected, conserved, studied and analyzed, and the potential for incorporating their beneficial traits into new varieties has been explored [5]. The Turkish scientist Gökgöl collected and characterized 18,000 wheat landraces from Türkiye, 256 of which were new varieties [12]. More than 60 distinct wheat landraces were collected in five mountainous regions of Tajikistan [13], and over 30 bread wheat landraces were collected from three regions of the western Tian-Shan mountains in Uzbekistan [14]. These landraces were thoroughly phenotyped, genotyped, conserved in gene banks and used in wheat breeding. In China, nearly 13,900 wheat landraces from different geographic and climatic conditions are conserved in the National Gene Bank [15]. Chinese wheat landraces are characterized by earliness, large numbers of grains per spike, high adaptability, and a long history of cultivation [16].

Three strategies have been applied in previous studies to represent and exploit the diversity of landraces: (1) measuring diversity and developing a core collection from extensive collections to represent the overall genetic diversity with minimal repetition; (2) exploiting the most favorable alleles of important traits in breeding programs; and (3) retaining phenotypic variation and the related genetic associations for targeted traits through large-scale, precise phenotypic analysis combined with GWAS [17]. According to Frankel et al. [18], a core collection represents the genetic variation of an entire collection with minimal redundancy, and facilitates the maintenance, research and utilization of germplasm resources.

During the last few decades, several core collections of wheat have been constructed, and they have played an important role in the conservation and improved use of wheat genetic resources. A worldwide bread wheat core collection of 372 accessions (372CC) was selected using a set of 38 simple sequence repeat (SSR) markers [19]. Hao et al. [20] established a mini-core collection of 231 Chinese wheat accessions with an estimated 70% representation of the genetic variation of the initial collection using 78 SSR markers. Mourad et al. [21] used 36,720 SNP markers to analyze the genetic diversity and population structure of a spring wheat core collection of 103 accessions representing worldwide germplasm.

Wheat is grown in ten agro-ecological zones in China, which vary widely in climate, soil, cultivar adaptation and management. Adaptation to these different environments gave rise to diverse landraces. China has rich genetic resources of wheat landraces, which are important for production and breeding. In this study, wheat landraces collected from 2008 to 2014 at the Jiangsu Academy of Agricultural Sciences, Nanjing, China, were morphologically described and genomically characterized to create opportunities for their use in breeding. In total, 2,023 wheat landraces collected from 23 administrative districts were evaluated for agronomic traits in field trials, and their genetic diversity was analyzed using the Illumina 15K SNP chip. Analyses of the polymorphic markers provided kinship information among groups, the population structure of the accessions, and the genetic properties of the subpopulations. We also established a core collection to reduce redundancy in the collection. This core collection will be useful for further utilization of this large set of landraces.

Methods

Plant material

We used 2,023 wheat landrace accessions conserved at the Gene Bank of the Institute of Germplasm Resources and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, China. These accessions were collected from 23 provinces in China (Fig. 1). Details of all 2,023 wheat landrace accessions are given in Additional file 1: Table S1. Of these, 937 (46.32%) accessions were obtained from Jiangsu Province. The traits of all accessions were evaluated in field trials.

Fig. 1

Geographic locations of 2,023 wheat landraces. Red stars represent the geographic distribution of the core collection (311 accessions), and blue dots represent the geographic distribution of the other 1,712 accessions. The core collection is a subset of the original set (2,023 accessions).

Phenotyping and data analysis

The 2,023 wheat landrace accessions were evaluated for twelve agronomic traits in two environments: 1,526 accessions in Luhe in 2018 and 497 in Luhe in 2019. The traits included heading date and flowering date (related to maturity); awn type, glume color, spike type, plant height and spike length (related to plant morphology); and spikelet number per spike, sterile spikelet number per spike, grain number per spikelet, grain number per spike and thousand kernel weight (related to yield). A brief description of each trait and its data scoring is presented in Additional file 2: Table S2. Phenotypic diversity (\({H}^{{\prime }}\)) was calculated as the Shannon index, \({H}^{{\prime }}=-{\sum }_{i=1}^{n}{P}_{i}\ln {P}_{i}\), where \(n\) is the number of phenotypic classes for a character and \({P}_{i}\) is the proportion of the total number of entries in the \(i\)th class [22]. \({H}^{{\prime }}\) was estimated for each of the twelve agronomic traits.
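As a minimal sketch of the Shannon index above (not the authors' script), the function below computes \({H}^{{\prime }}\) from the class label scored for each accession. It assumes that quantitative traits such as plant height have already been binned into phenotypic classes; the binning scheme and the example awn-type scores are hypothetical.

```python
import math
from collections import Counter


def shannon_index(phenotypes):
    """Shannon index H' = -sum_i P_i ln(P_i) over the observed phenotypic classes."""
    counts = Counter(phenotypes)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical awn-type scores for six accessions.
awn_type = ["awned", "awned", "awnless", "awnless", "tip-awned", "tip-awned"]
print(round(shannon_index(awn_type), 4))  # 1.0986, i.e. ln 3 for three equal classes
```

\({H}^{{\prime }}\) is 0 when every accession falls into one class and reaches \(\ln n\) when all \(n\) classes are equally frequent, so values are only comparable between traits scored with the same number of classes.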

SNP genotyping

DNA samples were genotyped with the 15K Axiom® Wheat Breeder Genotyping Array (China Golden Marker Biotechnology Co., Ltd, Beijing) according to the manufacturer’s guidelines. The array comprised 13,947 SNP markers. Quality filtering was performed using PLINK v1.07 [23]: markers with a minor allele frequency (MAF) below 5% (--maf 0.05) or more than 5% missing data (--geno 0.05), and individuals with more than 20% missing SNP calls (--mind 0.2), were removed. Physical map positions of all SNP markers were obtained from the Ensembl Plants Triticum aestivum database, and markers lacking a consensus chromosome location were removed. Finally, 7,926 SNP markers and 2,023 genotypes were retained for further analysis.
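The three PLINK thresholds can be illustrated with a short Python sketch. It is not a replacement for PLINK, which applies these filters itself through --mind, --geno and --maf. The genotype coding (alternate-allele dosage, None for a missing call) and the filtering order (individuals first, then markers on the retained individuals) are assumptions made for illustration; PLINK's internal order may differ.

```python
def filter_genotypes(geno, mind=0.2, max_missing=0.05, min_maf=0.05):
    """Return (retained individual indices, retained marker indices).

    geno[i][j] is the alternate-allele dosage (0, 1 or 2) of individual i at
    marker j, or None for a missing call.
    """
    n_markers = len(geno[0])
    # Individuals: drop those with more than `mind` missing calls (cf. --mind).
    keep_ind = [i for i, row in enumerate(geno)
                if sum(g is None for g in row) / n_markers <= mind]
    keep_snp = []
    for j in range(n_markers):
        calls = [geno[i][j] for i in keep_ind]
        observed = [g for g in calls if g is not None]
        # Markers: drop those with more than `max_missing` missing data (cf. --geno).
        if not observed or (len(calls) - len(observed)) / len(calls) > max_missing:
            continue
        # Markers: drop those with minor allele frequency below `min_maf` (cf. --maf).
        alt_freq = sum(observed) / (2 * len(observed))
        if min(alt_freq, 1 - alt_freq) < min_maf:
            continue
        keep_snp.append(j)
    return keep_ind, keep_snp


# Hypothetical 4 x 5 genotype matrix (rows: accessions, columns: SNPs).
geno = [
    [0, 0, 2, None, 1],
    [1, 0, 2, 0, 1],
    [2, 0, None, None, 1],
    [0, 0, 1, 1, 1],
]
print(filter_genotypes(geno))  # ([0, 1, 3], [0, 2, 4])
```

In the toy matrix, the third accession is dropped for 40% missing calls, the second marker because it is monomorphic, and the fourth because one of the three remaining calls is missing.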

Analysis of genetic diversity

Parameters measuring the genetic diversity of the population such as PIC, gene diversity, heterozygosity (H) and MAF were calculated using PowerMarker V3.25 [24]. Other parameters such as average pairwise divergence or observed nucleotide diversity (π), expected nucleotide diversity or estimated mutation rate (θ) [25] and Tajima’s D [26] were calculated using TASSEL v5.2.65 [27].
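For a biallelic SNP with allele frequencies p and 1 − p, gene diversity is 1 − Σpᵢ², and PIC (Botstein et al., 1980) additionally subtracts 2p²(1 − p)². The sketch below assumes these textbook forms without any small-sample correction that PowerMarker may apply. With MAF ≥ 0.05 after filtering, they bound gene diversity to about 0.095–0.5 and PIC to about 0.09–0.375, which is consistent with the ranges reported for these landraces.

```python
def gene_diversity(p):
    """Gene diversity 1 - sum(p_i^2) of a biallelic SNP with allele frequency p."""
    q = 1 - p
    return 1 - (p * p + q * q)


def pic(p):
    """Botstein et al. (1980) PIC: 1 - sum p_i^2 - sum_{i<j} 2 p_i^2 p_j^2.

    With two alleles the second sum has a single term, 2 p^2 q^2.
    """
    q = 1 - p
    return gene_diversity(p) - 2 * (p * p) * (q * q)


for p in (0.5, 0.05):
    print(p, round(gene_diversity(p), 4), round(pic(p), 4))
# 0.5 0.5 0.375
# 0.05 0.095 0.0905
```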

AMOVA and the estimation of genetic indices were performed using GenAlEx 6.41. The genetic indices calculated included the fixation index (FST), number of different alleles (Na), number of effective alleles (Ne), Shannon’s index (I), observed heterozygosity (HO), expected heterozygosity (HE), and inbreeding coefficient (F).
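The per-locus indices listed above can be sketched for a single biallelic SNP in one subpopulation using the standard definitions Ne = 1/Σpᵢ², I = −Σpᵢ ln pᵢ, HE = 1 − Σpᵢ² and F = (HE − HO)/HE. This is an illustrative reimplementation, not GenAlEx itself; the dosage coding is an assumption, and GenAlEx reports these values averaged over loci.

```python
import math


def locus_indices(genotypes):
    """Na, Ne, I, HO, HE and F for one biallelic locus in one subpopulation.

    `genotypes` lists alternate-allele dosages (0, 1 or 2), one per
    individual, with missing calls already removed.
    """
    n = len(genotypes)
    p = sum(genotypes) / (2 * n)                    # alternate-allele frequency
    freqs = [f for f in (p, 1 - p) if f > 0]        # alleles actually present
    sum_sq = sum(f * f for f in freqs)
    he = 1 - sum_sq                                 # expected heterozygosity
    ho = sum(g == 1 for g in genotypes) / n         # observed heterozygosity
    return {
        "Na": len(freqs),                           # number of different alleles
        "Ne": 1 / sum_sq,                           # number of effective alleles
        "I": -sum(f * math.log(f) for f in freqs),  # Shannon's information index
        "HO": ho,
        "HE": he,
        "F": (he - ho) / he if he > 0 else math.nan,  # fixation index
    }


# A fully homozygous sample at p = 0.5, as expected in inbred landraces.
print(locus_indices([0, 0, 2, 2]))
```

A sample made entirely of the two homozygotes gives HE = 0.5, HO = 0 and F = 1, which is why largely self-pollinated landrace subpopulations show high F values.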

Inference of structure, PCA and kinship

To determine the population structure, the filtered marker set (7,926 SNPs) was pruned using the linkage disequilibrium (LD)-based pruning method in PLINK (--indep-pairwise 10 5 0.3). Population structure was inferred with the Bayesian model-based clustering method in STRUCTURE 2.3.4 [28] using the pruned markers (2,228). STRUCTURE was run under the admixture model with a burn-in period of 100,000 followed by 100,000 Markov Chain Monte Carlo replications. Three independent runs were performed for each number of clusters (K) from 1 to 10. The most likely number of subpopulations was determined with the web-based STRUCTURE HARVESTER, using the ΔK statistic, which is based on the second-order rate of change in the log-likelihood of the data between successive K values [29, 30]. CLUMPP software was used to generate a consolidated membership coefficient (Q) matrix from the STRUCTURE runs for the best K value. Lines with a membership probability ≥ 0.6 were assigned to a subgroup. Pairwise genetic distances were calculated using PowerMarker V3.25 under Nei’s (1983) model [31]. PCA was performed using TASSEL on 7,926 SNP markers. A relative kinship matrix was constructed with TASSEL 5.0 [27], and a heat map was generated in R. The geographic structure of the population was studied through PCA performed on the correlation matrix calculated with the mean country data across years for landraces and the mean data across years for modern cultivars.
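The ΔK statistic of Evanno et al. can be sketched as below. The sketch assumes consecutive K values and at least two runs per K, and it computes the second-order difference on the mean LnP(D) per K, which is one common way to implement it; STRUCTURE HARVESTER's exact averaging may differ. The log-likelihood values in the example are hypothetical.

```python
import statistics


def evanno_delta_k(lnp_by_k):
    """Delta K (Evanno et al. 2005) from STRUCTURE runs.

    `lnp_by_k` maps consecutive K values to lists of LnP(D), one per run.
    Delta K is returned only for K values that have both neighbours.
    """
    ks = sorted(lnp_by_k)
    mean = {k: statistics.mean(lnp_by_k[k]) for k in ks}
    delta = {}
    for k_prev, k, k_next in zip(ks, ks[1:], ks[2:]):
        # |L''(K)| = |L(K+1) - 2 L(K) + L(K-1)|, computed on the run means.
        second_diff = abs(mean[k_next] - 2 * mean[k] + mean[k_prev])
        delta[k] = second_diff / statistics.stdev(lnp_by_k[k])
    return delta


# Hypothetical LnP(D) values from three runs per K.
runs = {
    1: [-1000, -1001, -999],
    2: [-500, -501, -499],
    3: [-480, -481, -479],
    4: [-470, -471, -469],
}
delta = evanno_delta_k(runs)
print(delta, max(delta, key=delta.get))  # {2: 480.0, 3: 10.0} 2
```

ΔK peaks where the likelihood stops rising steeply, which is why a sharp elbow at K = 2 can dominate even when finer structure exists at higher K.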

Construction of the core collection

The minimal size of the core collection was estimated using MStrat software v4.1 [32], with three replicates of 30 iterations each and a step size of 1. The core collection size was determined by comparing the maximization (M) and random (R) sampling strategies.
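The M strategy in MStrat searches for the subset of accessions that maximizes allelic richness. The greedy loop below is a simplified stand-in for that idea, not MStrat's actual iterative algorithm: it repeatedly adds the accession contributing the most alleles not yet captured. The accession names and allele sets in the example are hypothetical.

```python
def greedy_core(accessions, size):
    """Greedy allele-coverage core selection (simplified M-strategy stand-in).

    `accessions` maps accession names to sets of (marker, allele) pairs.
    Returns the selected names and the fraction of all alleles they capture.
    """
    all_alleles = set().union(*accessions.values())
    remaining = dict(accessions)
    covered, core = set(), []
    while remaining and len(core) < size:
        # Pick the accession adding the most uncaptured alleles; iterating in
        # sorted order makes max() break ties alphabetically.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        covered |= remaining.pop(best)
        core.append(best)
    return core, len(covered) / len(all_alleles)


# Hypothetical panel: four accessions genotyped at two SNPs.
panel = {
    "LR1": {(0, "A"), (1, "C")},
    "LR2": {(0, "G"), (1, "C")},
    "LR3": {(0, "A"), (1, "T")},
    "LR4": {(0, "G"), (1, "T")},
}
print(greedy_core(panel, 2))  # (['LR1', 'LR4'], 1.0)
```

Two of the four accessions already capture every allele, which illustrates why an M-type core can hold all alleles of the full set at a fraction of its size.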

Results

Genetic diversity of the landrace germplasm

The total number of putative SNPs called from the 2,023 wheat landraces was 13,199. After filtering, 7,926 SNP markers were used for genetic diversity and population structure analyses. The B genome had the highest number of SNPs (3,218, 40.60%), followed by the A genome (3,022, 38.13%) and the D genome (1,686, 21.27%) (Fig. 2; Table 1). The number of SNPs per chromosome ranged from 121 to 715, with an average of 377. In the A genome, chromosome 2A had the highest number of polymorphic markers (692) and chromosome 3A the lowest (277); in the B genome, the highest and lowest numbers of markers were detected on chromosomes 3B and 4B (715 and 191, respectively); in the D genome, chromosome 4D had the lowest number of SNPs (121) and chromosome 3D the highest (316). To characterize the distribution of SNPs in more detail, we plotted the number of SNPs in 1-Mb windows along each chromosome (Additional file 3: Fig. S1). The number of SNPs on each chromosome was consistent with the physical length of the respective chromosome. The average marker density was approximately 1.77 Mb/SNP. The D genome had the lowest marker density (2.34 Mb/SNP), and the B genome had the highest (1.61 Mb/SNP) (Table 1).
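The genome shares and the per-chromosome mean above follow directly from the SNP counts, as the short check below shows. Marker density (Mb/SNP) also needs the assembly length of each subgenome, which is not given here, so the sketch leaves it as an optional, caller-supplied argument.

```python
def genome_summary(snp_counts, genome_lengths_mb=None):
    """Share of SNPs per subgenome (%) and, if lengths are given, Mb per SNP."""
    total = sum(snp_counts.values())
    summary = {}
    for genome, n in snp_counts.items():
        share = round(100 * n / total, 2)
        density = None
        if genome_lengths_mb is not None:
            density = round(genome_lengths_mb[genome] / n, 2)
        summary[genome] = (share, density)
    return summary


counts = {"A": 3022, "B": 3218, "D": 1686}   # SNP counts reported above
print(genome_summary(counts))
# {'A': (38.13, None), 'B': (40.6, None), 'D': (21.27, None)}
print(round(sum(counts.values()) / 21))      # mean SNPs per chromosome: 377
```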

Fig. 2

Distribution of 7,963 SNPs across 21 chromosomes of 2,023 wheat landraces

Table 1 Summary of genetic diversity among 2,023 landrace accessions. The parameters include number of SNP markers (N), marker coverage, minor allele frequency (MAF), genetic diversity (Hs), heterozygosity (H), polymorphic information content (PIC), nucleotide diversity (π/bp), expected nucleotide diversity (θ/bp) and Tajima’s D

Summary statistics of the various genetic diversity estimates were similar for each genome of the 2,023 wheat landraces (Table 1). Gene diversity (Hs) ranged from 0.10 to 0.50, with the lowest chromosome mean on 3A (0.23) and the highest on 6A (0.36). Among the three genomes, the B genome showed the highest mean diversity (0.32). The largest proportion of markers (28.24%) had Hs above 0.4, and the smallest proportion (2.03%) had Hs below 0.1. PIC ranged from 0.09 to 0.38. The mean PIC per chromosome followed a trend similar to Hs, ranging from 0.20 (3A) to 0.29 (6A). At the genome level, the mean PIC of both the A (0.23) and B (0.24) genomes was lower than that of the D genome (0.26). MAF between 0.3 and 0.5 was observed in 25.18% of the markers, whereas MAF below 0.1 was observed in 29.35%.

The observed nucleotide diversity, or average pairwise divergence (π/bp), ranged from 0.23 (7A) to 0.35 (3A), with an average of 0.30. The expected nucleotide diversity, or expected number of polymorphic sites (θ/bp), was similar across chromosomes, with an average of 0.12. Tajima’s D ranged from 3.89 (A genome) to 4.50 (D genome), with an average of 4.08. This value deviates significantly from the neutral expectation (D = 0), suggesting that the population may have undergone balancing selection. A positive D also indicates a deficit of rare alleles, that is, an excess of alleles at intermediate frequencies in the population.
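To make the sign of Tajima’s D concrete, the sketch below implements Tajima’s (1989) statistic from the mean number of pairwise differences (π), the number of segregating sites (S) and the sample size (n). D is positive whenever π exceeds Watterson’s estimator S/a₁, which happens when intermediate-frequency variants contribute more to π than expected under neutrality. The function works on totals rather than TASSEL’s per-bp values, and the example numbers are hypothetical.

```python
import math


def tajimas_d(pi, s, n):
    """Tajima's (1989) D.

    pi: mean number of pairwise differences between sequences
    s:  number of segregating sites
    n:  number of sampled sequences (assumed n >= 4 and s >= 1)
    """
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical sample: 10 sequences, 20 segregating sites.
print(tajimas_d(10.0, 20, 10) > 0)  # True: pi above Watterson's theta
```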

Population structure of the landrace accessions

The population structure of the 2,023 accessions was analyzed using 7,963 high-quality SNPs, and STRUCTURE software was used to infer the number of subpopulations. ΔK was plotted against the number of clusters (K) to determine the optimum number of subpopulations. The largest ΔK value was observed at K = 2, suggesting the presence of two main groups (Fig. 3a). The membership percentage of each accession in the two groups is presented in Additional file 4: Table S3. Using a membership probability threshold of 60%, 1,726 and 184 accessions were assigned to subgroups G1 and G2, respectively, and the remaining 76 accessions were placed in a mixed subgroup (Gmix) (Fig. 3b). The main groups were further subdivided into five subpopulations, Sub1 to Sub5 (Fig. 3b). Sub1 included 84 accessions (38.10% from Jiangsu and 16.67% from Sichuan); Sub2 included 528 accessions (39.02% from Jiangsu, 21.02% from Zhejiang and 10.98% from Shanghai); Sub3 included 115 accessions (40.00% from Jiangsu and 11.30% from Guizhou); Sub4 included 387 accessions (23.00% from Jiangsu, 19.64% from Henan, 18.60% from Sichuan and 16.80% from Guizhou); and Sub5 included 292 accessions, 92.47% of which were from Jiangsu. The remaining 617 accessions, accounting for 30.50% of all germplasm, were classified as Mix because their membership probabilities were lower than 0.60 for every subgroup (Additional file 5: Table S4).
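The 0.60 assignment rule can be sketched as follows: each accession goes to the subgroup with its largest membership coefficient in the consolidated Q matrix, or to Mix if that coefficient is below the threshold. The Q values and labels in the example are hypothetical.

```python
def assign_groups(q_matrix, labels, threshold=0.6):
    """Assign accessions to their highest-membership subgroup, or 'Mix'."""
    assigned = []
    for q in q_matrix:
        best = max(range(len(q)), key=lambda k: q[k])
        assigned.append(labels[best] if q[best] >= threshold else "Mix")
    return assigned


# Hypothetical membership coefficients for K = 2.
q_matrix = [[0.92, 0.08], [0.35, 0.65], [0.55, 0.45]]
print(assign_groups(q_matrix, ["G1", "G2"]))  # ['G1', 'G2', 'Mix']
```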

Fig. 3

Representation of the genetic structure of 2,023 landraces based on population structure analysis and principal component analysis (PCA). a Estimated ΔK of 2,023 landraces over three runs for each K value. b Estimated population structure of the 2,023 landraces assessed with STRUCTURE software. Each individual is represented by a thin vertical bar, partitioned into up to K colored segments. Sub1, Sub2, Sub3, Sub4, Sub5 and Mix are the subgroups identified by STRUCTURE, with accessions assigned according to their maximum membership probability. c PCA plot of the accessions colored by subgroup; the different colored dots represent the subgroups inferred by STRUCTURE analysis. d Heatmap of the kinship matrix based on the 7,962 SNP markers

PCA based on 2,228 SNP markers showed a similar five-cluster distribution pattern, with the mixed subgroup located in the middle of the five defined subgroups (Fig. 3c). The first three principal components explained 29.36, 12.09 and 7.74% of the total variation, respectively. Overall, five clusters were clearly identified by PCA, in agreement with the results from STRUCTURE. We also performed a kinship analysis to examine genetic clustering among the landraces, and a heat map of the kinship values was generated in R (Additional file 6: Table S5). The kinship analysis indicated five clusters, with most accessions (blue) showing close familial relationships (Fig. 3d).

Genetic differentiation of populations

F-statistics were calculated for the 1,406 accessions remaining after removal of the 617 Mix accessions. Biallelic data per locus were used for the analysis, and the number of effective alleles (Ne) exceeded 1.3 in every subpopulation except Sub1. Expected heterozygosity (HE) and Shannon’s diversity index (I) were the most discriminating measures of differences among the five subgroups, with average values of 0.20 and 0.31, respectively (Table 2). Sub3 showed the highest genetic variability (HE = 0.36; I = 0.53), whereas Sub1 showed the lowest (HE = 0.05; I = 0.09). The inbreeding coefficients (F) of Sub2, Sub3, Sub4 and Sub5 were > 0.7, whereas that of Sub1 was considerably lower (0.30). Sub1 also had the lowest observed heterozygosity (HO) of all subpopulations.

Table 2 Diversity based on SNPs among the five subgroups

Fixation index (FST) values, a measure of genetic differentiation between populations, revealed that the highest differentiation was between Sub1 and Sub3 (FST = 0.47) and the lowest between Sub2 and Sub5 (FST = 0.11) (Table 3). Sub3 showed the strongest genetic differentiation from the other subpopulations.
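As a hedged illustration of what an FST value measures, the sketch below computes a Nei \(G_{ST}\)-style estimator, \((H_T - H_S)/H_T\), for two subpopulations from allele frequencies at shared biallelic loci. GenAlEx may use a different estimator, so this is not expected to reproduce Table 3 exactly; the frequencies in the example are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) of a biallelic locus."""
    return 2 * p * (1 - p)


def pairwise_fst(freqs_a, freqs_b):
    """Nei's G_ST-style F_ST = (H_T - H_S) / H_T for two subpopulations.

    freqs_a[j] and freqs_b[j] are the frequencies of the same allele at locus j
    in each subpopulation. H_S and H_T are summed over loci before the ratio.
    """
    h_s = h_t = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        h_s += (heterozygosity(pa) + heterozygosity(pb)) / 2  # within subpopulations
        h_t += heterozygosity((pa + pb) / 2)                  # pooled population
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies at two loci.
print(round(pairwise_fst([0.1, 0.5], [0.9, 0.5]), 2))  # 0.32
```

FST is 0 when the subpopulations share the same allele frequencies and 1 when they are fixed for different alleles; the sketch raises ZeroDivisionError if every locus is monomorphic in the pooled sample.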

Table 3 FST values between subpopulations assessed with SNP markers

AMOVA was performed on the pairwise genetic distances using GenAlEx 6.51b2. It revealed that 32.84% of the total variation was explained by differences among populations, whereas 67.16% occurred within populations (Table 4), confirming much greater variation within than among subpopulations.
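A one-level AMOVA can be sketched on squared Euclidean distances between allele-dosage vectors. For biallelic SNPs we assume this matches GenAlEx’s codominant genotypic distance (0, 1 or 4 per locus). Variance components follow the usual mean-square decomposition with \(n_0\) as the sample-size coefficient, and truncating a negative among-group component at zero is a convention adopted here, not necessarily GenAlEx’s behavior.

```python
def amova(groups):
    """One-level AMOVA on squared Euclidean distances between individuals.

    `groups` is a list of subpopulations, each a list of individuals coded as
    allele dosages (0, 1, 2) at the same loci. Returns the percentage of
    variance among and within groups and PhiPT (among / total).
    """
    def sum_sq(individuals):
        # Sum of squared deviations from the centroid; for squared Euclidean
        # distances this equals the sum of pairwise distances divided by n.
        n_ind, n_loci = len(individuals), len(individuals[0])
        centroid = [sum(ind[j] for ind in individuals) / n_ind for j in range(n_loci)]
        return sum((ind[j] - centroid[j]) ** 2 for ind in individuals for j in range(n_loci))

    everyone = [ind for grp in groups for ind in grp]
    n, n_groups = len(everyone), len(groups)
    ss_within = sum(sum_sq(grp) for grp in groups)
    ss_among = sum_sq(everyone) - ss_within
    ms_among = ss_among / (n_groups - 1)
    ms_within = ss_within / (n - n_groups)
    n0 = (n - sum(len(grp) ** 2 for grp in groups) / n) / (n_groups - 1)
    # Negative among-group estimates are truncated to zero (a convention chosen here).
    var_among = max((ms_among - ms_within) / n0, 0.0)
    total = var_among + ms_within  # ms_within is the within-group component
    return 100 * var_among / total, 100 * ms_within / total, var_among / total


# Hypothetical one-locus example: two groups fixed for different homozygotes.
print(amova([[[0], [0]], [[2], [2]]]))  # (100.0, 0.0, 1.0)
```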

Table 4 The analysis of molecular variance (AMOVA) using 7,963 SNPs and the genetic differentiation among the five subpopulations of the 1,406 wheat landraces

Core collection

The maximization (M) and random (R) methods were used to predict the optimal sample size of the core collection (Fig. 4). The M score was higher than the R score at every sample size, indicating that the M method captured alleles more efficiently than the R method. At 304 accessions, the M curve nearly reached a plateau (score = 3,041), indicating that 304 accessions (15.0%) was an appropriate size for the core collection. We used the M method to extract nested core collections of 51, 102, 203, 304, 405 and 506 accessions, corresponding to 2.5, 5.0, 10.0, 15.0, 20.0 and 25.0% of the original collection, respectively (Table 5).
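The six nested sample sizes are consistent with rounding 2,023 × fraction up to the next whole accession, as the quick check below shows; that this rounding rule was used is an assumption. Exact fractions avoid floating-point surprises at the rounding boundary.

```python
import math
from fractions import Fraction


def nested_core_sizes(n_total, fractions):
    """Number of accessions for each sampling fraction, rounded up."""
    return [math.ceil(n_total * Fraction(f)) for f in fractions]


print(nested_core_sizes(2023, ["0.025", "0.05", "0.10", "0.15", "0.20", "0.25"]))
# [51, 102, 203, 304, 405, 506]
print(round(100 * 311 / 2023, 2))  # final core as % of the collection: 15.37
```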

Fig. 4

Prediction of core collection sample size by maximization (M) method and random (R) method

Table 5 Nested core collection sample size predicted by maximization (M) method and random (R) method

Taking into account several landrace accessions with outstanding disease resistance, 311 accessions, accounting for 15.37% of the original set, formed the final core collection (Fig. 1). Of these, 13 accessions were from Sub1, 81 from Sub2, 17 from Sub3, 60 from Sub4, 45 from Sub5 and 95 from Mix. The gene diversity and PIC values of the core collection were 0.31 and 0.25, respectively, higher than those of the full collection (0.30 and 0.24) (Table 6). The neighbor-joining tree constructed with the 7,926 SNP markers showed that the core accessions were evenly distributed throughout the original collection and were highly representative (Fig. 5 and Additional file 7: Table S6). After accounting for uniformity and redundancy in the agronomic traits, we retained these 311 accessions as the core collection. A comparison of phenotypic diversity indices (\({H}^{{\prime }}\)) between the full landrace collection and the 311-accession core collection showed no significant differences for any of the 12 agronomic traits (Table 7).

Table 6 Comparison of number of alleles, gene diversity and polymorphism information content (PIC) between the 2,023 landraces and core accessions subgroups at the genome level

Fig. 5

Neighbor-joining clustering of 2,023 bread wheat landrace accessions. Red lines represent the core collection, and black lines represent the original wheat landrace collection

Table 7 Comparison of genetic diversity index (\({H}^{{\prime }}\)) between the 2,023 landraces and core accessions at the phenotypic level

Discussion

Diversity among landraces was initially described using spike morphology traits and botanical variety classification [14, 33]. Some landraces were mixtures of different wheat morphotypes that were easily identified by spike color or awn features. Landraces with the same name but originating from different regions often had different phenotypes; conversely, landraces with similar morphotypes had different origins and names. This study provides insights into the genetic diversity of the landrace accessions preserved in the wheat collection at the Jiangsu Academy of Agricultural Sciences. Although the yield of landraces is generally lower than that of commercial varieties grown under current agronomic conditions, they remain important sources of genetic variation in the search for novel sources of resistance to biotic and abiotic stresses [34]. For example, Chinese landraces from Jiangsu Province such as Wangshuibai, Haiyanzhong, Baisanyuehuang and Huangfangzhu have high levels of resistance to Fusarium head blight (FHB) and have been used as donor sources in breeding [35,36,37,38].

Genetic diversity of landrace accessions

Evaluation of genetic diversity in germplasm resources is of great significance for conservation, breeding and research. Studies have repeatedly documented much higher genetic diversity in landraces than among elite cultivars [6, 39]. A study by Sansaloni et al. [40] revealed landraces with unexplored diversity and genetic footprints left by selection in different geographical regions; indeed, very little of the genetic diversity had been used in modern breeding. This was also confirmed by analysis of the collection assembled by Watkins in the early 1900s [41, 42]. Selection in modern breeding programs has led to decreased genetic diversity in current wheat populations, and unless diversity can be maintained in gene banks, it will be lost for future generations [43]. Thus, landraces may hold novel variability not present in modern elite cultivars [17, 44, 45].

In the present study, 7,926 high-quality SNPs and phenotypic data for 12 traits obtained from 2,023 Chinese landrace accessions were used. A large portion of the polymorphic markers mapped to the B genome (40.60%), followed by the A genome (38.13%) and the D genome (21.27%) (Fig. 2), in agreement with previous studies [46]. Interestingly, Hs, PIC and π for the D genome were higher than those for the A and B genomes in this study (Table 1). In previous studies, the D genome was generally the least diverse [21, 47]. The greater diversity of the D genome in Chinese landrace accessions may indicate that it harbors novel genetic variation [48], which could be used in elite wheat breeding programs to relieve the D-genome bottleneck and broaden the genetic base [49].

Population structure and relationship

Population structure analysis is the first step in association mapping studies. In the present study, STRUCTURE, PCA and kinship analyses indicated that the studied collection of landrace accessions most probably comprises five subpopulations (Fig. 3). Each subpopulation contained genotypes from different regions (Additional file 8: Fig. S2a and b), and only a few accessions showed some association between geographical origin and population structure (Additional file 8: Fig. S2c, d and e). This is a common phenomenon for most cereal landraces worldwide because of informal seed exchange systems involving regional and countrywide farming communities [50, 51]. Nearly one-third (30.50%) of the landrace accessions were classified into the Mix subgroup, which may also be indirectly attributed to continuous gene flow of landrace genotypes among different regions.

Genetic differentiation among populations is reflected by FST [52, 53]. FST measures population differentiation due to genetic structure, and a value greater than 0.15 indicates significant genetic differentiation between subpopulations [54]. High genetic differentiation among subpopulations indicates a low level of gene flow between them; for example, a low level of gene flow was also reported among wheat landrace populations of Mediterranean origin [55]. This may be due to the deployment of newly developed cultivars across multiple countries and the limited use of old wheat landraces and locally selected germplasm in breeding programs [56]. AMOVA indicated that most of the genetic variation (67.16%) occurred within subpopulations, confirming the existence of considerable unique variation within subpopulations (Table 4). Previous studies have reported similar results, but it remains unclear whether genetic variation within subpopulations arose during different domestication processes or was introduced by farmers and traders from other regions [56]. In this study, 30.50% of the landrace accessions were classified as Mix in the population structure analysis, which may be attributed to germplasm exchange between different regions.

A core collection of wheat landraces

A core collection that represents the genetic diversity of a crop in a minimal number of accessions is an effective way to achieve efficient conservation and utilization of germplasm [57, 58]. Ideally, a core collection should comprise approximately 10% of the total collection and retain 70% of the genetic diversity of the initial collection [59]. The number of accessions selected for a core collection depends on the size of the initial collection and the sampling ratio [38]. Li et al. [60] proposed sampling 5–40% of accessions to construct core germplasm, with 10% being optimal. Van Treuren et al. [61] developed an advanced cultivar core collection of bare cultivars using a sampling percentage of 26.92%. Hao et al. [20] constructed a mini-core collection accounting for 5% of the initial collection and representing 91.5% of its genetic diversity. Xu et al. [57] suggested that a sampling percentage of 20% was an appropriate size for a barley core collection. In this study, we selected 15.37% (311/2,023) of the accessions as our core collection.

Representative core accessions have been selected in diverse crops using various sampling strategies and clustering methods [20, 62,63,64]. Previous studies indicated that the M strategy performs well when accessions come from populations with restricted gene flow or from self-pollinated species [57, 65, 66]. The MSTRAT algorithm is a representative core selection method that implements the M strategy [32]. Here, we used the M strategy as implemented in MStrat v4.1 software and successfully established a representative core collection with high genetic diversity.

Using genotypic and phenotypic information along with clustering to construct a core collection is more efficient than using genotypic or phenotypic information alone [65]. It is important to verify the quality of a core collection, as its quality determines the direction of subsequent research [67]. In the present study, the phenotypic diversity indices (\({H}^{{\prime }}\)) of the 12 morphological characters in the core collection were not significantly different from those of the entire collection, indicating that the core collection effectively represents the range of variation in these 12 traits in the original set. Molecular markers reflect genetic variation at the DNA level without environmental interference and hence provide valuable data for describing genetic diversity. In this study, the 311 accessions selected as the core collection of wheat landraces retained 100% of the alleles in a primary core collection, and their gene diversity and PIC values were higher than those of the initial collection. Together, these results indicate that the core collection selected in this study represents the initial landrace collection well.

Conclusions

Constructing the core collections of wheat landrace will enhance the efficiency of management and utilization of accessions in the germplasm banks. In the present study, we constructed a core collection of 311 accessions representing 100% of the SNPs identified among 2,023 wheat landrace accessions held by the Jiangsu Provincial Crop Germplasm Resource Genebank. The evaluation showed that this core collection is high-quality and valuable for phenotypic and genetic studies. The core collection can be used as a primary germplasm resource for mining novel genes, genetic association and functional gene analyses.

Data availability

The datasets used or analyzed during the current study are available in this published article and its additional files. The SNP data used in this study are available in the China National Center for Bioinformation (CNCB) repository under accession number GVM000783.


Abbreviations

AMOVA: Molecular variance analysis
CC: Core collection
df: Degrees of freedom
FHB: Fusarium head blight
F: Inbreeding coefficients
GWAS: Genome-wide association study
\({H}^{{\prime }}\): Genetic diversity indices
HE: Expected heterozygosity
HO: Observed heterozygosity
Hs: Gene diversity
I: Shannon’s diversity index
LD: Linkage disequilibrium
MAF: Minor allele frequency
MAS: Marker-assisted selection
MS: Mean sum of squares
Na: Number of different alleles
Ne: Number of effective alleles
PIC: Polymorphism information content
SNP: Single nucleotide polymorphism
SS: Sum of squares
SSR: Simple sequence repeat
θ: Expected nucleotide diversity or estimated mutation rate
π: Average pairwise divergence or observed nucleotide diversity


  1. Bhatta M, Regassa T, Rose DJ, Baenziger PS, Eskridge KM, Santra DK, Poudel R. Genotype, environment, seeding rate, and top-dressed nitrogen effects on end-use quality of modern Nebraska winter wheat. J Sci Food Agr. 2017;97:5311–8.

  2. Pascual L, Ruiz M, López-Fernández M, Pérez-Peña H, Benavente E, Vázquez JF, et al. Genomic analysis of Spanish wheat landraces reveals their variability and potential for breeding. BMC Genomics. 2020;21:122.

  3. Marcussen T, Sandve SR, Heier L, Spannagl M, Pfeifer M, Jakobsen KS, et al. Ancient hybridizations among the ancestral genomes of bread wheat. Science. 2014;345.

  4. Zencirci N, Baloch FS, Habyarimana E, Chung G. Wheat landraces. New York, NY, USA: Springer; 2021. pp. 1–11.

  5. Newton AC, Akar T, Baresel JP, Bebeli PJ, Bettencourt E, Bladenopoulos KV, et al. Cereal landraces for sustainable agriculture. A review. Agron Sustain Dev. 2010;30:237–69.

  6. Wingen LU, West C, Leverington-Waite M, Collier S, Orford S, Goram R, et al. Wheat landrace genome diversity. Genetics. 2017;205:1657–76.

  7. Monteagudo A, Casas AM, Cantalapiedra CP, Contreras-Moreira B, Gracia MP, Igartua E. Harnessing novel diversity from landraces to improve an elite barley variety. Front Plant Sci. 2019;10:434.

  8. Khateeb WA, Shalabi AA, Schroeder D, Musallam I. Phenotypic and molecular variation in drought tolerance of Jordanian durum wheat (Triticum durum Desf.) Landraces. Physiol Mol Biol Pla. 2017;23:311–19.

  9. Pinto RS, Molero G, Reynolds MP. Identification of heat tolerant wheat lines showing genetic variation in leaf respiration and other physiological traits. Euphytica. 2017;213:76.

  10. Mohammadi R, Amri A, Ahmadi H, Jafarzadeh J. Characterization of tetraploid wheat landraces for cold tolerance and agronomic traits under rainfed condition of Iran. J Agr Sci. 2015;153:631–45.

  11. Khazaei H, Monneveux P, Shao HB, Mohammady S. Variation for stomatal characteristics and water use efficiency among diploid, tetraploid and hexaploid Iranian wheat landraces. Genet Resour Crop Ev. 2010;57:307–14.

  12. Karagoz A. Wheat landraces of Turkey. Emir J Food Agr. 2014;26:149–56.

  13. Husenov B, Muminjanov H, Dreisigacker S, Otambekova M, Akin B, Subasi K, et al. Genetic diversity and agronomic performance of wheat landraces currently grown in Tajikistan. Crop Sci. 2021;61:2548–64.

  14. Baboev S, Muminjanov H, Turakulov K, Buronov A, Mamatkulov I, Koc E, et al. Diversity and sustainability of wheat landraces grown in Uzbekistan. Agron Sustain Dev. 2021;41:34.

  15. Li XJ, Xu X, Yang XM, Li XQ, Liu WH, Gao AN, et al. Genetic diversity of the wheat landrace Youzimai from different geographic regions investigated with morphological traits, seedling resistance to powdery mildew, gliadin and microsatellite markers. Cereal Res Commun. 2012;40:95–106.

  16. Dong YS, Zheng DS. Wheat genetic resources in China. Beijing, China: Agriculture; 2000.

  17. Lopes MS, El-Basyoni I, Baenziger PS, Singh S, Royo C, Ozbek K, et al. Exploiting genetic diversity from landraces in wheat breeding for adaptation to climate change. J Exp Bot. 2015;66:3477–86.

  18. Frankel OH. Genetic perspectives of germplasm conservation. In: Arber W, Illmensee K, Peacock W, Starlinger P, editors. Genetic manipulation: impact on Man and Society. Cambridge: Cambridge University Press; 1984. pp. 161–70.

  19. Balfourier F, Bouchet S, Robert S, Oliveira RD, Rimbert H, Kitt J, et al. Worldwide phylogeography and history of wheat genetic diversity. Sci Adv. 2019;5(5):eaav0536.

  20. Hao CY, Dong YC, Wang LF, You GX, Zhang HN, Ge HM, Jia JZ, Zhang XY. Genetic diversity and construction of core collection in Chinese wheat genetic resources. Chin Sci Bull. 2008;53:1518–26.

  21. Mourad AMI, Belamkar V, Baenziger PS. Molecular genetic analysis of spring wheat core collection using genetic diversity, population structure, and linkage disequilibrium. BMC Genomics. 2020;21:434.

  22. Jain SK, Qualset CO, Bhatt GM, Wu KK. Geographical patterns of phenotypic diversity in a world collection of durum wheats. Crop Sci. 1975;15:700–4.

  23. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.

  24. Liu K, Muse SV. PowerMarker: an integrated analysis environment for genetic marker analysis. Bioinformatics. 2005;21:2128–29.

  25. Kimura M. The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics. 1969;61:893–903.

  26. Tajima F. The effect of change in population size on DNA polymorphism. Genetics. 1989;123:597–601.

  27. Bradbury PJ, Zhang ZW, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23:2633–35.

  28. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155:945–59.

  29. Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol. 2005;14:2611–20.

  30. Earl DA, vonHoldt BM. STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conserv Genet Resour. 2012;4:359–61.

  31. Nei M, Tajima F, Tateno Y. Accuracy of estimated phylogenetic trees from molecular data. J Mol Evol. 1983;19:153.

  32. Gouesnard B, Bataillon TM, Decoux G, Rozale C, Schoen DJ, David JL. MSTRAT: an algorithm for building germ plasm core collections by maximizing allelic or phenotypic richness. J Hered. 2001;92:93–4.

  33. Baboev SK, Buranov AK, Bozorov TA, Adylov BS, Morgunov AI, Muminzhonov K. Biological and agronomical assessment of wheat landraces cultivated in mountain areas of Uzbekistan. Sel’Skokhozyaistvennaya Biologiya. 2017;52:553–60.

  34. Manickavelu A, Joukhadar R, Jighly A, Lan C, Huerta-Espino J, Stanikzai AS, et al. Genome wide association mapping of stripe rust resistance in Afghan wheat landraces. Plant Sci. 2016;252:222–9.

  35. Li T, Bai GH, Wu SY, Gu SL. Quantitative trait loci for resistance to fusarium head blight in a Chinese wheat landrace Haiyanzhong. Theor Appl Genet. 2011;122:1497–502.

  36. Zhang XH, Pan HY, Bai GH. Quantitative trait loci responsible for fusarium head blight resistance in Chinese landrace Baishanyuehuang. Theor Appl Genet. 2012;125:495–502.

  37. Li T, Bai GH, Wu SY, Gu SL. Quantitative trait loci for resistance to fusarium head blight in the Chinese wheat landrace Huangfangzhu. Euphytica. 2012;185:93–102.

  38. Zhang M, Zhang R, Yang JZ, Luo PG. Identification of a new QTL for Fusarium head blight resistance in the wheat genotype Wang shui-bai. Mol Biol Rep. 2010;37:1031–35.

  39. Moore G. Strategic pre-breeding for wheat improvement. Nat Plants. 2015;1:15018.

  40. Sansaloni C, Franco J, Santos B, Percival-Alwyn L, Singh S, Petroli C, et al. Diversity analysis of 80,000 wheat accessions reveals consequences and opportunities of selection footprints. Nat Commun. 2020;11:4572.

  41. Winfield MO, Allen AM, Wilkinson PA, Burridge AJ, Barker GLA, Coghill J, et al. High-density genotyping of the A.E. Watkins Collection of hexaploid landraces identifies a large molecular diversity compared to elite bread wheat. Plant Biotechnol J. 2018;16:165–75.

  42. Wingen L, Orford S, Goram R, Leverington-Waite M, Bilham L, Patsiou TS, et al. Establishing the A.E. Watkins landrace cultivar collection as a resource for systematic gene discovery in bread wheat. Theor Appl Genet. 2014;127:1831–42.

  43. Marone D, Russo MA, Mores A, Ficco DBM, Laidò G, Mastrangelo AM, et al. Importance of landraces in cereal breeding for stress tolerance. Plants. 2021;10:1267.

  44. Riaz A, Hathorn A, Dinglasan E, Ziems LA, Richard C, Singh D, et al. Into the vault of the Vavilov wheats: old diversity for new alleles. Genet Resour Crop Ev. 2017;64:531–44.

  45. Vikram P, Franco J, Burgueño-Ferreira J, Li HH, Sehgal D, Pierre CS, et al. Unlocking the genetic diversity of Creole wheats. Sci Rep-UK. 2016;6:23092.

  46. Alipour H, Bihamta MR, Mohammadi V, Peyghambari SA, Bai G, Zhang G. Genotyping-by-sequencing (GBS) revealed molecular genetic diversity of Iranian wheat landraces and cultivars. Front Plant Sci. 2017;8:1293.

  47. Tang WJ, Dong ZD, Gao LF, Wang XC, Li TB, Sun CW, Chu ZL, et al. Genetic diversity and population structure of modern wheat (Triticum aestivum L.) cultivars in Henan Province of China based on SNP markers. BMC Plant Biol. 2023;23:542.

  48. Ogbonnaya FC, Abdalla O, Mujeeb-Kazi A, Kazi AG, Xu SS, Gosman N, et al. Synthetic hexaploids: harnessing species of the primary gene pool for wheat improvement. Plant Breed Rev. 2013;37:35–122.

  49. Bhatta M, Morgounov A, Belamkar V, Poland J, Baenziger PS. Unlocking the novel genetic diversity and population structure of synthetic hexaploid wheat. BMC Genomics. 2018;19:591.

  50. Bishaw Z. Wheat and barley seed systems in Ethiopia and Syria. PhD Thesis, Wageningen University and Research Center, the Netherlands, 2004. p. 383.

  51. Negisho K, Shibru S, Pillen K, Ordon F, Wehner G. Genetic diversity of Ethiopian durum wheat landraces. PLoS ONE. 2021;16:e0247016.

  52. Tehseen MM, Istipliler D, Kehel Z, Sansaloni CP, da Silva Lopes M, Kurtulus E, et al. Genetic diversity and population structure analysis of Triticum aestivum L. landrace panel from Afghanistan. Genes. 2021;12:340.

  53. Luo ZN, Brock J, Dyer JM, Kutchan T, Schachtman D, Augustin M, et al. Genetic diversity and population structure of a Camelina sativa spring panel. Front Plant Sci. 2019;10:184.

  54. Frankham R, Ballou JD, Briscoe DA, McInnes KH. Introduction to conservation genetics. Cambridge: Cambridge University Press; 2002.

  55. Rufo R, Alvaro F, Royo C, Soriano JM. From landraces to improved cultivars: Assessment of genetic diversity and population structure of Mediterranean wheat using SNP markers. PLoS ONE. 2019;14:e0219867.

  56. Tehseen MM, Tonk FA, Tosun M, Istipliler D, Amri A, Sansaloni CP, et al. Exploring the genetic diversity and Population structure of wheat Landrace Population conserved at ICARDA Genebank. Front Genet. 2022;13:900572.

  57. Xu JQ, Wang L, Wang HD, Mao CZ, Kong DD, Chen SY, et al. Development of a core collection of six-rowed hulless barley from the Qinghai-Tibetan plateau. Plant Mol Biol Rep. 2020;38:305–13.

  58. Liu JM, Gao SL, Xu YY, Wang MZ, Ngiam JJ, Rui Wen NC, et al. Genetic diversity analysis of Sapindus in China and extraction of a core germplasm collection using EST-SSR markers. Front Plant Sci. 2022;13:857993.

  59. Brown AHD. Core collections: a practical approach to genetic resources management. Genome. 1989;31:818–24.

  60. Li ZC, Zhang HL, Zeng YW, Yang ZY, Shen SQ, Sun CQ, Wang XK. Studies on sampling schemes for the establishment of core collection of rice landraces in Yunnan, China. Genet Resour Crop Ev. 2002;49:67–74.

  61. Van Treuren R, Tchoudinova I, van Soest LJM, van Hintum TJL. Marker-assisted acquisition and core collection formation: a case study in barley using AFLPs and pedigree data. Genet Resour Crop Ev. 2006;53:43–52.

  62. Franco J, Crossa J, Taba S, Shands H. A sampling strategy for conserving genetic diversity when forming core subsets. Crop Sci. 2005;45:1035–44.

  63. Hu J, Zhu J, Xu HM. Methods of constructing core collections by stepwise clustering with three sampling strategies based on the genotypic values of crops. Theor Appl Genet. 2000;101:264–8.

  64. Wang JC, Hu J, Xu HM, Zhang S. A strategy on constructing core collections by least distance stepwise sampling. Theor Appl Genet. 2007;115:1–8.

  65. Lee HY, Ro NY, Jeong HJ, Kwon JK, Jo J, Ha Y, et al. Genetic diversity and population structure analysis to construct a core collection from a large Capsicum germplasm. BMC Genet. 2016;17:142.

  66. Gu XZ, Cao YC, Zhang ZH, Zhang BX, Zhao H, Zhang XM, et al. Genetic diversity and population structure analysis of Capsicum germplasm accessions. J Integr Agr. 2019;18:1312–20.

  67. Odong TL, Jansen J, Van Eeuwijk FA, van Hintum TJL. Quality of core collections for effective utilisation of genetic resources review, discussion and interpretation. Theor Appl Genet. 2013;126:289–305.



The plant materials (seeds) used in this study were obtained from the Genebank of the Institute of Germplasm Resources and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, China. The authors are grateful to Dr. Robert McIntosh of the University of Sydney for critically reading and improving the manuscript.


This work was supported by the National Key R&D Program of China (2021YFD1200600); the Zhongshan Biological Breeding Laboratory (ZSBBL-KY2023-02); the Key Research and Development Program of Jiangsu province (BE2022346); Jiangsu Agriculture Science and Technology Innovation Fund (JASTIF) (Grant [17] 3004) and the International Cooperation Fund of Jiangsu Academy of Agricultural Sciences.

Author information

Authors and Affiliations



LY performed phenotyping, DNA extraction and SNP genotyping, carried out the genetic diversity, population structure, PCA and kinship analyses, constructed the core collection, and drafted the manuscript. FBS extracted DNA from the accessions and collected and managed the phenotype dataset. ZQF participated in DNA extraction, SNP genotyping, and phenotype data collection. CJ and GW contributed to phenotyping and DNA extraction. ZWL participated in phenotype analysis. WJZ designed and supervised the study and assisted in its conception, discussion, and revision of the manuscript. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Jizhong Wu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Additional file 1:

Table S1. Detailed information on the 2,023 wheat landrace accessions.

Additional file 2:

Table S2. Categories and descriptive statistics for the 12 agronomic traits.

Additional file 3:

Fig. S1. Distribution and density of the filtered single nucleotide polymorphisms (7,963 SNPs) across the 21 chromosomes. The horizontal axis shows chromosome length. The legend at the bottom right indicates the number of SNPs in a given region.

Additional file 4:

Table S3. Individual Q matrix calculated in STRUCTURE (K = 2).

Additional file 5:

Table S4. Individual Q matrix calculated in STRUCTURE (K = 5).

Additional file 6:

Table S5. Kinship matrix of the accessions.

Additional file 7:

Table S6. Estimates of evolutionary divergence between accessions.

Additional file 8:

Fig. S2. Grouping of the 2,023 wheat landrace accessions by principal component analysis. a-b Plots of PC1, PC2 and PC3 of landrace accessions based on predicted group membership from STRUCTURE (K = 5). c-d Plots of PC1, PC2 and PC3 from principal component analysis of landrace accessions from different regions of China. e Geographic locations of the 2,023 wheat landraces based on predicted group membership from STRUCTURE (K = 5).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Liu, Y., Fu, B., Zhang, Q. et al. Genetic diversity and population structure of wheat landraces in Southern Winter Wheat Region of China. BMC Genomics 25, 664 (2024).
