Genetic diversity and population structure of wheat landraces in Southern Winter Wheat Region of China

Liu, Ying; Fu, Bisheng; Zhang, Qiaofeng; Cai, Jin; Guo, Wei; Zhai, Wenling; Wu, Jizhong

doi:10.1186/s12864-024-10564-z

Research
Open access
Published: 03 July 2024

Ying Liu^1,
Bisheng Fu^1,2,
Qiaofeng Zhang^1,
Jin Cai^1,2,
Wei Guo^1,2,
Wenling Zhai^1 &
Jizhong Wu^1,2,3 (ORCID: orcid.org/0000-0002-0860-6504)

BMC Genomics volume 25, Article number: 664 (2024)

Abstract

Background

Wheat landraces are considered a valuable source of genetic diversity for breeding programs. It is useful to evaluate the genetic diversity in breeding studies such as marker-assisted selection (MAS), genome-wide association studies (GWAS), and genomic selection. In addition, constructing a core germplasm set that represents the genetic diversity of the entire variety set is of great significance for the efficient conservation and utilization of wheat landrace germplasms.

Results

To understand the genetic diversity of wheat landraces, 2,023 accessions in the Jiangsu Provincial Crop Germplasm Resource Bank were genotyped with the Illumina 15 K single nucleotide polymorphism (SNP) chip to explore their molecular diversity and population structure. These accessions were divided into five subpopulations based on population structure, principal coordinate and kinship analysis. Significant variation was found within and among the subpopulations based on the analysis of molecular variance (AMOVA). Subpopulation 3 showed more genetic variability based on the different allelic patterns (Na, Ne and I). The M strategy as implemented in MStrat v4.1 software was used to construct the representative core collection. A core collection with a total of 311 accessions (15.37%) was selected from the entire landrace germplasm based on genotype and 12 different phenotypic traits. Compared to the initial landrace collection, the core collection displayed higher gene diversity (0.31) and polymorphism information content (PIC) (0.25), and represented almost all phenotypic variation.

Conclusions

A core collection comprising 311 accessions containing 100% of the genetic variation in the initial population was developed. This collection provides a germplasm base for effective management, conservation, and utilization of the variation in the original set.

Background

Wheat is one of the most important staple crops for more than one-third of the human population, providing about 19% of the calories and 21% of the protein [1]. Approximately 90 to 95% of wheat grown worldwide is bread wheat (Triticum aestivum L.) (2n = 6x = 42, AABBDD) [2]. Multiple rounds of rare natural hybridization between different wheat species and relatives led to the currently cultivated wheat, but also caused genetic bottlenecks due to the exclusion of adaptive alleles [3, 4]. Modern cultural practices and improved cultivars to take advantage of those practices significantly increased wheat production. However, the development of high-yielding modern wheat cultivars is at the expense of losing much of the diversity in landraces and older varieties. In the last century, wheat landraces were almost completely replaced by modern cultivars, reducing the overall diversity of the species [5].

Wheat landraces show a much higher genetic diversity than elite varieties [6]. Potentially valuable traits in landraces include early growth vigour [7], cold, heat or drought tolerance [8,9,10], disease resistance, water use efficiency [11], and quality traits suited for local food preferences. Developing new cultivars from landrace populations is a feasible breeding strategy to improve wheat productivity and stability, especially in vulnerable environments.

Scientists have been conscientious in conserving wheat landraces for a long time. Large numbers of landraces were collected, conserved, studied, and analyzed, and the potential for utilization and incorporation of their beneficial traits into new varieties was explored [5]. The Turkish scientist Gökgöl collected and characterized 18,000 wheat landraces from Türkiye; among them, 256 varieties were new [12]. More than 60 distinct wheat landraces were collected in five mountainous regions of Tajikistan [13]. Over 30 bread wheat landraces in three regions from the western Tian-Shan mountains were collected in Uzbekistan [14]. These landraces were thoroughly phenotyped, genotyped, conserved in gene banks and used in wheat breeding. In China, nearly 13,900 wheat landraces from different geographic and climatic conditions are conserved in the National Gene Bank [15]. Chinese wheat landraces are characterized by earliness, large numbers of grains per spike, high adaptiveness, and a long history of cultivation [16].

Three strategies were applied to represent and exploit the diversity of landraces in previous studies: (1) measuring diversity and developing a core collection from extensive collections to represent the overall genetic diversity with minimal repetition; (2) exploiting the most favorable alleles of important traits in breeding programs; and (3) retaining phenotypic variation and related genetic association for targeted traits through large-scale and precise phenotypic analysis combined with GWAS [17]. According to Frankel et al. [18], a core collection with the minimum redundancies represents the genetic variation of an entire collection, and facilitates maintenance, research, and utilization of germplasm resources.

During the last few decades, several core collections of wheat have been constructed, and they have played an important role in the conservation and improved use of wheat genetic resources. A worldwide bread wheat core collection of 372 accessions (372CC) was selected with a set of 38 simple sequence repeat (SSR) markers [19]. Hao et al. [20] established a mini-core collection of 231 Chinese wheat accessions with an estimated 70% representation of the genetic variation from the initial collection using 78 SSR markers. Using 36,720 SNP markers, Mourad et al. [21] analyzed the genetic diversity and population structure of a 103-accession spring wheat core collection representing a worldwide germplasm collection.

Wheat is grown in ten agro-ecological zones in China, which vary widely in climate, soil, cultivar adaptation and management. The adaptation to these different environments led to the creation of landraces. China has rich genetic resources of wheat landraces, which are important for production and breeding. In this study, the morphological description and genomic characterization of wheat landraces collected from 2008 to 2014 at the Jiangsu Academy of Agricultural Sciences, Nanjing, China, were undertaken to develop opportunities for their use in breeding. In total, 2,023 wheat landraces collected from 23 administrative districts were evaluated for agronomic traits in field trials. The genetic diversity of this collection was analyzed using the 15 K Illumina chip. Analyses of the polymorphic markers provided kinship information among groups, the population structure of the accessions, and the genetic properties among subpopulations. We also established a core collection to reduce redundancy in the collection. This core collection will be useful for further utilization of this large set of landraces.

Methods

Plant material

We used 2,023 wheat landrace accessions conserved at the Gene Bank, located at the Institute of Germplasm Resources and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, China. These accessions were collected from 23 provinces in China (Fig. 1). All details about the 2,023 wheat landrace accessions are shown in Additional file 1: Table S1. Of these, 937 (46.32%) accessions were obtained from Jiangsu Province. The traits of all these accessions were evaluated in field trials.

Phenotyping and data analysis

The 2,023 wheat landrace accessions were evaluated for twelve agronomic traits in two environments: 1,526 accessions in Luhe in 2018 and 497 in Luhe in 2019. These traits include heading date and flowering date related to maturity; awn type, glume color, spike type, plant height and spike length in relation to plant morphology; and spikelet number per spike, sterile spikelet number per spike, grain number per spikelet, grain number per spike and thousand kernel weight related to yield. A brief description of each trait and data scoring is presented in Additional file 2: Table S2. The phenotypic diversity (\({H}^{{\prime }}\)) was calculated as the Shannon index, \({H}^{{\prime }}=-{\sum }_{i=1}^{n}{P}_{i}\ln {P}_{i}\), where \(n\) is the number of phenotypic classes for a character and \({P}_{i}\) is the proportion of the total number of entries in the \(i\)th class [22]. \({H}^{{\prime }}\) was estimated for the twelve agronomic traits.
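The Shannon index defined above can be illustrated with a minimal Python sketch. The function name and the awn-type scores are hypothetical and serve only as an illustration; quantitative traits such as plant height would first need to be grouped into classes, and the binning used in the study is not described here.

```python
import math
from collections import Counter

def shannon_index(classes):
    """Shannon diversity H' = -sum(p_i * ln p_i) over phenotypic classes."""
    counts = Counter(classes)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Hypothetical awn-type scores for eight accessions (illustrative only).
awn_types = ["awned", "awned", "awnless", "awned",
             "awnletted", "awnless", "awned", "awnless"]
h_prime = shannon_index(awn_types)
print(f"H' = {h_prime:.3f}")
```

A trait in which every accession falls into the same class gives H' = 0, and n equally frequent classes give the maximum value ln n.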

SNP genotyping

DNA samples were genotyped with the 15 K Axiom^® Wheat Breeder Genotyping Array (China Golden Marker Biotechnology Co., Ltd, Beijing) according to the manufacturer’s guidelines. The array comprised 13,947 SNP markers. Quality filtering of the markers was performed using PLINK v1.07 [23]. Markers with a minor allele frequency (MAF) below 5% (--maf 0.05), individuals with more than 20% missing SNP calls (--mind 0.2) and markers with more than 5% missing data (--geno 0.05) were removed. Physical map positions of all SNP markers were obtained from the Ensembl Plants Triticum aestivum database (https://plants.ensembl.org/Triticum_aestivum). Markers lacking information for consensus chromosome location were removed. Finally, 7,926 SNP markers and 2,023 genotypes were subjected to further analysis.
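The logic of the three thresholds can be sketched in plain Python. This is an illustration of the filtering criteria only, not a reimplementation of PLINK: the function name, the genotype coding (0, 1 or 2 copies of one allele, `None` for a missing call) and the toy data are assumptions, and the order in which the filters are applied here (samples first, then markers) is a choice made for the sketch.

```python
def qc_filter(genotypes, maf_min=0.05, mind_max=0.2, geno_max=0.05):
    """genotypes: dict sample -> list of allele counts (0, 1, 2) or None.

    Returns (kept_samples, kept_marker_indices).
    """
    n_markers = len(next(iter(genotypes.values())))
    # Sample filter: drop individuals with too many missing calls.
    kept_samples = [s for s, g in genotypes.items()
                    if sum(x is None for x in g) / n_markers <= mind_max]
    kept_markers = []
    for j in range(n_markers):
        calls = [genotypes[s][j] for s in kept_samples]
        observed = [x for x in calls if x is not None]
        if not observed:
            continue
        missing_rate = 1 - len(observed) / len(calls)
        freq = sum(observed) / (2 * len(observed))
        maf = min(freq, 1 - freq)
        # Marker filters: missingness and minor allele frequency.
        if missing_rate <= geno_max and maf >= maf_min:
            kept_markers.append(j)
    return kept_samples, kept_markers

# Hypothetical toy panel: four accessions, five markers.
toy = {
    "acc1": [0, 2, 0, None, 1],
    "acc2": [2, 2, 0, 0, 1],
    "acc3": [0, 2, 0, 2, None],
    "acc4": [2, 2, None, None, None],
}
samples, markers = qc_filter(toy)
print(samples, markers)
```

In the toy panel, acc4 is dropped for missingness, monomorphic markers fail the MAF threshold, and markers with missing calls among the remaining accessions fail the marker missingness threshold.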

Analysis of genetic diversity

Parameters measuring the genetic diversity of the population such as PIC, gene diversity, heterozygosity (H) and MAF were calculated using PowerMarker V3.25 [24]. Other parameters such as average pairwise divergence or observed nucleotide diversity (π), expected nucleotide diversity or estimated mutation rate (θ) [25] and Tajima’s D [26] were calculated using TASSEL v5.2.65 [27].
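For orientation, gene diversity and PIC can be computed from allele frequencies with the commonly used definitions, gene diversity as 1 − Σp_i² and PIC as gene diversity minus Σ_{i<j} 2p_i²p_j². The sketch below assumes these definitions; it is not taken from PowerMarker, and the function names are hypothetical.

```python
from itertools import combinations

def gene_diversity(freqs):
    """Expected heterozygosity at one locus: 1 - sum(p_i^2)."""
    return 1 - sum(p * p for p in freqs)

def pic(freqs):
    """Polymorphism information content at one locus."""
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return gene_diversity(freqs) - correction

# Biallelic SNPs at the two extremes allowed by a 5% MAF filter.
for maf in (0.05, 0.5):
    freqs = [maf, 1 - maf]
    print(maf, round(gene_diversity(freqs), 3), round(pic(freqs), 3))
```

Under these definitions a biallelic SNP is bounded by a gene diversity of 0.5 and a PIC of 0.375 at MAF = 0.5, and a marker at the 5% MAF threshold gives roughly 0.095 and 0.090, which matches the per-marker ranges reported in the Results.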

The AMOVA and estimation of genetic indices were performed using GenAlEx 6.41. For this analysis, genetic indices such as the fixation index (F_ST), number of different alleles (Na), number of effective alleles (Ne), Shannon’s index (I), observed heterozygosity (H_O), expected heterozygosity (H_E), and inbreeding coefficient (F) were calculated.

Inference of structure, PCA and kinship

To determine the population structure, the filtered marker set (7,926) was pruned using the linkage disequilibrium (LD)-based pruning method in PLINK (--indep-pairwise 10 5 0.3). Population structure was analyzed using a Bayesian model-based clustering method in STRUCTURE 2.3.4 [28] with the pruned markers (2,228). STRUCTURE was run under the ‘admixture model’ with a burn-in period of 100,000 followed by 100,000 replications of Markov Chain Monte Carlo. Three independent runs were performed for each number of clusters (K) from 1 to 10. The most likely number of subpopulations (K) was determined using the web-based STRUCTURE HARVESTER; a ΔK statistic based on the relative rate of change in the likelihood of the data between successive K values was used to determine the optimal number of clusters [29, 30]. CLUMPP software was used to generate a consolidated population (Q) matrix from the STRUCTURE runs for the best K value. Lines with a membership probability of at least 0.6 were assigned to a subgroup. Pairwise genetic distances were calculated using PowerMarker V3.25 under the Nei (1983) [31] model. PCA was performed using TASSEL on 7,926 SNP markers. A relative kinship matrix was constructed with TASSEL 5.0, and a heat map was generated in R (http://www.r-project.org) [27]. The geographic structure of the population was studied through PCA performed on the correlation matrix calculated with the mean country data across years for landraces and the mean data across years for modern cultivars.
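The ΔK statistic can be sketched as follows. The sketch assumes one common form of the calculation, in which the absolute second-order difference of the mean log-likelihood across replicate runs is divided by the standard deviation of the log-likelihood at that K; whether this matches STRUCTURE HARVESTER in every detail is not asserted. The log-likelihood values are invented for illustration.

```python
from statistics import mean, stdev

def delta_k(lnp):
    """lnp: dict K -> list of ln P(D) values from replicate runs.

    Returns dict K -> delta K for interior K values (consecutive K assumed).
    """
    ks = sorted(lnp)
    out = {}
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        second_diff = abs(mean(lnp[nxt]) - 2 * mean(lnp[k]) + mean(lnp[prev]))
        sd = stdev(lnp[k])
        if sd > 0:  # delta K is undefined when replicate runs are identical
            out[k] = second_diff / sd
    return out

# Hypothetical log-likelihoods, three replicate runs per K.
runs = {
    1: [-1000, -1002, -998],
    2: [-800, -801, -799],
    3: [-780, -785, -775],
    4: [-770, -790, -760],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because the statistic uses the neighbouring K values on both sides, it cannot be evaluated at the smallest or largest K tested (here K = 1 and K = 4; in the study, K = 1 and K = 10).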

Construction of the core collection

The minimal size of the core collection was estimated using MStrat software v4.1 [32]. The analysis used three replicates with 30 iterations per replicate and a step size of 1. The core collection size was determined based on the maximization (M) and random (R) algorithm methods.
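The idea behind comparing the M and R methods can be sketched in a few lines of Python: for a given sample size, the M strategy seeks the set of accessions that captures the largest number of distinct alleles, whereas random sampling ignores allelic content. The sketch below uses a simple greedy selection as a stand-in for the optimisation, which is a simplification and not the procedure implemented in MStrat; the accession names and single-character allele profiles are hypothetical.

```python
import random

def allele_set(profile):
    """profile: sequence of alleles, one per locus -> set of (locus, allele)."""
    return {(locus, allele) for locus, allele in enumerate(profile)}

def greedy_core(accessions, size):
    """Greedily pick accessions that add the most not-yet-captured alleles."""
    chosen, captured = [], set()
    remaining = dict(accessions)
    while remaining and len(chosen) < size:
        best = max(remaining,
                   key=lambda name: len(allele_set(remaining[name]) - captured))
        captured |= allele_set(remaining.pop(best))
        chosen.append(best)
    return chosen, len(captured)

def random_core(accessions, size, seed=0):
    """Random sample of the same size, scored by alleles captured."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(accessions), size)
    captured = set().union(*(allele_set(accessions[n]) for n in chosen))
    return chosen, len(captured)

# Hypothetical panel: five accessions, three loci, one allele per locus.
panel = {"a1": "AAT", "a2": "AAT", "a3": "GAT", "a4": "ACC", "a5": "GCC"}
core, score = greedy_core(panel, 2)
print(core, score)
print(random_core(panel, 2))
```

In this toy panel two well-chosen accessions capture all six alleles, whereas a random pair may capture fewer; plotting the score against sample size for both methods gives curves of the kind used to choose the core size.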

Results

Genetic diversity of the landrace germplasm

The total number of putative SNPs called from 2,023 wheat landraces was 13,199. After filtering, 7,926 SNP markers were used for genetic diversity and population structure analysis. The B genome had the highest number of SNPs (3,218, ∼ 40.60%), followed by the A genome (3,022, ∼ 38.13%), and the D genome (1,686, ∼ 21.27%) (Fig. 2; Table 1). The number of SNPs per chromosome ranged from 121 to 715 with an average of 377. In the A genome, chromosome 2A had the highest number of polymorphic markers with 692, and chromosome 3A harbored the lowest number (277); in the B genome, the highest and lowest numbers of markers were detected on chromosomes 3B and 4B (715 and 191, respectively); in the D genome, chromosome 4D had the lowest number of SNPs (121), and chromosome 3D had the highest number (316). To characterize the distribution of SNPs in more detail, we used 1 Mb as a step to plot the distribution of SNPs on each chromosome (Additional file 3: Fig. S1). The number of SNPs on each chromosome was consistent with the physical length of the respective chromosome. The average marker density was approximately 1.77 Mb/SNP. The D genome had the lowest SNP marker density (2.34 Mb/SNP), and the B genome had the highest marker density (1.61 Mb/SNP) (Table 1).
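The genome shares and the per-chromosome average quoted above follow directly from the marker counts, as the short check below shows. Only figures given in the text are used; the variable names are arbitrary.

```python
# SNP counts per genome as reported in the text.
snps_per_genome = {"A": 3022, "B": 3218, "D": 1686}

total = sum(snps_per_genome.values())
shares = {g: round(100 * n / total, 2) for g, n in snps_per_genome.items()}
mean_per_chromosome = total / 21  # 21 chromosomes in hexaploid wheat

print(total)                       # filtered marker total
print(shares)                      # percentage of markers per genome
print(round(mean_per_chromosome))  # average SNPs per chromosome
```

The three genome counts sum to the 7,926 filtered markers, and the rounded shares and average agree with the values in the paragraph above.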

Table 1 Summary of genetic diversity among 2,023 landrace accessions. The parameters include number of SNP markers (N), marker coverage, minor allele frequency (MAF), genetic diversity (Hs), heterozygosity (H), polymorphic information content (PIC), nucleotide diversity (π/bp), expected nucleotide diversity (θ/bp) and Tajima’s D

Summary statistics of the genetic diversity estimates were similar across the three genomes of the 2,023 wheat landraces (Table 1). The gene diversity (Hs) of individual markers ranged from 0.10 to 0.50, with the lowest chromosome mean on 3A (0.23) and the highest on 6A (0.36). Among the three genomes, the B genome showed the highest mean diversity (0.32). Markers with Hs above 0.4 formed the largest class (28.24%), and markers with Hs below 0.1 the smallest (2.03%). The PIC ranged from 0.09 to 0.38. The mean PIC value on each chromosome showed a similar trend to Hs, ranging from 0.20 (3A) to 0.29 (6A). At the genome level, the mean PIC of both the A (0.23) and B (0.24) genomes was lower than that of the D genome (0.26). MAF from 0.3 to 0.5 was observed in 25.18% of the markers, whereas MAF less than 0.1 was observed in 29.35%.

The observed nucleotide diversity or average pairwise divergence (π/bp) ranged from 0.23 (7A) to 0.35 (3A) with an average of 0.30. Expected nucleotide diversity or expected number of polymorphic sites (θ/bp) values were similar, with an average of 0.12. Tajima’s D ranged from 3.89 (A genome) to 4.50 (D genome) with an average of 4.08. This value deviates significantly from neutral evolution (D = 0), which means the population may have gone through balancing selection. A positive value of D also indicates a deficit of rare alleles relative to the neutral expectation.

Population structure of the landrace accessions

The population structure of the 2,023 accessions was analyzed using 7,926 high-quality SNPs. STRUCTURE software identified the number of subpopulations. The number of clusters (K) was plotted against ΔK to determine the optimum number of subpopulations. The largest ΔK value was observed at K = 2, suggesting the presence of two main groups (Fig. 3a). The percentage membership of each accession in the two groups is presented in Additional file 4: Table S3. Using a membership probability threshold of 60%, 1,726 and 184 accessions were assigned to subgroups G1 and G2, respectively, and the remaining 76 accessions were placed in a mixed subgroup (Gmix) (Fig. 3b). The main groups were further subdivided into the Sub1, Sub2, Sub3, Sub4 and Sub5 subpopulations (Fig. 3b). The Sub1 subpopulation included 84 accessions (38.10% from Jiangsu and 16.67% from Sichuan); Sub2 included 528 accessions (39.02% from Jiangsu, 21.02% from Zhejiang, and 10.98% from Shanghai); 115 accessions were in Sub3 (40.00% from Jiangsu and 11.30% from Guizhou); Sub4 included 387 accessions (23.00% from Jiangsu, 19.64% from Henan, 18.60% from Sichuan and 16.80% from Guizhou); Sub5 included 292 accessions, 92.47% of which were from Jiangsu. The remaining 617 accessions, accounting for 30.50% of all germplasm, were classified as Mix because they had membership probabilities lower than 0.60 for any given subgroup (Additional file 5: Table S4).
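The assignment rule described above (a subgroup label when the largest membership probability reaches 0.60, otherwise Mix) can be written as a short Python sketch. The function name and the example Q-matrix rows are hypothetical; the subgroup sizes are those reported in the text.

```python
def assign_subpopulation(q_row, threshold=0.6, labels=None):
    """Assign an accession to the subgroup with the largest membership
    probability, or to 'Mix' if that probability is below the threshold."""
    labels = labels or [f"Sub{i + 1}" for i in range(len(q_row))]
    best = max(range(len(q_row)), key=q_row.__getitem__)
    return labels[best] if q_row[best] >= threshold else "Mix"

# Hypothetical Q-matrix rows for K = 5.
print(assign_subpopulation([0.70, 0.10, 0.10, 0.05, 0.05]))
print(assign_subpopulation([0.50, 0.30, 0.10, 0.05, 0.05]))

# Subgroup sizes reported in the text.
sizes = {"Sub1": 84, "Sub2": 528, "Sub3": 115, "Sub4": 387, "Sub5": 292, "Mix": 617}
n_total = sum(sizes.values())
mix_share = round(100 * sizes["Mix"] / n_total, 2)
print(n_total, mix_share)
```

The reported subgroup sizes sum to 2,023, and the Mix class corresponds to 30.50% of the collection.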

PCA based on 2,228 SNP molecular markers showed a similar, five-cluster distribution pattern, with the mixed subgroup being in the middle of the five defined subgroups (Fig. 3c). In scatterplots, the first three principal components explained 29.36, 12.09 and 7.74% of the total variation, respectively. Overall, five clusters were clearly identified by PCA, in agreement with the results from STRUCTURE. We also performed a kinship analysis to examine genetic clustering among the landraces, and a heat map of their kinship values was generated in R (Additional file 6: Table S5). Analysis of kinship indicated five clusters, with most accessions (blue) having close familial relationships (Fig. 3d).

Genetic differentiation of populations

F-statistics were calculated from 1,406 accessions after removing the 617 Mix accessions. Binary allelic data per locus were used for statistical analysis, and the number of effective alleles exceeded 1.3 in every subpopulation except Sub1. As expected, the heterozygosity (H_E) and Shannon’s diversity index (I) were the most discriminatory measures of differences among the five subgroups, with average genetic diversity estimated to be 0.20 and 0.31 for H_E and I, respectively (Table 2). Sub3 showed the highest genetic variability (H_E = 0.36; I = 0.53), whereas Sub1 showed the lowest (H_E = 0.05; I = 0.09). The inbreeding coefficients (F) for Sub2, Sub3, Sub4 and Sub5 were > 0.7, whereas that for Sub1 was considerably lower (0.30). Comparing the H_O values of the subpopulations, Sub1 exhibited the lowest.

Table 2 Diversity based on SNPs among the five subgroups

Analysis of the fixation index (F_ST) values, a measure of genetic differentiation between populations, revealed that the highest genetic differentiation was between Sub1 and Sub3 (F_ST = 0.47), and the lowest was between Sub2 and Sub5 (F_ST = 0.11) (Table 3). The Sub3 subpopulation showed the greatest genetic differentiation from the other subpopulations.
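As a rough illustration of what a pairwise F_ST measures, the sketch below uses the heterozygosity-based form F_ST = (H_T − H_S)/H_T for biallelic loci, with the two subpopulations weighted equally and the sums taken over loci before forming the ratio. This is a simplified textbook estimator chosen for the sketch, not the estimator implemented in GenAlEx, and the allele frequencies are invented.

```python
def fst_pair(freqs_a, freqs_b):
    """freqs_a, freqs_b: frequency of the same allele at each biallelic locus
    in two subpopulations. Returns (H_T - H_S) / H_T summed over loci."""
    hs_sum = ht_sum = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        # Mean within-subpopulation expected heterozygosity.
        hs = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2
        # Expected heterozygosity of the pooled population.
        p_bar = (pa + pb) / 2
        ht = 2 * p_bar * (1 - p_bar)
        hs_sum += hs
        ht_sum += ht
    return (ht_sum - hs_sum) / ht_sum if ht_sum > 0 else 0.0

# Hypothetical allele frequencies at three loci.
print(round(fst_pair([0.1, 0.5, 0.9], [0.2, 0.5, 0.8]), 3))  # similar groups
print(round(fst_pair([0.1, 0.2, 0.9], [0.9, 0.8, 0.1]), 3))  # divergent groups
```

Identical allele frequencies give a value of 0 and fixed differences give 1; the reported pairwise values (0.11 to 0.47) lie between these extremes.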

Table 3 F_ST values between subpopulations assessed with SNP markers

AMOVA was performed on the pairwise genetic distances using GenAlEx 6.51b2. It revealed that 32.84% of the total variation was explained by differences among the subpopulations, whereas 67.16% of the variation was within the subpopulations (Table 4). This confirmed much greater variation within than between subpopulations.

Table 4 The analysis of molecular variance (AMOVA) using 7,926 SNPs and the genetic differentiation among the five subpopulations of the 1,406 wheat landraces

Core collection

Maximization (M) and Random (R) algorithm methods were used to predict the optimal sample size of the core germplasm (Fig. 4). The M score was higher than the R score at every sample size, indicating that the M method of sampling alleles was significantly more efficient than the R method. When 304 accessions were selected, the M curve nearly reached a plateau (score = 3,041), indicating that 304 accessions (15.0%) was a suitable size for the core collection. We used the M method to extract 51, 102, 203, 304, 405 and 506 samples; these six sample sizes of the core collection captured 2.5, 5.0, 10.0, 15.0, 20.0 and 25.0% of the raw materials, respectively (Table 5).

Table 5 Nested core collection sample size predicted by maximization (M) method and random (R) method

Taking into account some landrace accessions with outstanding disease resistance, 311 accessions, accounting for 15.37% of the original set, formed the final core collection (Fig. 1). Among them, 13 accessions were from Sub1, 81 were from Sub2, 17 were from Sub3, 60 were from Sub4, 45 were from Sub5, and 95 were from Mix. The genetic diversity index and PIC values were 0.31 and 0.25, respectively, higher than those of the full collection (0.30 and 0.24) (Table 6). The neighbor-joining tree constructed with the 7,926 SNP markers showed that the core accessions were evenly distributed among the original collection and were highly representative (Fig. 5 and Additional file 7: Table S6). After accounting for uniformity and redundancy in the agronomic traits, we finally selected 311 accessions as the core collection. A comparison of diversity indices (\({H}^{{\prime }}\)) between the full landrace collection and the 311-accession core collection showed no significant differences for the 12 agronomic traits (Table 7).

Table 6 Comparison of number of alleles, gene diversity and polymorphism information content (PIC) between the 2,023 landraces and core accessions subgroups at the genome level

Table 7 Comparison of genetic diversity index (\({H}^{{\prime }}\)) between the 2,023 landraces and core accessions at the phenotypic level

Discussion

Diversity among landraces was initially described using spike morphology traits and botanical variety classification [14, 33]. Some landraces were mixtures of different wheat morphotypes that were easily identified by spike color or awn features. Landraces with the same name but originating from different regions often had different phenotypes. Likewise, landraces with similar morphotypes had different origins and names. With this study we gained insights into the genetic diversity of landrace accessions preserved in the wheat collection at Jiangsu Academy of Agricultural Sciences. Although the yield of landraces is generally less than that of commercial varieties grown under current agronomic conditions, they remain important sources of genetic variation in searching for novel sources of resistance to biotic and abiotic stress [34]. For example, Chinese landraces such as Wangshuibai, Haiyanzhong, Baisanyuehuang and Huangfangzhu from Jiangsu province have high levels of resistance to Fusarium head blight (FHB) and have been used as donor sources in breeding [35,36,37,38].

Genetic diversity of landrace accessions

Evaluation of genetic diversity in germplasm resources is of great significance for conservation, breeding and research. Studies have repeatedly documented much higher genetic diversity in landraces than among elite cultivars [6, 39]. A study by Sansaloni et al. [40] revealed landraces with unexplored diversity and genetic footprints left by selection in different geographical regions; indeed, very little of the genetic diversity had been used in modern breeding. This was also confirmed by analysis of the collection assembled by Watkins in the early 1900s [41, 42]. Selection in modern breeding programs has led to decreased genetic diversity in current wheat populations, and unless diversity can be maintained in gene banks, it will be lost for future generations [43]. Thus, landraces may hold novel variability not present in modern elite cultivars [17, 44, 45].

In the present study, 7,926 high-quality SNPs and data on 12 phenotypic traits obtained from 2,023 Chinese landrace accessions were used. A large portion of the polymorphic markers were mapped to the B genome (40.60%), followed by the A genome (38.13%) and the D genome (21.27%) (Fig. 2), which was in agreement with previous studies [46]. Interestingly, the Hs, PIC and π on the D genome were higher than on the A and B genomes in this study (Table 1). Generally, the D genome was the least diverse genome in previous studies [21, 47]. The greater diversity of the D genome in Chinese landrace accessions may indicate a greater possibility that the D genome has novel genetic variations [48], which can be used in elite wheat breeding programs to reduce the bottleneck of the D genome and broaden the genetic base [49].

Population structure and relationship

Population structure analysis is the first step in association mapping studies. In the present study, STRUCTURE, PCA and kinship analysis showed that there are most probably five subpopulations in the studied collection of landrace accessions (Fig. 3). In each subpopulation, there were genotypes from different regions (Additional file 8: Fig. S2a and b). A few accessions showed a certain association between geographical origin and population structure (Additional file 8: Fig. S2c, d and e). This is a common phenomenon for most cereal landraces worldwide because of informal seed exchange systems involving regional and countrywide farming communities [50, 51]. Nearly 30.50% of landrace accessions were classified into the Mix subpopulation, which may also be indirectly attributed to the continuous gene flow of landrace genotypes among the different regions.

Genetic differentiation among populations is reflected by F_ST [52, 53]. F_ST measures population differentiation due to genetic structure, and a value greater than 0.15 predicts significant genetic differentiation between subpopulations [54]. High genetic differentiation among subpopulations is indicative of a low level of gene flow between subpopulations. For example, a low level of gene flow was also reported among the wheat landrace populations of Mediterranean origin [55]. This phenomenon may be due to the deployment of newly developed cultivars across multiple countries and less use of old wheat landraces and locally selected germplasm in breeding programs [56]. AMOVA indicated that most of the genetic variation (67.16%) occurred within subpopulations, confirming the existence of considerable unique variation in subpopulations (Table 4). Previous studies have reported similar results, but it is still unclear whether genetic variation within subpopulations is due to variations that occurred during different domestication processes or was introduced by farmers and traders from other regions [56]. In this study, 30.50% of landrace accessions were classified as Mix in population structure analysis, which may be attributed to germplasm exchange between different regions.

A core collection of wheat landraces

A core collection that represents the genetic diversity of a crop in a minimal number of accessions is an effective way to achieve efficient conservation and utilization of germplasm [57, 58]. Ideally, a core collection should be approximately 10% of the total collection and retain 70% of the genetic diversity from the initial collection [59]. The number of accessions selected for the core collection depends on the size of the initial collection and the sampling ratio [38]. Li et al. [60] proposed sampling 5–40% of accessions to construct core germplasm, with 10% being optimum. Van Treuren et al. [61] developed an advanced cultivar core collection of bare cultivars using a sampling percentage of 26.92%. Hao et al. [20] constructed a mini-core collection, accounting for 5% of the initial collection and representing 91.5% of the genetic diversity of the initial collection. Xu et al. [57] suggested that a sampling percentage of 20% was an appropriate size to construct a core collection for barley. In this study, we selected 15.37% (311/2,023) of accessions as our core collection.

Representative core accessions have been selected in diverse crops using various sampling strategies and clustering methods [20, 62,63,64]. Previous studies indicated that the M strategy performs well when accessions come from populations with restricted gene flow or are from self-pollinated species [57, 65, 66]. The MSTRAT algorithm is one of the representative core selection methods for implementing the M strategy [32]. Here, we used the M strategy as implemented in MStrat v4.1 software and successfully established a representative core collection with high genetic diversity.

Using genotypic and phenotypic information along with clustering to construct a core collection is more efficient than using genotypic or phenotypic information alone [65]. It is important to verify the quality of a core collection, as the quality determines the direction of subsequent research [67]. In the present study, the genetic diversity indices (\({H}^{{\prime }}\)) of 12 morphological characters in the core collection were not significantly different from those of the entire collection, indicating that the core collection can effectively represent the variation range of the 12 morphological traits of the original set. In general, molecular markers reflect changes in genetic variation at the DNA level, without environmental interference, hence providing valuable data to describe genetic diversity. In this study, 311 accessions were selected as a core collection of wheat landraces, which retained 100% of the alleles of the primary core collection. The genetic diversity and PIC value of the core collection were higher than those of the initial collection. The combined results indicate that the core collection selected in this study represents the initial landrace collection well.

Conclusions

Constructing core collections of wheat landraces will enhance the efficiency of management and utilization of accessions in germplasm banks. In the present study, we constructed a core collection of 311 accessions representing 100% of the SNPs identified among 2,023 wheat landrace accessions held by the Jiangsu Provincial Crop Germplasm Resource Genebank. The evaluation showed that this core collection is of high quality and valuable for phenotypic and genetic studies. The core collection can be used as a primary germplasm resource for mining novel genes, genetic association and functional gene analyses.

Data availability

The datasets used or analyzed during the current study are available in this published article and its additional files. The SNP data used in this study are available in the China National Center for Bioinformation (CNCB) repository under accession number GVM000783 (https://ngdc.cncb.ac.cn/gvm/getProjectDetail?project=GVM000783).

Abbreviations

AMOVA: Analysis of molecular variance
CC: Core collection
df: Degrees of freedom
FHB: Fusarium head blight
F: Inbreeding coefficient
GWAS: Genome-wide association study
H: Heterozygosity
H': Genetic diversity index
H_E: Expected heterozygosity
H_O: Observed heterozygosity
Hs: Gene diversity
I: Shannon’s diversity index
LD: Linkage disequilibrium
MAF: Minor allele frequency
MAS: Marker-assisted selection
MS: Mean sum of squares
Na: Number of different alleles
Ne: Number of effective alleles
PIC: Polymorphism information content
SNP: Single nucleotide polymorphism
SS: Sum of squares
SSR: Simple sequence repeat
θ: Expected nucleotide diversity or estimated mutation rate
π: Average pairwise divergence or observed nucleotide diversity

References

Bhatta M, Regassa T, Rose DJ, Baenziger PS, Eskridge KM, Santra DK, Poudel R. Genotype, environment, seeding rate, and top-dressed nitrogen effects on end-use quality of modern Nebraska winter wheat. J Sci Food Agr. 2017;97:5311–8.
Pascual L, Ruiz M, López-Fernández M, Pérez-Peña H, Benavente E, Vázquez JF, et al. Genomic analysis of Spanish wheat landraces reveals their variability and potential for breeding. BMC Genomics. 2020;21:122.
Article CAS PubMed PubMed Central Google Scholar
Marcussen T, Sandve SR, Heier L, Spannagl M, Pfeifer M, Jakobsen KS et al. Ancient hybridizations among the ancestral genomes of bread wheat. Science. 2014;345.
Zencirci N, Baloch FS, Habyarimana E, Chung G. Wheat landraces. New York, NY, USA: Springer; 2021. pp. 1–11.
Book Google Scholar
Newton AC, Akar T, Baresel JP, Bebeli PJ, Bettencourt E, Bladenopoulos KV, et al. Cereal landraces for sustainable agriculture. A review. Agron Sustain Dev. 2010;30:237–69.
Article Google Scholar
Wingen LU, West C, Leverington-Waite M, Collier S, Orford S, Goram R, et al. Wheat landrace genome diversity. Genetics. 2017;205:1657–76.
Article PubMed PubMed Central Google Scholar
Monteagudo A, Casas AM, Cantalapiedra CP, Contreras-Moreira B, Gracia MP, Igartua E. Harnessing novel diversity from landraces to improve an elite barely variety. Front Plant Sci. 2019;10:434.
Article PubMed PubMed Central Google Scholar
Khateeb WA, Shalabi AA, Schroeder D, Musallam I. Phenotypic and molecular variation in drought tolerance of Jordanian durum wheat (Triticum durum Desf.) Landraces. Physiol Mol Biol Pla. 2017;23:311–19.
Article Google Scholar
Pinto RS, Molero G, Reynolds MP. Identification of heat tolerant wheat lines showing genetic variation in leaf respiration and other physiological traits. Euphytica. 2017;213:76.
Article Google Scholar
Mothammadi R, Amri A, Ahmadi H, Jafarzadeh J. Characterization of tetraploid wheat landraces for cold tolerance and agronomic traits under rainfed condition of Iran. J Agr Sci. 2015;153:631–45.
Article Google Scholar
Khazaei H, Monneveux P, Shao HB, Mohammady S. Variation for stomatal characteristics and water use efficiency among diploid, tetraploid and hexaploid Iranian wheat landraces. Genet Resour Crop Ev. 2010;57:307–14.
Article Google Scholar
Karagoz A. Wheat landraces of Turkey. Emir J Food Agr. 2014;26:149–56.
Article Google Scholar
Husenov B, Muminjanov H, Dreisigacker S, Otambekova M, Akin B, Subasi K, et al. Genetic diversity and agronomic performance of wheat landraces currently grown in Tajikistan. Crop Sci. 2021;61:2548–64.
Article Google Scholar
Baboev S, Muminjanov H, Turakulov K, Buronov A, Mamatkulov I, Koc E, et al. Diversity and sustainability of wheat landraces grown in Uzbekistan. Agron Sustain Dev. 2021;41:34.
Article CAS Google Scholar
Li XJ, Xu X, Yang XM, Li XQ, Liu WH, Gao AN, et al. Genetic diversity of the wheat landrace Youzimai from different geographic regions investigated with morphological traits, seedling resistance to powdery mildew, gliadin and microsatellite markers. Cereal Res Commun. 2012;40:95–106.
Article Google Scholar
Dong YS, Zheng DS. Wheat genetic resources in China. Beijing, China: Agriculture; 2000.
Google Scholar
Lopes MS, El-Basyoni I, Baenziger PS, Singh S, Royo C, Ozbek K, et al. Exploiting genetic diversity from landraces in wheat breeding for adaptation to climate change. J Exp Bot. 2015;66:3477–86.
Article CAS PubMed Google Scholar
Frankel OH. Genetic perspectives of germplasm conservation. In: Arber W, Llimensee K, Peacock W, Starlinger P, editors. Genetic manipulation: impact on Man and Society. Cambridge: Cambridge University Press; 1984. pp. 161–70.
Google Scholar
Balfourier F, Bouchet S, Robert S, Oliveira RD, Rimbert H, Kitt J, et al. Worldwide phylogeography and history of wheat genetic diversity. Sci Adv. 2019;5(5):eaav0536.
Article PubMed PubMed Central Google Scholar
Hao CY, Dong YC, Wang LF, You GX, Zhang HN, Ge HM, Jia JZ, Zhang XY. Genetic diversity and construction of core collection in Chinese wheat genetic resources. Chin Sci Bull. 2008;53:1518–26.
Article CAS Google Scholar
Mourad AMI, Belamkar V, Baenziger PS. Molecular genetic analysis of spring wheat core collection using genetic diversity, population structure, and linkage disequilibrium. BMC Genomics. 2020;21:434.
Article PubMed PubMed Central Google Scholar
Jain SK, Qualset CO, Bhatt GM, Wu KK. Geographical patterns of phenotypic diversity in a world collection of durum wheats. Crop Sci. 1975;15:700–4.
Article Google Scholar
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.
Article CAS PubMed PubMed Central Google Scholar
Liu K, Muse SV. PowerMarker: an integrated analysis environment for genetic marker analysis. Bioinformatics. 2005;21:2128–29.
Article CAS PubMed Google Scholar
Kimura M. The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics. 1969;61:893–903.
Article CAS PubMed PubMed Central Google Scholar
Tajima F. The effect of change in population size on DNA polymorphism. Genetics. 1989;123:597–601. .
Article CAS PubMed PubMed Central Google Scholar
Bradbury PJ, Zhang ZW, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23:2633–35.
Article CAS PubMed Google Scholar
Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155:945–59.
Article CAS PubMed PubMed Central Google Scholar
Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol. 2005;14:2611–20.
Article CAS PubMed Google Scholar
Earl DA, vonHoldt BM. STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conserv Genet Resour. 2012;4:359–61.
Article Google Scholar
Nei M, Tajima F, Tateno Y. Accuracy of estimated phylogenetic trees from molecular data. J Mol Evol. 1983;19:153.
Article CAS PubMed Google Scholar
Gouesnard B, Bataillon TM, Decoux G, Rozale C, Schoen DJ, David JL. MSTRAT: an algorithm for building germ plasm core collections by maximizing allelic or phenotypic richness. J Hered. 2001;92:93–4.
Article CAS PubMed Google Scholar
Baboev SK, Buranov AK, Bozorov TA, Adylov BS, Morgunov AI, Muminzhonov K. Biological and agronomical assessment of wheat landraces cultivated in mountain areas of Uzbekistan. Sel’Skokhozyaistvennaya Biologiya. 2017;52:553–60.
Article Google Scholar
Manickavelu A, Joukhadar R, Jighly A, Lan C, Huerta-Espino J, Stanikzai AS, et al. Genome wide association mapping of stripe rust resistance in Afghan wheat landraces. Plant Sci. 2016;252:222–9.
Article CAS PubMed Google Scholar
Li T, Bai GH, Wu SY, Gu SL. Quantitative trait loci for resistance to fusarium head blight in a Chinese wheat landrace Haiyanzhong. Theor Appl Genet. 2011;122:1497–502.
Article PubMed Google Scholar
Zhang XH, Pan HY, Bai GH. Quantitative trait loci responsible for fusarium head blight resistance in Chinese landrace Baishanyuehuang. Theor Appl Genet. 2012;125:495–502.
Article CAS PubMed Google Scholar
Li T, Bai GH, Wu SY, Gu SL. Quantitative trait loci for resistance to fusarium head blight in the Chinese wheat landrace Huangfangzhu. Euphytica. 2012;185:93–102.
Article Google Scholar
Zhang M, Zhang R, Yang JZ, Luo PG. Identification of a new QTL for Fusarium head blight resistance in the wheat genotype Wang shui-bai. Mol Biol Rep. 2010;37:1031–35.
Article CAS PubMed Google Scholar
Moore G. Strategic pre-breeding for wheat improvement. Nat Plants. 2015;1:15018.
Article CAS PubMed Google Scholar
Sansaloni C, Franco J, Santos B, Percival-Alwyn L, Singh S, Petroli C, et al. Diversity analysis of 80,000 wheat accessions reveals consequences and opportunities of selection footprints. Nat Commun. 2020;11:4572.
Article CAS PubMed PubMed Central Google Scholar
Winfield MO, Allen AM, Wilkinson PA, Burridge AJ, Barker GLA, Coghill J, et al. High-density genotyping of the A.E. Watkins Collection of hexaploid landraces identifies a large molecular diversity compared to elite bread wheat. Plant Biotechnol J. 2018;16:165–75.
Article CAS PubMed Google Scholar
Wingen L, Orford S, Goram R, Leverington-Waite M, Bilham L, Patsiou TS, et al. Establishing the A.E. Watkins landrace cultivar collection as a resource for systematic gene discovery in bread wheat. Theor Appl Genet. 2014;127:1831–42.
Article PubMed PubMed Central Google Scholar
Marone D, Russo MA, Mores A, Ficco DBM, Laidò G, Mastrangelo AM, et al. Importance of landraces in cereal breeding for stress tolerance. Plants. 2021;10:1267.
Article CAS PubMed PubMed Central Google Scholar
Riaz A, Hathorn A, Dinglasan E, Ziems LA, Richard C, Singh D, et al. Into the vault of the Vavilov wheats: old diversity for new alleles. Genet Resour Crop Ev. 2017;64:531–44.
Article Google Scholar
Vikram P, Franco J, Burgueño-Ferreira J, Li HH, Sehgal D, Pierre CS, et al. Unlocking the genetic diversity of Creole wheats. Sci Rep-UK. 2016;6:23092.
Article CAS Google Scholar
Alipour H, Bihamta MR, Mohammadi V, Peyghambari SA, Bai G, Zhang G. Genotyping-by-sequencing (gbs) revealed molecular genetic diversity of Iranian wheat landraces and cultivars. Front Plant Sci. 2017;8:1293.
Article PubMed PubMed Central Google Scholar
Tang WJ, Dong ZD, Gao LF, Wang XC, Li TB, Sun CW, Chu ZL, et al. Genetic diversity and population structure of modern wheat (Triticum aestivum L.) cultivars in Henan Province of China based on SNP markers. BMC Plant Biol. 2023;23:542.
Article CAS PubMed PubMed Central Google Scholar
Ogbonnaya FC, Abdalla O, Mujeeb-Kazi A, Kazi AG, Xu SS, Gosman N, et al. Synthetic hexaploids: harnessing species of the primary gene pool for wheat improvement. Plant Breed Rev. 2013;37:35–122.
Article Google Scholar
Bhatta M, Morgounov A, Belamkar V, Poland J, Baenziger PS. Unlocking the novel genetic diversity and population structure of synthetic hexaploid wheat. BMC Genomics. 2018;19:591.
Article PubMed PubMed Central Google Scholar
Bishaw Z. Wheat and barley seed systems in Ethiopia and Syria. PhD Thesis, Wageningen University and Research Center, Germany, 2004. p. 383.
Negisho K, Shibru S, Pillen K, Ordon F, Wehner G. Genetic diversity of Ethiopian durum wheat landraces. PLoS ONE. 2021;16:e0247016.
Article CAS PubMed PubMed Central Google Scholar
Tehseen MM, Istipliler D, Kehel Z, Sansaloni CP, da Silva Lopes M, Kurtulus E, et al. Genetic diversity and population structure analysis of Triticum aestivum L. landrace panel from Afghanistan. Genes. 2021;12:340.
Article CAS PubMed PubMed Central Google Scholar
Luo ZN, Brock J, Dyer JM, Kutchan T, Schachtman D, Augustin M, et al. Genetic diversity and population structure of a Camelina sativa spring panel. Front Plant Sci. 2019;10:184.
Article PubMed PubMed Central Google Scholar
Frankham R, Ballou JD, Briscoe DA, McInnes KH. Introduction to conservation genetics. Cambridge: Cambridge University Press; 2002.
Book Google Scholar
Rufo R, Alvaro F, Royo C, Soriano JM. From landraces to improved cultivars: Assessment of genetic diversity and population structure of Mediterranean wheat using SNP markers. PLoS ONE. 2019;14:e0219867.
Article CAS PubMed PubMed Central Google Scholar
Tehseen MM, Tonk FA, Tosun M, Istipliler D, Amri A, Sansaloni CP, et al. Exploring the genetic diversity and Population structure of wheat Landrace Population conserved at ICARDA Genebank. Front Genet. 2022;13:900572.
Article CAS PubMed PubMed Central Google Scholar
Xu JQ, Wang L, Wang HD, Mao CZ, Kong DD, Chen SY, et al. Development of a core collection of six-rowed hulless barley from the Qinghai-Tibetan plateau. Plant Mol Biol Rep. 2020;38:305–13.
Article CAS Google Scholar
Liu JM, Gao SL, Xu YY, Wang MZ, Ngiam JJ, Rui Wen NC, et al. Genetic diversity analysis of Sapindus in China and extraction of a core germplasm collection using EST-SSR markers. Front Plant Sci. 2022;13:857993.
Article PubMed PubMed Central Google Scholar
Brown AHD. Core collections: a practical approach to genetic resources management. Genome. 1989;31:818–24.
Article Google Scholar
Li ZC, Zhang HL, Zeng YW, Yang ZY, Shen SQ, Sun CQ, Wang XK. Studies on sampling schemes for the establishment of core collection of rice landraces in Yunnan, China. Genet Resour Crop Ev. 2002;49:67–74.
Article CAS Google Scholar
Van Treuren R, Tchoudinova I, van Soest LJM, van Hintum TJL. Marker-assisted acquisition and core collection formation: a case study in barley using AFLPs and pedigree data. Genet Resour Crop Ev. 2006;53:43–52.
Article Google Scholar
Franco J, Crossa J, Taba S, Shands H. A sampling strategy for conserving genetic diversity when forming core subsets. Crop Sci. 2005;45:1035–44.
Article Google Scholar
Hu J, Zhu J, Xu HM. Methods of constructing core collections by stepwise clustering with three sampling strategies based on the genotypic values of crops. Theor Appl Genet. 2000;101:264–8.
Article CAS Google Scholar
Wang JC, Hu J, Xu HM, Zhang S. A strategy on constructing core collections by least distance stepwise sampling. Theor Appl Genet. 2007;115:1–8.
Article CAS PubMed Google Scholar
Lee HY, Ro NY, Jeong HJ, Kwon JK, Jo J, Ha Y, et al. Genetic diversity and population structure analysis to construct a core collection from a large Capsicum germplasm. BMC Genet. 2016;17:142.
Article PubMed PubMed Central Google Scholar
Gu XZ, Cao YC, Zhang ZH, Zhang BX, Zhao H, Zhang XM, et al. Genetic diversity and population structure analysis of Capsicum germplasm accessions. J Integr Agr. 2019;18:1312–20.
Article Google Scholar
Odong TL, Jansen J, Van Eeuwijk FA, van Hintum TJL. Quality of core collections for effective utilisation of genetic resources review, discussion and interpretation. Theor Appl Genet. 2013;126:289–305.
Article CAS PubMed Google Scholar

Download references

Acknowledgements

The plant materials (seeds) of this study were available from the Genebank of the Institute of Germplasm Resources and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, China. The authors are grateful to Dr. Robert McIntosh of Sydney University for critically reading and improving the manuscript.

Funding

This work was supported by the National Key R&D Program of China (2021YFD1200600); the Zhongshan Biological Breeding Laboratory (ZSBBL-KY2023-02); the Key Research and Development Program of Jiangsu Province (BE2022346); the Jiangsu Agriculture Science and Technology Innovation Fund (JASTIF) (grant no. cx [17] 3004); and the International Cooperation Fund of Jiangsu Academy of Agricultural Sciences.

Author information

Authors and Affiliations

Institute of Germplasm Resources and Biotechnology/Jiangsu Provincial Key Laboratory of Agrobiology, Jiangsu Academy of Agricultural Sciences, Nanjing, Jiangsu, 210014, China
Ying Liu, Bisheng Fu, Qiaofeng Zhang, Jin Cai, Wei Guo, Wenling Zhai & Jizhong Wu
Zhongshan Biological Breeding Laboratory, Nanjing, Jiangsu, 210014, China
Bisheng Fu, Jin Cai, Wei Guo & Jizhong Wu
Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, Yangzhou University, Yangzhou, 225009, China
Jizhong Wu

Contributions

LY performed phenotyping, DNA extraction and SNP genotyping; carried out the genetic diversity, population structure, PCA and kinship analyses; constructed the core collection; and drafted the manuscript. FBS extracted DNA from the accessions and collected and managed the phenotype dataset. ZQF participated in DNA extraction, SNP genotyping and phenotype data collection. CJ and GW contributed to phenotyping and DNA extraction. ZWL participated in phenotype analysis. WJZ designed and supervised the study, assisted in its conception, and contributed to the discussion and revision of the manuscript. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Jizhong Wu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Additional file 1:

Table S1. Detailed information on the 2,023 wheat landrace accessions.

Additional file 2:

Table S2. Categories and descriptive statistics for the 12 agronomic traits.

Additional file 3:

Fig. S1. Distribution and density of the filtered single nucleotide polymorphisms (7,963 SNPs) across the 21 chromosomes. The horizontal axis shows chromosome length. The number of SNPs in a given region is indicated at the bottom right.

Additional file 4:

Table S3. Individual Q matrix calculated in STRUCTURE (K = 2).

Additional file 5:

Table S4. Individual Q matrix calculated in STRUCTURE (K = 5).

Additional file 6:

Table S5. The kinship matrix between accessions.

Additional file 7:

Table S6. Estimates of evolutionary divergence between accessions.

Additional file 8:

Fig. S2. Grouping of 2,023 wheat landrace accessions by principal component analysis. a-b Plots of PC1, PC2 and PC3 of landrace accessions based on predicted group membership from STRUCTURE (K = 5). c-d Plots of PC1, PC2 and PC3 from principal component analysis of landrace accessions from different regions of China. e Geographic locations of 2,023 wheat landraces based on predicted group membership from STRUCTURE (K = 5).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Liu, Y., Fu, B., Zhang, Q. et al. Genetic diversity and population structure of wheat landraces in Southern Winter Wheat Region of China. BMC Genomics 25, 664 (2024). https://doi.org/10.1186/s12864-024-10564-z

Received: 06 February 2024
Accepted: 25 June 2024
Published: 03 July 2024
DOI: https://doi.org/10.1186/s12864-024-10564-z
