Skip to main content

Genetic diversity and population structure of 93 rice cultivars (lines) (Oryza sativa Xian group) in Qinba in China by 3 types of genetic markers



The Qinba region is the transition region between Indica and Japonica varieties in China. It has a long history of Indica rice planting of more than 7000 years and is also a planting area for fine-quality Indica rice. The aims of this study are to explore different genetic markers applied to the analysis population structure, genetic diversity, selection and optimization of molecular markers of Indica rice, thus providing more information for the protection and utilization on germplasm resources of Indica rice.


Fifteen phenotypic traits, a core set of 48 SSR markers which originated protocol for identification of rice varieties-SSR marker method in agricultural industry standard of the People's Republic of China (Ministry of Agriculture of the PRC, NY/T1433-2014, Protocol for identification of rice varieties-SSR marker method, 2014), and SNPs data obtained by genotyping-by-sequencing (GBS, NlaIII and MseI digestion, referred to as SNPs-NlaIII and SNPs-MseI, respectively) for this panel of 93 samples using the Illumina HiSeq2000 sequencing platform, were employed to explore the genetic diversity and population structure of 93 samples.


The average of coefficient of variation (CV) and diversity index (He) were 29.72% and 1.83 ranging from 3.07% to 137.43%, and from 1.45 to 2.03, respectively. The correlation coefficient between 15 phenotypic traits ranged from 0.984 to -0.604. The first four PCs accounted for 70.693% phenotypic variation based on phenotypic analysis. A total of 379 alleles were obtained using SSR markers, encompassing an average of 8.0 alleles per primer. Polymorphic bands (PPB) and polymorphism information content (PIC) was 88.65% and 0.77, respectively. The Mantel test showed that the correlation between the genetic distance matrix based on SNPs-NlaIII and SNPs-MseI was the largest (R2=0.88), and that based on 15 phenotypic traits and SSR was the smallest (R2=0.09). The 93 samples could be clustered into two subgroups by 3 types of genetic markers. Molecular variance analysis revealed that the genetic variation was 2% among populations and 98% within populations (the Nm was 0.16), Tajima’s D value was 1.66, the FST between the two populations was 0.61 based on 72,824 SNPs.


The population genetic variation explained by SNPs was larger than that explained by SSRs. The gene flow of 93 samples used in this study was larger than that of naturally self-pollinated crops, which may be caused by long-term breeding selection of Indica rice in the Qinba region. The genetic structure of the 93 samples was simple and lacked rare alleles.

Peer Review reports

Key message

It was found that there was a significant correlation between the genetic distance obtained by the two types of SNPs markers, while the lowest correlation occured between the genetic distance of phenotypic traits and SSR data. The population genetic variation explained by SNPs was larger than that explained by SSRs among DNA molecular markers. Bayesian clustering algorithm was superior to the other two clustering methods. The genetic structure of 93 samples representative of the diversity present in Qinba area in China of Oryza sativa Xian group was simple and lacked rare alleles.


According to the origin and the history of rice cultivate in China, the two major types of Oryza sativa L. are classified as O. sativa L. subsp. hsien Ting and O. sativa L. subsp. keng Ting proposed by Ding Y [1, 2], and the naming method of O. sativa L. subsp. indica Kato (also known as O. sativa Xian group) and O. sativa L. subsp. japonica Kato (also known as O. sativa Xian group ) for O. sativa L. proposed by Kato was used internationally at the present [3], in which Indica rice is distributed mainly in the southern Qinling Mountains in China. The Qinba area is the climate transition area between the northern and southern areas as well as the transition area from Indica rice to Japonica rice, which is also the most suitable planting area for Indica rice in China. Germplasm resources form the basis of all breeding work; the analysis of genetic diversity and genetic structure is beneficial to mining excellent breeding materials and improving breeding efficiency. Particularly, in-depth genetic dissection of Indica rice germplasm resources have not been conducted. The population genetic structure is the non-random distribution of genes or genotypes in space and time, including genetic variations within populations and genetic differentiation between populations. Population structure analysis is essential to explore the biological adaptability, population formation process, evolutionary mechanism, protection, and development of biological resources. At the same time, populations with identical or similar genetic backgrounds is most suitable for genome-wide association studies (GWAS), therefore, the study of population genetic structure plays an important role in the field of biology, in which the selection of genetic markers is the top strategic priority, ranging from earlier morphological markers to more recent different types of DNA molecular markers [4,5,6,7]. The Indica rice genome has simple sequence repeats (SSRs) that span approximately 10-50 kb [8, 9]. In the last few decades, SSR molecular markers have become important tools in the field of biology, particularly in terms of population structure, genetic mapping, and other related fields, SSR markers have also become the designated markers of the International Fingerprint Mapping Center [10,11,12]. These are employed in judicial identification, identification of new varieties of plants, such as rice, rape, and corn [13,14,15,16,17,18]. SSR markers are also used in DNA fingerprinting for breed protection [19]. However, SSR markers are scarce, show unbalanced distribution in the genome, have weak electrophoretic resolution, and are relatively time-consuming and labor-intensive to study, and thus it is difficult to construct high-density genetic maps. With the recent development of next-generation sequencing technology, most biological studies have rapidly improved, in particular, the use of single nucleotide polymorphisms (SNPs) based on genome-wide scans. With the release of extensive rice genome sequencing data, one SNP in every hundred base pairs or even every dozens of base pairs has been identified, indicating that there are numerous SNPs in the rice genome [6, 20]. A small number of SNPs can be used to resolve many problems, so the sequencing technology was born based on simplified genome by restriction site-associated DNA (RAD) tags [21]. The frontrunner among these technologies is genotyping-by-sequencing (GBS), which has recently gained attention because it utilizes methylation-sensitive restriction endonucleases (type II enzyme), thereby avoiding repetitive regions of the genome (methylated regions). GBS technology can rapidly identify high-density polymorphisms, especially SNPs [22]. In this study, two type II enzymes (NlaIII and MseI) were selected by simulated whole-genome enzyme digestion, which generated RAD tags for sequencing to obtain SNPs datasets, referred to as SNPs-NlaIII and SNPs-MseI, respectively. Simultaneously, a core set of SSR markers from NY/T1433-2014 [23] that originated in the Agricultural Standards of the People's Republic of China and 15 phenotypic traits were employed to explore gene flow and population genetic structure of 93 samples and to provide reference for future research studies using different genetic markers employed in related fields.


Phenotypic traits diversity and cluster analysis

Phenotypic diversity analysis

Data of the 15 phenotypic traits of the 93 samples are summarized in Table 1. The basic statistical analysis and diversity of the 15 phenotypic traits based on phenotypic data is shown in Table 2. The coefficient of variation (CV) was 29.72% on average and ranged from 3.07% (brown rice rate) to 137.43% (chalkiness). The average diversity index (He) was 1.92 and ranged from 1.55 to 2.08, with brown rice rate and chalkiness having lower He, indicating that there were relatively few phenotypes in these two traits. Overall, the Indica rice materials tested had relatively more phenotypes on these 15 traits; the distribution in each phenotype was uneven Tables 3 and 4.

Table 1 Phenotypic data of 93 samples
Table 2 Basic statistical analysis and diversity of the 15 phenotypic traits
Table 3 Pearson correlation coefficient analysis of the 15 phenotypic traits

Phenotypic traits clustering

Average Euclidean distance was 5.19, ranging from 0.90 (between W723 and W742) to 13.73 (between W699 and W733). Clustering result based on the 15 phenotypic traits was shown in Fig. 1, which demonstrated that the 92 samples were clustered together in addition to W669 and showed a single genetic basis for the population.

SSR marker analysis

Polymorphism of SSR markers

A total of 378 bands was detected using 48 core SSRs primer pairs (Table 5). Among these, 336 polymorphic bands were detected. The average number of polymorphic fragments was 7, ranging from 1 to 14. The highest number (14) of polymorphic bands was detected by RM278 while RM311 is the least bands. The average value of PPB (Percentage of polymorphic bands) was 88.87%, ranging from 50% to 100%. The average value of PIC (Polymorphism information content) was 0.77, ranging from 0.19 to 0.88. Data showed that core SSR in rice can produce rich bands and high polymorphic rate.

Fig. 1
figure 1

Cluster diagram based on the 15 phenotypic traits. a via PC clustering; b via UPGMA clustering

Table 4 Eigenvalue and contributive percentage of principal components and component scores coefficient matrix of the 15 phenotypic traits
Table 5 Information and polymorphism of 48 SSR primers

Clustering based on SSR

PC, in which the first three PC (eigenvalue) to select and their cumulative contribution of variance accounted for 15.76%, and the unweighted pair-group method with arithmetic means (UPGMA) were performed, which demonstrated that the 93 genotypes could be divided into 2 subgroups (Fig. 2).

Fig. 2
figure 2

The cluster diagram based on SSR. a by the PC clustering; b by the UPGMA clustering

Bayesian clustering based on SSR markers

A total of 378 SSR bands was used to elucidate the population structure of the entire pool of 93 rice germplasms. The best K was K = 2, suggesting that the 93 rice germplasms were best divided into two subgroups (Fig. 3).

Fig. 3
figure 3

Bayesian clustering based on SSR markers; Red: group I; Green: group II. Each vertical line on the X-axis correspond to a sample. The proportion of each color represents probability rate with which a given genotype belongs to each group

SNPs marker analysis

A total of 39,872 SNPs-NlaIII and 35,547 SNPs-MseI passed the minor allele frequency (MAF) lower limit of 0.05 using NlaIII and MseI digestion, respectively. Merged data of SNPs-NlaIII and SNPs-MseI, with a total of 72,824 SNPs including 67,621 SNPs that aligned to specific chromosomes and 5,023 SNPs unlocalized, were then obtained.

Linkage disequilibrium (LD) and haplotype analysis

From the total of 6,288,753 loci (93 samples × 67,621SNPs), 326,873 (5.198%) were heterozygous. The 67,621 SNPs were unevenly distributed on the 12 chromosomes (Fig. 4a); chromosome 1 contained the largest number of makers (8,425), while chromosome 8 included the least (3,953). LD, as represented by inter-loci R2 values, was calculated for the 84,255 SNP pairs. R2 value had a minimum of 0.2 and an average of 0.73. 46,322 SNP pairs (54.98%) had R2 values higher than 0.8, while 7,841 pairs (9.31%) were in complete LD (R2=1). The 12 chromosomes yielded a total of 6,568 predicted haplotypes (Fig. 4b), with chromosome 1 possessing the most haplotypes (776) and chromosome 10 possessing the least (349). The largest haplotype was composed of 95 SNPs. The longest haplotype spanned over 200.0 kb; the average haplotype length was 33.71kb.

Fig. 4
figure 4

The distribution of SNPs and haplotype among the 12 chromosomes. a SNPs distribution. b Haplotype distribution

AMOVA and gene flow

The average MAF of the 93 samples was 0.21. Tajima’s D value was 1.66, which suggests low levels of both low- and high-frequency polymorphisms, indicating a decrease in population size and/or balancing selection that resulted in more haplotypes and lacked rare alleles in this population. Analysis of molecular variance (AMOVA) showed that the genetic variation was 98% within the population and 2% between populations, which indicated the existence of slight genetic variation among 93 samples. The genetic differentiation coefficient (FST) between the two populations was 0.61, and gene flow (Nm) was 0.16. Further investigation showed that the gene flow of selfing crops was the smallest, and that of annual herbaceous plants was the lowest. If Nm > 1, which indicates that the level of gene flow between populations is high, then genetic differentiation among populations is small; if Nm > 4, then gene communication between populations is more adequate and genetic differentiation is smaller; and Nm < 1 indicates that population differentiation may have occurred due to genetic drift. The gene flow was 0.16, which indicates that the gene flow among populations in the Qinba region is lower, but nearly 2.5-fold higher than that of conventional inbred plants, which may result in long-term artificial selection, leading to reduced genetic differentiation.

Clustering based on SNPs

PC clustering

Principal component analysis was performed to select the first three PCs (based on eigenvalue). Their cumulative contribution of variance accounted for 40.69%, 39.76% and 40.10% for SNPs-NlaIII, SNPs-MseI, merged data of SNPs-NlaIII and SNPs-MseI, respectively, which demonstrated that the 93 genotypes could be clustered into two subgroups by the first three PCs (Fig. 5), with W366 and W367 being always separated from other samples.

Fig. 5
figure 5

PCA plots using the different SNPs markers. a by SNPs-NlaIII; b by SNPs-MseI; c by merged SNPs-NlaIII and SNPs-MseI data

UPGMA clustering

The unweighted pair-group method with arithmetic means (UPGMA) algorithm was performed and demonstrated that the 93 genotypes could be divided into 2 subgroups (Fig. 6), which was consistent with the PC results. Group I included 1 to 3 samples, while group II contained 92 to 90 samples. The average genetic distance was 0.29, ranging from 0.02 to 0.55 based on merged SNPs-NlaIII and SNPs-MseI data. The two most closely related materials were W710 and W711, and the two most furthest materials were W366 and W740.

Fig. 6
figure 6

UPGMA Clustering using different SNPs markers. a by SNPs-NlaIII; b by SNPs-MseI; c by merged datas of SNPs-NlaIII plus SNPs-MseI

Bayesian clustering

Seventy-two thousand eight hundred twenty-four SNPs (MAF <5%) were used to assess the population structure of the entire pool of 93 samples. Delta K reached a maximum value at K=2, suggesting that the 93 samples were divided into two subgroups (consisting of 70 and 23 samples) (Fig. 7). In the population structure analysis, the results from K = 2 to K = 5 revealed the occurrence of gene introgression between groupІ and groupII, accounting for approximately 76.34% of the observed variations (calculated with K = 2).

Fig. 7
figure 7

Bayesian clustering based on 72,824 SNPs for 93 samples; Red: group I; Green: group II. Each vertical line on the X-axis correspond to a sample. The proportion of each color represents probability rate with which a given genotype belongs to each group

The analysis performed based on phenotypic traits, SSR and SNPs data using PCA, UPGMA and Bayesian clustering led to inconsistent results, the population structure is relatively simple, the matrix delamination is not distinctive.

Clustering of different category materials

Population genetic information of different category samples, including 57 restoring lines, 19 maintainer lines and 17 special rice lines was analyzed (Table 6) and clustered (Fig 8a, b, c, respectively). Results showed that the genetic basis of the restorer line was more abundant than that of the maintainer line, and that the genetic basis of the special rice was wider than that of the conventional rice.

Table 6 Population genetic analysis of different category materials
Fig. 8
figure 8

Clustering based on UPGMA. a clustering of 57 restoring lines. b clustering of 19 maintainer lines. c clustering of 17 special rice lines

Correlation analysis of genetic distance matrices based on 3 types of genetic markers

All cluster analyses were based on the genetic distance or genetic similarity coefficient generated by genetic markers between samples; in the present study, the coefficients of correlation (R2) between the genetic distance matrices were 0.0914, 0.1726, 0.198, 0.876, 0.3478, 0.2713, respectively (Fig. 9). These results may be due to the use of different number of markers.

Fig. 9
figure 9

Correlation between the genetic distance matrices generated using different genetic markers


Phenotype is the result of the interaction of genotype with environment. A given genotype can be expressed as different phenotypes in different environments; this is known as the plant inherent phenotypic plasticity, which is different from the genotype. In the study of germplasm resources with more samples, economical and effective method with the use of phenotype data to study population genetic structure and genetic diversity is also a very important at this early stage. The selected 15 traits belong to quantitative traits and are greatly affected by environment; hence it is not recommended to use them for population genetic structure analysis. In recent decades, SSR markers, which represent second-generation DNA molecular markers, have been widely used for plant population genetic analysis, phylogenetic reconstruction, and quantitative trait mapping. All kinds of DNA markers are different and results generated by different DNA markers reflect different polymorphic region in the genome and can reveal various information contained in the genome. Theoretically, the more markers used, the more accurate results will be. SSR markers are mostly distributed in centromeres, telomeres, introns, and 3' untranslated regions (UTR). Most of these markers are non-functional genetic markers and do not affect the application of SSR marker in clustering analysis. While the population genetic variation explained by SNPs was larger than that explained by SSRs (for example, in PC analysis), the accumulative contribution rate of the first three major factors analyzed by SSR was only 15.76%, far less than that using SNPs data (40.10%), indicating that the more DNA polymorphism, the more accurate the population variation can be explained. For association analysis between markers and traits, greater number of polymorphic sites is associated with higher mapping resolution. A natural population often contains multiple sub-populations, which could result in high degree of LD within the tested population and lead to pseudo-association between markers and traits. Therefore, LD and haplotype studies are necessary before carrying out association analysis. The core 48 pairs of SSR markers as well 72,824 SNPs had rich bands and high polymorphism in Indica rice genome; clustering result of SSR was concordant with that of SNPs, but different from phenotypic traits clustering.

All analyses of the population genetic structure were based on the estimation of genetic distance or genetic similarity coefficient matrix between samples. Analyses were conducted using three methods: PC, UPGMA, and Bayesian clustering. Bayesian algorithm is more practical than UPGMA and PC analysis no matter which genetic marker is used given the prior pedigree knowledge of the 93 samples. At the same time, the size of gene flow of each sample can be seen from the population genetic structure graph based on Bayesian algorithm.

Through the analysis of different types of materials (57 restorer lines, 19 sterile lines, and 17 special rice), the results showed that the genetic basis of the restorer lines was richer than that of the maintainer lines, which was consistent with the conclusion of Ying Jiezheng et al. [24]. The main reason may be that most CMS (cytoplasmic male sterile) lines currently used in production are related to cultivars such as Zhenshan 97B, II-32B, Zhong9a and Gang46a and may be derived from Aizazhan and Aijiaonante, which originated from dwarf rice varieties. At present, the restorer lines used in combination production originate from the Yangtze River Basin of China, Sichuan, Southeast Asia, South Korea, etc, and were created by crossing Indica and Japonica rice. Special rice has an abundant genetic basis compared to other rice germplasm resources and has high breeding potential.


Higher number of genetic markers is related to higher explained population variation, especially functional DNA markers. The above showed that it is difficult to make certain the genetic nature of rice germ-resources using phenotype traits clustering. Clustering results based on different genetic markers showed that the genetic basis of 93 samples was single. Average genetic distance was 0.29 based on 72,824 SNPs of 93 samples, which may be due to many reasons, such as the wide exchange of variety resources among breeding units in the process of breeding, and similar breeding goals. Genetic effects in populations depend on the opportunity distribution of MAFs across the genome-wide, and different populations have different MAF values. Although the gene flow in the population composed of 93 samples was relatively large, the average MAF of the population was only 0.21, indicating the genetic structure of 93 samples is simple and lacked rare alleles. Though the amount of colored rice only take up a small proportion of rice resource in this study, it arose an extensive attention all over the world, due to its characteristics which include special nutrition, health care and artificial utilization. Measures to improve the genetic diversity of rice cultivars in the Qinba area are important in the future.

Materials and methods

Plant materials

A total of 93 samples were collected from the Shaanxi Rice Research Institute (Hanzhong city, China), comprising 57 restoring lines, 19 maintainer lines, and 17 special rice (Special rice refers to rice with special genetic traits and uses such as colored rice genotypes including black rice, purple rice, red rice, green rice and yellow rice and aromatic rice germplasm, which only research colored rice in this study.), which were representative of the diversity of Oryza sativa Xian group present in the Qinba area in China.

Field experiments

Seeds were planted at the rice experimental farm (E: 106°59′57″, N: 33°7′48″) during three consecutive years (2018, 2019, 2020), with planting dates of 2018 April 10, 2019 April 11, and 2020 April 8, and transplanted on May 24, May 24, May 20 according to a 16.7cm × 20cm split-split-plot design. Each sample was arrayed randomly at plots with three repeats, to no edge row between the plots.


Six plants in the middle of each plot were selected to investigate the values of agronomic, economic and quality traits according to “Recording items, methods and standards of national rice variety test and observation” as well as “National Standard of GBT 17891-1999 high quality paddy”. The 15 selected phenotypic traits included sowing date, plant height, leaf length, leaf width, effective number of panicles per plant, panicle length, total number of grains per panicle, number of filled grains per panicle, 1000-grain weight, browning rate, milled rice rate, head milled rice rate, chalky grain rate, degree of chalkiness, and length/width ratio; the averages of the three-year data were used as the phenotypic data.

Phenotypic traits statistical analysis

The mean value (x), standard deviation (δ), and coefficient of variation (CV) were computed. Shannon-Weiner index (H’) was calculated according to the following equation: H’=-∑PilnPi, where Pi is the proportion of samples ranked at ith grade for a given phenotypic trait among all samples (all of the phenotypic traits were divided into 10 grades by assigning values less than \(\overline{\mathrm{x}}\hbox{-} 2\delta\) as 1th grade and those greater than \(\overline{\mathrm{x}}+2\delta\) as 10th grade, with inter-grade difference of 0.5δ for the remaining grades). All of phenotypic trait data were the standardized using Z-scores, and hierarchical cluster analysis was performed using between-groups linkage method based on Euclidean distance. The above analysis was carried out with the IBM SPSS statistics 22.0 software; MEGA7.0 software was used for editing and visualizing cluster results.

SSR genotyping

The genomic DNA of 93 samples was extracted from fresh leaves using the SDS technique and detected with 0.8% agarose gel electrophoresis. The 48 SSR primers were synthesized by Beijing Aoke Biotechnology Co., Ltd. (Beijing, China). PCR were carried out in a 10 μL volume containing 1 μL DNA template, 2 μL (10 μM) of forward and reverse primers (1μL each), 5μL 2×Taq Master Mix, and 2 μL RNase-free water. Reactions were programed as follows: initial denaturation at 94.0°C for 5 minutes, denaturation at 94.0°C for 1 minute, annealing at 50-60.0°C for 1 minute, and extension at 72.0°C for 1 minute, for a total of 35 cycles. Electrophoresis was performed using 8% non-denaturing polyacrylamide gel under 95V voltage; bands were visualized via silver staining. Following electrophoresis, each amplification band corresponded to a primer hybridization locus and was considered as an effective molecular marker. Each polymorphic band detected by a same given primer represented an allelic mutation. In order to generate molecular data matrices, clear bands for each fragment were scored in every accession for each primer pair and recorded as 1(presence of a fragment), 0(absence of a fragment), and 9(complete absence of band).

SSR marker efficiency analysis

The value of the polymorphism information content (PIC) was calculated using the PIC_Calc 0.6 program ( The level of polymorphism of each marker was assessed by the polymorphism information content, which measures the extent of genetic variation: PIC values smaller than 0.25 indicates low levels of polymorphism associated to a locus, PIC values between 0.25 and 0.5 imply moderate levels of polymorphism, while PIC values greater than 0.5 indicate high levels of polymorphism [25].

SNPs genotyping

The genomic DNA of 93 samples was digested using the NlaIII and MseI enzymes. GBS was performed using the Illumina Hiseq 2000 platform of Novo Gene Bioinformatics Technology Co.,Ltd (Beijing, China). The SNPs data obtained with NlaIII and MseI digestion were recorded as SNPs-NlaIII, SNPs-MseI, respectively. Polymorphism filtering of SNPs was done with dp., Miss and MAF of 2, 0.3 and 0.05, respectively, followed by annotation based on the reference genome (

LD and haplotype construction

Genotype data were then used to calculate LD between SNPs and to construct haplotypes using the EM algorithm implemented in PLINK1.07 ( The commands “--r2” and “--blocks” were used to calculate LD and assign SNPs to their respective haplotypes by calculating inter-maker LD within a 200kb window, respectively. Figures were constructed using the Origin8 platform (

AMOVA and gene flow

A total of 72,824 SNPs were employed to analyze molecular variance (AMOVA) and gene flow. The components of variance attributable to different varieties and breeding lines were estimated from the genetic distance matrix using the Tajima & Nei method, as specified in the AMOVA procedure in ARLEQUIN 3.1 [26]. A nonparametric permutation procedure with 9999 permutations was used to test the significance of variance components associated with the different possible levels of genetic structure in this study. The pairwise Fst values, a value of F statistic analogs computed from AMOVA, were used to compare genetic distances between any two groups.

PC clustering

PC analysis was performed under the Eigen module using NTSYS-pc2.10e [27].

UPGMA clustering

Identity-by-state (IBS) distance matrix generated by TASSEL5.0 ( was used to build an UPGMA tree. MEGA7.0 (http:// was used for editing and visualizing.

Bayesian clustering

STRUCTURE 2.3.4 (, which applies a Bayesian clustering algorithm, was used to simulate population genetic structure based on SSR and SNPs data, respectively. Using a membership probability threshold of 0.60, population K values from 1 to 5 were simulated with 5 iterations for each K using 10,000 burn-in periods followed by 10,0000 Markov Chain Monte Carlo iterations in order to obtain an estimate of the most probable number of populations. Delta K was plotted against K values; the best number of clusters was determined following the method proposed by Evanno et al [28], and obtained via the Structure Harvester platform ( [29].

Correlation analysis among genetic distance matrices by diffrent DNA marker dataset

Mantel tests were used to measure the correlation between the genetic distance matrices generated using 15 phenotypic traits and SSR, 15 phenotypic traits and SNPs-NlaIII, 15 phenotypic traits and SNPs-MseI, SNPs-NlaIII and SNPs-MseI, SNPs-NlaIII and SSRs, SNPs-MseI and SSRs. It was carried out using the GenAlEx software with 9999 permutations [30]. r ≥ 0.9, 0.8 ≤ r < 0.9, 0.7 ≤ r < 0.8, and r < 0.7 represented significant correlation, moderate correlation, weak correlation, and no correlation, respectively.

Availability of data and materials

The datasets generated during the current study are available in the NCBI repository, [] [PRJNA801889].



Analysis of molecular variance


Deoxyribonucleic acid


genome-wide association studies


genotyping by sequencing


Identity by state


Linkage disequilibrium


Minor allele frequency


number of polymorphic bands


Percentage of polymorphic bands


Polymorphism information content


Principal component analyses


restriction site-associated DNA


Single nucleotide polymorphism


Simple sequence repeats


total number of bands


Unweighted pair group method with arithmetic mean


  1. Ling QH, Zhang HC, Ding YF. Discussion of Naming for Two Subspecies of Oryza sativa L. Scientia Agricultura Sinica. 2013;46(2):250–6.

    Article  Google Scholar 

  2. Ding Y. The origin and evolution of cultivated rice in China. Acta Agron Sin. 1957;8(3):243–60.

    Google Scholar 

  3. Wang WS, Mauleon R, Hu ZQ, Chebotarov D, Tai SS, Wu ZC. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature. 2018;557(7703):43–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Delphine VI, Albrecht EM, Claude L, Benjamin S. Population structure and genetic diversity in a commercial maize breeding program assessed with SSR and SNP markers. Theor Appl Genet. 2010;120(7):1289–99.

    Article  Google Scholar 

  5. Loveless MD, Hamrick JL. Ecological determinants of genetic structure in plant populations. Ann Rev Ecol Syst. 1984;15:65–95.

    Article  Google Scholar 

  6. Shen YJ, Jiang H, Jin JP, Zhang ZB, Xi B, He YY, et al. Development of genome-wide DNA Polymorphism database for map-based cloning of rice genes. Plant Physial. 2004;135(3):1198–205.

    Article  CAS  Google Scholar 

  7. Zhang Y, Zhang XJ, Chen X, Sun W, Li J. Genetic diversity and structure of tea plant in Qinba area in China by three types of molecular markers. Hereditas. 2018;155(3):22.

    Article  PubMed  PubMed Central  Google Scholar 

  8. Junjian N, Peter MC, David JM. Evaluation of Genetic Diversity in Rice Subspecies Using Microsatellite Markers. Crop Science. 2002;42(2).

  9. Yu J, Hu SN, Wang J, Wong GK, Li SG, Liu B, et al. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science. 2002;296(5565):79–92.

    Article  CAS  PubMed  Google Scholar 

  10. He GL, Fu GP, Peng CS, Deng W, Zhu S, Yang Z, et al. DNA Fingerprint Map and Analysis of Genetic Diversity of the Japonica Rice Varieties in the Regional Test in Jiangxi Province in 2018. Acta Agriculturae Universitatis Jiangxiensis. 2019;41:843–52.

    Article  Google Scholar 

  11. Huang XH, Li LZ, Zhang JF, He DL, Zhang XQ, Chen JB, et al. Evaluation of Diversity and Evolution of the Microsatellite LEI0258 in Chicken Region from South China. Chinese J Animal Vet Sci. 2016;47(11):2175–83.

    Article  Google Scholar 

  12. Yin QQ, Li DY, Wang HZ, Cao DC, Lu CY, Sun XW, et al. Microsatellite marker analysis on genetic diversity in two German mirror carp(Cyprinus carpio L.) families. J Anhui Agricu Univ. 2008;(2):211–8.

  13. Zeng XS, Peng D, Shi Y, Xie W, Liu AM. Fingerprinting Construction of Rice Core Parental Lines with SSR Markers. Crop Research. 2016;30(5) 481-486+511.

  14. Li C, Zhang Y, Ying K, Liang XL, Han B. Sequence variations of simple sequence repeats on chromosome 4 in two subspecies of the Asian cultivated rice. Theoretical Appl Genetics. 2004;108(3):392–400.

    Article  CAS  Google Scholar 

  15. Li HB, Yang J, Lv ZW, Yi B, Wen J, Fu TD, et al. Screening of Brassica napus core SSR primers. Chinese J Oil Crop Sci. 2010;32(3):329–36.

    Article  CAS  Google Scholar 

  16. Lin YX, Wang AX, Liu H, Wang Z, Liang MZ, Dai XJ, et al. Research on DNA Molecular Digital Fingerprint Database Based on 48 Pairs of SSR Primers for 94 Hybrid Rice Parents in NYT 1433-2014. Chinese J Rice Sci. 2016;30:593–602.

    Article  CAS  Google Scholar 

  17. Sui GL, Yu SC, Yang JX, Wang WH, Su TB, Zhang FL, et al. Validation of a Core Set of Microsatellite Markers and Its Application for Varieties Identification in Chinese Cabbage. Acta Horticulturae Sinica. 2014;41(10):2021–34.

    Article  CAS  Google Scholar 

  18. Wang MH, Zhang XT, Wu GL, Jiang Q, Shi YH. DNA Fingerprints Construction and Purity Identification Based on SSR Markers for Rice Varieties in Ningbo City. China Rice. 2019;25(6):50–4.

    Article  Google Scholar 

  19. Teng HT, Lv B, Zhao JR, Xu Y, Wang FG, DU WW, et al. DNA Fingerprint Profile Involved in Plant Variety Protection Practice. Biotechnol Bull. 2009;1:1-6.

  20. Nasu S, Suzuki J, Ohta R, et al. Search for and analysis of single nucleotide polymorphisms (SNPs) in rice (Oryza sativa, Oryza rufipogon) and establishment of SNP markers. DNA Res. 2002;9(5):163–71.

    Article  CAS  PubMed  Google Scholar 

  21. Miller MR, Dunham JP, Amores A, Cresko WA, Johnson EA. Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome Res. 2007;17(2):240–8.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One. 2011;6(5):e19379.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Ministry of Agriculture of the PRC. NY/T1433-2014, Protocol for identification of rice varieties-SSR marker method. China Agriculture Press; 2014.

  24. Ying JZ, Shi YF, Zhuang JY, Xue QZ. Microsatellite Marker Evaluation on Genetic Diversity of the Major Commercial Rice Varieties in China. Scientia Agricultura Sinica. 2007;4:649–54.

    Article  Google Scholar 

  25. Botstein D, White RL, Skolnick M, Davis RW. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Human Genetics. 1980;32(3):314–31.

    CAS  Google Scholar 

  26. Excoffier L, Laval G, Schneider S. Arlequin (version 3.0): an integrated software package for population genetics data analysis. Evol Bioinform Online. 2007;1:47–50.

    Article  PubMed  PubMed Central  Google Scholar 

  27. Rohlf F. NTSYS-pc -Numerical Taxonomy and Multivariate Analysis System, vol. 2. New York: Applied Biostatistics Inc; 1988. p. 1.

    Google Scholar 

  28. Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol. 2005;14(8):2611–20.

    Article  CAS  PubMed  Google Scholar 

  29. Earl DA, Vonholdt BM. STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conserv Genet Res. 2012;4(2):359–61.

    Article  Google Scholar 

  30. Peakall R, Smouse PE. GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research--an update. Bioinformatics. 2012;28(19):2537–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

Download references


We thank LetPub ( for its linguistic assistance during the preparation of this manuscript.


This study was supported by the Sci-technological Project of Shaanxi Province (NYKJ-2016-35), the Sci-technological Project of Shaanxi Province (2013K02-10-01), the Sci-technological Project of Shaanxi Province (2020NY-050), the Sci-technological Project of Shaanxi Province (2019NY-041).

Author information

Authors and Affiliations



Yu Zhang wrote the manuscript, Yewen Wang and Peijiang Li performed the field experiments and identified all plant materials, Yuexing Wang, Shimao Zheng, Qiaoqiao He and Xixi Zhou performed the laboratory tests and analyzed data. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Yu Zhang.

Ethics declarations

Ethics approval and consent to participate

Yewen Wang and Peijiang Li identified all plant materials. In this study, experimental research and field studies of plant materials, including the collection of plant material and all methods were in compliance with relevant institutional, national and international guidelines and legislation.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhang, Y., He, Q., Zhou, X. et al. Genetic diversity and population structure of 93 rice cultivars (lines) (Oryza sativa Xian group) in Qinba in China by 3 types of genetic markers. BMC Genomics 23, 550 (2022).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: