Genome-wide association study of eating and cooking qualities in different subpopulations of rice (Oryza sativa L.)

Starch and protein are two major components of polished rice, and the amylose and protein contents affect eating and cooking qualities (ECQs). In the present study, genome-wide association study with high-quality re-sequencing data was performed for 10 ECQs in a panel of 227 non-glutinous rice accessions and four derived panels. Population structure accounted for high phenotypic variation in three routine panels and had minor effects on subspecies-based panels. Using the mixed linear model method based on the P + K model, we detected 29, 24, 16, 17, and 29 loci that were significant for ECQ parameters in each of the five panels. Some of these loci were close to starch synthesis-related genes. Two quantitative trait loci (QTLs) (chr.9: 15417525 ~ 15474876; 17538294 ~ 18443016) for several starch paste viscosity properties detected in four panels were close to the isoamylase 3 gene, one QTL (chr.1: 30627943 ~ 31668474) for consistency detected in three panels was close to the starch synthase IV-1 gene. The QTL (chr.7: 1118122 ~ 1967247) for breakdown (BD), detected in the whole panel and japonica panel, and one QTL (chr.7: 25312126 ~ 26540950) for BD and setback (SB), detected in the whole panel and indica panel, may be specific gene alleles in japonica or indica panels. One previously detected QTL (chr.11: 22240707 ~ 22563596) for protein content and one new QTL (chr.5: 7756614 ~ 8042699) for many ECQ traits detected in more than two panels, may represent valuable targets for future cloning of the underlying genes. This study detected minor-effect QTLs affecting ECQs, and may increase our understanding of the genetic differences regulating the formation of ECQ between indica and japonica varieties.


Background
Improving grain quality and grain yield are among the most important objectives in rice breeding programs. Apparent amylose content (AAC) is known to affect the eating and cooking quality (ECQ) of rice [1]. However, the rapid viscosity analyser (RVA) profile can distinguish eating quality rice varieties with similar AACs by evaluating the breakdown (BD), setback (SB), and consistency (CS) viscosities [2]. In general, rice varieties with good eating quality have a BD of more than 100 rapid visco units (RVU) and an SB of less than 25 RVU [3]. Therefore, starch paste viscosities play an essential role in estimating the eating, cooking, and processing qualities of rice, and have received more and more attention in rice breeding programs [4,5].
The Wx gene encodes granule bound starch synthase I (GBSSI), which is responsible for amylose synthesis. Starch paste viscosity properties are predominantly controlled by the Wx gene, as well as minor-effect genes with various additive effects [6][7][8]. The starch paste viscosity of glutinous rice differs from that of nonglutinous rice due to its lack of amylose [1]. Since GBSSI is inactive in glutinous varieties, the starch paste properties are thought to be controlled by other genes. The pullulanase (PUL) gene plays an important role in the control of peak viscosity (PV), hot paste viscosity (HPV), cold paste viscosity (CPV), BD, peak time (PeT), and pasting temperature (PTemp) in glutinous rice [4]. Soluble starch synthase IIa (SSIIa) is responsible for HPV, CPV, BD, SB, PeT, and PTemp, as well as the thermal properties of glutinous rice [9]. Polymorphisms in both SBE1 and SBE3 loci account for 70 % of the observed variations in HPV and CPV, and account for 40 % in both PV and CS in waxy rice [10]. Many populations derived from bi-parental crosses with similar AACs were developed to identify QTLs for starch paste viscosities and eliminate the Wx effects in non-glutinous accessions [7,11]. A QTL cluster close to SSIIa that had major effects on the alkali spreading value (ASV), pasting time, and minor effects on gel consistency, AAC, PV, CPV, BD, and SB was identified [7]. The branching enzyme IIb (BEIIb or SBE3) gene cluster is responsible for PV, HPV, CPV, BD, CS, and gel consistency [7] . Another method to avoid the effects of the Wx gene on starch paste viscosity was performed to detect more QTLs within subpopulations of the same Wx background [8,12].
Association mapping is a powerful method to connect structural genomics and phenomics in plants, provided that information on population structure and linkage disequilibrium (LD) are available [13]. Association mapping, especially candidate-gene association mapping, is commonly used to detect QTLs for starch qualities [4,9,14,15]. Tian et al. [14] found that the starch synthesis-related genes (SSRGs) cooperated with each other to form a regulatory network that controls the ECQs and defines correlations among AAC, ASV, and gel consistency. Yan et al. [4] genotyped waxy rice with 17 SSRGs using 43 genetagged molecular markers, and found that 10 SSRGs are involved in controlling the RVA profile parameters. However, candidate-gene association mapping will miss other unknown loci, and genome-wide association studies (GWAS) are comprehensive approaches to systemically search the genome for causal genetic variations [16]. The distribution of functional alleles is strongly correlated with population structure since LD can be caused by an admixture of subpopulations, which leads to false-positive results, particularly in populations with a small number of samples [17]. An ideal sample with a subtle population structure and familial relatedness is the best panel to control for false-negatives [17]. Moreover, subdividing the diversity panel to analyze subpopulations independently using the mixed model provides a reasonable solution to lower false-positive or -negative rates [18]. For example, Huang et al. [19] chose only the indica landrace population to conduct GWAS for 14 agronomic traits. Huang et al. [20] collected phenotypic data of flowering time and grain-related traits for a GWAS on the indica or japonica subpopulations and the whole population, respectively.
Two subspecies of rice (Oryza sativa L.), indica and japonica, differ in several morphological and physiological characteristics including grain shape, shattering, apiculus hair length, leaf color, seed dormancy, cold tolerance, phenol, and sensitivity to potassium chlorate [21][22][23]. Moreover, japonica and indica rice varieties vary in starch physicochemical properties, which strongly influences rice cooking and processing methods for food and industrial applications [24]. The Wx a allele is predominant in non-waxy indica cultivars with high AACs (22-29 %), whereas the Wx b allele, with low AACs (12-19 %), is common to the non-waxy japonica variety [25][26][27]. The splice donor mutation is prevalent in temperate japonica rice but rare or absent in indica and tropical japonica rice [28]. The indica type SSIIa can convert S-type amylopectin to L-type amylopectin by elongating short chains of DP (degree of polymerization) ≤10 to form chains of DP 13-25 [24]. The japonica S-type amylopectin has enriched short branch chains of DP ≤10 and depleted intermediate chains of 12 ≤DP ≤24 compared with the indica L-type amylopectin, whereas the proportion of cluster interconnecting long chains of DP ≥25 are comparable in the two varieties [29]. Bao et al. [30] classified 56 glutinous rice varieties into two groups according to the gelatinization temperature (GT), corresponding to high GT and low GT, and found that all 15 high GT accessions belonged to indica rice. Despite the two genes (Wx and SSIIa) being well-studied, the exact genetic effects of other SSRGs in shaping grain quality and starch paste viscosities in different rice subpopulations remain unclear [4].
In the present study, we divided the 227 non-glutinous rice accessions into two panels and two ecotype based panels, and explored the genetics of 10 ECQ parameters in different panels and ecotypes based on genome-wide association mapping with high-quality re-sequencing data. These results will increase our understanding of the mechanism of ECQ properties in different ecotypes and how minor-effect QTLs function on ECQs.

Results
Phenotypic variation and contribution of population structure to phenotypic variation Population structure was evaluated using the ADMIX-TURE program, which estimates individual ancestry and admixture proportions assuming that K populations exist based on a maximum-likelihood method [31]. Because rice (Oryza sativa L.) has two subspecies, it was not surprising that indica and japonica subpopulations could be identified. However, the indica subpopulation could be further divided into indica, aus, as well as some admixtures (admix), while the japonica subpopulation could be divided into temperate japonica, tropical japonica, and some admixtures (Additional file 1: Figure S1). Principal component analysis (PCA) plots of the first two components ( Fig. 1) clearly differentiated rice accessions into different subpopulations, which corresponded with the population structure results ( Fig. 1; Additional file 1: Figure S1). All 10 phenotypic variations in the admixture group showed moderate levels. The aus group had the highest AAC, PeT, SB, and CS, but the lowest PV, HPV, CPV, BD, and PTemp. The indica group had a wide phenotypic variation range with regards to AAC, PV, HPV, CPV, BD, SB, and CS compared to the other groups, and its AAC, PV, CPV, SB, and CS mid-values were higher than those of the japonica group. Temperate japonica had the lowest AAC, SB, CS, and PeT, and the highest PTemp. The phenotypic variation of tropical japonica was between that of indica and temperate japonica. However, there was no significant difference in protein content (PRO) among groups (Fig. 2).
With respect to the whole panel, a wide range in all traits was observed for AAC (10.2~30.7 %), PRO (1.  Table 1). When comparing the indica panel with the japonica panel, the AAC, CPV, SB, CS, and PTemp of the indica panel were significantly higher than those of the japonica panel. The PRO, HPV, and PeT of the indica panel were significantly lower than those of the japonica panel, whereas PV and BD were similar between the two panels (Table 1). Furthermore, the AAC, PRO, SB, CS, and PTemp of Panel 1 were significantly higher than those of Panel 2. Panel 1 had lower PV, HPV, BD, and PeT than Panel 2 (Additional file 1: Table S1).
Given that population structure is a main factor affecting GWAS, we evaluated phenotypic variation among different subspecies groups (indica/japonica) in the whole panel (Fig. 2), as well as the population structure contribution for phenotypic variation in each panel (Table 1 and Additional file 1: Table S1).

PV, HPV, CPV, BD, SB, and CS
Two, four, three, three, four, and three loci were detected for PV, HPV, CPV, BD, SB, and CS, respectively. The major locus (chr.6: 1765761) was detected for PV, BD, SB, and CS, and the nearby locus (chr.6: 1750498) was the most significant site that linked with HPV and CPV. One locus (chr.5: 7756614) for PV was close to the locus   (Table 2).
indica and japonica panels AAC and PRO One locus (chr.6: 1529682) and one (chr.6: 1765761) for AAC were detected in indica and japonica panels, respectively. Three and six loci for PRO were detected in indica and japonica panels, respectively. One locus (chr.11: 22240707) for PRO in indica panel was close to the locus (chr.11: 22563596) detected in japonica panel (Table 4).    (Table 4). Three loci for PV, four for HPV, one for CPV, two for BD, three for SB, and two for CS were identified in the japonica panel. The QTL (chr.6: 1765761) for PV and SB was close to the locus (chr.6: 1759451) for CS. One locus (chr.9: 18431810) for HPV was close to the locus (chr.9: 18443016) for CPV, one QTL (chr.1: 31774927) for SB was close to one QTL (chr.1: 31668474) for CS in japonica panel (Table 4).
Three loci for PeT and four for PTemp were detected in the japonica panel. The locus (chr.5: 7996572) for PeT were close to the locus detected in the whole panel. The locus (chr.6: 6747298) for PTemp was close to the loci in the whole panel, Panel 1, and the indica panel (Table 4).

Effect of population structure on GWAS
Given that linkage disequilibrium can be caused by the admixture of subpopulations and may result in falsepositives if not correctly controlled in the statistical analysis, estimation of population structure must be included in the association analysis [17]. In this study, to determine whether different networks of alleles were associated with ECQ trait variation in the different panels, we performed GWAS on each of the four panels independently and in the panel as a whole. The present study allowed us to compare GWAS results derived from different panels.
The effect of population structure on phenotypic variation was estimated based on analysis of variance (ANOVA) ( Table 1 and Additional file 1: Table S1). For the whole panel, eight of 10 traits were strongly affected by population structure. For Panel 1 and 2, nine and five traits were strongly affected by population structure (P <0.01), respectively, indicating that Panel 2 (breeding lines) experienced artificial selection, which was less structured than Panel 1. However, only two traits were significantly affected by population structure in the indica and japonica panels at significance level of P <0.01, suggesting that association mapping in the subpopulations, e.g. indica or japonica subpopulations, with simple population structures can lower false-positives. indica and japonica varietal groups should be properly treated for association analysis, which may explain why transgressive offspring occur in divergent subpopulation crosses [18] .

QTL comparison among different panels
For convenience, we regarded those SNPs that were detected in different panels with the physical distance less than 2.0 Mb as the same QTL, and these SNPs were regarded as alleles.
Similar to previous studies, the Wx gene is responsible for AAC in these five panels. The single nucleotide polymorphism (SNP) (chr.6: 1765761) for AAC is the Wx a / Wx b allele (a SNP at the splice site of intron 1), which was present in the whole panel, Panel 1, and japonica panel. The Wx gene was also the major QTL for AAC in the other two panels, but the Wx a /Wx b allele was not the most significant association site, suggested that different traits maybe preferred linked with different SNPs. Besides the Wx gene, one minor effect locus (chr.9: 19310549) for AAC was detected in Panel 2 (consisted of breeding lines); this locus was close to the QTL flanked by BPH and RM160 [32] (Fig. 3).
One QTL (chr.5: 12559930~13312843) for PRO detected in the whole panel and Panel 1 was close to the protein content QTL qPRO5, which was detected by Xu et al. [8] in the M201-Wx subpopulation. The other QTL (chr.11: 22562237~22563596) for PRO detected in the whole, Panel 2, indica, and japonica panels was close to pro11, which is flanked by RM209 and RM229 [33].
For starch paste viscosity parameters, a specific SNP (chr.6: 1765761), recognized as the Wx a /Wx b allele, was responsible for 4 (PV, BD, SB, and CS), 3 (BD, SB, and CS), and 2 (PV and SB) in the whole panel, Panel 1, and japonica panel, respectively. There were also some SNPs close to Wx gene that significantly associated with RVA parameters. However, only one Wx gene linked SNP was detected for BD in Panel 2, that may due to the reason that most Korean breeding lines were low-AAC varieties with less diversity. The GWAS results in our study may increase our understanding of the genetic control of starch properties in South Korea japonica rice varieties with low AACs and the same Wx allele. Besides the Wx gene, 12 QTLs were identified in two panels, 3 QTLs were detected in 3 panels, and 2 QTLs were detected in 4 panels (Fig. 3). The QTL (chr.5: 7756614~8042699) was detected for PV in the whole panel and Panel 2, for HPV in the whole panel, Panel 2 and japonica panel, and for PeT in the whole panel and japonica panel. This locus has not been detected in previous studies, and is a potentially novel locus specifically present in japonica rice varieties. The QTL (chr.9: 17538294~18443016) for HPV in the whole panel, indica, and japonica panels, and for CPV in the whole panel and japonica panel, was close to qPV9 for PV, HPV, and CPV [8], which is close to the isoamylase 3 gene (Os09g0469400, chr.9: 17851431~17868668). The QTL (chr.7: 1118122 1967427) for BD in the whole panel and japonica panel and SB in the whole panel, was close to qSB7 [7]. The QTL (chr.7: 2532697~26540950) for BD and SB in the whole panel and indica panel was close to the QTL (linked with RM6420) for PV, HPV, CPV, and SB [34], and close to the QTL for SB and CS [8]. The QTL (chr.4: 24056499~24293919) for SB, detected in the whole panel and Panel 2, was close to the QTL for FV (final viscosity, CPV), PKT, SBV, and TV (HPV), and is linked with RM3785 [35]. The QTL (chr.1: 306279433 1774927) for CS in the whole panel, Panel 1, and japonica panel, and three loci for SB (japonica panel), PeT (whole panel), and CPV (whole panel) was 0.21 Mb from the SSIV-1 gene (Os01g0720600, chr.1:300324283 0041425), indicating that SSIV-1 may influence starch qualities through affecting CPV, SB, CS, and PeT. The QTL (chr.9: 15417525~15474876) for CS in the whole panel and Panel 1, and for CPV in Panel 1, was close to qCS9 described by Yan et al. [7]. The QTL (chr.3: 33737410~33893179) for PeT was detected in Panel 1 and the japonica panel, and was close to Btemp (peak temperature) flanked by RM203 and RM422 [15]. The QTL (chr.6: 6722646~7002095) for PTemp, detected in the whole panel, Panel 1, indica, and japonica panels, was close to the ALK gene (Os06g0229800, chr.6: 6748398~6753302) [36], which encodes starch synthase IIa (SSIIa).

Conclusions
Overall, we investigated the genetics of 10 rice ECQ parameters by GWAS mapping in the whole panel and four derived panels. GWAS within subspecies panels (indica/japonica) cannot only lower false-positives influenced by population structure, but can also increase our understanding of ECQ differences between indica and japonica subspecies. Comparison of GWAS results indicated that some loci were common in different panels, while some were in specific panels. Loci detected in more than one panel were potentially reliable QTLs for ECQs, and these QTLs should be properly used in molecular breeding to improve rice ECQs, particularly in the Korean japonica rice with similar AACs and the same Wx allele.

Materials
A total of 227 non-glutinous rice accessions from 295 rice accessions (Additional file 2: Table S2) and their whole genome re-sequencing data [40] were used in this study. Here, we defined the 227 accessions as a whole panel for GWAS, among which the core collection of 110 diverse rice accessions was defined as Panel 1, which comprises 57 Korean accessions and 53 rice accessions collected form 27 countries worldwide. The remaining 117 accessions were considered as Panel 2, which consists of breeding lines, one USDA rice accession, three Japan breeding lines, and 113 Korean breeding lines released between 1967 and 2011. Furthermore, all 227 accessions were divided into the indica panel (59) and japonica panel (168), since indica/japonica species contribute the most to the population structure. The whole panel was planted in 2014 during the growing season from early May to September at Kongju National University farm, Yeasan Gun, South Korea. A randomized block design with two replications for each line was used in field experiments.
After being air-dried and stored at room temperature for 3 months, all rice samples were de-hulled to brown rice, then milled into white rice, afterwards ground and passed through a 100-mesh sieve to obtain milled-grain flour. All samples were stored in cooling rooms at 4°C until use.

Phenotypic data evaluation AAC and protein content
Determination of AAC was performed using the iodine staining method [41]. The absorbance of the solution was measured at 620 nm against the blank solution with a spectrophotometer. AAC was calculated using a standard curve made from rice samples with known AACs. PRO was measured using the Kjeldahl method with a Kjeltec-Foss 70 2400 Auto analyzer using 5.95 as the nitrogen-to-protein conversion factor.

RVA profile parameters
The RVA profile of rice flour was determined using a Rapid Visco Analyser (RVA-3, Newport Scientific, Warrie wood, Australia). According to the method of AACC61-02, 3.0 g of rice flour (12 % m. b.) was placed in an aluminum canister, after which 25.0 g of distilled water was added. The heating profile was set as follows: (1) the temperature was held at 50°C for 1 min; (2) the temperature was linearly ramped up to 95°C until 4.8 min; (3) the temperature was held at 95°C until 7.3 min; (4) the temperature was linearly ramped down to 50°C at 11.1 min; and (5) held at 50°C until 12.5 min.
Four primary parameters were obtained from the pasting curve: PV, HPV, CPV, and PeT. Three secondary parameters were calculated from the primary parameters: BD (PV minus HPV), SB (CPV minus PV), and CS (CPV minus HPV). The viscosity parameters were measured in RVU. PTemp was determined according to the method proposed by Bao [42].

Statistical analyses of phenotypic data
Analysis of variance (ANOVA) was performed on the different panels to determine the phenotypic variation and population structure influence on phenotypic variation using the general linear model procedure (PROC GLM) by the SAS program (version 9.3, SAS Institute Inc., Cary, NC). In the model, population structure was set as the fixed independent variable, and 10 phenotypic data points were set as the dependent variables.

Population structure evaluation and principal component analysis
The genome re-sequencing data of the 227 accessions were collected from the 295 rice whole genome resequencing data, which were previously obtained using HiSeq 2000 and HiSeq 2500 [40,43]. After removing SNPs with a minimum frequency <0.05 by filter alignment using TASSEL 5.0, there were 2,071,925, 1,229,263, 1,563,570, 1,178,226, and 574,169 high-quality SNPs remaining for Panel 1, Panel 2, the whole panel, the indica panel, and the japonica panel, respectively. Population structure was evaluated by the ADMIXTURE program, which estimates individual ancestry and admixture proportions assuming that K populations exist based on a maximum-likelihood method [31]. We ran the program with the script: admixture -cv genotype.bed k k ranged from 1 to 5. Principal component analysis (PCA) of each panel was calculated by TASSEL 5.0 with default parameters by using high-quality SNPs.

Genome-wide association mapping of five panels
To control for false-positives, the P (PCA) + K model, based on a mixed linear model (MLM), was used for genomewide association mapping for the five panels by GAPIT software [44]. Here the principal components (PCs) were used to control for population structure in GWAS. We used the default parameters for running the GAPIT software: GAPIT < − myGAPIT(Y = myY, G = myG, PCA.total = 3, Model.selection = TRUE). myY is the phenotype matrix and myG is genotype matrix. SNP sites with the lowest P value in the peak region (P <10 −4 ) were considered significant SNPs (or loci) for phenotypic variation. Otherwise, sites with regions having only one SNP significant were considered false-positives. Loci with a physical distance less than 2.0 Mb were considered the same locus.