Genetic architecture of end-use quality traits in soft white winter wheat
BMC Genomics volume 23, Article number: 440 (2022)
Genetic improvement of end-use quality is an important objective in wheat breeding programs to meet the requirements of grain markets, millers, and bakers. However, end-use quality phenotyping is expensive and laborious; thus, testing is often delayed until advanced generations. To better understand the underlying genetic architecture of end-use quality traits, we investigated the phenotypic and genotypic structure of 14 end-use quality traits in 672 advanced soft white winter wheat breeding lines and cultivars adapted to the Pacific Northwest region of the United States.
This collection of germplasm had continuous distributions for the 14 end-use quality traits with industrially significant differences for all traits. The breeding lines and cultivars were genotyped using genotyping-by-sequencing and 40,518 SNP markers were used for association mapping (GWAS). The GWAS identified 178 marker-trait associations (MTAs) distributed across all wheat chromosomes. A total of 40 MTAs were positioned within genomic regions of previously discovered end-use quality genes/QTL. Among the identified MTAs, 12 markers had large effects and thus could be considered in the larger scheme of selecting and fixing favorable alleles in breeding for end-use quality in soft white wheat germplasm. We also identified 15 loci (two of them with large effects) that can be used for simultaneous breeding of more than a single end-use quality trait. The results highlight the complex nature of the genetic architecture of end-use quality, and the challenges of simultaneously selecting favorable genotypes for a large number of traits. This study also illustrates that some end-use quality traits were mainly controlled by a larger number of small-effect loci and may be more amenable to alternate selection strategies such as genomic selection.
In conclusion, a breeder may be faced with the dilemma of balancing genotypic selection in early generation(s) versus costly phenotyping later on.
End-use quality improvement in soft white wheat (Triticum aestivum L.) is one of the primary objectives of wheat breeding programs. End-use quality is complex and involves multiple traits. The key end-use quality parameters in soft white wheat include softer kernels, lower grain protein content and gluten strength, less damaged starch and lower non-starch polysaccharide content (both of which decrease water absorption capacity), and larger cookie diameter and cake volume. For some soft wheat products, starch paste viscosity is a key quality trait.
In breeding programs, end-use quality phenotyping is laborious, expensive, and time consuming, and it requires a large amount of grain. Consequently, selection for end-use quality is often delayed until later breeding stages [1, 2]. Since most end-use quality traits are predominantly controlled by genetic factors [3,4,5], a better understanding of the underlying genetic architecture of the various traits can support strategies for both phenotypic and genotypic selection, including an assessment of the potential effectiveness of marker-assisted selection. Analyses of marker-trait associations have identified numerous quantitative trait loci (QTL) for different end-use quality traits distributed across all 21 wheat chromosomes [2, 4, 6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]. However, most of these studies were performed in hard wheat (bread wheat), and only a few investigations [2, 4, 17, 22, 23] were performed in soft wheat. Soft white wheat has unique milling and baking parameters, which are aimed at making food products such as cookies and cakes [24, 25].
A number of end-use quality traits are influenced by the effect(s) of major genes. For example, the genetic architecture of grain hardness is primarily controlled by the puroindolines, of gluten strength by the high molecular weight glutenins, and of starch paste viscosity by the granule bound starch synthase (‘waxy’) genes [1, 26]. However, these major genes are often fixed in elite breeding populations due to parent selection, or early generation phenotypic and/or genotypic selection, and do not sufficiently account for the levels of end-use quality required for cultivar release nor for the range of variation observed among breeding populations.
A number of mapping studies for end-use quality were performed in bi-parental populations and in some cases one or both parents were either poorly adapted or would not constitute ‘elite’ germplasm for applied plant breeding [9, 11, 13]. Additionally, the bi-parental genetic structure limits QTL mapping resolution. Genome-wide association mapping (GWAS) can overcome these limitations by using historical recombination events that occurred throughout the evolution of the germplasm and by using elite breeding germplasm from the breeding program of interest. In this study we implemented GWAS using recent breeding lines and cultivars from the Washington State University (WSU) soft white winter wheat breeding program to investigate the underlying genetic architecture of phenotypic variation of 14 end-use quality traits in 672 soft white winter wheat genotypes. We identified end-use quality associated single nucleotide polymorphism (SNP) markers using GWAS and identified large effect QTL. These QTL contribute to a better understanding of the underlying genetic architecture of end-use quality in soft white wheat and provide an objective assessment of the potential for marker-assisted selection (MAS) versus other genotypic and phenotypic selection strategies.
Materials and methods
A total of 672 soft white winter wheat breeding lines and cultivars were used in this study. The breeding lines were F4:5 lines and doubled haploid lines selected from different crosses to represent the diversity present in the WSU winter wheat breeding program. The genotypes and the environments in which the lines were grown were described in Aoun et al. In brief, this germplasm was evaluated in 29 environments (year-location combinations). Genotypes were grown from 2015 to 2019 in seven locations in Washington State (WA), USA, including Pullman, Lind, Davenport, Ritzville, Waterville, Walla Walla, and Dayton. In this dataset, there were 1–7 nurseries per environment with a total of 76 nurseries. From each nursery, a single sample from one replicate per genotype was evaluated for end-use quality traits. The dataset was unbalanced with some shared lines between environments. The connectivity between environments in terms of genotypes was described in Aoun et al. There were 43 genotypes (out of the 672) evaluated for end-use quality in more than one-quarter of the environments.
The wheat genotypes were evaluated for 14 end-use quality traits classified into four categories: grain characteristics, milling traits, flour characteristics, and baking parameters. The phenotypic and genotypic data were retrieved from Aoun et al., who investigated genotype × environment interactions and tested the performance of genomic prediction for the 14 end-use quality traits. Traits associated with grain characteristics included Single Kernel Characterization System (SKCS) hardness, SKCS size, SKCS weight, test weight, and grain protein content. SKCS hardness is a key determinant of end-use quality: hard wheat is mainly used for making bread, and soft wheat is primarily used for making cookies, cakes, and confectionery products [1, 28, 29]. In the grain market, test weight and grain protein content are the two main parameters. High test weight, which is correlated with kernel weight and size [30, 31], usually leads to higher milling performance.
Milling traits included break flour yield, flour yield, flour ash content, and milling score. Break flour yield was calculated as the percent of flour recovered from the break rolls, whereas flour yield (‘straight grade’) was determined as the proportion of grain recovered as flour (break plus reduction flour). Flour ash content is the mineral residue remaining after flour combustion. Milling score was a function of both flour yield and flour ash content. Higher break flour yield, flour yield, and milling score are desirable in soft wheat. Higher inclusion of bran reduces the functionality of most doughs and batters. As such, mineral content of flours (ash) serves as a proxy for bran contamination and lower flour ash is preferred.
Flour functionality plays an important role in baking performance. Flour parameters included flour protein content, flour sodium dodecyl sulfate sedimentation volume (SDS sedimentation), water solvent retention capacity (water SRC), and flour swelling volume (FSV). Unlike bread, soft white wheat products require lower grain/flour protein content, weaker gluten strength (lower SDS sedimentation volume), and low water SRC. FSV is an end-use quality parameter associated with the amount of amylose and amylopectin components in endosperm starch and needs to be high for making some Asian-style noodles [1, 36]. The baking parameter cookie diameter is considered an important indicator of the overall quality of soft wheat [28, 37] and has been a key selection trait in soft wheat breeding programs.
These end-use quality traits were measured following the procedures from the American Association of Cereal Chemists International and as described by Aoun et al. The data set was analyzed using a mixed linear model in the R package lme4 [39, 40]. The environments were considered random, while genotypes were fitted as fixed in the model. For each trait, best linear unbiased estimators (BLUEs) of the genotypes were extracted from the mixed linear model and used for further statistical analysis. Broad sense heritability (H2Cullis) and correlations between traits were previously described by Aoun et al.
Genotyping-by-sequencing (GBS) was used to genotype the 672 soft white wheat breeding lines and cultivars. The genotypic data for the 672 genotypes were previously provided by Aoun et al. The GBS was performed at the North Carolina State University Genomic Sciences Laboratory in Raleigh, NC, USA. The sequence reads were aligned to the T. aestivum RefSeq v1.0 reference genome and SNP data were filtered for minor allele frequency (MAF) ≥ 5%, missing data ≤ 30%, and heterozygous frequency ≤ 15%. From this, 40,518 SNPs were used for further analysis. Missing datapoints in the SNP data were imputed using the expectation–maximization algorithm implemented in the package rrBLUP in R version 4.0.2.
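The three SNP filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study. It assumes genotypes are coded as 0, 1, or 2 copies of the alternate allele with None for a missing call, and that MAF and heterozygous frequency are computed on non-missing calls only. The function names snp_summary and passes_filters are hypothetical.

```python
def snp_summary(calls):
    """Return (maf, missing_rate, het_rate) for one SNP.

    calls: one genotype per line, coded 0/1/2 (alternate-allele copies)
    or None when the call is missing (assumed coding, for illustration).
    """
    n = len(calls)
    observed = [g for g in calls if g is not None]
    missing_rate = 1 - len(observed) / n
    if not observed:
        return 0.0, missing_rate, 0.0
    # Each line carries two alleles, hence the factor of 2.
    alt_freq = sum(observed) / (2 * len(observed))
    maf = min(alt_freq, 1 - alt_freq)
    het_rate = sum(1 for g in observed if g == 1) / len(observed)
    return maf, missing_rate, het_rate


def passes_filters(calls, min_maf=0.05, max_missing=0.30, max_het=0.15):
    """Apply the thresholds quoted in the text: MAF >= 5%, missing <= 30%, het <= 15%."""
    maf, missing_rate, het_rate = snp_summary(calls)
    return maf >= min_maf and missing_rate <= max_missing and het_rate <= max_het
```

Applying passes_filters to every SNP column and keeping those that return True mirrors the filtering step; the order in which the filters are applied does not matter because each SNP is evaluated independently.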
Population structure and linkage disequilibrium
To visualize the population structure in the 672 genotypes, principal component analysis (PCA) was performed using the ‘prcomp’ function in R based on 40,518 SNPs. The population structure was visualized using the first two principal components (PCs), which explained the highest percentage of variation. Pairwise linkage disequilibrium (LD) between SNPs (r2) was estimated using TASSEL v5 by applying a sliding window of 50 markers. The r2 values of marker pairs were plotted against the physical distances in megabase pairs (Mb) after randomly selecting 10% of the total SNP pairs. To visualize the LD decay across the genome and for each of the 21 chromosomes, a locally estimated scatterplot smoothing (LOESS) curve was fitted using the function ‘geom_smooth’ in the R package ggplot2. The r2 threshold was derived from the 95th percentile of the distribution of unlinked r2 values (for markers on different chromosomes) that were significant at the 99.99% level of confidence. The r2 threshold is the value beyond which LD was likely to be caused by genetic linkage. The intersection of the horizontal line at the r2 threshold value with the LOESS curve on the LD scatter plot was considered as the estimate of the extent of LD across the genome (genome-wise LD decay plot) and across each chromosome (chromosome-wise LD decay plot).
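As a rough illustration of the two quantities used above, the following Python sketch computes r2 between two SNPs and a percentile-based threshold. It is a simplification under stated assumptions: r2 is taken as the squared Pearson correlation of 0/1/2 genotype codes, which may differ from the estimator TASSEL applies, and the significance pre-filter on unlinked pairs is omitted. The function names ld_r2 and percentile are hypothetical.

```python
import math


def ld_r2(snp_a, snp_b):
    """Squared Pearson correlation between two SNP vectors coded 0/1/2.

    Pairs with a missing call (None) in either SNP are skipped.
    Returns NaN when either SNP is monomorphic in the retained pairs.
    """
    pairs = [(a, b) for a, b in zip(snp_a, snp_b) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov * cov / (var_a * var_b)


def percentile(values, q):
    """Percentile (q in [0, 100]) of a non-empty list, with linear interpolation."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100
    lower = math.floor(rank)
    upper = math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)
```

Under these assumptions, a background threshold would be obtained by collecting ld_r2 values for SNP pairs on different chromosomes and calling percentile(unlinked_values, 95); the physical distance at which the smoothed within-chromosome r2 curve crosses that value is then read off as the extent of LD.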
Genome-wide association mapping
The BLUEs for each trait were considered as the phenotype in the GWAS. Association mapping was performed using three models: 1) mixed linear model (MLM), 2) Fixed and random model Circulating Probability Unification (FarmCPU), and 3) Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK), implemented in the GAPIT R package. The single-locus MLM is the most widely used model in association mapping studies. However, it tests one marker at a time and therefore is likely to increase the number of false negatives for complex traits [22, 51]. Multi-locus models such as FarmCPU were proposed to overcome this problem. FarmCPU iteratively uses fixed and random models in which the significant SNPs identified in the iterations are fitted as cofactors. FarmCPU was reported to control for false negatives and false positives without causing model overfitting. BLINK was derived from the FarmCPU method with a few modifications. BLINK does not assume that causal genes are evenly distributed across the genome. It also works directly on markers instead of bins and excludes markers in LD with the most significant markers. BLINK uses the Bayesian Information Criterion (BIC) of a fixed effect model to approximate the maximum likelihood of a random effect model to select marker-trait associations (MTAs).
The GWAS models considered family relatedness (Kinship matrix or K matrix) and population structure (Q matrix). The K matrix was included in all GWAS models, whereas the optimal number of principal components (PCs) in the Q matrix was determined based on quantile–quantile (Q-Q) plots that visualize the expected -log10 (P) versus the observed -log10 (P). The number of PCs included in the GWAS models was limited to the first four PCs. Manhattan plots for MTAs were visualized using the R package ‘qqman’. MTAs were considered significant at a false discovery rate (FDR) of ≤ 0.05.
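The FDR criterion can be made concrete with a short Python sketch. The text does not name the FDR procedure, so this sketch assumes the Benjamini-Hochberg step-up method, which is a common choice in GWAS software; the function names bh_adjust and significant are hypothetical and the sketch is not taken from GAPIT.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(p_values, fdr=0.05):
    """Flag tests whose adjusted p-value is at or below the FDR level."""
    return [q <= fdr for q in bh_adjust(p_values)]
```

With one p-value per SNP for a given trait, significant(p_values) marks the SNPs that would be reported as MTAs at FDR ≤ 0.05 under this assumption.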
Based on the T. aestivum RefSeq v1.0 reference genome assembly (https://wheat.pw.usda.gov/jb/?data=/ggds/whe-iwgsc2018), we identified annotated genes within the genomic regions of significant SNPs that exhibited large effects (i.e., SNPs that significantly impacted trait values based on Tukey’s HSD test, had a large impact on the phenotype, and were unlikely to be false positives, with MAF ≥ 7%). In this search, we considered only high-confidence annotated genes located within a few kilobase pairs before and after the associated SNP physical positions. The putative biological functions of the candidate genes were retrieved from https://wheat-urgi.versailles.inra.fr/Seq-Repository/Annotations. In addition, we extracted MTAs available at the International Wheat Genome Sequencing Consortium (IWGSC) sequence repository that were within the genomic regions of large effect MTAs. Furthermore, markers (SNP; simple sequence repeat, SSR; Diversity Arrays Technology, DArT) that were associated with end-use quality in previous genetic studies [2, 4, 6,7,8,9,10,11,12,13,14,15,16,17, 19,20,21,22] and were physically close (based on the T. aestivum RefSeq v1.0 reference genome) to any of the MTAs in this study were determined.
The distributions of BLUEs for each of the 14 end-use quality traits are illustrated in Fig. 1. There were continuous phenotypic distributions for all end-use quality traits. For grain characteristics, BLUEs ranged from -0.8 to 54.1 for SKCS hardness, 2.1 to 3.2 mm for SKCS size, 28.5 to 48.3 mg for SKCS weight, 55.6 to 65.4 kg/hL for test weight, and 8.3 to 15.7% for grain protein. For milling traits, BLUEs ranged from 36.5 to 54.9% for break flour yield, 57.1 to 74.3% for flour yield, 0.20 to 0.51% for flour ash, and 67.0 to 98.6 for milling score. For flour parameters, BLUEs ranged from 6.4 to 12.4% for flour protein, 3.4 to 18.2 g/mL for SDS sedimentation, 44.5 to 73.1% for water SRC, and 11.6 to 26.2 mL/g for FSV. For cookie diameter, BLUEs ranged from 7.8 to 9.7 cm. For all of these traits, differences in phenotypes would be considered to be industrially significant, with many values below minimum targets. Moderate to high broad sense heritability (H2 = 0.46–0.70) was observed for all traits except for grain and flour protein content (H2 = 0.18 to 0.19).
Population structure and linkage disequilibrium
Of the 40,518 SNPs, there were 14,102 (34.8%) SNPs on the A genome, 16,626 (41.0%) SNPs on the B genome, 8,656 (21.4%) SNPs on the D genome, and 1,134 (2.8%) SNPs on unaligned (UN) chromosome(s). PCA based on the first two PCs showed minimal clustering in wheat genotypes, which was expected since the plant materials in this study were from the same wheat breeding program (Supplementary Fig. S1). The first 10 PCs accounted cumulatively for 26.3% of the variation. The first four PCs explained 5.3%, 4.0%, 3.3% and 3.0% of variation, respectively. The genome-wise LD dropped to an r2 threshold of 0.1 within 6.5 Mb on average (Supplementary Fig. S2). LD decayed to 0.1 at ~ 2.5–5.0 Mb for chromosomes on the A genome, to 5.0–10 Mb for chromosomes on the B and the D genome (Supplementary Fig. S3).
Genome-wide association mapping
GWAS model selection
The best models within each method were selected based on examination of Q-Q plots. For MLM, we selected MTAs from the K (Kinship) model. Using FarmCPU, K + 2PCs (Kinship and Q based on the first two PCs) was selected to model SKCS hardness, SKCS weight, grain protein content, flour yield, flour ash, milling score, SDS sedimentation, water SRC, and cookie diameter, whereas K + 3PCs (Kinship and Q based on the first three PCs) was selected to model the remaining traits, SKCS size, test weight, break flour yield, flour protein content, and FSV. For BLINK, we selected K + 4PCs (Kinship and Q based on the first four PCs) for all traits.
In contrast to the Q-Q plots generated from FarmCPU models, the Q-Q plots from MLM and BLINK did not show a sharp deviation of the observed P-value distribution from the expected P-value distribution (Supplementary Fig. S4, S5, S6). These results suggest that FarmCPU provided a better control of false negatives and false positives compared to MLM and BLINK. Thus, only association mapping results from FarmCPU will be discussed in this study (Tables 1,2,3,4, Supplementary Fig. S7). MTAs generated from MLM and BLINK are provided in Supplementary Table S1, S2.
Based on the LD between markers, each MTA identified from the FarmCPU models represents a distinct locus or QTL. Considering all traits together, a total of 178 significant MTAs were identified across all wheat chromosomes (Tables 1,2,3,4). Sixty-two MTAs were detected on the A genome, 77 MTAs on the B genome, 34 MTAs on the D genome, and five MTAs on unaligned (UN) chromosomes. Chromosomes 1B and 7A carried the highest number of MTAs (n = 16 each), whereas chromosome 4D had only a single MTA. The favorable alleles and their corresponding frequencies are described in Tables 1,2,3,4. There were 12 large-effect markers associated with 11 traits (1–2 markers per trait) (Table 5). For SKCS size, FSV, and cookie diameter, all significant markers had small effects.
For grain characteristics, five MTAs with large effects were detected on chromosomes 1B, 2B, 4B, 5A, and 6B (Table 5). Markers S5A_480515221 and S6B_705613777 were associated with SKCS hardness and impacted the hardness index by 6.7 and 7.9 units on average, respectively. Marker S2B_533178165 was associated with SKCS weight and influenced the phenotype by 3.55 mg. For test weight, S4B_413497949 had the largest effect and resulted in a 1.2 kg/hL difference in the phenotype on average, whereas for grain protein content, S1B_46883868 had the largest effect, with a 0.85% increase or decrease in the phenotype. S1B_46883868 was also associated with flour protein content and affected the trait value by 0.63% on average.
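One simple way to read statements such as "impacted the hardness index by 6.7 units on average" is as a difference in mean BLUE between the two homozygous allele classes at a marker. The Python sketch below illustrates that reading only; it is an assumption about how such an effect could be summarized, not the estimator reported by the GWAS software, and the function name allele_effect is hypothetical.

```python
from statistics import mean


def allele_effect(blues, genotypes):
    """Mean BLUE of lines coded 2 minus mean BLUE of lines coded 0.

    blues: trait BLUE per line.
    genotypes: marker call per line, coded 0/1/2 (None if missing).
    Heterozygous and missing calls are ignored in this simplified summary.
    """
    ref = [y for y, g in zip(blues, genotypes) if g == 0]
    alt = [y for y, g in zip(blues, genotypes) if g == 2]
    if not ref or not alt:
        raise ValueError("both homozygous classes must be present")
    return mean(alt) - mean(ref)
```

The sign of the returned value depends on which allele is coded as 2, so the absolute value is the quantity comparable to the per-marker effects quoted above.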
For milling traits, five markers had large effects and were detected on chromosomes 1B, 1D, 4A, 5A, and 6B (Table 5). Marker S1B_653681752 was associated with break flour yield and flour yield and influenced trait values by 2.9% and 2.7% on average, respectively. An additional large effect marker was associated with flour yield: S6B_19335996 affected the trait value by 2.5%. Marker S4A_120144412 was associated with flour ash and influenced the phenotype by 0.03% on average. For milling score, S1D_14707739 and S5A_20640566 affected the phenotype by 3.8 and 2.4 units, respectively.
For flour parameters, there were four markers with large effects located on chromosomes 1B and 1D (Table 5). In addition to break flour yield, S1B_653681752 also influenced water SRC by 4.9% on average. For SDS sedimentation, two large effect markers were identified, S1D_121990680 and S1D_411063068, which affected the trait values by 1.3 and 1.4 g/mL, respectively. Except for S5A_480515221, S4A_120144412, and S1D_121990680, which were associated with SKCS hardness, flour ash, and SDS sedimentation, respectively, the favorable alleles for the remaining nine large effect markers were present in high frequencies (86–93%) in this soft white wheat germplasm.
Loci associated with at least two end-use quality traits
Among the 178 MTAs identified in this study, there were 17 loci associated with more than a single end-use quality trait (Table 6). Among these loci, there were two large-effect markers (S1B_46883868 and S1B_653681752) and ten small-effect markers that were associated with at least two end-use quality traits. For each of these 12 markers, there was desirable linkage between the favorable alleles. This suggests that these markers have desirable pleiotropic effects and could be useful to simultaneously breed for more than a single end-use quality trait. Based on pairwise LD estimates between physically close markers, there were an additional five loci on chromosomes 1A, 1B, 6B, 7A, and 7B that were associated with more than a single end-use quality trait (Table 6). For each of these loci, LD between significant markers that were associated with different traits was higher than the r2 threshold of 0.1. For three of these loci, S1A_586706397/S1A_587581129, S6B_27918221/S6B_29771821, and S7A_730416426/S7A_731026067, there was desirable linkage between marker alleles in each locus, whereas for the other two loci, S1B_561712520/S1B_569507932 and S7B_624947199/S7B_636744313, there was unfavorable linkage. Therefore, for the latter two loci, selecting for one trait could negatively affect the other trait.
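The first step of this analysis, finding markers that are associated with two or more traits, amounts to grouping MTAs by marker. A minimal Python sketch follows, using only marker-trait pairs stated in the text; the function name multi_trait_markers is hypothetical, and the LD-based merging of neighboring markers into a single locus is not covered here.

```python
from collections import defaultdict


def multi_trait_markers(mtas):
    """Map each marker tied to two or more traits to its sorted trait list.

    mtas: iterable of (marker, trait) pairs, one per significant MTA.
    """
    traits_by_marker = defaultdict(set)
    for marker, trait in mtas:
        traits_by_marker[marker].add(trait)
    return {m: sorted(t) for m, t in traits_by_marker.items() if len(t) >= 2}


# Marker-trait pairs taken from the large-effect results described in the text.
example = [
    ("S1B_46883868", "grain protein"),
    ("S1B_46883868", "flour protein"),
    ("S1B_653681752", "break flour yield"),
    ("S1B_653681752", "flour yield"),
    ("S1B_653681752", "water SRC"),
    ("S5A_480515221", "SKCS hardness"),
]
shared = multi_trait_markers(example)
```

In this example, shared contains the two large-effect markers named above as multi-trait markers, while S5A_480515221 is dropped because it is tied to a single trait.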
Co-localized MTAs with previously identified end-use quality genes/QTL
A total of 35 annotated genes were located close to the physical positions of the 12 large effect markers. The putative functions of these genes are described in Supplementary Table S3. In addition, we found 13 GWAS MTAs available at the IWGSC sequence repository that were within the genomic regions of the 12 large effect markers identified in our study. These GWAS MTAs from previous studies were associated with thousand kernel weight, test weight, grain fill duration, grain protein content, SDS sedimentation, and grain minerals (Cu and Zn) (Supplementary Table S3). Furthermore, comparative mapping (based on physical positions of molecular markers) between all the 178 identified MTAs from this study and end-use quality QTL/genes from previous genetic studies [2, 4, 6,7,8,9,10,11,12,13,14,15,16,17, 19,20,21,22] showed that 40 MTAs were positioned within genomic regions of previously discovered end-use quality genes/QTL (Supplementary Table S4).
Of the 16 identified loci for SKCS hardness in this study, 10 loci were found within genomic regions of previously reported grain hardness QTL (Supplementary Table S4). For instance, the SKCS hardness associated markers in this study, S1A_583894689, S3A_690786387, S3B_718850970, S5A_480515221, S6B_130163859, S6B_705613777, and S7A_12011069, were located close to the positions of previously reported grain hardness associated markers/QTL, QGh.caas-1A (~ 575 Mb), wPt-4725 (709 Mb), QKh.WJ-3B.3 (~ 695 Mb), S5A_463766631 (464 Mb), IWB11485 (121 Mb), S6B_703822990 (704 Mb), and wPt-0744 (0.2 Mb), respectively. Similarly, S1B_635869100 was positioned 8–13 Mb from Qgh-1B, QKh.WY-1B.1b, QGh.cass-1B, QNhi.hwwgr-1BL, and QKH.ksw-1B. On chromosome 5B, S5B_549556474 identified in this study was found close to Qshi.hwwgr-5BL (566–571 Mb) and a QTL flanked by the SSR marker wmc289 (556 Mb). In addition, an MTA on chromosome 7B, S7B_643501407, was located at ~ 8 Mb from QKh.WJ-7B.1B and QGh.caas-7B.1b. Our comparative mapping showed the importance of these 10 previously characterized loci in controlling kernel hardness and suggests that the remaining six loci could be novel.
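The physical-distance comparisons in this section can be reproduced for markers that share the GBS naming pattern. The Python sketch below assumes that a name such as S5A_480515221 encodes the chromosome (5A) and the base-pair position on RefSeq v1.0, an inference from the positions quoted in the text (for example S5A_463766631 at 464 Mb). The function names parse_marker and distance_mb are hypothetical.

```python
def parse_marker(name):
    """Split a marker name like 'S5A_480515221' into (chromosome, position in bp)."""
    prefix, position = name.split("_")
    if not prefix.startswith("S"):
        raise ValueError(f"unexpected marker name: {name}")
    return prefix[1:], int(position)


def distance_mb(marker_a, marker_b):
    """Physical distance in Mb between two markers, or None if on different chromosomes."""
    chrom_a, pos_a = parse_marker(marker_a)
    chrom_b, pos_b = parse_marker(marker_b)
    if chrom_a != chrom_b:
        return None
    return abs(pos_a - pos_b) / 1_000_000
```

For the two hardness markers compared above with previously reported SNPs, distance_mb("S5A_480515221", "S5A_463766631") and distance_mb("S6B_705613777", "S6B_703822990") give the separations in Mb that underlie the "located close to" statements. Markers from other platforms (SSR, DArT, IWB/IWA) do not follow this naming pattern and would need their positions looked up separately.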
Three loci associated with SKCS size were located close to previously identified kernel size QTL (Supplementary Table S4). For instance, S2B_154846350, associated with SKCS size in this study, was located at 4–5 Mb from IWB30179 and QKd.cob-1A (~ 134 Mb), which were associated with SKCS size and kernel diameter, respectively. Similarly, S2D_563799166 and S5A_578074731 were close to previously identified kernel diameter QTL, QKd.hwwgr-2DL (~ 552 Mb) and QKD.ksu-5A1 (568 Mb), respectively. For SKCS weight, S2D_563799166 identified in this study was positioned at ~ 12 Mb from QSkw.hwwgr-2DL (552 Mb) (Supplementary Table S4). Two of our identified MTAs for test weight were found near previously identified test weight associated markers/QTL. These include S2A_762292662, located close to IWB35564 (760 Mb), and S7B_40194878, located at 7–8 Mb from QTW.ksu-7B and IWB54370 (Supplementary Table S4).
Four of the loci associated with grain protein content in this study were previously reported (Supplementary Table S4). These include S1B_633974958, which was positioned close to QGpc.caas-1B.1 (628–638 Mb) and QWgc.caas-1B (~ 628–634 Mb), which were associated with grain protein content and wet gluten content, respectively. Similarly, S4B_63121316 and S5D_543253602 were found close to the grain protein content associated markers gwm368 (60 Mb) and wPt-9788/wPt-0400 (560 Mb), respectively. On chromosome 7A, the grain protein content associated marker S7A_731026067 was 18 Mb from Qgpc.7A.1 (wmc525), which was associated with both protein content and dry gluten content.
For break flour yield, the associated locus tagged by marker S3B_630394456 was at the physical position of IWA6254 (630 Mb), which was also associated with break flour yield. For flour yield, five of the identified MTAs were mapped close to flour yield associated loci in previous genetic studies (Supplementary Table S4). For instance, S1A_9128313 and S1A_15686346 were in close proximity to QFY.ksu-1A (7 Mb). S1A_9128313 and S1A_15686346 were also within the genomic regions of the genes TraesCS1A01G010900 (5 Mb) and TraesCS1A01G039600 (16 Mb), respectively. TraesCS1A01G010900 and TraesCS1A01G039600 were annotated as low molecular weight glutenin subunit and high molecular weight glutenin subunit, respectively. Another MTA on chromosome 1B associated with flour yield, S1B_653681752, was identified close to QFY.ksu-1B (649 Mb). Similarly, S5A_382294123 and S6B_19335996 were found in proximity to IWB76667 (384 Mb) and IWA7725 (27 Mb), respectively. Two flour ash associated loci in this study, S5A_3649534 and S5B_68052478, were positioned close to Qgac.cob-5A (gwm443, 11 Mb) and Qgac.cob-5B.1 (gwm540, 67 Mb), respectively, which were previously identified by El-Feki et al. (Supplementary Table S4).
For flour protein, S2D_28763299 was located 15–17 Mb from QGpc.caas-2D, which was associated with grain protein content, and QWgc.WY-2D.5, which was associated with wet gluten content. Similarly, S7A_730416426, associated with flour protein content in this study, was ~ 17 Mb away from qGPC.7A.1, which was associated with grain protein content and dry gluten content (Supplementary Table S4).
Two SDS sedimentation associated markers in this study were previously identified (Supplementary Table S4). These loci include S1B_561712520, located close to IWB14950 (558 Mb) and the gene Glu-B1 (556 Mb), and S5B_539075483, which mapped close to QSsd.caas-5B (527 Mb). Among water SRC associated markers, S1B_547973154, S1B_653681752, S2B_66559534, and S6B_29771821 were positioned near Glu-B1 (556 Mb), IWB27057 (652 Mb), IWA820 (44 Mb), and IWA7725 (27 Mb), respectively (Supplementary Table S4).
This study used historical data that captured a wide range of phenotypic variation for end-use quality within a soft white winter wheat breeding program. Heritability estimates for end-use quality traits were moderate to high except for grain and flour protein content. This suggests that most traits are primarily controlled by genetic factors and that genotypic selection (as opposed to phenotypic selection) is a rational strategy. This study identified the genetic architecture underlying 14 end-use quality traits among recent breeding lines and cultivars from a soft white winter wheat breeding program. Prior to this study, Jernigan et al. investigated the genetic architecture of end-use quality in a set of 480 advanced soft white winter wheat breeding lines and cultivars from Pacific Northwest breeding programs selected from 1992 to 2014. Thus, the germplasm used in this study is different from that used by Jernigan et al. Consequently, our investigation was expected to corroborate previous QTL and/or discover additional QTL associated with end-use quality in soft white wheat.
The MTAs identified in this study, as well as genotypes with favorable alleles, will be useful for end-use quality improvement in soft white and other types of wheat. The 12 large effect markers can be converted into Kompetitive Allele Specific PCR (KASP) or semi-thermal asymmetric reverse PCR (STARP) markers for use in marker-assisted selection (MAS). Among these large effect markers, S1B_653681752 is useful to breed for higher break flour yield and flour yield and lower water SRC. Similarly, S1B_46883868 is associated with both grain protein content and flour protein. The favorable alleles of nine of the large effect markers were present in high frequencies in this germplasm. This suggests that these markers were under high selection pressure in the soft white wheat breeding program, likely the result of long-term phenotyping and selection, and the pyramiding of favorable alleles across the breeding populations. Based on our comparative mapping, eight of the large effect markers, S1B_46883868, S1D_14707739, S1D_121990680, S1D_411063068, S2B_533178165, S4A_120144412, S4B_413497949, and S5A_20640566, were not reported in previous studies and thus should be prioritized for MAS. Only a few loci were found to have large effects, suggesting that many end-use quality traits have complex genetic architecture and are mainly controlled by several minor genes with small effects. For some traits like SKCS size, FSV, and cookie diameter, all identified markers had small effects, suggesting that MAS may not be useful for these traits. Therefore, genomic selection might be a better approach to implement for such traits [27, 55].
Grain characteristics greatly influence wheat end-use quality [4, 7, 11, 30, 47]. Grain hardness affects most end-use quality traits including break flour yield, flour yield, flour particle size, starch damage, dough strength, and cookie diameter [27, 56,57,58]. The variation in grain hardness in the present soft wheat germplasm, like most soft wheat breeding populations, is independent of the puroindolines because wild-type puroindoline genes at the Ha locus are generally fixed. This is consistent with the absence of MTAs on chromosome 5DS in this study. Other grain characteristics including SKCS size, SKCS weight, test weight, and grain protein influence wheat milling performance [28, 30]. SKCS size and SKCS weight were highly correlated in this germplasm (r = 0.8) and this was reflected in the GWAS, in which S2D_563799166 and S6B_583281710 were found to be associated with both traits.
Grain protein content is an essential quality trait that affects flour functionality. Unlike bread, soft wheat products often require lower protein levels to minimize gluten formation and mixing strength. The positive correlation (r = 0.4–0.5) between grain/flour protein content and SDS sedimentation (a measure of gluten strength) in this germplasm provides further evidence of their direct relationship. However, based on the GWAS, no significant markers were in common between SDS sedimentation and grain/flour protein content. Grain and flour protein were phenotypically correlated in this germplasm. This relationship was also evident in our GWAS, in which five markers were associated with both grain and flour protein. Grain and flour protein in this wheat collection had low heritability estimates and high genotype by environment interactions as described by Aoun et al. Consequently, most markers associated with grain/flour protein in this study had small effects, except for marker S1B_46883868.
Higher break flour yield, higher flour yield, lower flour ash, and higher milling score are desirable in soft wheat. Cultivars carrying favorable alleles for these traits could deliver higher milling performance and thus greater profit for flour millers. Moderate to high heritability estimates and positive correlations among milling traits in this germplasm suggest that genetic gain and simultaneous breeding for these traits are possible. Positive correlations between milling traits were also evident in our GWAS results. For instance, the S1B_653681752 and S5B_508665777 alleles favorable for break flour yield were also associated with higher flour yield. Similarly, the S6D_471614981 allele favorable for flour yield was also associated with higher milling score. Negative correlations between milling score and ash in this germplasm (r = -0.7) were discussed in Aoun et al. This desirable negative correlation was also reflected in our GWAS, in which the S5B_68052478 minor allele was associated with lower ash and higher milling score. We found that S1B_100055026, which was associated with break flour yield, was located close to the Glu-B3 gene, flanked by the DArT marker wPt-1317 (137 Mb). Similarly, the flour yield associated marker in this study, S1B_555294134, was located 1 Mb from Glu-B1 (556 Mb). It is well known that glutenin subunit families are major components of wheat endosperm storage proteins and are associated with many end-use quality traits. The presence of break flour yield and flour yield associated loci close to Glu-B1 and Glu-B3 may suggest a genetic association between endosperm storage proteins and endosperm structure, as evidenced by Boehm Jr et al. The composition of the protein matrix surrounding starch granules likely contributes to the mechanical strength of the endosperm.
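The physical distances quoted above can be reproduced directly from the marker names. The sketch below assumes that each name follows the pattern S<chromosome>_<position in base pairs> on the reference assembly. That is how the names in this article appear to be built, but the convention is not stated explicitly in the text. The Glu-B1 position of 556 Mb is the value given above, and the function names are hypothetical.

```python
import re
from typing import Tuple

# Assumed naming convention: "S" + chromosome (e.g. 1B) + "_" + position in bp.
MARKER_PATTERN = re.compile(r"^S(\d[ABD])_(\d+)$")


def parse_marker(name: str) -> Tuple[str, int]:
    """Split a marker name into (chromosome, position in base pairs)."""
    match = MARKER_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized marker name: {name!r}")
    return match.group(1), int(match.group(2))


def distance_mb(marker: str, locus_chrom: str, locus_pos_mb: float) -> float:
    """Absolute physical distance in Mb between a marker and a locus."""
    chrom, pos_bp = parse_marker(marker)
    if chrom != locus_chrom:
        raise ValueError(f"{marker} is on {chrom}, not {locus_chrom}")
    return abs(pos_bp / 1e6 - locus_pos_mb)


# Flour yield marker versus Glu-B1 (556 Mb on chromosome 1B, as given in the text).
glu_b1_distance = distance_mb("S1B_555294134", "1B", 556.0)
print(f"S1B_555294134 to Glu-B1: {glu_b1_distance:.2f} Mb")
```

Under this assumption, S1B_555294134 sits about 0.7 Mb from the quoted Glu-B1 position, in line with the rounded figure of 1 Mb reported above.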
Flour and baking parameters
Unlike bread, confectionery products require lower gluten strength and water absorption capacity, which were measured using SDS sedimentation and water SRC, respectively. Higher water SRC is in part due to starch damage from milling and to non-starch polysaccharides [5, 33, 60]; thus, lower water absorption is preferred because it results in better cookie spread and lower-viscosity batters. Three water SRC associated markers co-localized with milling trait associated markers: S1B_653681752, S5A_382294123, and S6B_27918221/S6B_29771821. Negative correlations between water SRC and milling traits, previously discussed by Aoun et al., were also observed in our GWAS results, particularly for markers S1B_653681752 and S5A_382294123.
Higher FSV is desirable for making some Asian-style noodles [1, 36]. We found that S1A_534055653, which was associated with FSV in our study, was near the gene Glu-A1, flanked by the SSR marker wmc312 (511 Mb). This result suggests a genetic correlation between gluten content/strength and FSV. A similar observation was made for cookie diameter, for which the associated marker S1B_573323546 was close to the position of the gene Glu-B1. The FSV associated marker S7D_38000037 from this study was 2 Mb from the waxy locus Wx-D1. Whether S7D_38000037 is associated with a null allele at Wx-D1 is at present unknown, but this is unlikely because the known Waxy allele at Wx-D1 is rare. Similarly, we did not identify MTAs for FSV close to the other homoeologous waxy loci, Wx-A1 and Wx-B1, which are located on chromosomes 7A and 4A, respectively [35, 62]. Mutation or deletion at any of the three waxy loci often results in reduced-amylose 'partial waxy' wheat, which is associated with higher FSV. Therefore, the variation in FSV in this germplasm is likely independent of the waxy loci. As noted above, no major QTL were identified for cookie baking. As such, alternative genotypic selection strategies such as genomic selection may be more appropriate for this trait.
Conclusions
In this study, we investigated the phenotypic and genotypic structure of 14 end-use quality traits in 672 soft white winter wheat breeding lines and cultivars adapted to the Pacific Northwest region of the United States. A total of 178 MTAs were identified across all wheat chromosomes, of which 40 were positioned within genomic regions of previously discovered end-use quality genes/QTL. These results highlight that end-use quality accounts for a relatively large proportion of the many traits a wheat breeder selects for. The high heritability of most traits underscores the success of long-term phenotypic selection. Among the identified MTAs, 12 markers had large effects (eight of them previously uncharacterized) and thus could be prioritized in breeding programs. For example, a relatively manageable number of lines, say, those resulting from head row selection, could be subjected to a single round of genotypic selection to fix the favorable allele at one or more of the large-effect loci. Such a strategy could return benefits later on, as a greater proportion of lines would meet end-use quality targets during subsequent replicated yield trials. This study also revealed that for some end-use quality traits (SKCS size, FSV, and cookie diameter) only small-effect markers were identified, suggesting that these traits are controlled by multiple minor genes in this germplasm and that alternative selection strategies such as genomic selection could augment traditional and laborious phenotyping.
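The single round of genotypic selection suggested above amounts to a filter over the head-row lines. The sketch below is a minimal illustration under stated assumptions. The line names and genotype calls are invented, genotypes are coded as the number of favorable alleles (0, 1, or 2), and "fixed" is taken to mean homozygous for the favorable allele at every target marker. The two target markers are large-effect markers named in the text, chosen here only as examples.

```python
from typing import Dict, List


def select_fixed_lines(
    genotypes: Dict[str, Dict[str, int]], target_markers: List[str]
) -> List[str]:
    """Return lines homozygous for the favorable allele at every target marker.

    A line with a missing call at any target marker is not selected.
    """
    return [
        line
        for line, calls in genotypes.items()
        if all(calls.get(marker) == 2 for marker in target_markers)
    ]


# Hypothetical head-row lines (names and calls invented for illustration).
head_rows = {
    "line_A": {"S1B_653681752": 2, "S1B_46883868": 2},
    "line_B": {"S1B_653681752": 2, "S1B_46883868": 0},
    "line_C": {"S1B_653681752": 1, "S1B_46883868": 2},
    "line_D": {"S1B_653681752": 2, "S1B_46883868": 2},
}

targets = ["S1B_653681752", "S1B_46883868"]
kept = select_fixed_lines(head_rows, targets)
print(kept)
```

Requiring the favorable allele at every target locus shrinks the retained set quickly as loci are added, so in practice a breeder would likely restrict such a filter to the one or two loci with the largest effects.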
Availability of data and materials
The datasets generated and analyzed during the current study are available in the T3/wheat repository, https://wheat.triticeaetoolbox.org/.
References
Kiszonas AM, Fuerst EP, Morris CF. A comprehensive survey of soft wheat grain quality in U.S. germplasm. Cereal Chem. 2013;90:47–57.
Jernigan KL, Godoy JV, Huang M, Zhou Y, Morris CF, et al. Genetic dissection of end-use quality traits in adapted soft white winter wheat. Frontiers Plant Sci. 2018;9:271.
Smith N, Guttieri M, Souza E, et al. Identification and validation of QTL for grain quality traits in a cross of soft wheat cultivars Pioneer Brand 25R26 and Foster. Crop Sci. 2011;51(4):1424–36.
Carter AH, Garland-Campbell K, Morris CF, Kidwell KK. Chromosomes 3B and 4D are associated with several milling and baking quality traits in a soft white spring wheat (Triticum aestivum L.) population. Theor Appl Genet. 2012;124(6):1079–96.
Souza EJ, Sneller C, Guttieri MJ, et al. Basis for selecting soft wheat for end-use quality. Crop Sci. 2012;52(1):21–31.
Huang XQ, Cloutier S, Lycar L, et al. Molecular detection of QTLs for agronomic and quality traits in a doubled haploid population derived from two Canadian wheats (Triticum aestivum L.). Theor Appl Genet. 2006;113(4):753–66.
Li Y, Song Y, Zhou R, Branlard G, Jia J. Detection of QTLs for bread-making quality in wheat using a recombinant inbred line population. Plant Breed. 2009;128(3):235–43.
Li J, Cui F, Ding AM, et al. QTL detection of seven quality traits in wheat using two related recombinant inbred line populations. Euphytica. 2012;183(2):207–26.
Li Y, Zhou R, Wang J, Liao X, Branlard G, Jia J. Novel and favorable QTL allele clusters for end-use quality revealed by introgression lines derived from synthetic wheat. Mol Breed. 2012;29(3):627–43.
Li C, Bai G, Chao S, Carver B, Wang Z. Single nucleotide polymorphisms linked to quantitative trait loci for grain quality traits in wheat. Crop J. 2016;4(1):1–11.
Sun X, Marza F, Ma H, Carver BF, Bai G. Mapping quantitative trait loci for quality factors in an inter-class cross of US and Chinese wheat. Theor Appl Genet. 2010;120(5):1041–51.
Bordes J, Ravel C, Le Gouis J, Lapierre A, Charmet G, Balfourier F. Use of a global wheat core collection for association analysis of flour and dough quality traits. J Cereal Sci. 2011;54(1):137–47.
Jun M, Zhang CY, et al. Identification of QTLs conferring agronomic and quality traits in hexaploid wheat. J Integrative Agric. 2012;11(9):1399–408.
El-Feki WM, Byrne PF, Reid SD, Lapitan NL, Haley SD. Quantitative trait locus mapping for end-use quality traits in hard winter wheat under contrasting soil moisture levels. Crop Sci. 2013;53(5):1953–67.
Fox GP, Martin A, Kelly AM, et al. QTLs for water absorption and flour yield identified in the doubled haploid wheat population Lang/QT8766. Euphytica. 2013;192(3):453–62.
Maphosa L, Langridge P, Taylor H, Chalmers KJ, Bennett D, Kuchel H, Mather DE. Genetic control of processing quality in a bread wheat mapping population grown in water-limited environments. J Cereal Sci. 2013;57(3):304–11.
Jernigan KL, Morris CF, Zemetra R, et al. Genetic analysis of soft white wheat end-use quality traits in a club by common wheat cross. J Cereal Sci. 2017;76:148–56.
Kiszonas AM, Morris CF. Wheat breeding for quality: A historical review. Cereal Chem. 2018;95(1):17–34.
Goel S, Singh K, Singh B, et al. Analysis of genetic control and QTL mapping of essential wheat grain quality traits in a recombinant inbred population. PLoS ONE. 2019;14(3):e0200669.
Naraghi SM, Simsek S, Kumar A, et al. Deciphering the genetics of major end-use quality traits in wheat. G3: Genes Genomes Genet. 2019;9:1405–27.
Zhang G, Chen RY, Shao M, Bai G, Seabourn BW. Genetic analysis of end-use quality traits in wheat. Crop Sci. 2021;61:1709–23.
Aoun M, Carter AH, Ward BP, Morris CF. Genome-wide association mapping of the ‘super-soft’ kernel texture in white winter wheat. Theor Appl Genet. 2021;134:2547–59.
Thompson YA, Carter AH, Ward BP, Kiszonas AM, Morris CF. Association mapping of sponge cake volume in US Pacific Northwest elite soft white wheat (Triticum aestivum L.). J Cereal Sci. 2021;100:103250.
Kiszonas AM, Fuerst EP, Morris CF. Modeling end-use quality in US soft wheat germplasm. Cereal Chem. 2015;92(1):57–64.
Morris CF, Engle DA, Kiszonas AM. Breeding, selection, and quality characteristics of soft white wheat. Cereal Foods World. 2020;65(5):53.
Gale KR. Diagnostic DNA markers for quality traits in wheat. J Cereal Sci. 2005;41:181–92.
Aoun M, Carter AH, Thompson YA, Ward BP, Morris CF. Environment characterization and genomic prediction for end-use quality traits in soft white winter wheat. Plant Genome. 2021;14:e20128.
Morris CF, Rose SP. Wheat. In: Henry RJ, Kettlewell PS, editors. Cereal Grain Quality. New York: Chapman Hall; 1996. p. 3–54.
Bhave M, Morris CF. Molecular genetics of puroindolines and related genes: allelic diversity in wheat and other grasses. Plant Mol Biol. 2008;66(3):205–19.
Campbell KG, Bergman CJ, Gualberto DG, et al. Quantitative trait loci associated with kernel traits in a soft x hard wheat cross. Crop Sci. 1999;39:1184–95.
Galande AA, Tiwari R, Ammiraju JSS, et al. Genetic analysis of kernel hardness in bread wheat using PCR-based markers. Theor Appl Genet. 2001;103:601–6.
Gwirtz JA, Willyard MR, McFall KL. Wheat Quality in the United States of America. In: Popper L, Schäfer W, Freund W, editors. The Future of Flour. Kansas City: Sosland Publication Co; 2006. p. 17–42.
Kiszonas AM, Fuerst EP, Morris CF. Wheat arabinoxylan structure provides insight into function. Cereal Chem. 2013;90(4):387–95.
Morris CF, Li S, King GE, et al. A comprehensive genotype and environment assessment of wheat grain ash content in Oregon and Washington: analysis of variation. Cereal Chem. 2009;86:307–12.
Zeng M, Morris CF, Batey IL, Wrigley CW. Sources of variation for starch gelatinization, pasting, and gelation properties in wheat. Cereal Chem. 1997;74:63–71.
Guzman C, Peña RJ, Singh R, et al. Wheat quality improvement at CIMMYT and the use of genomic selection on it. Appl Translational Genomics. 2016;11:3–8.
Miller RA, Hoseney RC, Morris CF. Effect of formula water on the spread of sugar-snap cookies. Cereal Chem. 1997;74:669–71.
AACC International. Approved Methods of Analysis. 11th ed. St. Paul, MN: AACC International; 2008.
Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. J Statistical Software. 2015;67(1):1–48.
Vazquez AI, Bates DM, Rosa GJM, et al. An R package for fitting generalized linear mixed models in animal breeding. J Animal Sci. 2010;88(2):497–504.
Poland JA, Brown PJ, Sorrells ME, Jannink JL. Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS ONE. 2012;7(2):e32253.
Appels R, Eversole K, Stein N, et al. Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science. 2018;361:6403.
Endelman JB. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome. 2011;4(3):250–5.
R Core Team. A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2020.
Bradbury PJ, Zhang Z, Kroon DE, et al. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23(19):2633–5.
Wickham H. ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag; 2016.
Breseghello F, Sorrells ME. Association mapping of kernel size and milling quality in wheat (Triticum aestivum L.) cultivars. Genetics. 2006;172:1165–77.
Liu X, Huang M, Fan B, Buckler ES, Zhang Z. Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS Genet. 2016;12:e1005767.
Huang M, Liu X, Zhou Y, et al. BLINK: a package for the next level of genome-wide association studies with both individuals and markers in the millions. GigaSci. 2019;8(2):154.
Lipka AE, Tian F, Wang Q, et al. GAPIT: genome association and prediction integrated tool. Bioinformatics. 2012;28:2397–9.
Wen YJ, Zhang H, Ni YL, et al. Methodological implementation of mixed linear models in multi-locus genome-wide association studies. Brief Bioinform. 2018;19:700–12.
VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91(11):4414–23.
Turner S. qqman: QQ and Manhattan Plots for GWAS Data. R package version 0.1.4. 2017.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc: Series B (Methodological). 1995;57:289–300.
Sandhu KS, Aoun M, Morris CF, Carter AH. Genomic selection for end-use quality and processing traits in soft white winter wheat breeding program with machine and deep learning models. Biol. 2021;10:689.
Campbell KG, Finney PL, Bergman CJ, et al. Quantitative trait loci associated with milling and baking quality in a soft × hard wheat cross. Crop Sci. 2001;41(4):1275–85.
Nelson JC, Andreescu C, Breseghello F, et al. Quantitative trait locus analysis of wheat quality traits. Euphytica. 2006;149:145–59.
Boehm JD Jr, Ibba MI, Kiszonas AM, See DR, Skinner DZ, Morris CF. Identification of genotyping-by-sequencing sequence tags associated with milling performance and end-use quality traits in hard red spring wheat (Triticum aestivum L.). J Cereal Sci. 2017;77:73–83.
Boehm JD Jr, Ibba MI, Kiszonas AM, See DR, Skinner DZ, Morris CF. Genetic analysis of kernel texture (grain hardness) in a hard red spring wheat (Triticum aestivum L.) bi-parental population. J Cereal Sci. 2018;79:57–65.
Guttieri MJ, Souza EJ, Sneller C. Nonstarch polysaccharides in wheat flour wire-cut cookie making. J Agric Food Chem. 2008;56(22):10927–32.
Morris CF, Kiszonas AM, Beecher BS, Peden GL. Registration of six partial waxy near-isogenic hexaploid wheat genetic stock lines lacking one or two granule bound starch synthase I genes. J Plant Reg. 2020;14(2):217–20.
Nakamura T, Vrinten P, Saito M, Konda M. Rapid classification of partial waxy wheats using PCR-based markers. Genome. 2002;45:1150–6.
We would like to thank Stacey Sykes for assistance in the publication of this manuscript.
Financial support was provided by USDA ARS CRIS Project 2090-43440-008-0D.
Ethics approval and consent to participate
We comply with relevant institutional, national, and international guidelines and legislation for plant studies.
Consent for publication
Competing interests
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file 1: Supplementary Fig. S1.
Principal component (PC) analysis obtained from 40,518 SNPs in 672 soft white winter wheat genotypes. The first two PCs, PC1 and PC2, explain 5.3% and 4.0% of the variation, respectively. Supplementary Fig. S2. Scatter plot representing the genome-wide linkage disequilibrium (LD) decay. The LD estimate (r²) for pairs of SNPs was plotted against the corresponding physical positions in megabase pairs (Mb) based on Wheat Chinese Spring IWGSC RefSeq v1.0. The dashed red line represents the population-specific critical value of r² = 0.1. Supplementary Fig. S3. Scatter plot representing the chromosome-wise linkage disequilibrium (LD) decay. The LD estimate (r²) for pairs of SNPs was plotted against the corresponding physical positions in megabase pairs (Mb) based on Wheat Chinese Spring IWGSC RefSeq v1.0. The dashed red line represents the LD population threshold of 0.1. Supplementary Fig. S4. Quantile-quantile plots of the expected -log10(P) versus the observed -log10(P) for the association mapping model MLM for the 14 end-use quality traits. Supplementary Fig. S5. Quantile-quantile plots of the expected -log10(P) versus the observed -log10(P) for the association mapping model FarmCPU for the 14 end-use quality traits. Supplementary Fig. S6. Quantile-quantile plots of the expected -log10(P) versus the observed -log10(P) for the association mapping model BLINK for the 14 end-use quality traits. Supplementary Fig. S7. Summary of genome-wide association studies for 14 end-use quality traits in 672 soft winter wheat genotypes based on the Fixed and random model Circulating Probability Unification (FarmCPU) model. The horizontal red line indicates the significance level at FDR ≤ 0.05.
Additional file 2: Table S1.
Summary of GWAS for 14 end-use quality traits in soft white winter wheat using the mixed linear model (MLM). Table S2. Summary of GWAS for 14 end-use quality traits in soft white winter wheat using the Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK) model. Table S3. Annotated genes within the genomic regions of large-effect markers identified in this study for end-use quality traits in soft white winter wheat. Table S4. Previously identified QTL within the genomic regions of marker-trait associations identified in this study.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Aoun, M., Carter, A.H., Morris, C.F. et al. Genetic architecture of end-use quality traits in soft white winter wheat. BMC Genomics 23, 440 (2022). https://doi.org/10.1186/s12864-022-08676-5
- Soft white winter wheat
- End-use quality
- Molecular markers
- Association mapping