- Research article
The carotenoid biosynthetic and catabolic genes in wheat and their association with yellow pigments
BMC Genomics volume 18, Article number: 122 (2017)
In plants carotenoids play an important role in the photosynthetic process and photo-oxidative protection, and are the substrate for the synthesis of abscisic acid and strigolactones. In addition to their protective role as antioxidants and precursors of vitamin A, in wheat carotenoids are important as they influence the colour (whiteness vs. yellowness) of the grain. Understanding the genetic basis of grain yellow pigments, and identifying associated markers provide the basis for improving wheat quality by molecular breeding.
Twenty-four candidate genes involved in the biosynthesis and catabolism of carotenoid compounds have been identified in wheat by comparative genomics. Single nucleotide polymorphisms (SNPs) found in the coding sequences of 19 candidate genes allowed their chromosomal location and accurate map position on two reference consensus maps to be determined. The genome-wide association study based on genotyping a tetraploid wheat collection with 81,587 gene-associated SNPs validated quantitative trait loci (QTLs) previously detected in biparental populations and discovered new QTLs for grain colour-related traits. Ten carotenoid genes mapped in chromosome regions underlying pigment content QTLs indicating possible functional relationships between candidate genes and the trait.
The availability of linked, candidate gene-based markers can facilitate breeding wheat cultivars with desirable levels of carotenoids. Identifying QTLs linked to carotenoid pigmentation can contribute to understanding genes underlying carotenoid accumulation in the wheat kernels. Together these outputs can be combined to exploit the genetic variability of colour-related traits for the nutritional and commercial improvement of wheat products.
Carotenoids are organic pigments commonly present in plants, photosynthetic algae and some species of fungi and bacteria. They are normally associated with thylakoid membranes of chloroplasts and often provide the yellow, orange and red pigmentation to many flowers, fruits and roots. In plants, carotenoids play an important role in photosynthesis and photo-oxidative protection, and represent the substrate for the synthesis of apocarotenoid hormones, such as abscisic acid and strigolactones [3, 4]. Carotenoid actions and their relation to human health and disease have been widely reviewed. Carotenoids and some of their metabolites are suggested to play a protective role in a number of reactive oxygen species (ROS)-mediated conditions, such as cardiovascular diseases, several types of cancer, and neurological, photosensitive or eye-related disorders.
Carotenoids are typically divided into two classes: carotenes, which are tetraterpenoid hydrocarbons, and xanthophylls, which contain one or more oxygen groups. Carotenoid biosynthesis has been almost completely elucidated thanks to work in Arabidopsis thaliana, rice, maize and some ornamental plants [6, 7]. Briefly, the first stage of the biosynthetic process, mediated by phytoene synthase (PSY), involves the condensation of two molecules of geranylgeranyl diphosphate to produce phytoene, which normally does not accumulate in tissues (Fig. 1). In higher plants, phytoene undergoes a series of four desaturation reactions, mediated by phytoene desaturase (PDS), zeta-carotene isomerase (Z-ISO), zeta-carotene desaturase (ZDS) and carotenoid isomerase (CRTISO), that lead to the production of lycopene. Double lycopene cyclization can produce α-carotene (branch β-ε) or β-carotene (branch β-β). Subsequent modifications transform α-carotene to zeinoxanthin and lutein, and β-carotene to β-cryptoxanthin, zeaxanthin, antheraxanthin, violaxanthin and neoxanthin. The oxidative cleavage of violaxanthin and neoxanthin forms xanthoxin, which is converted to the phytohormone abscisic acid via ABA-aldehyde. Strigolactones derive from β-carotenoids via a pathway involving the carotenoid cleavage dioxygenases CCD7 and CCD8 and the cytochrome P450 CYP711A1.
Wheat is one of the most important crops worldwide and is the leading source of plant protein in human food, having a higher protein content than other major cereals, such as maize or rice. In addition to their protective role as antioxidants and as precursors of vitamin A, carotenoids are commercially important because they determine the degree of whiteness or yellowness of wheat end products. Consumers usually prefer white bread made from common wheat (Triticum aestivum L. subsp. aestivum), while yellow semolina and pasta made from durum wheat (Triticum turgidum L. subsp. durum) are preferred by the market. Flour and semolina colour is mainly the result of carotenoid accumulation in the grain, but the final colour of end products is also associated with losses during grain storage and with carotenoid oxidative degradation during processing by enzymes such as polyphenol oxidase, lipoxygenase and peroxidase [10, 11].
Flour and semolina colour in wheat is a quantitative trait controlled by several genes with additive effects and influenced by environmental factors. Mapping studies for yellow pigment content (YPC) and yellow index (YI) in several biparental populations have identified QTLs on all wheat chromosomes (reviewed in Additional file 1: Table S1). The major QTL on the long arm of chromosome 7A, accounting for up to 60% of the phenotypic variation, was detected in all studies and attributed to allelic variation of the phytoene synthase (Psy-A1) gene [13–15]. Although there is an increased understanding of the mechanisms regulating carotenoid content and composition, only some carotenoid biosynthetic genes have been identified and cloned in wheat, such as phytoene synthase (PSY) [13, 16, 17], lycopene ε-cyclase (LYCE) [18, 19], phytoene desaturase (PDS) and zeta-carotene desaturase (ZDS), carotenoid β-hydroxylase (BCH) and lycopene β-cyclase (LYCB).
As an alternative to classical linkage-based QTL mapping, the association mapping approach has received increased attention for detecting QTLs controlling complex traits. One of the potential disadvantages of genome-wide association studies (GWAS) is the appearance of spurious marker-trait associations (false-positive associations) resulting from population structure and multiple testing of thousands of markers [24, 25]. Association mapping can be simplified for some traits by the “candidate gene approach”, that is, testing SNPs within a candidate gene for a significant association with the trait.
The objectives of the current study were to: a) identify candidate carotenoid metabolic/catabolic genes in wheat by exploiting genomic resources and SNPs detected within the coding sequences of candidate genes; b) provide the precise map position of candidate genes on high-density SNP-based consensus maps; c) identify the genetic loci controlling yellow pigments by GWAS and candidate gene approaches using a tetraploid wheat collection coupled with the 90 K iSelect SNP genotyping array. The identification of genetic loci controlling yellow pigment accumulation/degradation will provide information on the genetic resources available to breeders to improve commercial and nutritional properties of wheat products, as well as the opportunity to develop functionally associated markers to be used in marker-assisted selection (MAS).
Identification of carotenoid biosynthetic and catabolic genes of wheat
The A. thaliana isoprenoid pathways and respective genes from AtIPD (http://www.atipd.ethz.ch/) were used to identify and download the Arabidopsis gene sequences from the TAIR database (http://arabidopsis.org/). In order to isolate the wheat carotenoid sequences, the 24 cDNAs corresponding to all identified genes from the A. thaliana database were used as queries to extract sequences of T. aestivum and of the monocots Brachypodium distachyon, O. sativa and Zea mays (Table 1). The in silico analysis highlighted a lack of uniformity in the acronyms and gene names/classifications used in the literature for different plant species (e.g. the carotenoid β-ring hydroxylases are named BCH in Arabidopsis, CRTR-B or HYD in maize, and BCH or HYD in rice, Brachypodium and wheat). For simplicity, we used the gene nomenclature of A. thaliana, whose isoprenoid genes have been well characterized and reported in public metabolic pathway databases.
The bootstrapped molecular phylogenetic tree (Fig. 2), based on 119 carotenoid cDNAs corresponding to orthologous sequences of the above-mentioned five plant species, showed clear clustering of the orthologs by gene family. Additionally, this analysis showed that these carotenoid genes are generally highly conserved between species, with the minimum sequence similarity being between Arabidopsis and Brachypodium for NXS (70%), and the maximum similarity observed between Brachypodium and rice for CYP97C1 (89%). Sequence similarity helped to assign putative function to the identified wheat EST sequences. Table 1 lists the GenBank entries of the carotenoid pathway genes of Arabidopsis, Brachypodium, rice, maize and wheat. The PSY gene family is tightly clustered based on the three paralogous genes, annotated as PSY1, PSY2 and PSY3, while in eudicots only the presence of PSY1 and PSY2 homologs has been reported [17, 27]. The BCH characterization present in the literature was confirmed by the phylogenetic tree: Ta_BCH1 clustered with Zm_BCH2, Os_BCH2 and Bd_BCH2, while the Ta_BCH2 gene grouped with Os_BCH1.
The in silico gene expression analysis, using data from the publicly available Wheat 61 k GeneChip, revealed variation in transcription patterns for these carotenoid genes in a wide range of tissues and developmental stages in wheat (Additional file 2: Figure S1). Exploiting the PLEXdb database, the expression data were investigated to predict the genes’ impact on the final carotenoid content. In general, all carotenoid genes were found to be expressed to some degree during all developmental stages, with minimum RMA-normalized expression levels of 3.53 and 4.51 for Z-ISO and CCD7, respectively, and a maximum level of 12.55 for ZDS. In particular, PSY1, PSY2, PDS, ZDS, LYCB, CYP97C1, CCD1, VDE, ZEP and NCED4 showed elevated expression levels (values higher than the mean + 2 SD) in seedling leaf (phase 6), while the LYCE, BCH1 and BCH2 genes exhibited high transcript levels in anthers before anthesis (phase 10). AAO3 showed higher levels of expression in reproductive tissues, including immature pistil before anthesis. ABA2 showed the highest expression during caryopsis-embryo-endosperm growth (phases 11 to 13). Low expression values (lower than the mean − 2 SD) were detected for LYCE in roots, CYP97C1 in anthers before anthesis, CCD8 in the 22 DAP endosperm stage and NXS in floral bracts before anthesis.
After the phylogenetic analysis, a BLASTn analysis (based on percentage identity) was performed between the 24 wheat carotenoid genes and the entire wheat SNP dataset, which provides a marker coverage of about 85% of the genome. A total of 75 SNP markers corresponding to 19 carotenoid gene sequences were identified, with several genes containing multiple SNPs (Table 2). No SNP markers were identified within the Z-ISO, CCD7, CCD8, CYP711A1 and NXS genes. Twenty-two and 32 SNP markers were located on the consensus durum and bread wheat maps, respectively. This enabled us to assign genes to chromosome groups: the CRTISO genes were mapped on chromosome group 1; BCH1 and VDE on homoeologous chromosome arms 2L; LYCE on group 3; PDS on group 4; PSY2, PSY3, CCD1 and ABA2 on group 5; LUT5 on group 6; PSY1 and AAO3 on chromosome arms 7L.
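The mean ± 2 SD rule used above to call a gene highly or lowly expressed in a given tissue can be illustrated with a minimal Python sketch. The function name `flag_tissues`, the use of the sample standard deviation, and the expression values are assumptions made for illustration; they are not taken from the study or from PLEXdb.

```python
from statistics import mean, stdev

def flag_tissues(expression, k=2.0):
    """Return (high, low) lists of tissue labels for a single gene.

    expression: dict mapping a tissue/stage label to a normalized value.
    A tissue is 'high' if its value exceeds mean + k*SD across tissues
    and 'low' if it falls below mean - k*SD.
    """
    values = list(expression.values())
    centre = mean(values)
    spread = stdev(values)  # sample SD; an assumption of this sketch
    high = [t for t, v in expression.items() if v > centre + k * spread]
    low = [t for t, v in expression.items() if v < centre - k * spread]
    return high, low

# Hypothetical values for one gene across 13 stages (not data from the study).
levels = [8.0, 8.2, 7.9, 8.1, 8.0, 12.0, 8.1, 7.8, 8.0, 8.2, 7.9, 8.1, 8.0]
example = {f"phase {i}": v for i, v in enumerate(levels, start=1)}
high, low = flag_tissues(example)
```

In this made-up profile only the sixth stage stands out, which mirrors the way a seedling-leaf peak would be flagged for a single gene.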
Phenotypic variation for yellow pigment content and yellow index
The tetraploid wheat collection, including 233 accessions of modern and old durum cultivars, durum landraces, domesticated and wild tetraploid wheat accessions, was evaluated for yellow index (YI) in six environments, and for yellow pigment content (YPC) in two environments. The analysis of variance showed highly significant differences among genotypes in each environment; environments, genotypes and environment x genotype interaction were significant in the combined analysis across environments (not shown). Mean, range, and broad-sense heritability estimates (h²B) for YPC and YI of the whole collection, and of the durum wheat sub-population in each trial, are reported in Table 3. A normal frequency distribution (Additional file 3: Figure S2) was observed for both traits. Mean values of YI of the whole collection varied from 12.8 (F09) to 14.6 (V10), while mean values of the durum sub-population ranged from 13.3 (F09) to 15.3 (V10). The phenotypic variation in the whole collection (from 9.1 to 17.8) and in the durum sub-population (11.6–17.8) suggested that alleles for low and high YI were present in the T. turgidum subset of the collection. YPC in the whole collection ranged between 3.2 and 11.7 μg/g at F08, and between 2.4 and 12.6 μg/g at V09, with average values of 6.3 and 5.8 μg/g, respectively. The durum sub-population showed higher mean values than the whole collection. This would indicate that in recent decades durum wheat breeders have paid special attention to selecting new cultivars with grain colour of higher commercial value.
Broad-sense heritability in the whole collection ranged from 0.89 to 0.94 for YI, and from 0.91 to 0.95 for YPC. The high heritability values and the correlation coefficients among environments for YI and YPC (Table 4 and Additional file 4: Table S2) indicated that both traits were stable, and that the phenotypic expression was mainly due to genotypic effects. A highly significant (P < 0.001) and positive correlation (r = 0.89) was observed between YPC and YI mean values across environments.
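To make the heritability figures concrete, the following is a minimal Python sketch of one common way to estimate broad-sense heritability on an entry-mean basis from a single randomized complete block trial. The estimator shown, h² = σ²G / (σ²G + σ²e / r), is a standard textbook choice and is an assumption here; the estimator actually used for Table 3 is not specified in this section. The function name and the toy data are hypothetical.

```python
def entry_mean_heritability(plots):
    """Broad-sense heritability from a randomized complete block trial.

    plots: dict mapping genotype name -> list of plot values, one per
    block, with blocks in the same order for every genotype.
    Returns var_G / (var_G + MS_error / r), where var_G is estimated as
    (MS_genotype - MS_error) / r from a two-way ANOVA without interaction.
    """
    genotypes = list(plots)
    g = len(genotypes)
    r = len(plots[genotypes[0]])
    all_values = [v for name in genotypes for v in plots[name]]
    grand = sum(all_values) / (g * r)
    geno_means = [sum(plots[name]) / r for name in genotypes]
    block_means = [sum(plots[name][j] for name in genotypes) / g
                   for j in range(r)]
    # Sums of squares for genotypes, blocks and residual error.
    ss_total = sum((v - grand) ** 2 for v in all_values)
    ss_geno = r * sum((m - grand) ** 2 for m in geno_means)
    ss_block = g * sum((m - grand) ** 2 for m in block_means)
    ss_error = ss_total - ss_geno - ss_block
    ms_geno = ss_geno / (g - 1)
    ms_error = ss_error / ((g - 1) * (r - 1))
    # Negative variance estimates are truncated at zero.
    var_g = max((ms_geno - ms_error) / r, 0.0)
    return var_g / (var_g + ms_error / r)

# Hypothetical yellow index readings for three genotypes in three blocks.
toy = {"A": [10, 11, 12], "B": [14, 13, 15], "C": [12, 12, 12]}
h2 = entry_mean_heritability(toy)
```

With these toy values the genotype mean square is 7 and the error mean square is 0.5, so the estimate equals 6.5 / 7, about 0.93, in the same range as the values reported above.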
Association of carotenoid genes to yellow pigments
Out of 24 carotenoid candidate genes, 17 showed no SNPs in the coding sequences, failed in the array analysis, or had an allele frequency lower than 0.10 (Table 2) in the wheat collection. These genes were therefore removed from the Marker Trait Association (MTA) analysis. Seven candidate genes (PSY1, PSY2, BCH1, CYP97A3, VDE, ABA2 and AAO3) had between 1 and 5 SNPs, and a linear regression analysis was carried out between each SNP and YPC and YI (Table 4). Except for BCH1 on 2BL, one or more SNPs of each candidate gene mapped onto one or both homeologous chromosomes were found to be significantly associated with YI, indicating their involvement in yellow pigment biosynthesis or catabolism. PSY1, BCH1, CYP97A3, VDE and ABA2 were also significantly associated with YPC. The phenotypic variation (R²) explained by each of these markers varied from 5.9 to 16.3% for YI and from 7.4 to 14.8% for YPC. The estimated allelic effects for each marker ranged from −1.34 to 1.79 units for YI, and from 1.25 to 1.97 μg/g for YPC.
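The single-marker regression described above can be sketched in a few lines of Python. This is a minimal illustration, assuming homozygous accessions scored 0 or 1 for the two alleles of a SNP; the function name and the toy data are hypothetical, and the significance test and software used in the study are not reproduced here.

```python
def single_marker_regression(genotypes, phenotypes):
    """Ordinary least-squares regression of a trait on one SNP.

    genotypes: list of 0/1 allele codes (assumes homozygous lines).
    phenotypes: list of trait values in the same accession order.
    Returns (effect, r2): the slope, i.e. the estimated allelic effect
    of allele 1 relative to allele 0, and the fraction of phenotypic
    variance explained by the marker.
    """
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(genotypes, phenotypes))
    effect = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)
    return effect, r2

# Hypothetical data: four accessions, two per allele.
effect, r2 = single_marker_regression([0, 0, 1, 1], [10.0, 12.0, 14.0, 16.0])
```

Here the allelic effect is 4 trait units and the marker explains 80% of the variance; a monomorphic marker would give a zero `sxx` and must be filtered out beforehand, which is one reason for the allele frequency threshold mentioned above.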
Detection of QTLs by GWAS
The wheat collection had been genotyped using the 90 K iSelect array. After excluding SNPs on the basis described in the methods, 13,639 SNPs in the whole collection and 9,863 SNPs in the durum sub-population were used for the association analysis. All of these SNPs have locations on the durum consensus map. MTAs were initially calculated by linear regression analysis (GLM) and by three more statistical models (GLM + PCs, MLM + K, MLM + K + PCs) that take into account the confounding effects of population structure and relative kinship to minimize the occurrence of false-positive associations. As expected, the number of significant MTAs with GLM and GLM + PCs was much higher than with MLM + K and MLM + K + PCs (Additional file 5: Table S3). The strong deviation of the observed -log10(P) values from the expected distribution (see Q-Q plots in Additional file 6: Figure S3) and the high number of significant MTAs clearly indicated the detection of numerous false positives by the GLM and GLM + PCs models. Observed P values were closer to the expected distribution when the K matrix alone, or the K matrix together with the PCs, was incorporated into an MLM, providing more confidence in the associations for YI and YPC detected using these models. The MLM + K and MLM + K + PCs models gave similar results; to minimize possible false positives we decided to focus on the results generated by the MLM + K + PCs model.
GWAS based on mean values of YI across environments detected nine significant QTLs in the whole collection, and five QTLs in the durum sub-population (Table 5). The QTLs identified in the analysis of the whole population were on chromosomes 4A, 4B (two), 5B, 7A (four) and 7B. The QTLs identified in the durum sub-population were on 4B (two) and 7A (three). Four QTLs (two on 4B and two on 7A) were identical in both analyses (the whole collection and the durum sub-population). Out of nine significant QTLs for YI across environments, the QTL on 7A at 102.3 cM fulfilled the more stringent FDR criteria. The phenotypic variation (R²) explained by each of these markers varied from 4.8 to 6.1% in the whole collection and from 10.1 to 18.4% in the durum sub-population. The estimated allelic effects for each marker ranged from −1.25 to 1.33 units.
GWAS based on mean values of YPC over two environments (Table 6) detected three significant QTLs on chromosomes 4B (one) and 7A (two) both in the whole collection and in the durum sub-population, and one additional QTL on 4B (position 43.9 cM) in the durum sub-population. The QTL on 7A associated with the SNP marker IWB49295, located in the Psy-A1 coding sequence, was consistent in both the whole collection and the durum sub-population. Out of four significant QTLs for YPC across environments in the durum sub-population, the QTL on 7A at 102.3 cM passed the FDR criteria. The phenotypic variation (R²) explained by each of these markers varied from 5.3 to 22.1%, while the allelic effects for YPC ranged from −1.90 to 1.79 μg/g.
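The Q-Q comparison used above to judge the models can be sketched as follows. This is a minimal Python illustration, assuming that under the null hypothesis the sorted P values follow uniform order statistics approximated by rank / (n + 1); the function name `qq_points` and the toy P values are hypothetical, and the plotting itself is left out.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs for a Q-Q plot.

    The i-th smallest of n P values is paired with the expected value
    i / (n + 1) of a uniform order statistic. Points lying well above
    the diagonal suggest inflated test statistics, for example from
    unaccounted population structure or kinship.
    """
    n = len(p_values)
    points = []
    for rank, p in enumerate(sorted(p_values), start=1):
        expected = rank / (n + 1)
        points.append((-math.log10(expected), -math.log10(p)))
    return points

# Hypothetical P values from three marker tests.
points = qq_points([0.5, 0.01, 0.9])
```

A model whose points track the diagonal except for a short tail at the upper end is usually preferred, which is the pattern the text reports for the MLM-based models.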
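The text refers to FDR criteria without naming the procedure in this section. As an illustration only, the sketch below implements the Benjamini-Hochberg step-up procedure, a widely used way to control the false discovery rate; whether the study used this exact procedure or this threshold is an assumption, and the function name and P values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Flag tests that pass the Benjamini-Hochberg FDR procedure.

    Sort the m P values, find the largest rank k such that
    p_(k) <= (k / m) * q, and declare the k smallest P values
    significant. Returns a list of booleans in the input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank  # keep the largest rank that passes
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant

# Hypothetical marker P values.
strict = benjamini_hochberg([0.001, 0.2, 0.03, 0.8])
step_up = benjamini_hochberg([0.02, 0.02, 0.03, 0.04])
```

In the first toy set only the smallest P value survives; in the second, all four are declared significant because the largest one passes its own threshold, which shows the step-up character of the procedure.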
To investigate the environmental variations on detection of significant QTLs by GWAS, the MTA analysis was carried out on the mean value over replicates for each of the six environments for YI and for each of the two environments for YPC (Tables 5 and 6). A high QTL-to-environment variation was observed for both traits as we identified 17 QTLs specific in single environments vs. common QTLs across environments. Considering the GWAS for YI in the whole collection, a minimum of 5 QTLs were detected at V12 and a maximum of 11 QTLs at V10. Eleven different QTLs were only identified in one environment, 7 in two environments, 4 in three environments, 1 in four environments and only 1 in five environments. Notably, no QTL was detected in all six environments. Genotype x environment (QTL x E) interaction was lower in the durum sub-population: 2 QTLs were detected in two environments, 3 in three environments, 1 in four environments and 1 in all six environments. The same trend was observed for YPC: 5 QTLs were identified in only one environment and 1 in both examined environments in the whole collection; out of 4 QTLs detected in the durum sub-population, 3 QTLs were consistent in one environment and 1 in both environments.
Identification and mapping of carotenoid genes in the wheat genome
The carotenoid biosynthetic pathway has been extensively studied in model plants and crop species because of the important roles of carotenoids in both development and photosynthesis, and their beneficial effects on human health. The wheat genome has still not been completely sequenced due to its huge size and complexity, and knowledge of the metabolic and catabolic pathways of carotenoid compounds remains incomplete.
Comparative genomic analysis across different taxa allowed the transfer of functional information from well-characterized model organisms, such as Arabidopsis, rice and Brachypodium, to a less-studied taxon like wheat. This has been beneficial for the BCH1, BCH2, CYP97C1, CCD1, CCD7 and NCED9 genes, many of which have been well characterized in rice, Brachypodium and Arabidopsis, but few of which have been studied in wheat. All the orthologues clustered by gene on the phylogenetic tree, sharing common conserved motifs in cDNA sequences. Unsurprisingly, the phylogenetic analysis revealed that the dicotyledonous PSY1 and PSY2 groups were more distantly related to those of the monocotyledonous groups, thus supporting the assumption that a single duplication event of the ancestor genes occurred before the divergence of the grass subfamilies [17, 27]. Differential duplication events took place in the BCH clade. A separation of the Arabidopsis BCH paralogs suggested the same time frame as the other genes for functional diversification, but an unexpected separation occurred prior to the main grass subfamily divergence for rice BCH1. Further studies on gene structure and intron-exon size would facilitate a better understanding of the BCH group. The in silico expression analysis of the carotenoid candidate genes included in the present study, in a wide range of tissues and developmental stages, showed that many of these genes had similar expression profiles. Additionally, we observed that some genes were virtually unexpressed (such as Z-ISO and CCD7) or highly expressed (such as ZDS) in all thirteen tissues/stages (Additional file 2: Figure S1). LYCE, BCH1, BCH2, CYP97A3 and ABA genes exhibited high expression levels in the anthers prior to anthesis and in kernel tissues, indicating their potential involvement in kernel carotenoid accumulation.
With the objectives of both characterizing the carotenoid genes and investigating their relationships with the amber colour of wheat grain and flour, we analyzed a tetraploid wheat collection with the recently developed genotyping array including 81,587 gene-associated SNPs. The BLASTn analysis of the entire SNP dataset against the carotenoid gene sequences allowed the identification of 1–7 SNPs in the coding sequences of 19 out of 24 examined carotenoid candidate genes (Table 1). In many cases, at least one SNP was identified for each of the three homeologous genes present in the wheat genomes (PSY1, PSY2, PDS, ZDS, LYCE, CYP97A3, CCD1, ABA2 and AAO3). The recent availability of the high-resolution consensus maps of durum and common wheat allowed us to determine the precise map position of most of the carotenoid genes (Table 1 and Fig. 3). The chromosomal location of 13 carotenoid genes determined by our strategy was consistent with results reported by Crawford and Francki (2013), who identified the chromosomal locations based on survey sequence from the International Wheat Genome Sequencing Consortium (http://www.wheatgenome.org/). Map positions of a few carotenoid genes, such as PSY1, PSY2 and LYCE, are reported in chromosome intervals as long as 5–20 cM in different SSR-based maps. The carotenoid genes are distributed on 14 of the 21 chromosomes of bread wheat, and the identification of functional markers and map positions can be particularly useful for breeders in MAS programs.
Association of carotenoid genes to yellow pigments
The allele frequency of SNP markers corresponding to carotenoid genes was found to be very variable in the examined wheat collection (Table 2). Several of these SNPs were either monomorphic or had a MAF < 10% and were therefore considered to be rare alleles. PSY1, PSY2, BCH1, CYP97A3, VDE, ABA2 and AAO3 were significantly associated with YPC and YI (Table 4), and this validated previous results obtained by using biparental mapping populations for PSY1 [15, 16], LYCE [19, 31] and AAO3. The association of the PSY2, BCH1, CYP97A3, VDE and ABA2 genes with YI and YPC is novel, and indicated that the SNP markers identified within the carotenoid gene sequences can represent a resource for developing genetic markers for use in marker-assisted breeding.
Ten carotenoid metabolic/catabolic genes were mapped in chromosome regions corresponding to QTLs detected in the current work and/or in previous QTL studies (see review in Additional file 1: Table S1 and Fig. 3), indicating possible relations between candidate genes and grain colour-related traits. Six genes (CRTISO, VDE, LYCE, PSY2, CYP97A3 and PSY1) are directly involved in the biosynthesis of carotenoid compounds. Interestingly, the catabolic genes NCED9, ABA2 and AAO3, involved in the carotenoid cleavage that processes violaxanthin and neoxanthin into abscisic acid, were located in chromosome regions influencing YPC [32–34]. These data are consistent with findings in other plant species such as Arabidopsis and maize [35, 36], demonstrating that carotenoid degradation is important in determining total carotenoid accumulation.
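The minor allele frequency (MAF) filter mentioned above can be illustrated with a short Python sketch. It assumes biallelic SNPs scored on homozygous accessions, with one allele call per accession and `None` for a failed call; the function names and the example calls are hypothetical.

```python
from collections import Counter

def minor_allele_frequency(calls):
    """MAF of a biallelic SNP from one allele call per accession.

    calls: list of allele labels (e.g. "A" or "B"); None marks a
    missing call and is ignored. Monomorphic or fully missing markers
    return 0.0.
    """
    observed = [c for c in calls if c is not None]
    counts = Counter(observed)
    if len(counts) < 2:
        return 0.0
    return min(counts.values()) / len(observed)

def keep_marker(calls, threshold=0.10):
    """Keep a marker only if its MAF is not lower than the threshold."""
    return minor_allele_frequency(calls) >= threshold

# Hypothetical calls for three markers.
common = ["A"] * 9 + ["B"]        # MAF 0.10, kept
rare = ["A"] * 19 + ["B"]         # MAF 0.05, dropped
monomorphic = ["A"] * 10          # MAF 0.0, dropped
```

Rare alleles are dropped because their effects are estimated from very few accessions, which makes single-marker tests unreliable.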
QTLs detected by GWAS and comparison with previous studies
In addition to the candidate gene approach, we conducted a GWAS using the GLM and MLM models, taking into account the confounding effects of population structure and relative kinship. Q-Q plots clearly indicated the MLM (K + PCs) as the most suitable model for the GWAS of YPC and YI, thus confirming other results of GWAS on quantitative traits carried out on crop plants. Several QTLs for YPC and YI, distributed on 12 of the 14 chromosomes of durum wheat, were detected (Tables 4 and 5 and Fig. 3). Four stable QTLs on 4B (two) and 7A (two) were associated with both YI and YPC, explaining the significant and positive correlation between the two colour-related traits found in the present and previous studies [38–40]. The higher number of QTLs for YI indicated that the yellow colour of wheat kernels is determined by different biochemical pathways, including carotenoid biosynthesis and pathways that interact in some way with carotenoid accumulation, such as those involving polyphenol oxidase (PPO), lipoxygenase (LPX) and other carotenoid oxidative enzymes [10, 11]. In addition, it is possible that the wider variability of the entire wheat collection is determined by more genes influencing colour-related traits, and that some yellow pigment genes have been fixed during breeding programs for grain colour improvement and were therefore not detected in the durum sub-population.
Several studies on QTL mapping of yellow pigments in wheat have been published during the past two decades. A detailed list of QTLs detected in 26 peer-reviewed papers is reported in Additional file 1: Table S1, and the majority of them are illustrated in Fig. 3. Except for chromosome 1D, QTLs for yellow pigments were detected on all wheat chromosomes. Results of QTL mapping studies indicated many differences in the number and map position of QTLs detected in the different experiments. This may be attributed to a high number of effective genes underlying QTLs coupled with: a) different contributions from parental genotypes of mapping populations; b) QTL x environment interactions; c) differences in the carotenoid extraction procedures and colour measurement, and therefore in the gene-to-trait associations revealed; d) marker density of linkage maps used in QTL analyses; e) differences in the statistical procedures used for QTL detection and in the threshold used for the statistical significance of MTAs.
While many of the QTLs for YI and YPC identified in the current study had been described previously (see Fig. 3 for a detailed comparison), 11 QTLs detected on 1AS, 2AL, 2BS, 3BL (two), 4BS, 5AS, 5BS (two) and 7AS (two) were new. Four of these QTLs were detected in more than one environment (Table 5 and Table 6), indicating that some wheat accessions of the examined collection possess new stable alleles potentially useful for improving colour and nutritional value of wheat grain. Additionally 16 QTLs detected in the present study (on chromosome arms 1BL, 2AL (two), 3BS, 4AL, 4BS, 5AL, 5BL, 6AL (two), 7AL (five), 7BL (two)) validated QTLs previously detected in different genetic backgrounds. Therefore these QTLs can be considered as stable and useful for MAS in breeding programs.
Genotype x environment interaction and QTL detection
To investigate whether the results of GWAS were affected by environmental fluctuations, we conducted replicated trials for YI and YPC in six and two environments, respectively. Comparing the GWAS results, large variations in the number and type of QTLs were observed for both traits in different environments, thus confirming the existence of genotype x environment interaction effects as indicated by the variance analysis. Stable associations for YI in at least three of six environments in the whole collection were detected for five QTLs corresponding to one genomic region on chromosome 5B and four regions on 7A. In many cases, the SNP-trait associations were environment-specific, as 11 QTLs were consistent only in one environment and 7 in two environments. The same trend was observed for YPC evaluated in two environments. Despite the high heritability values (from 0.89 to 0.94 for YI and from 0.91 to 0.95 for YPC) in open field trials, the complexity of the genetic basis of the studied traits tends to confound the interpretation of GWAS results. These findings are consistent with results obtained by association mapping and QTL linkage analysis experiments on complex traits with far lower heritability, such as yield and yield components [41, 42]. The present study suggests that QTL analysis for agronomically important “true” quantitative traits should always be conducted in multiple environments with different soil and climatic conditions. Finally, evaluating and taking into account the G x E interaction is important in breeding programs to identify genotypes adapted to a wide range of environments.
Comparison between simple regression and MLM analysis for QTL detection
The SNPs located in the gene sequences PSY1, PSY2, BCH1, CYP97A3, VDE and ABA2 were significantly associated with YI and YPC by regression analysis but not by GWAS analysis. Only the SNP marker IWB59875, located in the coding sequence of the abscisic aldehyde oxidase (AAO3) on chromosome arm 7AL, was significant in both MTA analyses. The PSY1, PSY2, CYP97A3 and VDE genes were mapped on chromosome regions corresponding to QTLs for YI and YPC detected in the current study by GWAS or by previous studies using biparental mapping populations (see Fig. 3). NCED, CRTISO and LYCE, which were excluded from the regression analysis as they had allele frequencies lower than 0.05, were also mapped in chromosome regions corresponding to QTLs for YI and YPC. Similar results were obtained by Zhao, who detected several SNPs near height-controlling genes that were significant only with the naïve approach, and suggested that mapping populations derived from crosses between genetically distant parents could be needed to complement GWAS to reduce the rate of both false positives and false negatives. It is well known that GWAS carried out by the GLM model generally gives a high number of false positives, and that it is necessary to take into account the confounding effect of population structure and relatedness among individuals to control the overall probability of type I error. However, reducing the number of false positives may increase the number of false negatives, and in some situations may lead to ignoring most of the important findings on the genetics and physiology of the traits of interest. The combination of population genetic models and molecular biological knowledge into new QTL detection methods has recently been proposed to increase the statistical power of GWAS in human and agricultural research, so as to reduce the overall probability of type II error (false-negative associations) and to incorporate biological context in GWAS results.
GWAS analysis in wheat collections can help validate QTLs previously detected in biparental populations and unravel new QTLs for colour-related traits. The MLM models can reduce the number of false positives, while the candidate gene approach can help reduce the number of false negatives. However, GWAS analysis should be carried out on phenotypic data measured in several environments in order to detect stable QTLs and to determine the genotype x environment interactions that tend to confound the interpretation of MTAs and the genetic dissection even of quantitative traits with high heritability values. The availability of markers within the coding sequences of candidate genes can help elucidate the mechanism of carotenoid accumulation in wheat kernels and exploit the genetic variability of colour-related traits for the nutritional and commercial improvement of wheat end products.
Plant materials and phenotypic evaluation
A collection of 233 accessions of tetraploid wheat (Triticum turgidum L., 2n = 4× = 28; AABB genome) was grown at Valenzano (Bari, southern Italy, 41°02′46″N, 16°53′09″E, altitude 118 m a.s.l., annual average rainfall 586 mm, average temperature 15.7 °C) for five years (2009, 2010, 2012, 2013 and 2014, hereafter reported as V09, V10, V12, V13 and V14) and at Foggia (southern Italy, 41°32′11″N, 15°43′01″E, altitude 60 m a.s.l., annual average rainfall 469 mm, average temperature 15.4 °C) for three years (2008, 2009 and 2012, hereafter reported as F08, F09 and F12). The panel included accessions of seven T. turgidum subspecies: durum (124 accessions), durum var. ethiopicum (10), turanicum (20), polonicum (19), turgidum (16), carthlicum (14), dicoccum (18) and dicoccoides (12). The wheat collection has been extensively characterized in terms of genetic diversity and population structure, and has been used for the association mapping of loci controlling resistance to stem rust and β-glucan content. A detailed list of genotypes (number/name, year of release, country, pedigree) is provided by Laidò et al. A randomized complete block design with three replications was used, with plots consisting of 1-m rows, 30 cm apart, and 50 germinating seeds per plot. During the growing season, standard cultivation practices were used. Grain samples were ground in a laboratory mill with a 1 mm sieve and the resulting whole flour was stored at −4 °C for a maximum of 24 h before analysis. YPC was determined according to AACC Approved Method 14–50 with slight modifications as described by Fares et al. YI was determined using the reflectance colorimeter Chroma Meter CR-300 (Minolta), and the “b*” value indicating the yellow intensity was used in subsequent analysis.
DNA extraction and SNP genotyping
Genomic DNA was isolated from freeze-dried leaf tissue following the protocol of Dellaporta et al. Genomic DNA of each accession, at a concentration of 50 ng/μL, was analyzed with the wheat 90 K iSelect array. Genotyping was performed at TraitGenetics GmbH (http://www.traitgenetics.de) following the manufacturer’s recommendations as described in Akhunov et al. The genotyping assays were run on the Illumina iScan reader and analyzed using Genome Studio software version 2011.1.
Identification of putative carotenoid biosynthetic and catabolic gene sequences
The Arabidopsis thaliana isoprenoid pathways and respective genes from AtIPD (http://www.atipd.ethz.ch/) were used to identify the cDNA sequences involved in the carotenoid biosynthetic and catabolic pathways and to download them from the TAIR database (http://arabidopsis.org/). Orthologous genes for Brachypodium distachyon, Oryza sativa, Zea mays and Triticum aestivum were retrieved from the UniGene Cluster database at NCBI (https://www.ncbi.nlm.nih.gov/) by carotenoid keyword searching. Phylogenetic analysis was carried out using the Neighbor-Joining method and a bootstrap test with 1000 replications for significance. To denote the plant species, a two-letter prefix was placed before each gene symbol: At for A. thaliana, Bd for B. distachyon, Os for O. sativa, Zm for Z. mays and Ta for T. aestivum. The alignment of each cDNA was performed with Mega4 software. The tree was generated with ClustalW2 (http://www.ebi.ac.uk/Tools/phylogeny/) and depicted with the program FigTree (http://tree.bio.ed.ac.uk/software/figtree/).
Wheat carotenoid gene sequences were blasted against the available dataset of SNP marker sequences reported by Wang et al., and markers aligned with 80% (IUM) identity were considered as markers within the coding sequences of the carotenoid genes. The BLASTn analysis was extended to contigs assembled in the chromosome survey-sequencing project (http://wheat-urgi.versailles.inra.fr/Seq-Repository) to identify additional SNPs flanking the carotenoid genes. All the retrieved wheat carotenoid cDNA sequences were blasted against the Wheat 61 k GeneChip in the PLEXdb database (http://www.plantgdb.org) to obtain information on variation in carotenoid gene expression across developmental phases.
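The identity filter applied to the BLASTn hits can be illustrated with a short sketch. It assumes hits exported in the standard 12-column tabular BLAST format (-outfmt 6), in which the third column is the percent identity, and it reads the 80% cut-off as an inclusive lower bound. The function name and the gene and marker identifiers are hypothetical and are not taken from the study.

```python
def filter_hits_by_identity(tabular_lines, min_identity=80.0):
    """Keep (query, subject, identity) for hits at or above min_identity.

    Assumes BLAST tabular output (-outfmt 6): tab-separated columns with
    query id, subject id and percent identity in the first three fields.
    """
    kept = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, identity = fields[0], fields[1], float(fields[2])
        if identity >= min_identity:
            kept.append((query, subject, identity))
    return kept


# Hypothetical hits: gene sequences as queries, SNP marker sequences as subjects.
example_hits = [
    "gene_X\tmarker_A\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-95\t350",
    "gene_X\tmarker_B\t76.2\t101\t24\t0\t1\t101\t1\t101\t1e-10\t60",
    "gene_Y\tmarker_C\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-20\t95",
]

for hit in filter_hits_by_identity(example_hits):
    print(hit)
```

In this toy input, marker_B falls below the cut-off and is discarded, while the other two markers would be treated as lying within the matching gene sequence.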
Statistical analysis and QTL detection
Each year-location combination was considered as an environment, and analysis of variance was carried out using the standard procedure with the software MSTAT-C. Genetic variance ($\sigma^2_G$), environmental variance ($\sigma^2_E$) and broad-sense heritability, $h^2_B = \sigma^2_G / (\sigma^2_G + \sigma^2_E + \sigma^2_{G \times E})$, where $\sigma^2_{G \times E}$ is the genotype × environment interaction variance, were obtained using the variance component estimates.
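As a worked illustration of the heritability ratio defined above, the following sketch computes broad-sense heritability from three variance components. The function name and the numerical values are hypothetical and chosen only to make the arithmetic easy to follow; they are not estimates from this study.

```python
def broad_sense_heritability(var_g, var_e, var_gxe):
    """Genetic variance divided by the sum of genetic, environmental
    and genotype x environment interaction variances."""
    total = var_g + var_e + var_gxe
    if total <= 0:
        raise ValueError("the sum of variance components must be positive")
    return var_g / total


# Hypothetical variance components (arbitrary units).
h2 = broad_sense_heritability(var_g=6.0, var_e=1.0, var_gxe=1.0)
print(h2)  # 6.0 / 8.0 = 0.75
```

A value close to 1 indicates that most of the phenotypic variation in the panel is attributable to genotype rather than to environment or to genotype × environment interaction.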
Pearson correlation coefficients were calculated between YPC and YI. Details about the genetic diversity and population structure of the tetraploid wheat collection as investigated with SSR and DArT markers are provided by Laidò et al., and with SNP markers by Marcotuli et al. Using Bayesian clustering (K = 2), both sets of molecular markers distinguished the durum cultivars from the accessions of the other tetraploid subspecies; accordingly, GWAS was conducted on the whole collection and on the 124 durum varieties (hereafter referred to as the durum sub-population). For YI and YPC, mean values across replicates were used in the GWAS for each environment, and mean values across replicates and years were used in the GWAS over environments. Prior to GWAS, markers with >10% missing data points and markers with a minor allele frequency (MAF) of less than 10% were removed from the data matrix. Markers not mapped on the consensus durum wheat map were not used for association analysis. GWAS was carried out using TASSEL v.5 (http://www.maizegenetics.net) with and without correction for population structure. Associations between SNP markers and YPC and YI were calculated using the following models: a) simple regression analysis (general linear model, GLM); b) GLM including population structure as a covariate by using the Q-matrix derived from the principal component analysis (PCA) as implemented in TASSEL (GLM + PCs); c) mixed linear model (MLM) based on the kinship matrix (MLM + K); d) mixed linear model based on both the Q-matrix and the K-matrix (MLM + K + PCs). The statistical models used in the present GWAS were extensively reviewed by Astle and Balding, who considered the most widely used statistical approaches for controlling the confounding effects of population structure. The most appropriate GWAS method was chosen by inspecting Q-Q plots and Manhattan plots for evidence of P value inflation.
A marker-trait association was considered significant when one or more markers were associated with YPC or YI at the threshold –log10(P) ≥ 3.0, determined by the modified Bonferroni correction as implemented in Genstat (GenStat, 2003). A false discovery rate (FDR) threshold of 0.05 was calculated with the qvalue package in R. For the associations between carotenoid candidate genes and YPC and YI, the conservative Bonferroni correction for multiple testing was calculated by dividing the significance level (P < 0.01) by the number of markers used in the analysis. Chromosome localization and map positions of SNP markers were derived from the high-density linkage maps described by Maccaferri et al. for durum wheat and by Wang et al. for common wheat, which were used as reference maps.
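The marker filtering step described above can be illustrated as follows. This is a minimal sketch and not the TASSEL workflow used in the study: it assumes biallelic calls coded as "A" or "B" for inbred accessions, with None marking a missing call. The function name passes_qc and the marker names are hypothetical.

```python
from collections import Counter


def passes_qc(calls, max_missing=0.10, min_maf=0.10):
    """Return True if a marker is kept: at most max_missing missing calls
    and a minor allele frequency of at least min_maf."""
    n_total = len(calls)
    if n_total == 0:
        return False
    observed = [c for c in calls if c is not None]
    missing_rate = (n_total - len(observed)) / n_total
    if missing_rate > max_missing:
        return False  # too many missing data points
    counts = Counter(observed).most_common()
    if len(counts) < 2:
        return False  # monomorphic marker, no minor allele
    minor_count = counts[1][1]  # count of the second most frequent allele
    maf = minor_count / len(observed)
    return maf >= min_maf


# Hypothetical genotype calls for four markers.
markers = {
    "snp_1": ["A"] * 6 + ["B"] * 4,               # MAF 0.40, no missing data
    "snp_2": ["A"] * 19 + ["B"] * 1,              # MAF 0.05, removed
    "snp_3": ["A"] * 5 + ["B"] * 3 + [None] * 2,  # 20% missing, removed
    "snp_4": ["A"] * 9 + ["B"] * 1,               # MAF 0.10, kept
}

kept = [name for name, calls in markers.items() if passes_qc(calls)]
print(kept)
```

Computing the allele frequency on observed calls only, rather than on all accessions, avoids penalizing a marker twice for the same missing data.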
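The significance thresholds described above can be sketched in a few lines. The Bonferroni function follows the division of the significance level by the number of markers. The FDR function implements the Benjamini-Hochberg step-up procedure, which is a simpler stand-in and not the q-value estimation performed by the R package cited in the text. All function names, the number of markers and the P values are hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value, min_neg_log10=3.0):
    """Apply a -log10(P) >= min_neg_log10 threshold to one P value."""
    return -math.log10(p_value) >= min_neg_log10


def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per P value: True if declared significant by the
    Benjamini-Hochberg procedure at the given FDR level."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank  # largest rank meeting the step-up criterion
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Hypothetical example: 100 candidate-gene markers tested at alpha = 0.01.
threshold = bonferroni_threshold(alpha=0.01, n_tests=100)
print(threshold, -math.log10(threshold))

# Hypothetical P values for five marker-trait associations.
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
print(benjamini_hochberg(p_values))
```

With these illustrative P values, only the two smallest pass the Benjamini-Hochberg criterion, because the third smallest (0.039) exceeds its rank-based cut-off of 0.03.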
QTL: Quantitative trait loci
SNP: Single nucleotide polymorphisms
YPC: Yellow pigment content
Hirschberg J. Carotenoid biosynthesis in flowering plants. Curr Opin Plant Biol. 2001;4(3):210–8.
Cazzonelli CI, Pogson BJ. Source to sink: regulation of carotenoid biosynthesis in plants. Trends Plant Sci. 2010;15(5):266–74.
Seo M, Koshiba T. Complex regulation of ABA biosynthesis in plants. Trends Plant Sci. 2002;7(1):41–8.
Xie X, Yoneyama K, Yoneyama K. The strigolactone story. Annu Rev Phytopathol. 2010;48:93–117.
Fiedor J, Burda K. Potential role of carotenoids as antioxidants in human health and disease. Nutrients. 2014;6(2):466–88.
Moise AR, Al-Babili S, Wurtzel ET. Mechanistic aspects of carotenoid biosynthesis. Chem Rev. 2014;114(1):164–93.
Ruiz-Sola MA, Rodriguez-Concepcion M. Carotenoid biosynthesis in Arabidopsis: a colorful pathway. The Arabidopsis Book. 2012;10:e0158. doi:10.1199/tab.0158.
Shewry PR, Halford NG. Cereal seed storage proteins: structures, properties and role in grain utilization. J Exp Bot. 2002;53(370):947–58.
Mares DJ, Campbell AW. Mapping components of flour and noodle colour in Australian wheat. Aust J Agric Res. 2001;52:1297–309.
Feillet P, Autran J-C, Icard-Vernière C. Pasta brownness: an assessment. J Cereal Sci. 2000;32(3):215–33.
Ficco DBM, Mastrangelo AM, Trono D, Borrelli GM, De Vita P, Fares C, et al. The colours of durum wheat: a review. Crop Pasture Sci. 2014;65:1–15.
Clarke B, Liang R, Morell M, Bird A, Jenkins C, Li Z. Gene expression in a starch synthase IIa mutant of barley: changes in the level of gene transcription and grain composition. Funct Integr Genomics. 2008;8(3):211–21.
He XY, Zhang YL, He ZH, Wu YP, Xiao YG, Ma CX, et al. Characterization of phytoene synthase 1 gene (Psy1) located on common wheat chromosome 7A and development of a functional marker. Theor Appl Genet. 2008;116(2):213–21.
He XY, He ZH, Ma W, Appels R, Xia XC. Allelic variants of phytoene synthase 1 (Psy1) genes in Chinese and CIMMYT wheat cultivars and development of functional markers for flour colour. Mol Breeding. 2009;23(4):553–63.
Zhang W, Dubcovsky J. Association between allelic variation at the Phytoene synthase 1 gene and yellow pigment content in the wheat grain. Theor Appl Genet. 2008;116(5):635–45.
Pozniak CJ, Knox RE, Clarke FR, Clarke JM. Identification of QTL and association of a phytoene synthase gene with endosperm colour in durum wheat. Theor Appl Genet. 2007;114(3):525–37.
Dibari B, Murat F, Chosson A, Gautier V, Poncet C, Lecomte P, et al. Deciphering the genomic structure, function and evolution of carotenogenesis related phytoene synthases in grasses. BMC Genomics. 2012;13(1):1–14.
Howitt CA, Cavanagh CR, Bowerman AF, Cazzonelli C, Rampling L, Mimica JL, et al. Alternative splicing, activation of cryptic exons and amino acid substitutions in carotenoid biosynthetic genes are associated with lutein accumulation in wheat endosperm. Funct Integr Genomics. 2009;9(3):363–76.
Crawford AC, Francki MG. Lycopene-ε-cyclase (e-LCY3A) is functionally associated with quantitative trait loci for flour b* colour on chromosome 3A in wheat (Triticum aestivum L.). Mol Breeding. 2013;31(3):737–41.
Cong L, Wang C, Li Z, Chen L, Yang G, Wang Y, et al. cDNA cloning and expression analysis of wheat (Triticum aestivum L.) phytoene and zeta-carotene desaturase genes. Mol Biol Rep. 2010;37(7):3351–61.
Qin X, Zhang W, Dubcovsky J, Tian L. Cloning and comparative analysis of carotenoid beta-hydroxylase genes provides new insights into carotenoid metabolism in tetraploid (Triticum turgidum ssp. durum) and hexaploid (Triticum aestivum) wheat grains. Plant Mol Biol. 2012;80(6):631–46.
Zeng J, Wang X, Miao Y, Wang C, Zang M, Chen X, et al. Metabolic engineering of wheat provitamin a by simultaneously overexpressing CrtB and silencing carotenoid hydroxylase (TaHYD). J Agric Food Chem. 2015;63(41):9083–92.
Rafalski JA. Association genetics in crop improvement. Curr Opin Plant Biol. 2010;13(2):174–80.
Korte A, Farlow A. The advantages and limitations of trait analysis with GWAS: a review. Plant Methods. 2013;9:29.
Astle W, Balding DJ. Population structure and cryptic relatedness in genetic association studies. Stat Sci. 2009;24:451–71.
Thornsberry JM, Goodman MM, Doebley J, Kresovich S, Nielsen D, Buckler ES. Dwarf8 polymorphisms associate with variation in flowering time. Nat Genet. 2001;28(3):286–9.
Gallagher CE, Matthews PD, Li F, Wurtzel ET. Gene duplication in the carotenoid biosynthetic pathway preceded evolution of the Grasses. Plant Physiol. 2004;135:1776–83.
Wang S, Wong D, Forrest K, Allen A, Chao S, Huang BE, et al. Characterization of polyploid wheat genomic diversity using a high-density 90,000 single nucleotide polymorphism array. Plant Biotechnol J. 2014;12(6):787–96.
Maccaferri M, Ricci A, Salvi S, Milner SG, Noli E, Martelli PL, et al. A high-density, SNP-based consensus map of tetraploid wheat as a bridge to integrate durum and bread wheat genomics and breeding. Plant Biotechnol J. 2014;13:648–63.
Blanco A, Colasuonno P, Gadaleta A, Mangini G, Schiavulli A, Simeone R, et al. Quantitative trait loci for yellow pigment concentration and individual carotenoid compounds in durum wheat. J Cereal Sci. 2011;54(2):255–64.
Qin X, Fischer K, Yu S, Dubcovsky J, Tian L. Distinct expression and function of carotenoid metabolic genes and homoeologs in developing wheat grains. BMC Plant Biol. 2016;16(1):155.
Colasuonno P, Gadaleta A, Giancaspro A, Nigro D, Giove S, Incerti O, et al. Development of a high-density SNP-based linkage map and detection of yellow pigment content QTLs in durum wheat. Mol Breeding. 2014;34:1563–78. doi:10.1007/s11032-014-0183-3.
Tsilo TJ, Hareland GA, Chao S, Anderson JA. Genetic mapping and QTL analysis of flour colour and milling yield related traits using recombinant inbred lines in hard red spring wheat. Crop Sci. 2011;51(1):237.
Roncallo PF, Cervigni GL, Jensen C, Miranda R, Carrera AD, Helguera M, et al. QTL analysis of main and epistatic effects for flour colour traits in durum wheat. Euphytica. 2012;185(1):77–92.
Wurtzel ET, Cuttriss A, Vallabhaneni R. Maize provitamin a carotenoids, current resources, and future metabolic engineering challenges. Front Plant Sci. 2012;3:29.
Gonzalez-Jorge S, Ha SH, Magallanes-Lundback M, Gilliland LU, Zhou A, Lipka AE, et al. Carotenoid cleavage dioxygenase4 is a negative regulator of beta-carotene content in Arabidopsis seeds. Plant Cell. 2013;25(12):4812–26.
Gupta PK, Kulwal PL, Jaiswal V. Association mapping in crop plants: opportunities and challenges. Adv Genet. 2014;85:109–47.
Fratianni A, Irano M, Panfili G, Acquistucci R. Estimation of colour of durum wheat. Comparison of WSB, HPLC, and reflectance colorimeter measurements. J Agric Food Chem. 2005;53(7):2373–8.
Digesù AM, Platani C, Cattivelli L, Mangini G, Blanco A. Genetic variability in yellow pigment components in cultivated and wild tetraploid wheats. J Cereal Sci. 2009;50(2):210–8.
Zhang KP, Chen GF, Zhao L, Liu B, Xu X-B, Tian JC. Molecular genetic analysis of flour colour using a doubled haploid population in bread wheat (Triticum aestivum L.). Euphytica. 2009;165(3):471–84.
Edae EA, Byrne PF, Haley SD, Lopes MS, Reynolds MP. Genome-wide association mapping of yield and yield components of spring wheat under contrasting moisture regimes. Theor Appl Genet. 2014;127(4):791–807.
Mora F, Castillo D, Lado B, Matus I, Poland J, Belzile F, et al. Genome-wide association mapping of agronomic traits and carbon isotope discrimination in a worldwide germplasm collection of spring wheat using SNP markers. Mol Breeding. 2015;35(2):1–12.
Zhao K, Tung CW, Eizenga GC, Wright MH, Ali ML, Price AH, et al. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat Commun. 2011;2:467.
Breseghello F, Sorrells ME. Association analysis as a strategy for improvement of quantitative traits in plants. Crop Sci. 2006;46(3):1323.
Donnelly P. Progress and challenges in genome-wide association studies in humans. Nature. 2008;456(7223):728–31.
Marjoram P, Zubair A, Nuzhdin SV. Post-GWAS: where next? More samples, more SNPs or more biology? Heredity. 2014;112(1):79–88.
Laidò G, Mangini G, Taranto F, Gadaleta A, Blanco A, Cattivelli L, et al. Genetic diversity and population structure of tetraploid wheats (Triticum turgidum L.) estimated by SSR, DArT and pedigree data. PLoS One. 2013;8(6):e67280.
Laidò G, Marone D, Russo MA, Colecchia SA, Mastrangelo AM, De Vita P, et al. Linkage disequilibrium and genome-wide association mapping in tetraploid wheat Triticum turgidum L. PLoS One. 2014;9(4):e95211.
Marcotuli I, Houston K, Schwerdt JG, Waugh R, Fincher GB, Burton RA, et al. Genetic diversity and genome wide association study of β-glucan content in tetraploid wheat grains. PLoS One. 2016;11(4):e0152590.
AACC International. Approved Methods of Analysis, Method 14–50.01. Determination of Pigments. AACC International, 11th Ed., St. Paul, MN, U.S.A. 1961. http://dx.doi.org/10.1094/AACCIntMethod-14-50.01.
Fares C, Platani C, Tamma G, Leccese F. Microtest per la valutazione del colore in genotipi di frumento duro. Molini d’Italia, Anno XLII. 1991;12:19–21.
Dellaporta SL, Wood J, Hicks JB. A plant DNA minipreparation: Version II. Plant Mol Biol Rep. 1983;1(4):19–21.
Akhunov E, Nicolet C, Dvorak J. Single nucleotide polymorphism genotyping in polyploid wheat with the Illumina GoldenGate assay. Theor Appl Genet. 2009;119(3):507–17. doi:10.1007/s00122-009-1059-5.
Felsenstein J. Confidence limits on phylogenies with a molecular clock. Syst Zool. 1985;34(2):152–61.
Tamura K, Dudley J, Nei M, Kumar S. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 2007;24(8):1596–9.
Dabney A, Storey JD, Warnes GR. qvalue: Q-value estimation for false discovery rate control. R package version 1.38.0. 2010.
Vranová E, Hirsch-Hoffmann M, Gruissem W. AtIPD: A curated database of Arabidopsis isoprenoid pathway models and genes for isoprenoid network analysis. Plant Physiol. 2011;156(4):1655–60.
This research was supported by grants from MIUR, Italy, project ‘PON-01_01145 – ISCOCEM’. KH would like to acknowledge funding from the Rural & Environment Science & Analytical Services Division of the Scottish Government.
Availability of data and materials
Six additional files were uploaded to the BMC Genomics website to support the results and findings of this study.
PC, IM, A. Gadaleta and AB designed the research; MLL, DN, A. Giancaspro, GM, PDV, AMM, NP, KH and RS performed the research and analyzed the data. PC and AB wrote the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
The manuscript does not report the use of any animal or human data or tissue.
No data related to nucleic acid sequences or protein sequences need to be deposited, since they were already available in the NCBI database.
List of detected QTLs for yellow index and/or yellow pigment content in wheat. (DOCX 18 kb)
Expression analysis from PLEXdb database of all key genes in carotenoid biosynthesis. (DOCX 83 kb)
Frequency distributions of yellow index and yellow pigment content. (DOCX 28 kb)
Correlation coefficients of yellow pigment content and of yellow index. (DOCX 12 kb)
Number of significant marker-trait associations detected by four GWAS models. (DOCX 13 kb)
Genome-wide association analysis for yellow index and yellow pigment content: Q-Q plots. (DOCX 1809 kb)