- Research article
- Open Access
Genome-wide transcriptome study in wheat identified candidate genes related to processing quality, majority of them showing interaction (quality x development) and having temporal and spatial distributions
BMC Genomics volume 15, Article number: 29 (2014)
The cultivated bread wheat (Triticum aestivum L.) possesses unique flour quality, which can be processed into many end-use food products such as bread, pasta, chapatti (unleavened flat bread), biscuit, etc. The present wheat varieties require improvement in processing quality to meet the increasing demand of better quality food products. However, processing quality is very complex and controlled by many genes, which have not been completely explored. To identify the candidate genes whose expressions changed due to variation in processing quality and interaction (quality x development), genome-wide transcriptome studies were performed in two sets of diverse Indian wheat varieties differing for chapatti quality. It is also important to understand the temporal and spatial distributions of their expressions for designing tissue and growth specific functional genomics experiments.
Gene-specific two-way ANOVA analysis of expression of about 55 K transcripts in two diverse sets of Indian wheat varieties for chapatti quality at three seed developmental stages identified 236 differentially expressed probe sets (10-fold). Out of 236, 110 probe sets were identified for chapatti quality. Many processing quality related key genes such as glutenin and gliadins, puroindolines, grain softness protein, alpha and beta amylases, proteases, were identified, and many other candidate genes related to cellular and molecular functions were also identified. The ANOVA analysis revealed that the expression of 56 of 110 probe sets was involved in interaction (quality x development). Majority of the probe sets showed differential expression at early stage of seed development i.e. temporal expression. Meta-analysis revealed that the majority of the genes expressed in one or a few growth stages indicating spatial distribution of their expressions. The differential expressions of a few candidate genes such as pre-alpha/beta-gliadin and gamma gliadin were validated by RT-PCR. Therefore, this study identified several quality related key genes including many other genes, their interactions (quality x development) and temporal and spatial distributions.
The candidate genes identified for processing quality and information on temporal and spatial distributions of their expressions would be useful for designing wheat improvement programs for processing quality either by changing their expression or development of single nucleotide polymorphisms (SNPs) markers.
Bread wheat (Triticum aestivum L.) is the most common cultivated wheat in the world. Its flour can be processed into a wide range of food products such as bread, pasta, biscuits, unleavened flat bread (chapatti), etc. The end use quality of products is mainly dependent on processing quality, which is largely determined by balance composition of biochemical molecules in seed such as seed storage proteins , starch , phenolic compounds , etc. There is a continuous increasing demand for good quality products both by consumers and baking industries. For improvement of processing quality it is important to understand the genome-wide expression of genes and their temporal and spatial distributions. Of several approaches for genome-wide study, microarrays comprising a large amount of probe sets of transcripts can be useful for the identification of differentially expressed genes in diverse or contrasting set of genotypes for trait of interest. In this study, two traditionally known good and two poor chapatti making Indian wheat varieties were used to identify the candidate genes whose expression changed due to chapatti quality using wheat microarrays.
Transcriptome analysis has been used to improve genome-wide understanding of molecular mechanism of gene expression. In wheat, it has been done using either the 8K wheat microarray chips [4, 5] or 61K wheat microarray chips [5, 6]. Recently, the next generation sequencing has become an important technological platform for investigating genome-wide transcriptional regulation of metabolic pathways . However, in a polyploid crop such as bread wheat, sequence assembly and annotation are very challenging due to occurrence of multiple copies of gene sequences (homoeologous or paralogous genes). A reference bread wheat genome sequence, and cost-effective and faster high throughput computation system is required for making advances in wheat genomics. Despite several complications associated with microarray experiments , it still provides a faster and cost-effective method for genome-wide transcriptome study than the next generation sequencing approach.
Wheat microarrays have been successfully used for the identification of potential candidate genes under a wide range of biotic and abiotic stresses. For example, it has been used for studying the change in transcriptome patterns under biotic stresses such as Fusarium head blight [9–11], leaf rust , and insect attack . Similarly, it has also been used for abiotic stress conditions such as cold , drought tolerance , high nitrogen level , etc. Change in transcriptional profiling during seed development was studied using wheat microarrays . It has been also used to study metabolic biosynthesis pathway such as starch  and reactive oxygen species (ROS) . Wheat microarrays have been used to screen and identify seed-specific genes using digital differential display (DDD) tools . The translocation breakpoints were mapped using wheat microarrays . The application of microarrays for gene expression analysis has been discussed in detail elsewhere .
The gene expression analysis through seed developmental stages is important for understanding temporal distribution of gene expression, as the seed matures. Three stages of major transitions in gene expression have been reported [4, 6]. The first major transition is within 10 days after anthesis (DAA) where extensive cell division, expansion and differentiation occur to make milky endosperm [6, 21]. The second major transition occurs at 14 DAA where starch and seed storage proteins accumulate within cells to make semi-solid endosperm . The third major transition starts at 28 DAA where deposition of storage reserve decreases  followed by physiological maturity to turn caryopsis brown. However, the duration of the above stages may change depending upon environmental conditions during seed development. For example, Laudencia-Chingcuanco et al. observed three stages of gene expression patterns during seed development i.e. 3 to 7 DAA (multi-cellular tissue forming structure), 7 to 14 DAA (grain filling stage), and 21 to 28 DAA (maturing and desiccation stage). Similarly, hierarchical clustering of transcriptome data from 6 to 42 DAA revealed three major groups of expressed genes i.e. 6 to 10 DAA, 12 to 21 DAA, and 28 to 42 DAA . Therefore, genome-wide transcriptome study in at least three seed developmental stages is important for the identification of candidate genes whose expression changed during seed development for the improvement of processing quality.
Several statistical approaches are useful for the analysis of gene expression data from microarrays . Among them, two-way ANOVA (analysis of variance) is the most common analysis method where two categorical explanatory variables of an experiment can be simultaneously analysed by any combination of one level of one variable and one level of the other variable. In this analysis, variation in the expression of a gene which is affected by the level of other gene (s) (i.e. interaction) can be determined . In this study gene-specific two-way ANOVA was implemented to identify candidate genes whose expression changed due to processing quality (chapatti in this study), seed development, and interaction (quality x seed development).
Two traditionally known good chapatti making varieties, C306  and Lok1 were used to identify differentially expressed genes in comparison to two varieties, Sonalika  and WH291, poor chapatti making varieties using wheat microarrays. The genes involved in seed development and interaction (quality x development) were also studied. Since the four varieties might differ for other traits, the involvement of the candidate genes with other traits cannot be rule out in this study. Several processing quality related key genes such as seed storage protein genes (gliadins and glutenins), which were known to effect processing quality of several end-use products such as bread, biscuit, noodles, etc., shown very high difference in their expression between good and poor chapatti making varieties at early stage of seed development (7 DAA). In addition, many novel candidate genes were identified between the two groups. It is evident that majority of the reported genes involved in processing quality were also involved in interaction of quality and seed development. This information would be helpful to design experiments for gene regulation at specific development stage and for the identification of variations within genes such as single nucleotide polymorphisms (SNPs) useful for the improvement of processing quality in wheat through molecular breeding.
Results and discussion
Genome-wide transcriptome analysis was done between good and poor quality varieties at three seed developmental stages (7, 14, and 28 DAA) to identify the candidate genes whose expressions changed due to quality, development, and interaction (quality x development). A total of 60,130 probe sets (out of 61,290) qualified gene expression quality criterion where at least 100% of samples in any one out of 6 conditions (discussed later in Methods) had values between 20.0 and 100.0th percentile in the normalized data. Gene specific two-way ANOVA analysis with multiple test corrections (Benjamini-Hochberg’s FDR) of the 60,130 probe sets identified 35,472 probe sets (about 59%), which satisfied the corrected p value of 0.05 (see Additional file 1). A similar number of the expressed genes (31,768) were identified by Laudencia-Chingcuanco et al. among wheat genotypes during cold acclimation and vernalization using gene-specific ANOVA. The analysis partitioned the variation in expression of the probe sets into quality (3,126 probe sets), development (34,604 probe sets), interaction (quality x development) (1,732 probe sets), and expected by chance i.e. errors (156 probe sets) (see Additional file 2, Table 1). The number of probe sets passed the test in this study is higher than the number reported by Wan et al. who had identified 14,550 probe sets showing significant differential expression through development in a single wheat variety. In this study, a large number of genes were involved in quality and seed development referring to the involvement of many pathways and cellular processes.
Fold change analysis of the expression data of the 35,472 probe sets identified 236 probe sets, whose expressions were at least 10-fold between good and poor quality varieties in at least one of the three sources of variations [quality, development, and interaction (quality x development)] (Table 2). Among them, 110, 219, and 85 probe sets were involved in quality, development, and interaction (quality x development), respectively (Table 2, see Additional file 3).
Identification of candidate genes whose expression changed due to quality
Total of 110 probe sets were identified for quality which showed at least ten-fold differential expression between good and poor quality varieties in at least one of the three condition pairs (‘Good-7 DAA’ vs ‘Poor-7 DAA’, ‘Good-14 DAA’ vs ‘Poor-14 DAA’, and ‘Good-28 DAA’ vs ‘Poor-28 DAA’) (Tables 2 and 3, see Additional file 3, Figures 1 and 2A). Using blastx similarity search of the sequence of the 110 probe sets at NCBI, the putative gene function was assigned to 67 (about 61% = 67/110), hypothetical proteins to 37 (33.6% = 37/110) probe sets, and the function was not assigned to the remaining 6 probe sets. Out of 110, 67, 1, and 28 probe sets were differentially expressed at 7, 14, and 28 DAA, respectively and the remaining 14 probe sets were differentially expressed in either two or three seed developmental stages (Figure 2A).
Out of the 67, 29 probe sets (43% = 29/67), were mainly related to seed storage protein genes (Table 3). These genes were gliadin (alpha-/beta-, gamma, and omega gliadins), glutenin (low molecular weight and high molecular weight glutenin subunits), and avenin. Avenin is oat counterpart to gliadin. Glutenin and gliadin proteins make gluten matrix in dough which determines visco-elastic properties of wheat dough during processing. They mainly determine the processing quality of wheat end-use products [1, 24]. High molecular weight glutenin subunit (HMW-GS) contributes mainly to gluten elasticity property and gliadins to its viscosity property . HMW-GS was strongly correlated with bread making quality of wheat . Gliadin contributes to gluten functional property [27–31]. Xu et al. suggested that gliadin plays an important role in adjusting and controlling gluten’s viscoelastic properties rather than only just a diluent of gluten’s functional properties. Interestingly, at later stages of seed development (i.e. 14 and 28 DAA), these probe sets did not show much difference in expression between good and poor quality varieties.
The differential expression of the 29 probe sets of seed storage protein genes, key genes for processing quality, were analysed in pairwise among the four varieties to study variation in expression level among them (Table 4, see Additional file 4). It indicates that WH291 is more diverse than Sonalika with that of C306 and Lok1 at expression level of seed storage protein genes. Therefore, among them, the poor chapatti making quality, WH291 can be used with either of two good chapatti making varieties, C306 or Lok1 for molecular breeding.
The remaining probe sets showed similarity to transcription factors (GATA, IIA, heat stress TF A9), peroxidase, trypsin inhibitor, proteinase, amylases (alpha- and beta-amylases), kinases and phosphatase (serine/threonine protein kinase and phosphatase), etc. High differential expression of peroxidase, proteinase, and amylases in good chapatti varieties may have effect on processing quality. These enzymes are being extensively used in baking and food industries for making good quality end-use products.
Cluster analysis of the 110 probe sets into 5 groups identified the co-expressed genes having similar expression patterns in good and poor varieties in three seed developmental stages (see Additional file 5, Figure 3). In cluster I, the probe sets showed down-regulation in good quality varieties at early stage of seed development (i.e. 7 DAA) and did not show any change at later stage of seed development. In cluster II, the probe sets showed up-regulation in good quality varieties at early stage of seed development (i.e. 7 DAA) and did not show any change at later stage of seed development. In cluster III, five probe sets showed down-regulation in good quality varieties in all three seed developmental stages. In cluster IV, the probe sets showed up-regulation in good quality varieties at later stage of seed development (28 DAA). In cluster V, the majority of the probe sets showed up-regulation in good quality varieties at early stage of seed development. Out of 40, 28 probe sets in cluster V were related to seed storage protein genes and others to beta-amylase and hypothetical proteins. One probe set for seed storage protein gene (Ta.2415.2.S1_a_at for gliadin/avenin seed protein) was clustered separately in cluster II. Cluster analysis distinctly grouped key genes for processing quality in cluster V indicating their similar expression patterns in good quality varieties i.e. up-regulated at early stage of seed development (Figure 3). The role of the remaining four clusters cannot be validated to processing quality in the present study, but they represent potential candidate genes for marker development through association studies.
Identification of candidate genes whose expression changed due to seed development
Total 219 probe sets were identified for seed development whose expression changed at least ten-fold between good and poor quality varieties in at least one of the three condition pairs (‘Good-7 DAA’ vs ‘Poor-7 DAA’, ‘Good-14 DAA’ vs ‘Poor-14 DAA’, and ‘Good-28 DAA’ vs ‘Poor-28 DAA’) (Table 2, see Additional file 3, Figures 1 and 2B). Using blastx similarity search of the sequence of the 219 probe sets at NCBI, the putative gene function was assigned to 153 probe sets (69.9% = 153/219), hypothetical proteins to 57 (26.0% = 57/219) probe sets, and similarity of the remaining 9 probe sets was not found in the database. Out of 219, 171, 2, and 35 probe sets showed differential expression in 7, 14, and 28 DAA, respectively and the remaining 11 probe sets were differentially expressed in at least two seed development stages (Figure 2B). Further analysis revealed that out of 219, the expression of 37 probe sets was also changed due to quality (Figure 1).
Out of 153, the majority of probe sets (35.9% = 55/153) were dominated by seed storage protein genes such as gliadin (alpha-/beta-, gamma, and omega gliadins), glutenin (LMW glutenin subunit) and HMW glutenin subunit, and avenin-like. Other genes were related to grain hardness such as puroindolines and grain softness proteins, starch biosynthesis metabolism such as granule bound starch synthase-I (GBSS-I), alpha- and beta-amylase inhibitors, peroxidase, trypsin inhibitor, etc. (see Additional file 3).
Four probe sets related to genes for grain hardness and softness showed differential expression from about ten-fold (Ta.14614.1.S1_at) to about 43-fold (Ta.69.2.S1_x_at) in early stage of seed development i.e. 7 DAA and one probe set (Ta.23141.1.S1_at) showed about 22-fold differential expression at later stage of seed development i.e. 28 DAA between good and poor varieties (see Additional file 3). Grain hardness is considered as a single important trait that determines the quality of end-use products such as bread and cookies or biscuit . Puroindoline genes coding puroindoline a and b are major contributor of grain hardness. If puroindoline a and b are functional, the grain is soft or both or either one of them are mutated or absent, the grain is very hard . Out of five, two probe sets (Ta.69.2.S1_x_at and Ta.840.1.S1_at) represent grain softness proteins. Jolly et al. reported the linkage of the genes of grain softness proteins to the grain hardness locus (Ha). However, no direct or indirect relationship of this protein to grain texture has been established .
The wheat microarray contains probe sets of more than 28 genes of starch metabolism. In this study, only GBSS (granule bound starch synthase) and beta-amylase showed at least 10-fold differential expression between the contrasting varieties. Beta amylase (Ta.27780.1.S1_x_at) expression was involved in quality and seed development as well as in their interactions (Additional file 3: Table S3). Its differential expression (about 60-fold) was only present in early stage of seed development (7 DAA). Beta-amylase is responsible for starch hydrolysis at the time of seed germination by releasing maltose (two glucose units) of the second α-1,4 glycosidic bonds from the non-reducing end. It accumulates during grain maturation, but remains inactive till germination.
GBSS is believed to be responsible for synthesis of amylose in seed (for review, see . The content of amylose affects processing quality. GBSS mutant analysis found amylose free seed indicating that this enzyme is responsible for amylose synthesis . This enzyme is found in two forms- GBSSI in endosperm and GBSSII in non-endospermic tissues . In this study a high increase in differential expression of GBSS from early (about 7-fold at 7 DAA) to late maturity (about 93 fold at 28 DAA) was observed between good and poor quality varieties. The expression of the GBSS gene was involved in both qualities as well as in seed development, but not in interaction of quality and development (see Additional file 3).
Cluster analysis of the 219 probe sets showing at least ten-fold differential expression into 5 clusters identified the co-expressed genes having similar expression patterns in different seed development stages in good and poor varieties (see Additional file 5, Figure 3). Cluster analysis of the 219 probe sets grouped the probe sets in the same cluster having similar expression patterns (see Additional file 5, Figure 3). In cluster I, the probe sets showed up-regulation in good quality varieties in early stage of seed development (7 DAA) and did not show any much change at later stage of seed development. In cluster II, the majority of the probe sets showed up-regulation in good quality varieties at later stages of seed development (14 and 28 DAA). In cluster III, the probe sets showed down-regulation in good quality varieties at early stage i.e. 7 DAA (Figure 3). In clusters IV and V, the probe sets showed up-regulation in good varieties at early stage seed development i.e. 7 DAA. The key genes for processing quality, mainly seed storage protein genes were clustered in both cluster IV and V.
Identification of candidate genes involved in interaction (quality x development)
Total 85 probe sets was identified for interaction (quality x development) showing at least ten-fold expression between good and poor varieties in at least one of the three condition pairs (‘Good-7 DAA’ vs ‘Poor-7 DAA’, ‘Good-14 DAA’ vs ‘Poor-14 DAA’, and ‘Good-28 DAA’ vs ‘Poor-28 DAA’) (Table 2, see Additional file 3, Figure 1). These probe sets were involved in development. Of them, 56 probe sets also involved in quality (Figure 1, see Additional file 3). Further analysis revealed that 68, 2, and 13 probe sets (out of the 85 probe sets) were differentially expressed in 7, 14, and 28 DAA, respectively (Figure 2C).
Cluster analysis of the 85 probe sets grouped the probe sets in the same cluster having similar expression patterns (see Additional file 5, Figure 3). In cluster I, the probe sets showed up-regulation in poor varieties at early stage of seed development (i.e. 7 DAA) and did not show any change at later stage of seed development. In cluster II, the probe sets showed down-regulation in poor varieties at early stage of seed development (i.e. 7 DAA) and did not show any change at later stage of seed development. The most of probe sets for seed storage protein genes were grouped in cluster II. In cluster III, there was a single probe set (Ta.2025.1.S1_at, oxi-reductase, a Fe (II) oxygenase family protein) showing up-regulation in poor varieties at 14 and 28 DAA (Figure 3). In cluster IV, the probe sets showed up-regulation in good varieties at later stage seed development (28 DAA). In cluster V, the majority of the probe sets showed up-regulation in good varieties at early stage of seed development.
Spatial distribution of the expression of 110 probe sets related to processing quality across 10 development stages
Using gene-specific two-way ANOVA analysis, 110 probe sets were identified for processing quality showing at least 10-fold differential expression between good and poor quality varieties. The expression level of these probe sets were analysed across 1,328 samples which were available online and representing ten developmental stages including seedling, tillering, and flower development stages through meta-analysis using Genevestigator software. The heatmap produced after the analysis revealed expression potential of the individual probe sets in 10 development stages (Figure 4). Cluster analysis of the expression potential of the probe sets distinctly grouped the probe sets into three major clusters i.e. I, II and III of 43, 25, and 42 probe sets, respectively (Figure 4).
In cluster I the expression level of the 43 probe sets was global as they were present in the majority of the ten developmental stages (Figure 4). About 42% (18/43) of the probe sets was annotated with blastx and showing homology to genes related to cellular and molecular mechanisms such as granule-bound starch synthase I (GBSSI), Bowman-Birk type trypsin inhibitor, aspartic proteinase nepenthesin-1, cell division protease, peroxidase, sugar transporter, transcription factors (GATA and heat stress TF A-9), 14 KDa proline-rich protein, low temperature specific wheat gene pTACR7, peroxisomal acyl-coenzyme A oxidase 1, etc. About of 51% (22/43) of the probe sets were hypothetical proteins to Aegilops tauschii and Triticum urartu and the significant similarity was not found for the other three probe sets.
Except eight probe sets, the expression level of other 17 probe sets in cluster II was low i.e. white or grey color in the heatmap (Figure 4). Majority of the probe sets (52% = 13/25) was annotated with blastx and showed homology to kinases (LRR receptor-like serine/threonine protein kinase and wall-associated receptor kinase 5), DNA-directed RNA polymerase, serine/threonine protein phosphatase, beta-fructofuranosidase, alpha-amylase/subtilisin inhibitor, transcription factor IIF, etc. Forty percentage (10/25) of the 25 probe set was hypothetical proteins and the significant similarity was not found for the other two probe sets.
In cluster III the expression level of the probe sets were strong in ripening and the majority of them had also expression in germination stage (Figure 4). Majority of the probe sets (about 69% = 29/42) were related to seed storage protein genes such as gliadins (alpha gliadin, alpha/beta gliadin, gamma gliadin), glutenin (low molecular weight and high molecular weight glutenin subunits), and avenin like proteins. The seven probe sets were related to beta-amylase, trypsin inhibitor, serine/threonine protein kinase and deoxymugineic acid dioxygenase. Five probe sets were hypothetical proteins to Aegilops tauschii and Triticum urartu. The significant similarity was not found for the remaining one probe set.
Spatial distribution of the expression of 110 probe sets related to processing quality across 22 tissues/organs
The expression level of the 110 probe sets related to processing quality were analysed across 1,405 samples which were available online and representing 22 tissues including root, shoot, flag leaf, endosperm, embryo, coleoptile, and mesocotyl through meta-analysis using Genevestigator software. The heatmap produced after the analysis revealed expression potential of the individual probe sets in different tissues (Figure 5). Cluster analysis of the expression potential of the probe sets distinctly grouped the probe sets into three major clusters i.e. I, II, and III of 44, 23, and 43 probe sets, respectively (Figure 5), which was very similar to the clusters produced for the development stages (described in the previous section). The percentage of probe sets grouped in both clusters (Figures 4 and 5) was very similar i.e. 93.2% (cluster I), 95.7% (cluster II), and 90.7% (cluster III). Briefly, the expression level of the probe sets in cluster I was very strong (red color in the heatmap in Figure 5) and global as they were present in the most of the 22 tissues. Except a few probe sets, the expression level of the majority of the probe sets in cluster II and III were specific to one to two tissues (Figure 5). Majority of the probe sets in cluster III were related to seed storage protein genes such as gliadins (alpha gliadin, alpha/beta gliadin, gamma gliadin), glutenin (low molecular weight and high molecular weight glutenin subunits), and avenin like proteins.
The heatmaps providing information on spatial distribution of the genes would assist in designing functional genomics experiments specific to a particular growth stage or tissue. The correct interpretation of results is very important for formulating new hypothesis and models, and designing proper experiments to test them .
Pathway analysis of the 35,472 probe sets
The 35,472 probe sets (see Additional file 1) which passed the gene-specific two-way ANOVA (corrected p value of 0.05) were mapped to metabolism pathways by using MapMan software on the Triticum aestivum mapping data, “Taes_Affy_0709”. The genes of several metabolic pathways such as starch (AGPase, starch synthase, starch branching enzymes, starch debranching enzymes, and starch transporter), myo-inositol, sugar and sugar-alcohol, and cell wall were up-regulated at early stage of seed development (7 DAA) in good quality varieties (Figure A in Additional file 6). The genes of photo systems (light reactions) and Calvin cycle were down-regulated in early stage of seed development (7 DAA) in good quality varieties (Figure A in Additional file 6). The activities of other metabolic pathway genes were more differentiated at 7 DAA between good and poor varieties (Figure A in Additional file 6), but less at later stages (14 and 28 DAA) of seed development (Figures B and C in Additional file 6). This indicates that genes for seed development and processing quality are regulated at early stage of seed development.
Validation of differential expression by quantitative RT-PCR
The expression profiles of a few randomly chosen genes were validated by quantitative RT-PCR. A set of reference genes has been recently proposed for common wheat such as gene for cell division control protein, AAA-superfamily of ATPases (CDC, Ta.54227), ADP-ribosylation factor, ARF (ADPRF, Ta.2291), and RNase L inhibitor-like protein (RLI, Ta.2776) . For this study one house-keeping gene such as ARF was used as a reference gene for quantitative validation of the expression data. The two differentially expressed genes, pre-α/β-gliadin and γ-gliadin, were randomly chosen for validation of the microarray data (Tables 5 and 6, Figure 6). The individual CT values of the house-keeping gene at the three seed development stages, 7, 14, and 28 DAA, in the two diverse chapatti making varieties, C306 and Sonalika were used for normalization. The two target genes showed similar difference in their expressions through seed developmental stages between the varieties, which are largely in agreement with the microarray data (Table 6). For example, the expressions of pre-α/β-gliadin and γ-gliadin genes were very high in the variety C306 than the variety Sonalika at early stage seed development, 7 DAA (Figure 6, Table 6). Hence, the microarray data were validated by qualitative PCR.
Genome-wide transcriptome analysis with help of two-way ANOVA in developing caryopsis identified a substantial number of differentially (at least 10-fold) expressed genes in two diverse sets of Indian wheat varieties for chapatti quality. Many key processing quality related genes such as different subunits of glutenin and gliadins, puroindolines, grain softness protein, alpha and beta amylases, proteases, peroxidase, GBSS, etc. were identified. In addition, many other candidate genes related to cellular and molecular functions were also identified. The ANOVA analysis revealed that the expression of majority of the candidate genes for good chapatti was involved in interaction of quality and development. Most of these probe sets showed differential expression at early stage of seed development i.e. temporal expression. Meta-analysis revealed that the majority of the genes expressed in one or a few growth stages indicating spatial distribution of their expressions. The differential expression of the two candidate genes, pre-α/β-gliadin and γ-gliadin between good and poor chapatti quality varieties was validated by quantitative real time PCR. Therefore, this study able to identify several known processing quality related genes and many additional candidate genes for good chapatti and their interactions with development and temporal and spatial distributions. Gene identified for processing quality and information on temporal and spatial distributions of their expressions would be useful for designing programs for improvement of processing quality either by changing expression (over expression or down regulation) of genes or development of single nucleotide polymorphisms (SNPs) markers using bi-parental mapping populations for molecular breeding.
Two traditionally known good chapatti making varieties, C306 and Lok1 and two poor chapatti making varieties, Sonalika and WH291 were used for transcriptome studies. The detailed information on the varieties described in Additional file 7. These varieties were grown in three replicates at the Research Farm of National Agri-Food Biotechnology Institute (NABI) Mohali, India during 2010-11. The main individual spikes of the three biological replications of each of the four varieties were tagged at first day of anthesis. The tagged spikes were harvested at three main seed developmental stages i.e. 7, 14, and 28 DAA, immediately frozen in liquid nitrogen, and stored at -80°C for RNA extraction.
Total RNA was extracted from about 10 caryopsis of central parts of the harvested spikes using the TRIzol reagent (Life Technologies, NY, USA) and purified onto RNeasy Mini columns (Qiagen, Hilden, Germany). The extracted RNA was quantified on a NanoQuant, Infinite M200 PRO (Tecan, Mannedorf, Switzerland) and the quality was checked onto 1.5% agarose gels. RNA integrity number (RIN) of the extracted RNA was estimated on Agilent’s 2100 Bioanalyzer using the Agilent RNA 6000 Nano Kit (Agilent Technologies, Santa Clara, USA). The images of RIN values of the extracted RNAs were present in Additional file 8. The good quality RNA samples (RIN value >7.0) were stored at -80°C for microarray experiments.
The GeneChip® Wheat Genome Array (Affymetrix, Santa Clara, USA), a 3’ in vitro transcription (3′ IVT) expression array containing 61,290 probe sets representing about 25 K unigenes was used for gene expression analysis. The sequence and annotation information of the probe sets on the array are available at http://www.affymetrix.com.
Microarray hybridization and scanning
RNA labelling and microarray hybridization on wheat genome arrays were performed according to its manual. The arrays were processed on an Affymetrix GeneChip® Fluidics Station 450 by running fluidic script EukGE-WS2v5-450. The hybridized arrays were scanned on a Affymetrix GeneChip® Scanner 3000. Hybridization quality of the microarrays was verified by scaling factor, overall hybridization rate, and signal strength of several bacterial spike controls.
Microarray data analysis
The expression console of Affymetrix’s GeneChip Command Console (AGCC) software was used for preliminary data analysis of the scanned arrays. The image files were created as *.dat file. The software computed cell intensity data of probe sets and their positional values (*.cel file) from the image file (*.dat). The normalization of the signal intensities of the probe sets (taking two types of files, *.cel and *.ARR) was carried out using the Robust Multi-array normalization algorithm (RMA) implemented in GeneSpring’s Multi-omic analysis (version 11.5.1, build-138755) software (Agilent Technologies, Santa Clara, USA). The RMA normalized values of the probe sets were used for statistical analysis.
Two-way ANOVA analysis
Two levels of categories (quality and developmental stage) were made for two-way ANOVA analysis of the gene expression data. For quality category, the four samples were divided into two groups- good varieties (i.e. good chapatti) and poor varieties (poor chapatti). For seed developmental stage category, each variety was divided into three groups - 7, 14, and 28 DAA. Therefore, six pairwise conditions (‘Good-7 DAA’, ‘Poor-7 DAA’, ‘Good-14 DAA’, ‘Poor-14 DAA’, ‘Good-28 DAA’, and ‘Poor-28 DAA’) were made for analysis. Variation in the expression data was partitioned into three groups, namely quality (genotype), developmental stage, and interaction of quality and developmental stage. ‘Asymptotic’ p value computation and the Benjamini-Hochberg false discovery rate (FDR) for multiple test correction (corrected p < 0.05) were used for the analysis. The two-ANOVA analysis was conducted in GeneSpring software.
Fold change analysis
The probe sets satisfying the corrected p value in two-way ANOVA analysis were used for fold change analysis in GeneSpring software. The probe sets showing at least two-fold and ten-fold differential expressions between good and poor groups were used for further analysis.
Cluster analysis of gene expression data
The probe sets in each of the three groups (quality, developmental stage, and interaction) as mentioned above were grouped by assuming five clusters by estimating pairwise Euclidean distance between genes, 50 iterations, and K-means algorithm implemented in GeneSpring software. The probe sets were clustered for identification of the probe sets with similar expression profiles.
Meta-analysis of the differentially expressed genes
Affymetrix® wheat microarray expression data conducted in a wide variety of growth stages and tissues were available online. These data were good resources for meta-analysis to study spatial distribution of the gene expression data in different growth stages and tissues/organs. This information would assist in designing functional genomics experiments specific to a particular growth stage or tissue. The spatial distribution of the differentially expressed genes (at least 10-fold between good and poor varieties) was studied in ten growth stages, namely, germination, seedling growth, tillering, stem elongation, booting, inflorescence emergence, anthesis, milk development, dough development, and ripening. Their spatial distributions were also studied in 22 tissues/organs; namely, shoot apex, crown, sheath, caryopsis, seedling, spike, spikelet, flag leaf, shoot, shoot-leaf, seedling-leaf, crown, roots, seedling-root, glume, coleoptile, anther, mesocotyl, embryo, endosperm, inflorescence, and pistil. Their spatial distributions were represented in form of heatmaps. Meta-analysis was conducted using Genevestigator software which retrieved the expression data from the web [41, 42]. Further, the expression data was clustered through hierarchical clustering on the estimated Euclidean distance matrix using optimal leaf-ordering implemented in the software.
Functional analyses of the differentially expressed data
The differential expression data of the 35,472 probe sets satisfying the corrected p value of 0.05 in two-way ANOVA (See Additional file 1) was assigned to metabolic pathways by MapMan software (version 3.5.1.R2) using the Triticum aestivum mapping data “Taes_Affy_0709” [43–45].
Validation of differential expression by quantitative RT-PCR
Quantitative real time PCR (qRT-PCR) was performed for validation of a few differentially expressed genes. Five microgram of the extracted RNA was treated with TURBO DNase™ (Ambion, Life Technologies, NY, USA) before cDNA synthesis according to the manufacturer manual. The SuperScript® III First-Strand Synthesis kit (Invitrogen, Life Technologies, NY, USA) was used for cDNA synthesis using oligodT (18-mer). The reaction mixture was incubated at 50°C for 50 min, then at 85°C for 5 min, and chilled immediately on ice. The mixture was diluted with nuclease free water to the final concentration of 20 ng/μl cDNAs. The primers used for PCR were designed to the same region harbouring the microarray probes using Primer Express software (Applied Biosystems, Forster City, CA, USA) (Tables 7 and 5). qRT-PCR was done in triplicates using SYBR Green fluorescence dye in a qRT-PCR machine (7500 Fast Real-Time PCR System, Applied Biosystems). Each reaction was prepared using 5 μl of 2X QuantiTect SYBR Green (Qiagen, USA), 1 μl of 20 ng/μl cDNA, and 1 pmol/μl each of forward and reverse primers, in a total volume of 10 μl. The cycling conditions were: 30 sec at 95°C, followed by 45 cycles of 95°C for 30 sec and 60°C for 1 min. To evaluate the presence of non-specific PCR products and primer dimers, the amplified PCR products were run at temperature ramp of 95°C for 15 sec, 60°C for 15 sec, followed by 20 min of slow ramp from 60°C to 95°C. The threshold cycles (Ct) of each target genes were normalized in each wheat varieties with Ct of the internal control (wheat ARF gene: AB050957.1). Normalization and quantification of relative changes in gene expression or fold changes between one good and one poor varieties were calculated by using the following formula: FC = 2-Δ (ΔCt).
All microarray experimental data have been deposited in the NCBI’s Gene Expression Omnibus (GEO) database, http://www.ncbi.nlm.nih.gov/geo/, with accession number GSE53606.
Shewry PR, Halford NG: Cereal seed storage proteins: structures, properties and role in grain utilization. J Exp Bot. 2002, 53: 947-958. 10.1093/jexbot/53.370.947.
Kim W, Johnson JW, Graybosch RA, Gaines CS: Physicochemical properties and end-use quality of wheat starch as a function of waxy protein alleles. J Cereal Sci. 2003, 37: 195-204. 10.1006/jcrs.2002.0494.
McCallum JA: Biochemistry of phenolic compounds in wheat grain (Triticum aestivum L.). PhD thesis. New Zealand, University of Canterbury, Christchurch. 1989, New Zealand
Laudencia-Chingcuanco DL, Stamova BS, You FM, Lazo GR, Beckles DM, Anderson OD: Transcriptional profiling of wheat caryopsis development using cDNA microarrays. Plant Mol Biol. 2007, 63: 651-668. 10.1007/s11103-006-9114-y.
Stamova BS, Laudencia-Chingcuanco D, Beckles DM: Transcriptomic analysis of starch biosynthesis in the developing grain of hexaploid wheat. Int J Plant Genomics. 2009, doi:10.1155/2009/407426
Wan Y, Poole RL, Huttly AK, Toscano-Underwood C, Feeney K, Welham S, Gooding MJ, Mills C, Edwards KJ, Shewry PR, Mitchell RAC: Transcriptome analysis of grain development in hexaploid wheat. BMC Genomics. 2008, 9: 121-10.1186/1471-2164-9-121.
Pont C, Murat F, Confolent C, Balzergue S, Salse J: RNA-seq in grain unveils fate of neo- and paleopolyploidization events in bread wheat (Triticum aestivum L.). Genome Biol. 2011, 12: R119-10.1186/gb-2011-12-12-r119.
Williams RBH, Chan EKF, Cowley MJ, Little PFR: The influence of genetic variation on gene expression. Genome Res. 2007, 17: 1707-1716. 10.1101/gr.6981507.
Golkari S, Gilbert J, Prashar S, Procunier JD: Microarray analysis of Fusarium graminearum-induced wheat genes: identification of organ-specific and differentially expressed genes. Plant Biotechnol J. 2007, 5: 38-49. 10.1111/j.1467-7652.2006.00213.x.
Golkari S, Gilbert J, Ban T, Procunier JD: QTL-specific microarray gene expression analysis of wheat resistance to Fusarium head blight in Sumai-3 and two susceptible NILs. Genome. 2009, 52: 409-418. 10.1139/G09-018.
Zhang X-W, Jia L-J, Zhang Y, Jiang G, Li X, Zhang D, Tang W-H: In planta stage-specific fungal gene profiling elucidates the molecular strategies of Fusarium graminearum growing inside wheat coleoptiles. Plant cell. 2012, 24: 5159-5176. 10.1105/tpc.112.105957.
Fofana B, Banks TW, McCallum B, Strelkov SE, Cloutier S: Temporal gene expression profiling of the wheat leaf rust pathosystem using cDNA microarray reveals differences in compatible and incompatible defence pathways. Int J Plant Genomics. 2007, 2007: 17542-
Liu ZL, Liu QR, Chu SS, Jiang GH: Insecticidal activity and chemical composition of the essential oils of Artemisia lavandulaefolia and Artemisia sieversiana from China. Chem Biodivers. 2010, 7: 2040-2045. 10.1002/cbdv.200900410.
Laudencia-Chingcuanco D, Ganeshan S, You F, Fowler B, Chibbar R, Anderson O: Genome-wide gene expression analysis supports a developmental model of low temperature tolerance gene regulation in wheat (Triticum aestivum L.). BMC Genomics. 2011, 12: 299-10.1186/1471-2164-12-299.
Kadam S, Singh K, Shukla S, Goel S, Vikram P, Pawar V, Gaikwad K, Khanna-Chopra R, Singh N: Genomic associations for drought tolerance on the short arm of wheat chromosome 4B. Funct Integr Genomics. 2012, 12: 447-464. 10.1007/s10142-012-0276-1.
Wan Y, Shewry PR, Hawkesford MJ: A novel family of γ-gliadin genes are highly regulated by nitrogen supply in developing wheat grain. J Exp Bot. 2013, 64: 161-168. 10.1093/jxb/ers318.
Liu X, Williams CE, Nemacheck JA, Wang H, Subramanyam S, Zheng C, Chen M-S: Reactive oxygen species are involved in plant defense against a gall midge. Plant Physiol. 2010, 152: 985-999. 10.1104/pp.109.150656.
Yang X, Xu H, Li W, Li L, Sun J, Li Y, Yan Y, Hu Y: Screening and identification of seed-specific genes using digital differential display tools combined with microarray data from common wheat. BMC Genomics. 2011, 12: 513-10.1186/1471-2164-12-513.
Bhat PR, Lukaszewski A, Cui X, Xu J, Svensson JT, Wanamaker S, Waines JG, Close TJ: Mapping translocation breakpoints using a wheat microarray. Nucl Acids Res. 2007, 35: 2936-2943. 10.1093/nar/gkm148.
Zhu X, Burger M, Doane TA, Horwath WR: Ammonia oxidation pathways and nitrifier denitrification are significant sources of N2O and NO under low oxygen availability. Proc Natl Acad Sci USA. 2013, 110: 6328-6333. 10.1073/pnas.1219993110.
Evers T, Millar S: Cereal grain structure and development: some implications for quality. J Cereal Sci. 2002, 36: 261-284. 10.1006/jcrs.2002.0435.
Reimers M, Weinstein JN: Quality assessment of microarrays: visualization of spatial artifacts and quantitation of regional biases. BMC Bioinforma. 2005, 6: 166-10.1186/1471-2105-6-166.
Bhatnagar T, Sachdev A, Johari RP: Molecular characterization of glutenins in wheat varieties differing in chapatti quality characteristics. J Plant Biochem & Biotech. 2002, 11: 33-36. 10.1007/BF03263131.
Shewry PR, Tatham AS: The prolamin storage proteins of cereal seeds: structure and evolution. Biochem J. 1990, 267: 1-12.
Ciaffi M, Tozzi L, Lafiandra D: Relationship between flour protein composition determined by size-exclusion high-performance liquid chromatography and dough rheological parameters. Cereal Chem. 1996, 73: 346-351.
Payne PI: Genetics of wheat storage proteins and the effect of allelic variation on bread-making quality. Ann Rev Plant Physiol. 1987, 38: 141-153. 10.1146/annurev.pp.38.060187.001041.
Hou G, Yamamoto H, Ng PKW: Relationships of quantity of gliadin subgroups of selected U.S. soft wheat flours to rheological and baking properties. Cereal Chem. 1996, 73: 352-357.
Huebner FR, Bietz JA: Improved chromatographic separation and characterization of ethanol-soluble wheat proteins. Cereal Chem. 1993, 70: 506-511.
Hussain A, Lukow OM: Influence of gliadin-rich subfractions of glenlea wheat on the mixing characteristics of wheat flour. Cereal Chem. 1997, 74: 791-799. 10.1094/CCHEM.19220.127.116.111.
Van Lonkhuijsen HJ, Hamer RJ, Schreuder C: Influence of specific gliadins on the breadmaking quality of wheat. Cereal Chem. 1992, 69: 174-177.
Xu J, Tseng Y, Carriere CJ, Wirtz D: Microheterogeneity and microrheology of wheat gliadin suspensions studied by multiple-particle tracking. Biomacromolecules. 2002, 3: 92-99. 10.1021/bm015586b.
Xu J, Bietz JA, Carriere CJ: Viscoelastic properties of wheat gliadin and glutenin suspensions. Food Chem. 2007, 101: 1025-1030. 10.1016/j.foodchem.2006.02.057.
Giroux MJ, Morris CF: Wheat grain hardness results from highly conserved mutations in the friabilin components puroindoline a and b. Proc Natl Acad Sci USA. 1998, 95: 6262-6266. 10.1073/pnas.95.11.6262.
Morris CF: Puroindolines: the molecular genetic basis of wheat grain hardness. Plant Mol Biol. 2002, 48: 633-647. 10.1023/A:1014837431178.
Jolly CJ, Glenn GM, Rahman S: GSP-1 genes are linked to the grain hardness locus (Ha) on wheat chromosome 5D. Proc Natl Acad Sci USA. 1996, 93: 2408-2413. 10.1073/pnas.93.6.2408.
Smith AM, Denyer K, Martin CR: What controls the amount and structure of starch in storage organs?. Plant Physiol. 1995, 107: 673-677.
Nakamura T, Yamamori M, Hirano H, Hidaka S, Nagamine T: Production of waxy (amylose-free) wheats. Mol Gen Genet. 1995, 248: 253-259. 10.1007/BF02191591.
Vrinten PL, Nakamura T: Wheat granule-bound starch synthase I and II are encoded by separate genes that are expressed in different tissues. Plant Physiol. 2000, 122: 255-263. 10.1104/pp.122.1.255.
Laule O, Hirsch-Hoffmann M, Hruz T, Gruissem W, Zimmermann P: Web-based analysis of the mouse transcriptome using Genevestigator. BMC Bioinforma. 2006, 7: 311-10.1186/1471-2105-7-311.
Paolacci AR, Tanzarella OA, Porceddu E, Ciaffi M: Identification and validation of reference genes for quantitative RT-PCR normalization in wheat. BMC Mol Biol. 2009, 10: 11-10.1186/1471-2199-10-11.
Zimmermann P, Hirsch-Hoffmann M, Hennig L, Gruissem W: GENEVESTIGATOR. Arabidopsis microarray database and analysis toolbox. Plant Physiol. 2004, 136: 2621-2632. 10.1104/pp.104.046367.
Hruz T, Laule O, Szabo G, Wessendorp F, Bleuler S, Oertle L, Widmayer P, Gruissem W, Zimmermann P: Genevestigator v3: a reference expression database for the meta-analysis of transcriptomes. Adv Bioinformatics. 2008, 2008: 420747-
Thimm O, Blasing O, Gibon Y, Nagel A, Meyer S, Kruger P, Selbig J, Muller LA, Rhee SY, Stitt M: MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. Plant J. 2004, 37: 914-939. 10.1111/j.1365-313X.2004.02016.x.
Usadel B, Nagel A, Thimm O, Redestig H, Blaesing OE, Palacios-Rojas N, Selbig J, Hannemann J, Piques MC, Steinhauser D, Scheible W-R, Gibon Y, Morcuende R, Weicht D, Meyer S, Stitt M: Extension of the visualization tool MapMan to allow statistical analysis of arrays, display of corresponding genes, and comparison with known responses. Plant Physiol. 2005, 138: 1195-1204. 10.1104/pp.105.060459.
Usadel B, Poree F, Nagel A, Lohse M, Czedik-Eysenberg A, Stitt M: A guide to using MapMan to visualize and compare omics data in plants: a case study in the crop species, maize. Plant Cell Environ. 2009, 32: 1211-1229. 10.1111/j.1365-3040.2009.01978.x.
Livak KJ, Schmittgen TD: Analysis of relative expression data using real-time quantitative PCR and the 2-ΔΔCT method. Methods. 2001, 25: 402-408. 10.1006/meth.2001.1262.
We would like to thank Executive Director, National Agri-Food Biotechnology Institute (NABI) for valuable advice, Dr. Ajay Pandey for qRT-PCR protocol, Mrs. Aakriti Gupta for technical support. This work was part of the NABI core project on “Improvement of processing and nutrition quality in wheat”. Anuradha Singh acknowledges Department of Biotechnology (DBT), Government of India for providing Junior Research Fellowship (JRF) and Senior Research Fellowship (SRF). We acknowledge Directorate of Wheat Research (DWR), Indian Council of Agricultural Research (ICAR), Karnal, India and Punjab Agricultural University (PAU), Ludhiana, India for supply wheat seeds.
The authors’ declare that they have no competing interests.
AS conducted experimental works and data analysis. SM helped in computational analysis. MS helped in a part of experimental works. AC and RT helped in critical review of the manuscript. JR helped in experiment designing, data analysis, and manuscript writing. All authors contributed to writing the manuscript. All authors read and approved the final manuscript.
Electronic supplementary material
Additional file 2:Details of the probe sets identified for quality, seed development, and interaction (quality and seed development) through gene specific two-way ANOVA. Sheet 1: Detail of 3,126 probe sets whose expression changed due to quality. Sheet 2: Detail of 34,604 probe sets whose expression changed due to seed development. Sheet 3: Detail of 1,732 probe sets whose expression changed due to interaction (quality x development). (XLSX 8 MB)
Additional file 3:Gene-specific two-way ANOVA identified a total of 236 probe sets for quality, seed development, and interaction (quality x seed development) showing at least 10-fold differential expression between good and poor quality varieties in three seed developmental stages (i.e. 7, 14, and 28 days after anthesis, DAA). Among a total of 236 probe sets, 110, 219, and 85 probe sets were involved for quality, seed development, and interaction (quality x seed development), respectively. Gene function was assigned to the probe sets using blastx at NCBI web site. (+ = involved, - = not involved, up = up-regulated, down = down regulated). (DOCX 63 KB)
Additional file 4:Pairwise comparison of differential expression in three seed developmental stages [7, 14, and 28 days after anthesis (DAA)] of the probe sets of seed storage protein genes which were identified for quality in the four Indian wheat varieties. C306 and Lok1 were good and Sonalika and WH291 were poor chapatti making varieties. (XLSX 22 KB)
Additional file 5:Details of the probe sets in each of the five clusters identified through cluster analysis for quality, seed development, and interaction (quality x seed development). Sheet 1: Details of five clusters of 110 probe sets whose expressions changed due to quality between good and poor quality wheat varieties. Sheet 2: Details of five clusters of 219 probe sets whose expressions changed due to seed development namely 7, 14 and 28 days after anthesis (DAA). Sheet 3: Details of five clusters of 85 probe sets whose expressions changed due to interaction (quality x seed development). (XLSX 23 KB)
Additional file 6:Display of genes of 35,472 probe sets to metabolic pathways using MapMan software (version 3.5.1R2)[43–45]. The transcript levels of genes in several metabolic pathways were visualized by colour scales (+4.5 red to blue –4.5), where red and blue represents an increase and decrease of expression (log2 of fold change value), respectively in good quality varieties in comparison to poor making varieties in 7 days after anthesis (Top figure A), 14 DAA (Middle figure B), and 28 DAA (Last figure C). (TIFF 5 MB)
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.