Genome-wide transcriptome study in wheat identified candidate genes related to processing quality, majority of them showing interaction (quality x development) and having temporal and spatial distributions

Background The cultivated bread wheat (Triticum aestivum L.) possesses unique flour quality, which can be processed into many end-use food products such as bread, pasta, chapatti (unleavened flat bread), biscuit, etc. The present wheat varieties require improvement in processing quality to meet the increasing demand of better quality food products. However, processing quality is very complex and controlled by many genes, which have not been completely explored. To identify the candidate genes whose expressions changed due to variation in processing quality and interaction (quality x development), genome-wide transcriptome studies were performed in two sets of diverse Indian wheat varieties differing for chapatti quality. It is also important to understand the temporal and spatial distributions of their expressions for designing tissue and growth specific functional genomics experiments. Results Gene-specific two-way ANOVA analysis of expression of about 55 K transcripts in two diverse sets of Indian wheat varieties for chapatti quality at three seed developmental stages identified 236 differentially expressed probe sets (10-fold). Out of 236, 110 probe sets were identified for chapatti quality. Many processing quality related key genes such as glutenin and gliadins, puroindolines, grain softness protein, alpha and beta amylases, proteases, were identified, and many other candidate genes related to cellular and molecular functions were also identified. The ANOVA analysis revealed that the expression of 56 of 110 probe sets was involved in interaction (quality x development). Majority of the probe sets showed differential expression at early stage of seed development i.e. temporal expression. Meta-analysis revealed that the majority of the genes expressed in one or a few growth stages indicating spatial distribution of their expressions. The differential expressions of a few candidate genes such as pre-alpha/beta-gliadin and gamma gliadin were validated by RT-PCR. Therefore, this study identified several quality related key genes including many other genes, their interactions (quality x development) and temporal and spatial distributions. Conclusions The candidate genes identified for processing quality and information on temporal and spatial distributions of their expressions would be useful for designing wheat improvement programs for processing quality either by changing their expression or development of single nucleotide polymorphisms (SNPs) markers.


Background
Bread wheat (Triticum aestivum L.) is the most common cultivated wheat in the world. Its flour can be processed into a wide range of food products such as bread, pasta, biscuits, unleavened flat bread (chapatti), etc. The end use quality of products is mainly dependent on processing quality, which is largely determined by balance composition of biochemical molecules in seed such as seed storage proteins [1], starch [2], phenolic compounds [3], etc. There is a continuous increasing demand for good quality products both by consumers and baking industries. For improvement of processing quality it is important to understand the genome-wide expression of genes and their temporal and spatial distributions. Of several approaches for genome-wide study, microarrays comprising a large amount of probe sets of transcripts can be useful for the identification of differentially expressed genes in diverse or contrasting set of genotypes for trait of interest. In this study, two traditionally known good and two poor chapatti making Indian wheat varieties were used to identify the candidate genes whose expression changed due to chapatti quality using wheat microarrays.
Transcriptome analysis has been used to improve genome-wide understanding of molecular mechanism of gene expression. In wheat, it has been done using either the 8K wheat microarray chips [4,5] or 61K wheat microarray chips [5,6]. Recently, the next generation sequencing has become an important technological platform for investigating genome-wide transcriptional regulation of metabolic pathways [7]. However, in a polyploid crop such as bread wheat, sequence assembly and annotation are very challenging due to occurrence of multiple copies of gene sequences (homoeologous or paralogous genes). A reference bread wheat genome sequence, and cost-effective and faster high throughput computation system is required for making advances in wheat genomics. Despite several complications associated with microarray experiments [8], it still provides a faster and cost-effective method for genome-wide transcriptome study than the next generation sequencing approach.
Wheat microarrays have been successfully used for the identification of potential candidate genes under a wide range of biotic and abiotic stresses. For example, it has been used for studying the change in transcriptome patterns under biotic stresses such as Fusarium head blight [9][10][11], leaf rust [12], and insect attack [13]. Similarly, it has also been used for abiotic stress conditions such as cold [14], drought tolerance [15], high nitrogen level [16], etc. Change in transcriptional profiling during seed development was studied using wheat microarrays [4]. It has been also used to study metabolic biosynthesis pathway such as starch [5] and reactive oxygen species (ROS) [17]. Wheat microarrays have been used to screen and identify seed-specific genes using digital differential display (DDD) tools [18]. The translocation breakpoints were mapped using wheat microarrays [19]. The application of microarrays for gene expression analysis has been discussed in detail elsewhere [20].
The gene expression analysis through seed developmental stages is important for understanding temporal distribution of gene expression, as the seed matures. Three stages of major transitions in gene expression have been reported [4,6]. The first major transition is within 10 days after anthesis (DAA) where extensive cell division, expansion and differentiation occur to make milky endosperm [6,21]. The second major transition occurs at 14 DAA where starch and seed storage proteins accumulate within cells to make semi-solid endosperm [6]. The third major transition starts at 28 DAA where deposition of storage reserve decreases [6] followed by physiological maturity to turn caryopsis brown. However, the duration of the above stages may change depending upon environmental conditions during seed development. For example, Laudencia-Chingcuanco et al. [4] observed three stages of gene expression patterns during seed development i.e. 3 to 7 DAA (multi-cellular tissue forming structure), 7 to 14 DAA (grain filling stage), and 21 to 28 DAA (maturing and desiccation stage). Similarly, hierarchical clustering of transcriptome data from 6 to 42 DAA revealed three major groups of expressed genes i.e. 6 to 10 DAA, 12 to 21 DAA, and 28 to 42 DAA [6]. Therefore, genome-wide transcriptome study in at least three seed developmental stages is important for the identification of candidate genes whose expression changed during seed development for the improvement of processing quality.
Several statistical approaches are useful for the analysis of gene expression data from microarrays [22]. Among them, two-way ANOVA (analysis of variance) is the most common analysis method where two categorical explanatory variables of an experiment can be simultaneously analysed by any combination of one level of one variable and one level of the other variable. In this analysis, variation in the expression of a gene which is affected by the level of other gene (s) (i.e. interaction) can be determined [14]. In this study gene-specific two-way ANOVA was implemented to identify candidate genes whose expression changed due to processing quality (chapatti in this study), seed development, and interaction (quality x seed development).
Two traditionally known good chapatti making varieties, C306 [23] and Lok1 were used to identify differentially expressed genes in comparison to two varieties, Sonalika [23] and WH291, poor chapatti making varieties using wheat microarrays. The genes involved in seed development and interaction (quality x development) were also studied. Since the four varieties might differ for other traits, the involvement of the candidate genes with other traits cannot be rule out in this study. Several processing quality related key genes such as seed storage protein genes (gliadins and glutenins), which were known to effect processing quality of several enduse products such as bread, biscuit, noodles, etc., shown very high difference in their expression between good and poor chapatti making varieties at early stage of seed development (7 DAA). In addition, many novel candidate genes were identified between the two groups. It is evident that majority of the reported genes involved in processing quality were also involved in interaction of quality and seed development. This information would be helpful to design experiments for gene regulation at specific development stage and for the identification of variations within genes such as single nucleotide polymorphisms (SNPs) useful for the improvement of processing quality in wheat through molecular breeding.

Results and discussion
Genome-wide transcriptome analysis was done between good and poor quality varieties at three seed developmental stages (7, 14, and 28 DAA) to identify the candidate genes whose expressions changed due to quality, development, and interaction (quality x development). A total of 60,130 probe sets (out of 61,290) qualified gene expression quality criterion where at least 100% of samples in any one out of 6 conditions (discussed later in Methods) had values between 20.0 and 100.0th percentile in the normalized data. Gene specific two-way ANOVA analysis with multiple test corrections (Benjamini-Hochberg's FDR) of the 60,130 probe sets identified 35,472 probe sets (about 59%), which satisfied the corrected p value of 0.05 (see Additional file 1). A similar number of the expressed genes (31,768) were identified by Laudencia-Chingcuanco et al. [14] among wheat genotypes during cold acclimation and vernalization using gene-specific ANOVA. The analysis partitioned the variation in expression of the probe sets into quality (3,126 probe sets), development (34,604 probe sets), interaction (quality x development) (1,732 probe sets), and expected by chance i.e. errors (156 probe sets) (see Additional file 2, Table 1). The number of probe sets passed the test in this study is higher than the number reported by Wan et al. [6] who had identified 14,550 probe sets showing significant differential expression through development in a single wheat variety. In this study, a large number of genes were involved in quality and seed development referring to the involvement of many pathways and cellular processes.
Fold change analysis of the expression data of the 35,472 probe sets identified 236 probe sets, whose expressions were at least 10-fold between good and poor quality varieties in at least one of the three sources of variations [quality, development, and interaction (quality x development)] ( Table 2). Among them, 110, 219, and 85 probe sets were involved in quality, development, and interaction (quality x development), respectively ( Table 2, see Additional file 3).
Identification of candidate genes whose expression changed due to quality Total of 110 probe sets were identified for quality which showed at least ten-fold differential expression between good and poor quality varieties in at least one of the three condition pairs ('Good-7 DAA' vs 'Poor-7 DAA', 'Good-14 DAA' vs 'Poor-14 DAA', and 'Good-28 DAA' vs 'Poor-28 DAA') (Tables 2 and 3, see Additional file 3, Figures 1 and 2A). Using blastx similarity search of the sequence of the 110 probe sets at NCBI, the putative gene function was assigned to 67 (about 61% = 67/110), hypothetical proteins to 37 (33.6% = 37/110) probe sets, and the function was not assigned to the remaining 6 probe sets. Out of 110, 67, 1, and 28 probe sets were differentially expressed at 7, 14, and 28 DAA, respectively and the remaining 14 probe sets were differentially expressed in either two or three seed developmental stages ( Figure 2A).
Out of the 67, 29 probe sets (43% = 29/67), were mainly related to seed storage protein genes (Table 3). These genes were gliadin (alpha-/beta-, gamma, and omega gliadins), glutenin (low molecular weight and high molecular weight glutenin subunits), and avenin. Avenin is oat counterpart to gliadin. Glutenin and gliadin proteins make gluten matrix in dough which determines visco-elastic properties of wheat dough during processing. They mainly determine the processing quality of wheat end-use products [1,24]. High molecular weight glutenin subunit (HMW-GS) contributes mainly to gluten elasticity property and gliadins to its viscosity property [25]. HMW-GS was strongly correlated with bread making quality of wheat [26]. Gliadin contributes to gluten functional property [27][28][29][30][31]. Xu et al. [32] suggested that gliadin plays an important role in adjusting and controlling gluten's viscoelastic properties rather than only just a diluent of gluten's functional properties. Interestingly, at later stages of seed development (i.e. 14 and 28 DAA), these probe sets did Expected by chance i.e. error not show much difference in expression between good and poor quality varieties. The differential expression of the 29 probe sets of seed storage protein genes, key genes for processing quality, were analysed in pairwise among the four varieties to study variation in expression level among them (Table 4, see Additional file 4). It indicates that WH291 is more diverse than Sonalika with that of C306 and Lok1 at expression level of seed storage protein genes. Therefore, among them, the poor chapatti making quality, WH291 can be used with either of two good chapatti making varieties, C306 or Lok1 for molecular breeding.
The remaining probe sets showed similarity to transcription factors (GATA, IIA, heat stress TF A9), peroxidase, trypsin inhibitor, proteinase, amylases (alpha-and beta-amylases), kinases and phosphatase (serine/threonine protein kinase and phosphatase), etc. High differential expression of peroxidase, proteinase, and amylases in good chapatti varieties may have effect on processing quality. These enzymes are being extensively used in baking and food industries for making good quality enduse products.
Cluster analysis of the 110 probe sets into 5 groups identified the co-expressed genes having similar expression patterns in good and poor varieties in three seed developmental stages (see Additional file 5, Figure 3). In cluster I, the probe sets showed down-regulation in good quality varieties at early stage of seed development (i.e. 7 DAA) and did not show any change at later stage of seed development. In cluster II, the probe sets showed upregulation in good quality varieties at early stage of seed development (i.e. 7 DAA) and did not show any change at later stage of seed development. In cluster III, five probe sets showed down-regulation in good quality varieties in all three seed developmental stages. In cluster IV, the probe sets showed up-regulation in good quality varieties at later stage of seed development (28 DAA). In cluster V, the majority of the probe sets showed upregulation in good quality varieties at early stage of seed development. Out of 40, 28 probe sets in cluster V were related to seed storage protein genes and others to betaamylase and hypothetical proteins. One probe set for seed storage protein gene (Ta.2415.2.S1_a_at for gliadin/ avenin seed protein) was clustered separately in cluster II. Cluster analysis distinctly grouped key genes for processing quality in cluster V indicating their similar expression patterns in good quality varieties i.e. up-regulated at early stage of seed development ( Figure 3). The role of the remaining four clusters cannot be validated to processing quality in the present study, but they represent potential candidate genes for marker development through association studies.
Identification of candidate genes whose expression changed due to seed development Total 219 probe sets were identified for seed development whose expression changed at least ten-fold between good and poor quality varieties in at least one of the three condition pairs ('Good-7 DAA' vs 'Poor-7 DAA', 'Good-14 DAA' vs 'Poor-14 DAA', and 'Good-28 DAA' vs 'Poor-28 DAA') ( Table 2, see Additional file 3, Figures 1 and 2B). Using blastx similarity search of the sequence of the 219 probe sets at NCBI, the putative gene function was assigned to 153 probe sets (69.9% = 153/219), hypothetical proteins to 57 (26.0% = 57/219) probe sets, and similarity of the remaining 9 probe sets was not found in the database. Out of 219, 171, 2, and 35 probe sets showed differential expression in 7, 14, and 28 DAA, respectively and the remaining 11 probe sets were differentially expressed in at least two seed development stages ( Figure 2B). Further analysis revealed that out of 219, the expression of 37 probe sets was also changed due to quality ( Figure 1).
Four probe sets related to genes for grain hardness and softness showed differential expression from about The probe sets were identified through gene specific two-way ANOVA analysis. Table 3 Detail of the 110 probe sets identified for processing quality showing at least 10-fold differential expression between good and poor quality wheat varieties in at least one of the three seed developmental stages i.e. 7, 14, and 28 days after anthesis (DAA)  Table 3 Detail of the 110 probe sets identified for processing quality showing at least 10-fold differential expression between good and poor quality wheat varieties in at least one of the three seed developmental stages i.e. 7, 14, and 28 days after anthesis (DAA) (Continued)   The majority of the probe sets also involved in seed development stage and interaction (quality x development) as identified through gene-specific two-way ANOVA analysis. The function to the probe sets was assigned through blastx at NCBI website. + = involved, -= not involved.
ten-fold (Ta.14614.1.S1_at) to about 43-fold (Ta.69.2. S1_x_at) in early stage of seed development i.e. 7 DAA and one probe set (Ta.23141.1.S1_at) showed about 22fold differential expression at later stage of seed development i.e. 28 DAA between good and poor varieties (see Additional file 3). Grain hardness is considered as a single important trait that determines the quality of enduse products such as bread and cookies or biscuit [33]. Puroindoline genes coding puroindoline a and b are major contributor of grain hardness. If puroindoline a and b are functional, the grain is soft or both or either one of them are mutated or absent, the grain is very hard [34]. Out of five, two probe sets (Ta.69.2.S1_x_at and Ta.840.1.S1_at) represent grain softness proteins.
Jolly et al. [35] reported the linkage of the genes of grain softness proteins to the grain hardness locus (Ha). However, no direct or indirect relationship of this protein to grain texture has been established [34]. The wheat microarray contains probe sets of more than 28 genes of starch metabolism. In this study, only GBSS (granule bound starch synthase) and beta-amylase showed at least 10-fold differential expression between the contrasting varieties. Beta amylase (Ta.27780.1.S1_ x_at) expression was involved in quality and seed development as well as in their interactions (Additional file 3: Table S3). Its differential expression (about 60-fold) was only present in early stage of seed development (7 DAA). Beta-amylase is responsible for starch hydrolysis at the time of seed germination by releasing maltose (two glucose units) of the second α-1,4 glycosidic bonds from the non-reducing end. It accumulates during grain maturation, but remains inactive till germination.
GBSS is believed to be responsible for synthesis of amylose in seed (for review, see [36]. The content of amylose affects processing quality. GBSS mutant analysis found amylose free seed indicating that this enzyme is responsible for amylose synthesis [37]. This enzyme is found in two forms-GBSSI in endosperm and GBSSII in non-endospermic tissues [38]. In this study a high increase in differential expression of GBSS from early (about 7-fold at 7 DAA) to late maturity (about 93 fold at 28 DAA) was observed between good and poor quality varieties. The expression of the GBSS gene was involved Figure 1 Venn diagrams showing the number of probe sets of the candidate genes identified for quality, seed development, and interaction (quality x seed development). The probe sets were identified by gene specific two-way ANOVA showing at least 10-fold differential expression between two good (C306 and Lok1) and poor (Sonalika and WH291) chapatti making varieties. in both qualities as well as in seed development, but not in interaction of quality and development (see Additional file 3).
Cluster analysis of the 219 probe sets showing at least ten-fold differential expression into 5 clusters identified the co-expressed genes having similar expression patterns in different seed development stages in good and poor varieties (see Additional file 5, Figure 3). Cluster analysis of the 219 probe sets grouped the probe sets in the same cluster having similar expression patterns (see Additional file 5, Figure 3). In cluster I, the probe sets showed up-regulation in good quality varieties in early stage of seed development (7 DAA) and did not show any much change at later stage of seed development. In cluster II, the majority of the probe sets showed upregulation in good quality varieties at later stages of seed development (14 and 28 DAA). In cluster III, the probe sets showed down-regulation in good quality varieties at early stage i.e. 7 DAA (Figure 3). In clusters IV and V, the probe sets showed up-regulation in good varieties at early stage seed development i.e. 7 DAA. The key genes for processing quality, mainly seed storage protein genes were clustered in both cluster IV and V.

Identification of candidate genes involved in interaction (quality x development)
Total 85 probe sets was identified for interaction (quality x development) showing at least ten-fold expression between Table 4 Pairwise comparisons of differential expression of the probe sets related to seed storage protein genes at early stage of seed development i.e. 7 days after anthesis (DAA) among the four Indian wheat varieties, viz. C306 and Lok1 were good and Sonalika and WH291 were poor chapatti making varieties  Figure 1). These probe sets were involved in development. Of them, 56 probe sets also involved in quality (Figure 1, see Additional file 3). Further analysis revealed that 68, 2, and 13 probe sets (out of the 85 probe sets) were differentially expressed in 7, 14, and 28 DAA, respectively ( Figure 2C). Cluster analysis of the 85 probe sets grouped the probe sets in the same cluster having similar expression patterns (see Additional file 5, Figure 3). In cluster I, the probe sets showed up-regulation in poor varieties at early stage of seed development (i.e. 7 DAA) and did not show any change at later stage of seed development. In cluster II, the probe sets showed down-regulation in poor varieties at early stage of seed development (i.e. 7 DAA) and did not show any change at later stage of seed development. The most of probe sets for seed storage protein genes were grouped in cluster II. In cluster III, there was a single probe set (Ta.2025.1.S1_at, oxireductase, a Fe (II) oxygenase family protein) showing upregulation in poor varieties at 14 and 28 DAA (Figure 3). In cluster IV, the probe sets showed up-regulation in good varieties at later stage seed development (28 DAA). In cluster V, the majority of the probe sets showed up-regulation in good varieties at early stage of seed development.
Spatial distribution of the expression of 110 probe sets related to processing quality across 10 development stages Using gene-specific two-way ANOVA analysis, 110 probe sets were identified for processing quality showing at least 10-fold differential expression between good and poor quality varieties. The expression level of these probe sets were analysed across 1,328 samples which Figure 3 Clustering of the expression of probe sets identified for quality, seed development, and interaction (quality x seed development) into five clusters (I, II, III, IV, and V) in three seed developmental stages, namely, 7, 14, and 28 days after anthesis (DAA). The cluster analysis was done to identify co-expressed genes using GeneSpring software (Agilent Tech, Santa Clara, USA).
were available online and representing ten developmental stages including seedling, tillering, and flower development stages through meta-analysis using Genevestigator software. The heatmap produced after the analysis revealed expression potential of the individual probe sets in 10 development stages (Figure 4). Cluster analysis of the expression potential of the probe sets distinctly grouped the probe sets into three major clusters i.e. I, II and III of 43, 25, and 42 probe sets, respectively ( Figure 4). In cluster I the expression level of the 43 probe sets was global as they were present in the majority of the ten developmental stages (Figure 4). About 42% (18/43) of the probe sets was annotated with blastx and showing homology to genes related to cellular and molecular mechanisms such as granule-bound starch synthase I (GBSSI), Bowman-Birk type trypsin inhibitor, aspartic proteinase nepenthesin-1, cell division protease, peroxidase, sugar transporter, transcription factors (GATA and heat stress TF A-9), 14 KDa proline-rich protein, low temperature specific wheat gene pTACR7, peroxisomal acyl-coenzyme A oxidase 1, etc. About of 51% (22/43) of the probe sets were hypothetical proteins to Aegilops tauschii and Triticum urartu and the significant similarity was not found for the other three probe sets.
Except eight probe sets, the expression level of other 17 probe sets in cluster II was low i.e. white or grey color in the heatmap (Figure 4). Majority of the probe sets (52% = 13/25) was annotated with blastx and showed homology to kinases (LRR receptor-like serine/threonine protein kinase and wall-associated receptor kinase 5), DNAdirected RNA polymerase, serine/threonine protein phosphatase, beta-fructofuranosidase, alpha-amylase/subtilisin inhibitor, transcription factor IIF, etc. Forty percentage (10/25) of the 25 probe set was hypothetical proteins and the significant similarity was not found for the other two probe sets.
In cluster III the expression level of the probe sets were strong in ripening and the majority of them had also expression in germination stage (Figure 4). Majority of the probe sets (about 69% = 29/42) were related to seed storage protein genes such as gliadins (alpha gliadin, alpha/beta gliadin, gamma gliadin), glutenin (low molecular weight and high molecular weight glutenin subunits), and avenin like proteins. The seven probe sets were related to beta-amylase, trypsin inhibitor, serine/threonine protein kinase and deoxymugineic acid dioxygenase. Five probe sets were hypothetical proteins to Aegilops tauschii and Triticum urartu. The significant similarity was not found for the remaining one probe set.
Spatial distribution of the expression of 110 probe sets related to processing quality across 22 tissues/organs The expression level of the 110 probe sets related to processing quality were analysed across 1,405 samples which were available online and representing 22 tissues including root, shoot, flag leaf, endosperm, embryo, coleoptile, and mesocotyl through meta-analysis using Genevestigator software. The heatmap produced after the analysis revealed expression potential of the individual probe sets in different tissues ( Figure 5). Cluster analysis of the expression potential of the probe sets distinctly grouped the probe sets into three major clusters i.e. I, II, and III of 44, 23, and 43 probe sets, respectively ( Figure 5), which was very similar to the clusters produced for the development stages (described in the previous section). The percentage of probe sets grouped in both clusters (Figures 4 and 5) was very similar i.e. 93.2% (cluster I), 95.7% (cluster II), and 90.7% (cluster III). Briefly, the expression level of the probe sets in cluster I was very strong (red color in the heatmap in Figure 5) and global as they were present in the most of the 22 tissues. Except a few probe sets, the expression level of the majority of the probe sets in cluster II and III were specific to one to two tissues ( Figure 5). Majority of the probe sets in cluster III were related to seed storage protein genes such as gliadins (alpha gliadin, alpha/beta gliadin, gamma gliadin), glutenin (low molecular weight and high molecular weight glutenin subunits), and avenin like proteins.
The heatmaps providing information on spatial distribution of the genes would assist in designing functional genomics experiments specific to a particular growth stage or tissue. The correct interpretation of results is very important for formulating new hypothesis and models, and designing proper experiments to test them [39].

Pathway analysis of the 35,472 probe sets
The 35,472 probe sets (see Additional file 1) which passed the gene-specific two-way ANOVA (corrected p value of 0.05) were mapped to metabolism pathways by using MapMan software on the Triticum aestivum mapping data, "Taes_Affy_0709". The genes of several metabolic (See figure on previous page.) Figure 4 A heat map of the 110 probe sets (identified for quality) indicating level of expression potentials in 10 development stages such as germination, seedling growth, tillering, stem elongation, booting, inflorescence emergence, anthesis, milk development, dough development, and ripening. The expression potentials of the 110 probe sets were estimated in 1,328 samples which were available in the Affymetrix®'s Triticum aestivum microarray database. The darkest red color represents the highest level of probe set expression potential. The expression potential is defined as the average of the top 1% signal values across all samples for a given probe set in a given platform. The heat map was generated in Genevestigator (Nebion AG, Zurich, Switzerland). . This indicates that genes for seed development and processing quality are regulated at early stage of seed development.

Validation of differential expression by quantitative RT-PCR
The expression profiles of a few randomly chosen genes were validated by quantitative RT-PCR. A set of reference genes has been recently proposed for common wheat such as gene for cell division control protein, AAA-superfamily of ATPases (CDC, Ta.54227), ADPribosylation factor, ARF (ADPRF, Ta.2291), and RNase L inhibitor-like protein (RLI, Ta.2776) [40]. For this study one house-keeping gene such as ARF was used as a reference gene for quantitative validation of the expression data. The two differentially expressed genes, pre-α/β-gliadin and γ-gliadin, were randomly chosen for validation of the microarray data (Tables 5 and 6, Figure 6). The individual C T values of the house-keeping gene at the three seed development stages, 7, 14, and 28 DAA, in the two diverse chapatti making varieties, C306 and Sonalika were used for normalization. The two target genes showed similar difference in their expressions through seed developmental stages between the varieties, which are largely in agreement with the microarray data (Table 6). For example, the expressions of pre-α/β-gliadin and γ-gliadin genes were very high in the variety C306 than the variety Sonalika at early stage seed development, 7 DAA ( Figure 6, Table 6). Hence, the microarray data were validated by qualitative PCR.

Conclusion
Genome-wide transcriptome analysis with help of twoway ANOVA in developing caryopsis identified a substantial number of differentially (at least 10-fold) expressed genes in two diverse sets of Indian wheat varieties for chapatti quality. Many key processing quality related genes such as different subunits of glutenin and gliadins, puroindolines, grain softness protein, alpha and beta amylases, proteases, peroxidase, GBSS, etc. were identified. In addition, many other candidate genes related to cellular and molecular functions were also identified. The ANOVA analysis revealed that the expression of majority of the candidate genes for good chapatti was involved in interaction of quality and development. Most of these probe sets showed differential expression at early stage of seed development i.e. temporal expression. Meta-analysis revealed that the majority of the genes expressed in one or a few growth stages indicating spatial distribution of their expressions. The differential expression of the two candidate genes, pre-α/β-gliadin and γ-gliadin between good and poor chapatti quality varieties was validated by quantitative real time PCR. Therefore, this study able to identify several known processing quality related genes and many additional candidate genes for good chapatti and their interactions with development and temporal and spatial distributions. Gene identified for processing quality and information on temporal and spatial distributions of their expressions would be useful for designing programs for improvement of processing quality either by changing expression (over expression or down regulation) of genes (See figure on previous page.) Figure 5 A heat map of the 110 probe sets (identified for quality) indicating level of expression potentials in 22 wheat tissues such as endosperm, glume, caryopsis, embryo, leaf, root, coleoptile, mesocotyl, seedling, sheath, shoot, shoot apex, leaf, flag leaf, crown, inflorescence, spikelet, pistil, anther, glumes, caryopsis, endosperm, embryo. The expression potentials of the 110 probe sets were estimated in 1,405 samples available in the Affymetrix®'s Triticum aestivum microarray database. The darkest red color represents the highest level of probe set expression. The expression potential is defined as the average of the top 1% signal values across all samples for a given probe set in a given platform. The heat map was generated in Genevestigator (Nebion AG, Zurich, Switzerland). 'Root' labelled at 16th column of the heatmap represents 'roots of seedling'. 'Crown' labelled at 17 th column of the heatmap represents 'Crown of seedling'. Table 5 Nucleotide sequences of forward and reverse primers of two randomly chosen candidate genes for processing quality (pre-α/β-gliadin and γ-Gliadin) and one control (ADP ribosylation factor, ARF) used for their validations of differential expression between good and poor quality varieties using quantitative real time PCR (qRT-PCR)

Wheat microarrays
The GeneChip® Wheat Genome Array (Affymetrix, Santa Clara, USA), a 3' in vitro transcription (3′ IVT) expression array containing 61,290 probe sets representing about 25 K unigenes was used for gene expression analysis. The sequence and annotation information of the probe sets on the array are available at http://www.affymetrix.com.

Microarray hybridization and scanning
RNA labelling and microarray hybridization on wheat genome arrays were performed according to its manual. The arrays were processed on an Affymetrix GeneChip® Fluidics Station 450 by running fluidic script EukGE-WS2v5-450. The hybridized arrays were scanned on a Affymetrix GeneChip® Scanner 3000. Hybridization quality of the microarrays was verified by scaling factor, overall hybridization rate, and signal strength of several bacterial spike controls.

Microarray data analysis i) Normalization
The expression console of Affymetrix's GeneChip Command Console (AGCC) software was used for preliminary data analysis of the scanned arrays. The image files were created as *.dat file. The software computed cell intensity data of probe sets and their Table 6 Fold change values of the validated (by qRT-PCR) pre-α/β-gliadin and γ-gliadin genes between good and poor quality varieties at three seed development stages using qRT-PCR and microarrays  Validation of differential expression (fold change) of two randomly chosen candidate genes (pre-α/β-gliadin and γ-Gliadin) at three seed development stages using qRT-PCR. The validation was done between a good (C306) and a poor (Sonalika) chapatti quality varieties in 7, 14, and 28 days after anthesis (DAA) of seed developmental stage through quantitative real time PCR (qRT-PCR). Y-axis represents fold change in differential expression of genes in the wheat variety, C306 in comparison to the wheat variety, Sonalika. X-axis represents three seed development stages i.e. 7, 14, and 28 DAA. The expression data were normalized to that of a control gene, ADP ribosylation factor, ARF. qRT-PCR data analysis was done following Livak and Schmitteng (2001) [46]. The probe sets satisfying the corrected p value in two-way ANOVA analysis were used for fold change analysis in GeneSpring software. The probe sets showing at least two-fold and ten-fold differential expressions between good and poor groups were used for further analysis. iv) Cluster analysis of gene expression data The probe sets in each of the three groups (quality, developmental stage, and interaction) as mentioned above were grouped by assuming five clusters by estimating pairwise Euclidean distance between genes, 50 iterations, and K-means algorithm implemented in GeneSpring software. The probe sets were clustered for identification of the probe sets with similar expression profiles.
v) Meta-analysis of the differentially expressed genes Affymetrix® wheat microarray expression data conducted in a wide variety of growth stages and tissues were available online. These data were good resources for meta-analysis to study spatial distribution of the gene expression data in different growth stages and tissues/organs. This information would assist in designing functional genomics experiments specific to a particular growth stage or tissue. The spatial distribution of the differentially expressed genes (at least 10-fold between good and poor varieties) was studied in ten growth stages, namely, germination, seedling growth, tillering, stem elongation, booting, inflorescence emergence, anthesis, milk development, dough development, and ripening. Their spatial distributions were also studied in 22 tissues/organs; namely, shoot apex, crown, sheath, caryopsis, seedling, spike, spikelet, flag leaf, shoot, shoot-leaf, seedling-leaf, crown, roots, seedling-root, glume, coleoptile, anther, mesocotyl, embryo, endosperm, inflorescence, and pistil. Their spatial distributions were represented in form of heatmaps. Meta-analysis was conducted using Genevestigator software which retrieved the expression data from the web [41,42]. Further, the expression data was clustered through hierarchical clustering on the estimated Euclidean distance matrix using optimal leaf-ordering implemented in the software. vi) Functional analyses of the differentially expressed data The differential expression data of the 35,472 probe sets satisfying the corrected p value of 0.05 in twoway ANOVA (See Additional file 1) was assigned to metabolic pathways by MapMan software (version 3.5.1.R2) using the Triticum aestivum mapping data "Taes_Affy_0709" [43][44][45].

Validation of differential expression by quantitative RT-PCR
Quantitative real time PCR (qRT-PCR) was performed for validation of a few differentially expressed genes. Five microgram of the extracted RNA was treated with TURBO DNase™ (Ambion, Life Technologies, NY, USA) before cDNA synthesis according to the manufacturer manual. The SuperScript® III First-Strand Synthesis kit (Invitrogen, Life Technologies, NY, USA) was used for cDNA synthesis using oligodT (18-mer). The reaction mixture was incubated at 50°C for 50 min, then at 85°C Table 7 Detail of two randomly chosen candidate genes for processing quality (pre-α/β-gliadin and γ-Gliadin) and one control (ADP ribosylation factor, ARF) used for validation of their differential expressions between good and poor quality varieties using quantitative real time PCR (qRT-PCR) for 5 min, and chilled immediately on ice. The mixture was diluted with nuclease free water to the final concentration of 20 ng/μl cDNAs. The primers used for PCR were designed to the same region harbouring the microarray probes using Primer Express software (Applied Biosystems, Forster City, CA, USA) ( Tables 7 and 5). qRT-PCR was done in triplicates using SYBR Green fluorescence dye in a qRT-PCR machine (7500 Fast Real-Time PCR System, Applied Biosystems). Each reaction was prepared using 5 μl of 2X QuantiTect SYBR Green (Qiagen, USA), 1 μl of 20 ng/μl cDNA, and 1 pmol/μl each of forward and reverse primers, in a total volume of 10 μl. The cycling conditions were: 30 sec at 95°C, followed by 45 cycles of 95°C for 30 sec and 60°C for 1 min. To evaluate the presence of non-specific PCR products and primer dimers, the amplified PCR products were run at temperature ramp of 95°C for 15 sec, 60°C for 15 sec, followed by 20 min of slow ramp from 60°C to 95°C. The threshold cycles (Ct) of each target genes were normalized in each wheat varieties with Ct of the internal control (wheat ARF gene: AB050957.1). Normalization and quantification of relative changes in gene expression or fold changes between one good and one poor varieties were calculated by using the following formula: FC = 2 -Δ (ΔCt) [46]. All microarray experimental data have been deposited in the NCBI's Gene Expression Omnibus (GEO) database, http://www.ncbi.nlm.nih.gov/geo/, with accession number GSE53606.