Deep sequencing reveals the complex and coordinated transcriptional regulation of genes related to grain quality in rice cultivars

Background: Milling yield and eating quality are two important grain quality traits in rice. To identify the genes involved in these two traits, we performed a deep transcriptional analysis of developing seeds using both massively parallel signature sequencing (MPSS) and sequencing-by-synthesis (SBS). Five MPSS and five SBS libraries were constructed from 6-day-old developing seeds of Cypress (high milling yield), LaGrue (low milling yield), Ilpumbyeo (high eating quality), YR15965 (low eating quality), and Nipponbare (control).
Results: The transcriptomes revealed by MPSS and SBS had a high correlation coefficient (0.81 to 0.90), and about 70% of the transcripts were commonly identified in both types of libraries. SBS, however, identified 30% more transcripts than MPSS. Among the highly expressed genes in Cypress and Ilpumbyeo, over 100 conserved cis regulatory elements were identified. Numerous specifically expressed transcription factor (TF) genes were identified in Cypress (282), LaGrue (312), Ilpumbyeo (363), YR15965 (260), and Nipponbare (357). Many key grain quality-related genes (i.e., genes involved in starch metabolism, aspartate amino acid metabolism, storage and allergenic protein synthesis, and seed maturation) that were expressed at high levels underwent alternative splicing and produced antisense transcripts either in Cypress or Ilpumbyeo. Further, a time course RT-PCR analysis confirmed a higher expression level of genes involved in starch metabolism, such as those encoding ADP glucose pyrophosphorylase (AGPase) and granule bound starch synthase I (GBSS I), in Cypress than in LaGrue during early seed development.
Conclusion: This study represents the most comprehensive analysis of the developing seed transcriptome of rice available to date. Using two high throughput sequencing methods, we identified many differentially expressed genes that may affect milling yield or eating quality in rice. Many of the identified genes are involved in the biosynthesis of starch, aspartate family amino acids, and storage proteins. Some of the differentially expressed genes could be useful for the development of molecular markers if they are located in a known QTL region for milling yield or eating quality in the rice genome. Therefore, our comprehensive and deep survey of the developing seed transcriptome in five rice cultivars has provided a rich genomic resource for further elucidating the molecular basis of grain quality in rice.

price of rice in the market. Eating quality is determined by water, protein, starch, and fat content [10][11][12][13][14]. Eating quality is negatively correlated with protein content, stickiness, and hardness of rice [10,11]. The main factors affecting both eating and cooking quality of rice are amylose content, gel consistency, and gelatinization temperature [12,13,15,16]. Cooked rice with high amylose content is flaky, dry, hard, and non-sticky, while rice with low amylose content is sticky, moist, tender, and glossy [12,13]. Developing cultivars with high milling yield and good eating quality has been a main objective of rice breeding programs over the last few decades.
Milling yield and eating quality are complex traits controlled by quantitative trait loci (QTLs) [17]. In the last several years, many QTLs for eating quality have been mapped in the rice genome. For example, using chromosome segment substitution lines (CSSLs), Wan et al. [18] identified a total of 25 QTLs for nine eating quality traits. Many QTLs affecting different quality traits are mapped in the same chromosomal regions. Six QTLs are non-environment-specific and could be used for marker-assisted selection in rice quality improvement. Recently, Hao et al. [19] constructed 154 CSSLs for QTL mapping of quality traits. In that study, 10 QTLs for rice appearance traits and eight QTLs for physico-chemical traits were detected. QTLs related to glossiness of cooked rice were identified in different genomic regions in Ilpumbyeo, a high grain quality rice in Korea [20]. The amylose content of rice is governed by the waxy (Wx) locus, which maps to chromosome 6 [21][22][23]. In contrast to the advances in genetic analysis of eating quality, less progress has been made on the genetic analysis of milling quality because the trait has low heritability and is sensitive to environmental factors [24,25]. Another challenge for milling yield analysis is that many mapping populations for milling yield had varied kernel shape among the individual lines, and heterogeneity in grain dimensions confounds the assessment of genetic effects [9,24,26-31]. Recently, a mapping study identified six QTLs responsible for head rice (milling) yield using recombinant inbred lines (RILs) derived from crosses of the common parent Cypress (high milling) with RT0034 (low milling) and LaGrue (low milling) [9].
The molecular and biochemical basis of grain quality in cereals has been studied in the last decade, and the biochemical processes and many participating genes in the synthesis of starch [32][33][34], storage proteins [35][36][37][38][39], and lysine within the aspartate family amino acid pathway [40] have been characterized in rice and other cereals. However, how the expression of these genes is coordinated and regulated during grain filling is still poorly understood. Recently, Tian et al. [41] demonstrated that starch synthesis-related genes form a fine network to control eating and cooking qualities by regulating amylose content, gel consistency, and/or gelatinization temperature, and that eating and cooking quality in rice can be improved through genetic modification of any of these starch synthesis-related genes. The expression of 44 genes participating in three pathways (the synthesis of starch, storage proteins, and lysine) during rice grain filling was examined by RT-PCR in the maternal line 93-11 and in the super-hybrid rice line Liang-You-Pei-Jiu (LYP9) [3]. The analysis revealed diverse yet coordinated expression profiles of genes involved in the three pathways in developing seeds. These unique expression patterns of the quality-related genes may influence the final composition and properties of starch, protein, and lysine in rice seeds.
Tools for whole-genome expression analysis such as microarrays, serial analysis of gene expression (SAGE), and massively parallel signature sequencing (MPSS) have been widely used for transcriptome analysis in plants in the last 10 years [42]. The sequencing-by-synthesis (SBS) second-generation sequencing method has recently been used for transcriptome analysis in many organisms because of its low cost and large sequencing output [43]. In this study, we used both MPSS and SBS to analyze the transcriptome of developing rice seeds in five cultivars that differed in milling yield and eating quality. Many differentially expressed novel transcripts and genes involved in the biosynthesis of starch, aspartate family amino acids, and storage proteins were identified. Promoter analysis revealed the presence of hundreds of novel conserved patterns of cis regulatory elements in the up-regulated genes and putative co-expressed genes in the rice cultivars with high milling yield and good eating quality. Our comprehensive and deep survey of the developing seed transcriptome in five rice cultivars has provided an excellent starting material for further elucidating the molecular and biochemical basis of milling and eating quality in rice.

Results
Characteristics of the MPSS and SBS libraries and their matching to the rice genome and to EST and full-length cDNA databases
Both MPSS and SBS tags are short cDNA tags or digital gene expression tags, which are mainly derived from the 3' regions of a transcript [44]. About 1.0 to 1.3 million 17-base MPSS signatures and about 2.0 to 4.0 million 20-base SBS signatures were obtained in the 10 libraries (Table 1). These signatures were clustered and then processed with reliability and significance filters as described by Meyers et al. [45] (Additional File 1). For comparison of the expression levels across the libraries, the frequency of signatures in the individual libraries was normalized to one million (transcripts per million or TPM) [45]. The number of distinct signatures ranged from 12,000 to 18,000 in the MPSS libraries and from 77,000 to 165,000 in the SBS libraries. The SBS libraries contained two to three times more significant signatures (≥4 TPM) than the MPSS libraries. About 79 to 85% of the MPSS and 89 to 95% of the SBS significant signatures matched to the japonica (Nipponbare) genomic sequence (Table 1). The significant MPSS and SBS signatures from all five libraries were classified into seven classes based on their location on the annotated genes according to the method previously described by Meyers et al. [45] (Additional File 2).
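The TPM normalization and the ≥4 TPM significance cutoff described above can be illustrated with a minimal sketch. It is not the pipeline of Meyers et al. [45]: the reliability filter is not reproduced, and the function names and example signatures are hypothetical.

```python
def normalize_to_tpm(raw_counts):
    """Scale raw signature counts so that the library sums to one million (TPM)."""
    total = sum(raw_counts.values())
    if total == 0:
        raise ValueError("library contains no signatures")
    return {sig: count * 1_000_000 / total for sig, count in raw_counts.items()}


def significant_signatures(tpm, threshold=4.0):
    """Keep signatures at or above the significance cutoff (>= 4 TPM in the text)."""
    return {sig: value for sig, value in tpm.items() if value >= threshold}


# Hypothetical library with three signatures and one million raw reads in total.
example_counts = {"SIG_A": 500_000, "SIG_B": 499_998, "SIG_C": 2}
example_tpm = normalize_to_tpm(example_counts)
example_significant = significant_signatures(example_tpm)
```

Because every library is rescaled to the same total, abundances from libraries of different sequencing depth (for example, about 1 million MPSS versus 2 to 4 million SBS signatures) become directly comparable.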

Correlation of the transcriptomic results generated by the MPSS and SBS technologies
From 62 to 77% of the significant signatures overlapped between the MPSS and SBS libraries (Table 1). Further, we used all the significant signatures in the MPSS and SBS libraries of the same cultivar for Pearson correlation coefficient analysis. The correlation coefficient was low when unfiltered MPSS and SBS data were used (Table 2). Removal of a small fraction of outliers (3-8, < 0.001% of the signatures) increased the correlation coefficient significantly in all five libraries (Table 2). For example, the correlation coefficient between the two YR15965 libraries was increased from 0.58 to 0.90 after removal of only four of 5,757 signatures.
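The effect of a few outliers on the Pearson coefficient can be sketched as follows. The text does not state how outliers were selected, so this sketch assumes, purely for illustration, that the signatures with the largest absolute TPM difference between platforms are dropped; the function names and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


def correlation_without_outliers(mpss_tpm, sbs_tpm, n_outliers=0):
    """Correlate shared signatures after dropping the n most discordant ones.

    Assumption: 'discordant' means largest absolute TPM difference.
    """
    shared = sorted(set(mpss_tpm) & set(sbs_tpm))
    ranked = sorted(shared, key=lambda s: abs(mpss_tpm[s] - sbs_tpm[s]), reverse=True)
    dropped = set(ranked[:n_outliers])
    kept = [s for s in shared if s not in dropped]
    return pearson([mpss_tpm[s] for s in kept], [sbs_tpm[s] for s in kept])


# Toy data: four concordant signatures and one extreme outlier ("e").
toy_mpss = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 1000}
toy_sbs = {"a": 2, "b": 4, "c": 6, "d": 8, "e": 1}
r_all = correlation_without_outliers(toy_mpss, toy_sbs, n_outliers=0)
r_filtered = correlation_without_outliers(toy_mpss, toy_sbs, n_outliers=1)
```

In the toy data a single highly abundant, discordant signature dominates the coefficient, which mirrors the sensitivity to a handful of signatures reported for the YR15965 libraries.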
Expression patterns of grain quality-related genes in the cultivars with high milling yield and good eating quality
Data mining of the TIGR rice annotated genes (pseudomolecules version 5) identified 338 grain quality-related genes belonging to starch biosynthesis and degradation, seed storage protein synthesis (glutelin, globulin, and prolamins), seed maturation, seed allergen synthesis, seed development, and biosynthesis and degradation of aspartate family amino acids (aspartate, asparagine, threonine, isoleucine, methionine, and lysine). We examined the expression level of these genes in developing rice seeds of the five cultivars (Additional File 3). In both SBS and MPSS libraries, a total of 419 (16 grain-related genes) and 168 genes (3 grain-related genes) were ≥5-fold up- and down-regulated, respectively, in Cypress relative to both LaGrue and Nipponbare (Table 3). Similarly, 518 (8 grain-related genes) and 106 genes (4 grain-related genes) were ≥5-fold up- and down-regulated, respectively, in Ilpumbyeo relative to both YR15965 and Nipponbare (Table 3). The numbers of 5-fold up- and down-regulated antisense genes, genes with antisense transcripts, and genes encoding transcription factors (TFs) in Cypress (compared to both LaGrue and Nipponbare) and Ilpumbyeo (compared to both YR15965 and Nipponbare) are also listed in Table 3.
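The selection rule used above (≥5-fold change relative to both control cultivars, observed on both platforms) can be sketched as below. The 1 TPM floor that avoids division by zero is an assumption of this sketch, not a value from the study, and all function names and TPM values are hypothetical.

```python
def fold_change(target_tpm, control_tpm, floor=1.0):
    """Ratio of target to control abundance; the floor (assumed) avoids dividing by zero."""
    return max(target_tpm, floor) / max(control_tpm, floor)


def classify_gene(target_tpm, control_tpms, min_fold=5.0):
    """Return 'up', 'down', or 'unchanged' relative to ALL controls."""
    folds = [fold_change(target_tpm, c) for c in control_tpms]
    if all(f >= min_fold for f in folds):
        return "up"
    if all(f <= 1.0 / min_fold for f in folds):
        return "down"
    return "unchanged"


def classify_on_both_platforms(mpss, sbs, min_fold=5.0):
    """Keep a call only if MPSS and SBS agree; each argument is (target, [controls])."""
    mpss_call = classify_gene(mpss[0], mpss[1], min_fold)
    sbs_call = classify_gene(sbs[0], sbs[1], min_fold)
    return mpss_call if mpss_call == sbs_call else "unchanged"


# Hypothetical TPM values: Cypress versus (LaGrue, Nipponbare).
call_up = classify_gene(100, [10, 20])
call_down = classify_gene(1, [10, 20])
call_mixed = classify_gene(100, [10, 50])
call_both = classify_on_both_platforms((100, [10, 20]), (300, [30, 40]))
```

Requiring the threshold against every control, rather than against their average, keeps a gene from being called differentially expressed when it differs from only one of the two comparison cultivars.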
Interestingly, we found that genes encoding enzymes involved in the biosynthesis of starch underwent alternative splicing (Figure 1). For example, genes involved in the breakdown of long linear glucan leading to β-D-glucose-6-phosphate (Os03g55090 encoding phosphorylase and Os03g50480 encoding phosphoglucomutase) had alternative splicing forms. Some of the 5-fold up-regulated genes identified by either MPSS or SBS also had alternative splicing forms, and these included genes encoding glucose-1-phosphate adenylyltransferase large subunit 1 (also called AGPase) (Os01g44220) and 1,4-α-glucan branching enzyme (Os06g51084). Similarly, some of the 5-fold down-regulated genes identified by either MPSS or SBS produced alternative splicing forms, and these included genes encoding 1,4-α-glucan branching enzyme (Os06g51084) and phosphoglucomutase (Os03g50480) (Additional File 4A, 4B, 4C). These results showed the complexity of the transcription of quality-related genes in developing rice seeds.
For validation of the MPSS data, two starch biosynthesis-related genes that showed differential expression in the grain libraries were selected for strand-specific RT-PCR. These two genes encode AGPase (AK073146) and GBSS I (AK070431). Total RNA was isolated from the developing seeds of Cypress, LaGrue, Ilpumbyeo, YR15965, and Nipponbare at 3, 6, 9, 12, and 15 DAF (days after flowering). A time-course study of the AGPase and GBSS I genes indicated that expression levels were higher in the high milling Cypress than in the low milling LaGrue in the early stages (6 and 9 DAF) of seed development (Figure 2).

Genes encoding essential amino acids
The aspartate family pathway consists of five amino acids (asparagine, aspartate, lysine, methionine, and threonine), and is catalysed primarily by the enzymes aspartate kinase (AK) and dihydrodipicolinate synthase (DHPS). The regulatory network of the genes involved in the biosynthesis and degradation of aspartate family amino acids is plotted in Additional File 5. The genes involved in the metabolism of the aspartate family amino acids with 5-fold up- or down-regulation in Cypress and Ilpumbyeo compared to their controls (LaGrue, YR15965, and Nipponbare) are listed in Additional Files 4A, 4B, 4C, and 5. Some of the important genes for amino acid biosynthesis showed similar expression patterns in both MPSS and SBS libraries (Table 4 and Additional File 4A, 4B, 4C). For example, the genes encoding aspartate transaminase (Os01g55540), methionine adenosyltransferase (Os01g22010), and acetolactate synthase (Os03g21080) were 5-fold up-regulated in Ilpumbyeo compared to YR15965 and Nipponbare in both SBS and MPSS libraries. In contrast, some of the genes involved in aspartate family amino acid biosynthesis were down-regulated, including those encoding threonine synthase (Os01g49890), aspartate kinase (Os03g63330), and malate dehydrogenase (Os10g33800) (Additional File 4A, 4B, 4C). In addition, many of the genes involved in amino acid biosynthesis also underwent alternative splicing. Among them, some showed 5-fold up-regulation in Ilpumbyeo in either the MPSS or SBS libraries, and these included genes encoding L-3-cyanoalanine synthase (Os04g08350), methionine gamma-lyase (Os09g28050), and asparaginase (Os04g46370), which showed two, two, and three alternative splice forms, respectively (Additional File 4A, 4B, 4C).

Genes encoding seed maturation and allergenic and seed-specific expression proteins
Some of the genes belonging to this group showed similar expression patterns in both MPSS and SBS libraries (Table 4 and Additional File 4A). For example, the genes encoding seed-specific protein Bn15D14A (Os03g58480) and seed-maturation protein LEA4 (Os09g10620) were > 5-fold up-regulated in Cypress compared to LaGrue and Nipponbare in both MPSS and SBS libraries. However, the seed-allergenic protein RA5 precursor gene (Os07g11510) was up-regulated 15-fold in Ilpumbyeo compared to YR15965 (Table 4; Additional File 4A).
Expression patterns of TF genes in cultivars with high milling and good eating quality
Many TF genes were up- and down-regulated in Cypress and Ilpumbyeo compared to the controls (Table 3; Additional File 6). A total of 37 and 14 TF genes showed 5-fold up-regulation in Cypress and Ilpumbyeo libraries, respectively, in both SBS and MPSS libraries (Additional File 6). Similarly, 50 and 5 TF genes were down-regulated in Cypress and Ilpumbyeo, respectively, in both libraries. Some TFs were specifically up-regulated in either Cypress or Ilpumbyeo compared to the controls in both libraries. These TF genes encode PHD-finger family protein (PHD family; Os01g65600), zinc finger CCCH type domain containing protein ZFN-like 2 (C3H family; Os01g68860), transfactor (G2-like; Os06g40710), and bZIP transcription factor family protein (bZIP family; Os06g45140) (Additional File 6).

Identification of the conserved cis motifs among the up-regulated genes in cultivars with high milling and good eating quality
The promoter sequences (1.0 kb before the ATG site) of the highly up-regulated genes (≥50-fold) in Cypress (compared to LaGrue and Nipponbare) and Ilpumbyeo (compared to YR15965 and Nipponbare) identified in both SBS and MPSS libraries were analyzed using the 'PLACE Signal Scan Search' software (http://www.dna.affrc.go.jp/htdocs/PLACE/). Many conserved motifs were present in the up-regulated genes in Cypress and Ilpumbyeo, and these included CAATBOX1, WRKY71OS, GATABOX, EBOXBNNAPA, SEF4MOTIFGM7S, CGACGOSAMY3, WBOXHVISO1, CAREOSREP1, CANBNNAPA, AMYBOX1, AACACOREOSGLUB1, BOXIIPCCHS, 2SSEEDPROTBANAPA, ACGTABOX, AMYBOX2, ACGTCBOX, ACGTOSGLUB1, CEREGLUBOX2PSLEGA, and GADOWNAT (Additional File 7). Interestingly, many of the motifs have been reported to play a role in seed development and germination (Additional File 7).
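A scan of a promoter sequence for cis motifs, of the kind performed by the PLACE tool, can be sketched in a few lines. This is not the PLACE implementation: it searches the forward strand only, the function names and the toy sequence are hypothetical, and the consensus strings given for the four motifs are assumptions that should be checked against the PLACE database before use.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Assumed consensus sequences for four of the motifs named in the text.
MOTIFS = {
    "CAATBOX1": "CAAT",
    "WRKY71OS": "TGAC",
    "GATABOX": "GATA",
    "EBOXBNNAPA": "CANNTG",
}


def motif_to_regex(consensus):
    """Compile an IUPAC consensus into a lookahead regex so overlapping hits are found."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile("(?=(" + body + "))")


def scan_promoter(sequence, motifs=MOTIFS):
    """Return {motif name: [0-based start positions]} on the forward strand."""
    sequence = sequence.upper()
    return {
        name: [m.start() for m in motif_to_regex(consensus).finditer(sequence)]
        for name, consensus in motifs.items()
    }


# Toy 14-base sequence standing in for a 1.0 kb upstream region.
toy_hits = scan_promoter("GGCAATTGACGATA")
```

Short motifs such as these occur frequently by chance, so hit counts from a scan like this are only meaningful when compared against a background set of promoters.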

Discussion
Rice is a major source of nutrition for most people in the developing world. Although tremendous achievements have been made in the improvement of many agronomic traits in rice in the last three decades, much less progress has been made for quality traits due to the lack of simple and efficient selection methods in rice breeding. With rapid advancement in crop molecular breeding, marker-aided selection has been successfully applied in many crop plants. Similarly, new methods for genetic engineering of better crop plants by overexpressing or silencing candidate genes have been reported in the last decade. Although several eating quality QTLs have been identified in previous studies [18,19], it is not clear whether these QTLs are useful for marker-aided selection because the genomic regions of these QTLs have not been further characterized. Recently, Nelson et al. [9] identified six main-effect milling yield QTLs in two RIL populations derived from crosses of the common parent Cypress with RT0034, a low-milling yield japonica line, and LaGrue, a low-milling yield japonica cultivar, respectively. In this study, we used two high throughput sequencing technologies to profile the transcriptome of five cultivars differing in milling yield and eating quality. Many genes specifically or commonly expressed in the high milling yield cultivar Cypress and the good eating quality cultivar Ilpumbyeo were identified from the MPSS and SBS libraries. These candidate genes are excellent starting materials for the development of molecular markers linked to milling quality in the US and eating quality in Korea for rice breeding. It is also possible that overexpression or silencing of some candidate genes will lead to the generation of transgenic rice plants with superior grain quality.
During rice seed development, sugars, amino acids, and other important metabolites are transported from source (primarily leaves) to sink (seeds). Once in the seeds, these metabolites are allocated to different biosynthetic pathways (primarily starch metabolism and storage protein biosynthesis) to produce mainly starch and proteins in precise quantities and ratios. Achieving such a defined composition of starch and proteins requires the regulation and coordination of various pathways so that, at each developmental stage, the participating enzymes are present in appropriate amounts and in the correct cellular compartments [3]. AGPase and GBSS I play important roles during starch biosynthesis in rice [71]. The genes encoding the AGPase and GBSS I enzymes are highly expressed 7 to 28 days after flowering during grain development, and their expression is highly correlated with the increases in both starch content and grain weight. The AGPase gene is also highly expressed in the high-yield cultivars of both glutinous and non-glutinous rice [71]. In addition, AGPase (Os01g44220) undergoes alternative splicing similar to the AGPase small subunit gene in barley [72]. Duan and Sun [3] showed that a mutation in the GBSS I gene leads to a lower level of functional GBSS I mRNA and correspondingly to a lower level of GBSS I enzyme for amylose synthesis, which causes a reduction in amylose accumulation. During rice seed formation, the genes encoding AGPases are active 3 days before flowering and maintain an intermediate although declining level of activity during seed maturation [3]. A survey of genetic variation showed that polymorphism in the rice waxy gene encoding the GBSS enzyme explains much of the variation in apparent amylose content across 92 important long, medium, and short grain US rice cultivars and 101 progeny of a cross between low-amylose and intermediate-amylose breeding lines [73,74].
Venu et al. BMC Genomics 2011, 12:190 http://www.biomedcentral.com/1471-2164/12/190
The amylose content and the level of waxy protein in 31 rice cultivars from China were correlated with the ability of the cultivar to excise intron I from the leader sequence of the Wx transcript [75]. In this study, we found that the important starch biosynthesis-related genes encoding AGPase (Os01g44220), 1,4-α-glucan branching enzyme (Os02g32660), limit dextrinase (Os04g08270), 1,4-α-glucan branching enzyme (Os06g51084), and α-amylase (Os09g29404) were up-regulated in Cypress compared to LaGrue and Nipponbare in 6-day-old developing seeds. Our time-course RT-PCR analysis also confirmed that expression of the AGPase and GBSS I genes was higher in the high milling cultivar Cypress than in the low milling cultivar LaGrue early (6 and 9 DAF) in seed development. These results suggest that these two genes related to starch synthesis may greatly affect milling yield. Starch biosynthesis is also associated with complex genotypic-environmental interactions in maize endosperm [76]. Since the plants in this study were grown under controlled environmental conditions (growth chambers), the effect of environmental factors on the expression of the starch biosynthesis genes should be tested under field conditions.
Cereal proteins are generally deficient in lysine, but lysine content might be increased with increased accumulation of the precursor molecules required for the enzymatic reactions involved in lysine metabolism. The key precursor molecules include lactate, acetyl CoA, malate, L-aspartate, L-asparagine, L-aspartate-semialdehyde, homoserine, homocysteine, 2-oxobutanoate, 2-aceto-1-hydroxybutyrate, and α-ketoglutarate, and the enzymes involved in their production are very important (Additional File 5). Enhancing the production of these precursor molecules will require the identification of the genes encoding these enzymes. In this study, we found that the genes encoding malate dehydrogenase (Os03g56280, Os01g46070) and aminotransferase (Os09g28050, Os03g18810) involved in the production of malate and aspartate in Cypress and Ilpumbyeo, respectively, were up-regulated compared to the controls. Genes encoding aspartate transaminase (Os01g55540) and enoyl-CoA hydratase (Os02g43720) enzymes, which are responsible for the production of acetyl CoA, were also up-regulated in Cypress compared to the controls. Similarly, the gene encoding lactoylglutathione lyase (Os05g07940), which is responsible for the production of lactate, was up-regulated in Cypress compared to the controls. As indicated, genetic manipulation of the expression levels of these precursors/enzymes may lead to an increased accumulation of lysine in the endosperm and thus an increased nutritional value of the rice seeds.
In the last decade, oligoarrays, SAGE, MPSS, and SBS have been widely used for transcriptome profiling. MPSS and SBS have recently been used for whole-genome transcription analysis and have generated abundant expression data for many organisms [42,44,45]. In this study, both MPSS and SBS technologies were used to analyze the transcriptomes of 6-day-old developing seeds in five rice cultivars. The numbers of redundant and non-redundant signatures generated in this study were similar to those in previous reports in rice and Arabidopsis [43,45,77]. Although MPSS generates a large volume of data, its complicated library-construction procedure and high sequencing cost limit its use in individual laboratories. As the cost of next-generation sequencing methods has significantly decreased in the last few years, SBS sequencing has become a popular method for transcriptome analysis because it costs 90% less than MPSS and can generate at least three times more transcripts. Furthermore, in the current study, about 30% more transcripts were found in the SBS library than in the MPSS library. Many of these additional signatures are low-copy transcripts, indicating that SBS is a powerful method for identifying rare transcripts [43]. The correlation coefficient is higher between MPSS and SBS than between RL-SAGE and microarray [78], or between RL-SAGE and MPSS or MPSS and microarrays as in previous studies [79]. Therefore, SBS will undoubtedly become the preferred high throughput sequencing method for deep transcriptome analysis in plants.

Conclusion
Breeding for milling yield and eating quality in rice has been a daunting task due to the low heritability of both traits and the lack of molecular markers linked to the phenotypes. Genetic mapping of the two traits is also challenging because the traits are easily affected by environmental factors in the field. Using two high throughput sequencing methods, we identified many differentially expressed genes in developing rice seeds that may affect milling yield or eating quality. Many of the identified genes are involved in the biosynthesis of starch, aspartate family amino acids, and storage proteins. Some of these potential candidate genes could be used for the development of molecular markers for breeding programs or for the engineering of rice cultivars with high milling yield and good eating quality. Our study provides a valuable genomic resource both for the improvement of rice grain quality and for the characterization of grain quality pathways at the molecular and biochemical levels.

Plant materials, developing seeds harvest and growth conditions
Five rice cultivars, Cypress, LaGrue, Ilpumbyeo, YR15965, and Nipponbare, were used in the study. Cypress (japonica cultivar) is a long grain cultivar with high yield and high milling quality released by Louisiana State University. Cypress dries down slowly in the field, avoiding grain fissuring, cracking, and chalkiness that reduce milling quality (http://agebb.missouri.edu/rice/research/99/pg5.htm) [80][81][82][83][84]. LaGrue (japonica cultivar), a long grain variety released by the University of Arkansas in 1993, has low milling quality [80][81][82][83][84]. Both Cypress and LaGrue seeds were provided by Dr. Robert Fjellstrom, USDA-ARS Dale Bumpers National Rice Research Center, Stuttgart, Arkansas, USA. Ilpumbyeo (japonica cultivar) is a good eating quality cultivar with low amylose content [85][86][87]. YR15965 (japonica cultivar) is a low eating quality rice derived from a cross between Hwayeongbyeo (temperate japonica variety) and Shennung 89-366 (sub-tropical japonica) [86]. Both Ilpumbyeo and YR15965 seeds were provided by Dr. Gynheung An, Crop Biotech Institute, Kyung Hee University, Korea. Nipponbare (japonica cultivar) was used as a control for milling and eating quality with Cypress, LaGrue, Ilpumbyeo, and YR15965. All five cultivars were grown in 3 replications in a Conviron growth chamber at 80% relative humidity with 12 h of light (500 μmol photons m⁻² s⁻¹) at 26°C followed by 12 h of dark at 20°C. The spikelets were labeled on the day of anthesis to identify the age of developing seeds in a panicle. The developing seeds were harvested from the panicles at 3, 6, 9, 12, and 15 days after anthesis. The developing seeds excised from the panicle were frozen immediately in liquid nitrogen.

RNA isolation and RT-PCR
Total RNA was isolated from developing rice seeds harvested from Cypress, LaGrue, Ilpumbyeo, YR15965, and Nipponbare plants using Trizol reagent (Invitrogen). For removal of polysaccharides/polyglycons from the extract, the extracted RNA was purified twice by high salt precipitation according to the manufacturer's instructions. For the MPSS and SBS library construction, RNA isolated from 6-day-old developing seeds (intermediate stage of grain filling) was used. For the time-course RT-PCR validation experiments, RNA isolated from 3-, 6-, 9-, 12-, and 15-day-old developing seeds was used. RT-PCR was performed as described previously [78].

MPSS and SBS library construction, sequencing, and bioinformatics
MPSS and SBS libraries were constructed using the RNA obtained from 6-day-old developing seeds from Cypress (MPSS library-PSC; SBS library-SPSC), LaGrue (MPSS library-PSL; SBS library-SPSL), Ilpumbyeo (MPSS library-PSI; SBS library-SPSI), YR15965 (MPSS library-PSY; SBS library-SPSY), and Nipponbare (MPSS library-PSN; SBS library-SPSN). MPSS and SBS library construction and sequencing were performed essentially as previously described [43,45,77]. Data analysis was carried out to identify the genes responsible for milling quality and eating quality. The expression profiles of Cypress were compared with those of LaGrue and Nipponbare to identify the genes responsible for milling quality. Similarly, the expression profiles of Ilpumbyeo were compared with those of YR15965 and Nipponbare to identify the genes responsible for eating quality. Bioinformatic analyses including identification of antisense transcripts, alternate transcripts, and TFs were conducted as previously described [43]. The Gramene database (http://www.gramene.org) was used as a reference database for the identification of genes involved in starch metabolism, aspartate amino acid metabolism, storage and allergenic protein synthesis, and seed maturation [88]. The entire dataset is available at the NCBI's Gene Expression Omnibus (GEO) database through the accession numbers GSM629225 to GSM629233.