Genome-wide transcriptome analyses of developing seeds from low and normal phytic acid soybean lines

Background Low phytic acid (lpa) crops are potentially eco-friendly alternative to conventional normal phytic acid (PA) crops, improving mineral bioavailability in monogastric animals as well as decreasing phosphate pollution. The lpa crops developed to date carry mutations that are directly or indirectly associated with PA biosynthesis and accumulation during seed development. These lpa crops typically exhibit altered carbohydrate profiles, increased free phosphate, and lower seedling emergence, the latter of which reduces overall crop yield, hence limiting their large-scale cultivation. Improving lpa crop yield requires an understanding of the downstream effects of the lpa genotype on seed development. Towards that end, we present a comprehensive comparison of gene-expression profiles between lpa and normal PA soybean lines (Glycine max) at five stages of seed development using RNA-Seq approaches. The lpa line used in this study carries single point mutations in a myo-inositol phosphate synthase gene along with two multidrug-resistance protein ABC transporter genes. Results RNA sequencing data of lpa and normal PA soybean lines from five seed-developmental stages (total of 30 libraries) were used for differential expression and functional enrichment analyses. A total of 4235 differentially expressed genes, including 512-transcription factor genes were identified. Eighteen biological processes such as apoptosis, glucan metabolism, cellular transport, photosynthesis and 9 transcription factor families including WRKY, CAMTA3 and SNF2 were enriched during seed development. Genes associated with apoptosis, glucan metabolism, and cellular transport showed enhanced expression in early stages of lpa seed development, while those associated with photosynthesis showed decreased expression in late developmental stages. The results suggest that lpa-causing mutations play a role in inducing and suppressing plant defense responses during early and late stages of seed development, respectively. Conclusions This study provides a global perspective of transcriptomal changes during soybean seed development in an lpa mutant. The mutants are characterized by earlier expression of genes associated with cell wall biosynthesis and a decrease in photosynthetic genes in late stages. The biological processes and transcription factors identified in this study are signatures of lpa-causing mutations. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-2283-9) contains supplementary material, which is available to authorized users.


Background
Soybean (Glycine max (L.) Merr.) seed is one of the most important agricultural commodities produced worldwide, generating oils, proteins, and carbohydrates [1]. Final seed composition is influenced by both the genotype and environmental factors [2][3][4][5]. Breeding programs endeavor to improve the functional properties, and hence the economic value of soybean by reducing anti-nutritive seed components such as phytic acid. Phytic acid (PA), a major source of phosphorus in seeds, can cause problems such as poor mineral bioavailability and phosphate pollution [6,7]. Low PA (lpa) crops are therefore highly desirable for reducing anti-nutritional and environmental effects of conventional crops [8][9][10]. The lpa soybean line 'V99-5089' carries a non-lethal, recessive mutation in the myo-inositol phosphate synthase (MIPS) 1 gene, whereas 'CX-1834' carries mutations in two multidrug resistant protein (MRP) genes, encoding ATP-binding cassette transporters [11][12][13][14][15][16]. The MIPS1 gene, expressed during seed development in soybean, is associated with the conversion of glucose-6-phosphate to myo-inositol-3monophosphate, which is the first step in PA biosynthesis pathway [17,18]. Loss of function mutation in this gene disrupts this pathway. The MRP genes are also highly expressed in developing embryos, however, the mechanism by which they regulate PA levels in soybean is poorly understood [13].
The PA biosynthesis pathway plays a vital role in maintaining homeostasis. Several pathway intermediates, such as myo-inositol-1,4,5-trisphosphate, act as secondary messengers in signal transduction and are known to regulate growth and developmental processes, including phosphorus and mineral storage, DNA repair, chromatin remodeling, RNA editing and export, ATP generation, regulation of gene expression, regulation of guard cells, auxin metabolism and cell-wall polysaccharide biosynthesis [17,[19][20][21][22][23][24]. Numerous studies have reported the effects of lpa on plant growth and development. An RNAi-mediated mips1 knockdown in soybean was reported to inhibit seed development along with reduced PA content [25]. Similarly, seed embryo defects were reported for Arabidopsis and common bean (Phaseolus vulgaris L.) mips mutants [26,27]. The mips mutation regulates raffinose family oligosaccharide pathway, and mutants exhibit impaired pathogen resistance, programmed cell death in leaves, and polar auxin transport causing deformed cotyledon development [26,[28][29][30][31][32][33]. The mrp mutants are known to exhibit lpa phenotypes in soybean, rice, maize, and Arabidopsis [12,13,34,35]. MRP knockout studies in Arabidopsis exhibit phenotypes such as insensitivity to abscisic acid-mediated germination and unresponsive stomata opening, resulting in reduced transpiration rate and increased drought tolerance; which were rescued by MRP overexpression [36]. Finally, lpa crops are known to show poor seedling emergence, resulting in reduced crop yield, which decreases the agronomic value of lpa crops [37][38][39].
Despite these diverse physiological responses of different lpa mutations, very little is known about the effect of combining lpa mutations together on seed development and the underlying regulation of gene expression in soybean. Bowen et al. [40] investigated microarray-based gene expression changes in developing embryos of barley lpa mutant. This study identified several differentially expressed genes associated with different cellular processes, such as carbohydrate metabolism, hormonal signaling and transport [40]. Recent developments in sequencing technologies have enhanced the scope of genome-wide gene expression studies to a level far beyond microarray. An advanced generation recombinant inbred line (RIL) 3mlpa with three lpa-causing mutations, derived from a bi-parental cross of V99-5089 and CX-1834, provides a higher reduction in PA content of soybean seed relative to those with single mutations [41]. The 3mlpa triple mutant line and another advanced generation RIL 3MWT without any lpa-causing mutations, derived from the same cross (Table 1) are unique genetic materials for gene expression studies. In this report, we used mRNA-sequencing (RNA-Seq) to study the effect of lpa-causing mutations from MIPS1 and MRP genes on global changes in gene expression profiles of developing soybean seeds. A total of 30 transcriptome datasets derived from five developing seed stages with three biological replicates each of 3mlpa and 3MWT soybean line were sequenced and analyzed. To the best of our knowledge, this is the first extensive report describing the gene regulatory effect of MIPS1 and MRP mutations together. The significantly enriched biological processes and transcription factors identified in this study further our understanding of seed development lpa mutants.

Differential expression analyses
Whole soybean seeds comprised of cotyledons, endosperm, and seed coat, were sampled in triplicate from 3mlpa and 3MWT lines at five stages of seed development (Fig. 1). These stages were chosen to correspond to seed filling stages post embryo development and before desiccation [2]. More than 961 million sequencing reads were obtained from 30 mRNA libraries, and 86.69 % (more than 833 million) of the reads were mapped to the annotated soybean reference, Williams 82 genome assembly 1.0 sequence (Fig. 2, Additional file 1). Read counts (number of reads mapping to a given gene) were estimated from the sequence mapping data for all the annotated gene models (total of 54,175) of Williams 82 genome (annotation v1.1). Normalized read count data were used for differential expression analyses. Principal component analysis (PCA) and sample-to-sample distance clustering variance stabilized log 2 transformed normalized read count data for all genes from 30 sample libraries, are shown in Fig. 3. Sample libraries generated from different seed developmental stages were distinctly represented along PC1, in a unidirectional pattern starting from stage 1 to 5 in PCA; whereas those generated from the 3mlpa line were clearly differentiated from their respective 3MWT libraries along PC2 (Fig. 3a). This means that first and second major contributors to variation in the data are seed developmental stages and genotypes, respectively. At the same time, the biological replicates of each stage clustered together, suggesting minimal variance between replicates. Similarities and dissimilarities between individual sample libraries were visualized using heatmap of sample-to-sample distance clustering as shown in Fig. 3b. The sample libraries from early (stages 1-2) and late (stages 4-5) seed development stages were dissimilar to each other, while those from stage 3 are partially similar to both groups.
We identified a total of 6988 (4235 unique) genes with significant differential expression between the 3mlpa and 3MWT lines at five seed developmental stages ( Table 2). Some of these DEGs may be due to non-isogenic background of the experimental lines. Of these differentially expressed genes (DEGs), 3321 (47.5 %) and 3493 (48.5 %) genes were identified as up-and down-regulated in the 3mlpa line. About 174 (2.5 %) and 102 (1.5 %) genes were expressed only in either 3mlpa line or 3MWT, respectively ( Table 2). Several genes were differentially expressed in more than one stage, and 192 genes were differentially expressed between 3mlpa versus 3MWT in all five stages of seed development (Fig. 4).

Functional enrichment analyses
Functional enrichment of gene ontology (GO) and transcription factor (TF) families employing DEGs for each stage was performed using a statistical hyper-geometric test employing the Benjamini-Hochberg method for multiple testing to obtain adjusted-P-values. The enriched results were then filtered using adjusted-P-value < =0.01 (or 1 % FDR) to obtain highly significant enriched terms. Table 3 shows highly significant enriched GO terms identified at five seed developmental stages. Although some terms were found in more than one stage or were stagespecific, none were common in all five stages. In order to simplify the interpretations, we grouped the stages into early (stages 1-2) and late phase (stages 4-5) of seed development. Enriched GO terms from stage 3, showed partial overlap with both early and late phases. Most of the DEGs associated with significantly enriched biological processes in early phase of seed development were up-regulated, whereas those in late phase were down-regulated in 3mlpa line. Hierarchical clustering of mean normalized gene expression levels of DEGs associated with the enriched biological processes of cellular glucan metabolism, apoptosis, cellular transport, photosynthesis and glycolysis is shown in Fig. 5. The biological processes of cellular glucan metabolism, apoptosis and cellular transport were up-regulated in early stages ( Fig. 5a-d), while those associated with photosynthesis and glycolysis were downregulated in late stages ( Fig. 5e-f ) of 3mlpa line. The relative gene expression data from RNA-Seq analysis for two randomly selected DEGs (CESA: Glyma12g36570, and SugT1: Glyma08g06420) were evaluated using qPCR (Fig. 6). The genes CESA and SugT1 are associated with cellular glucan metabolism and cellular transport respectively. The RNA-Seq analysis showed up-regulation of CESA and SugT1 genes in 3mlpa in early stages of seed development. Both real time quantification and RNA-Seq analysis showed similar expression pattern for the CESA and SugT1 genes. There was no significant difference in the gene expression data obtained from qPCR and RNA-Seq analyses at the 0.01 significance level. We also tested relative expression of MIPS1, MRP-L and MRP-N genes, which were not differentially expressed in any of our seed stages. Both qPCR and RNA-Seq analyses estimated similar fold change expression with no significant difference between them (Additional file 2: Figure S1).
Regulation of defense response in early seed development of 3mlpa mutant ' Apoptosis' (GO:0006915) and 'Innate immune response' (GO: 0045087) were enriched in early phases of seed development, particularly in stages 1-3 and stage 3 respectively, with different sets of DEGs identified for these stages (Table 3). While several of these genes are shared between these two categories, a total of 61 DEGs were associated with both. Forty-seven of these genes (77 %) were up-regulated and 14 (23 %) genes were down-regulated in the 3mlpa mutant line. The FC and log 2 FC ratio of apoptosis-related differentially expressed genes are reported in Additional file 3. These defenserelated DEGs encoded (a) LRR (leucine rich repeat) domain-containing disease resistance proteins, (b) NB-ARC (nucleotide-binding adaptor shared by APAF-1, R proteins, and CED-4) domain-containing disease resistance proteins, (c) Bcl-2-associated athanogene 1, (d) ADR1-L1 (Activated Disease Resistant 1-like 1), (e) cysteine proteinases, (f ) protein kinase and (g) NTP hydrolases. Apoptosis is a common phenotype observed in mips1 mutants of Arabidopsis thaliana [26,30,43].
The defense-related genes identified in this study lie upstream of the apoptotic processes. For example, the LRR and NB-ARC domain-containing disease resistance genes are involved in initiating the defense response via induction of salicylic acid (SA) and formation of reactive oxygen species (ROS), both of which ultimately leading to cell death. Arabidopsis thaliana mips1 mutants were shown to accumulate SA and ceramide, both associated with formation of a cell death lesion area, with the phenotype rescued by treating with either myo-inositol or galactinol [26,30,43]. Apoptosis in mips1 mutants is also regulated by light intensity and ROS such as peroxisomal hydrogen peroxide and the superoxide anion [26,30,43]. Myo-inositol mitigates SA-dependent apoptosis and defense responses [43]. Although apoptosis is associated with mips1 mutation and not with mrp mutation, induction of the defense-related genes in 3mlpa may not be solely due to mips1 mutation. The ADR1-L1 genes (Glyma14g08700 and Glyma17g36420), which belong to a subgroup of CNL-A clade of coiled-coil NBS-LRR gene family, showed two-fold higher expression in 3mlpa mutant [44]. The ADR1 genes are positive regulators of SA-mediated cell death and defense response [45]. Enhanced expression of ADR1 gene was previously shown to establish drought tolerance in presence of SA [46]. And interestingly, the mrp5 mutants of Arabidopsis are also known to be drought tolerant [36]. Role of MRP genes in regulating defense responses is still unknown, and further studies are required to confirm the association of the drought tolerant phenotype of the mrp mutants and up regulation of ADR1 genes as in 3mlpa. In addition, several WRKY transcription factor family genes that regulate expression of other defense-related genes were also up-regulated in early stages of seed development in 3mlpa mutant. In summary, the lpacausing mutations in 3mlpa mutant may play a role in initiating defense responses during seed development.
Regulation of photosynthesis in late stages of seed development in 3mlpa mutant 'Photosynthesis' (GO:0015979) was enriched in late phase (stages 3-5) of seed development, with most of the photosynthesis-related DEGs from these stages were down-regulated in 3mlpa mutant (Table 3, Fig. 5). Fiftysix DEGs were associated with photosynthesis and of these only one gene (Glyma15g40131) was up-regulated in 3mlpa mutant. These DEGs genes encode different subunits in photosystem (PS) I (psaD, psaE, psaF, psaG, psaH, psaK, psaL, and psaN), and PS II (psbA, psbE, psbP, psbQ, psbW, psbX, and psbY), light-harvesting chlorophyll complex proteins from PS I (LHCA1 and LHCA2) and PS II (LHCB1, LHCB4, LHCB5), and a single gene encoding for magnesium protoporphyrin IX methyl transferase (CHLM) (Additional file 4). A previous study on transcriptome profiles of soybean cultivar 'Williams' reported expression of photosynthesis-related genes early in development when seed size was 25-50 mg [47]. This would correspond to stages 2-3 in our investigation. Bowen et al. [40] previously reported two photosynthesis related probesets that were differentially expressed in Fig. 4 Overlap of differentially expressed genes between developmental stages. Each oval in this Venn diagram corresponds to a seed developmental stage. Numbers within each oval represents significant differentially expressed genes between 3mlpa and 3MWT overlapping between-or unique to-seed developmental stages Out of total 6988 DEGs identified in five seed developmental stages, 4235 were unique, meaning only counted once. Remaining genes were repeatedly identified in more than one stage M955 maize line with lpa at 7 days post anthesis. One representing chloroplast nucleoid DNA binding protein was down-regulated, while another representing early light inducing protein, was up-regulated in M955 line [40]. Although these proteins were not identified in our study, regulation of photosynthesis-related genes in lpa mutants is a common observation. The PSI and PSII protein complexes are involved in capturing light energy during photosynthesis. Differential expression of these genes may translate into different numbers of PSI and PSII protein complexes between 3mlpa and 3MWT. The methyl transferase CHLM is involved in chlorophyll metabolism where it catalyzes the formation of Mg-protoporphyrin IX monomethyl ester [48]. Bruggeman et al. [49] have reviewed the association of chloroplast activity with the formation of cell death lesions, where excess light energy can trigger cell death. Down regulation of photosynthesis-related genes in 3mlpa may prevent apoptosis by limiting the light energy during late stages of seed development. Cell death lesion formation in mips1 Arabidopsis mutants is dependent on light intensity and ROS accumulation [26,30,43], and abolishing PSII assembly was shown to reduce cell death lesions [50]. Tetrapyrrole biosynthesis is also regulated to prevent accumulation of ROS, which can also lead to cell death [49]. Moreover, we also identified enhanced expression of CAMTA3/SIGNAL RE-SPONSIVE 1 (SR1) TF genes in 3mlpa during late stages of seed development. CAMTA3/SR1 is a negative regulator of SA-mediated plant immunity and it inhibits cell death [51]. Therefore, these observations strongly suggest that the up-regulation of cell death inducing genes in early stages of seed development in 3mlpa mutant is suppressed in late stages. Myo-inositol promotes photosynthesis in Mesembryanthemum crystallinum so another explanation for down-regulation of photosynthesis-related genes in 3mlpa mutant may be the mips1 mutation, which reduces myo-inositol levels [52]. The GO terms identified in this study are more specialized (or, child) terms in the GO hierarchy   Regulation of glucan metabolism in early seed development of 3mlpa mutant Cellular glucan metabolic process (GO:0006073) was found enriched in both stage 1 and 2 (early phase of seed development), however the DEGs associated with this process were stage-dependent (Table 3, the FC and log 2 FC ratios for these genes are reported in Additional file 5). Twenty-seven genes were up-regulated, and two genes (Glyma02g47076 and Glyma04g43470) were downregulated in 3mlpa mutant. Six genes were differentially expressed in both stages and all were up-regulated in 3mlpa mutant. The FC of up-regulated genes ranged between 1.6-60 and 1.3-18 for stages 1 and 2, respectively. In case of down-regulated genes, the FC was between 0.  [53]. Differential expression of the CESA and XET genes in early seed development suggests regulation of cell wall synthesis in 3mlpa mutant line. Role of myo-inositol in cell wall synthesis via the myo-inositol oxidation pathway, resulting in formation precursors of pectin and hemicellulose, is well established and regulation of cell wall biosynthesis in mips mutants is therefore expected [23,54]. The contribution of either the mips or mrp mutations towards this response lies out of scope of this experiment, however, it is possible that 3mlpa mutants initiate formation of cell wall polysaccharides at earlier stages of development relative to the non-mutants.

Regulation of cellular transport in early seed development of 3mlpa mutant
Processes involving oligo-peptide transporters (GO:0006857) and transmembrane (GO:0055085) transporters were enriched in stage 2 (Table 3). Although several transporter genes were differentially expressed in other stages, the GO term associated with the transporter activity was not significantly enriched in those stages. Together, oligo-peptide and transmembrane transport activity were associated with a total of 79 unique differentially expressed genes and 86 % of these genes were found up-regulated in the mutant line. Twenty-nine (44 %) of these up-regulated genes encode for multidrug transporters, including 15 from major facilitator superfamily (MFS), six from multidrug and toxic compound extrusion (MATE) efflux carrier superfamily, five from multidrug resistance superfamily, two genes for P-glycoprotein (PGP) and 1 gene for ATP binding cassette (ABC) subfamily (B4) (Additional file 6).
Most of the multidrug transporters are annotated as being involved in removal of toxic compounds from the cell [55]. Recently, a MFS transporter, Zinc-Induced Facilitator-Like 1 from Arabidopsis was reported to be associated with polar auxin transport in roots, as well as regulation of stomata for drought stress tolerance [56]. The ABC B4 transporters are also involved in auxin-gradient dependent polar auxin transport in root [57], and PGP transporters are involved in cellular and long distance transport of auxin [58]. Defects in mips mutant embryos were previously associated with impaired endomembrane system and lack of polar auxin transport [32]. The genes encoding monosaccharide transporters such as Sugar Transport Protein, Inositol Transporter and Polyol/Monosaccharide transporter were also differentially expressed in our dataset. These genes are involved in transport of sugars such as glucose, fructose, galactose, mannose, xylose, sorbitol, mannitol, xylitol, and epimers and derivatives of myo-inositol [59]. Other genes encoding transporters/carriers of cationic amino acids, oligopeptides, potassium, sulfate, nitrates, zinc, chloride and dicarboxylate ions were also identified in 3mlpa mutant line. Although substrates of MRP-L and MRP-N transporters are still unknown, enrichment of the transporter genes may be compensating for the loss of MRP-L and MRP-N.
Regulation of raffinose family oligosaccharide biosynthesis in developing seeds of 3mlpa mutant The raffinose family oligosaccharides such as raffinose and stachyose are considered to be anti-nutritive components in seeds [60]. The biosynthesis of raffinose family oligosaccharide (RFO) involves three steps: (a) galactinol synthase (GS) catalyzes the formation of galactinol from myo-inositol and UDP-galactose, (b) raffinose synthase (RS) catalyzes the formation of raffinose by adding sucrose to galactinol and releasing a myo-inositol molecule, and (c) stachyose synthase (SS) catalyzes the formation of stachyose by adding galactinol to raffinose and releasing myo-inositol (Fig. 7). The mips1 mutation is associated with low stachyose phenotype in the parental line V99-5089 [15]. However, the 3mlpa mutant shows normal stachyose levels in mature seeds despite having mips1 mutation [41]. The genes encoding RFO biosynthesis pathway enzymes are expressed in seed filling and dessication stages of soybean seed development [61]. None of these genes showed significant differential expression at 1 % FDR, which may be one possible explanation for normal stachyose phenotype in 3mlpa mutant. Figure 7 represents RFOs biosynthesis pathway, and the relative gene expression values for selected genes in the pathway. There are six genes (Glyma03g38080, Glyma03g38910, Glyma10g28610, Glyma19g40680, Glyma19g41550, and Glyma20g22700) annotated as GS in first assembly of reference soybean genome (Glyma.Wm82.a1.0). The Glyma19g41550 gene showed highest-FC in RNA-Seq experiment, so we estimated relative expression for this GS gene using qPCR. The relative transcript levels of GS gene showed a gradual increase in 3mlpa during seed development (Fig. 7a). RNA-Seq data showed similar gradual increase in expression in 3mlpa during seed development, but expression dropped at stage 4 (Fig. 7a). Glyma06g18890 encodes RS (also known as RS2 or Rsm1) is expressed during seed development in Williams 82 soybean line, and is known to control the raffinose and stachyose content in soybean seeds [60,62]. The relative transcript levels of the RS gene showed more than 2-fold change in 3mlpa in late phase (stages 3-5) of seed development, with no difference between the stages 3-5 (Fig. 7b). The fold change of RS remained below 2 in 3mlpa during early stages. This validates the RNA-Seq data except for stage 1, where fold change of RS was close to 3 in 3mlpa (Fig. 7b). Glyma19g40550 is the only SS encoding gene in Glyma.Wm82.a1.0. At stage 1 and 5, SS gene showed significant difference in the fold change obtained from qPCR and RNA-Seq analyses. The relative transcript levels of the SS gene showed almost no change between 3mlpa and 3MWT with fold change below 2 (Fig. 7c). It is possible that mechanisms other than differential expression are involved in regulation of RFO biosynthesis pathway, which does not allow us to elaborate further on these observations.

Conclusions
The PA biosynthesis pathway intermediates are involved in many growth and developmental processes. Several genes encoding key enzymes regulating this pathway are mutated to obtain lpa crops. We used a transcriptomics approach to compare the gene expression profiles of a lpa soybean line (3mlpa) carrying three (mips1, mrp-l, and mrp-n) mutations and a non-mutant line, 3MWT, at 5 stages of soybean seed development. The differential expression and functional enrichment analyses indicated regulation of biological processes such as glucan synthesis, cell death and photosynthesis. We also identified regulated transcription factor families, including WRKY, CAMTA, GRAS and ZIM. These results delineate the metabolic events associated with regulation of the PA biosynthetic pathway in the presence of lpa-causing mutations. We also quantified transcript levels of genes involved in the raffinose family oligosaccharides pathway in 3mlpa mutant. Overall, these results contribute to an understanding of regulation of metabolism in 3mlpa mutant during seed development.

Genetic material and background
The experimental lines of soybean (Glycine max (L.) Merr.) used in this study were: (a) a triple homozygous mutant line, designated as '3mlpa' (mips1/mrp-l/mrp-n) with lpa, and (b) a sibling line with homozygous nonmutant alleles, designated as '3MWT' (MIPS1/MRP-L/ MRP-N) with normal PA (Table 1). These lines were developed from a cross of V99-5089 with CX-1834 [16]. V99-5089 soybean experimental line carries a point mutation in MIPS1 gene (chromosome 11) that results in lpa, low stachyose, and high sucrose phenotype [15]. Similarly, CX-1834 soybean line carries point mutations in two MRP genes, namely, MRP-L and MRP-N, located on chromosomes 19 and 3 respectively. Both mrp mutations are required to obtain the lpa phenotype, and have no effect on seed stachyose and sucrose contents [12,16,63]. The MIPS1 gene does not interact with MRP-L and MRP-N genes. Although the 3mlpa line, carrying mutations in MIPS1 and two MRP genes, shows reduced PA content, the stachyose and sucrose phenotype associated with the mips1 mutation was rescued (Table 1) [41]. The low phytic acid soybean exhibits reduced seed emergence, however, the molecular basis of this relationship is still unknown.

Plant growth, and sampling
For each of the two experimental lines, 48 plants were grown in 12 pots (four plants per pot) containing Metro-Mix® 360 (Sun Gro) soilless media, over-layered with topsoil GardenPro ULTRA LITE . All plants were grown in the same growth chamber unit, with controlled conditions, as follows: 14/10 h (day/night) photoperiod, 24/16°C (day/night), light intensities in the range of 300 and 400 μE and 50-60 % relative humidity. About 41-47 plants from each experimental line were used for sampling developing seeds. Seed length was the criterion for sampling different developmental stages. Pods were randomly selected, opened and, seed length was measured. Five developmental stages were defined by seed length as: (Stage 1) between 2-4 mm; (Stage 2) between 4-6 mm; (Stage 3) between 6-8 mm; (Stage 4) between 8-10 mm; and, (Stage 5) between 10-12 mm (Fig. 1). Three biological replicates for each stage were taken, where each replicate sample was represented by a minimum of 10-15 seeds (stages 1-2), and at least 3 seeds (stages 3-5), collected from different pods on separate plants. Sampled seeds were immediately frozen in liquid nitrogen and stored at −70°C.

RNA extraction, library preparation, and mRNA sequencing
Frozen seeds were ground to a fine powder and total RNAs were extracted using RNeasy Plant Mini Kit, with on column DNase digestion (QIAGEN). RNA concentrations were determined by UV spectrophotometry (260 nm, NanoDrop 1000, Thermo Fischer Scientific). RNA concentrations were normalized to 200 ng/μl and the RNA integrity number (RIN) was measured using a Bioanalyzer (Agilent Technologies). Total RNA samples with RIN values ranging between 9.0-10.0, an indication of high quality, were kept. High quality total RNA samples (50 μl each) were used for library preparation and mRNA sequencing at the Génome Québec Innovation Centre, Canada. A total of 30 cDNA libraries were generated using the TruSeq RNA sample preparation kit (Illumina) and sequenced on five lanes of a HiSeq2000 sequencing system (Illumina), to obtain single-end 100-bp long RNA-seq reads. Six libraries representing three biological replicates of single sampling stage from both 3mlpa and 3MWT were multiplexed together in single lane.

Transcriptomics data processing and analysis
Sequencing data quality control was performed prior to data analysis. Sequencing reads were then mapped/aligned to the well-annotated 'Williams 82' soybean reference genome (assembly Glyma.Wm82.a1.0, annotation v1.1) using the splice-aware mapping tool, TopHat, v2.0.8 [42,64,65]. Sequence mapping data was used to estimate expression values for annotated genes using HTSeq-count 'Union' mode [66]. Differential gene expression analyses were performed using the statistical tool, DESeq, v1.12.1 [67]. Fold-change (FC) was calculated by dividing mean normalized gene expression value in 3mlpa over that in 3MWT. Most significant genes were identified at 1 % false discovery rate (FDR) calculated using P-value adjusted for multiple testing using Benjamini-Hochberg method [52].
Functional enrichment analysis was performed to identify ontology terms and pathways represented by these significant genes using online AgriGO tool [68]. The enriched results were then filtered using 1 % FDR to obtain highly significant enriched terms. R-script for statistical hypergeometric test was used to identify significantly enriched transcription factor families. Most significant GO terms and transcription factor families were identified at 1 % FDR using Benjamini-Hochberg method.

Quantitative real time PCR
RNA-Seq output was validated using quantitative PCR (qPCR). First strand cDNA was synthesized from two μg high quality total RNA (see above) using High Capacity RNA-to-cDNA kit (Applied Biosystems) following manufacturer's instructions, from a total of 30 samples comprising three biological replicates for each time point. Two μl of stock cDNA from each of the 30 samples was diluted ten fold in water. Quantitative real time PCR was performed in 20 μl total reaction volume, constituted of four μl of diluted cDNA, 10 μl of 2× SYBR Green PCR Master Mix (Applied Biosystems), 0.4 μl of 10 mM each gene-specific primers, and 3.2 μl of UltraPure™ distilled water. PCR conditions were: 50°C for 2 min, 95°C for 10 min, followed by 40 cycles at 95°C for 20 s and primer annealing temperature for 1 min, using 7500 Real Time PCR system (Applied Biosystems). Melting curve analyses were performed to test primer specificity. Target gene specific primers were designed according to the soybean reference sequence using Primer3.0 for six randomly selected most significant differentially expressed genes [69,70]. Primer information is provided in Additional file 7. Ubiquitin 10 (UBQ10) gene was used as a reference gene to normalize target gene transcript level amongst all samples [71]. For estimating target and reference gene efficiency, equal volumes of diluted cDNA from all samples were pooled together. Standard curve of Ct values was generated using five-fold serial dilution of this pooled cDNA sample. PCR efficiency was estimated using slope of this standard curve as: E = 10 (−1/ slope) . Relative quantification of target gene transcript was estimated using Pfaffl (or, efficiency correction) method [72]. The fold change gene expression values of stagespecific samples estimated using qPCR and RNA-Seq analyses were first tested for normality and to compare variance using the Shapiro-Wilk test and F-test, respectively. The Student's t and Mann-Whitney-Wilcoxon tests were performed to check the null hypothesis that there is no significant difference in qPCR and RNA-Seq data at the significance level of 0.01 (Additional file 8). Additional file 7: Primers used for quantitative real-time PCR. Gene ID for MIPS1 was not available for the first reference soybean assembly; therefore it is indicated as 'NA'. Published primer sequences were obtained for MIPS1 [18], RS2 [60] and housekeeping gene, UBQ10 [71] (XLSX 45 kb) Additional file 8: Statistical test results for qPCR and RNA-Seq data comparison. The stage-specific data for each gene was tested for normality and sample variances using Shapiro-Wilk's test and F-test, respectively. This was followed by Student's t-test and Mann-Whitney U test to check the null hypothesis that there is no significant difference in the gene expression data estimated using qPCR and RNA-Seq analyses, at the significance level of 0.01. Samples that showed significant difference in the expression between qPCR and RNA-Seq data are indicated by asterisk (*) symbol. The sample for which qPCR and RNA-Seq data showed unequal variance is indicated by hash (#) symbol (XLSX 51 kb)