Flux of transcript patterns during soybean seed development
© Jones et al; licensee BioMed Central Ltd. 2010
Received: 2 October 2009
Accepted: 24 February 2010
Published: 24 February 2010
To understand gene expression networks leading to functional properties of the soybean seed, we have undertaken a detailed examination of soybean seed development during the stages of major accumulation of oils, proteins, and starches, as well as the desiccating and mature stages, using microarrays consisting of up to 27,000 soybean cDNAs. A subset of these genes on a highly-repetitive 70-mer oligonucleotide microarray was also used to support the results.
It was discovered that genes related to cell growth and maintenance processes, as well as energy processes like photosynthesis, decreased in expression levels as the cotyledons approached the mature, dry stage. Genes involved with some storage proteins had their highest expression levels at the stage of highest fresh weight. However, genes encoding many transcription factors and DNA binding proteins showed higher expression levels in the desiccating and dry seeds than in most of the green stages.
Data on 27,000 cDNAs have been obtained over five stages of soybean development, including the stages of major accumulation of agronomically-important products, using two different types of microarrays. Of particular interest are the genes found to peak in expression at the desiccating and dry seed stages, such as those annotated as transcription factors, which may indicate the preparation of pathways that will be needed later in the early stages of imbibition and germination.
In 2000, Girke et al.  identified a number of seed-specific genes in Arabidopsis using microarrays created with 2600 cDNAs derived from seeds. About 260 genes, or 10% of those studied, were found to have at least ten-fold higher expression in the seeds than in the roots or leaves. Most of these seed-specific genes encoded the expected seed storage proteins as well as transcription factors and genes of unknown function. Overall, this study provided the first available expression data on thousands of Arabidopsis genes from both seeds and other tissues. Ruuska et al. (2002)  expanded on this work by studying the expression levels of >3500 seed-specific Arabidopsis genes over five time points. These time points included the stages of major storage reserve accumulation and ended just before seed desiccation. Approximately 1525 of these clones were found to have a significant expression level change during seed development. Results indicated that genes in the same metabolic pathway could show different expression patterns, suggesting they were regulated by different factors. This differential regulation might be coordinated with shifts from starch to oil and protein accumulation, and the contrasting expression patterns of very similar genes could indicate the movement of carbon from one part of the cell to another during the synthesis of metabolites like fatty acids.
More recently, Liu et al. (2008)  performed a comprehensive study of maize kernel development from early embryogenesis through storage product accumulation and desiccation, using arrays containing more than 30,000 unique maize genes. More than 10% of the genes were found to be significantly differentially expressed (p < 0.01) in at least one stage studied, with the highest number of differentially expressed genes occurring during the phase of beginning deposition of storage materials. Most of the 3400 significant genes were up-regulated (compared to the consecutive phase) during the middle three phases, but most of these genes were down-regulated during the first phase (cell division) and the last phase (desiccation). Additionally, genes such as LEA proteins, seed maturation proteins, and those related to ethylene signaling, including some ethylene-related transcription factors, were found to increase in expression at the last two mature and desiccating stages, compared to their expression at the youngest stage.
In this study, gene expression changes in soybean seeds from mid-maturation to desiccation were evaluated at five time points using soybean cotyledons. The expression changes in selected genes of interest were also tested using a second microarray format in which genes were spotted many times on the same slide, providing additional replicates and thus greater statistical power than verification methods such as RNA blotting. Studying the broad patterns of gene expression change in these later stages of seed development can yield important insights into the processes of seed filling, desiccation, and preparation for quiescence and germination.
Results & Discussion
Data collection and p-value analysis
Two different array formats were used for different aims. Array Format 1 provided a global view of gene expression trends during seed development, as it consists of a low-redundancy set of 27,609 soybean cDNAs from a variety of soybean tissues. Details of the unigene selection and of the microarray construction are given under Construction of microarrays. Both the 5' and the 3' ends of the cDNAs were annotated using the top BLAST hit (e-value ≤ 10⁻⁶). Array Format 2 consists of 192 oligos designed from the cDNAs of Array Format 1 and spotted forty times each on a single slide in order to validate expression for the selected cDNAs with a high number of within-slide replicates. Both the cDNAs of Array Format 1 and the oligos of Array Format 2 will likely detect mRNAs of all family members with 85% similarity and thus are not likely to distinguish all family members or paralogous genes.
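The annotation rule described above (keep the top BLAST hit for each cDNA end, subject to the e-value cutoff) can be illustrated with a minimal sketch. The `Hit` record, the identifiers, and the example values are hypothetical and are not taken from the study; only the 10⁻⁶ cutoff comes from the text.

```python
from typing import Dict, Iterable, NamedTuple

E_VALUE_CUTOFF = 1e-6  # threshold stated in the text


class Hit(NamedTuple):
    """One BLAST hit (hypothetical record layout for illustration)."""
    query: str      # identifier of a cDNA end, e.g. "clone1_3prime"
    subject: str    # description of the matched database sequence
    e_value: float


def top_hits(hits: Iterable[Hit], cutoff: float = E_VALUE_CUTOFF) -> Dict[str, Hit]:
    """Return the lowest-e-value hit per query, ignoring hits above the cutoff."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.e_value > cutoff:
            continue  # too weak to use as an annotation
        current = best.get(hit.query)
        if current is None or hit.e_value < current.e_value:
            best[hit.query] = hit
    return best


# Made-up example: one clone end with two acceptable hits, one with none.
example = [
    Hit("clone1_3prime", "beta-tubulin", 1e-40),
    Hit("clone1_3prime", "alpha-tubulin", 1e-12),
    Hit("clone2_5prime", "unknown protein", 0.5),
]
annotations = top_hits(example)
```

A cDNA end with no hit at or below the cutoff simply receives no annotation in this sketch.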
For Array Format 1, total RNA was extracted from soybean cotyledons taken from seeds in the following fresh weight ranges: 25-50 mg, 75-100 mg, 400-500 mg, and 200-300 mg with yellow-colored tissue, as shown in Figure 1. Total RNA was also extracted from whole dry soybean seeds at a weight of 100-200 mg. Each of these five stages was compared to total RNA extracted from soybean cotyledons taken from seeds in the 100-200 mg fresh weight range, which was considered the reference tissue. For Array Format 2, only three of the stages were used (25-50 mg, 400-500 mg, dry seed) but were compared to the same reference tissue (100-200 mg cotyledon).
The program GeneSpring (Silicon Genetics, Redwood City, CA) was used to analyze the data from all experiments. Using Array Format 1 with five stages of soybean cotyledon development compared to the same reference tissue, 2227 genes were found to have p-values ≤ 0.05 in at least three of the five stages of development. Thus their expression levels in the experimental tissue were significantly different from their expression levels in the reference tissue in at least three of the five stages.
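The selection criterion above (p ≤ 0.05 in at least three of the five stages) can be written as a short filter. This is a sketch only: the gene names and p-values are invented, and the handling of missing values (`None`) is an assumption rather than a description of how GeneSpring treats them.

```python
from typing import Dict, List, Optional

ALPHA = 0.05      # significance threshold from the text
MIN_STAGES = 3    # minimum number of significant stages from the text


def significant_genes(
    stage_p_values: Dict[str, List[Optional[float]]],
    alpha: float = ALPHA,
    min_stages: int = MIN_STAGES,
) -> List[str]:
    """Return genes whose p-value is <= alpha in at least min_stages stages."""
    selected = []
    for gene, p_values in stage_p_values.items():
        # Missing measurements (None) are treated as not significant (assumption).
        n_significant = sum(1 for p in p_values if p is not None and p <= alpha)
        if n_significant >= min_stages:
            selected.append(gene)
    return selected


# Invented p-values for five stages, each compared to the reference tissue.
example = {
    "cDNA_A": [0.01, 0.04, 0.20, 0.03, 0.60],   # significant in 3 of 5 stages
    "cDNA_B": [0.01, 0.30, 0.20, 0.03, 0.60],   # significant in 2 of 5 stages
    "cDNA_C": [0.05, 0.05, 0.05, None, 0.90],   # boundary values count as significant
}
selected = significant_genes(example)
```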
Table 1. Percentage of genes in each of the ten functional categories in five PVSets
Functional category | Example annotations
Cell Growth & Maintenance | Tubulin, auxin-regulated, histone
Energy | Chlorophyll binding, RuBisCO
[category name missing] | Hypothetical/unknown function in databases
[category name missing] | Transposons, cell death, pollen-related
[category name missing] | Metallothionein, cysteine protease, peroxidase
Stress, Defense, Shock-related | Chitinase, drought resistance, stress-induced
Signaling | Cytochrome P450, protein kinases, calmodulin
Seed Proteins | Lipoxygenase, seed maturation protein, trypsin inhibitor
Transcription | DNA-binding, transcription factors, zinc finger proteins
Transporters and Membrane Proteins | Sugar/amino acid transporters, membrane intrinsic proteins
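The per-category percentages summarized in this table can be computed along the following lines. The Methods state that clone IDs placed in two or more categories were not considered when calculating percentages; the sketch below assumes such clones are dropped from both the numerator and the denominator, which the text does not spell out. Clone names and category assignments are invented.

```python
from collections import Counter
from typing import Dict, List


def category_percentages(clone_categories: Dict[str, List[str]]) -> Dict[str, float]:
    """Percentage of single-category clones that fall in each functional category.

    Clones assigned to two or more categories are skipped entirely
    (assumption: they are excluded from the denominator as well).
    """
    single = [cats[0] for cats in clone_categories.values() if len(cats) == 1]
    if not single:
        return {}
    counts = Counter(single)
    total = len(single)
    return {category: 100.0 * n / total for category, n in counts.items()}


# Invented cluster of five clones; one clone carries two categories.
example_cluster = {
    "clone1": ["Energy"],
    "clone2": ["Transcription"],
    "clone3": ["Transcription"],
    "clone4": ["Transcription"],
    "clone5": ["Energy", "Signaling"],   # ambiguous, therefore skipped
}
percentages = category_percentages(example_cluster)
```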
Here we focus on a selection of the data of particular biological interest, involving genes which peak in expression at specific stages of development and on genes in specific functional categories such as cell growth and maintenance, energy, and storage proteins. We also concentrate on some transcriptional factors whose expression increases during the latter stages of seed maturation.
Genes related to cell growth and maintenance, and signaling
Many of the cDNAs found in a single cluster have similar annotations, and the expression patterns of many related cDNAs are consistent with known biological processes in seed development. For example, PVSet6 (peak at the 25-50 mg stage) contains an unusually large number of cDNAs with annotations related to tubulins (both alpha and beta), histones, and chaperones (about 37% of the total Cell Growth and Maintenance genes). Additionally, this same cluster contains cDNAs with annotations involved in fatty acid synthesis such as enoyl-ACP reductase and 3-ketoacyl-ACP reductase; and those related to cell walls and the cytoskeleton, such as cinnamyl-alcohol dehydrogenase and beta scruin. This set also has a high percentage of genes in the Signaling category, with annotations including products such as annexin, cytochrome P450, nucleoside diphosphate kinase, and protein phosphatase. The cDNAs in this set are most highly expressed at the youngest stage of development studied (25-50 mg) and likely indicate the activity of processes necessary for the young seed to create and expand its cells as it grows. Similar results have been seen, for example, in the work by Gallardo et al. (2007)  on developing Medicago seeds, where it was found that both gene expression and protein abundance for cytoskeleton-related products such as actin and tubulin decreased from the stage of early seed fill until maturation and desiccation.
PVSet6, as mentioned, contains a large number of genes related to tubulin. Of the five sets in Table 1, only one other gene annotated as tubulin was found, in PVSet2 (peak at the 400-500 mg stage). PVSet2 also contains two genes annotated as expansin (involved in cell wall extension) as well as one gene each annotated as polygalacturonase and xyloglucan endo-transglycosylase, whose products break down cell wall components. No tubulin-annotated genes were found in PVSets 4 or 11, in which expression levels peak at the yellow and dry stages. Interestingly, a few other cell wall-related genes were found to be highly expressed at the final, dry seed stage, including a cellulose synthase, a cinnamyl-alcohol dehydrogenase (involved in lignin synthesis), and a pectinacetylesterase, which is involved in the breakdown of pectin in cell walls. These changes in which cell wall-related genes are highly expressed at different stages of development reflect the complex manner in which the cell wall must adapt to the development of the seed--growing, filling with storage products, then desiccating for dormancy--by synthesizing and degrading different components of the cytoskeleton.
ADR genes change dramatically during development
The most common annotation among the Cell Growth and Maintenance genes in PVSet2 (peak at the 400-500 mg stage) is ADR12, an auxin down-regulated gene of unknown function. About 35% of the Cell Growth and Maintenance genes in this set are annotated as ADR12, with another 8% annotated as the related gene ADR6. Another family member, ADR11, is found repeatedly in PVSet6 (peak at the 25-50 mg stage). However, no ADR genes are found in either PVSet4 or PVSet11, which peak at the final two stages of development. These ADR genes were first described in 1980 by Baulcombe and Key  as having reduced RNA concentration following auxin treatment of soybean hypocotyls. Datta et al. (1993)  found that they display tissue-specific expression in soybean under endogenous auxin conditions, and that their decrease in expression due to increased auxin is also tissue-specific. This same study also found that these genes are differentially expressed in soybean tissues in response to light, with some genes being induced and others repressed by light in a tissue-specific manner. Additionally, Thibaud-Nissen et al. (2003)  found that ADR12 increases in expression in soybean somatic embryos as they develop on auxin-containing media, while multiple ADR genes were found to be over-expressed during various stages of post-germination soybean cotyledon development . ADR6 predicts a protein of approximately 272 amino acids, while ADR11's protein is predicted to contain about 151 amino acids and ADR12's only about 41 amino acids . Investigation of the expression patterns and function of the ADR gene family is ongoing to determine what role it might play in cell growth and development in seeds.
Other genes expressed at stage of highest fresh weight
PVSet1 has an expression profile very similar to that of PVSet2, as both contain genes that peak in expression at the 400-500 mg stage. The genes in PVSet1, however, have a lower peak of expression at that stage of highest fresh weight. Interestingly, despite the similarity in the expression profiles, there are a number of differences in the types of genes found in the two sets. For example, PVSet1 contains no ADR genes, which are abundant in PVSet2, and does not have as many genes related to cell wall functions. PVSet1 also has several genes annotated as alcohol dehydrogenase, while PVSet2 has none, and there are more genes related to protein degradation (such as protease regulatory subunits and F-box proteins) in PVSet1. PVSet1 additionally has a much higher percentage of genes in the Transcription category (13.5%) than PVSet2 does (only 5.8%). Perhaps the most distinct difference between these two similarly-shaped sets, however, is found in the Seed Proteins category. PVSet2 has no genes in this category; but PVSet1 has 16.5% of its genes classified here. Almost all of these genes in PVSet1 are annotated as lipoxygenase, which is involved in the storage of nitrogen and the oxidation of polyunsaturated fatty acids in seeds [15, 16]. According to Wilson (1987) , this enzyme increases in activity until ten days before maturation. Lipoxygenase may also accumulate in the seeds for later use in reactions during early shoot growth .
Energy genes have higher expression in early development
The percentage of genes with annotations in the Energy category is fairly small in four of the five sets categorized, ranging from less than 2% in PVSets 2 and 4 to about 6.5% in PVSet11. However, in PVSet6 (peak at the 25-50 mg stage), Energy genes account for about 28% of the total genes. This category includes genes with annotations related to chlorophyll binding and the photosystems. Lee et al. (2002)  found that a number of genes encoding enzymes related to glycolysis in maize kernels and embryos have an expression profile that decreases steadily from a peak during the early stage of development. Similarly, genes associated with glycolytic enzymes such as sucrose synthase, triosephosphate isomerase, and enolase are found in PVSet6, which has a similar pattern. The energy-generating functions associated with these genes are most necessary when the seeds are young, green, and actively photosynthesizing and growing, and much less needed when the tissue has begun to yellow and desiccate.
Genes over-expressed at final stages of development
In contrast to the expected patterns of expression, a considerable number of cDNAs (such as those in PVSets 4, 5, 8, and 11) are strongly expressed in the final two stages of development, when the tissue is yellow and desiccating and turning into the hard, dry seed. Many of these genes in PVSets 4 and 11, unsurprisingly, are related to protein degradation, such as ubiquitin-conjugating enzymes, proteases, and proteasome regulatory subunits. The products of these genes are useful for breaking down proteins no longer needed as the seed prepares for quiescence. However, genes related to a number of other cellular processes are found here, too, including those whose products are involved in amino acid metabolism (S-adenosylmethionine synthetase, diaminopimelate epimerase, betaine aldehyde dehydrogenase) and fatty acid synthesis (omega-3 fatty acid desaturase), and genes whose products are related to cell walls (cellulose synthase, pectinacetylesterase) and cell division (kinesin, CDC48). Additionally, genes with expression patterns that increase at the yellow and dry seed stages, compared to the reference, include those involved in flavonoid synthesis, such as chalcone synthase and 4-coumarate-CoA ligase. Three genes annotated as chalcone synthase, including CHS7, are found in PVSet4, with one CHS gene in PVSet5 and another in PVSet8--all sets with expression patterns that increase in either the yellow seed or dry, hard seed stage. The increase in expression of isoflavonoid synthesis-related genes, especially CHS7 and CHS8, at later stages of soybean embryo development was also seen by Dhaubhadel et al. (2007) . Translation factors, chaperones, and other products associated with protein-protein interactions are also found among these genes' products, which could assist in creating properly-folded proteins during seed desiccation. 
The mRNAs for these factors or the proteins they encode may be produced late in seed development and then stored in the seed for use during the early stages of imbibition and germination.
PVSet11 also has a high percentage of genes (13%) in the Transcription category. These transcription factors (bHLH, ethylene response factor, auxin response factor), zinc finger proteins, ribonucleoproteins, etc., could also be related to the process of preparing transcripts in anticipation of germination. Interestingly, in their comparison of transcriptome and proteome data for developing Medicago truncatula seeds, Gallardo et al. (2007)  found a significant increase in the number of up-regulated transcripts, particularly those with annotations involved in transcription and RNA processing, at the mature, desiccating stage of seed development--but without a corresponding increase in the abundance of up-regulated proteins. They therefore speculate that the up-regulated transcripts "contribute to the stored mRNA pool used for protein synthesis during germination," a process also discussed in .
Confirmation of expression of selected genes using an oligo array
Table 2. Expression data for selected transcription factors in both array formats. Columns: Array Format 1; Array Format 2; Figure 3a; Array Format 2 ID.
Transcription factor mRNAs are expressed late in development
SB0002 is annotated as a Tub family member in Oryza sativa. The Tub or tubby domain was characterized in mice as involved in controlling obesity  and is now found in a wide variety of eukaryotes, including humans, other animals, and plants . There are a number of genes in the Tub family in various plant species, with fourteen Tubby-like (TULP) genes identified in rice  and eleven in Arabidopsis. The specific functions of different TULP family members have yet to be determined in most cases; however, they frequently contain an F-box domain, suggesting they are ultimately involved in the ubiquitination of proteins selected for degradation in a wide variety of biological processes . This is consistent in the current study with the large number of protein degradation-related genes that were found in sets containing genes that peaked at the dry seed stage. Cai et al. (2008)  found a Tubby-like gene in rice was involved in regulating a disease response gene. In addition to--or as part of--this role in transcriptional regulation, some TULP genes in both Arabidopsis and rice may be involved in signaling through abscisic acid and gibberellin pathways [23, 25]. The over-expression of this gene in the dry seed stage compared to earlier green stages could be indicative of the protein degradation occurring as the seed desiccates and becomes quiescent.
SB0001 is annotated as a CCAAT-binding transcription factor, subunit A (CBF-A), in Oryza sativa. CBF-A is also known as Heme Activator Protein 3 (HAP3) and Nuclear Factor Y-B (NF-YB). Its protein constitutes one-third of the HAP complex which binds to the CCAAT-box element in the promoter of a gene; this element is very common in the promoters of genes from animals, fungi, and plants . Animals and yeast have only one gene for each of the three subunits, but many genes for each subunit are found in plants. For example, there are ten genes for HAP3 in Arabidopsis and eleven in both rice  and wheat . LEAFY COTYLEDON1 (LEC1) is a well-studied HAP3 gene in Arabidopsis that has been shown to be involved in embryogenesis . However, it is widely believed that the relatively large number of genes for the different HAP subunits in plants evolved to regulate transcription of genes in a variety of biological processes . The function of just the HAP3-encoding genes has been linked to processes such as chloroplast formation in rice , improved yield in corn under drought stress , and flowering time in Arabidopsis. Kwong et al. (2003)  and Yang et al. (2005)  divided HAP3 genes into two classes based on their similarity to LEC1, with the gene represented by SB0001 (EST accession # AI442376.1) falling into the "Non-LEC1-type" grouping, meaning it is likely to be involved in a process other than embryogenesis, for example protein degradation or desiccation tolerance.
SB0047 is annotated as AG-motif binding protein 4 (AGP4) in tobacco, which was first identified by Sugimoto et al. (2003) during their investigation of AGP1 as a transcriptional regulator of a wound-inducible Myb transcription factor. They revealed the AGP family as GATA-type zinc finger proteins, transcription factors found in animals, fungi, and plants. Members of this particular class of zinc finger proteins have been found to be involved in regulating a wide variety of genes in plants, including those responsive to light and circadian rhythms. Other research into these GATA-type zinc finger proteins has shown them to affect nitrogen and sugar metabolism, cell elongation, and flower and shoot apical meristem development. Given that Liu et al. (2005) indicated that members of this family of transcription factors were involved in seed germination, it is possible the product of this gene is being accumulated in the seed during desiccation for later use during imbibition and germination.
SB0085 is annotated as SCARECROW-LIKE 3 (SCL3) in Arabidopsis thaliana. This gene was first identified by Pysh et al. (1999)  as part of a family of transcription factors known as GRAS, after GIBBERELLIC ACID INSENSITIVE (GAI), REPRESSOR OF GA1 (RGA), and SCARECROW (SCR). This family of genes has been identified in a wide variety of plants, including Arabidopsis, rice, maize, pea, oat, alfalfa, tomato, watermelon, and Brassica napus and is believed to be plant-specific [42–45]. GRAS genes in general have been found to be involved in light and gibberellic acid signaling as well as the formation of the axillary shoot and root meristems [43, 44]. The SCARECROW-like genes have been primarily studied for their role in root development, including cell division, cell differentiation, and root tip regeneration . Additionally they have been identified as targets of a root-knot nematode peptide that stimulates root growth and also as targets of miRNAs in Arabidopsis[45, 46]. SCARECROW itself has been found to be expressed in multiple tissues during embryo development in Arabidopsis and maize, particularly in the region where the root meristem is formed [47, 48]. Possibly, the product of this gene may accumulate in the seed during desiccation for use during germination, perhaps during early processes in root or shoot development.
Our analysis points to interesting transcription factors expressed late in development at a stage not previously surveyed in soybean. A comprehensive study of earlier stages of soybean seed development, including laser capture microdissection of various tissues from globular, heart, and cotyledon-stage soybeans, is discussed in Le et al. (2007) .
Storage proteins are under-expressed late in development
Arrays spotted with 27,609 cDNAs from soybean were used to obtain data on the gene expression changes over five stages of soybean cotyledon development, as compared to a reference stage. These stages include those when the seed is accumulating water and nutrients; the stage of highest fresh weight; a yellow, desiccating stage; and a dry, hard seed stage. A variety of expression patterns were found among the significant genes over these stages, including many whose expression peaked (compared to the reference) during the desiccating and dry seed stages. Many of these expression patterns and ratios were supported by additional experiments involving a second, highly-repetitive microarray format.
Genes with annotations related to cell wall development, protein folding, and energy production were commonly found to have expression profiles peaking in expression (compared to the reference) at the youngest stage studied, as would be expected with green, rapidly developing seeds. At the stage of highest fresh weight, before the seed begins to desiccate, genes with annotations in the seed proteins category were commonly found to peak in expression. A number of genes annotated as auxin down-regulated were also found to peak in expression at this stage. Surprisingly, many genes were found to peak in expression at the desiccating and dry stages of development, with annotations related to protein degradation, transcription factors, and other processes. The products of these genes may be used immediately by the seed to prepare for quiescence or may be accumulated for later use during imbibition and germination.
Immature soybean seeds (Glycine max cv. Williams) were harvested from greenhouse-grown plants, sorted by the fresh weight ranges as shown in Figure 1, dissected to separate the seed coat from the cotyledon, then lyophilized. Dry seeds were harvested at maturity and stored at room temperature. Total RNA was extracted from immature cotyledons and mature dry seeds using phenol:chloroform and a lithium chloride precipitation . Soybean is highly inbred, but in order to minimize biological variation, RNA was extracted from approximately 10 to 30 seeds (depending on the stage) from multiple plants.
Construction of microarrays
Details of construction and use of Array Format 1 have been reported [9, 13]. Briefly, ESTs from libraries representing a variety of soybean tissues were contigged to identify unigenes, then clones representative of about 27,609 unigenes were re-racked to build three new libraries. The 3' ends of the unigenes were sequenced. Purified PCR products of the three libraries were single-spotted on amine slides (TeleChem International, Sunnyvale, CA) using a Cartesian PixSys 8200 arrayer (Cartesian, Irvine, CA). The set of 27,609 soybean cDNAs for Array Format 1 also includes 64 choice clones that were each printed 24 times (eight times from each of three libraries).
To construct Array Format 2, 192 70-mer oligos based on cDNAs from Array Format 1 were designed and synthesized (Illumina/Invitrogen, Inc., San Diego, CA). These oligos were designed where possible to represent the 3' end of the corresponding cDNA due to the higher sequence variability within this region. The oligos were designed to represent a cluster of EST sequences and were designed from a single EST representative, not from a consensus sequence. The 192 sequences are part of a larger set of 38,400 oligos that represent a soybean unigene collection . However, since soybean is an ancient autotetraploid, many oligos will hit 2 to 3 members of highly related gene families or paralogous sequences. Each of the 192 oligos was then printed forty times on each amine slide (Corning GAPS II slides, Acton, MA) using a Genetix QArray2 robot (Hampshire, UK).
Hybridization reactions and replicates
For both array formats, the RNA was hybridized to the microarray slides using a direct-label two-color dual hybridization procedure using Cy3-dUTP or Cy5-dUTP [13, 52]. Approximately 80 μg total RNA was used with Array Format 1 and 40 μg with Array Format 2. The slides were scanned using a ScanArray Express (Perkin Elmer Life Sciences, Boston, MA) for Array Format 1 or a GenePix 4000B (Molecular Devices Corp., Sunnyvale, CA) for Array Format 2. The spots were found and their fluorescence intensity levels quantitated using ScanArray Express or GenePix Pro 6.0 software, respectively.
For Array Format 1, four slide replicates were made from each of the first three green stages (25-50 mg, 75-100 mg, 400-500 mg), including two dye swaps to mitigate any dye bias. The amount of material available from the later, desiccating yellow seed stage (200-300 mg weight range) was limiting at the time, and RNA yields are lower from older seed, so for the final two stages only two slide replicates were made per stage, also incorporating a dye swap. In all cases, the 100-200 mg fresh weight range served as the reference RNA in the two-color hybridization reactions. For Array Format 2, two slide hybridization replicates were made for each of the three stages (25-50 mg, 400-500 mg, and dry seed) and again compared to the 100-200 mg reference, including a dye swap. The biological samples used for Array Format 2 were independent of those used for Array Format 1.
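To illustrate why dye swaps help, the sketch below combines replicate slides so that the ratio is always experimental over reference, regardless of which dye labeled which sample. It is only one common way of handling dye-swapped replicates; the text does not describe how the swapped slides were combined, and the `SpotReading` layout and intensity values are hypothetical.

```python
import math
from statistics import mean
from typing import List, NamedTuple


class SpotReading(NamedTuple):
    """Background-corrected intensities for one spot on one slide (hypothetical layout)."""
    cy3: float
    cy5: float
    experimental_in_cy5: bool  # False on a dye-swapped slide


def log2_ratio(reading: SpotReading) -> float:
    """log2(experimental / reference), independent of dye orientation."""
    if reading.experimental_in_cy5:
        return math.log2(reading.cy5 / reading.cy3)
    return math.log2(reading.cy3 / reading.cy5)


def combined_log2_ratio(readings: List[SpotReading]) -> float:
    """Average the orientation-corrected log ratios across replicate slides."""
    return mean(log2_ratio(r) for r in readings)


# Invented intensities for the same spot on a forward and a dye-swapped slide.
replicates = [
    SpotReading(cy3=100.0, cy5=400.0, experimental_in_cy5=True),   # log2(400/100) = 2
    SpotReading(cy3=200.0, cy5=100.0, experimental_in_cy5=False),  # log2(200/100) = 1
]
combined = combined_log2_ratio(replicates)
```

Because a dye-specific bias pushes the forward and swapped ratios in opposite directions, averaging the orientation-corrected values tends to cancel it.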
Because the average size of inserts on the Array Format 1 cDNA arrays is 1.3 kb , mRNAs of all family members with 85% similarity will likely be detected. Likewise, the 70-mer oligos of Array Format 2 also will hybridize to mRNAs with regions of sequence similarity of 85% over 20 nucleotides or more. Thus, they are not likely to distinguish all family members or paralogous genes since soybean is an ancient autotetraploid. For example, BLAST results of the 70-mer oligos representing the 19 transcription factors shown in Table 2 to the recently completed soybean genome sequence  showed that the majority hit only 2-3 genomic locations.
The p-values were calculated by GeneSpring using a one-sample, two-tailed t-test with the hypothetical mean set to 1. Due to multiple functions listed in the annotations, or to different functions attributed to the 5' end vs. the 3' end, some clone IDs were placed in two or more categories. These clone IDs were not considered when calculating the percentage of cDNAs in each functional category for each cluster. The data from Array Format 2 were normalized by GeneSpring GX using a Lowess normalization procedure, with each spot counted individually. Data from both replicate slides were averaged together for each spot. The ratios of the forty spots representing the same gene were then averaged together using GraphPad , meaning each gene is represented by a total of eighty replicate measurements. GraphPad was also used to calculate p-values and standard error for Array Format 2 data.
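The test described above, a one-sample, two-tailed t-test of expression ratios against a hypothetical mean of 1, can be sketched with the standard library alone. This is not the GeneSpring or GraphPad implementation; the numerical integration of the t density is a choice made here to avoid external dependencies, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def _t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def _two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value, integrating the density over [0, |t|] with Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = _t_pdf(0.0, df) + _t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * _t_pdf(i * h, df)
    central = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)


def one_sample_t_test(
    ratios: Sequence[float], hypothetical_mean: float = 1.0
) -> Tuple[float, float, float]:
    """Return (t statistic, two-tailed p-value, standard error of the mean)."""
    n = len(ratios)
    standard_error = stdev(ratios) / math.sqrt(n)
    if standard_error == 0:
        raise ValueError("all ratios are identical; the t statistic is undefined")
    t = (mean(ratios) - hypothetical_mean) / standard_error
    return t, _two_tailed_p(t, n - 1), standard_error


# Invented replicate ratios (experimental / reference) for one gene.
t_stat, p_value, se = one_sample_t_test([1.0, 3.0])
```

For Array Format 2, the same function could be applied to the replicate ratios collected for one gene, with the returned standard error corresponding to the standard error reported alongside the p-values.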
For Array Format 1, the k-means algorithm in GeneSpring was applied to randomly separate the genes into the number of clusters defined by the user (in this case, eleven). The centroid of each cluster was calculated by averaging the coordinates attached to each gene. Each gene was then reassigned to the centroid to which it was closest and the coordinates of the centroids were recalculated. This operation was performed numerous times until the data converged, resulting in the clusters shown in Figure 2. Array Format 1 data have been deposited in the NCBI Gene Expression Omnibus (GEO) database as accession number GSE18620.
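The clustering procedure described above (random initial assignment, centroid averaging, reassignment to the nearest centroid, repetition until convergence) corresponds to the following minimal sketch. It is not GeneSpring's implementation: the squared Euclidean distance, the handling of empty clusters, and the iteration cap are assumptions, and the example profiles are invented.

```python
import random
from typing import List, Sequence, Tuple

Profile = Sequence[float]  # one gene's expression ratios across the stages


def _sq_dist(a: Profile, b: Profile) -> float:
    """Squared Euclidean distance (assumed similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(members: List[Profile]) -> List[float]:
    """Average the coordinates of all profiles in a cluster."""
    return [sum(column) / len(members) for column in zip(*members)]


def _nearest(profile: Profile, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to the profile."""
    return min(range(len(centroids)), key=lambda c: _sq_dist(profile, centroids[c]))


def k_means(
    profiles: List[Profile], k: int, max_iter: int = 100, seed: int = 0
) -> Tuple[List[int], List[List[float]]]:
    """Cluster profiles into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in profiles]  # random initial separation
    centroids: List[List[float]] = []
    for _ in range(max_iter):
        centroids = []
        for c in range(k):
            members = [p for p, label in zip(profiles, labels) if label == c]
            # An empty cluster is reseeded from a random profile (assumption).
            centroids.append(_centroid(members) if members else list(rng.choice(profiles)))
        new_labels = [_nearest(p, centroids) for p in profiles]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return labels, centroids


# Invented two-stage profiles forming two obvious groups.
demo_profiles = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
demo_labels, demo_centroids = k_means(demo_profiles, k=2)
```

In the study the number of clusters was set to eleven; the demo uses two only to keep the example small.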
We thank the many undergraduate students who have assisted with this project. Research supported by grants from the Illinois Soybean Association, USDA, United Soybean Board, Illinois Council on Food and Agricultural Research.
- Ritchie SW, Hanway JJ, Thompson HE, Benson GO: How a soybean plant develops. Special Report No. 53. 1996, Ames IA: Iowa State University of Science and Technology Cooperative Extension Service
- Bewley JD, Hempel FD, McCormick S, Zambryski P: Reproductive development. Biochemistry and Molecular Biology of Plants. Edited by: Buchanan BB, Gruissem W, Jones RL. 2000, Rockville MD: American Society of Plant Physiologists, 988-1043.
- Hills MJ: Control of storage-product synthesis in seeds. Curr Opin Plant Biol. 2004, 7: 302-308. 10.1016/j.pbi.2004.03.003.
- Rosenberg LA, Rinne RW: Moisture loss as a prerequisite for seedling growth in soybean seeds. J Exp Bot. 1986, 37: 1663-1674. 10.1093/jxb/37.11.1663.
- Carlson JB, Lersten NR: Reproductive morphology. Soybeans: Improvement, Production, and Uses. Edited by: Boerma HR, Specht JE. 2004, Madison WI: American Society of Agronomy, 59-95.
- Girke T, Todd J, Ruuska S, White J, Benning C, Ohlrogge J: Microarray analysis of developing Arabidopsis seeds. Plant Physiol. 2000, 124: 1570-1581. 10.1104/pp.124.4.1570.
- Ruuska SA, Girke T, Benning C, Ohlrogge JB: Contrapuntal networks of gene expression during Arabidopsis seed filling. Plant Cell. 2002, 14: 1191-1206. 10.1105/tpc.000877.
- Liu X, Fu J, Gu D, Liu W, Liu T, Peng Y, Wang J, Wang G: Genome-wide analysis of gene expression profiles during the kernel development of maize (Zea mays L.). Genomics. 2008, 91: 378-87. 10.1016/j.ygeno.2007.12.002.
- Vodkin LO, Khanna A, Shealy R, Clough SJ, Gonzalez DO, Philip R, Zabala G, Thibaud-Nissen F, Sidarous M, Strömvik MV, Shoop E, Schmidt C, Retzel E, Erpelding J, Shoemaker RC, Rodriguez-Huete AM, Polacco JC, Coryell V, Keim P, Gong G, Liu L, Pardinas J, Schweitzer P: Microarrays for global expression constructed with a low redundancy set of 27,500 sequenced cDNAs representing an array of developmental stages and physiological conditions of the soybean plant. BMC Genomics. 2004, 5: 73. 10.1186/1471-2164-5-73.
- Gallardo K, Firnhaber C, Zuber H, Héricher D, Belghazi M, Henry C, Küster H, Thompson R: A combined proteome and transcriptome analysis of developing Medicago truncatula seeds: evidence for metabolic specialization of maternal and filial tissues. Mol Cell Proteomics. 2007, 6: 2165-2179. 10.1074/mcp.M700171-MCP200.
- Baulcombe DC, Key JL: Polyadenylated RNA sequences which are reduced in concentration following auxin treatment of soybean hypocotyls. J Biol Chem. 1980, 255: 8907-8913.
- Datta N, LaFayette PR, Kroner PA, Nagao RT, Key JL: Isolation and characterization of three families of auxin down-regulated cDNA clones. Plant Mol Biol. 1993, 21: 859-69. 10.1007/BF00027117.
- Thibaud-Nissen F, Shealy RT, Khanna A, Vodkin LO: Clustering of microarray data reveals transcript patterns associated with somatic embryogenesis in soybean. Plant Physiol. 2003, 132: 118-136. 10.1104/pp.103.019968.
- Gonzalez DO, Vodkin LO: Specific elements of the glyoxylate pathway play a significant role in the functional transition of the soybean cotyledon during seedling development. BMC Genomics. 2007, 8: 468. 10.1186/1471-2164-8-468.
- Stephenson LC, Bunker TW, Dubbs WE, Grimes HD: Specific soybean lipoxygenases localize to discrete subcellular compartments and their mRNAs are differentially regulated by source-sink status. Plant Physiol. 1998, 116: 923-33. 10.1104/pp.116.3.923.
- Skrzypczak-Jankun E, Borbulevych OY, Jankun J: Soybean lipoxygenase-3 in complex with 4-nitrocatechol. Acta Crystallogr Sect D Biol Crystallogr. 2004, 60: 613-15. 10.1107/S0907444904000861.
- Wilson RF: Seed metabolism. Soybeans: Improvement, Production, and Uses. Edited by: Wilcox JR. 1987, Madison WI: American Society of Agronomy, 643-686.
- Islas-Flores I, Corrales-Villamar S, Bearer E, Raya JC, Villanueva MA: Isolation of lipoxygenase isoforms from Glycine max embryo axes based on apparent cross-reactivity with anti-myosin antibodies. Biochim Biophys Acta. 2002, 1571: 64-70.
- Lee JM, Williams ME, Tingey SV, Rafalski JA: DNA array profiling of gene expression changes during maize embryo development. Funct Integr Genomics. 2002, 2: 13-27. 10.1007/s10142-002-0046-6.
- Dhaubhadel S, Gijzen M, Moy P, Farhangkhoee M: Transcriptome analysis reveals a critical role of CHS7 and CHS8 genes for isoflavonoid synthesis in soybean seeds. Plant Physiol. 2007, 143: 326-338. 10.1104/pp.106.086306.
- Rajjou L, Gallardo K, Debeaujon I, Vandekerckhove J, Job C, Job D: The effect of α-amanitin on the Arabidopsis seed proteome highlights the distinct roles of stored and neosynthesized mRNAs during germination. Plant Physiol. 2004, 134: 1598-1613. 10.1104/pp.103.036293.
- Kleyn PW, Fan W, Kovats SG, Lee JJ, Pulido JC, Wu Y, Berkemeier LR, Misumi DJ, Holmgren L, Charlat O, Woolf EA, Tayber O, Brody T, Shu P, Hawkins F, Kennedy B, Baldini L, Ebeling C, Alperin GD, Deeds J, Lakey ND, Culpepper J, Chen H, Glücksmann-Kuis MA, Carlson GA, Duyk GM, Moore KJ: Identification and characterization of the mouse obesity gene tubby: a member of a novel gene family. Cell. 1996, 85: 281-290. 10.1016/S0092-8674(00)81104-6.
- Liu Q: Identification of rice TUBBY-like genes and their evolution. FEBS J. 2008, 275: 163-171. 10.1111/j.1742-4658.2008.06469.x.
- Jain M, Nijhawan A, Arora R, Agarwal P, Ray S, Sharma P, Kapoor S, Tyagi A, Khurana J: F-box proteins in rice: genome-wide analysis, classification, temporal and spatial gene expression during panicle and seed development, and regulation by light and abiotic stress. Plant Physiol. 2007, 143: 1467-1483. 10.1104/pp.106.091900.
- Lai CP, Lee CL, Chen PH, Wu SH, Yang CC, Shaw JF: Molecular analyses of the Arabidopsis TUBBY-like protein gene family. Plant Physiol. 2004, 134: 1586-1597. 10.1104/pp.103.037820.
- Cai M, Qiu D, Yuan T, Ding X, Li H, Duan L, Xu C, Li X, Wang S: Identification of novel pathogen-responsive cis-elements and their binding proteins in the promoter of OsWRKY13, a gene regulating rice disease resistance. Plant Cell Environ. 2008, 31: 86-96. 10.1111/j.1365-3040.2008.01773.x.
- Thirumurugan T, Ito Y, Kubo T, Serizawa A, Kurata N: Identification, characterization and interaction of HAP family genes in rice. MGG Mol Genet Genomics. 2008, 279: 279-289. 10.1007/s00438-007-0312-3.
- Gusmaroli G, Tonelli C, Mantovani R: Regulation of novel members of the Arabidopsis thaliana CCAAT-binding nuclear factor Y subunit. Gene. 2002, 283: 41-48. 10.1016/S0378-1119(01)00833-2.
- Stephenson TJ, McIntyre CL, Collet C, Xue G-P: Genome-wide identification and expression analysis of the NF-Y family of transcription factors in Triticum aestivum. Plant Mol Biol. 2007, 65: 77-92. 10.1007/s11103-007-9200-9.
- Lotan T, Ohto MA, Yee KM, West MA, Lo R, Kwong RW, Yamagishi K, Fischer RL, Goldberg RB, Harada JJ: Arabidopsis LEAFY COTYLEDON1 is sufficient to induce embryo development in vegetative cells. Cell. 1998, 93: 1195-1205. 10.1016/S0092-8674(00)81463-4.
- Yang J, Xie Z, Glover BJ: Asymmetric evolution of duplicate genes encoding the CCAAT-binding factor NF-Y in plant genomes. New Phytol. 2005, 165: 623-632. 10.1111/j.1469-8137.2004.01260.x.
- Miyoshi K, Ito Y, Serizawa A, Kurata N: OsHAP3 genes regulate chloroplast biogenesis in rice. Plant J. 2003, 36: 532-540. 10.1046/j.1365-313X.2003.01897.x.
- Nelson DE, Repetti PP, Adams TR, Creelman RA, Wu J, Warner DC, Anstrom DC, Bensen RJ, Castiglioni PP, Donnarummo MG, Hinchey BS, Kumimoto RW, Maszle DR, Canales RD, Krolikowski KA, Dotson SB, Gutterson N, Ratcliffe OJ, Heard JE: Plant nuclear factor Y (NF-Y) B subunits confer drought tolerance and lead to improved corn yields on water-limited acres. Proc Natl Acad Sci USA. 2007, 104: 16450-16455. 10.1073/pnas.0707193104.
- Chen NZ, Zhang XQ, Wei PC, Chen QJ, Ren F, Chen J, Wang XC: AtHAP3b plays a crucial role in the regulation of flowering time in Arabidopsis during osmotic stress. J Biochem Mol Biol. 2007, 40: 1083-1089.
- Kwong RW, Bui AQ, Lee H, Kwong LW, Fischer RL, Goldberg RB, Harada JJ: LEAFY COTYLEDON1-LIKE defines a class of regulators essential for embryo development. Plant Cell. 2003, 15: 5-18. 10.1105/tpc.006973.
- Sugimoto K, Takeda S, Hirochika H: Transcriptional activation mediated by binding of a plant GATA-type zinc finger protein AGP1 to the AG-motif (AGATCCAA) of the wound-inducible Myb gene NtMyb2. Plant J. 2003, 36: 550-64. 10.1046/j.1365-313X.2003.01899.x.
- Teakle GR, Manfield IW, Graham JF, Gilmartin PM: Arabidopsis thaliana GATA factors: organization, expression and DNA-binding characteristics. Plant Mol Biol. 2002, 50: 43-57. 10.1023/A:1016062325584.
- Bi Y-M, Zhang Y, Signorelli T, Zhao R, Zhu T, Rothstein S: Genetic analysis of Arabidopsis GATA transcription factor gene family reveals a nitrate-inducible member important for chlorophyll synthesis and glucose sensitivity. Plant J. 2005, 44: 680-92. 10.1111/j.1365-313X.2005.02568.x.
- Shikata M, Matsuda Y, Ando K, Nishii A, Takemura M, Yokota A, Kohchi T: Characterization of ZIM, a member of a novel plant-specific GATA factor gene family. J Exp Bot. 2004, 55: 631-39. 10.1093/jxb/erh078.
- Zhao Y, Medrano L, Ohashi K, Fletcher JC, Yu H, Sakai H, Meyerowitz EM: HANABA TARANU is a GATA transcription factor that regulates shoot apical meristem and flower development in Arabidopsis. Plant Cell. 2004, 16: 2586-2600. 10.1105/tpc.104.024869.
- Liu P-P, Koizuka N, Martin RC, Nonogaki H: The BME3 (Blue Micropylar End 3) GATA zinc finger transcription factor is a positive regulator of Arabidopsis seed germination. Plant J. 2005, 44: 960-71. 10.1111/j.1365-313X.2005.02588.x.
- Pysh LD, Wysocka-Diller JW, Camilleri C, Bouchez D, Benfey PN: The GRAS gene family in Arabidopsis: sequence characterization and basic expression analysis of the SCARECROW-LIKE genes. Plant J. 1999, 18: 111-19. 10.1046/j.1365-313X.1999.00431.x.
- Lee M-H, Kim B, Song S-K, Heo J-O, Yu N-I, Lee SA, Kim M, Kim DG, Sohn SO, Lim CE, Chang KS, Lee MM, Lim J: Large-scale analysis of the GRAS gene family in Arabidopsis thaliana. Plant Mol Biol. 2008, 67: 659-70. 10.1007/s11103-008-9345-1.
- Sánchez C, Vielba JM, Ferro E, Covelo G, Solé A, Abarca D, De Mier BS, Díaz-Sala C: Two SCARECROW-LIKE genes are induced in response to exogenous auxin in rooting-competent cuttings of distantly related forest species. Tree Physiol. 2007, 27: 1459-70.
- Huang G, Dong R, Allen R, Davis EL, Baum TJ, Hussey RS: A root-knot nematode secretory peptide functions as a ligand for a plant transcription factor. Mol Plant-Microbe Interact. 2006, 19: 463-70. 10.1094/MPMI-19-0463.
- Llave C, Kasschau KD, Rector MA, Carrington JC: Endogenous and silencing-associated small RNAs in plants. Plant Cell. 2002, 14: 1605-19. 10.1105/tpc.003210.
- Wysocka-Diller JW, Helariutta Y, Fukaki H, Malamy JE, Benfey PN: Molecular analysis of SCARECROW function reveals a radial patterning mechanism common to root and shoot. Development. 2000, 127: 595-603.
- Lim J, Jung JW, Lim CE, Lee M-H, Kim BJ, Kim M, Bruce WB, Benfey PN: Conservation and diversification of SCARECROW in maize. Plant Mol Biol. 2005, 59: 619-30. 10.1007/s11103-005-0578-y.
- Le BH, Wagmaister JA, Kawashima T, Bui AQ, Harada JJ, Goldberg RB: Using genomics to study legume seed development. Plant Physiol. 2007, 144: 562-74. 10.1104/pp.107.100362.
- Nielsen NC, Dickinson CD, Cho TJ, Thanh VH, Scallon BJ, Fischer RL, Sims TL, Drews GN, Goldberg RB: Characterization of the glycinin gene family in soybean. Plant Cell. 1989, 1: 313-28. 10.1105/tpc.1.3.313.
- Meinke DW, Chen J, Beachy RN: Expression of storage-protein genes during soybean seed development. Planta. 1981, 153: 130-39. 10.1007/BF00384094.
- Hegde P, Qi R, Abernathy K, Gay C, Dharap S, Gaspard R, Hughes JE, Snesrud E, Lee N, Quackenbush J: A concise guide to cDNA microarray analysis. BioTechniques. 2000, 29: 548-62.
- Joint Genome Institute/Phytozome. [http://www.phytozome.net/]
- GraphPad. [http://www.graphpad.com]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.