Microarrays for global expression constructed with a low redundancy set of 27,500 sequenced cDNAs representing an array of developmental stages and physiological conditions of the soybean plant
- Lila O Vodkin1,
- Anupama Khanna1, 7,
- Robin Shealy1,
- Steven J Clough1, 8,
- Delkin Orlando Gonzalez1,
- Reena Philip1, 9,
- Gracia Zabala1,
- Françoise Thibaud-Nissen1, 10,
- Mark Sidarous1,
- Martina V Strömvik2, 11,
- Elizabeth Shoop2, 12,
- Christina Schmidt2,
- Ernest Retzel2,
- John Erpelding3,
- Randy C Shoemaker3,
- Alicia M Rodriguez-Huete4, 13,
- Joseph C Polacco4,
- Virginia Coryell5,
- Paul Keim5,
- George Gong6,
- Lei Liu6,
- Jose Pardinas6 and
- Peter Schweitzer6, 14
© Vodkin et al; licensee BioMed Central Ltd. 2004
Received: 12 April 2004
Accepted: 29 September 2004
Published: 29 September 2004
Microarrays are an important tool with which to examine coordinated gene expression. Soybean (Glycine max) is one of the most economically valuable crop species in the world food supply. To accelerate both gene discovery and hypothesis-driven research in soybean, global expression resources needed to be developed. Microarrays have many applications, for example determining patterns of expression in different tissues or during conditional treatments by dual labeling of the mRNAs. In addition, discovery of the molecular basis of traits through examination of naturally occurring variation in hundreds of mutant lines could be enhanced by the construction and use of soybean cDNA microarrays.
We report the construction and analysis of a low redundancy 'unigene' set of 27,513 clones that represent a variety of soybean cDNA libraries made from a wide array of source tissue and organ systems, developmental stages, and stress or pathogen-challenged plants.
The set was assembled from the 5' sequence data of the cDNA clones using cluster analysis programs. The selected clones were then physically reracked and sequenced at the 3' end. In order to increase gene discovery from immature cotyledon libraries that contain abundant mRNAs representing storage protein gene families, we utilized a high density filter normalization approach to preferentially select more weakly expressed cDNAs. All 27,513 cDNA inserts were amplified by polymerase chain reaction. The amplified products, along with some repetitively spotted control or 'choice' clones, were used to produce three 9,728-element microarrays that have been used to examine tissue specific gene expression and global expression in mutant isolines.
Global expression studies will be greatly aided by the availability of the sequence-validated and low redundancy cDNA sets described in this report. These cDNAs and ESTs represent a wide array of developmental stages and physiological conditions of the soybean plant. We also demonstrate that the quality of the data from the soybean cDNA microarrays is sufficiently reliable to examine isogenic lines that differ with respect to a mutant phenotype and thereby to define a small list of candidate genes potentially encoding or modulated by the mutant phenotype.
Genes of higher plants are expressed in a coordinated fashion during development of tissue and organ systems and in response to different environmental conditions. This regulation may be tightly linked for some sets of genes, for example, in a specific biochemical pathway. Expression of regulatory genes may modulate the expression of key genes or entire sets of genes in individual pathways. Single-gene expression patterns, determined by RNA blotting or quantitative reverse transcriptase PCR, have been used to understand how different temporal, developmental, and physiological processes affect gene expression. With recent advances in genomics, very large numbers of genes can now be analyzed simultaneously for their expression levels, comparing two biological states with microarray or biochip technology. Several techniques for the 'global' analysis of gene expression have been described [1–4]. These include (a) high density expression arrays of cDNAs on conventional nylon filters with radioactive probing; (b) microarrays or 'chips' using fluorescent probes; and (c) serial analysis of gene expression (SAGE).
Methods for global expression analysis require either knowledge of the entire genome of an organism or accumulation of a large EST (expressed sequence tag) database for the organism. In soybean (Glycine max), more than 286,000 5' EST sequences have been generated and deposited in public databases, including those described in this report. These 5' ESTs represent a collection of 80 cDNA libraries from different tissue and organ systems at various stages of development and under diverse physiological conditions. Collaborative, multidisciplinary research to develop publicly available plant genome resources and information for gene expression, gene tagging, and mapping has been a priority in recent years in plants of agronomic importance, including soybean. Here, we report the development, qualification, and use of 27,513 members of a low redundancy set of tentatively unique cDNAs or 'unigenes' in soybean. The 3' sequence of this set was determined and microarrays were constructed. The public availability of the low redundancy clone set, sequence information, and microarrays reported here will greatly enhance gene discovery and genomic scale research in soybean and other legumes by the community of researchers. For example, we illustrate the use of the 5' and 3' sequence-verified cDNA microarrays to determine organ-specific expression, and we demonstrate their potential to discover the molecular basis of specific mutations in closely related isogenic lines.
Results and discussion
Cluster analysis of 280,000 ESTs reveals 61,127 'unigenes' in soybean
The combined number of contigs and singletons (sequences that occur only once) resulting from a computer assembly of ESTs is an estimate of the number of unique genes in the organism. As the number of ESTs grows, this estimate will continue to be refined. Our current contig analysis of the entire public EST collection for soybean, 286,868 sequences, yields 61,127 'unigenes', of which 36,357 are contigs and 24,770 are singletons.
The finding of 61,127 soybean unigenes by EST cluster analysis agrees well with an independent contig and unigene assembly in the databases of The Institute for Genomic Research (TIGR), which shows 30,084 contigs and 37,601 singletons for a total of 67,826 tentatively unique sequences from among 334,730 sequences representing all publicly available sequences clustered in Release 11. Other large-scale plant EST collections as analyzed by the TIGR gene indices show 42,301 unigenes for Arabidopsis thaliana (of 247,429 ESTs), 36,976 for Medicago truncatula (of 189,919 ESTs), 31,012 for tomato (of 156,645 ESTs), 109,509 for wheat (of 494,195 ESTs) and 56,364 for maize (of 377,188 ESTs). The complete genome sequence for Arabidopsis has revealed an estimated 26,000 genes. Of course, the unigene sets determined by EST clustering are only estimates of the number of unique genes in an organism and depend on the number of ESTs available, the technologies used to make the libraries, and the bioinformatic methods used to assign clusters. The soybean genome is approximately 1.2 × 10^9 bp, which is about 7.5 times the size of the Arabidopsis genome and twice that of tomato, but less than half the size of the maize genome. Thus, it is not unexpected that soybean may have a larger number of unigene clusters than Arabidopsis or tomato, for example. Although soybean is not hexaploid in origin as is modern wheat, it is thought to be an ancient autotetraploid, and many examples of duplicate loci exist in soybean.
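The unigene counts in the two preceding paragraphs can be put on a common scale by dividing each by the number of ESTs clustered. The Python sketch below does this for the quoted figures. Treating the ratio as a rough redundancy index is our assumption, and the dictionary and function names are illustrative only.

```python
from typing import Dict, Tuple

# (unigene clusters, ESTs clustered), copied from the text above.
COLLECTIONS: Dict[str, Tuple[int, int]] = {
    "soybean (this report)": (61_127, 286_868),
    "soybean (TIGR Release 11)": (67_826, 334_730),
    "Arabidopsis thaliana": (42_301, 247_429),
    "Medicago truncatula": (36_976, 189_919),
    "tomato": (31_012, 156_645),
    "wheat": (109_509, 494_195),
    "maize": (56_364, 377_188),
}

# In this report, unigenes = contigs + singletons.
CONTIGS, SINGLETONS = 36_357, 24_770
assert CONTIGS + SINGLETONS == COLLECTIONS["soybean (this report)"][0]


def unigene_fraction(unigenes: int, ests: int) -> float:
    """Unigenes per clustered EST; lower values mean a more redundant collection."""
    return unigenes / ests


# Rank collections from most to least redundant.
ranked = sorted(COLLECTIONS, key=lambda name: unigene_fraction(*COLLECTIONS[name]))
for name in ranked:
    u, n = COLLECTIONS[name]
    print(f"{name:28s} {u:>8,d} / {n:>8,d} = {unigene_fraction(u, n):.3f}")
```

The ratio is only indicative. As noted above, it depends on how the libraries were made, on read lengths, and on the clustering software.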
Virtual subtraction using high density cDNA filter arrays increases gene discovery in immature cotyledon libraries that abundantly express storage protein gene transcripts
Selection and 3' sequencing of 27,513 soybean cDNAs from the soybean unigene set to use in microarrays
High density cDNA arrays of bacterial cultures spotted on nylon membranes and probed with radioactively labeled transcripts are useful for gene discovery, as illustrated in Figure 2 above, but they have limited use for quantifying the relative abundance of transcripts expressed in independent mRNA samples. An alternative to the high density filters is microarray technology [2, 3], in which PCR-amplified cDNA inserts, or oligonucleotides, are printed on glass slides and probed with mRNA populations that have been separately labeled with different fluorescent dyes.
Comparison of the percent unique sequences as determined by either CAP3 or Phrap analysis for the 5' and 3' ESTs represented in each of the four successive reracked clone subsets that constitute the low redundancy soybean 'unigene' set
Rerack | ESTs | CAP3 result | Phrap result | % unique ESTs (CAP3 : Phrap)
1 | 5' | 2,202 s, 259 c | 2,054 s, 334 c | 88.0% : 80.4%
1 | 3' | 1,836 s, 413 c | 1,682 s, 505 c | 85.4% : 78.2%
2 | 5' | 5,566 s, 620 c | 5,116 s, 831 c | 89.2% : 78.0%
2 | 3' | 4,284 s, 1,124 c | 3,900 s, 1,340 c | 85.7% : 75.5%
3 | 5' | 3,426 s, 200 c | 3,289 s, 260 c | 93.5% : 79.7%
3 | 3' | 2,474 s, 599 c | 2,256 s, 723 c | 91.5% : 76.8%
4 | 5' | 6,295 s, 521 c | 5,909 s, 745 c | 91.7% : 89.5%
4 | 3' | 4,719 s, 1,173 c | 4,152 s, 1,513 c | 79.3% : 76.2%
Entire set, 1–4 | 5' | 21,873 s, 2,402 c | 18,663 s, 3,966 c | 88.2% : 81.2%
Entire set, 1–4 | 3' | 11,959 s, 4,156 c | 8,341 s, 5,641 c | 73.0% : 63.3%
s = singletons; c = contigs.
EST clustering will overestimate the number of unique genes, because some of the shorter ESTs will not overlap and are thus falsely counted as independent, unique sequences. However, clustering can also falsely lump non-identical members of gene families into the same contig because of conserved sequence similarity in the coding region. The 3' sequencing is especially useful for resolving both of these issues, as there is generally more variation in the 3' UTR of plant genes than in the coding region. For these reasons, and as a quality control of the reracking process, we sequenced the 3' end of the reracked cDNAs. Of the 27,513 3' sequencing attempts on the tentatively unique cDNAs represented in Table 1, 22,088 sequences met the criteria for high-quality sequence. The 3' sequencing was more problematic than the 5' sequencing because the sequencing reactions terminated at some of the long polyA tails characteristic of soybean and many other plant cDNAs. An anchored primer was used to increase the success rate (see Methods). The average length of the 3' ESTs was 526 bases, compared with an average 5' read length of 474 bases for 280,094 ESTs.
Since the clustering analyses were performed at successive intervals as the EST collection grew in size, we repeated the Phrap contig analyses separately, using only the input sequences of each cDNA rerack for which both a 5' and a 3' EST were known. We also performed a CAP3 analysis. Table 1 shows that CAP3 yielded 88.0 to 93.5% unique sequences for the 5' ESTs, while the Phrap values were slightly lower at 78.0 to 89.5%. Interestingly, the estimate of unique sequences from the 5' EST data did not change substantially from reracked library r1021, when only approximately 6,800 ESTs were clustered, through library r1088, when over 250,000 sequences were clustered. A separate cluster analysis of only the 27,513 input 5' sequences revealed 81.2 and 88.2% unique sequences by Phrap and CAP3 analyses, respectively.
The 3' ESTs were also subjected separately to CAP3 and Phrap analysis. CAP3 again showed a slightly higher level of uniqueness (that is, a lower level of redundancy), with 79.3 to 91.5% unique sequences in the successive clustering analyses versus 75.5 to 78.2% as determined by Phrap. For the 21,048 total 3' sequences clustered, the overall figure was 73.0% for CAP3 versus 63.3% for Phrap.
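The percentages in Table 1 are consistent with a simple definition of uniqueness: the number of singletons plus contigs, divided by the number of ESTs clustered. The sketch below assumes that definition. It also assumes denominators of 27,513 5' ESTs and the 22,088 high-quality 3' ESTs reported above. Under those assumptions it reproduces the entire-set CAP3 values (88.2% and 73.0%) and the 3' Phrap value (63.3%).

```python
def percent_unique(singletons: int, contigs: int, n_clustered: int) -> float:
    """Percent of clustered ESTs that remain distinct after clustering.

    Each singleton and each contig counts as one tentatively unique sequence.
    """
    if n_clustered <= 0:
        raise ValueError("n_clustered must be positive")
    return 100.0 * (singletons + contigs) / n_clustered


# Entire-set rows of Table 1; the denominators are assumptions (see lead-in).
N_5PRIME = 27_513  # 5' ESTs of the reracked unigene clones
N_3PRIME = 22_088  # high-quality 3' ESTs

cap3_5 = percent_unique(21_873, 2_402, N_5PRIME)
cap3_3 = percent_unique(11_959, 4_156, N_3PRIME)
phrap_3 = percent_unique(8_341, 5_641, N_3PRIME)
print(f"5' CAP3 {cap3_5:.1f}%, 3' CAP3 {cap3_3:.1f}%, 3' Phrap {phrap_3:.1f}%")
# 5' CAP3 88.2%, 3' CAP3 73.0%, 3' Phrap 63.3%
```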
The differences between the 5' and 3' levels of uniqueness (i.e., 88.2% versus 73.0% for the entire sets as determined by CAP3) can be explained by the action of reverse transcriptase. Reverse transcription was primed with an oligo dT primer, so the cDNAs begin at the 3' end of the mRNA and terminate at variable sites as the enzyme progresses toward the 5' end of the template. Thus, 5' ESTs often begin at variable sites. Therefore, even though two 5' ESTs may have originated from the same mRNA transcript, they will not cluster if they do not overlap, and they will be counted as two separate unique sequences. The 3' soybean EST reads begin just after the poly A tail and produce longer average read lengths than the 5' soybean ESTs; thus, the 3' ESTs are more likely to form an overlapping contig if there is any redundancy among them.
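The effect of variable 5' start sites can be illustrated with a toy model in which each read is an interval on a known transcript, and reads that overlap sufficiently are chained into one cluster. This is a deliberate simplification, not how CAP3 or Phrap work, since those programs must infer overlaps from sequence alignments. The transcript length, read length, start positions, and 40-base minimum overlap are all hypothetical values.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in transcript coordinates

TRANSCRIPT_LEN = 1500  # hypothetical mRNA length (bases)
READ_LEN = 500         # hypothetical single-pass read length


def count_clusters(reads: List[Interval], min_overlap: int = 40) -> int:
    """Single-linkage clustering of reads overlapping by >= min_overlap bases."""
    if not reads:
        return 0
    ordered = sorted(reads)
    clusters = 1
    current_end = ordered[0][1]  # rightmost end of the current cluster
    for start, end in ordered[1:]:
        if min(current_end, end) - start >= min_overlap:
            current_end = max(current_end, end)  # read joins the current cluster
        else:
            clusters += 1                         # gap: counted as a new "unique" sequence
            current_end = end
    return clusters


def five_prime_read(cdna_start: int) -> Interval:
    # An oligo-dT-primed cDNA runs from a variable 5' stop site to the 3' end;
    # its 5' read starts at that stop site.
    return (cdna_start, min(cdna_start + READ_LEN, TRANSCRIPT_LEN))


def three_prime_read() -> Interval:
    # Every 3' read starts next to the poly(A) tail, i.e. at the same place.
    return (TRANSCRIPT_LEN - READ_LEN, TRANSCRIPT_LEN)


cdna_starts = [0, 700, 1100]  # three clones of the same transcript
five = [five_prime_read(s) for s in cdna_starts]
three = [three_prime_read() for _ in cdna_starts]
print(count_clusters(five), count_clusters(three))  # 2 1
```

In this toy case, one transcript yields two 5' "unigenes" but a single 3' cluster, which is the direction of the difference seen in Table 1.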
Information contained in a comprehensive cross list of soybean unigene clone IDs. Shown are various identifiers and annotations for 27,513 reracked cDNAs used in microarray construction. The full list is provided with arrays and available upon request.
Cross list identifiers for each cDNA clone (example values, where given, refer to one of the 27,513 cDNAs):
- Reracked Clone ID: the individual cDNA clone ID in the 384-well destination plates after reracking or rearraying of the selected clones from the cDNA source library plates.
- Reracked Plate ID: the 384-well reracked plate name, in increments of 384 (i.e., 1, 385, 769, etc.).
- Reracked row_column position: position of the clone in the 384-well reracked plate.
- Reracked 3' Keck Sequence ID: sequence identifier assigned by the Keck Center for the 3' EST.
- Reracked 3' Genbank Accession: GenBank-assigned accession number for the 3' EST.
- Reracked 3' Annotation: top BLASTX hit for the 3' EST, at an E value of 10^-6 or lower. Example: glutathione S-transferase GST 22 [Glycine max].
- Source Clone ID: the individual cDNA clone ID in the 384-well source plate.
- Source Plate ID: the 384-well source plate name, in increments of 384 (i.e., 1, 385, 769, etc.).
- Source row_column position: position of the clone in the 384-well source plate.
- Source WashU Sequence ID: sequence identifier assigned by Washington University for the 5' EST.
- Source 5' Genbank Accession: GenBank-assigned accession number for the 5' EST.
- Source 5' Annotation: top BLASTX hit for the 5' EST, at an E value of 10^-6 or lower. Example: glutathione S-transferase GST 22 [Glycine max].
- Library: name of the cDNA source library.
- Genotype: specific information on the soybean variety or genotype.
- Tissue: tissue, organ system, or stage from which the cDNA library was constructed. Example: entire roots of 8-day old seedlings.
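The plate-naming convention in the cross list above (plates named 1, 385, 769, and so on) implies a simple mapping from a running clone number to a plate name and well. The sketch below assumes that clones are numbered consecutively from 1 and that wells are filled row by row (A1 to A24, then B1, and so on). The fill order, the function name, and the row_column string format are hypothetical, not documented conventions.

```python
import string
from typing import Tuple

ROWS = string.ascii_uppercase[:16]  # rows A..P of a 384-well plate
COLS = 24                           # columns 1..24
WELLS = len(ROWS) * COLS            # 384


def plate_and_well(clone_number: int) -> Tuple[int, str]:
    """Map a 1-based running clone number to (plate name, row_column position).

    Assumes row-by-row filling; a plate is named by the number of its first clone.
    """
    if clone_number < 1:
        raise ValueError("clone_number is 1-based")
    offset = clone_number - 1
    plate_name = (offset // WELLS) * WELLS + 1
    row, col = divmod(offset % WELLS, COLS)
    return plate_name, f"{ROWS[row]}_{col + 1}"


for n in (1, 384, 385, 769, 27_513):
    print(n, plate_and_well(n))
```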
Construction of microarrays representing the 27,513 soybean unigene cDNAs
Soybean microarrays and low redundancy unigene sets built from the public EST collection
Microarray and reracked unigene cDNA set, with the soybean tissues of its source cDNA libraries:
Set 1. Gm-r1070: 9,216 cDNAs highly representative of developing seeds and flowers
- whole young pods (2 cm)
- immature cotyledons from seed of 25–50 mg fresh wt.
- immature cotyledons from seed of 100–200 mg fresh wt.
- immature cotyledons from seed of 100–200 mg fresh wt.
- immature cotyledons from seed of 100–300 mg fresh wt.
- immature cotyledons from seed of 100–300 mg fresh wt.; low-expressing cDNAs from Gm-c1007 filter hybridizations
- immature seed coats from seed of 100–200 mg fresh wt.
- immature seed coats from seed of 200–300 mg fresh wt.
Set 2. Gm-r1021 + Gm-r1083: 9,216 cDNAs highly representative of roots
- roots of 8-day-old seedlings
- roots of 2-month-old plants
- roots inoculated with B. japonicum
- whole 2–3-week-old seedlings
Set 3. Gm-r1088: 9,216 cDNAs highly representative of seedlings, leaves, and stressed or pathogen-challenged tissues
- immature seed coats from seed of 200–300 mg fresh wt.
- immature seed coats from seed of 100–200 mg fresh wt.
- cotyledons of 3- and 7-day-old seedlings
- somatic embryos cultured on MSD 20 for 2 to 9 mo.
- differentiating somatic embryos cultured on MSM6AC
- epicotyl, 2-week-old seedling, auxin treatment
- germinating shoot, cold stressed, 3-day-old seedlings
- leaf and shoot tip, salt stressed, 2-week-old seedling
- germinating shoot, 3-day-old seedling, auxin treatment
- leaf, drought stressed, 1-month-old plants
- leaves and shoots from 2–3-week-old seedlings induced for SDS symptoms
- leaves and shoots from 2–3-week-old seedlings induced for SDS symptoms
- 9–11-day-old seedlings induced for HR response by P. syringae carrying the avrB gene
The cDNAs from the sequence-driven, reracked clone sets were amplified by PCR using the Qiagen-purified cDNA templates that had been prepared for 3' sequencing (as opposed to amplifying the inserts directly from E. coli cultures containing the plasmid DNA). All 27,513 PCR reactions were performed with generic M13 forward and reverse primers using a robotic pipettor. Approximately 25% of the purified PCR cDNA inserts were checked by agarose gel electrophoresis for quality control. Among these, the average insert size was estimated to be 1,340 bp for library Gm-r1021, 1,110 bp for library Gm-r1070, 1,259 bp for library Gm-r1083, and 1,269 bp for library Gm-r1088.
The 9,216 amplified inserts of each set were singly spotted onto glass slides as outlined in the Methods section. A set of 64 control or 'choice' clones was assembled by hand into one 96-well plate (designated Gm-b10BB) and printed eight times throughout each array. Thus, the total number of spots on the array is 9,728, consisting of 9,216 cDNAs from the unigene set plus 512 (64 cDNAs × 8 repeats) from the choice clones. The choice clones were selected for various reasons. Some represent constitutively expressed genes (such as ubiquitin and EF1). Some are cDNAs whose expression is restricted to a subset of specific plant tissues (such as Rubisco or seed storage proteins). Some are clones of genes for antibiotic resistance markers commonly used in transgenic plants (such as hygromycin or kanamycin resistance), and 32 are cDNAs that represent at least 13 different enzymes of the flavonoid pathway. The flavonoid pathway was chosen because its genes often respond to many biotic and abiotic stress conditions and because it has been widely studied in plant systems.
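The spot count quoted above follows from one print of each unigene cDNA plus eight prints of each choice clone. The short sketch below makes that arithmetic explicit; the constant and function names are illustrative.

```python
UNIGENE_CDNAS = 9_216  # cDNAs from one reracked unigene set, printed once
CHOICE_CLONES = 64     # control ("choice") clones in plate Gm-b10BB
CHOICE_REPEATS = 8     # each choice clone printed eight times
WELLS_PER_PLATE = 384


def spots_per_array(cdnas: int, controls: int, repeats: int) -> int:
    """Total spots when test cDNAs are printed once and controls `repeats` times."""
    return cdnas + controls * repeats


total = spots_per_array(UNIGENE_CDNAS, CHOICE_CLONES, CHOICE_REPEATS)
control_share = CHOICE_CLONES * CHOICE_REPEATS / total
full_plates = UNIGENE_CDNAS // WELLS_PER_PLATE  # 384-well plates filled by one set
print(total, f"{control_share:.1%}", full_plates)  # 9728 5.3% 24
```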
Soybean microarrays have potential to reveal the molecular basis of a mutant phenotype
Differentially expressed cDNAs detected in dual labeling microarray experiments comparing isogenic lines of the T locus in soybean.
Ratio XB22A (T/T) / 37609 (t*/t*)
Overexpressed in XB22A
Underexpressed in XB22A
Trypsin inhibitor, Kunitz
No hits found
Trypsin inhibitor, Kunitz
A t test on the ratios of the 16 repetitively spotted flavonoid 3' hydroxylase cDNAs showed that their mean ratio on replicate 1 (2.424) was statistically significant at a P value of 0.0001 when compared with an expected mean of 2.0, i.e., a two-fold expression difference. Similarly low P values were found for replicate 2 and for the mean value (3.245) of both replicates. Thus, the flavonoid 3' hydroxylase cDNAs are statistically significant outliers in the microarray analysis.
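A one-sample t test of this kind compares the mean of the replicate spot ratios with a hypothesized mean of 2.0. The sketch below computes the t statistic and degrees of freedom with the Python standard library only. The ratios are hypothetical stand-ins. The text does not state whether raw or log-transformed ratios were tested, so raw ratios are assumed here. If SciPy is available, scipy.stats.ttest_1samp returns the same statistic together with a P value.

```python
import math
import statistics
from typing import Sequence, Tuple


def one_sample_t(values: Sequence[float], mu0: float) -> Tuple[float, int]:
    """t statistic and degrees of freedom for H0: mean(values) == mu0."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("values have zero variance")
    return (mean - mu0) / (sd / math.sqrt(n)), n - 1


# Hypothetical ratios for 16 repeated spots of one cDNA on one slide.
hypothetical_ratios = [2.1, 2.6, 2.3, 2.5, 2.2, 2.7, 2.4, 2.5,
                       2.3, 2.6, 2.2, 2.4, 2.5, 2.3, 2.6, 2.4]
t, df = one_sample_t(hypothetical_ratios, mu0=2.0)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The statistic is then referred to the t distribution with n − 1 degrees of freedom (15 for 16 spots) to obtain a P value.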
The microarray data presented here, showing that cytoplasmic levels of flavonoid 3' hydroxylase mRNA are higher in the T/T line, agree very well with RNA blot data showing that the flavonoid 3' hydroxylase gene has reduced expression in the seed coats of the t*/t* isoline compared with the T/T lines. In addition to the RNA blot data showing differences in these mutant lines, we have definitively shown that the flavonoid 3' hydroxylase is encoded by the T locus, using sequence data from other alleles of the locus and genetic cosegregation data. We do not know the reason for the change in the expression levels of the seven other cDNAs shown in Table 4, most of which represent various seed or storage-type proteins. While the T locus determines the flavonoid and pigment compounds synthesized in various tissues, including seed coats and trichomes, it is possible that the flavonoid compounds themselves exert an additional effect on seed protein synthesis in the seed coats. Alternatively, the observed differences in the levels of these cDNAs could be an artifact of the dissection procedure. We know from Northern blots that flavonoid 3' hydroxylase is highly expressed in the seed coats but is not expressed in the cotyledons, so even a small amount of contaminating cotyledon tissue, resulting from imprecise dissection of the seed coats of one line versus the other, could lead to apparent differences in seed protein RNAs.
As the example in Figure 3 and Table 4 illustrates, probing microarrays with dual-labeled mRNAs from near-isogenic lines is a powerful approach for obtaining a small list of candidate genes from among the thousands examined by microarray analysis. In this example, only eight functionally different cDNAs (or seven if the two trypsin inhibitor cDNAs are counted as one) of over 9,200 cDNAs spotted on the array exceeded a two-fold difference in expression in both replicates. When a cDNA is spotted repetitively on an array, as the flavonoid 3' hydroxylase cDNAs were, the significance of its expression ratio can also be tested statistically. After a short list of candidate genes has been identified, it is feasible to test them by other methods (such as RNA blotting, quantitative RT-PCR, RFLP or SNP analysis) to find an association of a particular cDNA with the mutant phenotype. Of course, if a particular mutation has a regulatory or epigenetic effect on a large number of downstream RNAs, or if a mutation does not affect the abundance of an mRNA, then the global expression approach may not be effective in identifying the primary nature of the mutant locus. For example, the standard recessive t allele at the T locus results from a premature stop codon and does not affect the abundance of the flavonoid 3' hydroxylase mRNA to the same extent as the t* mutation at that locus.
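The selection rule described above, a two-fold difference in both replicates, can be expressed as a simple filter over per-replicate ratios. In the sketch below, "underexpressed" is taken to mean a ratio below 0.5 in both replicates, which is our assumption. The clone IDs and ratios are hypothetical.

```python
from typing import Dict, List, Tuple


def consistent_fold_change(
    ratios: Dict[str, Tuple[float, float]], fold: float = 2.0
) -> Tuple[List[str], List[str]]:
    """Return (over, under): cDNAs beyond `fold` in the same direction in both replicates.

    `ratios` maps a cDNA ID to its normalized (replicate 1, replicate 2) ratio,
    e.g. line XB22A / line 37609.
    """
    over: List[str] = []
    under: List[str] = []
    for cdna, (r1, r2) in ratios.items():
        if r1 > fold and r2 > fold:
            over.append(cdna)
        elif r1 < 1 / fold and r2 < 1 / fold:
            under.append(cdna)
    return sorted(over), sorted(under)


example = {  # hypothetical clone IDs and ratios
    "clone_A": (2.4, 4.1),   # above two-fold in both replicates: kept
    "clone_B": (0.3, 0.45),  # below one-half in both replicates: kept
    "clone_C": (2.5, 1.4),   # two-fold in one replicate only: rejected
    "clone_D": (1.1, 0.9),   # essentially unchanged: rejected
}
print(consistent_fold_change(example))  # (['clone_A'], ['clone_B'])
```

Requiring agreement in both replicates trades some sensitivity for a much shorter, more reproducible candidate list.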
Tissue specific gene expression using the soybean microarrays
A selection of genes that are differentially expressed in leaves or in roots
GenBank accession no.
Annotation (BLAST hit and organism)
Rubisco (Glycine max)
Light harvesting chlorophyll a/b binding protein (Arabidopsis thaliana)
Photosystem I subunit (Oryza sativa)
Thylakoid lumen protein (Arabidopsis thaliana)
Plastocyanin precursor (Glycine max)
Trehalose-6-phosphate phosphatase (Arabidopsis thaliana)
Vegetative Storage Protein (Glycine max)
Acidic chitinase (Glycine max)
Putative calreticulin (Oryza sativa)
Cytochrome P450 (Pyrus communis)
Catalase (Glycine max)
Putative serine carboxypeptidase II-3 precursor (Oryza sativa)
H protein (Flaveria anomala)
F3H (Flavanone-3-Hydroxylase) (Glycine max)
Matrix metalloproteinase MMP2 (Glycine max)
Putative lipoic acid synthase (LIP1) (Arabidopsis thaliana)
Lipid transfer protein-like protein (Retama raetam)
Nodulin-26 (Glycine max)
MtN19 homolog (Medicago truncatula)
Similar to nodulins and lipase homolog (Arabidopsis thaliana)
bZIP transcription factor (Arabidopsis thaliana)
Chalcone isomerase (Glycine max)
Putative aquaporin (tonoplast intrinsic protein) (Arabidopsis thaliana)
Phosphoenolpyruvate carboxykinase (Flaveria trinervia)
Similar to sucrose synthase (Pisum sativum)
Proline-rich protein (Glycine max)
DAD-1 (Defender Against apoptotic cell Death) (Glycine max)
Ripening related protein (Glycine max)
Germin-like protein (Phaseolus vulgaris)
Pectinesterase (EC 3.1.1.11) precursor (Vigna radiata)
Asparagine synthase (glutamine-hydrolyzing) (Glycine max)
Cationic peroxidase (Glycine max)
Tubulin (β chain) (Glycine max)
Specific tissue protein 1 (Cicer arietinum)
Auxin-repressed protein (Robinia pseudoacacia)
The soybean proline-rich protein SbPRP1 (Accession # J05208), represented on the array by AW309104 (Gm-c1019-3688), is also among the root-expressed clones. SbPRP1 has been shown to be expressed preferentially in the roots. Another gene of interest overexpressed in roots is DAD-1 (Defender Against apoptotic cell Death). As expected for non-green tissue, no Rubisco or photosynthesis-related genes were observed to be overexpressed in the roots.
In leaves, genes typical of green tissues are upregulated, as expected. These are the photosynthesis genes (e.g., Rubisco, plastocyanin precursor, chlorophyll a/b binding protein type II, trehalose-6-phosphate phosphatase, photosystem subunits and proteins, thylakoid lumen protein, light-harvesting chlorophyll a/b binding protein) and the vegetative storage proteins. Ribosomal proteins, cytochrome P450, catalase, and chitinase were also overexpressed in the leaves.
Our publicly available Gm-r1021 soybean unigene subset containing 4,089 cDNAs has also been used to examine differential gene expression in roots and shoots of older soybean plants. We have previously used microarrays containing the 9,216 clones of the Gm-r1070 set (representing many cDNAs from developing seeds, seed coats, flowers, and pods) to carry out a detailed analysis of the induction of somatic embryos during culture of cotyledons on auxin-containing media. Cluster analysis of the resulting transcript profiles revealed how the cotyledon cells are reprogrammed during the induction process. The 495 differentially expressed cDNAs (5.3% of the cDNAs on the array) were clustered into 11 sets using a non-hierarchical method (K-means) to reveal cDNAs with similar profiles on either the adaxial or the abaxial side of the embryos from 0 to 28 days, at 7-day intervals. Among other conclusions, these global expression studies indicated that auxin induces dedifferentiation of the cotyledon and provokes a surge of cDNAs involved in cell division and oxidative burst.
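K-means, the non-hierarchical method mentioned above, alternates between assigning each expression profile to its nearest centroid and recomputing each centroid as the mean of its members. The sketch below is a plain standard-library implementation of that loop (Lloyd's algorithm). It is not the software used in the cited study. It seeds centroids with the first k profiles for reproducibility, and the five-point profiles (days 0, 7, 14, 21, 28) are hypothetical.

```python
import math
from typing import List, Sequence

Profile = Sequence[float]  # e.g. hypothetical log2 ratios at days 0, 7, 14, 21, 28


def kmeans(profiles: List[Profile], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means; returns a cluster label (0..k-1) for each profile.

    Centroids are seeded with the first k profiles so results are reproducible
    (an assumption; random or k-means++ restarts are more common in practice).
    """
    if not 0 < k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    centroids = [list(p) for p in profiles[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels


hypothetical = [
    [0.0, 1.0, 2.0, 3.0, 3.5],      # rising
    [0.0, -1.0, -2.0, -2.5, -3.0],  # falling
    [0.0, 0.1, -0.1, 0.0, 0.1],     # flat
    [0.0, 1.2, 2.1, 2.9, 3.4],      # rising
    [0.0, -0.8, -1.9, -2.6, -3.1],  # falling
]
print(kmeans(hypothetical, k=3))  # [0, 1, 2, 0, 1]
```

Because k-means can converge to different partitions from different seeds, real analyses typically compare several restarts before interpreting clusters.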
Thus, the soybean cDNA arrays that we have developed from the unigene cDNA set can be used to reveal the physiological and biochemical pathways potentially operative in specific tissues, developmental stages, or environmental treatments. Because cDNA arrays from soybean or any other organism are constructed with PCR inserts averaging about 1.1 kb, they will generally hybridize with RNAs from any gene family members that share more than 85% sequence identity. Thus, cDNA arrays will generally not distinguish the expression of closely related duplicated sequences. Spotted arrays of synthetic 70-mers, or Affymetrix short-oligonucleotide arrays, have greater potential to separate the expression of closely related duplicated sequences if the oligonucleotides are chosen from the 3' or 5' non-coding regions, which carry more sequence variability than the protein-coding regions.
Although microarray data are limited for soybean and most plants other than Arabidopsis, the 27,513-member low redundancy 'unigene' cDNA set for soybean reported in this paper will greatly stimulate this area. The number of slides needed to hold all 27,513 cDNAs is being reduced to one or at most two, and the slides are publicly available. Spotted PCR products with an average size of over 1 kb are useful not only for soybean but also for other legume species, as cross-hybridization to the long probes will be substantial. The 3' sequencing reported here is particularly useful for differentiating gene family members and for the future design of gene-specific oligonucleotide arrays, either 70-mers spotted on glass slides or short oligonucleotides synthesized in situ with Affymetrix technology. The cDNA or oligonucleotide-based microarrays add to the developing suite of genome analysis approaches in soybean. Applications include profiling the expression of genes that respond to challenges by various pathogens and to environmental stresses such as drought, heat, cold, flooding, and herbicide application. Also, using the near-isogenic lines of the T locus as an example, we demonstrated the potential of soybean cDNA arrays for discovering genes responsible for uncharacterized mutations. Future expression profiling of mutant phenotypes, or of genotypes that differ in protein or oil content and other quantitative traits, will yield significant clues to the genes involved in those pathways and traits.
Contig assembly for unigene selection
Raw sequence files of the 5' soybean EST data from Washington University and the 3' data from the University of Illinois Keck Center were produced from sequence traces using the Phred base-calling program [18, 19]. Leading and trailing vector and linker sequences were trimmed, and artifactual E. coli sequences were removed. Quality checks included counting the ambiguous 'N' base calls in each sequence and trimming leading and trailing poor-quality (high-N) sections to obtain the best subsequence in which Ns made up 4% or less of the total bases. The EST sequences were clustered into contig sets based on sequence overlap using the program Phrap. The processing and analysis results for each sequence are displayed on a set of World Wide Web pages. The distribution of sequence lengths in each submission set is displayed in histograms. The base calls and quality information for each sequence in a submission are displayed in artificial gel images. Each sequence is displayed both as the raw sequence before vector filtering and as the cleaned sequence after vector filtering. A color-coded sequence quality graph shows the part of a sequence retained after trimming as well as the regions trimmed for low quality, polyA or polyT, and vector sequences. BLAST reports for each sequence are displayed and can be searched collectively for words or phrases of interest. Contig sequences and images of the contig assemblies are displayed on linked web pages along with graphs describing the contig qualities.
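The trimming criterion above, keeping the best subsequence in which Ns make up no more than 4% of the bases, can be implemented in several ways, and the exact rule used in the original pipeline is not described here. The sketch below assumes that "best" means the longest contiguous window meeting the 4% limit, with ties going to the leftmost window. The function name and the example read are hypothetical.

```python
from typing import Tuple


def best_low_n_window(seq: str, max_n_percent: int = 4) -> Tuple[int, int]:
    """(start, end) of the longest substring whose Ns are <= max_n_percent of its bases.

    Brute force over start positions with a prefix sum of N counts; O(L^2) in the
    worst case, which is acceptable for single-pass reads of a few hundred bases.
    """
    prefix = [0]
    for base in seq.upper():
        prefix.append(prefix[-1] + (base == "N"))
    best_start, best_end = 0, 0
    length = len(seq)
    for start in range(length):
        # Only windows longer than the current best can improve on it, so scan
        # end positions from the longest candidate down to best length + 1.
        for end in range(length, start + (best_end - best_start), -1):
            n_count = prefix[end] - prefix[start]
            if 100 * n_count <= max_n_percent * (end - start):  # integer test, no rounding
                best_start, best_end = start, end
                break
    return best_start, best_end


read = "N" * 20 + "ACGT" * 25  # hypothetical read with a poor-quality 5' end
start, end = best_low_n_window(read)
print(start, end, read[start:end].count("N"))  # 16 120 4
```

Note that a window meeting the 4% limit may still contain a few Ns, as in this example; a stricter pipeline might additionally trim Ns at the window edges.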
Clone reracking and 3' sequencing
Soybean cDNA clones corresponding to the 5'-most representative member of a contig, or to a singleton, were selected using Oracle database tables and SQL queries. The E. coli stocks representing those clones were reracked into new 384-well plates to form the sequence-driven reracked libraries Gm-r1021 (4,089 cDNA clones), Gm-r1070 (9,216 cDNA clones), and Gm-r1083 (4,992 cDNA clones). Initially, these were reracked from source 384-well plates to destination 384-well plates by Genome Systems (St. Louis, MO) using a Qbot and shipped on ice to the University of Illinois for extraction and 3' sequencing. Reracked library Gm-r1088 (9,216 cDNA clones) was reracked at the University of Illinois Keck Center using a QPix robot (Genetix, New Milton, Hampshire, UK). More than 99.5% of the reracked E. coli stocks grew successfully.
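Selecting one clone per contig plus every singleton is a small SQL query. The sketch below uses Python's built-in sqlite3 module in place of Oracle. The table layout is hypothetical: a table est with clone_id, contig_id, and contig_start (the offset of the clone's 5' EST within its contig, NULL for singletons). Interpreting "5'-most" as the smallest offset, with ties broken by clone ID, is our assumption.

```python
import sqlite3
from typing import List

QUERY = """
    SELECT MIN(e.clone_id)                            -- tie-break on clone ID
    FROM est AS e
    JOIN (SELECT contig_id, MIN(contig_start) AS first_start
          FROM est
          WHERE contig_id IS NOT NULL
          GROUP BY contig_id) AS m
      ON e.contig_id = m.contig_id AND e.contig_start = m.first_start
    GROUP BY e.contig_id
    UNION ALL
    SELECT clone_id FROM est WHERE contig_id IS NULL  -- singletons
    ORDER BY 1
"""


def select_representatives(conn: sqlite3.Connection) -> List[str]:
    """One 5'-most clone per contig plus every singleton, sorted by clone ID."""
    return [row[0] for row in conn.execute(QUERY)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE est (clone_id TEXT PRIMARY KEY, contig_id INTEGER, contig_start INTEGER)"
)
conn.executemany(
    "INSERT INTO est VALUES (?, ?, ?)",
    [("c1", 1, 0), ("c2", 1, 35), ("c3", 2, 12),
     ("c4", 2, 0), ("c5", None, None), ("c6", 2, 0)],
)
print(select_representatives(conn))  # ['c1', 'c4', 'c5']
```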
The cDNA libraries were all constructed in either pSPORT 1 (Invitrogen, Carlsbad, CA) or pBluescript II SK(+) (Stratagene, La Jolla, CA) plasmid vectors in DH10B host cells. Each 384-well plate of a bacterial library was split into four 96-well, 2 ml block plates, each corresponding to a different quadrant (A1, A2, B1, B2), and grown overnight in 1 ml LB medium with 100 μg/ml ampicillin. High-quality DNA templates were purified using a QIAGEN BioRobot 9600 or BioRobot 8000 with QIAprep 96 Turbo miniprep kits (QIAGEN, Germantown, MD). Dideoxy terminator sequencing reactions for the 3' ends of the soybean cDNA clones were conducted by the University of Illinois Keck Center for Comparative and Functional Genomics using standard methods and analyzed on either gel-based ABI 377 or capillary-based ABI 3700 instruments. Inserts in either vector can be sequenced from the 5' end using the M13 reverse primer and from the 3' end using the M13 universal forward primer. To improve the success rate at the 3' end, however, a degenerate anchored primer, 5'-TTTTTTTTTTTTTTTTTT(A/C/G)-3', was used; because it anneals at the junction between the poly A tail and the 3' UTR, it eliminates the need to sequence through the poly A tail. The primer was synthesized and purified by HPLC (Qiagen Operon, Alameda, CA) to remove shorter, incomplete primers. With high-quality Qiagen-purified cDNA templates, the average untrimmed 3' read length was over 600 bases, with a success rate of 80 to 85%. Original sequence trace files are available by ftp from the University of Illinois Keck Center. The trimmed sequences were entered into GenBank. The reracked 5' and 3' sequences were analyzed with both the CAP3 and Phrap programs.
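The anchored primer works because the base immediately upstream of the poly(A) tail is, by definition, not an A, so its complement on the primer is never T. A mix of three primers ending in A, C, or G therefore covers every tail junction. The sketch below illustrates that reasoning by picking, for a given sense-strand sequence, the primer from the T18V mix whose 3' base pairs at the junction. The function names are hypothetical, and annealing is reduced to exact base pairing of A/C/G/T sequences.

```python
from typing import List

ANCHORS = "ACG"  # the degenerate 3' position V = A, C, or G
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def expand_anchored_primer(n_t: int = 18) -> List[str]:
    """All primers present in the degenerate 5'-T(n)V-3' mix."""
    return ["T" * n_t + v for v in ANCHORS]


def matching_primer(sense: str, n_t: int = 18) -> str:
    """Primer whose 3' anchor pairs with the last base before the poly(A) tail.

    `sense` is the mRNA-sense sequence ending in its poly(A) tail. The T run
    anneals to the tail; the anchor fixes the primer at the tail/3' UTR
    junction, so extension proceeds into the 3' UTR instead of the tail.
    """
    seq = sense.upper()
    body = seq.rstrip("A")
    if not body or len(body) == len(seq):
        raise ValueError("expected a sequence ending in a poly(A) tail")
    anchor = COMPLEMENT[body[-1]]  # body[-1] is C, G, or T, so anchor is G, C, or A
    primer = "T" * n_t + anchor
    assert primer in expand_anchored_primer(n_t)  # always one of the three in the mix
    return primer


print(matching_primer("GATTACAGCC" + "A" * 25))
```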
All cDNA clones of the low redundancy reracked 'unigene' sets are available to the public through Biogenetic Services, Inc., Brookings, SD, or the American Type Culture Collection, Manassas, VA.
Annotation of the unigene cDNAs using BLASTX
The 5' and 3' sequences of the 27,513 unigene cDNA clones were annotated using BLASTX against the nonredundant (nr) protein database with an E-value cutoff of 10E-6. The top BLASTX hit was used as the annotation for each of the 5' and 3' ESTs represented in the unigene sets printed on the microarrays.
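A minimal sketch of the "top hit that passes an E-value cutoff" rule, assuming the BLASTX results were saved in the standard 12-column tabular format (legacy `blastall -m 8` or BLAST+ `-outfmt 6`); the text does not say which output format was parsed, and `top_hits` is a hypothetical helper. The cutoff is a parameter, and reading the text's "10E-6" as 10⁻⁶ is itself an assumption.

```python
import csv
import io

# Column order of the 12-column BLAST tabular output.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(handle, max_evalue=1e-6):
    """Return {query_id: (subject_id, evalue, bitscore)} with the single best
    hit per query (lowest E-value, ties broken by higher bit score) among
    hits whose E-value does not exceed max_evalue (assumed 1e-6 here)."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(FIELDS, row))
        evalue, bits = float(rec["evalue"]), float(rec["bitscore"])
        if evalue > max_evalue:
            continue  # does not pass the cutoff
        current = best.get(rec["qseqid"])
        if current is None or (evalue, -bits) < (current[1], -current[2]):
            best[rec["qseqid"]] = (rec["sseqid"], evalue, bits)
    return best


demo = io.StringIO(
    "est1\tprotA\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-40\t150\n"
    "est1\tprotB\t70.0\t100\t30\t0\t1\t300\t1\t100\t1e-20\t90\n"
    "est2\tprotC\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.01\t30\n"
)
print(top_hits(demo))  # {'est1': ('protA', 1e-40, 150.0)}
```

In the demo, `est2` receives no annotation because its only hit (E = 0.01) fails the cutoff, which is how ESTs without a significant match would be left unannotated under this rule.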
In some cases, protein family assignments were also made using the MetaFam program, based on a BLASTX analysis against a protein sequence database consisting of a non-redundant set of sequences from SwissProt and TrEMBL, PIR and NRL, GenPept, and Integrated Genomics, Inc. (Chicago, IL). Each protein sequence in this database is also placed in a protein family in the MetaFam database [26–28]. The results from each BLASTX report were parsed and stored in an Oracle 8i database, and the strong BLASTX protein hits were matched to the MetaFam protein families to which those protein sequences belong.
Amplification of cDNAs and preparation for use in microarray construction
All pipetting steps involved in amplifying the cDNAs by PCR, purifying them, and assembling them into 384-well spotting plates were conducted with a Multimek™ 96 automated pipettor (Beckman Instruments, CA) to reduce errors associated with manual pipetting.
The same Qiagen plasmid DNA templates that had been prepared robotically for 3' sequencing, at about 100+ ng/μl, were also used for PCR amplification with Taq polymerase (Invitrogen, Carlsbad, CA) and the M13 universal forward and reverse primers in 96-well plates on an MJ DNA Engine Tetrad (MJ Research, Waltham, MA). Four PCR reaction plates were prepared at a time, one from each quadrant of a 384-well library plate. A master mix with final concentrations of 1X PCR buffer (20 mM Tris-HCl, pH 8.4, 50 mM KCl), 2 mM MgCl2, 0.25 mM each of dGTP, dATP, dTTP and dCTP, 1 μM M13 universal primer, 1 μM M13 reverse primer, and 0.05 U/μl Taq polymerase (Invitrogen, Carlsbad, CA, cat. no. 18038-042) was prepared, and 48 μl was dispensed into each well of a 96-well PCR reaction plate (MJ Research MSP-9621). A 0.5 μl aliquot of undiluted plasmid template DNA was added to the 48 μl of master mix. The plates were briefly centrifuged for 1 min at 1500 rpm and placed into an MJ PTC-200 DNA Engine for 1 min of denaturation at 94°C, followed by 28 cycles of 92°C for 30 sec, 56°C for 45 sec and 72°C for 30 sec, and a final extension at 72°C for 5 min. A typical PCR yield was about 30–100 ng/μl.
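The master mix above is specified by final concentrations, so the volume of each stock follows from C₁V₁ = C₂V₂. The sketch below uses the final concentrations from the text, but the stock concentrations (10X buffer, 50 mM MgCl2, 10 mM of each dNTP, 10 μM primers, 5 U/μl Taq) and the 5% pipetting overage are assumptions, not values reported here.

```python
# name: (stock concentration, final concentration); units agree within a row.
# Final concentrations are from the text; stock concentrations are ASSUMED.
COMPONENTS = {
    "PCR buffer (X)": (10.0, 1.0),
    "MgCl2 (mM)": (50.0, 2.0),
    "dATP (mM)": (10.0, 0.25),
    "dCTP (mM)": (10.0, 0.25),
    "dGTP (mM)": (10.0, 0.25),
    "dTTP (mM)": (10.0, 0.25),
    "M13 universal primer (uM)": (10.0, 1.0),
    "M13 reverse primer (uM)": (10.0, 1.0),
    "Taq polymerase (U/ul)": (5.0, 0.05),
}


def master_mix(wells=96, per_well_ul=48.0, overage=0.05):
    """Volumes (ul) to combine for `wells` reactions of `per_well_ul` each,
    scaled up by a fractional `overage` to cover pipetting losses."""
    total = wells * per_well_ul * (1 + overage)
    # C1 * V1 = C2 * V2  ->  V1 = (C2 / C1) * V2
    volumes = {name: final / stock * total
               for name, (stock, final) in COMPONENTS.items()}
    volumes["water"] = total - sum(volumes.values())  # bring to volume
    return volumes


if __name__ == "__main__":
    for name, ul in master_mix().items():
        print(f"{name:28s} {ul:8.1f} ul")
```

Per 48 μl well (no overage) this works out to 4.8 μl buffer, 1.92 μl MgCl2, 1.2 μl of each dNTP, 4.8 μl of each primer, 0.48 μl Taq and 26.4 μl water; one 96-well plate with 5% overage needs about 4.84 ml in total.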
The PCR products were loaded into Millipore MultiScreen plates (Millipore #MANU 03050), and a vacuum of 15 psi was applied for about 10 min until the wells were completely empty. Then 60 μl of sterile water was added to each well using the Multimek automated pipettor to wash the PCR products. The purified products were eluted in sterile water, retrieved, and stored in 96-well plates at -20°C. A 1 μl aliquot from each well in 3 rows of each 96-well plate was run on a gel to check the quality of the PCR and of the cDNA purification. The yield after purification was 30 to 40 μl at concentrations of around 15–50 ng/μl.
Spotting plate assembly
The four quadrants were then reassembled into a 384-well spotting plate containing 6 μl per well: 4.5 μl of PCR product from the 96-well plates mixed with 1.5 μl of 4X Micro Spotting Solution Plus (MSP4X, Telechem, Sunnyvale, CA). Alternatively, in earlier prints, the spotting plates were assembled at a final concentration of 3X SSC, 0.01% N-lauroylsarcosine by mixing 3.5 μl of purified PCR product with 1.5 μl of 10X SSC, 0.033% Sarkosyl, pH 7.0 (1.5 M NaCl, 0.15 M trisodium citrate, 1.12 mM N-lauroylsarcosine, Sigma L-9150).
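The final concentrations stated for both spotting formulations follow from simple dilution, C_final = C_stock × V_stock / V_total. This short check uses only the volumes and stock concentrations given in the paragraph; the helper name is illustrative, and the DNA range assumes the 15–50 ng/μl post-purification concentrations reported for the PCR products.

```python
def final_concentration(stock, stock_ul, total_ul):
    """Simple dilution: C_final = C_stock * V_stock / V_total."""
    return stock * stock_ul / total_ul


# Telechem MSP 4X: 1.5 ul into a 6 ul well (4.5 ul PCR product + 1.5 ul)
msp = final_concentration(4.0, 1.5, 6.0)           # -> 1.0 (1X)
# Earlier prints: 1.5 ul of 10X SSC / 0.033% Sarkosyl in 5 ul (3.5 + 1.5)
ssc = final_concentration(10.0, 1.5, 5.0)          # -> 3.0 (3X SSC)
sarkosyl = final_concentration(0.033, 1.5, 5.0)    # -> about 0.0099 (~0.01%)
# DNA in the MSP wells, from 15-50 ng/ul purified product (4.5 ul of 6 ul)
dna_low = final_concentration(15.0, 4.5, 6.0)      # -> 11.25 ng/ul
dna_high = final_concentration(50.0, 4.5, 6.0)     # -> 37.5 ng/ul
print(msp, ssc, round(sarkosyl, 4), dna_low, dna_high)
```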
A set of 9,216 prepared cDNA inserts from twenty-four 384-well spotting plates was spotted once each onto amine-coated glass slides (1 in × 3 in, Telechem SuperAmine SMM slides, Telechem International, Sunnyvale, CA) using a Cartesian PixSys 5500 robot (Genomic Solutions, Ann Arbor, MI) equipped with 16 quill pins (ChipMaker II from Telechem International) and an environmental chamber. The cDNAs were printed at a 55% ± 5% relative humidity setting within the chamber, in a room whose humidity was held between 45 and 60% using dehumidifiers as needed. Control of humidity was critical for printing.
All arrays contained 32 grids of spots arranged in an 8 × 4 matrix. Each grid had 19 rows and 16 columns of spots, for a total of 9,728 spots per array. The 9,216 cDNAs from the 'unigene' set occupied 18 of the 19 rows, giving 288 spots per grid. After all 9,216 cDNAs were printed, an additional row of 16 spots was printed as the first row of each grid, for a total of 32 grids × 16 spots = 512 additional spots. These cDNAs were printed from the choice clone spotting plate Gm-b10BB, which contained 64 hand-picked clones. Thus, each of the 64 choice clones was printed 8 times, i.e., twice in each of four separate grids. In addition, because the Gm-r1021 library contained only 4,089 cDNAs, an additional 135 cDNAs were repeated to bring that set to 4,224 (eleven 384-well plates), which, combined with the 4,992 cDNAs (thirteen plates) of the Gm-r1083 unigene set, gave an even 9,216 cDNAs for printing.
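The array geometry can be captured in a few constants, and the counts in the paragraph above then follow as consistency checks. The sketch classifies a spot by its 1-based grid, row and column; how the 32 grids are numbered on the slide is not stated in the text, so grid numbering here is an arbitrary assumption, and `spot_content` is a hypothetical helper.

```python
GRID_ROWS, GRID_COLS = 8, 4     # 8 x 4 matrix of grids
SPOT_ROWS, SPOT_COLS = 19, 16   # spots per grid
CHOICE_ROWS = 1                 # first row of every grid: Gm-b10BB choice clones

N_GRIDS = GRID_ROWS * GRID_COLS
TOTAL_SPOTS = N_GRIDS * SPOT_ROWS * SPOT_COLS
CHOICE_SPOTS = N_GRIDS * CHOICE_ROWS * SPOT_COLS
UNIGENE_SPOTS = TOTAL_SPOTS - CHOICE_SPOTS


def spot_content(grid, row, col):
    """Classify a spot given 1-based grid (1..32), row (1..19), column (1..16)."""
    if not (1 <= grid <= N_GRIDS and 1 <= row <= SPOT_ROWS
            and 1 <= col <= SPOT_COLS):
        raise ValueError("spot outside the array")
    return "choice" if row <= CHOICE_ROWS else "unigene"


# Consistency checks against the numbers given in the text.
assert TOTAL_SPOTS == 9728
assert UNIGENE_SPOTS == 9216 == 24 * 384                # 24 spotting plates
assert CHOICE_SPOTS == 512 and CHOICE_SPOTS // 64 == 8  # 64 clones x 8 prints
assert 4089 + 135 + 4992 == 9216                        # Gm-r1021 (+135) + Gm-r1083
```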
The three microarray platforms were entered in the Gene Expression Omnibus database with platform accession numbers GPL229 for Gm-r1070, GPL1013 for Gm-r1021+Gm-r1083, and GPL1012 for Gm-r1088. Complete tables of sequence identifiers and accession numbers for the unigene cDNAs printed on the arrays, as illustrated in Table 2, are available.
Construction of the 'choice' clone PCR plate for repetitive spotting
To construct the choice plate Gm-b10BB, 64 clones were chosen to serve as negative and positive controls for expression analysis on all microarray slides. These clones represent certain constitutively expressed genes and markers for particular tissues, and they also include many key genes of the soybean flavonoid pathway.
The 64 clones were hand-picked and grown overnight in microfuge tubes containing 100 μl of YT medium at 37°C and 250 rpm. The following day, microfuge tubes containing 200 μl of YT supplemented with 100 μg/ml ampicillin and 8% glycerol were inoculated with 5 μl of the previous culture and grown overnight at 37°C and 250 rpm.
To create a 96-well plate of these E. coli stocks, 100 μl of each culture was transferred to wells A1 through H8 (columns 1–8) and stored at -80°C. Wells A9 through H12 (columns 9–12) were left empty. A small database for the Gm-b10BB plate was prepared containing the name of each gene, its sequence, its accession number, and its well in the Gm-b10BB plate. To make a replicate copy for sequencing, 100 μl of YT supplemented with 100 μg/ml ampicillin and 8% glycerol was inoculated with 5 μl of the -80°C E. coli stock and incubated overnight at 37°C and 250 rpm. Miniprep DNAs were isolated and sequenced at the University of Illinois Keck Center using a 5' M13 primer. The identity of each clone was confirmed by comparing the sequence obtained from the Keck Center with the corresponding sequence in our database, using the pairwise BLAST tool available on the NCBI web site. All sequences showed >97% identity with the corresponding database sequence.
PCR amplification using the DNA miniprep plate as the template source was performed with the Multimek 96 automated pipettor (Beckman) as described above. All PCR products were purified and run on a 1% agarose gel to evaluate the purity of the amplified DNAs and determine their size. The purified and analyzed PCR products from the 96-well plate Gm-b10BB were used to assemble a 384-well spotting plate containing 6 μl per well: 4.5 μl of PCR product from the 96-well plate was aliquoted into each of the 4 quadrants and, after assembly, mixed with 1.5 μl of 4X Micro Spotting Solution Plus (MSP4X, Telechem, Sunnyvale, CA).
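Copying the 64-clone Gm-b10BB plate into all four quadrants of a 384-well spotting plate gives 256 filled wells, four per clone. The sketch below builds that layout under the same assumed interleaved quadrant convention (A1 = odd rows/odd columns, and so on), which the text does not specify; the helper name is hypothetical. The final line is an inference, not a reported detail: if every filled well were used equally, the 512 choice-clone spots would correspond to two prints per well, consistent with the eight prints per clone stated earlier.

```python
import string

ROWS_96 = string.ascii_uppercase[:8]
ROWS_384 = string.ascii_uppercase[:16]

# Gm-b10BB: 64 clones in wells A1-H8 (columns 1-8); columns 9-12 are empty.
CLONE_WELLS = [f"{r}{c}" for c in range(1, 9) for r in ROWS_96]


def spotting_plate_from_96(filled_wells):
    """Copy each filled 96-well position into all four quadrants of a
    384-well plate; return {384-well position: source 96-well position}.
    Assumes the interleaved quadrant convention."""
    layout = {}
    for well in filled_wells:
        r96 = ROWS_96.index(well[0])
        c96 = int(well[1:]) - 1
        for dr in (0, 1):          # quadrant row offset (A or B)
            for dc in (0, 1):      # quadrant column offset (1 or 2)
                target = f"{ROWS_384[2 * r96 + dr]}{2 * c96 + dc + 1}"
                layout[target] = well
    return layout


plate = spotting_plate_from_96(CLONE_WELLS)
print(len(plate))          # 256 filled wells (64 clones x 4 quadrants)
print(512 // len(plate))   # 2 prints per well if all wells are used equally
```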
After all slides were printed, the cDNAs were UV cross-linked to the slide coating with 650 mJ of ultraviolet light using a Stratalinker (Stratagene, La Jolla, CA). [Note: prior to cross-linking, the spots were rehydrated if necessary. Rehydration was required for slides printed with the SSC-Sarkosyl spotting solution but not for those printed with the Telechem spotting solution. DNA spots were rehydrated by passing the slide over a gentle vapor of steam for a few seconds until the spots glistened but did not coalesce, and the slide was then quick-dried on a 70°C heating block.] To remove excess spotted DNA and to denature the attached DNA to single strands, slides were treated with the following series of washes with agitation: 2 min in 200 ml of 0.2% SDS, two 1-min water rinses, 95°C water for 2 min, 0.2% SDS for 1 min, and finally two 1-min water rinses. Slides were dried by low-speed centrifugation for 2 min at 500 rpm and stored in a slide rack in a dust-free container.
Plant material and RNA isolation and labelling
Seed coats and cotyledons were dissected from plants grown to maturity in soil in the greenhouse. Roots and leaves were collected from soybean plants grown for 11 days after germination in an aerated hydroponic solution under normal nutrient conditions. Total RNA was extracted using phenol-chloroform extraction and lithium chloride precipitation [31, 32]. RNA was further purified on RNeasy Mini or Maxi columns (Qiagen, Valencia, CA) according to the manufacturer's instructions. Prior to labelling, the purified RNA was concentrated in a SpeedVac (Savant Instruments, Holbrook, NY) or with a YM-30 Microcon column (Millipore, Bedford, MA).
For each RNA probe, 50 to 60 μg of purified total RNA was labeled by reverse transcription in the presence of Cy3- or Cy5-dUTP. Briefly, the RNA and 5 μg of oligo-dT 18–21-mer (Operon, Qiagen) were annealed in a 10 μl volume at 70°C for 10 min and cooled on ice. A 20 μl cocktail containing 1X first strand reaction buffer, 10 mM DTT, 0.5 mM each of dATP, dCTP and dGTP, 0.2 mM dTTP, 100 μM Cy3- or Cy5-dUTP (Amersham Pharmacia) and 400 U of SUPERSCRIPT™ II (200 U/μl; Invitrogen, Carlsbad, CA, cat. no. 18064-014) was added to the 10 μl of denatured RNA and oligo-dT mixture. The 30 μl reaction was incubated for 1 hr at 42°C, after which 200 additional units of SUPERSCRIPT™ II were added and incubation was continued for another hour at 42°C. The reaction was then treated with RNase A and RNase H (0.5 μg and 1.0 U, respectively; Invitrogen, Carlsbad, CA) for 30 min at 37°C to degrade the RNA. The resulting Cy3- and Cy5-labeled cDNAs were paired and mixed according to the intended experiment, and unincorporated nucleotides were removed using a PCR cleanup kit (Qiagen, Valencia, CA). The cleaned probes were concentrated in a SpeedVac (Savant Instruments, Holbrook, NY) for approximately 5 min to a volume of less than 32 μl before hybridization to one array.
Microarray hybridization reactions
The microarray slides were prehybridized by incubation in 5X SSC, 0.1% SDS, 1% BSA at 42°C for 45 to 60 min. For each slide, the labeled cDNA probe was brought to 30.5 μl with sterile water. A 1.5 μl aliquot of 10 μg/μl poly(A) was added and the probe was denatured at 95°C for 3 min. An equal volume (32 μl) of pre-warmed 2X hybridization buffer (50% formamide, 10X SSC, 0.2% SDS) was added to the mixture, and the probe was pipetted between the prehybridized slide and the cover slip (LifterSlip, Erie Scientific Company, Portsmouth, NH). The slide was placed in a hybridization chamber (Corning, New York, NY) and incubated overnight for 16–20 hrs at 42°C. The next day the cover slip was removed and the slide was washed once in 1X SSC, 0.2% SDS prewarmed to 42°C; once in 0.2X SSC, 0.2% SDS at room temperature; and once in 0.1X SSC at room temperature. Each wash was conducted for 5 min with gentle shaking at 100 rpm. Slides were dried by low-speed centrifugation for 2 min at 500 rpm.
Scanning, quantitation, and normalization
The hybridized slides were scanned with a ScanArray Express fluorescent microarray scanner (Perkin Elmer Life Sciences, Boston, MA), and their fluorescence was quantified with ScanArray Express software or with GenePix Pro 3.0 (Axon Instruments, Union City, CA). A Perl program was written for post-analysis processing of the quantitated image files from ScanArray Express or GenePix Pro 3.0. Local background was subtracted from each spot intensity. Spots with signal intensities below the 95th percentile of the background distribution in the Cy3 or Cy5 channel were filtered out. The ratio of the Cy5 mean to the Cy3 mean (r) was computed and used to adjust the Cy3 values to Cy3 × √r and the Cy5 values to Cy5/√r, which makes the mean intensities of the two channels equal. A between-replicate correction was then made using an ANOVA model, which equalized average grid or slide intensities between replicates, for Cy3 and Cy5 separately. The ratio of the resulting adjusted Cy5 and Cy3 intensities was computed for each spot. The coefficient of variation (standard deviation/mean) across replicates was calculated for each spot to evaluate the repeatability of the hybridizations.
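The processing steps above can be sketched in Python as follows. This is not the authors' Perl program, and several details are assumptions: a spot is dropped if its raw foreground falls below the 95th percentile of background in either channel; the ANOVA correction is replaced by a simplified stand-in that rescales each replicate slide so its mean Cy3 and Cy5 intensities match the across-replicate averages (slide level only, no grid term); and the coefficient of variation is computed on the per-spot Cy5/Cy3 ratio, since the text does not say which quantity it was computed on. Spot tuples and function names are hypothetical.

```python
import math
import statistics


def normalize_slide(spots):
    """spots: list of (spot_id, cy3_fg, cy3_bg, cy5_fg, cy5_bg) for one slide.
    Returns {spot_id: (cy3, cy5)} after filtering, background subtraction
    and the sqrt(r) rescaling that equalizes the two channel means."""
    cut3 = statistics.quantiles([s[2] for s in spots], n=100)[94]  # 95th pct
    cut5 = statistics.quantiles([s[4] for s in spots], n=100)[94]
    kept = {}
    for sid, f3, b3, f5, b5 in spots:
        if f3 < cut3 or f5 < cut5:      # ASSUMED: weak in either channel
            continue
        c3, c5 = f3 - b3, f5 - b5       # local background subtraction
        if c3 > 0 and c5 > 0:           # keep ratios defined
            kept[sid] = (c3, c5)
    r = (statistics.mean(v[1] for v in kept.values())
         / statistics.mean(v[0] for v in kept.values()))
    s = math.sqrt(r)
    return {sid: (c3 * s, c5 / s) for sid, (c3, c5) in kept.items()}


def equalize_replicates(slides):
    """Simplified stand-in for the ANOVA step: scale each replicate so its
    mean Cy3 and mean Cy5 equal the averages across replicates."""
    m3 = [statistics.mean(v[0] for v in sl.values()) for sl in slides]
    m5 = [statistics.mean(v[1] for v in sl.values()) for sl in slides]
    g3, g5 = statistics.mean(m3), statistics.mean(m5)
    return [{sid: (c3 * g3 / a3, c5 * g5 / a5) for sid, (c3, c5) in sl.items()}
            for sl, a3, a5 in zip(slides, m3, m5)]


def summarize(slides):
    """Mean Cy5/Cy3 ratio and its coefficient of variation for every spot
    present on all replicate slides."""
    common = set.intersection(*(set(sl) for sl in slides))
    out = {}
    for sid in sorted(common):
        ratios = [sl[sid][1] / sl[sid][0] for sl in slides]
        mean = statistics.mean(ratios)
        cv = statistics.stdev(ratios) / mean if len(ratios) > 1 else float("nan")
        out[sid] = (mean, cv)
    return out
```

A typical call chain is `summarize(equalize_replicates([normalize_slide(s) for s in replicate_slides]))`; with a single slide the CV is reported as NaN because a standard deviation needs at least two replicates.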
High density filter hybridization and selection of weakly expressing cDNAs
High density nylon filters containing 18,432 non-sequenced cDNA clones from the cDNA library Gm-c1007, made from immature cotyledons, were spotted using a Qbot by Incyte Genomics. Before use, the filters were washed in 0.5% SDS solution that had been heated to 60°C, poured over the membrane, and gently agitated for five minutes. This removes residual debris and results in a cleaner hybridization.
Radiolabelling of probe
Total mRNA from developing cotyledons was labeled with 33P-dATP as follows: RNA in 8 μl of water (up to 5 μg, but generally 2 to 3 μg of mRNA) was combined with 4 μl of oligo dT (0.5 μg/μl, 70 μM, Sigma Lot 29H9065). The mixture was heated for 10 min at 70°C and chilled on ice before adding the following: 6 μl of 5X first strand buffer (BRL/Life Tech Cat. #18064-014); 1 μl DTT; 1.5 μl each of 10 mM dGTP, dCTP, and dTTP; 1.5 μl reverse transcriptase (200 units/μl, SuperScript II RT from BRL/Life Tech Cat. #18064-014); and 10 μl 33P-dATP at 10 mCi/ (NEN, 33P Cat#612H04029). After incubation at 37°C for 90 min, the probe was purified by passage through a Bio-Spin 30 Chromatography Column (Bio-Rad Cat. #732-6006) and stored at 4°C until it was denatured and added to the prehybridized filter.
The filter was rolled and placed in a hybridization bottle containing 25 ml of prehybridization solution without formamide and was prehybridized for 3–4 hrs at 65°C in a rotating hybridization oven.
Adding the probe to the filter: once the filter was prehybridized, the probe was denatured for 10 min at 95°C, and the entire radiolabeled probe was added directly to the prehybridization mixture in the bottle with the filter. Hybridization was allowed to proceed for 12–18 hrs.
The filters were washed twice for 15 min each in pre-warmed (50–55°C) low stringency wash solution (2X SSC, 0.5% SDS, 0.1% sodium pyrophosphate). They were then washed for about 2 hrs at 55°C in high stringency buffer (0.1X SSC, 0.5% SDS, 0.1% sodium pyrophosphate) with gentle shaking.
Filters were scanned with a Typhoon 8600 variable mode imager (Amersham Pharmacia Biotech, Inc., Piscataway, NJ), and the images were analyzed with the Array Vision software package (Imaging Research Inc., St. Catharines, Ontario, Canada) to correlate spot intensity with filter position. Spots with very low intensities (1 to 500) were selected at random in order to enrich for cDNAs representing low-abundance mRNAs. These clones were reracked into 384-well plates to form library Gm-r1030 and sent for 5' sequencing at the Washington University Genome Center.
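The random selection of weakly hybridizing clones can be sketched as a filter on intensity followed by sampling without replacement. The dictionary of intensities keyed by filter position is a hypothetical stand-in for the Array Vision export, and the default of 384 clones (one reracked plate) is an assumption; the text does not state how many clones were drawn.

```python
import random


def pick_weak_clones(intensities, low=1, high=500, n=384, seed=None):
    """intensities: {filter_position: spot intensity}. Return up to n
    positions sampled at random, without replacement, from spots whose
    intensity lies in [low, high]. n=384 (one plate) is an ASSUMED default."""
    weak = sorted(pos for pos, value in intensities.items()
                  if low <= value <= high)
    rng = random.Random(seed)   # seeded generator for a reproducible pick
    return rng.sample(weak, min(n, len(weak)))
```

For example, `pick_weak_clones({"A1": 120, "A2": 4800, "A3": 35}, n=2, seed=1)` returns A1 and A3 in random order and never the strongly hybridizing A2. Sorting the candidates before sampling makes a seeded run reproducible regardless of dictionary order.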
Distribution of materials
Upon request, all novel materials described in this publication will be made available in a timely manner for noncommercial research purposes. The cDNA clones are available from Biogenetic Services, Inc., Brookings, SD, or the American Type Culture Collection, Manassas, VA. Microarrays are available on a cost recovery basis by contacting Lila Vodkin, University of Illinois.
List of abbreviations
PCR: polymerase chain reaction
SSC: standard saline citrate
SDS: sodium dodecyl sulfate
We gratefully acknowledge Sarah Jones, Anne Marie Boone, Tara Knackstedt, and Ben Pleune for assistance with analysis of the PCR products. We thank Jigyasa Tuteja for critical reading of the manuscript. This work was supported by NSF grant DBI9872565 from the Plant Genome Program (to LOV, ER, RS, JP, PK) and grants from the North Central Soybean Research Program and United Soybean Board for the Public EST Project (to RS, PK, LOV, and ER).
- Velculescu VE, Zhang L, Vogelstein B, Kinzler KW: Serial analysis of gene expression. Science. 1995, 270: 484-487.
- Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW: Parallel human genome analysis: Microarray-based expression monitoring of 1000 genes. Proc Natl Acad Sci USA. 1996, 93: 10614-10619. 10.1073/pnas.93.20.10614.
- DeRisi JL, Iyer VR, Brown PO: Exploring the metabolic and genetic control of gene expression on a genomic scale. Science. 1997, 278: 680-686. 10.1126/science.278.5338.680.
- Marshall A, Hodgson J: DNA Chips: An array of possibilities. Nature Biotechnology. 1998, 16: 27-31. 10.1038/nbt0198-27.
- Shoemaker R, Keim P, Vodkin L, Retzel E, Clifton SW, Waterston, Smoller D, Coryell V, Khanna A, Erpelding J, Gai X, Brendel V, Raph-Schmidt C, Shoop EG, Vielweber CJ, Schmatz M, Pape D, Bowers Y, Theising B, Martin J, Dante M, Wylie T, Granger C: A compilation of soybean ESTs: generation and analysis. Genome. 2002, 45: 329-338. 10.1139/g01-150.
- Walbot V: Genes, genomes, genomics. What can plant biologists expect from the 1998 National Science Foundation Plant Genome Research Program. Plant Physiol. 1999, 119: 1151-1155. 10.1104/pp.119.4.1151.
- Green P: Documentation for Phrap. Genome Center, University of Washington. [http://bozeman.mbt.washington.edu]
- TIGR Gene Indices, The Institute for Genomic Research. [http://www.tigr.org/tdb/tgi]
- Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000, 408: 796-815. 10.1038/35048692.
- Bonaldo MF, Lennon G, Soares MB: Normalization and subtraction: Two approaches to facilitate gene discovery. Genome Res. 1996, 6: 791-806.
- Huang X, Madan A: CAP3: A DNA sequence assembly program. Genome Res. 1999, 9: 868-877. 10.1101/gr.9.9.868.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
- Zabala GZ, Vodkin LO: Cloning of the pleiotropic T locus in soybean and two recessive alleles that differentially affect structure and expression of the encoded flavonoid 3' hydroxylase. Genetics. 2003, 163: 295-309.
- Hong JC, Nagao RT, Key JL: Developmentally regulated expression of soybean proline-rich cell wall protein genes. Plant Cell. 1989, 1: 937-943. 10.1105/tpc.1.9.937.
- Maguire TL, Grimmond S, Forrest A, Iturbe-Ormaetxe I, Meksem K, Gresshoff P: Tissue-specific gene expression in soybean (Glycine max) detected by cDNA microarray analysis. J Plant Physiol. 2002, 159: 1361-1374.
- Thibaud-Nissen F, Shealy RT, Khanna A, Vodkin LO: Clustering of microarray data reveals transcript patterns associated with somatic embryogenesis in soybean. Plant Physiol. 2003, 132: 118-136. 10.1104/pp.103.019968.
- Stacey G, Vodkin L, Parrott WA, Shoemaker RC: National Science Foundation-Sponsored Workshop Report: Draft Plan for Soybean Genomics. Plant Physiol. 2004, 135: 59-70. 10.1104/pp.103.037903.
- Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998, 8: 186-194.
- Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998, 8: 175-185.
- Soybean Genomics Initiative. [http://soybean.ccgb.umn.edu]
- Keck Center for Comparative and Functional Genomics, University of Illinois. [http://www.biotech.uiuc.edu/keck.shtml]
- NCBI dbEST. [http://www.ncbi.nih.gov/dbEST/index.html]
- Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, Pilbout S, Schneider M: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 2003, 31: 365-370. 10.1093/nar/gkg095.
- Wu CH, Yeh LS, Huang H, Arminski L, Castro-Alvear J, Chen Y, Hu Z, Kourtesis P, Ledley RS, Suzek BE, Vinayaka CR, Zhang J, Barker WC: The Protein Information Resource. Nucleic Acids Res. 2003, 31: 345-347. 10.1093/nar/gkg040.
- Wheeler DL, Church DM, Federhen S, Lash AE, Madden TL, Pontius JU, Schuler GD, Schriml LM, Sequeira E, Tatusova TA, Wagner L: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2003, 31: 28-33. 10.1093/nar/gkg033.
- Shoop E, Silverstein KAT, Johnson JE, Retzel EF: MetaFam: a Unified Classification of Protein Families. II. Schema and Query Capabilities. Bioinformatics. 2001, 17: 262-271. 10.1093/bioinformatics/17.3.262.
- Silverstein KAT, Shoop E, Johnson JE, Kilian A, Freeman JL, Kunau TM, Awad IA, Mayer M, Retzel EF: The MetaFam Server: a comprehensive protein family resource. Nucleic Acids Res. 2001, 29: 49-51. 10.1093/nar/29.1.49.
- Silverstein KAT, Shoop E, Johnson JE, Retzel EF: MetaFam: a Unified Classification of Protein Families. I. Overview and Statistics. Bioinformatics. 2001, 17: 249-261. 10.1093/bioinformatics/17.3.249.
- Gene Expression Omnibus. [http://www.ncbi.nlm.nih.gov/geo]
- Soybean Functional Genomics. [http://soybeangenomics.cropsci.uiuc.edu]
- McCarty DR: A simple method for extraction of mRNA from maize tissue. Maize Genet Coop Newsl. 1986, 60: 61-
- Wang C-S, Vodkin LO: Extraction of RNA from tissues containing high levels of procyanidins that bind RNA. Plant Mol Biol Rep. 1994, 12: 132-145.
- Hegde P, Qi R, Abernathy K, Gay C, Dharap S, Gaspard R, Hughes J, Snesrud E, Lee N, Quackenbush J: A concise guide to cDNA microarray analysis. Biotechniques. 2000, 29: 548-562.
- Maniatis T, Fritsch E, Sambrook J: Molecular cloning: A laboratory manual. 1982, Cold Spring Harbor, NY: Cold Spring Harbor Laboratory.
This article is published under license to BioMed Central Ltd. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.