Contig assembly for unigene selection
Raw sequence files of the 5' soybean EST data from Washington University or 3' data from the University of Illinois Keck Center were produced from sequence traces using the Phred base calling program [18, 19]. The sequences were trimmed for leading and trailing vector and linker sequences and artifact E. coli sequences were removed. Quality checks included determining the number of ambiguous 'N' base calls in a sequence and trimming the leading and trailing poor quality (high-N) sections to obtain the best subsequence where the number of Ns was 4% or less of the total bases. The EST sequences were clustered into contig sets based on sequence overlap using the program Phrap . The processing and analysis results for each sequence are displayed on a set of World Wide Web pages . The distribution of sequence lengths in each submission set are displayed in histograms. The base call and quality information for each sequence in a submission are displayed in artificial gel images of the sequences. Each sequence is displayed as the raw sequence before vector filtering and the cleaned sequence after vector filtering. A color-coded sequence quality graph shows the part of a sequence retained after trimming as well as the regions trimmed for low quality, polyA or polyT, and vector sequences. Blast reports for each sequence are displayed and can be searched collectively for words or phrases of interest. Contig sequences and images of the contig assemblies are displayed on linked web pages along with graphs describing the contig qualities .
Clone reracking and 3' sequencing
Soybean cDNA clones corresponding to the 5' most representative member of a contig or to a singleton were selected using Oracle database tables and SQL queries. The E.coli stocks representing those clones were reracked into new 384 well plates to form the sequence driven reracked libraries Gm-r1021 (4,089 cDNA clones), Gm-r1070 (9,216 cDNA clones) and, Gm-r1083 (4,992 cDNA clones). Initially, these were reracked from source 384-well plates to destination 384-well plates by Genome Systems (St. Louis, MO) using a Qbot and shipped on ice to the University of Illinois for extraction and 3' sequencing. Reracked library Gm-r1088 (9,216 cDNA clones) was reracked at the University of Ilinois Keck Center using a QPix robot, (Genetix, New Milton, Hampshire UK). Growth rates for the E.coli stocks were over 99.5%.
The cDNA libraries were all constructed in either pSPORT 1 (Invitrogen, Carlsbad, CA) or pBluesciptII SK (+) (Stratagene, La Jolla, CA) plasmid vectors in DH10B host cells. Each 384-well plate of a bacterial library was split into four 96-well, 2 ml block plates, each corresponding to a different quadrant (A1, A2, B1, B2) and grown overnight in 1 ml LB media with 100 μg/ml ampicillin. High quality DNA templates were purified using a QIAGEN BioRobot 9600 or BioRobot 8000 with QIAprep 96 Turbo miniprep kits (QIAGEN, Germantown MD). Dideoxy terminator sequencing reactions for the 3' ends of the soybean cDNA clones were conducted by the University of Illinois Keck Center for Comparative and Functional Genomics  using standard methods analyzed either on gel-based ABI 377 or capillary-based ABI3700 instruments. Inserts within each vector type can be sequenced from the 5' end using the M13 reverse primer and the 3' end using the M13 universal forward primer. However, for higher success rates at the 3' end, a degenerate primer consisting of [5'-TTTTTTTTTTTTTTTTTT(A/C/G)-3'] was employed in order to enhance the success of 3' sequencing reactions by eliminating the need to sequence through the poly A tail. The primer was synthesized and purified by HPLC (Qiagen Operon, Alameda CA) to remove shorter, incomplete primers. Using high quality Qiagen purified cDNA templates, the average 3' untrimmed read length was over 600 bases with a success rate of 80 to 85%. Original sequence trace files are available by ftp from the University of Illinois Keck Center . The trimmed sequences were entered into Genbank . The reracked 5' and 3' sequences were analyzed by both the CAP3  and Phrap programs .
All cDNA clones of the low redundancy reracked 'unigene' sets are available to the public through Biogenetic Services, Inc., Brookings, SD, or the American Type Culture Collection, Manasas, VA.
Annotation of the unigene cDNAs using BLASTX
The 5' and 3' sequences of the 27,513 unigene cDNAs clones were annotated using BLASTX against the nonredundant (nr) protein database with cutoff E value of 10E-6. The top blast hit was used as the annotation for each of the 5' and 3' ESTs represented in the unigene sets printed on the microarrays.
In some cases, the protein family assignments were also made using the Metafam program based on a BLASTX analysis against a protein sequence database consisting of a non-redundant set of sequences from SwissProt & TrEMBL , PIR & NRL , GenPept , and Integrated Genomics, Inc. (Chicago, IL). Each of the protein sequences in this database is also placed in a protein family in the MetaFam database [26–28]. The results from each BLASTX report were parsed and placed in an Oracle 8i database. The strong protein sequence hits from BLASTX are matched up to the MetaFam protein families to which those protein sequences belong.
Amplification of cDNAs and preparation for use in microarray construction
All pipetting steps involved in amplifying the cDNAs by PCR, purification of the cDNAs, and assembling them into 384-well spotting plates were conducted with a Multimek TM 96 Automated pippetor (Beckman Instruments, CA) to reduce errors associated with manual pipetting.
The same Qiagen plasmid DNA templates that were prepared for the 3' sequencing by a Qiagen robot at about 100+ ng/μl were also used for PCR amplification using Taq polymerase (Invitrogen, Carlsbad, CA), universal forward and reverse primers in 96 well plates using the MJ DNA Engine Tetrad (MJ Research, Waltham, MA). Four PCR reaction plates are prepared at a time, one from each quadrant of a 384-well library plate. A master mix consisting of final concentrations of 1X PCR buffer (20 mM Tris-HCl, pH 8.4, 50 mM KCl), 2 mM MgCl2, 0.25 mM each of dGTP, dATP, dTTP, dCTP, 1 μM of M13 universal primer, 1μM of M13 reverse primer, and 0.05 U/μl of Taq polymerase (Invitrogen, Carlsbad, CA, cat no. 18038-042) was prepared and 48 μl were aliquoted into each well of a 96 well PCR reaction plate (MJ Research MSP-9621). A 0.5 μl aliquot of an undiluted plasmid template DNA was aliquoted into the 48 μl of master mix. The plates were briefly centrifuged for 1 min at 1500 rpm and placed into an MJ PTC-200 DNA Engine for 1 min of denaturation at 94°C, and 28 cycles of 92°C for 30 sec, 56°C for 45 sec, and 72°C for 30 sec and a final extension of 72°C for 5 minutes. A typical yield from the PCR was about 30–100 ng/μl.
The PCR products were loaded into Millipore multiscreen plates (Millipore #MANU 03050) and were subjected to a vacuum applied at 15 psi for about 10 min until the wells were completely empty. Then 60 μl of sterile water were added to each well using the Multimek automated pipettor and the PCR products were washed. The purified products were eluted in sterile water, retrieved and then stored in 96 well plates at -20°C. A 1 μl aliquot of each well from 3 rows from each 96 well plate is run on a gel to check the quality of the PCR and purification of the cDNA. The yield after purification was between 30 and 40 μls with concentrations around 15–50 ng/μl.
Spotting plate assembly
The four quadrants were then reassembled into a 384-well spotting plate containing 6 μl per well: 4.5 μl of PCR product from the 96 well plates mixed with 1.5 μl of 4X Micro Spotting Solution Plus (MSP4X, Telechem, Sunnyvale, CA). Alternatively, in earlier prints, the spotting plates were assembled at a final concentration of 3X SSC, 0.01% N-lauroylsarcosine by mixing 3.5 μl of purified PCR product with 1.5 μl of 10X SSC, 0.033% Sarkosyl, pH 7.0 (1.5M NaCl, 0.15 M citric acid, trisodium salt, 1.12 mM N-lauroylsarcosine, Sigma L-9150).
A set of 9,216 prepared cDNA inserts from 24, 384-well spotting plates were single spotted onto amine coated glass slides (1 in × 3 in, Telechem Superamine, SMM slides, Telechem International, Sunnyvale, CA) using a Cartesian PyxSys 5500 robot (Genomic Solutions, Ann Arbor, MI) equipped with 16 quill pins (ChipMaker II from Telechem International) and an environmental chamber. The cDNAs were printed at 55% ± 5% relative humidity setting within the chamber and in a room that was controlled for humidity to be between 45 and 60% using room dehumidifiers as needed. Control of humidity was critical for printing.
All arrays contained 32 grids of spots arranged in an 8 × 4 matrix. Each grid had 19 rows and 16 columns of spots for a total of 9,728 spots per array. A total of 9,216 spots were the cDNAs prepared from the 'unigene' set to form 18 of the 19 rows with 288 spots per grid. After all of the 9,216 cDNAs were printed, an additional row of 16 spots was printed as the first row of each grid for a total of 32 grids × 16 spots = 512 additional spots. These cDNAs were printed from the choice clone spotting plate designated Gm-b10BB which contained 64 hand-picked clones. Thus, the 64 hand-picked, choice clones were printed 8 times each, i.e., each clone was printed in twice in four separate grids. In addition, since the Gm-r1021 library contained only 4089 cDNAs, an additional 135 were repeated in order to obtain an even 9216 cDNAs for printing when combined with the Gm-r1083 unigene set.
The three microarray platforms were entered in the Gene Expression Omnibus database  with platform accession numbers GPL229 for Gm-r1070, GPL1013 for Gm-r1021+Gm-r1083, and GPL1012 for Gm-r1088. Complete tables of sequence identifiers and accession numbers for the unigene cDNAs printed on arrays as illustrated in Table 2 are available .
Construction of the 'choice' clone PCR plate for repetitive spotting
To construct the choice plate Gm-b10BB, 64 clones were chosen to be used as negative and positive controls for expression analysis in all microarray slides. These 64 clones were chosen to represent certain constitutively expressed genes, or other markers for particular tissues, and is also highly representative of key genes of the soybean flavonoid pathway.
The 64 clones were hand picked and grown over night in microfuge tubes containing 100 μl of YT media at 37°C, 250 rpm. The following day, microfuge tubes containing 200 μl of YT supplemented with 100 μg/ml ampicillin and 8% glycerol were inoculated with 5 μl from the previous culture and grown over night at 37°C and 250 rpm.
To create a 96 well plate of these E.coli stocks, 100μl of the previously grown culture were transferred to wells in columns A1 thru H8 and stored at -80°C. Wells in columns A9 thru H12 were left empty. A small database for the Gm-b10BB plate was prepared containing the name of each gene, its sequence, accession number, and the corresponding well in the Gm-b10BB plate. To make a replicate copy for sequencing, 100 μl of YT supplemented with 100 μg/ml ampicillin and 8% glycerol were inoculated with 5 μl of the -80°C E. coli stock and incubated overnight at 37°C and 250 rpm. Miniprep DNAs were isolated and sequenced at the University of Illinois Keck Center using a 5' M13 primer. The identity of each clone was confirmed by comparison of the sequences obtained from the Keck center with the sequences contained in our previously prepared database by using the Pairwise Blast tool available at the NCBI web page. All sequences showed >97% identity with the corresponding sequence in the database.
PCR amplification using the DNA miniprep plate as a source for templates was performed with the Mutimeck 96 automated pipetor (Beckman) as described above. All PCR products were purified and separated in 1% agarose gel to evaluate the purity of the amplified DNAs and determine their size. The purified and analyzed PCR products from the 96 well plate, Gm-b10BB, were used to assemble a 384 well spotting plate. The 384 well spotting plate contained 6 μl per well: 4.5 μl of PCR product from the 96 well plate aliquoted on each of the 4 quadrants and mixed with 1.5 μl of 4X Micro Spotting Solution Plus (MSP4X, Telechem, Sunnyvale, CA) after assembly.
After all slides were printed, the cDNAs were UV-cross linked to the slide coating with 650 m Joule ultra violet light using a StrataLinker (Stratagene, La Jolla, CA). [Note: prior to cross-linking the spots were rehydrated if necessary. Rehydration was required for the slides printed with the SSC-Sarkosyl spotting solution but was not required for those printed with Telechem spotting solution. DNA spots were rehydrated by passing the slide over a gentle vapor of steam for a few seconds until spots glistened but did not coalesce and then were quick dried on a 70°C heating block]. To remove excess spotted DNA as well as to denature attached DNA to single strands, slides were treated with the following series of washes with agitation: 2 min with 200 mls of 0.2% SDS, two 1 min water rinses, 95°C water for 2 min, 0.2% SDS for 1 min, and finally two water rinses of 1 min each. Slides were subjected to low speed centrifugation for 2 min at 500 rpm to dry and were stored in a slide rack in a dust free container.
Plant material and RNA isolation and labelling
Seed coats and cotyledons were dissected from plants grown to maturity in soil in the greenhouse. Roots and leaves were collected from soybean plants grown for 11 days after germination in an aerated hydroponic solution with normal nutrient conditions. Total RNA was extracted using phenol-chloroform and lithium chloride precipitation methods [31, 32]. RNA was further purified by use of RNeasy Mini or Maxi columns Qiagen, Valencia, CA) according to the manufacturer's instructions. Prior to labelling, the purified RNA was concentrated in a Speed Vac (Savant Instruments, Halbrook, NY) or by using YM-30 Microcon column (Millipore, Bedford, MA).
For each RNA probe, 50 to 60 μg of purified total RNA was labeled by reverse transcription in the presence of Cy3- or Cy5-dUTP . Briefly: the RNA and 5μg oligo-dT 18–21 mer (Operon, Qiagen) were annealed in a 10μl volume at 70°C for 10 min and cooled on ice. A 20 μl cocktail containing 1X first strand reaction buffer, 10 mM DTT, 0.5 mM each of dATP, dCTP, dGTP, 0.2 mM dTTP, 100μM Cy3- or Cy5-dUTP (Amersham, Pharmacia) and 400 U of 200 U/μl SUPERSCRIPT™II (Invitrogen, Carlsbad, CA, cat no. #18064-014) was added to 10μl of the denatured RNA and oligo-dT mixture). The 30 μl reaction was incubated for 1 hr at 42°C, after which 200 additional units of SUPERSCRIPT™II were added and incubation was continued for another hour at 42°C. The reaction was then treated with RNAse A and RNAse H (0.5μg and 1.0 U respectively, Invitrogen, Carlsbad, CA) for 30 min at 37°C to degrade the RNA. The resulting Cy3 and Cy5-labeled cDNAs were paired and mixed together according to the intended experiment and unincorporated nucleotides were removed using a PCR cleaning kit (Qiagen, Valencia, CA). Cleaned probes were concentrated in a SpeedVac (Savant Instrument, Holbrook, NY) for approximately 5 min to a volume of less than 32 μl prior to being used in hybridization to one array.
Microarray hybridization reactions
The microarray slides were prehybridized by incubation in 5X SSC, 0.1% SDS, 1% BSA at 42°C for 45 to 60 min. For each slide, the labeled cDNA probe was brought to 30.5 μl with the addition of sterile water. A 1.5 μl aliquot of 10 μg/μl polyA was added and the probe was denatured at 95°C for 3 min. An equal amount (32 μl) of pre-warmed 2X hybridization buffer (50% formamide, 10X SSC, 0.2% SDS,  was added to the mixture and the probe was pipetted between the pre-hybridized slide and the cover slip (LifterSlip, Erie Scientific Company, Portsmouth, NH). The slide was placed in a hybridization chamber (Corning, New York, NY) and incubated overnight for 16–20 hrs at 42°C. The next day the cover slip was removed and the slide was washed once in 1X SSC, 0.2% SDS prewarmed to 42°C; once in 0.2X SSC, 0.2% SDS at room temperature; and once in 0.1X SSC at room temperature. The washes were conduced with gentle shaking at 100 rpm for 5 min. Slides were subjected to low speed centrifugation for 2 min at 500 rpm to dry.
Scanning, quantitation, and normalization
The hybridized slides were scanned with a ScanArray Express fluorescent microarray scanner (Perkin Elmer Life Sciences, Boston, MA) and their fluorescence quantified by ScanArray Express software or by GenePix Pro 3.0 (Axon Instruments, Union City, CA). A perl program was written for post analysis processing of the quantitated image files from the Scan Array Express or GenePix Pro3.0. Local background was subtracted from each spot intensity. Spots showing signal intensities below the 95th percentile of the background distribution in the Cy3 or Cy5 channel were filtered out. The ratio of Cy5 mean to Cy3 mean (r) was computed and used to adjust the Cy3 values to Cy3 X sqrt(r) and the Cy5 values to Cy5/sqrt(r). A between-replicate correction was made using an ANOVA model, which equalized average grid or slide intensities between replicates, for Cy3 and Cy5 separately. The ratio of the resulting adjusted intensities of Cy5 to Cy3 was computed for each spot. The coefficient of variation (standard deviation/mean) across replicates was calculated for each spot to evaluate repeatability of the hybridizations.
High density filter hybridization and selection of weakly expressing cDNAs
High density nylon filters containing 18,432 non-sequenced cDNA clones from the cDNA library Gm-c1007 made from immature cotyledons were spotted using a Qbot by Incyte Genomics. Before use, the filters were washed in0.5% SDS solution that was heated to 60°C, poured over the membrane, and gently agitated for five minutes. This will rid the filter of any residual debris and will result in a cleaner hybridization.
Radiolabelling of probe
Total mRNA from developing cotyledons was labeled with 33P-dATP in the following manner: RNA in 8 μl water (up to 5 μg, but generally 2 to 3 μg of mRNA) was combined with 4 μl Oligo dT (0.5 μg/ul, 70 μM, Sigma Lot 29H9065). The mixture was heat treated for 10 min at 70°C and chilled on ice before adding the following: 6 μl of 5X first strand buffer (BRL/Life Tech Cat. #18064-014); 1 μl DTT; 1.5 μl each of 10 mM dGTP, dCTP, and dTTP; 1.5 μl reverse transcriptase (200 units/μl, SuperScript II RT from BRL/Life Tech Cat. #18064-014); and 10 μl 33P dATP at 10 mCi/ (NEN, 33P Cat#612H04029). After incubation at 37°C for 90 min, the probe was purified by a passage through a Bio-Spin 30 Chromatography Column (Bio-Rad Cat. #732-6006), then stored at 4°C until ready to be denatured and added to the pre-hybridized filter).
The filter was rolled and placed in a hybridization bottle containing 25 ml of pre hybridization solution without formamide  and was prehybridized for 3–4 hrs at 65°C in a rotor oven.
Adding the probe to the filter: Once the filter was pre-hybridized, the probe was denatured for 10 min at 95°C and then the entire radiolabeled probe was added directly to the prehybridization mixture (in the bottle with the filer). The hybridization was allowed to proceed for 12–18 hrs.
The filters were washed twice in the pre-warmed (50–55°C) low stringency wash solution (2XSSC, 0.5% SDS, 0.1% Na pyrophosphate) for 15 min each. The filters were then washed for about 2 hrs at 55°C in high stringency buffer (0.1XSSC, 0.5% SDS, O.1% sodium pyrophosphate) with gentle shaking.
Filters were analyzed with a Typhoon 8600 variable mode imager (Amersham Pharmacia Biotech, Inc, Piscataway, NJ) and imaged with the software package Array Vision (Imaging Research Inc., St. Catharines, Ontario, Canada) to correlate spot intensity and filter position. Spots with very low intensity of 1 to 500 were selected at random in order to enrich for cDNAs representing mRNAs of low abundance. Theses clones were reracked into 384-well plates to form library Gm-r1030 and sent for 5' sequencing at the Washington University Genome Center.
Distribution of materials
Upon request, all novel materials described in this publication will be made available in a timely manner for noncommercial research purposes. The cDNA clones are available from the Biogenetic Services, Brookings, SD or the American Type Culture Collection, Manasas, VA. Microarrays are available on a cost recovery basis by contacting Lila Vodkin, University of Illinois.