Microarray probe selection
A total of 22,787 probes were printed on the cotton oligonucleotide microarray, composed of 12,006 oligonucleotides selected from recently reported contigs and singletons derived from a global assembly of cotton ESTs [10], and 9,629 designed against the latest TIGR cotton transcript assembly (CGI8). We also included 1,154 oligonucleotides designed in the Z. J. Chen laboratory based on previously available sequence data [27].
Similar probe design strategies were used for all threes sets of probes. Lee et al. [17] provided a description for the design first set of oligonucleotides. The second and third set of oligonucleotides were designed from the EST exemplar sequences [10] and the TIGR Gene Index [11] with Picky (v 1.0) [18] and Picky (2.0), respectively. Only EST sequences with predicted a protein from ESTScan [28] or a high protein homology (70% percent identity) to an Arabidopsis gene were considered candidates for oligonucleotide selection.
Three different criteria were invoked in the iterative selection of the microarray probes from the EST assemblies: (1) candidate probes identified by Picky by their unique sequences, complexity, and Tm; (2) characterized cotton mRNAs in GenBank and genes of special interest to the cotton community; and (3) complementation to previously synthesized probe set(s). Thus, for each oligonucleotide set, candidate long-oligonucleotide probes (60 – 70 mers) were separately generated based on criteria 1 and 2, then a final list of probes was selected for oligonucleotide synthesis by cross-checking the new list of probes with previously synthesized oligonucleotides.
For design of both the second and third probe sets, we solicited input from the cotton community to identify genes of interest for microarray probe design. Most requests were for known cotton genes with sequences in Genbank, in addition to candidate genes identified in our EST assembly by high homology to genes characterized in other organisms. Probes for these genes were designed with the sequences from the EST assemblies as 'background' to identify the most unique probes possible.
Both transcription factors and genes with little or no annotation were represented on the cotton microarray. Based on widespread interest in transcription factor expression levels, we selected probes targeting genes that had either a transcription related GO ontology [29], or a transcript factor domain as predicted by PFAM [30] (Table 1). Genes with little or no annotation represent a gene discovery component to future microarray experiments. When genes targeted by the 2nd probe second were compared to the Arabidopsis TAIR protein dataset, 1,200 of them did not have a significant BLASTX hit (< 1e-20) but they did have a coding frame as predicted by ESTScan [28].
Oligonucleotide synthesis and microarray printing
Each set of oligonucleotides was synthesized and aliquoted into 3 replicate plates by IDT Technologies (Coralville, IA, USA). An aliquot of 384-well plates from all three sets of oligonucleotides was hydrated in water then diluted to the printing concentration with 3× SSC. Positive and negative controls were included on the printed microarrays. To assess microarray quality, two spots of each oligo from the same pin-dip were printed in separate slide sections on Corning epoxy slides at the Washington University Microarray Core facility using a locally constructed linear servo arrayer (after the DeRisi model [31]) creating the first version of the cotton oligonucleotide microarray [GEO: GPL4305]. After printing, slides were allowed to dry in 50–70% humidity for 12–16 hrs (~25°C) and cross-linked at 150 mJoules. Two slides from the print batch were checked using SpotCheck (Genetix). Printed cotton microarrays, and images of each print batch are publicly available [16]. Experiments using the preliminary platform [GEO: GPL4305] or this new platform [GEO: GPL4808] can be found at GEO [32].
RNA extraction
One leaf and one bud (10 – 14 days before anthesis) tissue sample of G. hirsutum cv. Acala Maxxa were collected from three separate replications of 4 – 8 plants grown in Horticulture Greenhouse at Iowa State University under supplemental lighting (16 hr. days). RNA was extracted from each of the six samples using a modified hot-borate method [33], quantified, and checked for integrity using a Bioanalyzer (Agilent, Inc., Palo Alto, CA, USA). Equimolar amounts of RNA (A260) from three separate extractions were pooled into a single leaf and single bud sample, respectively.
RNA amplificationand labeling
An indirect labeling procedure of amplified aminoallyl.aRNA (TargetAmp™, Epicentre Biotechnologies, Madison, WI, USA) was used for one leaf RNA sample and one bud RNA sample. 0.5 ug of total RNA was used as starting material for 1 round of aRNA amplification, resulting in 26 ug and 51 ug of aRNA from leaves and buds, respectively.
Cy3 and Cy5 dyes (Amersham Biosciences, Pittsburgh, PA, USA) were coupled to two aliquots of 13 and 16 ug of both aRNA samples, respectively. The Cy3- and Cy5-labeled aRNA probes were purified using the Qiagen RNA easy Mini kit (Qiagen, Germantown MD, USA). and sufficient incorporation Cy3 (550 nm) and Cye5 (650 nm) dyes was verified.
Microarray hybridization and image analysis
For microarray hybridization, 300 ng of Cy3 and Cy5 labeled aRNA was used per each slide using the Pronto!™ Plus system protocol (Promega Corporation, Madison WI, USA) with minor changes as described below. Slides from each rep (3) were immersed in 200 ml of Pronto Universal Pre-Soak solution containing 2 ml of liquid Sodium Borohydride for 20 min at 42°C. Slides were transferred to fresh containers with Wash Solution 2 at room temperature for 2 min and then immersed in 200 ml of hybridization buffer (5 × SSC; 0.1 × SDS; BSA 0.1 mg/ml). Slides then were incubated with a fresh Wash Solution 2 at room temperature for 2 min, and were washed 2 additional times with Wash Solution 3 at room temperature for 2 min each. Following immersion in nuclease-free water, slides were dried by centrifugation at 1,600 g for 3 min. All hybridizations and post-hybridization washes were performed exactly as described in the Pronto!™ Plus system protocol.
Microarray images were captured using an arrayWoRx® Biochip Reader (Applied Precision, Issaquah, WA, USA) using an exposure of 0.5 sec for each channel (Cy5 and Cy3) at ~10 um resolution. GenPix® Pro (v 5.1, Molecular Devices, Sunnyvale, CA, USA) was used to extract the background-adjusted intensity of each spot. Features that were 'absent', 'not-found', or that had a negative intensity after background adjustment were excluded from the analysis. Data files from this experiment can be found in GEO data set [GSE5875].
Experimental design and statistical analysis
Three replications of a three treatment loop design (bud → leaf, leaf → bud', and bud' → bud) were hybridized on nine microarrays, where bud and bud' simply represent different aliquots of the same aRNA. The signal intensity data were natural log transformed and median normalized, and the 9,654 genes with complete data were examined for expression differences among the three sample types (leaf, bud, and bud'). We considered a standard mixed linear model for the data from any single gene given by
y
ijk
= μ + δ
i
+ τ
j
+ s
k
+ e
ijk
,
where y
ijk
denotes the normalized log-scale signal intensity (averaged over duplicate spots) for dye i, sample type j, and slide k; μ denotes a an intercept parameter; δ
i
denotes the effect of dye i; τ
j
denotes the effect of sample type j; s
k
denotes the random effect of slide k; and e
ijk
denotes a random error term that is intended to capture all other sources of variability. (Note that although we considered a separate model for each gene, we have suppressed a gene subscript on each term to simplify notation.) Here i = 1, 2 (Cy3 and Cy5); j = 1, 2, 3 (bud, bud', and leaf); and k = 1, ..., 9 (microarray slides 1 – 9). On the basis of this model, t-tests for differential expression between each pair of sample types (leaf vs. bud, leaf vs. bud', and bud vs. bud') were conducted. The 9,654 p-values from each of these comparisons were converted to q-values using the method of Story and Tibshirani [34]. These q-values were used to identify the number of differentially expressed genes for a given comparison when controlling the false discovery rate at various levels.