Microarray construction: primer design, PCR reactions, and arraying
We generated each array element by polymerase chain reaction (PCR) using gene-specific primer pairs (GenSet) selected for each of the predicted and known ORFs in the annotated S. pombe genome sequence [10, 11]. We wrote a Perl script (available from our website [29]) to batch-process EMBL-format files for exon selection and primer processing. PRIMER3 [30, 31] was used to select primer sequences matching defined criteria. Most primers were 18–22 bp long, with melting temperatures of 58–62°C and GC contents of 40–60%. Primers were selected so that the resulting amplicons were 180–500 bp long and consisted entirely of exon sequence, and the reverse primers were positioned <2500 bp upstream of the stop codon. All forward primers carried an additional 8-bp universal sequence at their 5' end (5'-TGACCATG-3'), which is not included in the above parameters. All primer and amplicon sequences were compared against the S. pombe genome using BLAST, and only primers and amplicons showing no significant similarity to other sequences in the genome were used (i.e., primers with a BLAST score of <70 and amplicons with a BLAST score of <400, the latter corresponding to less than ~70% sequence identity). For ~50 genes, we amplified up to 150 bp of 3'- or 5'-untranslated regions to obtain more specific array elements. For a few groups of highly similar genes, we had to use less specific array elements (BLAST score of <1000 against other sequences in the genome); this affected ~140 genes, including many ribosomal protein and transposon-related genes.
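The primer and amplicon criteria above can be expressed as a simple filter. The following is a minimal Python sketch, not the Perl script described in the text: it assumes that melting temperatures and best off-target BLAST scores have already been computed elsewhere (e.g. by PRIMER3 and BLAST), and the `Candidate` record and its field names are hypothetical. Because the text says only that *most* primers met these ranges, the thresholds should be read as targets. The exon-only and stop-codon-distance constraints are not modelled here.

```python
from dataclasses import dataclass

# Target ranges quoted in the text (most, not necessarily all, primers met them).
# The 8-bp universal 5' tail is excluded from these checks, as stated.
PRIMER_LEN = (18, 22)        # bp
PRIMER_TM = (58.0, 62.0)     # degrees C
PRIMER_GC = (40.0, 60.0)     # percent
AMPLICON_LEN = (180, 500)    # bp
MAX_PRIMER_BLAST = 70        # off-target BLAST score must be below this
MAX_AMPLICON_BLAST = 400


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def in_range(value: float, bounds: tuple) -> bool:
    low, high = bounds
    return low <= value <= high


@dataclass
class Candidate:
    """One candidate primer pair (hypothetical record layout)."""
    fwd: str                 # gene-specific part of the forward primer
    rev: str
    fwd_tm: float            # Tm as reported by the design tool
    rev_tm: float
    amplicon_len: int
    fwd_blast: float         # best off-target BLAST score of each sequence
    rev_blast: float
    amplicon_blast: float


def passes(c: Candidate) -> bool:
    """True if the pair meets the length, Tm, GC and specificity targets."""
    for seq, tm in ((c.fwd, c.fwd_tm), (c.rev, c.rev_tm)):
        if not (in_range(len(seq), PRIMER_LEN)
                and in_range(tm, PRIMER_TM)
                and in_range(gc_percent(seq), PRIMER_GC)):
            return False
    return (in_range(c.amplicon_len, AMPLICON_LEN)
            and c.fwd_blast < MAX_PRIMER_BLAST
            and c.rev_blast < MAX_PRIMER_BLAST
            and c.amplicon_blast < MAX_AMPLICON_BLAST)
```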
In addition to the predicted ORFs, we amplified fragments of the 11 mitochondrial genes, 19 pseudogenes, various RNA genes (a few genes for ribosomal RNA, tRNAs, and snRNAs as well as 68 other larger genes for 'miscellaneous RNAs' [32]), 114 very hypothetical ORFs, 33 large introns, as well as centromeric repeats and ars elements. The latest microarrays contain elements for 5269 different genes and other genomic features of fission yeast. Some genes are represented by two or more different array elements. We also designed array elements from 22 S. cerevisiae genes showing varying degrees of similarity to S. pombe genes to control for cross-hybridization. The arrays also contain elements for several widely used markers and epitope tagging sequences: Kan-MX, GFP, GST, Myc, and 3HA [33]; TAP [34]; and Pk [35]. A detailed file containing all the primer sequences and parameters is available from our website [29]. PCR products of five genes from the prokaryote Bacillus subtilis were used as control elements on the array (lysA, pheB, dapB, thrB, and trpC). These can be used as positive controls by spiking in a 'cocktail' of the corresponding bacterial mRNAs in known quantities (for details on control genes and preparation of mRNA 'cocktails' by in vitro transcription, see [36]).
PCR reactions were performed in 96-well plates (Costar) using a Tetrad thermocycler (MJ Research). Each array element was generated by two successive rounds of PCR. The first round used the gene-specific primer pairs, with the forward primers carrying the additional universal sequence (see above). Genomic DNA prepared with a simple glass bead protocol [33] served as the template. To amplify array elements from genes containing only small exons (<250 bp), we used pools of cDNA libraries as templates ([37, 38]; pREP3X: constructed by B. Edgar and C. Norbury; Clontech). The first-round PCR products then served as templates for the second round, together with the gene-specific reverse primers and a universal forward primer carrying a 5'-amino modification (5'-GCTGAACAGCTATGACCATG-3'; Oswel). Details of the PCR reaction mixes and cycling parameters are available from our website [29]. All PCR products were checked for a single strong band of the expected size on 2.5% agarose 1× TBE slab gels. The failure rate was typically <3%. Failed reactions were repeated, and new primers were ordered where reactions failed repeatedly. At the time of writing, array elements for all predicted genes had been successfully amplified. The gene-specific primer pairs, together with the two sequential and independent PCR reactions, make it highly unlikely that array elements are assigned to the wrong genes.
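The 3' end of the universal second-round primer (5'-GCTGAACAGCTATGACCATG-3') is identical to the 8-bp tail (5'-TGACCATG-3') added to every first-round forward primer. This overlap is what allows one universal primer to re-amplify all first-round products. The Python sketch below makes that relationship explicit and estimates the size of the final spotted product. It rests on an assumption not stated in the text: that the universal primer primes through this 8-base overlap and adds its remaining 12 bases to the 5' end. The function names are hypothetical.

```python
# Sequences quoted in the text (5' -> 3').
UNIVERSAL_TAIL = "TGACCATG"                # added to every round-1 forward primer
UNIVERSAL_PRIMER = "GCTGAACAGCTATGACCATG"  # round-2 forward primer (5'-amino-modified)


def round1_forward(gene_specific: str) -> str:
    """Round-1 forward primer: universal tail followed by the gene-specific part."""
    return UNIVERSAL_TAIL + gene_specific.upper()


def round2_product_length(gene_amplicon_len: int) -> int:
    """Expected round-2 product length, assuming the universal primer anneals
    through its 3'-terminal 8 bases (the tail) and adds its remaining bases
    to the 5' end of the product."""
    if not UNIVERSAL_PRIMER.endswith(UNIVERSAL_TAIL):
        raise ValueError("universal primer does not end with the tail sequence")
    extension = len(UNIVERSAL_PRIMER) - len(UNIVERSAL_TAIL)   # 12 nt
    return gene_amplicon_len + len(UNIVERSAL_TAIL) + extension
```

Under this assumption, gene-specific amplicons of 180–500 bp would yield final products of about 200–520 bp. This range is useful when judging whether a gel band has the expected size.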
Spotting buffer was added to the PCR products to a final concentration of 250 mM sodium phosphate, pH 8.5, and 0.00025% Sarkosyl, followed by spin filtration through 96-well filtration plates (Millipore). The filtered array elements in spotting buffer (15 μl total volume) were then re-arrayed into 384-well plates (Genetix), snap frozen on dry ice, and stored at -70°C. These array elements were printed without further purification onto activated amine-binding slides (Codelink, Amersham) using a BioRobotics TAS arrayer with a 48-pin tool. All array elements are printed in duplicate on each slide (~13,000 spots/slide). The replicate spots are printed in separate halves of the slide with different pins. This yields two measurements that are as independent as possible [6], prevents local depletion of the sample, and minimizes the chance of losing both measurements of a gene to local hybridisation problems (unpublished observations). One array from each print batch was tested by hybridization for quality control. Between print rounds, array elements were dried completely in a vacuum concentrator and stored at -70°C in sealed plates. Before printing, array elements were reconstituted in HPLC-grade water (BDH) and left to dissolve overnight at 4°C. Details of the arraying and post-processing procedures are available from the website of the Microarray Facility at the Sanger Institute [36].
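The volumes needed to reach the stated final concentrations follow from C1·V1 = C2·V2. The text gives only the final concentrations and the 15 μl total volume. The sketch below therefore assumes a hypothetical 10× spotting-buffer stock (2.5 M sodium phosphate, pH 8.5, 0.0025% Sarkosyl) to show the arithmetic. It is not the authors' recipe.

```python
def stock_volume(final_conc: float, total_vol: float, stock_conc: float) -> float:
    """Solve C1 * V1 = C2 * V2 for V1, the volume of stock to add.
    Concentrations must share a unit (e.g. mM, or % w/v)."""
    if stock_conc < final_conc:
        raise ValueError("stock is less concentrated than the target")
    return final_conc * total_vol / stock_conc


TOTAL_UL = 15.0   # total volume per well, from the text

# Hypothetical 10x spotting buffer: 2.5 M sodium phosphate pH 8.5, 0.0025% Sarkosyl.
buffer_ul = stock_volume(250.0, TOTAL_UL, 2500.0)            # phosphate, in mM
sarkosyl_check_ul = stock_volume(0.00025, TOTAL_UL, 0.0025)  # Sarkosyl, in %
pcr_product_ul = TOTAL_UL - buffer_ul                        # remainder is PCR product
```

With a single 10× stock, both components call for the same volume, 1.5 μl of buffer per 15 μl well, which leaves 13.5 μl for PCR product. The two calls serve as a consistency check on the hypothetical stock.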
RNA isolation from fission yeast
We used the S. pombe wild-type strain 972 h- for all experiments [39]. Standard media and growth conditions were used [40], and cells were harvested from liquid cultures at mid-exponential phase (OD600 0.1–0.4), unless stated otherwise. For the spike-in experiment (Table 2), S. cerevisiae cells (strain AB1380) were grown in YPD medium to OD600 0.3, and RNA was extracted as described below for S. pombe cells.
Cells were harvested either by mild centrifugation (2 min, 800 rcf), after which the supernatant was discarded and the pellet snap frozen in liquid nitrogen, or by rapid filtration (Millipore), after which the filters were transferred into 50 ml tubes and snap frozen in liquid nitrogen. To test whether the harvesting method affects gene expression, we used a microarray to directly compare RNA samples obtained by filtration or centrifugation of the same culture grown in EMM medium. The data from the two samples were very similar (SD of signal ratios: 0.08), and only two mitochondrial genes showed 2-fold differences between the samples. We conclude that the two methods of cell harvesting that we routinely use do not lead to significant differences in gene expression.
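A direct two-sample comparison like the harvesting test reduces to two numbers: the spread of the signal ratios and the list of genes exceeding a fold-change cut-off. The text does not say on which scale the SD of 0.08 was computed. The Python sketch below assumes log2 ratios, a common choice, and the function name is hypothetical.

```python
import math
import statistics


def compare_samples(ratios: dict, fold: float = 2.0):
    """Summarize a direct two-sample hybridization.

    ratios: gene -> normalized signal ratio (sample A / sample B).
    Returns the SD of log2 ratios and the genes differing by >= `fold`.
    """
    logs = {gene: math.log2(r) for gene, r in ratios.items() if r > 0}
    sd = statistics.stdev(list(logs.values()))
    cutoff = math.log2(fold)
    changed = sorted(g for g, lr in logs.items() if abs(lr) >= cutoff)
    return sd, changed
```

Working on the log scale makes a 2-fold increase and a 2-fold decrease symmetric (+1 and -1), so one absolute cut-off catches both directions.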
Total RNA was isolated from S. pombe cells using a hot phenol method followed by phenol-chloroform extractions, precipitation, and purification on Qiagen RNeasy columns. (We also experimented with isolating mRNA before labelling, and only a few genes gave different results compared with total RNA. Because mRNA isolation requires much larger cell samples and may introduce biases, we routinely use total RNA for labelling.) RNA quality was assessed by gel electrophoresis and spectrophotometry. A detailed protocol is available from our website [29].
Sample labelling and microarray hybridisation
To generate fluorescently labelled samples for microarray hybridisation, we used a direct labelling protocol. Total RNA (10–20 μg) was reverse transcribed into cDNA with Superscript enzyme (GibcoBRL) and an oligo-dT17 primer in the presence of Cy3- or Cy5-dCTP (PerkinElmer). We have also experimented with a mix of random nonamer and oligo(dT) primers for labelling. Although this also labels non-coding RNAs, it does not increase background and significantly improves the signal intensities of most spots. Only a few genes are labelled differently by the two priming methods. One advantage of random priming is that mRNAs lacking poly(A) tails, or with short ones, are also represented in the hybridisation. The labelled cDNAs were purified using AutoSeq G-50 columns (Amersham) followed by precipitation. Hybridization was performed at 49°C in a buffer containing 48% formamide, using LifterSlips (Erie Scientific) and a hybridisation oven with a humid chamber (Boekel Scientific). Slides were washed at room temperature and stored in the dark until scanning. A detailed protocol for labelling, hybridisation, and slide washing is available from our website [29].
Data acquisition, processing, normalization, and evaluation
Microarrays were scanned using a GenePix 4000B laser scanner, and fluorescence signals were analysed using GenePix Pro software (Axon Instruments). Array images were used only if they passed minimal quality thresholds (median signal-to-background >3; median signal-to-noise >5; mean of median background signal <200). Technically flawed spots were removed either automatically by the GenePix software or by manual inspection of the array images, and were flagged as 'absent' in the GenePix results files.
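The array-level acceptance rule can be written as one predicate. The following Python sketch assumes that the three summary statistics are read from the GenePix output for each fluorescence channel and that both channels must pass. The text does not specify per-channel handling, and the record layout is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChannelSummary:
    """Per-channel summary statistics for one scanned array (hypothetical layout)."""
    median_signal_to_background: float
    median_signal_to_noise: float
    mean_median_background: float   # mean over spots of each spot's median background


def array_usable(channels: list) -> bool:
    """Apply the minimal quality thresholds from the text to every channel."""
    return all(ch.median_signal_to_background > 3
               and ch.median_signal_to_noise > 5
               and ch.mean_median_background < 200
               for ch in channels)
```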
For subsequent data processing and normalization, we developed a Perl script that takes GenePix results files as input (script available from our website [29]). This script discards data from spots with failed or faulty PCR products by flagging them as 'absent'. Data are flagged 'marginal' if they come from spots with low array element concentration (as judged by PCR product staining on gels) or from PCR products whose reverse primer is located 2500–3500 bp from the gene end. Every gene on the array is also represented by at least one good array element, and 'marginal' data from sub-optimal array elements are used only as a backup when no other data for a given gene are available. The script also applies cut-off criteria to discard data from weak signals: spots with <50% of pixels >2 SD above the median local background signal in one or both channels are flagged 'absent', unless one channel shows >95% of pixels >2 SD above local background. The SD was calculated using only the lower 55% of the pixel intensities (called SD2 in GenePix Pro), as this measure is less susceptible to being skewed by bright pixels. The script provides a quality control report. The report shows the numbers and percentages of spots discarded at each step of the data analysis pipeline, as well as the data for replicate spots whose signal ratios differ >2-fold from each other.
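The weak-signal cut-off combines a per-channel pixel test with a rescue clause. The following Python sketch applies it to raw pixel intensities. It is written under two assumptions: the lower-55% SD is computed over the local background pixels, and a single median background value serves as the baseline for each channel. It illustrates the rule as described. We do not claim it reproduces GenePix Pro's internal SD2 computation, and all names are hypothetical.

```python
import statistics


def sd_lower_fraction(background: list, fraction: float = 0.55) -> float:
    """SD of the lowest `fraction` of background pixel intensities; ignoring the
    brightest pixels makes it less sensitive to specks and bleed-through."""
    values = sorted(background)
    k = max(2, int(len(values) * fraction))
    return statistics.stdev(values[:k])


def fraction_above(feature: list, background: list) -> float:
    """Fraction of feature pixels > median background + 2 * SD (lower-55% SD)."""
    threshold = statistics.median(background) + 2 * sd_lower_fraction(background)
    return sum(p > threshold for p in feature) / len(feature)


def too_weak(fg1: list, bg1: list, fg2: list, bg2: list) -> bool:
    """True if the spot would be flagged 'absent' by the signal cut-off:
    <50% of pixels above threshold in either channel, unless one channel
    has >95% of its pixels above threshold."""
    f1 = fraction_above(fg1, bg1)
    f2 = fraction_above(fg2, bg2)
    return (f1 < 0.50 or f2 < 0.50) and max(f1, f2) <= 0.95
```

The rescue clause keeps spots where a gene is strongly expressed in one sample and essentially off in the other. These are often among the most informative ratios, even though one channel fails the 50% test.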
The script also performs a local normalization using a sliding square window of spots surrounding each spot. The user defines the minimum number of spots to normalize over (default 400). The default window size is 16 spots on each side of the central spot. The square therefore contains 33 × 33 spots (1089) around central spots, 33 × 17 spots (561) around spots at the edge of the array, and 17 × 17 spots (289) around spots in a corner of the array. Only spots flagged 'present' are used for the normalization. Hence, with a window of 16, fewer than 400 spots may sometimes be available for the normalization, especially for spots close to the corners of the array. Where the square is small, the window size is therefore increased, up to a user-defined maximum (default 24), until the square contains at least 600 spots in total. This target is deliberately larger than the 400 spots required, to improve the chance that the square contains at least 400 'present' spots. This heuristic makes the algorithm faster for the majority of spots, because counting the 'present' spots in the square takes a relatively large amount of computational time. If fewer than 400 'present' spots are still found during normalization, the window is enlarged further until the maximum window size is reached. Any spot normalized against fewer than 400 spots is reported in the output log file. The script then calculates a normalization factor such that the median signal ratio of all measurable spots within the square equals 1, and uses this factor to scale the signal ratio of the central spot. The signal ratios used for normalization are the median of all pixel-by-pixel ratios of background-subtracted signals for a given spot, with the median local background subtracted from each pixel (called 'median of ratios' in GenePix Pro).
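The windowed normalization can be sketched in Python as follows. This is our reading of the procedure, not the authors' Perl code. The grid of spot ratios and 'present' flags is assumed to mirror the physical spot layout. The square is first grown until it holds `min_total` spots in total, the cheap step. It is then grown, up to `max_window`, until it holds `min_present` 'present' spots. The parameter names are hypothetical, and their defaults follow the text.

```python
import statistics


def local_normalize(ratios, present, min_present=400, window=16,
                    max_window=24, min_total=600):
    """Normalize each 'present' spot by the median ratio of the 'present'
    spots in a square window centred on it.

    ratios, present: equal-sized 2-D lists (rows x columns) of spot ratios
    and booleans. Returns (normalized grid, list of under-populated spots).
    """
    n_rows, n_cols = len(ratios), len(ratios[0])
    normalized = [[None] * n_cols for _ in range(n_rows)]
    short = []  # (row, col, n_used) for spots normalized on < min_present spots

    def bounds(r, c, w):
        # The square is clipped at the array edges, so edge/corner squares shrink.
        return (max(0, r - w), min(n_rows, r + w + 1),
                max(0, c - w), min(n_cols, c + w + 1))

    for r in range(n_rows):
        for c in range(n_cols):
            if not present[r][c]:
                continue
            w = window
            # Cheap step: grow the square until it holds min_total spots overall.
            while w < max_window:
                r0, r1, c0, c1 = bounds(r, c, w)
                if (r1 - r0) * (c1 - c0) >= min_total:
                    break
                w += 1
            # Costlier step: count 'present' spots, growing up to max_window.
            while True:
                r0, r1, c0, c1 = bounds(r, c, w)
                used = [ratios[i][j] for i in range(r0, r1)
                        for j in range(c0, c1) if present[i][j]]
                if len(used) >= min_present or w >= max_window:
                    break
                w += 1
            if len(used) < min_present:
                short.append((r, c, len(used)))
            # Factor that sets the window median to 1, applied to the centre spot.
            normalized[r][c] = ratios[r][c] / statistics.median(used)
    return normalized, short
```

With the defaults, a corner spot's square grows from 17 × 17 (289 spots) to 25 × 25 (625 spots) at the maximum window of 24, just above the 600-spot target. An edge spot needs only one extra step (35 × 18 = 630 spots). Ratios are assumed to be positive, since a zero window median would make the factor undefined. This plain-Python version is quadratic in the window size per spot, which is acceptable for a sketch but slow for ~13,000 spots.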
The 'median of ratios' measures ratios more reliably and is less affected by nonspecific signals than the 'ratio of medians' (see also [41]). In the rare cases where the 'median of ratios' was zero, the 'ratio of medians' was used instead for data evaluation. Finally, the script averages the normalized ratios of all replicate spots of the same genomic element that produced measurable signal ratios. These mean normalized ratios were then used for downstream data evaluation and mining using GeneSpring (Silicon Genetics) and SAM [28].
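The difference between the two GenePix ratio measures, the zero fallback, and the final replicate averaging are shown in the Python sketch below. It assumes that pixel intensities for both channels and a median local background per channel are available. It also assumes that pixels with a non-positive background-subtracted denominator are skipped, which the text does not specify. Replicates are averaged arithmetically, following the text's "mean normalized ratios"; whether logs were averaged instead is not stated. All function names are hypothetical.

```python
import statistics


def median_of_ratios(num_px, num_bg, den_px, den_bg):
    """Median of pixel-by-pixel ratios of background-subtracted signals.
    Pixels with a non-positive denominator are skipped (an assumption)."""
    ratios = [(a - num_bg) / (b - den_bg)
              for a, b in zip(num_px, den_px) if b - den_bg > 0]
    return statistics.median(ratios) if ratios else 0.0


def ratio_of_medians(num_px, num_bg, den_px, den_bg):
    """Ratio of the background-subtracted median signals of the two channels."""
    return ((statistics.median(num_px) - num_bg)
            / (statistics.median(den_px) - den_bg))


def spot_ratio(num_px, num_bg, den_px, den_bg):
    """'Median of ratios', falling back to 'ratio of medians' when it is zero."""
    mor = median_of_ratios(num_px, num_bg, den_px, den_bg)
    if mor != 0:
        return mor
    return ratio_of_medians(num_px, num_bg, den_px, den_bg)


def average_replicates(normalized):
    """Mean of the normalized ratios of replicate spots; None marks spots
    without a measurable ratio."""
    values = [v for v in normalized if v is not None]
    return sum(values) / len(values) if values else None
```

Because the median of ratios is taken over per-pixel ratios, a few pixels with a nonspecific signal in one channel shift it less than they shift the channel medians used in the ratio of medians.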
Microarray experiments used in this study
Self-self experiments were performed with RNA isolated from exponentially growing cells: identical samples were labelled with both Cy3 and Cy5 fluorochromes and hybridised to the same array (six experiments in total). Self-self experiments were used for the data in Figure 1, Figure 2 (right), Figures 5 and 6, Figure 7B, and Tables 2 and 3. For experiments showing differential gene expression, we compared samples from cells grown at 25°C vs 30°C (one experiment; used for Figure 2 [left], Figures 3, 4, and 5, and Table 3), samples from cells grown in full vs minimal medium (four experiments; used for Figure 7A and Table 3), and samples from cells harvested by centrifugation vs filtration (one experiment; used for Table 3). Some data were taken from previously published experiments, including samples from meiotic vs vegetative cells ([16]; Table 1) and samples from oxidatively stressed vs unstressed cells ([17]; Table 3).