Improving the statistical detection of regulated genes from microarray data using intensitybased variance estimation
 Jason Comander†^{1, 2, 3},
 Sripriya Natarajan†^{1, 3},
 Michael A GimbroneJr^{1, 2} and
 Guillermo GarcíaCardeña^{1, 2}Email author
DOI: 10.1186/14712164517
© Comander et al; licensee BioMed Central Ltd. 2004
Received: 04 November 2003
Accepted: 27 February 2004
Published: 27 February 2004
Abstract
Background
Gene microarray technology provides the ability to study the regulation of thousands of genes simultaneously, but its potential is limited without an estimate of the statistical significance of the observed changes in gene expression. Due to the large number of genes being tested and the comparatively small number of array replicates (e.g., N = 3), standard statistical methods such as the Student's ttest fail to produce reliable results. Two other statistical approaches commonly used to improve significance estimates are a penalized ttest and a Ztest using intensitydependent variance estimates.
Results
The performance of these approaches is compared using a dataset of 23 replicates, and a new implementation of the Ztest is introduced that pools together variance estimates of genes with similar minimum intensity. Significance estimates based on 3 replicate arrays are calculated using each statistical technique, and their accuracy is evaluated by comparing them to a reliable estimate based on the remaining 20 replicates. The reproducibility of each test statistic is evaluated by applying it to multiple, independent sets of 3 replicate arrays. Two implementations of a Ztest using intensitydependent variance produce more reproducible results than two implementations of a penalized ttest. Furthermore, the minimum intensitybased Zstatistic demonstrates higher accuracy and higher or equal precision than all other statistical techniques tested.
Conclusion
An intensitybased variance estimation technique provides one simple, effective approach that can improve pvalue estimates for differentially regulated genes derived from replicated microarray datasets. Implementations of the Ztest algorithms are available at http://vessels.bwh.harvard.edu/software/papers/bmcg2004.
Background
Biologists can now use microarray technology to determine the expression levels of tens of thousands of genes simultaneously, in less time than it previously took to measure the expression level of a single gene. Currently, cDNA and oligo microarrays can measure a sizeable fraction of all mRNA species in cell or tissue samples. The richness of the resulting data is opening up a new era of systems biology that promises to reveal the complex inner workings of cellular machinery [1]. However, there remains the challenge of processing the microarray data from array images into a format that best facilitates the discovery of new biological insights. In fact, applying improved computational tools to previously published microarray data has led to the discovery of new biology (e.g. [2]). We strongly believe, as do others, that the quality of the data processing steps is critical to the overall success of a microarray experiment [3].
A typical data processing pipeline consists of several steps. (See [4] for a review, and see [5, 6] for a review of microarray processing software.) First, image analysis software locates the arrayed spots in the scanned image, quantifies the foreground and background brightness of each spot, and notes any irregularities in spot morphology. The background intensity value is then subtracted from the foreground intensity value. The backgroundsubtracted intensity data from each array must then be normalized, or rescaled, to remove artifactual differences in signal brightness due, for example, to different labeling efficiencies that produced arrays of different overall intensity. Normalization techniques are often based on the assumption that a large number of spots will have similar expression levels between conditions. Curvefitting techniques, such as a locally weighted regression, are used to equalize expression values between arrays, or between array channels for twocolor arrays [7, 8]. After this normalization, the intensity values can be used by a variety of algorithms for detecting differences in expression between the measured biological conditions. This processing is applied whether two samples are compared directly or a "reference sample" experimental design is used. In a reference sample design, the same reference RNA sample is hybridized to one channel of all arrays, and the other channel is hybridized with each individual experimental sample. This design is often used when multiple biological conditions are being investigated and it becomes impractical to perform every pairwise combination of conditions directly [4, 9].
Accurately detecting differentially regulated genes
Given a list of normalized intensity values across various biological conditions, the next step is to determine which genes are differentially regulated among the conditions being studied. In the early days of microarray experimentation, an emphasis was placed on analyzing the data using exploratory data mining techniques, such as hierarchical clustering [10] and selforganizing maps [11]. Clustering algorithms measure the similarity between observed gene regulation patterns across the various conditions, and assemble clusters such that similarly regulated genes are grouped together. The resulting clusters produce an effective overview of the data, showing which of the many possible patterns of regulation are actually present in the data. Since these patterns are somewhat robust, a few erroneous spots are unlikely to change them dramatically. For a researcher who is simply interested in the overall pattern of the data, performing replicate arrays to reduce the number of errors is not particularly efficient. Many researchers choose instead to explore a greater number of experimental conditions.
Increasingly, microarrays are being used in a different context; researchers want to know with high confidence which specific genes are regulated across a small number of experimental conditions (e.g., treatment vs. control, or mutant vs. wildtype). To answer this question, it becomes extremely important to use an accurate method to rank individual genes by their probability of truly being regulated, especially since this information may be used to plan more laborintensive experiments around biological questions raised by a small number of such putatively regulated genes. In the absence of replicate arrays, the reliability of the data can be estimated (e.g. [12, 13]), but such "single slide" methods require a model of the expected noise characteristics of the system, a property that can potentially change between datasets. Performing replicate arrays can significantly improve predictions of differentially regulated genes, thereby decreasing the false positive (false detection) rate and false negative rate [9, 14, 15]. Using replicate arrays allows the calculation of more accurate significance estimates (pvalues) that will aid in the interpretation of a list of "top regulated genes," which are commonly ranked by ratio alone.
Here we address the problem of accurately detecting genes that are significantly differentially regulated between a pair of biological conditions, given microarray datasets with a small number of replicates (e.g. N = 3 arrays). If the number of replicates were very large (e.g., hundreds), the task would be relatively easy; since the ratio of expression levels between the two conditions would be well estimated by the average ratio or median ratio, the genes could simply be ranked by one of these estimates. In practice, however, the number of replicate arrays is rarely greater than 3, and estimates of average expression ratios are not always sufficiently accurate to predict which genes are truly regulated. The variation of a measured expression ratio is critical in determining whether the observed ratio is due to random measurement fluctuations or to a true difference between the quantities being measured. Genes with larger measured expression ratios between conditions are more likely to be truly regulated, while genes whose ratios have a high measured variance are less likely to be truly regulated. This idea can be expressed mathematically as a test statistic where the numerator contains an estimate of the size of the effect, i.e. the ratio of gene expression intensities between conditions, and the denominator includes an estimate of the variance, i.e. the standard deviation of the ratio. A variety of such statistical tests have been applied to microarray data (reviewed in [4, 16]); the challenge is to choose the numerator and denominator of the test statistic such that it makes the best use of all available data in order to get the most accurate determination of which genes are most likely to be regulated.
Comparing statistical tests used to find differentially regulated genes
An alternative to using a penalized tstatistic is obtaining a more precise estimate of the standard deviation σ_{M}. Such an estimate should be less susceptible to a chance concordance of measurements of M that occasionally produces an extremely low σ_{M} and a high tstatistic. For this purpose, knowledge of the relationships between the data points can be used to improve the estimate. Namely, the variance values, or σ_{M} ^{2}, for one spot can be pooled, or smoothed, with the σ_{M} ^{2} values of spots that are likely to have similar variances. The variance of microarray data has often been observed to be a function of the spot intensity [12, 15, 21–30], raising the possibility that the variances of individual spots can be pooled with those of spots of similar intensity to produce a more precise estimate of the standard deviation. Several studies have taken into account this intensitydependent heteroscedasticity. For example, Rocke et al. [27] and Newton et al. [13] have presented models of measurement error in microarrays that can explicitly take into account higher variance at lower expression levels. More general approaches to variance pooling have been implemented in a variety of ways, using loessbased curve fits [15], robust nonparametric spline fits [28] and sliding windows for calculating either local averages [26, 29, 30] or interquartile ranges [24]. These more reliable estimates of the standard deviation can be used directly to calculate Zstatistics, which are calculated according to the same formula as the standard tstatistic, but correspond to lower pvalues [26, 31].
Strategies for pooling standard deviations
This study expands upon previous work on intensitydependent variance estimation for microarray data by introducing a new metric, I_{min}, for pooling standard deviations. We evaluate the performance of the I_{avg} and I_{min} metrics by explicitly comparing the reproducibility and accuracy of the Zstatistics calculated using these two metrics. We also compare the performance of the Zstatistics to the performance of other statistical techniques in current use, the standard and penalized ttests. Finally, we extend our technique for pooling standard deviations to twocolor microarray data from a reference sample experimental design.
Results
Datasets
The analyses in this study were performed on five different datasets. Datasets 1–4 use the direct comparison experimental design, i.e. labeled cDNA from two biological conditions, "X" and "Y," were cohybridized onto a single array. Each dataset was generated from a different biological experiment using twocolor Agilent cDNA arrays. For Datasets 1–3, the biological experiment, RNA processing and array hybridization were repeated three times. Dataset 4 contains 23 replicate arrays (see Methods). Dataset 5 uses a reference sample design, where RNA from each experimental condition is cohybridized on an array with a standardized reference RNA sample. Dataset 5 contains three replicates arrays for each experimental condition.
Average logged intensity (I_{avg}) vs. minimum logged intensity (I_{min}) pooling metric
Mean residual errors for spots with σ_{M}' > σ_{M} and σ_{M}' < σ_{M}, using I _{avg} or I _{min} pooling metric.
Agilent Feature Extraction  SPOT Processing  Agilent FG + SPOT BG  

σ_{M} ^{2}' > σ_{M} ^{2}  σ_{M} ^{2}' < σ_{M} ^{2}  σ_{M} ^{2}' > σ_{M} ^{2}  σ_{M} ^{2}' < σ_{M} ^{2}  σ_{M} ^{2}' > σ_{M} ^{2}  σ_{M} ^{2}' < σ_{M} ^{2}  
I_{avg}  I_{min}  I_{avg}  I_{min}  I_{avg}  I_{min}  I_{avg}  I_{min}  I_{avg}  I_{min}  I_{avg}  I_{min}  
Dataset 1  0.28  0.28  0.31  0.31  0.19  0.20  0.22  0.25  0.19  0.20  0.22  0.25 
Dataset 2  0.24  0.23  0.25  0.25  0.15  0.15  0.16  0.17  0.15  0.15  0.16  0.17 
Dataset 3  0.20  0.18  0.21  0.18  0.09  0.09  0.09  0.09  0.09  0.09  0.09  0.09 
Dataset 4 #1  0.15  0.14  0.17  0.15  NA  NA  NA  NA  NA  NA  NA  NA 
Dataset 4 #2  0.20  0.17  0.23  0.19  NA  NA  NA  NA  NA  NA  NA  NA 
Dataset 4 #3  0.45  0.24  0.49  0.23  NA  NA  NA  NA  NA  NA  NA  NA 
Dataset 4 #4  0.21  0.20  0.23  0.21  NA  NA  NA  NA  NA  NA  NA  NA 
Dataset 4 #5  0.25  0.19  0.28  0.20  NA  NA  NA  NA  NA  NA  NA  NA 
Dataset 4 #6  0.18  0.17  0.21  0.18  NA  NA  NA  NA  NA  NA  NA  NA 
Comparing the accuracy of different ranking statistics
Accuracy of each test statistic when compared to a "gold standard" tstatistic.
M_{avg}  t  d_{90th percentile}  d_{SAM}  Z(I_{avg})  Z(I_{min})  

Dataset 4 #1  0.69  0.09  0.80  0.80  0.79  0.83 
Dataset 4 #2  0.66  0.08  0.78  0.78  0.79  0.82 
Dataset 4 #3  0.54  0.04  0.79  0.70  0.77  0.84 
Dataset 4 #4  0.68  0.02  0.80  0.79  0.80  0.83 
Dataset 4 #5  0.64  0.06  0.80  0.74  0.78  0.84 
Dataset 4 #6  0.67  0.01  0.80  0.80  0.79  0.83 
Comparing the reproducibility of different ranking statistics
Reproducibility of each test statistic when used on replicate datasets.
M_{avg}  t  d_{90th percentile}  d_{SAM}  Z(I_{avg})  Z(I_{min})  

R ^{2}  Dataset 4 #1  0.89  0.00  0.86  0.87  0.93  0.94 
Dataset 4 #2  0.90  0.00  0.87  0.87  0.93  0.93  
Dataset 4 #3  0.89  0.00  0.88  0.88  0.95  0.95  
Slope  Dataset 4 #1  0.90  0.00  0.97  1.63  1.00  1.00 
Dataset 4 #2  0.93  0.01  0.96  0.58  1.02  1.04  
Dataset 4 #3  1.00  0.07  0.96  0.74  1.09  1.08 
The R^{2} values for the two Zstatistics were similar to each other and consistently higher than those of the other techniques. Nonparametric measures of correlation, the Spearman Rho and Kendall Tau rank correlation coefficients, were also higher for both Zstatistics than any of the other statistics for all three pairs of subsets (data not shown). All three calculations of the slope coefficients for both Zstatistics, as well as M_{avg} and the 90^{th} percentile penalized tstatistic, are close to 1, indicating that the overall magnitudes of the Zstatistics are consistent across datasets, whereas the standard tstatistic and the SAM penalized tstatistic produced test statistics whose overall magnitudes vary across the subsets.
Outlier detection
The accuracy of this outlier detection technique was also evaluated by comparing the Zstatistic to t_{gold} using Dataset 4. Fig. 7c plots Z(I_{min}) vs. t_{gold} for Dataset 4 set #2. The outliers, which are highlighted, include false positive spots for which t_{gold} is low and Z(I_{min}) is high, although not all such points are detected as outliers. Fig. 7d is an identical plot to Fig. 7c except that the Zstatistics for the outlier spots are calculated using the highervalued measured standard deviation σ_{M} instead of the pooled value σ_{M}'. The outliers are now mostly clustered around the origin with the other nonsignificant spots. A few spots with moderately high t_{gold} values are detected as outliers and have low corrected Zstatistics, and some potential false positives with high Zstatistic values and low t_{gold} values are not detected as outliers.
At the end of the analysis, the outliercorrected Z statistics are converted to pvalues. To demonstrate the additional information that the pvalues provide, Fig. 7b shows a scatterplot of X vs. Y for Dataset 1, with statistically significant spots colored according to their multipletestcorrected pvalues (see Methods). Spots with similar ratios may have different pvalues due to their different standard deviations. In addition, after outlier detection, some spots with high ratios are not found to be significant.
Analysis of reference sample arrays
The techniques used above for a direct comparison experimental design were extended to a reference sample design (see Methods). Under a reference sample design, one can estimate either the standard deviation of the individual logged ratios comparing experimental samples to the reference sample, M_{x} and M_{y}, or the standard deviation of the paired differences of these logged ratios, μ = M_{x}M_{y}. Under the first, or unpaired method, the Zstatistic is calculated as where N_{X} and N_{Y} are the number of replicates for the given spot for condition X and condition Y, respectively. Under the second, or paired method, the Zstatistic is calculated as where N is the number of paired replicates for the spot. The samples used in a reference sample design may not always have been collected or processed in pairs, so we evaluated both of these methods.
Linear regression slope coefficients calculated between the corresponding Zstatistics using independent or pairwise analysis.
Agilent Feature Extraction  SPOT  Agilent FG + SPOT BG  

I_{avg}  1.70  1.68  1.70 
I_{min}  1.59  1.61  1.63 
Mean residual errors for spots with σ^{2}' > σ^{2} and σ^{2}' < σ^{2}, using I _{avg} or I _{min} pooling metric.
Agilent Feature Extraction  SPOT Processing  Agilent FG + SPOT BG  

σ^{2}' > σ^{2}  σ^{2}' < σ^{2}  σ^{2}' > σ^{2}  σ^{2}' < σ^{2}  σ^{2}' > σ^{2}  σ^{2}' < σ^{2}  
I_{avg}  I_{min}  I_{avg}  I_{min}  I_{avg}  I_{min}  I_{avg}  I_{min}  I_{avg}  I_{min}  I_{avg}  I_{min}  
σ_{M} ^{2} Cond. X  0.19  0.18  0.21  0.20  0.14  0.14  0.15  0.15  0.13  0.13  0.14  0.14 
σ_{M} ^{2} Cond. Y  0.22  0.18  0.25  0.23  0.16  0.14  0.17  0.17  0.15  0.13  0.16  0.16 
σ_{μ} ^{2}  0.19  0.18  0.22  0.19  0.13  0.13  0.14  0.14  0.12  0.12  0.12  0.12 
Discussion
Building up new knowledge about biological systems is the ultimate purpose of microarray experiments, but all such insights have to be built on a solid foundation to be accurate and useful. Proper normalization of data and accurate detection of which genes are regulated are vital to the success of downstream exploration of microarray data. Even for exploratory cluster analyses, the genes that are significantly regulated must be selected beforehand. This task of detecting these genes is a difficult statistical problem; a statistical hypothesis is made for each of tens of thousands of genes tested, but only a small number of replicate arrays are available to test those hypotheses. The statistical methods presented in this study attempt to draw as much information as possible out of a small number of array replicates to determine which genes are likely to be regulated.
It is clear that looking at the measurements of each gene in isolation can produce a test with low statistical power (e.g. using the standard ttest, Fig. 1). To improve statistical power, we can use knowledge about the relationships among the many thousands of points in the arrays. Specifically, we group together spots that have similar standard deviations and then pool together many less accurate estimates of standard deviation into a single, more accurate estimate. Our data also show that the Zstatistics are more precise than either standard or penalized tstatistics for detecting differential gene expression in microarray data. We further demonstrate that pooling standard deviations using the minimum intensity metric produces Zstatistics that are more accurate than the standard ttest, the penalized ttests, and the average intensitybased Zstatistic.
Average combined logged intensity (I_{avg}) vs. minimum logged intensity (I_{min}) pooling metric
We evaluated two different intensitybased metrics for pooling standard deviations. There are many reports that the variance is a function of intensity, but the exact shape of this relationship could depend on many factors extrinsic to the biological experiment, such as the array technology being used, the signaltonoise ratio of the data, the similarity between the two conditions[30], the normalization technique or the background subtraction technique. For this reason, we favor an estimation of the standard deviation using a curvefitting technique rather than a fixed model based on previous data. Furthermore, when dealing with twochannel arrays, there are two different intensity values associated with each replicated spot. It is possible that the variation is best described as a function of the average intensities of both channels. However, our own experience and many other reports suggest that the highest variances are often seen for low intensity spots. If so, the variance may be better described as a function of the minimum intensity over all the spots.
The data presented here show that the mean residual errors are either equal or lower when using the I_{min} compared to the I_{avg} pooling metric, for every dataset using the Agilent Feature Extraction image processing technique. The subset of Dataset 4 for which this difference is most striking, #3 in Table 1, also has a population of spots with particularly high variance (see Fig. 4). The I_{avg} metric pools these spots together with other spots that have a much lower variance. In contrast, the I_{min} metric moves these spots to the low end of the xaxis, and the curve fit tracks the standard deviation of the spots much better. The noisiest spots on microarrays are often those where at least one channel is "blank", i.e. a noisy, low level of signal that presumably represents no expression. The I_{min} metric is better at grouping such spots together. For datasets with low background levels, there is a smaller difference in the performance of the two pooling metrics.
The trends in the mean residual errors from the unpaired reference sample analysis agree with the results from the direct comparison analyses. This similarity is to be expected, since processing each reference sample condition separately is equivalent to doing a direct comparison between each condition and reference RNA samples. Both pooling metrics generate similar mean residual error values when pooling σ_{μ}, but one dataset is not enough to make any generalizations about which pooling metric will perform best for all paired reference sample datasets. The improved performance of the I_{min} pooling metric is lost when using SPOT processing or combined Agilent foreground and SPOT background image processing, suggesting that these image processing techniques may be more effective at removing noise at low intensities.
The I_{avg} and I_{min} pooling techniques are reproducible to the same degree, since their R^{2} coefficients between Zstatistics from paired datasets (see Table 1) are similar to each other. The I_{min} pooling technique generates slightly more accurate results, as indicated by the greater R^{2} coefficients between Z(I_{min}) and t_{gold} compared to those between Z(I_{avg}) and t_{gold} (see Table 2). This trend holds for all six subsets of Dataset 4.
The higher accuracy of Z(I_{min})
The Zstatistic calculated using the I_{min} pooling metric provides an improvement in accuracy over the other techniques. The tstatistic derived from datasets with 20 replicates was used as a surrogate "gold standard" since 8 or more replicates can be considered sufficient to give power to the tstatistic [17]. The tstatistic was chosen as the "gold standard" instead of the average logged ratio since the latter does not take variability into account. For each of the six permuted subsets of Datasets 4, the 90^{th} percentile penalized tstatistic, SAM penalized tstatistic, and Z(I_{avg}) had similar R^{2} values when correlated with the "gold standard" tstatistic, although the SAM statistic did perform poorly for the noisiest subset of Dataset 4 (#3 in Table 2) with an R^{2} value of only 0.70. Z(I_{min}), however, consistently produced the highest R^{2} value for each of the six datasets. Since the ratios used in each of these statistics is identical, this result indicates that the standard error generated with the I_{min} technique produces the best correlation with the gold standard tstatistic based on 20 replicates. Although excluding spots with very low intensity could eliminate the difference in performance between the I_{min} and I_{avg} pooling metrics, this approach would make it impossible to detect lowexpressed regulated genes, which may be biologically significant.
The Zstatistics from the I_{min} technique do not correlate perfectly with the "gold standard" tstatistic, however. Some disagreement can be expected because the Z(I_{min}) data was based on only three replicate arrays, which contain much less information than the 20 replicates used to calculate the "gold standard" tstatistic. Also the significance estimates calculated using the "gold standard" tstatistic may still contain some inaccuracies, even with 20 replicates. Kerr et al. found this to be true with 12 replicates, where accuracy is reduced if the error distribution for each gene is modeled separately instead of using a pooled estimate [15]. Analyzing the large (N = 20) replicate dataset using robust estimators of ratio and standard deviation may be able to create a more accurate "gold standard" to use for further testing of the Zstatistic or other statistics. Note that we do not employ an explicit permutationbased approach to estimate the false detection rates of the statistics investigated in this study, as in Ref. [16]. Rather than permute gene labels from a small set of arrays to estimate the distribution of expected test statistics, with the availability of the large (N = 23) replicate dataset described herein, we preferred to use this rich source of actual test statistics directly.
The higher reproducibility of zstatistics
The Zstatistic – calculated with either pooling the I_{min} or I_{avg} pooling metric – provides an appreciable improvement in reproducibility over the average logged ratio alone, the standard ttest and the 90^{th} percentile and SAM penalized tstatistics. Both linear (R^{2}) and nonparametric rank correlation coefficients were highest for the Zstatistic when comparing corresponding spots between three independent pairs of replicate datasets. Also, the standard tstatistic and SAM penalized tstatistic generate linear regression slope coefficients that vary greatly from pair to pair, indicating that their absolute magnitude is not as reproducible as the Zstatistics, whose linear regression slope coefficients are much closer to 1.
The high correlation values and nearunity slope coefficients for the Zstatistic support the hypothesis that pooling the standard deviations of spots with similar intensities provides a stable, precise estimate of the standard deviation. This assumption of a wellestimated standard deviation supports the use of the Gaussian distribution to map the Zstatistic to a pvalue. Using only the measured standard deviation, one is forced to use a tdistribution with only 2 degrees of freedom to generate a pvalue. This test does not have sufficient power to generate any significantly regulated points; because of the very small number of degrees of freedom, not a single spot seen in Fig. 1a is found to be significant after multiple test correction. In contrast, even after a conservative multiple test correction that makes the cutoff for statistical significance much more stringent, many spots are found significant using the Zstatistic. The penalized tstatistics do not produce a stable estimate of the standard deviation with these data, perhaps because the constant added to the denominator of the test statistic showed a large variation between replicate datasets. Therefore they cannot be mapped to a pvalue in a reproducible manner.
Outlier detection
One limitation of using a pooled standard deviation is that for a spot with replicate ratios that include one or more outliers, the appropriately high measured standard deviation will be replaced by an inappropriately low pooled standard deviation. This substitution could produce a false positive result. We have sought to minimize this limitation by implementing an overlying outlier detection algorithm. (For other implementations of outlier detection, see Ref. [26, 30].) The algorithm in this study uses the measured standard deviation instead of the pooled standard deviation for spots for which the pooling model may not hold. These spots are identified as ones for which residual error σ–σ' is positive and greater than twice the standard deviation of the positive residual errors.
The measured standard deviations for these outlier points are valid sample measurements of the variance process and should be used to calculate the pooled standard deviations for spots with similar intensities. These ratio measurements, however, are too widely varying for one to have the same confidence in the average ratio as one would have for other spots; thus, it is appropriate to substitute the measured standard deviation for the pooled standard deviation in these cases. Fig. 7c,7d, which highlight outlier spots on a plot of the Z(I_{min}) vs. the "gold standard" tstatistic for Dataset 4b, show that this outlier detection technique correctly detects many of the presumably false positive spots that have a high Zstatistic and low t_{gold} value. The plots also show some false positive spots that are not detected through this algorithm, as well as a few spots that become false negatives after outlier detection. Other, more complex outlier detection algorithms may perform better, and should be explored. A simple modification to the current algorithm, using local instead of global estimates of the standard deviation of the residual error, may improve outlier detection. Alternative implementations include modifying the pooling window shape to give more weight to a spot's measured standard deviation or that of its nearest neighbors by intensity. Strictly speaking, the pvalues for outlier spots should be calculated using a tdistribution instead of a Gaussian distribution since the measured standard deviation is being used. We have shown, however, that with 3 replicates, no spots in our datasets can be found statistically significant using the ttest and strict multiple test correction. In order to preserve detection of spots, we continue to use the Gaussian distribution to convert outlier Zstatistics to pvalues, which may slightly increase the false positive rate for spots detected as outliers. In practice, however, such spots are rarely found to be significantly regulated.
Unpaired vs. paired analysis for reference sample experiments
Finally, we have extended our algorithms to apply to data from a reference sample experimental design. This design gives one the flexibility to compare many different conditions to one another, but the tradeoff is a loss in precision. In theory, using a reference sample design instead of a direct comparison design should increase the variance by a factor of 2. This increase has in fact been observed in practice [33].
The paired analysis method can reduce the measured variation in a reference sample design. The linear regression slope coefficients in Table 1 indicate that the Zstatistic values using the paired analysis are higher than the unpaired Zstatistic values. Thus, the paired difference of logged ratios, μ, is less variable than the independent logged ratios, M_{X} and M_{Y}. This observation suggests that the effects of biological or analytical variation from replicate to replicate can be reduced if comparisons are made between paired samples. Whether this reduction is due to using paired biological samples or paired array processing dates [34] is still an open question, and probably will be contextdependent. Although it may not always be practical, it would be beneficial for investigators to design reference sample experiments to be performed in parallel whenever possible to take advantage of the lower standard deviations produced by paired analysis.
Finding the optimal statistical test
Several areas remain for further refinement of our implementation of poolingbased statistical analysis of microarray data. Currently, the standard deviation is pooled using a simple moving rectangular window of 501 spots, but other window sizes and shapes may improve performance slightly. More generally, we have not explicitly compared the moving average estimator with the splinefit or loessbased techniques to estimate the standard deviation used in other studies (see Background). While we expect performance to be similar, further testing may reveal an advantage.
Following Ref. [35], we do not try to estimate the dyespecific bias of individual spots or genes (i.e., dyegene interaction) in order to preserve degrees of freedom needed to estimate the variance. Informally we noted that dye bias in some spots produced high measured variances that caused those spots to be considered nonsignificant outliers. A posthoc test to warn of potential dye bias of individual spots may be appropriate for small numbers of array replicates (e.g. N = 3), especially if the experimental design is unbalanced (i.e., the number of dyeswapped and unswapped arrays is not equal).
Note that this study only considered statistics of the general form (ratio) / (standard deviation). ANOVA models that consider the variance as intensitydependent, as seen in Ref. [15, 25], can be seen as an extension of this concept. An ANOVA framework, however, also allows for a more complicated experimental model that can incorporate normalization and multiple biological conditions. Pooling standard deviations as a function of minimum intensity instead of average intensity may benefit such models. Permutation tests can also be used to detect regulated genes, and are known to be robust to outliers but can have low power for small N. Xu et al. found a permutation test to be equally or less accurate than parametric methods in ranking genes [36]. Bayesian analysis can also be applied to microarray data [13, 20, 21], and may be useful in this context to draw more information out of the distribution of intensities and ratios in the data.
In this study, data is first normalized, and then detection of regulated genes is performed in a separate step. In contrast, other approaches incorporate normalization and statistical inference into a unified model [29, 35]. Furthermore, the options for normalizing the data are numerous, including algorithms based on local regression (loess) [7], splines [37], a constant shift [15], or more exotic transforms that tend to remove the intensity dependence of the variance [38]. Increased attention to the lowlevel details of scanning and image processing may also improve accuracy [22, 33, 39], while at the same time potentially changing the intensity dependence of the variance. It remains to be seen how the techniques used for normalization or variancestabilizing transforms will impact the accuracy and precision of regulated gene detection. In addition, we are concerned that some of these transforms may create a systematic bias for or against genes of low intensity (e.g., [40]).
Test performance can depend on data characteristics
Although many datasets have a variance that is intensitydependent [12, 15, 21–26], some studies have analyzed datasets whose variance characteristics are not strongly intensitydependent (e.g., [35]). In general, we have experienced that microarray datasets with a low background relative to signal, loessbased normalization, and conservative background subtraction (e.g. SPOT Image Processing) produce standard deviations that are not strongly intensitydependent. In this context, the differences between the I_{min} and I_{avg} metrics disappear. In fact, for data with unusually low noise, the standard deviations is nearly constant across all spots and all of the statistical tests considered in this paper, even simply the average logged ratio, tend to converge. This observation is not unexpected; as the standard deviations converge to the same value, the denominator of the test statistics will become constant, leaving the test statistics simply proportional to the ratio. We would recommend finding a normalization [7, 29, 33, 37] and background subtraction technique [22, 32, 39] that produces low, intensityindependent standard deviations. Applying variance stabilizing transforms may eliminate the intensity dependence of the standard deviation [38], but might also reduce statistical power or bias the test toward spots of certain intensities. It cannot be predicted in advance whether all intensity dependence of the variation will be removed, so we continue to use the more robust statistic Z(I_{min}) for all of our datasets. Furthermore, in situations where changing the background subtraction or normalization technique is not possible because the original data is not available, using a more robust statistic like Z(I_{min}) will be advantageous.
While the pooling techniques described herein can compensate for intensitydependent variation, this intensity dependence can be minimized or exaggerated by different normalization techniques and background subtraction techniques. These techniques may have subtle effects on the power to detect regulated genes at different intensities, perhaps creating bias for or against detection of lowexpressed genes. For this reason, until the most sensitive and unbiased normalization and background subtraction methods are optimized for each microarray system, we would encourage creators of microarray data archives to preserve unnormalized intensity and background data, and the original image data when possible.
Of the many useful tests used to detect regulated genes from a small number of microarray replicates, we see the intensitybased variance estimation and Zstatistic described in this study to be a good combination of simplicity, robustness, precision, and accuracy. This technique allows meaningful pvalues to be added to a list of regulated genes. With this assessment of statistical significance, an investigator can proceed to focus on genes that are most likely to be regulated.
Methods
Data acquisition
For Datasets 1–3, microarrays were prepared essentially according to the manufacturer's instructions [41]. Briefly, 20 μg of total RNA were directlabeled with Cy3 and Cy5, and labeled cDNAs were hybridized overnight to Agilent Human 1 cDNA arrays (G4100a, Agilent Technologies, Palo Alto, CA) containing 16,142 features representing approximately 10,500 unique genes. After washing, the microarrays were scanned in an Agilent model G2505A microarray scanner.
Dataset 3 contains 3 replicate twocolor arrays with condition X in the Cy5 channel and condition Y in the Cy3 channel. Dataset 1 contains 3 replicates from another experiment, including one dyeswapped array; i.e. condition X in the Cy3 channel and condition Y in the Cy5 channel. Dataset 2 contains 3 replicate arrays without dyeswap, but each array was hybridized with a different amount of RNA, 5, 10 or 20 μg.
Dataset 4 consists of 23 replicate Agilent cDNA arrays from the Alliance for Cellular Signaling. The files MAE030201N00.txt to MAE030223N00.txt were downloaded from http://www.signalinggateway.org/data/micro/cgibin/microcond.cgi. These arrays correspond to the conditions "Bcell + SIMDM exposure = 0 minutes" vs. "Spleen". Four additional arrays are available for this condition (numbered MAE02070xN00.txt), but these arrays appeared to be slightly different from the other 23 arrays (using hierarchical clustering, data not shown) and were excluded from further analysis. The Bcell RNA was derived from 23 preparations, each from a different set of mice, while the spleen RNA was drawn from a single large pool (Rebecca Hart, Alliance for Cellular Signaling at the California Institute of Technology, Pasadena, CA, USA, personal communication).
Reference sample Dataset 5 is a set of 6 microarrays generated in our laboratory. Each of the arrays contains a reference RNA sample in the Cy3 channel. Three have condition "X" samples in the Cy5 channel and the other three have condition "Y" samples in the Cy5 channel. Since corresponding biological specimens for conditions X and Y were prepared together for each replicate, a natural pairing exists for the condition X and Y arrays.
Computer techniques
Statistical modules were programmed in Perl v5.8. Microsoft Visual Basic 6.0 was used to integrate the image processing and statistical modules.
Image processing
For Datasets 1–3 array images were processed using Agilent Feature Extraction software version A.6.1.1. The Feature Extraction Software provides normalized Cy3 and Cy5 channel intensity values for each spot on an array (in the gProcessedSignal and rProcessedSignal fields of the output files). The default settings were used for all options. Quality control algorithms in the software detect unusual (poor quality) spots; spots were excluded from analysis that contained a nonzero value any of the following fields: IsSaturated, IsFeatNonUnifOL, IsBGNonUnifOL, IsFeatPopnOL, IsBGPopnOL, IsManualFlag. For a detailed description of the Agilent Feature Extraction software and the algorithms it uses, see the Agilent Feature Extraction Version 6.1 Users' Manual. Briefly, Agilent Feature Extraction determines the foreground value for each channel based on the pixel values in a fixedsize circle centered on each spot. The median of pixel values in a concentric ring around the circle, with an excluded region between the outer boundary of the circle and the inner boundary of the ring, gives the spot background value. The raw spot value is calculated as its foreground value less its background value. A surrogate raw value is assigned when the foreground value does not exceed the background value by two standard deviations of the spot's background pixel values. Intensitybased normalization between channels using a linear regression and a lowess curvefit technique is then applied to remove any systematic dye incorporation biases.
Images were also processed using SPOT (CSIRO, New South Wales, Australia)[32], an Rbased implementation which uses seeded region growing to determine the foreground pixels for each spot and morphological opening to determine the background value for each spot. The raw spot values, foreground less the background values, are normalized between channels using an intensitybased Loess implementation in R available in the maNorm function of the marrayNorm package of the opensource Bioconductor software http://www.bioconductor.org. We considered three image processing techniques: Agilent Feature Extraction output alone, SPOT output alone with maNormbased normalization and Agilent foreground (gMedianSignal and rMedianSignal columns) less SPOT background (morphG and morphR columns) with maNormbased normalization.
Pooled standard deviations – direct comparison design
The pooled logged ratio standard deviation, σ'_{M}, was calculated by sorting all the spots by the average logged intensity or the minimum logged intensity I_{min} across both channels of all replicates and then taking the square root of the moving average of the variance σ_{M} ^{2} with a window of 501 spots. We averaged the variance instead of the standard deviation, since averaging the standard deviation directly will produce a negatively biased (~13%) estimate for N = 3 [42]. The Zstatistic was then calculated as . Note that I_{avg} and M as defined above are equivalent to the symbols Ā and M, respectively, as used in other studies [17]. The common "MA plot" would be called an "MI plot" using the notation of this study.
Pooled standard deviations – reference sample design
Three pairs of arrays were processed for each reference sample experiment. For the unpaired analysis, the arrays within a given condition were linearly normalized to each other, in order to map intensities from different replicates onto similar scales without altering the absolute ratio values (as described above). For each condition, the mean M_{avg} and sample standard deviation σ_{M} of the logged ratio were calculated for each feature. The pooled standard deviation of the logged ratio, σ'_{M}, was calculated by sorting all the spots by the average intensity, I_{avg}, or the minimum intensity, I_{min}, across both channels of all replicates for the condition and then taking the square root of the moving average of the variance σ_{M} ^{2}, with a window of 501 spots, centered on the given spot. The Zstatistic was calculated as where N_{X} and N_{Y} are the number of replicates for the given spot for condition X and condition Y, respectively.
For the paired reference sample analysis, the intensity vectors were all linearly normalized to the vector for the first replicate array of condition X to put all intensity values from both conditions on the same scale without changing the value of the ratios. Then the paired difference of logged ratios μ = M_{X}  M_{Y} for each pair of replicates was computed. The mean and sample standard deviation of μ was then calculated across replicates. The pooled standard deviation of μ, σ'_{μ}, was calculated by sorting all the spots by the average intensity I_{avg} or the minimum intensity I_{min} across both channels of all replicates for both conditions, and then taking the square root of the moving average of the variance σ_{μ} ^{2}, with a window of 501 spots. The Zstatistic was calculated as where N is the number of paired replicates for the spot.
To compare Zstatistic values between the paired and unpaired methods, the linear regression slope coefficient with intercept set to 0 was calculated between corresponding Zstatistics from the two methods.
Calculation of pvalues
For a Zstatistic Z, the twotailed pvalue is given by 1  2Φ(Z), where Φ is the cumulative distribution function for the zeromean, unitvariance Gaussian. The pvalue is corrected for multiple tests using Sidak's formula, p' = 1(1p)^{L}, where L is the total number of spots being examined. Note that we did not find it necessary to use more sophisticated means of controlling the error rate [43, 44], as we are primarily concerned with ranking regulated genes and not in establishing firm statistical cutoffs.
Calculation of standard tstatistics and penalized tstatistics
Outlier detection
When outlier detection was enabled, Zstatistics were calculated using the measured standard deviation instead of the pooled standard deviation for outlier spots. Outliers were determined by calculating σ_{ε}, the standard deviation of the residual error ε = σ  σ' for spots with σ > σ'. Spots for which ε > 2σ_{ε} were treated as outliers, similar to [26]. The measured standard deviations for the outlier points were considered to be valid sample measurements of the variance process and were not excluded from the calculation of the pooled standard deviations for spots with similar intensities.
Comparison of Zstatistic and penaltybased statistics
In order to test the reproducibility of different test statistics (Fig. 6), two sets of three arrays were randomly selected from the 23 replicate arrays in Dataset 4. For both of these subsets, we calculated the several different test statistics described above. For each gene, the value of each of the test statistics from one 3array subset was compared to the corresponding value from the other subset, using the squared Pearson's linear correlation coefficient, R^{2}, and two nonparametric, rankbased correlation coefficients, Spearman Rho and Kendall Tau, which were calculated using JMP (SAS Inc., Cary, NC). This entire process was repeated twice with the remaining arrays in Dataset 4, yielding a total of three independent comparisons. In total, six nonoverlapping sets of three arrays – 18 arrays in all – were drawn from the original pool of 23 arrays, leaving 5 arrays that were not used in this analysis. As the sets are nonoverlapping, each comparison is based on independent data.
In order to evaluate the accuracy of the different test statistics, we compared these statistics to an approximate "gold standard" measure (see Fig. 5). 3 arrays were randomly selected from the 23 arrays in Dataset 4; the other 20 were used to calculate "gold standard" tstatistics to which the results from the n = 3 dataset could be compared. The R^{2} value and the linear regression slope coefficient with intercept set to 0 were calculated between the corresponding experimental statistic and "gold standard" tstatistic for each gene. Only spots for which there were at least 15 replicates in the "gold standard" set of arrays were used. This process was repeated on a total of 6 random subsets.
Notes
List of abbreviations used
 X_{avg} :

Average logged intensity in channel X
 Y_{avg} :

Average logged intensity in channel Y
 M_{avg} :

Average ratio
 σ_{M} ^{2} :

Variance of the average ratio
 σ_{M} :

Standard deviation of the average ratio
 σ_{M}':

Pooled standard deviation of the average ratio
 I_{avg} :

Average logged intensity
 I_{min} :

Minimum logged intensity
 σ_{M}'(I_{avg}):

Pooled standard deviation of the average ratio, pooled using I_{min}
 σ_{M}'(I_{min}):

Pooled standard deviation of the average ratio, pooled using I_{min}
 μ:

Paired difference of ratios in a reference sample experiment
 σ_{μ}':

Pooled standard deviation of the paired difference of ratios
 Z(I_{avg}):

Zstatistic calculated with standard deviation pooled using average logged intensity
 Z(I_{min}):

Zstatistic calculated with standard deviation pooled using minimum logged intensity
 N:

Number of replicate arrays
 t:

Standard tstatistic
 d_{90th percentile} :

Penalized tstatistic calculated using 90^{th} percentile s_{0}
 d_{SAM} :

Penalized tstatistic calculated using Significance Analysis of Microarrays
 t_{gold} :

"Gold standard" tstatistic calculated using 20 replicate arrays
 R:

Pearson's correlation coefficient
Declarations
Acknowledgements
We would like to thank John Aach, Adnan Derti, and Yonatan Grad for a critical review of this manuscript, and Sandrine Dudoit for helpful discussions. We are grateful to the Alliance for Cellular Signaling for publicly releasing one of the microarray datasets used in this study. This work was supported by the Leet and Patterson Trust (GGC) and NIH P50HL56985 (MAG).
Authors’ Affiliations
References
 Segal E, Yelensky R, Koller D: Genomewide discovery of transcriptional modules from DNA sequence and gene expression. Bioinformatics. 2003, 19 Suppl 1: I273I282. 10.1093/bioinformatics/btg1038.View ArticlePubMedGoogle Scholar
 Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM: Systematic determination of genetic network architecture. Nat Genet. 1999, 22: 281285. 10.1038/10343.View ArticlePubMedGoogle Scholar
 Tilstone Claire: Vital statistics. Nature. 2003, 424: 610612. 10.1038/424610a.View ArticlePubMedGoogle Scholar
 Smyth GK, Yang YH, Speed T: Statistical issues in cDNA microarray data analysis. Methods Mol Biol. 2003, 224: 111136. 10.1385/159259364X:111.PubMedGoogle Scholar
 Comander J, Weber GM, Gimbrone M. A., Jr., GarciaCardena G: Argusa new database system for Webbased analysis of multiple microarray data sets. Genome Res. 2001, 11: 16031610. 10.1101/gr.186601.PubMed CentralView ArticlePubMedGoogle Scholar
 Dudoit S, Gentleman RC, Quackenbush J: Open source software for the analysis of microarray data. Biotechniques. 2003, Suppl: 4551.PubMedGoogle Scholar
 Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 2002, 30: e1510.1093/nar/30.4.e15.PubMed CentralView ArticlePubMedGoogle Scholar
 Kepler TB, Crosby L, Morgan KT: Normalization and analysis of DNA microarray data by selfconsistency and local regression. Genome Biol. 2002, 3: RESEARCH003710.1186/gb200237research0037.PubMed CentralView ArticlePubMedGoogle Scholar
 Churchill GA: Fundamentals of experimental design for cDNA microarrays. Nat Genet. 2002, 32 Suppl: 490495. 10.1038/ng1031.View ArticlePubMedGoogle Scholar
 Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genomewide expression patterns. Proc Natl Acad Sci U S A. 1998, 95: 1486314868. 10.1073/pnas.95.25.14863.PubMed CentralView ArticlePubMedGoogle Scholar
 Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR: Interpreting patterns of gene expression with selforganizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci U S A. 1999, 96: 29072912. 10.1073/pnas.96.6.2907.PubMed CentralView ArticlePubMedGoogle Scholar
 Chen Y, Kamat V, Dougherty ER, Bittner ML, Meltzer PS, Trent JM: Ratio statistics of gene expression levels and applications to microarray data analysis. Bioinformatics. 2002, 18: 12071215. 10.1093/bioinformatics/18.9.1207.View ArticlePubMedGoogle Scholar
 Newton MA, Kendziorski CM, Richmond CS, Blattner FR, Tsui KW: On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J Comput Biol. 2001, 8: 3752. 10.1089/106652701300099074.View ArticlePubMedGoogle Scholar
 Pan W, Lin J, Le CT: How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach. Genome Biol. 2002, 3: research0022PubMed CentralPubMedGoogle Scholar
 Kerr MK, Afshari CA, Bennett L, Bushel P, Martinez J, Walker NJ, Churchill GA: Statistical Analysis of a Gene Expression Microarray Experiment with Replication. 2001, Bar Harbor, The Jackson Laboratory, [http://www.jax.org/staff/churchill/labsite/research/expression/niehs.pdf]Google Scholar
 Broberg P: Statistical methods for ranking differentially expressed genes. Genome Biol. 2003, 4: R4110.1186/gb200346r41.PubMed CentralView ArticlePubMedGoogle Scholar
 Dudoit S, Yang YH, Callow MJ, Speed T: Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica. 2002, 12: 111139.Google Scholar
 Lonnstedt I, Speed T: Replicated microarray data. 2001, Uppsala, Uppsala University, [http://www.stat.berkeley.edu/users/terry/zarray/TechReport/Baypap4d.pdf]Google Scholar
 Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A. 2001, 98: 51165121. 10.1073/pnas.091062498.PubMed CentralView ArticlePubMedGoogle Scholar
 Efron B, Tibshirani R, Storey JD, Tusher V: Empirical Bayes analysis of a microarray experiment. J Am Stat Assoc. 2001, 96: 11511160. 10.1198/016214501753382129.View ArticleGoogle Scholar
 Baldi P, Long AD: A Bayesian framework for the analysis of microarray expression data: regularized t test and statistical inferences of gene changes. Bioinformatics. 2001, 17: 509519. 10.1093/bioinformatics/17.6.509.View ArticlePubMedGoogle Scholar
 Colantuoni C, Henry G, Zeger S, Pevsner J: SNOMAD (Standardization and NOrmalization of MicroArray Data): webaccessible gene expression data analysis. Bioinformatics. 2002, 18: 15401541. 10.1093/bioinformatics/18.11.1540.View ArticlePubMedGoogle Scholar
 Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, Kidd MJ, King AM, Meyer MR, Slade D, Lum PY, Stepaniants SB, Shoemaker DD, Gachotte D, Chakraburtty K, Simon J, Bard M, Friend SH: Functional discovery via a compendium of expression profiles. Cell. 2000, 102: 109126. 10.1016/S00928674(00)000155.View ArticlePubMedGoogle Scholar
 Baggerly KA, Coombes KR, Hess KR, Stivers DN, Abruzzo LV, Zhang W: Identifying differentially expressed genes in cDNA microarray experiments. J Comput Biol. 2001, 8: 639659. 10.1089/106652701753307539.View ArticlePubMedGoogle Scholar
 Coombes KR, Highsmith WE, Krogmann TA, Baggerly KA, Stivers DN, Abruzzo LV: Identifying and quantifying sources of variation in microarray data using highdensity cDNA membrane arrays. J Comput Biol. 2002, 9: 655669. 10.1089/106652702760277372.View ArticlePubMedGoogle Scholar
 Quackenbush J: Microarray data normalization and transformation. Nat Genet. 2002, 32 Suppl: 496501. 10.1038/ng1032.View ArticlePubMedGoogle Scholar
 Rocke DM, Durbin B: A model for measurement error for gene expression arrays. J Comput Biol. 2001, 8: 557569. 10.1089/106652701753307485.View ArticlePubMedGoogle Scholar
 Nadon R, Shi P, Skandalis A, Woody E, Hubschle H, Susko E, Rghei N, Ramm P: Statistical inference methods for gene expression arrays. 2001, St. Catharines, Imaging Research Inc., [http://imaging.brocku.ca/PDF_files/AST_Technicalnote.pdf]Google Scholar
 Tsodikov A, Szabo A, Jones D: Adjustments and measures of differential expression for microarray data. Bioinformatics. 2002, 18: 251260. 10.1093/bioinformatics/18.2.251.View ArticlePubMedGoogle Scholar
 Yang IV, Chen E, Hasseman JP, Liang W, Frank BC, Wang S, Sharov V, Saeed AI, White J, Li J, Lee NH, Yeatman TJ, Quackenbush J: Within the fold: assessing differential expression measures and reproducibility in microarray assays. Genome Biol. 2002, 3: research0062PubMed CentralPubMedGoogle Scholar
 Cheadle C, Vawter MP, Freed WJ, Becker KG: Analysis of microarray data using Z score transformation. J Mol Diagn. 2003, 5: 7381.PubMed CentralView ArticlePubMedGoogle Scholar
 Yang YH, Buckley MJ, Dudoit S, Speed T: Comparison of Methods for Image Analysis on cDNA Microarray Data, Technical report #584. 2000, 2003: [http://www.stat.berkeley.edu/users/terry/zarray/TechReport/584.pdf]Google Scholar
 Dudley AM, Aach J, Steffen MA, Church GM: Measuring absolute expression with microarrays with a calibrated reference sample and an extended signal intensity range. Proc Natl Acad Sci U S A. 2002, 99: 75547559. 10.1073/pnas.112683499.PubMed CentralView ArticlePubMedGoogle Scholar
 Livesey FJ, Furukawa T, Steffen MA, Church GM, Cepko CL: Microarray analysis of the transcriptional network controlled by the photoreceptor homeobox gene Crx. Curr Biol. 2000, 10: 301310. 10.1016/S09609822(00)003791.View ArticlePubMedGoogle Scholar
 Kerr MK, Martin M, Churchill GA: Analysis of variance for gene expression microarray data. J Comput Biol. 2000, 7: 819837. 10.1089/10665270050514954.View ArticlePubMedGoogle Scholar
 Xu R, Li X: A comparison of parametric versus permutation methods with applications to general and temporal microarray gene expression data. Bioinformatics. 2003, 19: 12841289. 10.1093/bioinformatics/btg155.View ArticlePubMedGoogle Scholar
 Workman C, Jensen LJ, Jarmer H, Berka R, Gautier L, Nielser HB, Saxild HH, Nielsen C, Brunak S, Knudsen S: A new nonlinear normalization method for reducing variability in DNA microarray experiments. Genome Biol. 2002, 3: research004810.1186/gb200239research0048.PubMed CentralView ArticlePubMedGoogle Scholar
 Durbin B, Rocke DM: Estimation of transformation parameters for microarray data. Bioinformatics. 2003, 19: 13601367. 10.1093/bioinformatics/btg178.View ArticlePubMedGoogle Scholar
 Qian J, Kluger Y, Yu H, Gerstein M: Identification and correction of spurious spatial correlations in microarray data. Biotechniques. 2003, 35: 424, 46, 48.PubMedGoogle Scholar
 Manduchi E, Grant GR, McKenzie SE, Overton GC, Surrey S, Stoeckert C. J., Jr.: Generation of patterns from gene expression data by assigning confidence to differentially expressed genes. Bioinformatics. 2000, 16: 685698. 10.1093/bioinformatics/16.8.685.View ArticlePubMedGoogle Scholar
 Agilent Fluorescent Direct Label Kit Protocol Rev. 2.1. 2003, Agilent Technologies, [http://www.chem.agilent.com]G255598003
 Sokal R, Rohlf FJ: Biometry: the principles and practice of statistics in biological research. 2000, W.H. Freeman and Co., 3rd ed.; page 53: page 53Google Scholar
 Ge Y, Dudoit S, Speed T: Resamplingbased multiple testing for microarray data hypothesis. Test. 2003, 12: 177.View ArticleGoogle Scholar
 Dudoit S, Shaffer JP, Boldrick JC: Multiple Hypothesis Testing in Microarray Experiments. U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 110. 2002, [http://www.bepress.com/ucbbiostat/paper110]Google Scholar
 Significance Analysis of Microarrays. [http://wwwstat.stanford.edu/~tibs/SAM/]
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.