### Fold change, equal variance, and data transformation

The ratio of the expression levels across two conditions is called fold change (FC); it has been used in the early comparative experiments [13, 14]. This criterion is arguable since, depending on the decision-makers, choosing cutoff FC is arbitrary. Furthermore, the FC method does not take into account the variability with gene expression measurements, or, even worse, it assumes that the variability for all expression measurements is the same, which is likely to be false for most gene expression experiments. However, FC criteria have their own advantages. First, they are biologically meaningful and easily interpreted. Second, more importantly, many studies have shown that FC-based methods, if used appropriately, outperform other methods [15–19].

One way to obtain equal variance from gene to gene is to transform the data, usually with a logarithmic transformation. After this transformation, a FC (log scale) can be calculated from the difference of means across two conditions. However, different data sets may require different variance-stabilization transformations. Several variance-stabilization and normalization transformation methods, which try to transform expression values to be equal variance and normally distributed for each gene, have been proposed [19–23].

### Model-based information sharing (MBIS)

MBIS makes the assumption that an appropriate data transformation is available and has been applied to the raw gene expression data. This transformation has furthermore stabilized the variance. Therefore, the variance for each gene is a constant, denoted by *s*^{2}, after transformation. If we can estimate *s*^{2} from data, then we can calculate p-value easily for each gene.

### Estimation of *s*^{2}

Suppose there are *n*_{1} and *n*_{2} replicates for condition one and two, respectively, and *G* genes to be tested. Under the assumptions of normality and equal variance, the estimated variance from each individual gene is an unbiased estimate for *s*^{2} and has a Chi-square distribution with degrees of freedom *n*_{1} + *n*_{2} - 2. Therefore the average of the estimated variances from all genes is also an unbiased estimate for *s*^{2}:

{\stackrel{\u0304}{s}}^{2}=\frac{1}{G}\sum _{i=1}^{G}{\stackrel{\u0304}{s}}_{i}^{2}

(2)

where {\stackrel{\u0304}{s}}_{i}^{2} is the estimated variance from individual gene *i* and *G* is the number of genes. Then we use the square root of {\stackrel{\u0304}{s}}^{2}, \stackrel{\u0304}{s}, as the estimated standard variance for each gene. From the equal variance assumption, we can use a normal distribution to approximate the mean difference of non-DE genes:

d~N\left(0,{s}^{2}\left(\frac{1}{{n}_{1}}+\frac{1}{{n}_{2}}\right)\right)

(3)

Based on this normal distribution, we calculate the p-value for gene *i*:

{p}_{i}=2*\left(1-\Phi \left(\frac{\left|{d}_{i}\right|}{\stackrel{\u0304}{s}\sqrt{\frac{1}{{n}_{1}}+\frac{1}{{n}_{2}}}}\right)\right)

(4)

where *d*_{
i
} is the difference of the means for gene *i* across two conditions and Φ(.) is the cumulative distribution function (CDF) of the standard normal distribution.

### Estimation of total number of non-DE genes *G*_{0}

For a given value *μ* (0 <*μ* < 1), we count the number (*N*_{
u
}) of genes with p-values greater than or equal to *μ*. Then an estimate of *G*_{0} is *N*_{
μ
}/(1-*μ*). To reduce the influence of DE genes since they have relatively small p-values, a relatively large *μ* is preferable. We can also use a vector of *μ*'s and calculate the corresponding estimated {\u1e20}_{0}'s and then take their (weighted) mean as the final estimate for *G*_{0}.

### Gene selection and estimations for false positives and FDR

For a given cutoff p-value, *p*_{0}, we pick those genes with p-values smaller than *p*_{0} as DE genes. Suppose *S* genes are selected. Then we can estimate the number of false positives, \stackrel{\wedge}{FP}={G}_{0}\times {p}_{0}, and the false discovery rate, \stackrel{\wedge}{FDR}={G}_{0}\times {p}_{0}\u2215S.

### SAM, t-test and q-value

For the SAM method, we use the R package, SAMr [10], and choose different values for s0.perc (percentile of estimated se's): -1 (t-test only, i.e. s0 = 0 in (1)), 20, 40, 60, 80 and 100. SAM will calculate p-values by permutation. For the t-test method, we calculate p-values from the regular t-test statistics (i.e. s0 = 0 in (1)) without permutation. We then use the calculated p-values for each method as the input for R package "qvalue" and then get the output of selected DE genes with different preset q-values.

### Simulation design

To restrict ourselves to small experiments, we assume the sample sizes for both conditions are 3, 5 and 8. We simulate 10,000 genes with normal distributions for two conditions. For non-DE genes, we assume they are normally distributed with a mean equal to 0; for DE genes, their absolute mean difference is uniformly distributed: with three ranges representing different degrees of differential expression: *U*(1,3), low, *U*(3,6), middle, and *U*(6,9), high. We assume the standard deviations are uniformly distributed as *U*(1,*b*), where *b* is greater than or equal to one. In the ideal situation, i.e. equal variance, *b* = 1. However, even after trying several variance-stabilization transformations, sometimes this assumption may be too strong for real data, and we therefore choose different *b*'s in our simulations: *b* = 1, 1.5 and 2. In other words, we simulate data with equal or near equal variance. The proportion of DE genes among all genes may also affect the gene selection results; we then choose three levels of proportions: 0.1, 0.3 and 0.5 (i.e. the numbers of DE genes are 1000, 3000 and 5000, respectively). The output of selected genes from "qvalue" for each method with different preset cutoff q-values: 0.05, 0.10, 0.15, 0.20 and 0.25, are compared.

### Real data set

We use Affymetrix GeneChip data sets selected from the GSE2350 series [24], downloaded from the NCBI GEO database [25] to compare our new method with others. We use the first three samples from both "control" (GSM44051, GSM44052 and GSM44053) and "CD40L treatment" (GSM44057, GSM44058 and GSM44059) groups. For the raw intensity data, we use the "rma" function in R package "affy" [10] to do background correction, normalization, and summarization [26]. Then we apply different methods to the summarized expression values (already on log base 2 scale) to estimate p-values that are the input for the "qvalue."

To see which method gives more biologically meaningful results, we use the web-based tool, CLASSIFI algorithm [27–29], that uses Gene Ontology (GO) [30] annotation to classify groups of genes defined by gene cluster analysis using the statistical analysis of GO annotation co-clustering. We compare the median p-values of "topfile" from the output of CLASSIFI. In general, the smaller the p-value is, the more reasonable the results in terms of GO classification [27].