When sample sizes are small, information shared by genes is helpful and should be used. While t-test treats each gene independently, both SAM and MBIS, use information shared among genes. When the equal variance assumption in MBIS is met, the estimated variance for gene

*i* in the t-test has a Chi-square distribution with degrees of freedom of

*n*
_{1} +

*n*
_{2} - 2:

The variance for

is:

And the square of standard error estimated in t-test has variance:

However, (2) has a Chi-square distribution with degrees of freedom

*G*(

*n*
_{1} +

*n*
_{2} - 2), and its variance is:

The square of standard error estimated for our new method is:

In a typical microarray experiment, the number of genes, *G*, is usually between 10K and 50K, indicating that the variance in (9) is very close to 0 and the estimated value in (2) is close to the true value; therefore a normal distribution is appropriate to approximate the mean differences of the true nulls.

In comparing (7) with (9), we can see that, while the regular t-test method gives a much larger variance for each estimated variance (each individual t-test will lose two degrees of freedom due to variance estimation), MBIS, a method that utilizes information among genes, has a more precise estimate for the common variance. Therefore, MBIS always outperforms the t-test.

On the other hand, the Chi-square distribution is right skewed, implying that its mean is larger than its median. If
's have a Chi-square distribution, they are more likely to have estimated values less than the mean (true value) than estimated values greater than the mean. In other words,
are more probable to underestimate than overestimate the constant variance. Therefore many true nulls may have very small p-values from a t-test only because they have small estimated standard errors. This explains why there are so many FPs from t-test in our simulations; and consequently t-test selects so many different DE genes than SAM and MBIS do in real data. Because of the same reason, adding a common number to each individual *se*
_{
i
} in (1) will potentially decrease the bias (for small s0.perc in SAM) and/or decrease the relative difference of estimated variances for most genes; therefore SAM usually improves the test statistics, although still not as favorably as MBIS. This explains why SAM performs better than t-test but worse than MBIS in terms of sensitivity and specificity.

When sample sizes are extremely small, as we mentioned before, SAM will have relatively larger p-values due to a limited number of permutations available, affecting the estimation of q-values by "qvalue". "qvalue" does not perform very well in this situation. For a given cutoff q-value, the corresponding cutoff p-value calculated by "qvalue" could be too large (as seen in the results from t-test and MBIS in simulation and real data) or too conservative (as in the results from SAM), a finding consistent with those from Jung and Jang [12].

Another difficulty for "qvalue" is that the number of selected genes can be very sensitive to the cutoff q-value, especially the very small preset q-value (see Table 2), that is desirable in practice; in this situation, SAM even performs worse than the regular t-test in terms of proportion of the DE genes selected. This raises the question of how to choose an appropriate q-value in practice to which there is no absolute answer. Sometimes, even for large q-values (as seen in the results from SAM in Table 1), the "qvalue" gives us a small proportion of true positives; on the other hand, we could select a large number of genes with a small q-value (as seen in the results from MBIS and t-test for real data in Table 2). We recommend that in this situation (small sample sizes), instead of using q-value only, one should choose a cutoff p-value to select DE genes first and then estimate FDR if desired.

Although we assume equal variance in the MBIS, we also evaluate this new method under situations when this assumption is violated. By simulation, we have shown that, when the variances of gene expressions are near constant, MBIS still outperforms both the t-test and SAM, making our method applicable in various situations.

From our experience, variances estimated from raw expression data are highly variable. We should transform data before applying MBIS. Several variance-stabilization and normalization transformation procedures, such as logarithm, Box-Cox transformation, generalized logarithm [19], variance stabilization [21] and data-driven Haar-Fisz transformation for microarrays (DDHFm) [22], are already available. In addition, choosing appropriate preprocessing procedures (background correction, normalization and summarization) is also very important for downstream analyses, including gene selection [16, 26, 31–34].