An integrated hierarchical Bayesian approach to normalizing left-censored microRNA microarray data
 Jia Kang^{1} and
 Ethan Yixun Xu^{2, 3}
DOI: 10.1186/1471-2164-14-507
© Kang and Xu; licensee BioMed Central Ltd. 2013
Received: 22 August 2012
Accepted: 23 July 2013
Published: 26 July 2013
Abstract
Background
MicroRNAs (miRNAs) are small endogenous single-stranded RNAs that regulate target gene expression post-transcriptionally through the RNAi pathway. Normalization, which aims to remove between-array systematic bias, is a critical preprocessing step for detecting differentially expressed miRNAs. Most normalization methods adopted for miRNA data are the same methods used to normalize mRNA data, but miRNA data differ markedly from mRNA data, mainly because of a possibly larger proportion of differentially expressed miRNA probes and a much larger percentage of left-censored miRNA probes below the detection limit (DL). Taking these unique characteristics of miRNA data into account, we present a hierarchical Bayesian approach that integrates normalization, missing data imputation, and feature selection in the same model.
Results
Results from both simulated and real data suggest that the Bayesian method outperforms other widely used normalization methods in detecting truly differentially expressed miRNAs. In addition, our findings demonstrate the necessity of miRNA data normalization and the robustness of our Bayesian approach against violations of the standard assumptions adopted in mRNA normalization methods.
Conclusion
Our study indicates that normalization procedures can have a profound impact on the detection of truly differentially expressed miRNAs. Although the proposed Bayesian method was formulated to handle normalization issues in miRNA data, we expect that biomarker discovery with other high-dimensional profiling techniques in which a significant proportion of data points are left-censored (e.g., proteomics) might also benefit from this approach.
Keywords
miRNA; Normalization; Hierarchical Bayesian modeling; Detection limit; Variable selection
Background
MicroRNAs (miRNAs) are small non-coding RNAs that are critical players in mediating post-transcriptional gene regulation. They predominantly suppress gene expression by binding to their target mRNAs to form the RNA-induced silencing complex (RISC) [1] and are responsible for regulating >60% of the human coding genome [2–4].
As miRNAs are increasingly implicated in a number of important physiological and pathological processes (e.g., developmental timing [5], stem cell differentiation [6], and cancer initiation and progression [7]), there has been rapidly growing interest within the scientific community in studying the functional roles of miRNAs. To date, more than 18,226 hairpin precursor miRNAs from 168 species have been registered in miRBase (http://www.mirbase.org/), and this number is expected to grow much further.
Microarray is a widely adopted technology for simultaneously quantifying the expression of hundreds of miRNAs in a single experiment. Although this technology has shown enormous scientific potential for understanding genomic regulation processes, many sources of systematic variation may influence the measured probe expression levels. The purpose of microarray data normalization is to minimize array-level systematic bias, so that meaningful biological comparisons can be made and true biological changes can be found within and across experiments.
A number of normalization methods have been developed for mRNA arrays in the past. Normalization methods for mRNAs often rely on one or more of the following assumptions: only a small fraction of features are differentially expressed, and the proportions of up- and down-regulated features are approximately equal. In addition, normalization methods based on housekeeping genes require a well-defined set of housekeeping genes that are expected to be expressed at relatively invariant levels across different tissues or treatment conditions.
However, in contrast to mRNA arrays, which have an exceedingly high density of probes synthesized in situ on the array, miRNA microarrays are often lower-density spotted arrays [8]. In addition, previous studies have found that (1) differentially expressed miRNAs are not always symmetric in the direction of regulation [9]; (2) the fraction of differentially expressed miRNAs among treatment conditions often exceeds 15% and can be as high as 38% [10–12]; and (3) most commercially available miRNA microarrays do not have controls for endogenous RNAs or miRNAs that have been proven to be robustly invariant among different tissues or treatment conditions. Furthermore, unlike mRNA arrays, where the abundance of most probes is above the detection limit (DL) of the instrument, miRNA arrays often have a large portion of miRNAs expressed below the DL [13]. This results in the type of missing data called “non-ignorable missing”, and the simple exclusion of such “interval-censored” data can significantly bias results [14].
Given the differences between miRNA and mRNA data, it is questionable whether the normalization methods for mRNA arrays are adequate for miRNA arrays. With the aim of developing a more robust normalization method for miRNA arrays, Suo et al. [15] recently proposed a least-variant set (LVS) approach based on a set of data-driven invariant miRNAs, and demonstrated its superiority to other normalization methods, including quantile normalization, using a set of human tissue miRNA profiling data. However, LVS implicitly assumes the existence of a set of invariant miRNAs, an assumption that may not hold in practice. In addition, LVS is not designed to handle the missing data problem found in miRNA datasets, as described previously.
The framework of Bayesian hierarchical modeling (BHM) has proved successful for analyzing many types of complex datasets in genetics. A BHM strategy has the following key benefits: (1) BHM allows noise to be modeled additively, multiplicatively, or in a nonlinear fashion; (2) hierarchical modeling enables information borrowing from comparable units to strengthen statistical inference when the dataset is sparse because of the small sample sizes typical of biological studies, so that genes exhibiting unusually small variability by chance will not lead to artificially inflated test statistics; and (3) the Bayesian framework offers a natural platform for handling data below the detection limit. These benefits motivated us to develop a fully Bayesian approach to miRNA data normalization that performs missing data imputation, array normalization, and signal extraction under one unified framework.
Recently, several studies compared the performance of mRNA normalization methods on miRNA microarray data [8, 15–17]. Rao et al. [8] and Zhao et al. [16] demonstrated that quantile normalization [18] (http://oz.berkeley.edu/~bolstad/stuff/qnorm.pdf) outperformed the other normalization methods they evaluated. In their studies, the primary objective was to compare the effect of normalization methods in reducing the variation among technical replicates. However, it is natural to expect that more aggressive normalization methods such as quantile normalization (e.g. forcing each array to have the exact same empirical distribution of intensities) will reduce variations among samples not only within the treatment group but also between treatment groups, which could make the discovery of truly differentially expressed features more difficult. Therefore, reduction of variation among technical replicates does not necessarily lead to an improved performance of feature selection, which is one of the most important goals of miRNA profiling experiments. To date, we are not aware of any literature that systematically compares the impact of different normalization methods in feature selection for the miRNA data. In this paper, we will compare the performance of Bayesian Normalization versus a few other widely adopted normalization methods in selecting differentially expressed (DE) miRNAs, through both a carefully designed simulation study and the analysis of a real dataset.
Methods
suggested by Kerr et al. [19]. We parameterize the mean of y_{gcr} to include additive effects at both the miRNA and array levels, where α_{g} is the miRNA effect or overall expression level of miRNA g, β_{gcr} is the array effect that depends on g through α_{g}, and σ^{2}_{g} is the variance for miRNA g. We assume that the variance of each miRNA does not depend on the treatment conditions. The differential effect for each miRNA g between the two treatment conditions is denoted as δ_{g}.
The model is made identifiable by normalizing within each condition, setting $\overline{\beta}_{gc\cdot}=0\ \forall g,c$, where the dot indicates that we are taking an average over the index r.
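As a hedged sketch only, one parameterization consistent with the symbols defined above is given below. The symbol θ_{gcr} for the mean and the symmetric ±δ_{g}/2 split between the two conditions are assumptions introduced for illustration and are not taken from the text; the linear form of the array effect follows the later description of the intercept and slope parameters of equation (3).

```latex
\begin{align}
y_{gcr} &\sim N\!\left(\theta_{gcr},\ \sigma_{g}^{2}\right), \\
\theta_{g1r} &= \alpha_{g} - \tfrac{1}{2}\,\delta_{g} + \beta_{g1r}, \qquad
\theta_{g2r} = \alpha_{g} + \tfrac{1}{2}\,\delta_{g} + \beta_{g2r}, \\
\beta_{gcr} &= a^{0}_{cr} + a^{1}_{cr}\,\alpha_{g}.
\end{align}
```

Under this sketch, δ_{g} is the difference between the condition means of miRNA g once the array effects have been removed.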
Handling measurements below level of detection
where L_{cr} is the level of detection for array c_{r} under condition c, which is readily available from the array manufacturer, and I(,) represents interval censoring, following the notation of the JAGS software language [23].
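The idea of letting censored values contribute to the likelihood can be illustrated outside JAGS. The sketch below is not the BMIRN model; it assumes normally distributed log-intensities with a known mean and standard deviation, and the function names are hypothetical. An observed value contributes its density, whereas a value below the level of detection contributes the probability mass lying below that level.

```python
import math


def normal_logpdf(y, mean, sd):
    """Log density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(y, mean, sd):
    """P(Y <= y) for Y ~ N(mean, sd^2)."""
    return 0.5 * (1.0 + math.erf((y - mean) / (sd * math.sqrt(2.0))))


def censored_loglik(values, limit, mean, sd):
    """Log-likelihood of one array; None marks a value below `limit`."""
    total = 0.0
    for y in values:
        if y is None:
            # Left-censored: only "y < limit" is known.
            # Note: this underflows if `limit` is far below `mean`.
            total += math.log(normal_cdf(limit, mean, sd))
        else:
            total += normal_logpdf(y, mean, sd)
    return total
```

In a full Bayesian treatment the censored values are sampled as latent quantities rather than integrated out, but both routes use the same information: the value is known only to lie below L_{cr}.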
Winner’s curse correction
It is tempting to rank the features directly using the unadjusted posterior estimate of the absolute effect size for each miRNA (δ_{g}), which is what Hein et al. adopted as their feature ranking criterion for mRNA microarray data [20]. However, in high-dimensional genetic studies, estimates of effect sizes reported from the same discovery samples that were initially used to declare statistical significance are often grossly inflated and cannot be replicated in an independent study. This type of bias is widely known as the Beavis effect [21] or the winner’s curse [22]. Prioritizing features solely by the uncorrected δ_{g} is likely to generate many false findings.
The Bayesian paradigm allows us to conveniently correct for winner's curse by incorporating into our model the prior belief that the significance of the effect observed for each feature may be due to random chance. More mathematically, the effect size δ_{g} can be modeled with a mixture prior among a discrete probability with mass at zero, a continuous density f^{+} with support on the positive line, and a continuous density f^{−} with support on the negative line. This three-component mixture prior describes the three possible categories that a miRNA could be classified under: non-differentially expressed, up-regulated, or down-regulated, respectively.
We treat $\overrightarrow{\xi}$ as a hyperparameter with a Dirichlet distribution, i.e. $\overrightarrow{\xi}$ ~ Dirichlet(α_{1}, α_{2}, α_{3}). The parameters α_{1}, α_{2}, α_{3} reflect our degree of prior belief in δ_{g} = 0 (false positive) versus δ_{g} ≠ 0 (true positive).
As discussed previously, it has been well established for mRNA datasets that only a small fraction of features can be differentially expressed. For miRNA datasets, however, we could not favor, a priori, any region of (0,1) for $\overrightarrow{\xi}$, because the proportions of differentially expressed features among different miRNA datasets could vary dramatically. Therefore, we set α_{1}=α_{2}=α_{3}=1, which essentially makes p($\overrightarrow{\xi}$ | α_{1}=1, α_{2}=1, α_{3}=1) a uniform distribution over the simplex of possible values of $\overrightarrow{\xi}$.
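To make the three-component prior concrete, the following sketch draws mixture weights from a Dirichlet(1,1,1) distribution and then draws effect sizes from the resulting mixture. It is illustrative only: the ordering of the weights (null, up, down), the folded-normal stand-ins for f^{+} and f^{−}, and the values of `mu_pos`, `mu_neg`, and `sd` are assumptions, not the densities used in the model.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw from Dirichlet(alphas) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_delta(xi, rng, mu_pos=1.0, mu_neg=-1.0, sd=0.5):
    """Draw one effect size from the point-mass / positive / negative mixture.

    xi is assumed to be ordered as (null, up-regulated, down-regulated).
    """
    u = rng.random()
    if u < xi[0]:
        return 0.0                        # point mass at zero: not DE
    if u < xi[0] + xi[1]:
        return abs(rng.gauss(mu_pos, sd))   # stand-in for f+ (positive support)
    return -abs(rng.gauss(mu_neg, sd))      # stand-in for f- (negative support)


rng = random.Random(42)
xi = sample_dirichlet([1.0, 1.0, 1.0], rng)   # uniform over the simplex
deltas = [sample_delta(xi, rng) for _ in range(1000)]
```

With all three Dirichlet parameters equal to 1, no split between null, up-regulated, and down-regulated miRNAs is favored before the data are seen.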
Additional prior specification
To complete the model, we assigned the following prior distributions for the remaining parameters. The gene effect α_{ g }~N(G,S^{ 2 }), where G~N(0,1000), S^{ 2 }~Gamma(0.001,0.001).
The hyperparameters for the miRNA-level variance that appear in equation (2) are assumed to follow μ ~ N(0,100), η^{−2} ~ Gamma(0.001,0.001). The slope and intercept parameters that appear in equation (3) are assumed to follow $a^{0}_{cr}$ ~ N(0,100), $a^{1}_{cr}$ ~ N(0,100).
Finally, the hyperparameters for the non-null components of the effect size that appear in equation (6) are assumed to follow $\mu^{+}$ ~ N(0,100) [truncated(0, ∞)], $\mu^{-}$ ~ N(0,100) [truncated(−∞, 0)], $(\sigma^{+})^{2}$ ~ Gamma(0.001,0.001), $(\sigma^{-})^{2}$ ~ Gamma(0.001,0.001). Essentially, all of the priors and hyperpriors in our model were specified in a non-informative fashion.
MCMC implementation
Each parameter was sampled from its posterior distribution by an MCMC algorithm using the JAGS software [23]. Specific parameter values and algorithm details not provided in this manuscript may be found in the R package BMIRN (Additional file 1). The instruction manual on how to use BMIRN can be found in Additional file 2.
On our Linux cluster (16 GB memory and a dual dual-core 3.0 GHz AMD processor), where independent MCMC chains can be parallelized, it takes about 302 seconds on average to run 20,000 iterations for data of the size described in our simulation study (we find 20,000 iterations to be typically more than enough to achieve convergence in both the simulation and the real data analysis in our study).
Simulation studies
Study objective
The main objective of this simulation analysis is to explore how the unique features of miRNA datasets may impact the performance of the proposed Bayesian normalization method and the existing normalization methods.
1. The proportion of differentially expressed features in miRNA data may be large.
2. The proportion of missing values caused by measurements below the instrument's detection limit in miRNA data may be large.
Simulation setup
Following the general simulation setup of Broët et al. [24], we generate a microarray dataset containing 300 miRNAs and five replicate arrays under two conditions. The gene effects α_{g} range uniformly between 0 and 10, and the array effects are linear functions of the gene effects. Following the notation of equation (3), we generate $a^{0}_{cr}$ from N(0,0.5) and $a^{1}_{cr}$ from N(0,0.05) in our analysis. The gene variances are simulated based on equation (2), with μ = −1.8 and η = 1, giving a similar range of variances to those we have observed in real data.
A sparsity parameter sp is defined as the proportion of miRNAs in the simulated dataset that are truly differentially expressed. The differential effect $\delta_{g}$ for miRNA g is then set to zero for the (1 − sp) fraction of non-DE miRNAs, drawn from N(log(3), 0.1^{2}) for the $\frac{sp}{2}$ fraction of up-regulated miRNAs, and drawn from N(−log(3), 0.1^{2}) for the remaining $\frac{sp}{2}$ fraction of down-regulated miRNAs. In our simulation, sp is allowed to vary from 0.1 to 0.4, with a step size of 0.1. This range for sp is selected to match the range of the proportion of DE features observed in real miRNA datasets [10–12].
To investigate the effect of left-censored measurements on the performance of the normalization methods, we create a simulation parameter m, which represents the proportion of measurements that are below the detection limit. To generate datasets in which a proportion m of the values is missing, we first obtain the m-th quantile of the expression levels (D_{m}) for each array, and then set measurements below D_{m} to missing for that array. The D_{m} values are recorded individually for the downstream analysis. In this study, m is set to vary from 0 to 0.4, with a step size of 0.1, which is again chosen to reflect the range of missing values we encounter in real datasets. For each pair of sparsity parameter sp and missing parameter m, 20 replicate datasets are generated.
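The simulation design described above can be sketched as follows. This is a minimal illustration rather than the code used in the study, and it rests on several assumptions: five replicate arrays per condition, the second argument of N(·,·) read as a variance, log(σ_{g}^{2}) ~ N(μ, η^{2}) as the form of equation (2), a symmetric ±δ_{g}/2 split between conditions, and no enforcement of the zero-mean constraint on the array effects. The function name `simulate` and its return structure are hypothetical.

```python
import math
import random


def simulate(n_mirna=300, n_rep=5, sp=0.2, m=0.2, seed=0):
    """Simulate log-intensities; None marks a value below the array's D_m."""
    rng = random.Random(seed)
    alpha = [rng.uniform(0.0, 10.0) for _ in range(n_mirna)]
    # Assumed variance model: log(sigma_g^2) ~ N(-1.8, 1^2).
    sigma = [math.sqrt(math.exp(rng.gauss(-1.8, 1.0))) for _ in range(n_mirna)]

    # Differential effects: sp/2 up-regulated, sp/2 down-regulated, rest zero.
    n_de = round(sp * n_mirna)
    delta = [0.0] * n_mirna
    for k, g in enumerate(rng.sample(range(n_mirna), n_de)):
        centre = math.log(3.0) if k < n_de // 2 else -math.log(3.0)
        delta[g] = rng.gauss(centre, 0.1)

    n_censored = round(m * n_mirna)
    arrays, limits = {}, {}
    for c in (0, 1):
        for r in range(n_rep):
            # Array effect is linear in alpha_g: a0 + a1 * alpha_g.
            a0 = rng.gauss(0.0, math.sqrt(0.5))
            a1 = rng.gauss(0.0, math.sqrt(0.05))
            y = []
            for g in range(n_mirna):
                mean = alpha[g] + (c - 0.5) * delta[g] + a0 + a1 * alpha[g]
                y.append(rng.gauss(mean, sigma[g]))
            # D_m: values strictly below the m-th quantile become missing.
            d_m = sorted(y)[n_censored]
            arrays[(c, r)] = [None if v < d_m else v for v in y]
            limits[(c, r)] = d_m
    return {"arrays": arrays, "limits": limits, "delta": delta}
```

Calling `simulate(sp=0.3, m=0.4, seed=7)` would give one replicate dataset for that (sp, m) pair; repeating with 20 different seeds mirrors the 20 replicate datasets per setting.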
Normalization methods compared
In this simulation study, we compare the performance of the proposed Bayesian normalization method to the following widely used normalization methods: quantile normalization [18], variance stabilization normalization (VSN) [25], and no normalization.
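For orientation, the core of quantile normalization can be sketched in a few lines. This is a simplified illustration that assumes complete data and ignores ties, not the reference implementation of [18]; the function name is hypothetical. Each array's values are replaced by the mean of the values that share the same rank across arrays, so that every array ends up with the same empirical distribution.

```python
def quantile_normalize(arrays):
    """arrays: list of equal-length lists of intensities, one list per array."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean across arrays at each rank.
    reference = [
        sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)
    ]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices by ascending value
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized
```

Because every array is forced onto the same reference distribution, genuine global shifts between conditions are removed along with technical bias, which is the concern raised about aggressive normalization when many features are differentially expressed.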
Missing data imputation
For the Bayesian approach, imputation for the missing data below detection limit is automatically incorporated into the model framework, and no additional imputation work is necessary. For the other normalization methods investigated, the missing values were substituted with 0.5*DL, which is a common practice to handle data below DL [26, 27].
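The substitution rule used for the comparison methods amounts to a one-line operation. The sketch below assumes that missing values are encoded as `None` and that the detection limit is expressed on the same scale as the data; the function name is hypothetical.

```python
def impute_half_dl(values, detection_limit):
    """Replace values below the detection limit (None) with 0.5 * DL."""
    return [0.5 * detection_limit if v is None else v for v in values]
```

Every censored value on an array receives the same substitute, so the imputed values carry no information about between-miRNA differences below the DL.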
Performance evaluation
The performance of different normalization methods is measured in terms of their capability of recovering truly differentially expressed features. For Bayesian normalization, features are ranked by the absolute value of the posterior mean of δ_{g}, which is the winner's curse corrected effect size for each miRNA. For all the other normalization methods, data normalization is first performed, a moderated t-test [28] is then applied to the normalized data, and the features are subsequently ranked by the t-test p-values. The AUC (area under the receiver operating characteristic curve) is reported for each normalization method evaluated.
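Given a ranking score for each miRNA and the known DE status from the simulation, the AUC can be computed directly from pairwise comparisons (the Mann-Whitney formulation). The sketch below is a generic illustration with a hypothetical function name; larger scores are assumed to indicate stronger evidence of differential expression, so p-values would be supplied as, for example, their negatives.

```python
def auc(scores, labels):
    """Probability that a random DE feature outranks a random non-DE feature."""
    pos = [s for s, is_de in zip(scores, labels) if is_de]
    neg = [s for s, is_de in zip(scores, labels) if not is_de]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

For example, with scores `[0.9, 0.8, 0.3, 0.1]` and labels `[1, 0, 1, 0]`, three of the four DE versus non-DE pairs are ordered correctly, giving an AUC of 0.75.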
Real data analysis
In addition to the simulation studies, we also compared the performance of various normalization methods on two sets of human tissue miRNA profiling data, which are microarray data and high-resolution real-time RT-PCR (qRT-PCR) data.
Tissue microarray data
The human tissue microarray data were generated as part of an effort to compare microarray and quantitative TaqMan qRT-PCR measurements [29]. The microarray dataset includes 43 samples hybridized on an Agilent Human miRNA Microarray 1.0 coming from nine different human tissues (brain, breast, heart, liver, placenta, testis, ovary, skeletal muscle, and thymus). There are four to five replicate arrays for each tissue type, and each array contains 534 miRNAs (excluding control probes). The microarray data are publicly available from GEO with series number GSE11879.
Tissue qRT-PCR data
Normalization methods compared
As in the simulation study, we also compare the performance of quantile normalization, VSN, no normalization, and Bayesian normalization on the human tissue miRNA dataset. In addition, we include the modified least-variant set (LVS) [15] in the set of methods being evaluated. LVS is a recently proposed normalization method that is specifically designed for miRNA data. Suo et al. have demonstrated that LVS outperforms no normalization, 75^{th} percentile-shift, quantile, global median, VSN, and LOWESS normalization methods in the same set of human tissue miRNA data as we use here [15].
Performance evaluation of normalization methods
We compare the performance of various normalization methods on the real data in terms of their ability to detect truly differentially expressed features, using the AUC metric.
In this human tissue dataset, different treatment conditions are essentially various human tissues, and truly differentially expressed features are those miRNAs that show distinct expression profiles between any pair of tissues. In order to compare the performance of various normalization methods in detecting truly differentially expressed hits, we adopted the same strategy as Suo et al. [15] in defining the set of true positive miRNAs. More specifically, differentially expressed miRNAs are selected as having qRT-PCR-based fold changes ≥ 3 (or ≤ 1/3) and P-values computed on the array data < 0.01 (using the moderated t-test).
Since we have both qRT-PCR and microarray data available for eight tissues, AUCs for each normalization method are computed on all 28 possible pairs of tissues.
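The count of 28 follows from choosing 2 tissues out of 8. As a quick check, the pairs can be enumerated; the tissue labels below are hypothetical placeholders, since the eight tissues with both data types are not listed here.

```python
from itertools import combinations

# Hypothetical labels standing in for the eight tissues with both data types.
tissues = [f"tissue_{i}" for i in range(1, 9)]
pairs = list(combinations(tissues, 2))  # unordered pairs, no repeats
n_pairs = len(pairs)                    # 8 * 7 / 2 = 28
```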
Results and discussions
Performance evaluation of BHM normalization by simulation studies
When we examined the effect of the proportion of left-censored measurements on the performance of normalization methods, the Bayesian approach again consistently provided a better AUC than all the other normalization methods. When the data contain a small fraction of missing values, VSN appears to perform slightly better than quantile normalization; however, as the proportion of missing values becomes larger, quantile normalization starts to outperform VSN. Across the full range of missing proportions investigated in this simulation study, no normalization is the worst performer among all the methods examined. In addition, we also observe that VSN is much more sensitive to the variation of m than the proposed Bayesian method (Figure 3b).
Performance evaluation of BHM normalization by the analysis of tissue profiling data
The advantage of Bayesian normalization is further supported by the analysis of human tissue miRNA profiling data (GSE11879) using the qRT-PCR results [30] as the gold standard.
AUC is often used as a standard performance metric in method evaluation for many data mining applications; however, in practice, scientists may also be interested in the proportion of true positive features among the declared positive features (i.e. precision). Therefore, in this part of the analysis, we use the average of AUC and AUPRC (area under the Precision-Recall curve) to evaluate the performance of different normalization methods.
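The combined metric can be sketched as follows. The area under the Precision-Recall curve is approximated here by average precision, which is one common estimator and an assumption of this sketch; the estimator used in the analysis is not specified. Function names are hypothetical, and larger scores are assumed to indicate stronger evidence of differential expression.

```python
def average_precision(scores, labels):
    """Mean of the precision values at the rank of each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, is_de) in enumerate(ranked, start=1):
        if is_de:
            hits += 1
            total += hits / rank  # precision among the top `rank` features
    return total / hits


def combined_score(auc_value, auprc_value):
    """Average of AUC and AUPRC, i.e. (AUC + AUPRC) / 2."""
    return 0.5 * (auc_value + auprc_value)
```

For scores `[0.9, 0.8, 0.3, 0.1]` with labels `[1, 0, 1, 0]`, the precision is 1 at the first true positive and 2/3 at the second, so the average precision is 5/6.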
The number of times that each normalization method gives the highest (AUC+AUPRC)/2 (T_{max}), and the median (AUC+AUPRC)/2 of each method, for all 28 tissue pairs in the human miRNA profiling dataset (GSE11879) using qRT-PCR results as the gold standard
Method                      Bayesian  Quantile  No normalization  LVS   VSN
T_{max}                     13        1         0                 3     11
0.5*(AUC+AUPRC)_{median}    0.86      0.72      0.74              0.81  0.82
An interesting finding from analyzing the human tissue miRNA data is that the performance of feature selection could actually be negatively impacted by inappropriately chosen normalization procedures. For instance, in the case of the brain vs. liver comparison, we expect to find a large percentage of differentially expressed miRNAs. The (AUC+AUPRC)/2 scores are 0.79, 0.65, 0.82, 0.86, and 0.92 for no normalization, quantile normalization, VSN, LVS, and Bayesian normalization, respectively.
This scenario would violate the assumption that only a small proportion of features are expected to show differential expression in a microarray dataset. By comparing (AUC+AUPRC)/2 score of each method, we discover that normalization methods designed for mRNA data that tend to be overly aggressive (e.g., quantile normalization) provide virtually no additional benefits in recovering truly differentially expressed features. On the other hand, the performance of our BHM normalization remains robust.
Conclusion
Profiling miRNA expression in cells with microarrays is becoming a widely used tool in elucidating miRNA functions. Normalization, often an overlooked aspect of data processing, is a critical step in the downstream detection of differentially expressed miRNAs. In the present study, we propose an integrative Bayesian approach to normalizing miRNA data, and compare the performance of the Bayesian method with that of other widely used miRNA normalization methods when the assumptions for mRNA normalization methods are violated.
Combining the findings from both simulation studies and the analysis of human tissue profiling data, it appears that normalization procedures can have a profound impact on the detection of truly differentially expressed miRNAs. Our Bayesian normalization framework appropriately addresses the left-censored nature of the miRNA microarray data (Figure 3b) and is robust against variation in the proportion of differentially expressed features (Figure 3a). On the contrary, the other normalization methods evaluated do not seem to adequately handle these unique characteristics of miRNA data. In addition, we expect that the robust performance of our Bayesian approach can also benefit from the embedded winner's curse correction in our model and the reduction of the propagation of uncertainties (introduced from performing normalization, imputation, and feature extraction sequentially rather than in one integrative step).
Although our Bayesian method was formulated to handle normalization issues in miRNA data, we expect that biomarker discovery with other high-dimensional profiling techniques in which a significant proportion of data points are left-censored (e.g., proteomics) might also benefit from this approach.
Abbreviations
miRNA: MicroRNA
DL: Detection limit
LVS: Least-variant set
BHM: Bayesian hierarchical modeling
DE: Differentially expressed
AUC: Area under the receiver operating characteristic curve
VSN: Variance stabilization normalization
Declarations
Acknowledgements
The authors would like to thank Dr. Chunhua Qin for his valuable insights and comments and Dr. Vladimir Svetnik for his kind support.
Authors’ Affiliations
References
1. Carthew RW, Sontheimer EJ: Origins and mechanisms of miRNAs and siRNAs. Cell. 2009, 136 (4): 642-655.
2. Lewis BP, Burge CB, Bartel DP: Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005, 120 (1): 15-20.
3. Friedman RC, Farh KKH, Burge CB, Bartel DP: Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 2009, 19 (1): 92-105.
4. Sheng Y, Engstrom PG, Lenhard B: Mammalian microRNA prediction through a support vector machine model of sequence and structure. PLoS One. 2007, 2 (9): e946.
5. Nimmo RA, Slack FJ: An elegant miRror: microRNAs in stem cells, developmental timing and cancer. Chromosoma. 2009, 118 (4): 405-418.
6. Ren J, Jin P, Wang E, Marincola FM, Stroncek DF: MicroRNA and gene expression patterns in the differentiation of human embryonic stem cells. J Transl Med. 2009, 7: 20.
7. Landi MT, Zhao Y, Rotunno M, Koshiol J, Liu H, Bergen AW, Rubagotti M, Goldstein AM, Linnoila I, Marincola FM: MicroRNA expression differentiates histology and predicts survival of lung cancer. Clin Cancer Res. 2010, 16 (2): 430-441.
8. Rao Y, Lee Y, Jarjoura D, Ruppert AS, Liu CG, Hsu JC, Hagan JP: A comparison of normalization techniques for microRNA microarray data. Stat Appl Genet Mol Biol. 2008, 7 (1): Article 22.
9. Oshlack A, Emslie D, Corcoran LM, Smyth GK: Normalization of boutique two-color microarrays with a high proportion of differentially expressed probes. Genome Biol. 2007, 8 (1): R2.
10. Calin GA, Ferracin M, Cimmino A, Di Leva G, Shimizu M, Wojcik SE, Iorio MV, Visone R, Sever NI, Fabbri M: A MicroRNA signature associated with prognosis and progression in chronic lymphocytic leukemia. N Engl J Med. 2005, 353 (17): 1793-1801.
11. Volinia S, Calin GA, Liu CG, Ambs S, Cimmino A, Petrocca F, Visone R, Iorio M, Roldo C, Ferracin M: A microRNA expression signature of human solid tumors defines cancer gene targets. Proc Natl Acad Sci USA. 2006, 103 (7): 2257-2261.
12. Yanaihara N, Caplen N, Bowman E, Seike M, Kumamoto K, Yi M, Stephens RM, Okamoto A, Yokota J, Tanaka T: Unique microRNA molecular profiles in lung cancer diagnosis and prognosis. Cancer Cell. 2006, 9 (3): 189-198.
13. Schetter AJ, Leung SY, Sohn JJ, Zanetti KA, Bowman ED, Yanaihara N, Yuen ST, Chan TL, Kwong DL, Au GK: MicroRNA expression profiles associated with prognosis and therapeutic outcome in colon adenocarcinoma. JAMA. 2008, 299 (4): 425-436.
14. Little RJA, Rubin DB: Statistical Analysis with Missing Data. 1987, John Wiley & Sons.
15. Suo C, Salim A, Chia KS, Pawitan Y, Calza S: Modified least-variant set normalization for miRNA microarray. RNA. 2010, 16 (12): 2293-2303.
16. Zhao Y, Wang E, Liu H, Rotunno M, Koshiol J, Marincola F, Landi M, McShane L: Evaluation of normalization methods for two-channel microRNA microarrays. J Transl Med. 2010, 8: 69.
17. Hua YJ, Tu K, Tang ZY, Li YX, Xiao HS: Comparison of normalization methods with microRNA microarray. Genomics. 2008, 92 (2): 122-128.
18. Bolstad B: Probe Level Quantile Normalization of High Density Oligonucleotide Array Data. 2001, (http://bmbolstad.com/stuff/qnorm.pdf)
19. Kerr MK, Martin M, Churchill GA: Analysis of variance for gene expression microarray data. J Comput Biol. 2000, 7 (6): 819-837.
20. Hein AM, Richardson S, Causton HC, Ambler GK, Green PJ: BGX: a fully Bayesian integrated approach to the analysis of Affymetrix GeneChip data. Biostatistics. 2005, 6 (3): 349-373.
21. Xu S: Theoretical basis of the Beavis effect. Genetics. 2003, 165 (4): 2259-2268.
22. Zollner S, Pritchard JK: Overcoming the winner’s curse: estimating penetrance parameters from case–control data. Am J Hum Genet. 2007, 80 (4): 605-615.
23. Plummer M: JAGS: A Program for Analysis of Bayesian Graphical Models Using Gibbs Sampling. Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003). 2003, Vienna, Austria. ISSN 1609-395X.
24. Broët P, Lewin A, Richardson S, Dalmasso C, Magdelenat H: A mixture model-based strategy for selecting sets of genes in multiclass response microarray experiments. Bioinformatics. 2004, 20 (16): 2562-2571.
25. Huber W, von Heydebreck A, Sultmann H, Poustka A, Vingron M: Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics. 2002, 18 (Suppl 1): S96-S104.
26. Hornung RW, Reed L: Estimation of average concentration in the presence of nondetectable values. Appl Occup Environ Hyg. 1990, 5 (1): 46-51.
27. Bleavins M, Carini C, Jurima-Romet M, Rahbari R: Biomarkers in Drug Development: A Handbook of Practice, Application and Strategy. 2010, Wiley-Blackwell.
28. Smyth GK: Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004, 3: Article 3.
29. Ach RA, Wang H, Curry B: Measuring microRNAs: comparisons of microarray and quantitative PCR measurements, and of different total RNA prep methods. BMC Biotechnol. 2008, 8: 69.
30. Lee EJ, Baek M, Gusev Y, Brackett DJ, Nuovo GJ, Schmittgen TD: Systematic evaluation of microRNA processing patterns in tissues, cell lines, and tumors. RNA. 2008, 14 (1): 35-42.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.