Intensity-based analysis of dual-color gene expression data as an alternative to ratio-based analysis to enhance reproducibility
© Bossers et al; licensee BioMed Central Ltd. 2010
Received: 8 October 2008
Accepted: 17 February 2010
Published: 17 February 2010
Ratio-based analysis is the current standard for the analysis of dual-color microarray data. Indeed, this method provides a powerful means to account for potential technical variations such as differences in background signal, spot size and spot concentration. However, current high density dual-color array platforms are of very high quality, and inter-array variance has become much less pronounced. We therefore raised the question whether it is feasible to use an intensity-based analysis rather than ratio-based analysis of dual-color microarray datasets. Furthermore, we compared performance of both ratio- and intensity-based analyses in terms of reproducibility and sensitivity for differential gene expression.
By analyzing three distinct and technically replicated datasets with either ratio- or intensity-based models, we determined that, when applied to the same dataset, intensity-based analysis of dual-color gene expression experiments yields 1) more reproducible results, and 2) is more sensitive in the detection of differentially expressed genes. These effects were most pronounced in experiments with large biological variation and complex hybridization designs. Furthermore, a power analysis revealed that for direct two-group comparisons above a certain sample size, ratio-based models have higher power, although the difference with intensity-based models is very small.
Intensity-based analysis of dual-color datasets results in more reproducible results and increased sensitivity in the detection of differential gene expression than the analysis of the same dataset with ratio-based analysis. Complex dual-color setups such as interwoven loop designs benefit most from ignoring the array factor. The applicability of our approach to array platforms other than dual-color needs to be further investigated.
During the last decade, microarray technology has evolved into an indispensable tool for high-throughput gene expression studies. For example, microarrays are now routinely applied to identify differentially expressed genes between paired sample series, classify tumors in prognostic groups, and identify transcriptional alterations during development [1–3]. Two main types of commercial high density microarray platforms have emerged: one-color oligonucleotide platforms such as Affymetrix and Illumina, and dual-color oligonucleotide platforms such as Agilent and Nimblegen. Dual-color gene expression platforms are very efficient in directly comparing two conditions, by hybridizing the two conditions together on the same array. This greatly reduces the possible confounding effects of inter-array variability and local array effects.
The outcome of comparative microarray experiments is a ranked list of significant genes, possibly involved in the process under investigation. The resulting gene list then serves as a starting point for further investigations, such as constructing new hypotheses, or the in vitro characterization of putatively identified genes. Due to their dimensionality (few observations, many variables), microarray experiments suffer from high rates of false positive and negative findings . Thus, the issue of reproducibility is of utmost importance in array experiments.
Analysis of variance (ANOVA) is a widely used tool to analyze and rank genes in both one- and dual-color comparative gene expression experiments . The ANOVA model incorporates factors such as treatment, tissue and age to estimate the effect of interest per gene. For dual-color arrays specifically, an array effect is included in the model to determine the technical noise introduced by any between-array differences. Accounting for such an array effect in the analysis of dual-color arrays was initially necessary due to the relatively poor quality of array platforms: researchers were confronted with different levels of background signals across arrays, and the process of spotting cDNAs yielded probes with different shapes and probe concentrations. The latest generations of commercial dual-color platforms however use synthesized oligonucleotide probes instead of cDNA probes, and are of much higher and consistent quality and concentration. Subsequently, the variance introduced by the array effect has become much less pronounced . We recently reported that using the Agilent arrayCGH platform or other CGH array platforms, the separate channels of these dual channel arrays are interchangeable, avoiding redundant hybridizations of the same reference material in every experiment . We therefore raised the question whether the results obtained by separately analyzing intensities from co-hybridized gene expression array samples are more reproducible than results based on classical ratio-based analysis. As an added benefit, an intensity-based analysis approach allows for pairwise comparison between any samples.
We have performed a set of experiments to determine whether the intensity-based analysis of dual-color arrays is more reproducible than the conventional ratio-based analysis. Two independent datasets were used: a human keratinocyte cell line dataset, and a dataset based on human brain tissue. By selecting these datasets, we were able to study the performance of the ratio- and intensity-based models in two distinct situations: no biological variation (cell line dataset) versus substantial biological variation (brain dataset).
For both the intensity-based and ratio-based analysis, we estimated the reproducibility and sensitivity in the detection of differential gene expression by analyzing technical replicates. Technical replicates, consisting of two non-overlapping sets of microarrays, were used rather than biological replicates, as our focus is on inclusion or exclusion of the array factor, which is a technical factor. Furthermore, we used a model selection algorithm to determine whether either intensity or ratio based analysis was most suitable for each dataset.
Our results indicate that intensity-based analysis outperforms the standard ratio-based analysis of the same dataset. Intensity-based results are more reproducible, and increase the sensitivity of detecting regulated genes. Our results also indicate that differences between ratio- and intensity-based results become smaller in large datasets with simple designs, suggesting that more complex designs such as factorial and loop designs benefit most from our approach.
Intensity-based analysis yields comparable results to ratio-based analysis, but is more sensitive in detecting differential gene expression
To determine the feasibility of intensity-based analysis of dual-color arrays, we performed a microarray experiment in which the effects of 4 different treatments (T1, T2, T3 and T4) were investigated using two keratinocyte-derived cell lines by measuring transcript levels on Agilent 4 × 44K Whole Human Genome arrays. The entire experiment was technically replicated (experiment C1 and C2, the hybridization setup can be found in Additional file 1).
To investigate whether these findings translate to other datasets as well, we analyzed a publicly available dataset in which two commercially available samples were hybridized 10 times on Agilent dual-color microarrays (data obtained from the MAQC dataset ). We selected this experiment specifically, because treatment and array effects are not partially confounded (as is the case in the cell line dataset), and biological variance is absent. We observed a very strong linear correlation between real and in silico reconstructed ratios (Additional file 3), and the ANOVA-derived array effect was very small compared to the treatment effect (Figure 2B).
Intensity-based results reproduce better than ratio-based results
The intensity model is favored over the ratio model based on BIC model selection.
Cell line dataset
Analysis of an independent dataset confirms the gain in reproducibility and sensitivity when using intensity-based models
A power perspective on intensity versus ratio-based models
Our results demonstrate that the analysis of dual-color microarray gene expression experiments using intensity-based linear models outperforms the standard ratio-based analysis. Both reproducibility and sensitivity were enhanced in detecting differential gene expression in two independent datasets.
By analyzing technically replicated experiments we determined the effect of both models on the reproducibility of gene rankings. Our studies show that for both the cell line and brain datasets the intensity-based analysis provides more reproducible gene rankings than the ratio-based analysis of the same dataset. For the cell line dataset, 78% of the 1,000 most significant genes is reproduced between the two duplicate datasets C1 and C2 when using the intensity analysis, whereas only 73% of the genes is reproduced with the ratio analysis (see Figure 4C). For the brain datasets B1 and B2, the difference between ratio- and intensity-based reproducibility is far more pronounced: only 4% of the top 1,000 genes are reproduced in the ratio analysis, while there is still a substantial overlap of 51% between intensity-based gene rankings (Figure 6C). The underlying reasons behind the apparent discrepancy between the cell line and brain datasets will be addressed later. An independent line of evidence, based on model selection, also indicated that intensity-based models are preferred over ratio-based models for the analysis of dual-color microarray data. When performing Bayesian Information Criterion model selection calculations, we found that for 95% of the transcripts in the cell line experiment, and virtually all transcripts in the human brain experiment, the intensity model was favored over the ratio model. Furthermore, for both the cell line dataset and a publicly available third dataset, a comparison between ANOVA-based array and treatment effect sizes revealed that the treatment effects are much larger.
Combining the gene ranking, relative effect size and model selection results, we argue that simply by selecting the intensity model instead of the ratio model for the analysis of the same set of gene expression measurements, more reproducible results are obtained.
It should be noted that the relative advantage of dropping the array effect depends on the complexity of the design and the sample size (the number of arrays). For the relatively simple MAQC data set BIC selects the model with array effect for 29% of the genes, much more frequently than for both the brain and cell line data sets. The beneficial effect of dropping the array effect from the model seems more pronounced in experiments that employ direct designs to address complex comparisons, such as time series and multifactorial experiments.
Adding to the enhanced reproducibility, intensity-based analysis is more sensitive in the detection of differential gene expression, as derived from more significant p-values. It is important to note that, by selecting the ratio-based p-value of the 1000th most significant gene as a cutoff, almost all of the 1000 genes (89% for dataset C1, 92% for dataset C2) are also significant in the intensity-based analysis using the same cutoff. Interestingly, this analysis also reveals that 3335 genes, not selected by the ratio model, are reproducibly more significant than the 1000th gene in the ratio results. This provides additional evidence for the enhanced sensitivity of the intensity model over the ratio model. Due to the poor reproducibility of the ratio-based results in the brain dataset, such calculations were not meaningful for that dataset.
Enhanced sensitivity due to ignoring the array effect in the linear model
The observation that ratio-derived p-values can be improved by intensity-based models can be attributed to the inclusion of the array effect in the ratio-based linear model. Pairing of data is a powerful concept for removing subject specific bias. In particular, when the quality of the spot printing procedure is not constant (often the case with in-house spotted arrays), it is essential to account for an array effect in the ANOVA model . But there is a price to pay: degrees of freedom . The total number of degrees of freedom equals the number of samples. The array effect consumes almost half of the degrees of freedom. However, due to the high quality of commercially available dual-color oligonucleotide microarrays, we and others observed that the ratios of the same sample pair, measured on different arrays, are strongly correlated , which means that the array effect is likely to be very small. When using a ratio-based model to analyze the data, many degrees of freedom are used to estimate the array effect, explaining only a small proportion of the variability. This ultimately results in less significant p-values, a lower correlation between p-values from the two replicate experiments, and a smaller proportion of reproduced top-ranked genes. Indeed, the results from the model selection experiments clearly indicate that the model without array effect is the preferred model for both datasets. It should be noted that we do not state that the array effect is absent: our analyses in fact show that an array effect is present in modern dual color microarray experiment. Furthermore, the results from the power calculations for the MAQC dataset show that including the array effect can be slightly beneficial for certain sample sizes. However, we conclude from our experiments that for both the brain and cell line datasets, the array effect is too small in comparison to the main factor of interest (treatment) to justify incorporation into the ANOVA model.
A possible argument for the inclusion of the array effect is the potential competition for spot binding between the co-hybridized samples. However, our and other studies suggest that competition is not an issue [7, 10]. This can be derived from the strong correlation between the real and in silico reconstructed ratios (see Additional files 2 and 3), and the hierarchical clustering in Figures 1 and 5. Our study was however not conducted to demonstrate that ratios can be reconstructed in silico by using separate intensities. Indeed, this has been demonstrated before . Our specific aim was to compare the performance of ratio- and intensity-based methods based on the main outcome of comparative gene expression experiments: a list of ranked genes. As this gene ranking provides the basis for further research, it needs to be robust and reproducible. We show here that intensity-based methods provide more reproducible results and is more sensitive in detecting differential gene expression, and thus outperform the standard ratio-based analysis.
Biological variation negatively affects ratio-based, but not intensity-based, replication
As indicated earlier, in the human brain experiment, we observed a striking lack of reproducibility (r = 0.05) between p-values generated by the ratio model on the replicate datasets B1 and B2, whereas the intensity-based p-values reproduced quite well (r = 0.46). These findings can be attributed to the following. First of all, the overall p-values (both intensity- and ratio-based) are less significant in the human brain experiment than in the cell line experiment, due to the large biological variation between individuals. Second, due to the relatively low level of biological replication, few degrees of freedom are left for estimating the biological effect. Third, the brain experiment was not designed with splitting the data into two technical replicates in mind. While the two data sets are biologically identical, the samples are paired differently on the arrays between the two replicate datasets (see Additional file 4). Since this pairing is more or less arbitrary, the results should be robust against this artifact, but this is not necessarily the case for the ratio-based analysis. When the biological variation is large, different sample pairings may result in differences in measured ratios, a phenomenon we observed in the brain dataset (Figure 7 and Additional file 5). The intensity-based analysis of brain datasets B1 and B2 does not suffer from these drawbacks: no ratios are calculated, and more degrees of freedom are left for estimating the biological effect of interest, resulting in a substantial proportion of reproducible findings (51% of the 1,000 most significant genes), and a relatively high correlation between p-values (r = 0.46). In a setting with many biological replicates per level (e.g. comparison of two large groups) the differences in correlation between the ratio-based and intensity-based analysis are likely to be smaller.
Our studies indicate that the reliability of gene rankings obtained from dual-color microarray experiments can be improved by using intensity-based models. An added benefit of the intensity-based analysis is that intensity models do not suffer from the drawbacks of ratio models in the analysis of complex direct dual-color experiments. Designs such as the interwoven loop design address the increased complexity of microarray experiments, which have progressed from "simple" two-group comparisons to multifactorial or time-course experiments. The aforementioned direct designs are efficient, but often bias certain comparisons over others and lack the possibility to extend the experiment by adding more groups or samples. There are no such limitations when analyzing dual-color experiments with intensity-based models . Finally, the LIMMA software package also uses intensity data from dual-color experiments, but mainly as a solution to compare samples which are unconnected in the hybridization design . Here, we provide evidence that it is beneficial to perform an intensity-based analysis for connected designs as well. It should be noted that the observed improvements may be limited to dual-color arrays and that further experiments are needed to justify the generality of these results for other array designs.
In conclusion, our results indicate that intensity-based models are very powerful in the analysis of dual-color gene expression data when these are obtained from a high-quality platform. Most importantly, intensity models yield more reproducible results, and are more sensitive in the detection of differential gene expression than standard ratio-based analysis methods on the same microarray dataset. The gain in reproducibility and sensitivity are most pronounced in complex designs such as the interwoven loop design. We argue that the intensity-based models outperform ratio-based models, and thus are the preferred models for the analysis of dual-color gene expression datasets derived from commercial oligo-based array platforms.
Human keratinocyte cell line dataset
The cell line sample set consisted of two immortalized cell lines (cell lines 10 and 19) derived from a single primary keratinocyte culture. The two cell lines were subjected to four different treatments (treatments T1, T2, T3 and T4). After RNA isolation and labeling (labeling performed with Agilent Low RNA Input Fluorescent Linear Amplification Kit, Agilent Technologies), equal amounts (1 μg) of Cy3-CTP and Cy5-CTP labeled samples were hybridized to Agilent 4 × 44K Whole Human Genome arrays (Agilent Technologies, Part Number G4112F), according to the manufacturer's instructions. The hybridization set-up on the 4 × 44K array was chosen in such a way that for each cell line, both Cy3- and Cy5-labeled samples for all treatments were hybridized on a single slide (containing 4 arrays). The entire experiment was technically replicated. The hybridization setup can be found in Additional file 1. Microarrays were scanned using the Agilent DNA Microarray Scanner (Agilent Technologies, Part Number G2505B), and scans were quantified using the Agilent Feature Extraction software (version 8.5.1). Raw expression data generated by the Feature Extraction software were imported into the R statistical environment using the LIMMA package  in Bioconductor http://www.bioconductor.org. No background correction was performed, as overall background levels were very low. The intensity distributions within and between arrays were normalized using the quantile scaling algorithm  in LIMMA. After normalization, the separate intensity channels were extracted from the ratio measurements. The log2-transformed intensity measurements were used in all following analyses. The microarray data have been deposited in the Gene Express Omnibus (GEO) database http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=djkxjoomeyecqja&acc=GSE12553.
Microarray hybridization data were extracted from the Gene Expression Omnibus (GEO accession number GSE5350, file MAQC_AGL_123_60TXTs.zip, series "C": arrays AGL_3_C1.txt, AGL_3_C2.txt, AGL_3_C3.txt, AGL_3_C4.txt and AGL_3_C5.txt, series "D": AGL_3_D1.txt, AGL_3_D2.txt, AGL_3_D3.txt, AGL_3_D4.txt and AGL_3_D5.txt). This dataset consists of 10 technical replications of a hybridization of Stratagene Universal Human Reference RNA (Cy3 in series "C", Cy5 in series "D") and Ambion Human Brain Reference RNA (Cy3 in series "D", Cy5 in series "C") as described in . Microarray normalization procedures were performed as described for the cell line experiment. Power curves were computed from the non-central t-distribution.
Human brain dataset
Fresh-frozen human brain tissue samples were obtained from the Netherlands Brain Bank, Amsterdam (NBB). Written informed consent for a brain autopsy and the use of the material and clinical information for research purposes was obtained by the NBB from the donor or next of kin. Gray matter was isolated from the prefrontal cortex of 49 individuals (matched for age, sex, postmortem interval and brain pH) with increasing levels of AD-related neuropathology, as defined by the Braak staging for neurofibrillary tangles . For each of the 7 Braak stages, 7 individuals were included. Tissue dissection was performed using a cryostat. For each sample, between 20 and 30 sections of 50 μm were cut. Grey matter areas were identified by eye and dissected out using pre-chilled scalpels. Tissue yields were typically around 50 mg. Total RNA was isolated using a combination of Trizol-based and RNeasy Mini Kit RNA isolation methods. Briefly, samples were homogenized in ice-cold Trizol (Life Technologies, Grand Island, New York, 3 ml Trizol per 100 mg tissue). After phase separation by addition of chloroform, the aqueous phase was mixed with an equal volume of 70% RNAse-free ethanol. Samples were then applied to an RNeasy Mini column (Qiagen, Valencia, California), and processed according to the RNeasy Mini Protocol for RNA Cleanup. Overall, the isolated RNA was of high integrity (average RNA integrity number of of 8.3, range 6.5-9.6, as determined by Agilent 2100 Bioanalyzer analysis).
After RNA isolation, for each sample, two 500 ng aliquots of RNA were linearly amplified and fluorescently labeled with either Cy3-CTP or Cy5-CTP (Perkin Elmer) with the Agilent Low RNA Input Fluorescent Linear Amplification Kit (Agilent Technologies). The most efficient hybridization scheme was calculated with the od function of the SMIDA package (version 0.1) in R. The resulting hybridization setup can be found in Additional files 4 and 6. Equal amounts (1 μg) of Cy3-CTP and Cy5-CTP labeled samples were hybridized to Agilent 44K Whole Human Genome arrays (Part Number G4112A) according to manufacturer's instructions. Microarray scanning, feature extraction and normalization procedures were performed as described for the cell line experiment. The full set of normalized expression values is publicly available at http://www.vumc.nl/braindataset and as supplementary information to this manuscript (see Additional file 7).
Clustering and ANOVA models
Here, μ captures the average gene intensity, τ i is the treatment specific effect, η j is the cell line (10 or 19) effect, αk(j)is the array effect, and ε ijk is the error component. Dye effects have not been incorporated, because the design was balanced for dyes and the data were normalized to remove dye-specific bias. Model 2 lacks the factor αk(j)and hence represents the intensity-based model. The treatment effect is the factor of biological interest, to which an F-test was applied to compute p-values. This resulted in four lists of p-values: ratio-based and intensity-based p-values for technical replicate C1, and ratio-based and intensity-based p-values for technical replicate C2.
A similar approach was taken for the human brain data. Each patient was hybridized twice. The resulting set of arrays was split in such a manner that each patient was represented exactly once in both data sets (brain datasets B1 and B2, see Additional file 4). It is noteworthy that the obtained datasets are indeed technical replicates, but not on the level of the experimental design, as is the case for cell line datasets C1 and C2. The cell line effect η j was dropped from the model, and the treatment effect τ i now represented the Braak stage. The F-test was performed on the Braak stage factor. Again, two ANOVA models were used: the ratio model which included the array effect αk(j), and the intensity model without array effect. Consequently, four lists of p-values were generated: ratio-based and intensity-based p-values for dataset B1, and ratio-based and intensity-based p-values for dataset B2.
We did not apply any multiple testing corrections for our purpose, since a criterion like False Discovery Rate (FDR) might distort the comparison between the models somewhat. Also, since both splits contain an equal numbers of samples, sample size 'bias' is absent.
Comparison ratio and intensity data, reproducibility calculations and model selection
For the cell line dataset, direct ratio measurements between co-hybridized sample pairs were compared with in silico reconstructed ratios of the two intensity measurements of the same sample pair, as measured on separate arrays, and against different samples. For example, in dataset C1, the directly measured ratios between samples T1 and T2 on array 1, were compared with the reconstructed ratios between T1 on array 4, and T2 on array 2 (Additional file 1). To eliminate possible confounding effects of noise introduced by genes expressed at very low levels, only genes with an average log-transformed intensity levels greater than 7 were used. To compare the overlap between gene rankings based on the ratio and intensity models, genes were ordered by p-value and assigned to bins containing 1,000 genes. The fraction of overlap then was defined as the amount of genes ranked in the same bin by both models, divided by the size of the bin.
Both for the cell line datasets C1 and C2, and the human brain datasets B1 and B2, reproducibility between the replicated datasets was determined as follows. First, the correlation between sets of p-values was calculated using Spearman's rho. Second, to assess the proportion of genes with similar ranks between replicates, genes were ordered by p-value. For bins of increasing size (10 to 1,000 genes, by increments of 10 genes), the proportion of overlap was defined as the fraction of genes, occurring in both sets. Third, Bayesian information criterion (BIC) model selection was used to score the ratio- and intensity-based linear models for each array feature. Information criterion methods, which aim to determine which set of model parameters the data support best, penalize models with more unknown parameters in order to select a model with a lower generalization error and hence more reproducible results . The preferred model was defined by the model with the lowest BIC value. BIC calculations were performed using the nlme package in R.
Funding for the human keratinocyte cell line experiment was provided by the EU Research Sixth Framework programme, project DISMAL, project number LSHC-CT-2005-018991. The human brain experiment received funding from the Royal Dutch Academy of Sciences, Innovation Fund and Solvay Pharmaceuticals.
- Blalock EM, Geddes JW, Chen KC, Porter NM, Markesbery WR, Landfield PW: Incipient Alzheimer's disease: microarray correlation analyses reveal major transcriptional and tumor suppressor responses. Proc Natl Acad Sci USA. 2004, 101: 2173-2178. 10.1073/pnas.0308512100.PubMed CentralPubMedView Article
- Bossers K, Meerhoff G, Balesar R, van Dongen JW, Kruse CG, Swaab DF, Verhaagen J: Analysis of gene expression in Parkinson's disease: possible involvement of neurotrophic support and axon guidance in dopaminergic cell death. Brain Pathol. 2009, 19: 91-107. 10.1111/j.1750-3639.2008.00171.x.PubMedView Article
- White KP, Rifkin SA, Hurban P, Hogness DS: Microarray analysis of Drosophila development during metamorphosis. Science. 1999, 286: 2179-2184. 10.1126/science.286.5447.2179.PubMedView Article
- Blalock EM, Chen KC, Stromberg AJ, Norris CM, Kadish I, Kraner SD, Porter NM, Landfield PW: Harnessing the power of gene microarrays for the study of brain aging and Alzheimer's disease: statistical reliability and functional correlation. Ageing Res Rev. 2005, 4: 481-512.PubMedView Article
- Kerr MK, Churchill GA: Experimental design for gene expression microarrays. Biostatistics. 2001, 2: 183-201. 10.1093/biostatistics/2.2.183.PubMedView Article
- Patterson TA, Lobenhofer EK, Fulmer-Smentek SB, Collins PJ, Chu TM, Bao W, Fang H, Kawasaki ES, Hager J, Tikhonova IR: Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) project. Nat Biotechnol. 2006, 24: 1140-1150. 10.1038/nbt1242.PubMedView Article
- Buffart TE, Israeli D, Tijssen M, Vosse SJ, Mrsic A, Meijer GA, Ylstra B: Across array comparative genomic hybridization: a strategy to reduce reference channel hybridizations. Genes Chromosomes Cancer. 2008, 47: 994-1004. 10.1002/gcc.20605.PubMedView Article
- Schwarz G: Estimating the dimension of a model. Annals of Statistics. 1978, 6: 461-464. 10.1214/aos/1176344136.View Article
- Wolfinger RD, Gibson G, Wolfinger ED, Bennett L, Hamadeh H, Bushel P, Afshari C, Paules RS: Assessing gene significance from cDNA microarray expression data via mixed models. J Comput Biol. 2001, 8: 625-637. 10.1089/106652701753307520.PubMedView Article
- Hoen PA, Turk R, Boer JM, Sterrenburg E, De Menezes RX, Van Ommen GJ, Den Dunnen JT: Intensity-based analysis of two-colour microarrays enables efficient and flexible hybridization designs. Nucleic Acids Res. 2004, 32: E41-10.1093/nar/gnh038.PubMed CentralPubMedView Article
- Smyth GK: Limma: linear models for microarray data. Bioinformatics and Computational Biology Solutions using R and Bioconductor. Edited by: Gentleman R, Carey V, Dudoit S, Irizarry RA, Huber W. 2005, 397-420. full_text.View Article
- Smyth GK, Speed T: Normalization of cDNA microarray data. Methods. 2003, 31: 265-273. 10.1016/S1046-2023(03)00155-5.PubMedView Article
- Bolstad BM, Irizarry RA, Astrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003, 19: 185-193. 10.1093/bioinformatics/19.2.185.PubMedView Article
- Braak H, Braak E: Neuropathological stageing of Alzheimer-related changes. Acta Neuropathol (Berl). 1991, 82: 239-259. 10.1007/BF00308809.View Article
- Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning. 2001, Springer-Verlag New York, LLCView Article
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.