The cost of reducing starting RNA quantity for Illumina BeadArrays: A bead-level dilution experiment
© Lynch et al; licensee BioMed Central Ltd. 2010
Received: 30 April 2010
Accepted: 6 October 2010
Published: 6 October 2010
The demands of microarray expression technologies for quantities of RNA place a limit on the questions they can address. As a consequence, the RNA requirements have reduced over time as technologies have improved. In this paper we investigate the costs of reducing the starting quantity of RNA for the Illumina BeadArray platform. This we do via a dilution data set generated from two reference RNA sources that have become the standard for investigations into microarray and sequencing technologies.
We find that the starting quantity of RNA has an effect on observed intensities despite the fact that the quantity of cRNA being hybridized remains constant. We see a loss of sensitivity when using lower quantities of RNA, but no great rise in the false positive rate. Even with 10 ng of starting RNA, the positive results are reliable although many differentially expressed genes are missed. We see that there is some scope for combining data from samples that have contributed differing quantities of RNA, but note also that sample sizes should increase to compensate for the loss of signal-to-noise when using low quantities of starting RNA.
The BeadArray platform maintains a low false discovery rate even when small amounts of starting RNA are used. In contrast, the sensitivity of the platform drops off noticeably over the same range. Thus, those conducting experiments should not opt for low quantities of starting RNA without consideration of the costs of doing so. The implications for experimental design, and the integration of data from different starting quantities, are complex.
Gene expression microarrays have become a routine analysis tool; from their introduction  to recent headline publications [2–4] their widening use has been primarily down to better understanding of how to design [5, 6], use and analyse [7, 8] microarray experiments. An important, if somewhat forgotten, design issue has been the amount of starting material needed to produce high quality microarray data. Ten years ago, around 10 μ g of total RNA was required and even three years ago many labelling protocols required 1 μ g. The introduction of Illumina BeadChips with a standard labelling reaction requiring only 250 ng of total RNA made analysis of some previously unconsidered sample types possible; e.g. limited clinical samples or samples requiring considerable microdissection and pooling.
Whilst many researchers continue to push the limits of starting materials , development of robust standard labelling protocols has further decreased the amount of RNA required for microarrays. Until recently 250 ng of starting mRNA was recommended for the Illumina BeadArray platform. Now 50 ng to 100 ng is suggested http://www.illumina.com/technology/direct_hybridization_assay.ilmn. If one can indeed use so little starting material then this is of tremendous importance in terms of the scope of experiments that become possible. However, there is a wealth of literature that is based upon 250 ng, and it is important that future results are comparable to those obtained previously. One small comparison has previously been made . This study found that reproducible signal was obtainable from as little as 25 ng, but the study was not large enough to quantify the costs of such an approach.
Microarray dilution experiments [11, 12], where two samples are mixed together in a number of differing (known) ratios and those mixtures hybridized to arrays, have proven to be valuable tools for the comparison and investigation of microarray platforms, most notably in the MAQC project . We employ a nine-level dilution design to investigate the effect of changing the quantity of starting mRNA on the performance of Illumina BeadArrays. We consider the previously recommended level of 250 ng, the current recommended levels of 100 ng and 50 ng and one other (10 ng).
Here, we examine the costs and consequences of reducing the amount of starting RNA, with consideration for the issues of experimental design and meta-analysis, while also providing a unique bead-level dilution experiment to serve as a public resource to the Illumina-using community. We use the Illumina HumanWG-6 V3 BeadArray, analysed at the bead-level as we have previously recommended . One of the benefits of using the bead-level data is that we can analyse separately the two array-sections assigned to any one sample, thus allowing inferences to be made about the more flexible HT12 BeadArrays also. In addition to our purposes, we are creating a unique public resource, and have designed our experiment to be generally useful to the community.
Six samples can be hybridized to the Illumina HumanWG-6 V3 chip, each sample on two array-sections of approximately 1, 000, 000 beads that are distributed amongst approximately 50, 000 bead-types. We treat the two sections as separate arrays for the purposes of analysis, due to previously observed inter-section differences [14, 15]. This also has the effect of making our results comparable to those one might expect from the Illumina HT-12 array which takes 12 samples, allocated one section each.
We have used two reference RNA samples, previously employed in the MAQC study , which have subsequently become a standard for microarray  and next-generation sequencing  studies. These are the Stratagene Universal Human Reference RNA (hereafter "UHRR"), and the Ambion Human Brain Reference RNA (hereafter "Brain"). Nine levels of mixture, including the four employed in the first MAQC study, were then created. These are 100:0, 99:1, 95:5, 90:10, 75:25, 50:50, 25:75, 10:90 and 0:100, where mixtures are presented as UHRR:Brain.
These nine levels allow for investigation of broad trends, and for the detection of subtle differences. Combined with the four levels of starting material that we are investigating, this leads to 36 samples to be arranged across six Illumina HumanWG6 V3 BeadChips. Clearly it would not be desirable to confound levels of starting material with BeadChips as we would be unable to distinguish our comparison of interest from technical variation. However it is desirable that our data resemble data from a 'real-world' experiment else they have no external validity and, in general, experiments are conducted on BeadChips using only one level of starting material.
Our design was chosen to address this tension between internal and external validity. Each BeadChip was run with samples from two starting quantities of RNA (three samples from both chosen starting quantities), and each possible combination of the two starting quantities was run once and only once amongst the six BeadChips. Full details of the design are given in Section 1 of Additional File 1.
Stock UHRR tubes were prepared following manufacturer's recommendation and pooled to create a stock of 1 mg/ml; Brain RNA was received at 1 mg/ml. The quality was checked using the Agilent Bioanalyser. The RNA was accurately diluted to a working stock of 100 ng/μ l and the dilution series was created to the specifications given above. The minimum pipetting volume used was 10 μ l.
The Illumina TotalPrep-96 Kit (4397949) was used to process the samples using the range of input concentrations in question. For the 50 ng and 10 ng input quantities a 1:10 dilution of working RNA was used. Quality and quantity of the cRNA was checked before proceeding with hybridisation to Human WG-6 V3 BeadArray. The Illumina WGGX DirectHyb Assay Guide (11286331 RevA) protocol was followed for hybridisation, washing and scanning of the BeadArray, with the scanner set to return bead-level data (Additional File 1, Section 2). Quality assessment was achieved via examination of metrics files (Additional File 1, Section 3), agreement with previous MAQC data sets (Additional File 1, Section 4), and performance of housekeeping controls (Additional File 1, Section 5).
Preprocessing and statistical analysis
Illumina BeadScan files were processed and analysed using the beadarray package  from Bioconductor. Arrays were pre-processed on the log2-scale on a per-array-section basis. BASH  was used to remove high-frequency spatial artefacts, followed by outlier removal (outliers being defined as observations more than three median absolute deviations from the median), and expression detection score calculation. The detection score is a standard measure for Illumina expression experiments, and can be viewed as an empirical estimate of the p-value for the null hypothesis that there is no expression. Between-array-section quantile-normalization was performed within each starting material level, and a non-linear regression model fitted across dilution levels within each starting RNA level.
Our approach demands reporting of raw, bead-level, Illumina data, which exceeds the MIAME requirements. As popular repositories are not designed for the storage of raw (bead-level) data from random arrays, the files are available to download from our website at http://www.compbio.group.cam.ac.uk/Resources/Dilution/Dilution.html.
where E mrp is the observed (normalized) log-intensity for probe p at starting RNA quantity r in mixture level m, c m is the proportion of the mixture that is UHRR, U rp is the intensity associated with probe p at starting RNA level r in the UHRR sample, and B rp is similarly defined for the Brain sample. The ϵ mrp are independent measurement errors with mean zero and standard deviation σ rp .
This model implicitly assumes a linear relationship between quantity of RNA and measured intensity. This assumption is known not to hold over the full range of observed intensities for microarrays , and specifically for Illumina BeadArrays . While some models allow for non-linearity , they do not relate it to the known physico-chemical causes. To do so would be difficult and, in any case, would not obviously be advantageous in our situation.
and we fit this model in R using the nls() function, weighting each observation by the number of beads that contributed to the observation. Under this formulation, it is clear that the test of Δ = 0 from the summary.nls() function in R provides an approximate and quick test of a difference in log-intensities.
Restricting the analysis-group of bead-types
We have re-annotated the bead-types on the array , and have identified 23, 562 "perfect" bead-types (using the August 2009 annotation). These are bead-types that have a full 50 mer match to a reliable transcript, and do not possess additional undesirable properties (e.g. mapping to transcripts masked by repeat regions, having a non-unique transcriptomic match, mapping to transcripts that do not align well to the reference genome, etc.). Additionally, we define an 'analysis-group' of bead-types as a subset of these perfect bead-types that possess two further properties: 1) That their GC content is conducive to hybridization (i.e. in the range of 20-35 bases), which excludes a further 506 bead-types, and 2) That they occur at least six times on each array-section (see Additional File 1, Section 6). All analyses will be restricted to this 'analysis-group' unless otherwise stated.
Numbers of beads
Numbers of beads
Quantity of starting RNA:
Removed by BASH
Removed in summarization
In analysis-group bead-types
It has been observed previously that spatial artefacts can be associated with nearby regions where beads are non-decoded , so it may not be coincidental that the experiment with the greatest number of beads loses fewest to spatial artefacts. The differing numbers of beads may cause concern, although it should be noted that the median number of replicates for a bead-type only varies from 21 for the 10 ng experiment to a still very healthy 19 for the 100 ng experiment. The lack of monotonicity is also helpful; the trends that we show do not correlate with the total bead-numbers, suggesting that these numbers are not driving the results. Whilst we take 250 ng of starting RNA as our gold standard for comparison, we can also gain reassurance through comparisons to the 100 ng experiment which contained fewest beads.
As noted above, we restrict analyses to an analysis-group containing only 'perfect' bead-types, with desirable GC composition and at least six beads on each array-section. This reduces the number of bead-types considered to 21, 627. This also has a marginal effect on improving the balance between experiments in terms of the numbers of beads analysed.
Detection of expression
Quantity of starting RNA:
...at least one array-section
...both UHRR and Brain
mean number of array-sections
Consistency in expression detection between quantities of starting RNA
Expression of control bead-types
Control bead-type summary
Quantity of starting RNA:
Negative controls detected in UHRR
Negative controls detected in Brain
Negative controls detected in all array-sections
Negative controls detected in at least one array-section
Negative controls: median log-intensity UHRR
Negative controls: median log-intensity Brain
Housekeeping controls: median log-intensity UHRR
Housekeeping controls: median log-intensity Brain
Such observations could have explanation other than the bead-types showing specific signal. For instance, thermodynamic variation could lead to some negative control bead-types regularly being called as 'detected', but evidence of differential expression is harder to explain. Using Benjamini-Hochberg control for false discovery rate, there are still three negative control bead-types for the 250 ng starting material experiment (two for 100 ng, eight for 50 ng, and four for 10 ng) that show differential expression. The greatest evidence of a negative control showing specific hybridization is for bead-type ILMN_1343923 (Additional File 1, Section 8).
The log-intensity levels for the housekeeping control bead-types decrease at a greater rate than those for the negative control bead-types (except when saturation effects are apparent). Thus the log-fold-change in intensities from housekeeping gene to negative control (a measure of signal to noise) decreases with the amount of starting RNA. This change in performance is apparent even at 100 ng levels of starting material. Other control bead-types on the Illumina BeadArray platform are not sample dependent, and do not vary considerably between starting quantities of RNA.
Magnitude of expression
Squared rank correlations
Quantity of starting RNA
It is also clear from Figure 2 that intensities generally decrease with the quantity of starting RNA, as was observed specifically for the control bead-types. This loss of signal leads to an apparent diminishing of technical biases (e.g. if all signal were lost then we would cease to observe the diminishment of signal as target locations become more 5' along the gene), which should not be mistaken for a benefit.
Differential expression detected
amongst all bead-types
amongst analysis group
Consistency in detection of differential expression between quantities of starting RNA
Signal to noise
The variance of observations is not independent of their value. Since expression levels decrease as the quantity of starting RNA decreases, it is not possible to assess the change in variance as the quantity of starting RNA decreases, without simultaneously considering the level of expression.
From the non-linear model we can compare the estimate of the difference in expression levels to the estimated standard error of the difference. This side-steps the complications of the variance and fluorescence levels changing in a dependent manner as the amount of starting material changes. Considering only the analysis-group of bead-types, the median ratios of standard error to estimate are 0.23, 0.28, 0.31 and 0.52 for 250 ng, 100 ng, 50 ng and 10 ng of starting RNA respectively. The median ratios of the two signal to noise ratios are 1.12, 1.16 and 1.76 for 100 ng, 50 ng or 10 ng respectively comparing to a reference starting quantity of 250 ng.
Inevitably, there will be occasions when we wish to combine data sets generated using different quantities of starting material, possibly because we are performing a meta-analysis across different experiments, or possibly because not all samples within a single experiment can supply the desired quantity of starting RNA. Our analysis has, so far, considered the different quantities of starting material in this study as being different experiments, but we will now briefly consider strategies for combining them.
Consider if samples were run in strata of starting RNA, e.g. we have an experiment where some samples were run using 250 ng, while others were run using 50 ng. The strata were not balanced in terms of experimental design, so we may not wish to obtain two simple estimates for the parameters of interest (one from each stratum) and then combine the estimates. Our strategy for analysis may depend on whether some samples had been run in both strata.
Consider further that we only have Brain run at 50 ng and UHRR at 250 ng, and we wish to transform the 50 ng Brain data for comparison with UHRR. Essentially we wish to simulate a 250 ng Brain data set from this restricted data set, and can use the fact that we do have Brain run at 250 ng to assess the performance. We will consider both the scenario where we have only the two samples with which to work, and a second where we have additionally run UHRR at 50 ng.
Using the additional data (50 ng UHRR) makes only a small improvement to our ability to transform the 50 ng Brain data and in a real experiment running 50 ng of an additional sample may provide greater value to the ultimate analysis. We should be wary of trying to use a sample for both bias-estimation and analysis as there will be a lack of independence between these samples and all those that are bias-corrected using the results. Moreover the small improvement we see here, over the simpler quantile-normalization style approach, comes using samples that have large numbers of expressed genes. For bias correction of this nature to be useful, we need to observe a wide range of log-intensities which in turn requires large numbers of genes to be expressed. Thus the appropriateness of this more complicated method will be dependent on the size of the experiment and the nature of the samples being hybridized.
Implications for experimental design
A number of implications for experimental design are obvious. It is clear that all things being equal, of the range of starting quantities of RNA considered here, it is preferable to use 250 ng. If there are limitations to the amount of starting RNA available, then the more starting material used the better (within this range examined). Should the amount of available RNA differ between samples then more subtle decisions are required. On the basis of the signal-to-noise results, we can infer that if using 100 ng or 50 ng then the sample size would need inflating by a factor of at least 1.2 to achieve the same performance, while if using 10 ng, then in the region of three times the numbers will be required. Thus, when we have the choice and free from other pressures, reducing the starting RNA level is only desirable if it allows sample numbers to be increased by these factors.
The combination of multiple starting RNA levels in one experiment will be problematic. If we wish to normalize using data from the same sample hybridized from multiple quantities of starting RNA, then clearly we must stratify samples into a few starting quantities. If we do not have, or do not wish to make, recourse to replicate samples hybridized from several RNA quantities, and are simply going to normalize samples together, then there is merit in using as much starting RNA as possible for each sample, as was noted in the previous section.
In this scenario, where all samples are independent, it would still be hard to criticize a design that opted for a fixed number of starting levels, especially if this came at minimal cost to quality (i.e. 250 ng reduced to 220 ng but not to 10 ng) and allowed balance of experimental criteria to be achieved within each stratum of starting quantity. Such an approach is suboptimal by our criterion, but may be more robust to those unexpected events that befall real-world experiments.
We have presented a bead-level Illumina BeadArray dilution control experiment that will be a valuable resource for the Illumina analysis community. As intended, the experiment also answers an important experimental question regarding the required levels of starting RNA, however it also allows for a number of questions to be addressed regarding experimental design when large quantities of RNA are difficult to obtain.
We have shown that reliable signal is obtainable using as little as 10 ng of starting RNA. However we have also seen that lower levels of starting RNA are associated with a bias in expression levels (which may be correctable), and drop in sensitivity (which will not be).
This increase in noise implies that, if using less starting RNA, more samples would be needed in an experiment to achieve the same levels of precision. However, it seems that few false discoveries result from using even as little as 10 ng of starting RNA. Thus while a small experiment using a low starting quantity of RNA may fail to identify many subtle changes, one can have confidence in any changes that are reported.
This work was supported by the University of Cambridge, Cancer Research UK [grant number C14303] and Hutchison Whampoa Limited.
We thank Nuno Barbosa-Morais for advance access to annotation files. We thank Catherine Ingle, Manolis Dermitzakis and Barbara Stranger of the Wellcome Trust Sanger Institute, Matthew Ritchie of the Walter and Eliza Hall Institute of Medical Research, and Roslin Russell of Cancer Research UK for constructive design discussions.
- Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995, 270 (5235): 467-70. 10.1126/science.270.5235.467.PubMedView Article
- Cancer Genome Atlas Research Network: Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008, 455 (7216): 1061-8. 10.1038/nature07385.View Article
- The ENCODE Project Consortium: The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004, 306 (5696): 636-40. 10.1126/science.1105136.View Article
- Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, Redon R, Bird CP, de Grassi A, Lee C, Tyler-Smith C, Carter N, Scherer SW, Tavaré S, Deloukas P, Hurles ME, Dermitzakis ET: Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science. 2007, 315 (5813): 848-53. 10.1126/science.1136678.PubMed CentralPubMedView Article
- Kerr MK, Churchill GA: Experimental design for gene expression microarrays. Biostatistics. 2001, 2 (2): 183-201. 10.1093/biostatistics/2.2.183.PubMedView Article
- Yang YH, Speed T: Design issues for cDNA microarray experiments. Nature Reviews Genetics. 2002, 3 (8): 579-88.PubMed
- Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America. 1998, 95 (25): 14863-8. 10.1073/pnas.95.25.14863.PubMed CentralPubMedView Article
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biology. 2004, 5 (10): R80-10.1186/gb-2004-5-10-r80.PubMed CentralPubMedView Article
- Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, Xu N, Wang X, Bodeau J, Tuch BB, Siddiqui A, Lao K, Surani MA: mRNA-Seq whole-transcriptome analysis of a single cell. Nature Methods. 2009, 6 (5): 377-82. 10.1038/nmeth.1315.PubMedView Article
- UHN Microarray Centre: Validation of the Illumina iScan System for gene expression. 2009, A UHN microarray centre technical note, University Health Network Microarray Centre, University Health Network Microarray Centre The Toronto Medical Discovery Tower 101 College Street, Rm 9-301 Toronto, Ontario, M5G 1L7 Canada, [http://www.microarrays.ca/info/technical_notes.html]
- Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics (Oxford, England). 2003, 4 (2): 249-64.View Article
- Barnes M, Freudenberg J, Thompson S, Aronow B, Pavlidis P: Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms. Nucleic Acids Research. 2005, 33 (18): 5914-23. 10.1093/nar/gki890.PubMed CentralPubMedView Article
- Shippy R, Fulmer-Smentek S, Jensen RV, Jones WD, Wolber PK, Johnson CD, Pine PS, Boysen C, Guo X, Chudin E, Sun YA, Willey JC, Thierry-Mieg J, Thierry-Mieg D, Setterquist RA, Wilson M, Lucas AB, Novoradovskaya N, Papallo A, Turpaz Y, Baker SC, Warrington JA, Shi L, Herman D: Using RNA sample titrations to assess microarray platform performance and normalization techniques. Nature Biotechnology. 2006, 24 (9): 1123-31. 10.1038/nbt1241.PubMed CentralPubMedView Article
- Dunning MJ, Barbosa-Morais NL, Tavaré S, Ritchie ME: Statistical Issues in the analysis of Illumina data. BMC Bioinformatics. 2008, 9 (85): 1-15.
- Shi W, Banerjee A, Ritchie ME, Gerondakis S, Smyth GK: Illumina WG-6 BeadChip strips should be normalized separately. BMC Bioinformatics. 2009, 10: 372-10.1186/1471-2105-10-372.PubMed CentralPubMedView Article
- Ha KC, Coulombe-Huntington J, Majewski J: Comparison of Affymetrix Gene Array with the Exon Array shows potential application for detection of transcript isoform variation. BMC Genomics. 2009, 10: 519-10.1186/1471-2164-10-519.PubMed CentralPubMedView Article
- Mane SP, Evans C, Cooper KL, Crasta OR, Folkerts O, Hutchison SK, Harkins TT, Thierry-Mieg D, Thierry-Mieg J, Jensen RV: Transcriptome sequencing of the Microarray Quality Control (MAQC) RNA reference samples using next generation sequencing. BMC Genomics. 2009, 10: 264-10.1186/1471-2164-10-264.PubMed CentralPubMedView Article
- Dunning MJ, Smith ML, Ritchie ME, Tavaré S: beadarray: R classes and methods for Illumina bead-based data. Bioinformatics. 2007, 23 (16): 2183-2184. 10.1093/bioinformatics/btm311.PubMedView Article
- Cairns JM, Dunning MJ, Ritchie ME, Russell R, Lynch AG: BASH: a tool for managing BeadArray spatial artefacts. Bioinformatics. 2008, 24 (24): 2921-2922. 10.1093/bioinformatics/btn557.PubMed CentralPubMedView Article
- Holloway AJ, Oshlack A, Diyagama DS, Bowtell DDL, Smyth GK: Statistical analysis of an RNA titration series evaluates microarray precision and sensitivity on a whole-array basis. BMC Bioinformatics. 2006, 7: 511-10.1186/1471-2105-7-511.PubMed CentralPubMedView Article
- Ramdas L, Coombes KR, Baggerly K, Abruzzo L, Highsmith WE, Krogmann T, Hamilton SR, Zhang W: Sources of nonlinearity in cDNA microarray expression measurements. Genome Biology. 2001, 2 (11): 11-10.1186/gb-2001-2-11-research0047.View Article
- Zheng X, Huang HC, Li W, Liu P, Li QZ, Liu Y: Modeling nonlinearity in dilution design microarray data. Bioinformatics. 2007, 23 (11): 1339-47. 10.1093/bioinformatics/btm002.PubMedView Article
- Barbosa-Morais NL, Dunning MJ, Samarajiwa SA, Darot JF, Ritchie ME, Lynch AG, Tavaré S: A re-annotation pipeline for Illumina BeadArrays: improving the interpretation of gene expression data. Nucleic Acids Research. 2010, 38: e17-10.1093/nar/gkp942.PubMed CentralPubMedView Article
- Smith ML, Dunning MJ, Tavaré S, Lynch AG: Identification and correction of previously unreported spatial phenomena using raw Illumina BeadArray data. BMC Bioinformatics. 2010, 11: 208-10.1186/1471-2105-11-208.PubMed CentralPubMedView Article
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.