Systematic analysis of T7 RNA polymerase based in vitro linear RNA amplification for use in microarray experiments

Background The requirement of a large amount of high-quality RNA is a major limiting factor for microarray experiments using biopsies. An average microarray experiment requires 10–100 μg of RNA. However, due to their small size, most biopsies do not yield this amount. Several different approaches for RNA amplification in vitro have been described and applied for microarray studies. In most of these, systematic analyses of the potential bias introduced by the enzymatic modifications are lacking. Results We examined the sources of error introduced by the T7 RNA polymerase based RNA amplification method through hybridisation studies on microarrays and performed statistical analysis of the parameters that need to be evaluated prior to routine laboratory use. The results demonstrate that amplification of the RNA has no systematic influence on the outcome of the microarray experiment. Although variations in differential expression between amplified and total RNA hybridisations can be observed, RNA amplification is reproducible, and there is no evidence that it introduces a large systematic bias. Conclusions Our results underline the utility of the T7 based RNA amplification for use in microarray experiments provided that all samples under study are equally treated.


Background
The utility of microarrays for disease classification, prognosis and progression, or the identification of target genes for novel therapeutic approaches is well documented [1][2][3][4][5] and is likely to change our views of disease development [6,7]. The major bottlenecks for these experiments are the limited availability and low quality of tissue or RNA. The issue of quality can be easily solved using appropriate tissue handling techniques. However, solid tumours are usually too small to yield enough RNA for direct use in microarray experiments. Therefore, amplification techniques have to be applied to increase the amount of available RNA and minimise the required starting material. Two basically different approaches for RNA amplification have been used by various laboratories: In the first, the polymerase chain reaction (PCR) is performed to increase the amount of sample either exponentially [8,9] or linearly [10]. The second approach applies in vitro transcription with T7 RNA polymerase [11][12][13][14] for linear amplification of the sample. Both approaches differ in the lengths of the double-stranded cDNA molecules generated in the reverse transcription process prior to amplification. Since enzymatic modifications on a highly complex mixture of RNA/cDNA molecules are performed during the reaction, it can be presumed that noise and systematic biases are introduced. Thus, it is essential to quantify the effect of amplification and examine the reproducibility of the method.
To study genetic changes during breast cancer, we have developed a cDNA microarray consisting of 7,347 genes and expressed sequence tags (ESTs). We have established a robust method to amplify total RNA via a T7 RNA polymerase based process [11]. We used this method to amplify RNA from different tissues and hybridised the corresponding labelled cDNA products on cDNA microarrays. Our goals were to assess the reproducibility of the amplifications, compare the results between unamplified and amplified RNA (aRNA), and quantify the bias introduced to the data by the RNA amplification process.

Amplification factors
We used the linear amplification technique of van Gelder et al. [11], which is based on a double-stranded cDNA synthesis with an oligo-dT primer coupled to the T7 RNA polymerase promoter and subsequent in vitro transcription into aRNA using T7 RNA polymerase. In 30 separate reactions, we achieved amplification factors of 150-560 when 100-3,200 ng of total RNA was used as starting material (table 1). The amplification of no more than 100-200 ng of total RNA (14 reactions) resulted in a 368-fold average increase of poly(A)+ RNA equivalents, whereas the amplification of 1,000-3,200 ng (16 reactions) resulted in an average amplification factor of 253. All amplification factors were calculated on the assumption that 5% of total RNA corresponds to poly(A)+ RNA. The quality of all total and amplified RNA was checked using the Agilent 2100 Bioanalyzer. Only high-quality RNA was used for amplification because low-quality RNA can influence the outcome of amplification. The measurement of the amplified RNA showed a length reduction of the first-round amplified RNA compared to the total RNA (not shown). The sizes of the amplified RNA molecules were distributed between 100 and 4,500 bases. In total, 69 microarray hybridisations were used for the analysis of the different parameters and were performed as summarised in tables 2 and 3. All microarrays used in these experiments were taken from two spotting runs in which a total of 200 arrays had been produced.
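The calculation of an amplification factor under the stated 5% poly(A)+ assumption can be sketched as follows. The function name and the example yield are hypothetical and chosen for illustration only; they are not values from table 1.

```python
def amplification_factor(total_rna_ng, arna_yield_ng, poly_a_fraction=0.05):
    """Fold increase of aRNA over the estimated poly(A)+ RNA input.

    poly_a_fraction is the assumed share of poly(A)+ RNA in total RNA
    (5% in the text above).
    """
    if total_rna_ng <= 0:
        raise ValueError("total_rna_ng must be positive")
    if not 0 < poly_a_fraction <= 1:
        raise ValueError("poly_a_fraction must lie in (0, 1]")
    poly_a_input_ng = total_rna_ng * poly_a_fraction
    return arna_yield_ng / poly_a_input_ng


# Hypothetical reaction: 100 ng total RNA in, 1,840 ng aRNA out.
print(amplification_factor(100, 1840))
```

Because the factor is expressed relative to poly(A)+ equivalents rather than total RNA, it is 20 times larger than the plain mass ratio of aRNA to total RNA.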

Comparability of data sets obtained from amplified and non-amplified RNA
In 49 hybridisation experiments (tables 2 and 3), we compared the generalised log ratios of 7,347 genes using amplified or total RNA for labelling (figure 1). When labelled samples derived from total (Cy-3) vs. total (Cy-5) RNA were hybridised on 10 different microarrays, the boxplots, indicating the variation of the generalised log ratios, were generally wider in comparison to those of a second set of 10 microarrays with samples derived from amplified (Cy-3) vs. amplified (Cy-5) RNA (figure 1A). This result holds both where differential expression is not expected, because the same tissue source was used for RNA isolation (figure 1A, Ta and Tb compared to Aa and Ab), and where it is anticipated, because different tissues were used (figure 1A, Tc and Td compared to Ac and Ad). These findings indicate a considerably lower variability of generalised log ratios of aRNA vs. aRNA hybridisations compared to total vs. total RNA hybridisations.
In a second experiment (figure 1B), we hybridised total RNA against aRNA from the same tissue samples on 15 microarrays. The result (figure 1B, group A) shows that the degree of variation of generalised log ratios is even larger than in the case of the total vs. total RNA hybridisation (figure 1A). Although no differentially expressed gene is expected in these experiments, the high variation of the generalised log ratios indicates decreased sensitivity of the measurement if total RNA is compared to aRNA on the same array. Therefore, different types of RNA (amplified and total RNA) should not be compared within one experiment. In contrast, only small variations were observed for the generalised log ratios of the corresponding 14 microarray hybridisations with aRNA vs. aRNA (figure 1B, groups B and C). In summary, the variability of the generalised log ratios of the 49 microarray hybridisations decreased in the order (1) total vs. aRNA, (2) total vs. total RNA, (3) aRNA vs. aRNA.

Linearity of amplification reactions
Prior to performing large scale microarray gene expression studies using RNA amplification, it is important to address the question as to whether the enzymatic modification of the RNA introduces a systematic bias to the list of differentially expressed genes. In other words: Is the amplification "linear"? To approach this problem, we hybridised 20 microarrays with samples derived from either total vs. total RNA or aRNA vs. aRNA from the same sources (table 2). The arithmetic mean of the resulting values from all hybridisations was determined for each gene and each channel separately (figure 2, Cy-3 channel not shown). In the ideal situation (100% correlation, identity line), the overall correlation coefficient should be equal to 1. The scattering of values around the identity line reflects the influences of gene specific RNA amplification factors and noise. For both channels, the scattering of values is homogeneously distributed on both sides of the identity line, which indicates an overall linear amplification. The correlation coefficients between total and aRNA calculated from the experiments were 0.87 and 0.89 for the individual Cy-5 and Cy-3 channels, respectively. Thus, there is a clear correlation between signal intensities resulting from non-amplified RNA as compared to amplified RNA, suggesting that aRNA hybridisations correspond to those obtained with unmodified samples.

Reproducibility of RNA amplification
Several other groups [8,10,14,15] have applied Pearson correlation coefficients between log ratios in order to show the reproducibility of the RNA amplifications. Coefficients of 0.8 or greater are considered high. We performed pairwise calculations of correlation coefficients using six hybridisations with aRNA samples (table 2, correlation). The coefficients varied from 0.95 to 0.99, indicating a very high correlation between the individual amplification reactions.
A statistically sound way to estimate the reproducibility of hybridisations is the F statistic, in which the variance between the different amplification groups is compared to that within the amplification groups. When the between variation is almost the same as the within variation, the logarithmic F-value comes to lie around zero, indicating high reproducibility; when the variation between is larger or smaller than within hybridisations, the logarithmic F-value becomes larger or smaller than zero, respectively. To estimate the ratios, we performed four amplification reactions from the same total RNA source. The labelled products were hybridised in triplicate against an amplified reference sample on 12 microarrays (table 2, reproducibility). These experiments were carried out on two different days (table 2; day 1, arrays 58-61; day 2, arrays 62-69). For each gene from these hybridisations, an F-value was calculated and plotted in a histogram (figure 3). The negative median log F-value (-0.66) indicated a higher variance within than between the amplification groups. When only hybridisations performed on day 2 were considered, a median log F of 0.20 was found. These results suggest that other sources of variation (e.g. day-to-day variation, labelling, chip-batches) have a greater influence on the total variation of microarray hybridisations than the RNA amplification itself.
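A minimal sketch of the pairwise Pearson calculation is given below. It assumes the log ratios are arranged as a genes-by-hybridisations matrix; the data are simulated, so the printed range is not the 0.95 to 0.99 reported here, and the function name is hypothetical.

```python
import numpy as np


def pairwise_pearson(log_ratios):
    """Pearson correlation between every pair of hybridisations.

    log_ratios has shape (n_genes, n_hybridisations); the result is a
    symmetric (n_hybridisations, n_hybridisations) matrix.
    """
    return np.corrcoef(np.asarray(log_ratios, dtype=float), rowvar=False)


# Simulated stand-in for six aRNA hybridisations sharing one true signal.
rng = np.random.default_rng(0)
true_signal = rng.normal(0.0, 1.0, size=1000)
simulated = true_signal[:, None] + rng.normal(0.0, 0.15, size=(1000, 6))

corr = pairwise_pearson(simulated)
off_diagonal = corr[np.triu_indices(6, k=1)]  # the 15 distinct pairs
print(off_diagonal.min(), off_diagonal.max())
```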

Correspondence of gene expression data obtained with amplified and total RNA
For the biological interpretation of the data, the crucial question concerning the use of RNA amplification methods is: Which proportion of the differentially expressed genes identified by hybridisation of total vs. total RNA corresponds to those found by hybridisation of aRNA vs. aRNA? To examine this question, we used eight microarray hybridisations (table 2), four of which were performed using total RNA vs. total RNA from different tissues, the other four with aRNA vs. aRNA derived from these total RNA samples. The generalised log ratios of each set of four hybridisations were averaged, and the two sets of averages were plotted against each other. For the 2,000 genes with the highest signal intensities out of 7,347, the correlation coefficient was 0.85 (figure 4). When low intensity genes were included, the correlation dropped to 0.72 (not shown), owing to the lower signal to noise ratios for these genes. The distribution of generalised log ratios in the aRNA hybridisations appears to be compressed (slope of 0.54 for the regression line), indicating a decreased sensitivity of the aRNA results compared to total RNA hybridisation data. Within the group of the 2,000 most highly expressed genes, we also examined the 500 most strongly differentially expressed genes in the total RNA and in the aRNA experiments separately. The minimum fold change was 2.6 for total RNA and 1.8 for amplified RNA experiments. The two lists had 344 genes (69%) in common, all of which showed the same direction of regulation.
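The comparison described above can be sketched as follows, using simulated data in place of the averaged generalised log ratios. Two points are assumptions of this sketch rather than statements from the text: genes are ranked as "most strongly differentially expressed" by absolute log ratio, and the regression line is an ordinary least-squares fit. All names are hypothetical.

```python
import numpy as np


def compare_total_vs_arna(total_lr, arna_lr, mean_intensity,
                          n_intense=2000, n_top=500):
    """Correlation, regression slope and top-gene overlap between
    total RNA and aRNA log ratios for the most intense genes."""
    total_lr = np.asarray(total_lr, dtype=float)
    arna_lr = np.asarray(arna_lr, dtype=float)
    # Indices of the genes with the highest mean signal intensity.
    intense = np.argsort(mean_intensity)[::-1][:n_intense]
    x, y = total_lr[intense], arna_lr[intense]
    r = np.corrcoef(x, y)[0, 1]
    slope, _intercept = np.polyfit(x, y, 1)  # fit y = slope * x + intercept
    # Assumed ranking criterion: largest absolute log ratio.
    top_total = intense[np.argsort(np.abs(x))[::-1][:n_top]]
    top_arna = intense[np.argsort(np.abs(y))[::-1][:n_top]]
    common = np.intersect1d(top_total, top_arna)
    same_direction = np.sign(total_lr[common]) == np.sign(arna_lr[common])
    return {
        "r": float(r),
        "slope": float(slope),
        "n_common": int(common.size),
        "n_same_direction": int(same_direction.sum()),
    }


# Simulated data: aRNA log ratios compressed by 0.54 plus noise.
rng = np.random.default_rng(1)
n_genes = 7347
total = rng.normal(0.0, 1.0, n_genes)
arna = 0.54 * total + rng.normal(0.0, 0.2, n_genes)
intensity = rng.uniform(0.0, 1.0, n_genes)

result = compare_total_vs_arna(total, arna, intensity)
print(result)
```

A slope below 1 in such a fit corresponds to the compression of aRNA log ratios noted above: the same gene shows a smaller fold change in the aRNA data than in the total RNA data.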

Discussion
In vitro transcription of cDNA for the amplification of RNA is a commonly used method that requires small amounts of input total RNA, typically less than 1 µg, often as little as 100 ng. This amount can be reliably reduced further to the level of RNA from a few hundred cells when microdissected tissue is used for RNA isolation. Thus, the method is an absolute prerequisite for gene expression microarray experiments in which the amount of sample is the major limiting factor. We have obtained average amplification factors of more than 350 poly(A)+ RNA equivalents in a single amplification round. With these values, the amount of aRNA derived from 1 µg total RNA is sufficient to perform at least eight microarray hybridisations with 2 µg aRNA each. The benefit is twofold: First, it is possible to perform multiple experiments with minute amounts of sample and perform a statistical analysis of the data. Second, additional data validation experiments (e.g. quantitative RT-PCR) can be performed using the same physical sample. These benefits will further increase in the future when array based diagnostics become available.
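The figure of at least eight hybridisations follows from a short calculation, sketched here under the assumption used in the Results that 5% of total RNA is poly(A)+ RNA and taking an amplification factor of exactly 350 as a lower bound. The symbols are introduced for this illustration only.

```latex
\begin{align*}
m_{\text{poly(A)}^{+}} &= 1\,\mu\text{g} \times 0.05 = 50\,\text{ng} \\
m_{\text{aRNA}} &= 50\,\text{ng} \times 350 = 17.5\,\mu\text{g} \\
n_{\text{hyb}} &= \left\lfloor \frac{17.5\,\mu\text{g}}{2\,\mu\text{g}} \right\rfloor = 8
\end{align*}
```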
Despite the obvious advantages of the amplification method, systematic analyses of the impact of this enzymatic modification on the microarray gene expression results are scarce. From the analysis of the variability of the results, we conclude that it is vital to use equally treated samples for any particular experiment: If one sample requires amplification, all other samples should be amplified as well. Comparison of data from amplified and non-amplified RNA results in the widest log ratio distribution of all combinations of aRNA and total RNA samples. This may lead to the identification of too many false positive differentially expressed genes. The opposite effect was seen when amplified vs. amplified RNA was used: The log ratio plots are narrower when compared to total vs. total RNA hybridisations. Although this leads to a loss of sensitivity of the experiment, the criteria for the identification of differentially expressed genes are more stringent when aRNA rather than total RNA samples are used. RNA amplification does not introduce a systematic bias to the gene expression data: Microarray hybridisation results obtained with aRNA are comparable with those from total RNA. The relationship is linear over the whole dynamic range. Despite the linearity for most genes, however, one should consider the possibility that, due to sequence or structure-specific properties, a minority of genes have gene-specific amplification factors different to the rest. We cannot exclude the existence of such genes; however, we are not aware of a systematic study that has addressed this question down to the level of individual genes.
Figure 1. Comparison of generalised log ratios in hybridisations on 7,347-gene microarrays. Each boxplot (x-axis) characterises the distribution of the generalised log ratios (y-axis) of A) total RNA vs. total RNA and aRNA vs. aRNA from the same origin (duplicate hybridisations of Ta, Tb and Aa, Ab, respectively) and from different tissue sources (triplicate hybridisations of Tc, Td and Ac, Ad, respectively) and B) 15 microarray hybridisations from the same tissue sources using total RNA vs. aRNA (A) and 14 hybridisations using aRNA vs. aRNA (B and C).
RNA amplification reactions are very reproducible. We found high correlation coefficients between repeated microarray hybridisations indicating that there is no major bias due to the amplification reaction that affects the results obtained from separate amplification reactions. Thus, in the light of all other well known technologically caused sources of error in microarray experiments which have to be considered carefully when the experiments are designed, the error introduced by RNA amplification is small. The differentially expressed genes identified by hybridisation of aRNA vs. aRNA correspond well with those found by hybridisation of total RNA. For the differentially expressed genes with the 27% highest signal intensities, the correlation coefficient was 0.85.

Conclusions
We showed that the use of the T7 RNA polymerase based amplification prior to microarray experiments is well-grounded because no systematic bias is introduced. Although variations in differential expression between amplified and total RNA hybridisations can be observed, the amplifications are reproducible, and the results of aRNA correspond to those obtained with total RNA. The slightly lower sensitivity of aRNA to detect differential gene expression can be compensated by the use of additional technical improvements or biological repetitions coupled to statistical data analysis.
Figure 2. Correlation of normalised signal intensities. Averaged vsn-normalised signal intensities [quanta] of unamplified samples (x-axis) were compared to amplified ones (y-axis) for the Cy-5 channel. Fourteen microarray hybridisations were used (for details, see tables 2 and 3). A symmetric scatter around the identity line indicates an overall linear amplification. Similar results were obtained for the Cy-3 channel.

Figure 3. Reproducibility of RNA amplifications. The distribution (frequency, y-axis) is plotted against the gene specific log F values (x-axis). aRNA samples from four distinct amplification reactions were used in 12 separate microarray hybridisations.

Tissue samples and RNA isolation
Tissue samples: Intraperitoneal injection of the human breast cancer cell line MCF-7 into severe combined immunodeficiency (SCID) mice resulted in the growth of a solid tumour mass [16][17][18] varying in diameter from 1 to 7 mm. Six to nine weeks post injection, mice were sacrificed by application of CO2, and the tumour mass was immediately frozen in liquid nitrogen. Human placental tissue was frozen as soon as possible. Total cellular RNA was isolated with the Trizol method (TriFast, peqlab, Erlangen, Germany), following the manufacturer's instructions, after homogenisation with a Mikro-Dismembrator S (Braun Biotech, Melsungen, Germany). The quality of the RNA samples was checked with the Agilent 2100 Bioanalyzer (Agilent Technologies, Waldbronn, Germany).
Only high-quality RNA samples (ratio 28S/18S rRNA > 1.8) were selected for the experiments.

RNA amplification
The cDNA synthesis was performed using a cDNA Synthesis System Kit (Roche Diagnostics, Mannheim, Germany) according to the manufacturer's instructions. The ds cDNA was purified by phenol/chloroform extraction [19], precipitated in 5 M NH4OAc [19] and resuspended in 8 µl RNase-free water.
In vitro transcription [11] was performed using the AmpliScribe™ T7 High Yield Transcription Kit (Epicentre Technologies, Madison, WI, USA) according to the manufacturer's instructions with the exception of increasing the incubation time to 4 hours. The newly synthesized aRNA was extracted in phenol/chloroform/isoamyl alcohol as described above. Unincorporated dNTPs were removed by chromatography with MobiSpin S-300 columns (MoBiTec GmbH, Göttingen, Germany). Ethanol precipitation was performed as described in [19] and the resulting pellet was resuspended in 20-50 µl RNase-free water.

Microarray spotting
Glass slides used for this study carried 7,347 breast specific cDNA clones selected from the Human UniGene 1 clone set (German Resource Centre for Genome Research, Berlin, Germany). The clones were amplified by PCR and spotted in a "replicate array" design on silane-prep™ slides (Sigma Diagnostics, St. Louis, USA) to produce 14,694 individual spots. The slides were rehydrated, and the DNA was denatured with boiling water treatment prior to washing with 0.2% SDS, MilliQ-H2O, and 95% and 100% ethanol. After the washing procedure, the microarrays were dried with compressed air.

RNA labelling and hybridisation
2 µg amplified antisense RNA (aRNA) were mixed with 500 ng random hexamer primers, incubated at 70°C for 10 min and cooled on ice. Alternatively, 10 µg total RNA were mixed with 500 ng (dT)17 primer, incubated at 70°C for 10 min and cooled on ice. The labelling reaction was performed in a 12.5 µl reaction volume using 2.5 µl 5x RT buffer (Invitrogen, Karlsruhe, Germany),  [19] and 70 µl Cot1-DNA (10 ng/µl). The 30 µl probe was heat denatured at 65°C for 2 min and hybridised to cDNA glass microarrays in a hybridisation chamber (Corning Inc., Acton, USA) overnight at 37°C.
Figure 4. Correspondence of gene expression data between amplified and non-amplified samples. The averaged generalised log ratios of the total RNA hybridisations (x-axis) are plotted against those of the aRNA hybridisations (y-axis). Only the 2,000 highest intensity genes were considered.
After hybridisation the microarrays were washed at 25°C in a water bath with 1x SSC containing 0.1% SDS once for 15 min, and twice for 10 min. This was followed by washing twice with 0.1x SSC containing 0.1% SDS for 10 min. Washing steps were completed in 70% and 95% ethanol before the microarrays were dried.

Image quantification
The hybridised arrays were scanned with the GenePix 4000 B microarray scanner (Axon Instruments Inc., Union City, CA, USA), and the scanned images were analysed using GenePix Pro 4.0 software (Axon Instruments). Spot intensities were obtained by subtracting the median brightness of a region around the spot from the median of the region within.
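The background correction described above amounts to a difference of two medians. The sketch below illustrates it on made-up pixel values; how GenePix Pro delimits the spot and the surrounding region is not described here, so the two pixel lists are simply assumed to be given. The function name is hypothetical.

```python
import numpy as np


def spot_intensity(spot_pixels, surrounding_pixels):
    """Median brightness within the spot minus the median brightness
    of the region around it."""
    return float(np.median(spot_pixels) - np.median(surrounding_pixels))


# Made-up pixel brightness values for one spot and its surroundings.
spot = [1200, 1350, 1280, 1310, 1295]
surrounding = [210, 190, 205, 220, 200]
print(spot_intensity(spot, surrounding))  # 1295 - 205 = 1090.0
```

Note that the result can be negative for weak spots, which is one reason a plain logarithm is not applicable to such values without filtering.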

Data analysis
For each of the datasets α ... η (table 2), the spot intensities were calibrated and transformed by the VSN method [20]. Differences between the resulting values are referred to as "generalised log ratios". All 14,694 spots from each slide were analysed, without filtering or thresholding procedures.
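The notion of a generalised log ratio can be illustrated with a minimal sketch. It assumes the arsinh form of the variance-stabilising transformation, h(x) = arsinh(a + b·x), with placeholder calibration parameters; the VSN method [20] estimates these parameters from the data for each array and channel, and that estimation step is not reproduced here. Function and parameter names are hypothetical.

```python
import numpy as np


def glog(intensity, offset=0.0, scale=1.0):
    """Variance-stabilising transform h(x) = arsinh(offset + scale * x).

    offset and scale are placeholders for calibration parameters that
    would normally be estimated from the data.
    """
    return np.arcsinh(offset + scale * np.asarray(intensity, dtype=float))


def generalised_log_ratio(cy5, cy3, cy5_params=(0.0, 1.0), cy3_params=(0.0, 1.0)):
    """Difference of the transformed Cy-5 and Cy-3 intensities."""
    return glog(cy5, *cy5_params) - glog(cy3, *cy3_params)


# For high intensities the value approaches the natural log of the ratio.
print(generalised_log_ratio(20000.0, 10000.0), np.log(2.0))
# Unlike a plain log ratio, it stays finite for zero or negative
# background-corrected intensities.
print(generalised_log_ratio(-50.0, 30.0))
```

This behaviour is what allows all spots to be analysed without filtering or thresholding, as stated above.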
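The reproducibility of different amplification reactions was measured using the F statistic to compare the variance for each gene within and between groups of hybridisations using amplified samples. The F value is defined as the ratio of the between-groups and within-groups mean squares, $F_i = \mathrm{mSS}_B / \mathrm{mSS}_W$, where each mean square is defined as the sum of squares divided by the degrees of freedom:

$$\mathrm{mSS}_B = \frac{1}{K-1}\sum_{k=1}^{K} L_k\,(\bar{y}_{ik\cdot} - \bar{y}_{i\cdot\cdot})^2, \qquad \mathrm{mSS}_W = \frac{1}{N-K}\sum_{k=1}^{K}\sum_{l=1}^{L_k} (y_{ikl} - \bar{y}_{ik\cdot})^2,$$

given $\bar{y}_{ik\cdot} = \frac{1}{L_k}\sum_{l=1}^{L_k} y_{ikl}$ and $\bar{y}_{i\cdot\cdot} = \frac{1}{N}\sum_{k=1}^{K}\sum_{l=1}^{L_k} y_{ikl}$, where $y_{ikl}$ denotes the measured value of the $l$th repetition of the $k$th amplification reaction for RNA transcript $i$. $K$ denotes the number of all amplification reactions, $L_k$ the number of replicas in group $k$, and $N = \sum_{k=1}^{K} L_k$ the total number of hybridisations.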
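The per-gene F value can be sketched as a standard one-way analysis of variance, with the between-groups and within-groups sums of squares each divided by their degrees of freedom. The sketch uses simulated data with four groups of three replicates and no true group effect. The base of the logarithm is not stated in the text, so the natural logarithm used here is an assumption, and all names are hypothetical.

```python
import numpy as np


def per_gene_log_f(values, groups):
    """Natural log of the one-way ANOVA F value for each gene.

    values: array of shape (n_genes, n_hybridisations).
    groups: amplification-reaction label of each hybridisation.
    """
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n_total, n_groups = values.shape[1], labels.size
    grand_mean = values.mean(axis=1)
    ss_between = np.zeros(values.shape[0])
    ss_within = np.zeros(values.shape[0])
    for label in labels:
        block = values[:, groups == label]
        group_mean = block.mean(axis=1)
        ss_between += block.shape[1] * (group_mean - grand_mean) ** 2
        ss_within += ((block - group_mean[:, None]) ** 2).sum(axis=1)
    ms_between = ss_between / (n_groups - 1)       # K - 1 degrees of freedom
    ms_within = ss_within / (n_total - n_groups)   # N - K degrees of freedom
    return np.log(ms_between / ms_within)


# Simulated data: 500 genes, 4 amplification reactions x 3 replicates.
rng = np.random.default_rng(2)
simulated_values = rng.normal(0.0, 1.0, size=(500, 12))
reaction = np.repeat(np.arange(4), 3)
log_f = per_gene_log_f(simulated_values, reaction)
print(np.median(log_f))
```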

Authors' contributions
JS carried out the experiments and performed study design and data analysis with the help of AB and WH. PK, MH, and JV provided human tissue samples and performed the experiments concerning induction of human tumours in SCID mice. AP and HS initiated the study and supervised the data generation and data analysis. All authors read and approved the final manuscript.