Optimization and evaluation of T7 based RNA linear amplification protocols for cDNA microarray analysis

  • Hongjuan Zhao1,
  • Trevor Hastie2,
  • Michael L Whitfield3,
  • Anne-Lise Børresen-Dale4 and
  • Stefanie S Jeffrey1 (corresponding author)

            BMC Genomics 2002, 3:31

            DOI: 10.1186/1471-2164-3-31

            Received: 6 July 2002

            Accepted: 30 October 2002

            Published: 30 October 2002

            Abstract

            Background

            T7 based linear amplification of RNA is used to obtain sufficient antisense RNA for microarray expression profiling. We optimized and systematically evaluated the fidelity and reproducibility of different amplification protocols using total RNA obtained from primary human breast carcinomas and high-density cDNA microarrays.

            Results

            Using an optimized protocol, the average correlation coefficient of gene expression of 11,123 cDNA clones between amplified and unamplified samples is 0.82 (0.85 when a virtual array was created using repeatedly amplified samples to minimize experimental variation). Less than 4% of genes show changes in expression level by 2-fold or greater after amplification compared to unamplified samples. Most changes due to amplification are not systematic, either within one tumor sample or between different tumors. Amplification appears to dampen the variation of gene expression for some genes when compared to unamplified poly(A)+ RNA. The reproducibility between repeatedly amplified samples is 0.97 when performed on the same day, but drops to 0.90 when performed weeks apart. The fidelity and reproducibility of amplification are not affected by decreasing the amount of input total RNA in the 0.3–3 μg range. Adding template-switching primer, DNA ligase, or column purification of double-stranded cDNA does not improve the fidelity of amplification. The correlation coefficient between amplified and unamplified samples is higher when total RNA is used as template for both experimental and reference RNA amplification.

            Conclusion

            T7 based linear amplification reproducibly generates amplified RNA that closely approximates the original sample for gene expression profiling using cDNA microarrays.

            Background

            Gene expression profiling using complementary DNA (cDNA) microarrays is being applied for multiple purposes such as defining the taxonomy of different molecular subtypes of human breast and other cancers [1–10] and discovering biomarkers and therapeutic targets [11, 12]. A limitation of the use of this technology is that small specimens of human tissue, such as those obtained by core needle or fine needle aspiration (FNA) biopsies, may not be sufficient for microarray hybridization using direct labelling protocols. Typical microarray labelling procedures require 2–4 μg poly(A)+ RNA or 25–50 μg total RNA per cDNA microarray. This amount of poly(A)+ RNA or total RNA can be obtained from samples of human tissue that weigh more than 50–100 mg. However, core needle biopsies of breast cancers, for example, weigh in the 10–25 mg range and yield only 3–15 μg of total RNA. Small tumors identified using early detection strategies may thus be too small to excise a specimen with enough RNA for microarray analysis. A pilot study by Assersohn et al. [13] showed that only 15% of FNA samples from human breast cancers produced sufficient mRNA for expression array analysis. One approach to low specimen RNA input has been to use indirect labelling techniques to increase fluorescence signal intensity, such as with aminoallyl nucleotides. Although indirect labelling techniques are less expensive, we and other colleagues have found that they are not always as reliable as direct labelling methods. For valuable tumor specimens, reliability is paramount. A very recent report used amino C6dT-modified random hexamers to prime cDNA synthesis in conjunction with aminoallyl-dUTP and increased fluorescence intensity enough that as little as 1 μg of total RNA from cell lines gave sufficient signal for cDNA microarray hybridization [14]. The reliability of this method with human tumor specimens warrants further testing.

            RNA amplification techniques have been developed to address the need for sufficient RNA from tiny specimens for microarray hybridization. Other examples of specimens requiring amplification for genome-wide characterization of gene expression include purified populations of cells obtained by flow cytometry, laser capture microdissection, breast ductal or bronchial lavage, or microendoscopy. Although one group has used unamplified total RNA extracted from ~2 × 10^4 microdissected cells for hybridization on 5000 clone membrane-based arrays [15], most groups perform RNA amplification for this purpose [16–18], especially when using high-density slide-based arrays.

            The most commonly used mechanism for RNA amplification is a T7 based linear amplification method first developed by Van Gelder, Eberwine and coworkers [19–21]. This method utilizes a synthetic oligo(dT) primer containing the phage T7 RNA polymerase promoter to prime synthesis of first strand cDNA by reverse transcription of the poly(A)+ RNA component of total RNA. Second strand cDNA is synthesized by degrading the poly(A)+ RNA strand with RNase H, followed by second strand synthesis with E. coli DNA polymerase I. Amplified antisense RNA (aRNA) is obtained from in vitro transcription of the double-stranded cDNA (ds cDNA) template using T7 RNA polymerase. Several protocols based on this mechanism have been developed and used in microarray analyses [16, 22, 23, 24, 25, 26, 27, 28].

            In spite of the increasing use of T7 based linear amplification techniques in the study of human disease, systematic evaluation of the fidelity and the reproducibility of amplification mechanisms has been limited. Such information is important to determine how well the amplified sample resembles the unamplified sample and the validity of applying this technique to the study of human tissues. A study by Wang and Miller et al. [24] described a T7 based amplification protocol modified with a template-switching (TS) primer used to theoretically generate a full-length ds cDNA. The gene expression of amplified total RNA from melanoma cell lines hybridized to 2000 gene microarrays was compared to that of unamplified total or poly(A)+ RNA using cluster analysis and determining the number of outlier genes between single experiments. Approximately 3–6% of genes were discordant outliers when analyzed using 3-fold or greater expression ratios in at least one hybridization and compared between total RNA, mRNA, and different amounts of input aRNA. Hu and co-workers [27] compared amplified and unamplified samples using total RNA obtained from human glioma cell lines and 2300 clone microarrays (printed with duplicates on the same array) to evaluate a similar T7 based protocol with a TS mechanism adopted from Wang and Miller et al. [24]. Their results were based on nine microarray experiments and showed concordance between amplified and unamplified samples, verifying four expressed and two differentially expressed genes using Northern and Western blotting and immunohistochemical assay.

            Since there are multiple T7 based amplification protocols in use, questions remain regarding the effects of differences between these protocols and how these differences translate when applied to solid tumors rather than cell lines on a genome-wide scale. A study from Incyte Genomics [25] examined gene expression of kidney vs. placenta RNA and RNA from matched normal and tumor renal tissue using their own T7 based amplification kit (not employing a TS primer) and 9700 clone cDNA microarrays. They found that a differential expression ratio cut-off of greater than or equal to 2-fold produced excellent correlation between samples amplified with different amounts of input poly(A)+ RNA but that a 3-fold differential expression ratio threshold should be set for comparing ratios between amplified and unamplified mRNA. Decreasing the input of tissue lysate increased gene expression discordance between amplified and unamplified samples. As more human tissues were tested, single round amplification produced a 200- to 500-fold yield, lower than the 700-fold yield originally found in their study and lower than yields reported in amplification studies using cell cultures.

            The differences between several T7 based linear amplification protocols are mainly the following: 1. whether a template-switching mechanism is used in the synthesis of second strand cDNA, 2. what enzymes are used in the synthesis of second strand cDNA, 3. how ds cDNA is purified ("cleaned up") prior to in vitro transcription, and 4. how in vitro transcription is performed. Information regarding the effects of these differences on the fidelity or reproducibility of amplification should help eliminate both unnecessary procedures and those actually detrimental to amplification. Determination of the range of input total RNA necessary to achieve reasonable fidelity and reproducibility is crucial for researchers dealing with very small specimens of human tissue.

            To answer these questions and to define an optimal protocol for T7 based linear amplification, we carried out a series of amplification reactions under different conditions using total RNA isolated from primary human breast carcinomas. The amplified samples were compared to unamplified samples on high-density cDNA microarrays containing 41,931–42,602 clones. We evaluated the effects of TS primer in second strand cDNA synthesis, DNA ligase activity in second strand cDNA synthesis, column purification of ds cDNA, and in vitro transcription time on the fidelity and reproducibility of amplification and the yield of aRNA. The effect of diminishing amounts of input total RNA was also tested.

            Results and Discussion

            Variation in cDNA microarray analysis of gene expression using unamplified poly(A)+ RNA

            In order to assess the reproducibility of microarray hybridization using standard methods, poly(A)+ RNA was isolated from both primary breast carcinoma BC2 and Universal Human Reference total RNA (Stratagene®). The BC2 poly(A)+ RNA labelled with Cy5 and the reference poly(A)+ RNA labelled with Cy3 were hybridized on five 42,000 clone cDNA microarrays. 16,333 clones had a signal greater than 50% above background on all five arrays. Three hybridizations were done on the same day using arrays from the same print batch and the average correlation coefficient between any two hybridizations was 0.97 ± 0.01, demonstrating a high reproducibility between parallel hybridizations done on the same day. Another two hybridizations were performed using poly(A)+ RNA isolated from BC2 total RNA three months later and a different print batch of microarrays. The correlation coefficient between these two arrays was 0.95, similar to the average correlation coefficient of the first three arrays. However, when the unamplified poly(A)+ RNA arrays performed weeks apart from different print batches were compared, the average correlation coefficient dropped to 0.89 ± 0.01, indicating that experimental variations due to differences in microarray printing, poly(A)+ RNA isolation, and RNA labelling and hybridization contribute a small but detectable change in results.

            In order to minimize experimental variations, we created a virtual poly(A)+ RNA expression array that idealizes the gene expression of sample BC2 by using the average expression level of each clone over multiple hybridizations. By "expression level" we mean the normalized log (base2) ratio of signal intensities of Cy5 (experimental sample) to Cy3 (reference) fluorescence. The idealized expression profile from the poly(A)+ RNA virtual array is used as our unamplified "gold standard" for data analyses involving BC2. The correlation coefficient between each individual poly(A)+ RNA array and the gold standard ranges from 0.95–0.97, similar to that observed between hybridizations performed on the same day. The gold standard virtual array therefore represents well-measured gene expression in the primary tumor and minimizes individual experimental variations.
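
The construction of a virtual array described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function names and the intensity values are hypothetical, and the intensities are assumed to be already background-corrected and normalized.

```python
import math

def log2_ratio(cy5, cy3):
    """Expression level of one clone: log2 of Cy5 (sample) over Cy3 (reference)."""
    return math.log2(cy5 / cy3)

def virtual_array(replicates):
    """Average each clone's expression level over replicate hybridizations.

    `replicates` is a list of arrays; each array is a list of log2 ratios,
    one per clone, with clones in the same order on every array.
    """
    n_arrays = len(replicates)
    return [sum(values) / n_arrays for values in zip(*replicates)]

# Hypothetical (Cy5, Cy3) intensities for three clones on two arrays.
array_a = [log2_ratio(800, 400), log2_ratio(300, 300), log2_ratio(100, 400)]
array_b = [log2_ratio(1600, 400), log2_ratio(300, 600), log2_ratio(100, 400)]

gold_standard = virtual_array([array_a, array_b])
print(gold_standard)  # [1.5, -0.5, -2.0]
```

Averaging on the log2 scale treats a 2-fold increase and a 2-fold decrease symmetrically, which is why the ratio is log-transformed before the replicates are combined.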

            Template-switching does not affect the fidelity of amplification

            As previously mentioned, a T7 based linear amplification protocol published by Wang, Miller and coworkers [24] incorporated a TS mechanism [29] in the synthesis of second strand cDNA at the 5' end in order to generate full-length ds cDNA. This was speculated to be advantageous for the hybridization of unmapped sequences spotted on arrays and to enable higher temperature cDNA synthesis that would enhance sequence specificity. However, no experimental evidence was provided to support the idea that the TS mechanism increases the fidelity of amplification.

            To determine whether the addition of TS primer improves the fidelity of amplification, we compared gene expression profiles of aRNA amplified in the presence or absence of TS primer with expression profiles of unamplified poly(A)+ RNA. Total RNA isolated from primary breast carcinoma BC91 and reference total RNA were amplified with or without TS primer using the Wang-Miller protocol [24], except that aRNA was purified using an RNeasy® kit (Qiagen®). A virtual "gold standard" poly(A)+ RNA array was created for BC91 using the average expression level of four hybridization replicates of unamplified poly(A)+ RNA. A "virtual correlation coefficient" for a given amplification protocol was obtained by comparing the virtual amplified array (averaged expression level for each clone from multiple amplified samples) to the virtual gold standard unamplified array for BC91. To determine the correlation between individual amplified samples and the gold standard, an "average correlation coefficient" for a given amplification protocol was also calculated (the sum of correlation coefficients of individual amplified samples with the gold standard divided by the number of amplified samples tested for each condition).
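
The difference between the "virtual" and the "average" correlation coefficient can be made concrete with a short sketch. It assumes the correlation coefficient is the Pearson coefficient (the text does not name it), and all expression values are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def virtual_array(replicates):
    """Per-clone mean expression level over replicate arrays."""
    return [sum(values) / len(values) for values in zip(*replicates)]

def virtual_correlation(amplified_replicates, gold_standard):
    """Correlate the averaged (virtual) amplified array with the gold standard."""
    return pearson(virtual_array(amplified_replicates), gold_standard)

def average_correlation(amplified_replicates, gold_standard):
    """Correlate each amplified array with the gold standard, then average."""
    coefficients = [pearson(rep, gold_standard) for rep in amplified_replicates]
    return sum(coefficients) / len(coefficients)

# Hypothetical log2 expression levels for five clones.
gold_standard = [1.0, -0.5, 2.0, 0.0, -1.5]
amplified = [
    [1.4, -0.9, 1.6, 0.4, -1.1],
    [0.6, -0.1, 2.4, -0.4, -1.9],
]

print(f"virtual: {virtual_correlation(amplified, gold_standard):.2f}")
print(f"average: {average_correlation(amplified, gold_standard):.2f}")
```

In this toy data the replicate errors cancel when averaged, so the virtual coefficient exceeds the average coefficient. Real replicate noise cancels only partially, which is consistent with the virtual values in the tables being somewhat higher than the averages.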

            As shown in Table 1, using aRNA amplified from total RNA as reference, the expression profiles obtained in the absence of TS primer correlated with the gold standard slightly better than in the presence of TS primer, although the difference was not statistically significant. When poly(A)+ RNA rather than total RNA was amplified as reference, the correlation with the gold standard was slightly, but not statistically significantly, better with TS primer (Table 1). The correlation coefficient using aRNA amplified from total RNA as reference is higher than using aRNA amplified from poly(A)+ RNA as reference regardless of whether TS primer is used. This suggests that when total RNA from a tumor sample is amplified for microarray analysis, the reference RNA should also be amplified from total RNA.
            Table 1

            Correlation coefficients of amplified and unamplified expression levels of 14,044 genes selected according to the described criteria. Amplifications with or without TS primer and with two different ds cDNA cleanup protocols were performed on BC91 total RNA. Column headings give the column used for ds cDNA cleanup, followed by the reference RNA amplified.

            | Protocol | Coefficient | Bio-6, total RNA    | Bio-6, poly(A)+ RNA | G-50, total RNA     | G-50, poly(A)+ RNA  |
            |----------|-------------|---------------------|---------------------|---------------------|---------------------|
            | w/o TS   | Virtual     | 0.84 (n = 2)        | 0.77 (n = 4)        | 0.82 (n = 2)        | 0.80 (n = 4)        |
            | w/o TS   | Average     | 0.83 ± 0.01 (n = 2) | 0.73 ± 0.07 (n = 4) | 0.81 ± 0.00 (n = 2) | 0.76 ± 0.05 (n = 4) |
            | TS       | Virtual     | 0.81 (n = 2)        | 0.79 (n = 3)        | 0.77 (n = 1)        | 0.71 (n = 2)        |
            | TS       | Average     | 0.80 ± 0.05 (n = 2) | 0.74 ± 0.04 (n = 3) |                     | 0.68 ± 0.02 (n = 2) |

            The yield of aRNA amplified from BC91 total RNA using different protocols is shown in Table 2. Assuming 1% of the total RNA is poly(A)+ RNA, 253-fold and 370-fold amplification was observed in the presence and absence of TS primer, respectively, with Bio-6 cleanup. The yield of aRNA amplified from total RNA of cultured cell lines was 2- to 3-fold higher than the yield from primary tumor total RNA (data not shown).
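            Table 2

            Efficacy of amplification using 3 μg total RNA from BC91 and different ds cDNA cleanup methods, with or without TS primer. Column headings give the column used for ds cDNA cleanup, followed by the amplification protocol.

            |                       | Bio-6, TS         | Bio-6, w/o TS      | G-50, TS          | G-50, w/o TS       |
            |-----------------------|-------------------|--------------------|-------------------|--------------------|
            | Yield of aRNA (μg)    | 7.6 ± 1.0 (n = 5) | 11.1 ± 2.2 (n = 3) | 8.9 ± 3.8 (n = 6) | 10.0 ± 5.1 (n = 6) |
            | Fold of amplification | 253 ± 33          | 370 ± 73           | 297 ± 126         | 333 ± 170          |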
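
The fold-of-amplification figures follow directly from the aRNA yields and the stated assumption that 1% of total RNA is poly(A)+ RNA. The sketch below reproduces the arithmetic for the mean yields in Table 2; the function name is hypothetical.

```python
POLYA_FRACTION = 0.01  # assumption stated in the text: 1% of total RNA is poly(A)+ RNA

def fold_amplification(arna_yield_ug, input_total_rna_ug, polya_fraction=POLYA_FRACTION):
    """aRNA yield divided by the estimated poly(A)+ RNA in the input."""
    return arna_yield_ug / (input_total_rna_ug * polya_fraction)

# Mean aRNA yields (μg) from Table 2; input was 3 μg total RNA in every case.
mean_yields = {
    "Bio-6, TS": 7.6,
    "Bio-6, w/o TS": 11.1,
    "G-50, TS": 8.9,
    "G-50, w/o TS": 10.0,
}

for label, yield_ug in mean_yields.items():
    print(f"{label}: {fold_amplification(yield_ug, 3.0):.0f}-fold")
# Bio-6, TS: 253-fold
# Bio-6, w/o TS: 370-fold
# G-50, TS: 297-fold
# G-50, w/o TS: 333-fold
```

With 3 μg of input, the estimated poly(A)+ RNA template is 0.03 μg, so each microgram of aRNA corresponds to roughly 33-fold amplification.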

            These experiments demonstrate that the TS mechanism does not increase the fidelity of amplification, and therefore can be eliminated from the protocol. The limited effect of the TS mechanism on the correlation coefficients probably has two causes: 1) second strand cDNA synthesis primed by the TS primer represents only a small fraction of the total, while the majority of the synthesis is self-primed or primed by small pieces of RNA generated by RNase H; and 2) adding a few base pairs to the ds cDNA prior to in vitro transcription does not change the aRNA enough to affect the array hybridization.

            DNA ligase activity is not required for amplification

            Amplification protocols that do not include DNA ligase in second strand cDNA synthesis generate aRNA of the same length (ranging from 0.2 kb to 6 kb, data not shown) as that generated by a widely used T7 based amplification protocol developed by Affymetrix® [26], which uses E. coli DNA ligase. The correlation coefficient between amplified and unamplified samples and the yield of aRNA amplified without DNA ligase are high enough to suggest that DNA ligase activity is not necessary for RNA amplification in microarray analysis. For confirmation, we omitted DNA ligase from the protocol developed by Affymetrix® and compared the expression profiles of the resulting aRNA to aRNA obtained using the standard Affymetrix® protocol that includes DNA ligase. The correlation coefficient between amplified and unamplified samples is slightly higher in the absence of ligase (Table 3), supporting our previous conclusion that ligase is not required for total RNA amplification in cDNA microarray analysis. However, the yield of aRNA is higher when ligase is used, suggesting that ligase may play a role in improving the efficiency of amplification.
            Table 3

            Effect of DNA ligase on the fidelity of amplification (notes a, b).

            | Protocol                         | Affymetrix  | Affymetrix w/o ligase |
            |----------------------------------|-------------|-----------------------|
            | Number of amplifications         | 6           | 5                     |
            | Correlation coefficient, virtual | 0.84        | 0.86                  |
            | Correlation coefficient, average | 0.79 ± 0.04 | 0.82 ± 0.04           |
            | Yield of aRNA (μg)               | 24.1 ± 4.7  | 19.2 ± 5.9            |
            | Fold of amplification            | 803 ± 157   | 640 ± 197             |

            a: Data were obtained from comparing expression level of 13,783 clones using the described selection criteria. b: Input BC2 total RNA is 3 μg.

            Column cleanup of ds cDNA does not improve the fidelity of amplification, but decreases the yield of aRNA

            In the Wang-Miller protocol [24], ds cDNA is purified using a Bio-6 column (Bio-Rad). A drawback to this method is that the cDNA is eluted in a large volume and needs to be concentrated into a much smaller volume by lyophilization prior to in vitro transcription. This is a time-consuming step, especially when large numbers of samples are processed. To eliminate the lyophilization step from the protocol, we used an alternative column, the Sephadex™ G-50 column, to filter out free nucleotides from the ds cDNA after completion of the second strand synthesis reaction. The ds cDNA is then precipitated following phenol-chloroform extraction and re-suspended in a suitable volume for in vitro transcription. The correlation coefficient between amplified and unamplified samples and the yield of aRNA using this less time-consuming modification are similar to those obtained using the Wang-Miller protocol (Tables 1 and 2).

            We further explored the question of what effects the ds cDNA column cleanup step itself had on amplification. We amplified total RNA from tumor BC2, either with or without the cleanup step of Sephadex™ G-50. Seven amplifications were done on different dates with the Sephadex™ G-50 column and five amplifications were done without this cleanup step. Both the virtual and the average correlation coefficients using the column are slightly lower than without it (Table 4), suggesting that the column cleanup does not improve the fidelity of amplification. Moreover, the yield of aRNA is significantly higher without the column purification of ds cDNA, suggesting some loss of ds cDNA on the column. Since the column had a negative effect on amplification by decreasing the yield of aRNA without improving the fidelity of amplification, we eliminated this step from our protocol.
            Table 4

            Effect of column cleanup on the fidelity and yield of amplification (notes a, b).

            | Protocol                         | G-50 cleanup | w/o G-50 cleanup |
            |----------------------------------|--------------|------------------|
            | Number of amplifications         | 7            | 5                |
            | Correlation coefficient, virtual | 0.83         | 0.85             |
            | Correlation coefficient, average | 0.79 ± 0.03  | 0.81 ± 0.02      |
            | Yield of aRNA (μg)               | 11.9 ± 2.8   | 15.9 ± 2.7       |
            | Fold of amplification            | 397 ± 93     | 530 ± 90         |

            a: Data were obtained from comparing expression level of 12,305 clones using the described selection criteria. b: Input BC2 total RNA is 3 μg and amplification was done without TS primer.

            Effect of in vitro transcription time on the fidelity of RNA amplification

            To determine the effect of in vitro transcription time on amplification, duplicate reactions were performed at 37°C for 2, 3, 4, 5 and 6 hours. Two additional 5-hour incubation reactions were stored at 4°C overnight to determine the effect of low temperature incubation on amplification. The virtual correlation coefficient is slightly higher for the 5-hour incubation at 37°C (Figure 1A). However, in vitro transcription for 5 hours at 37°C plus overnight incubation at 4°C gives the highest yield of aRNA (Figure 1B). Since the yield of aRNA at any time point is sufficient for multiple hybridizations, we decided to use 5-hour incubation at 37°C for all subsequent amplifications.
            http://static-content.springer.com/image/art%3A10.1186%2F1471-2164-3-31/MediaObjects/12864_2002_Article_44_Fig1_HTML.jpg
            Figure 1

            Effects of in vitro transcription time on the fidelity of T7 based amplification and the yield of aRNA amplified from BC2 total RNA. Average correlation coefficients between amplified samples vs. unamplified poly(A)+ RNA at each time point are shown in A and average yields of aRNA from each time point in B.

            Evaluation of the fidelity of T7 based linear amplification protocols

            To systematically evaluate the fidelity of T7 based amplification, we compared the correlation coefficients obtained from four different protocols. The correlation coefficients of individual samples amplified using different protocols with the gold standard range from 0.74–0.86 (Figure 2). The scatter plots comparing the gene expression of the virtual amplified samples for each protocol with the unamplified gold standard are shown in Figure 3, with the virtual correlation coefficients ranging from 0.83–0.86. The differences in correlations obtained using different protocols are not statistically significant by Student's t-test, demonstrating that differences in gene expression for samples amplified using different protocols are minor.
            http://static-content.springer.com/image/art%3A10.1186%2F1471-2164-3-31/MediaObjects/12864_2002_Article_44_Fig2_HTML.jpg
            Figure 2

            Box graph of correlation coefficients of the gene expression levels for 11,123 clones, comparing individual amplified samples to the gold standard of BC2 (idealizing unamplified poly(A)+ RNA). Each closed circle represents the correlation coefficient for each individual sample amplified with a particular protocol to the gold standard. The average and virtual correlation coefficients of the replicate samples for each protocol are shown below the graph.

            http://static-content.springer.com/image/art%3A10.1186%2F1471-2164-3-31/MediaObjects/12864_2002_Article_44_Fig3_HTML.jpg
            Figure 3

            Scatterplot matrix using average expression ratios of multiple replicate amplifications for each protocol and the gold standard. The X-axis and Y-axis show virtual gene expression level [normalized log(base2) fluorescence intensity ratio of sample to reference averaged over multiple arrays] measured using aRNA amplified by different protocols or unamplified poly(A)+ RNA as labelled. The last column of plots shows each amplification protocol (Y-axis) vs. gold standard (X-axis). The correlation coefficient for each pair is listed in each plot. The orange and blue shaded regions indicate more than a two-fold difference between the virtual expression values for each protocol being compared.

            Our results also suggest that the level of bias introduced into gene expression profiling by amplification is relatively low. To assess the biases of amplification quantitatively, we calculated the number and percentage of genes whose expression level changed by 2-fold or 4-fold or greater after amplification. The biases of amplification by different protocols are similar. Specifically, less than 0.2% of 11,123 genes (12–15 genes) changed their expression level by 4-fold or greater and less than 6% (306–594 genes) changed expression by 2-fold or greater after amplification. With the Jeffrey lab protocol, less than 4% of genes showed changes in expression level by 2-fold or greater. Of the genes that changed, 7 genes and 139 genes changed their expression in the same direction in all four protocols by greater than 4-fold and 2-fold, respectively. Also, the virtual correlation coefficients between different protocols are high (average 0.95) (Figure 3), suggesting that slight differences in protocols based on the T7 linear amplification mechanism do not affect the correlation of amplified samples to unamplified samples. Taken together, these results indicate that aRNA provides a close approximation of the true expression profile of the original sample.
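
Because expression levels are log2 ratios, a 2-fold change corresponds to a difference of 1 and a 4-fold change to a difference of 2 between the amplified and unamplified values. A minimal sketch of the counting step, with a hypothetical function name and invented expression values:

```python
import math

def count_fold_changes(amplified, unamplified, fold):
    """Count clones whose log2 expression level differs by at least log2(fold).

    Returns the number of changed clones and their percentage of all clones.
    """
    threshold = math.log2(fold)
    changed = sum(
        1 for a, u in zip(amplified, unamplified) if abs(a - u) >= threshold
    )
    return changed, 100.0 * changed / len(amplified)

# Hypothetical virtual-array expression levels (log2 ratios) for five clones.
unamplified = [0.0, 1.0, -2.0, 3.0, 0.5]
amplified = [0.2, 2.5, -2.1, 0.5, 0.4]

print(count_fold_changes(amplified, unamplified, 2))  # (2, 40.0)
print(count_fold_changes(amplified, unamplified, 4))  # (1, 20.0)
```

The percentages quoted in the text are consistent with the gene counts: 594 of 11,123 clones is about 5.3%, and 15 of 11,123 is about 0.13%.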

            We present here a components-of-variance model for explaining the different sources of variation in the amplification protocols (see Statistical Appendix, additional file 1). The expression measurement for a gene for a specific array/protocol/sample can be broken down as

            X = Z + e, where

            X is the measured expression value

            Z is the "true" expression that does not change under replication, and

            e is the measurement error that does change under replication.

            While we cannot estimate Z and e directly, we can estimate their variances from the data. For the four different amplified protocols and the unamplified arrays, the relevant variances are estimated in Table 5. The variance of the true expression (Var Z) ranges from 0.623–0.661 for the amplified protocols and is 0.726 for the unamplified arrays. The variance of the measurement error (Var e) ranges from 0.055–0.102 for the amplified protocols and is 0.059 for the unamplified arrays. The estimates of Var Z were obtained by averaging the pairwise covariances of the replicates within each protocol, and the estimates of Var e by using the within-protocol variance. (While the variance of a collection of numbers measures how much they vary about their mean, the covariance of two sets of numbers measures how much they vary with respect to each other. See Statistical Appendix for more details). We notice that Var Z for unamplified poly(A)+ RNA is larger than all of the others, indicating a dampening effect on gene expression by amplification. Measurement error variance is lowest for the Jeffrey lab protocol.
            Table 5

            Variance of true gene expression (Var Z) and measurement error (Var e) for each of the different amplified protocols and the unamplified arrays.

            |       | Affymetrix | Affymetrix w/o ligase | Jeffrey lab with G-50 cleanup | Jeffrey lab | Poly(A)+ RNA |
            |-------|------------|-----------------------|-------------------------------|-------------|--------------|
            | Var Z | 0.661      | 0.658                 | 0.623                         | 0.661       | 0.726        |
            | Var e | 0.102      | 0.066                 | 0.078                         | 0.055       | 0.059        |
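
The two estimators described in the text can be sketched as follows. This is one reading of that description, not the procedure from the Statistical Appendix (which is not reproduced here): Var Z is taken as the mean covariance, across genes, between every pair of replicate arrays, and Var e as the mean across genes of the variance among replicates. The function names and data are hypothetical.

```python
from itertools import combinations
from statistics import mean, variance

def covariance(x, y):
    """Sample covariance between two equal-length lists."""
    mean_x, mean_y = mean(x), mean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)

def variance_components(replicates):
    """Estimate (Var Z, Var e) from replicate arrays of one protocol.

    `replicates` is a list of arrays, each a list of expression levels
    with genes in the same order.
    """
    # Errors are independent between replicates, so the covariance of two
    # replicates across genes estimates the variance of the shared part Z.
    var_z = mean(covariance(a, b) for a, b in combinations(replicates, 2))
    # Z is constant across replicates of one gene, so the spread among
    # replicates of that gene reflects measurement error only.
    var_e = mean(variance(gene_values) for gene_values in zip(*replicates))
    return var_z, var_e

# Hypothetical expression levels: three replicate arrays, five genes.
replicates = [
    [1.1, -0.9, 2.2, -2.1, 0.1],
    [0.9, -1.2, 1.9, -1.8, -0.1],
    [1.0, -1.0, 2.1, -2.0, 0.2],
]

var_z, var_e = variance_components(replicates)
print(f"Var Z ~ {var_z:.3f}, Var e ~ {var_e:.3f}")
```

Under the model X = Z + e, the covariance between replicates removes the error term, which is why it isolates Var Z rather than Var Z + Var e.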

            The covariances between the different Zs for the different methods (estimated from the virtual arrays) are shown in Table 6. The off-diagonal elements of Table 6 are the covariances; the diagonal elements are the variances of the virtual arrays. We notice that covariances among the amplified protocols, which range from 0.62 to 0.66, are higher than their covariances with the unamplified arrays, which range between 0.57 and 0.61. Furthermore, the variances of the amplified protocols (0.63 to 0.68) are lower than that of the unamplified arrays (0.74).
            Table 6

            Covariances between the "true" gene expression for the different amplification protocols, estimated from the virtual arrays. The diagonal of the table contains the variances for the techniques.

            |                               | Affymetrix | Affymetrix w/o ligase | Jeffrey lab with G-50 cleanup | Jeffrey lab | Poly(A)+ RNA |
            |-------------------------------|------------|-----------------------|-------------------------------|-------------|--------------|
            | Affymetrix                    | 0.68       | 0.66                  | 0.62                          | 0.63        | 0.60         |
            | Affymetrix w/o ligase         | 0.66       | 0.67                  | 0.62                          | 0.64        | 0.61         |
            | Jeffrey lab with G-50 cleanup | 0.62       | 0.62                  | 0.63                          | 0.64        | 0.57         |
            | Jeffrey lab                   | 0.63       | 0.64                  | 0.64                          | 0.67        | 0.60         |
            | Poly(A)+ RNA                  | 0.60       | 0.61                  | 0.57                          | 0.60        | 0.74         |

            This suggests the following further breakdown:

            Z = Z_c + Z_s, where

            Z_c is a common expression component, with variance about 0.6, and

            Z_s is a specific expression component, with variance about 0.04 for the amplified arrays and 0.14 for the unamplified arrays.
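
The approximate variances quoted for the common and specific components can be recovered from Table 6. The sketch below shows one reading that reproduces them, offered as an interpretation rather than the authors' stated calculation: the common variance is taken as the mean covariance between amplified and unamplified arrays, the amplified-specific variance as the extra covariance shared among amplified protocols, and the unamplified-specific variance as the remainder of the unamplified variance.

```python
from itertools import combinations
from statistics import mean

# Covariance matrix from Table 6; the last row and column are unamplified poly(A)+ RNA.
cov = [
    [0.68, 0.66, 0.62, 0.63, 0.60],  # Affymetrix
    [0.66, 0.67, 0.62, 0.64, 0.61],  # Affymetrix w/o ligase
    [0.62, 0.62, 0.63, 0.64, 0.57],  # Jeffrey lab with G-50 cleanup
    [0.63, 0.64, 0.64, 0.67, 0.60],  # Jeffrey lab
    [0.60, 0.61, 0.57, 0.60, 0.74],  # Poly(A)+ RNA
]
amplified = range(4)
unamplified = 4

# Component shared by every array: covariance of amplified with unamplified.
common = mean(cov[i][unamplified] for i in amplified)

# Component shared only among amplified arrays: their mutual covariance
# beyond what they already share with the unamplified arrays.
among_amplified = mean(cov[i][j] for i, j in combinations(amplified, 2))
specific_amplified = among_amplified - common

# Component found only in the unamplified arrays.
specific_unamplified = cov[unamplified][unamplified] - common

print(f"{common:.3f} {specific_amplified:.3f} {specific_unamplified:.3f}")
# 0.595 0.040 0.145
```

These values agree with the rounded figures in the text: about 0.6 for the common component, about 0.04 for the amplified arrays, and about 0.14 for the unamplified arrays.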

            Therefore, the amplified expression values for genes on an array are largely the same as on the unamplified array. The component of variation in which they differ appears to be common for all amplified protocols, and shows a much higher variance in the unamplified arrays. The effect of amplification can be summarized by saying that it has a dampening effect on the true expression of some genes (decreased variance in gene expression - see Figure 4).
            Figure 4

            Scatterplot of the t-statistics (the numerical score underlying a t-test) comparing the differences in gene expression between amplified and unamplified RNA for BC2 (X-axis) and BC91 (Y-axis). The tests were based on 7 amplified total RNA and 5 unamplified poly(A)+ RNA samples for BC2, and 2 amplified total RNA and 4 unamplified poly(A)+ RNA samples for BC91.
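Figure 4 plots one t-statistic per gene for each tumor. The text does not state which form of the t-statistic was used, so the sketch below assumes Welch's unequal-variance form. The function name `welch_t` and the log-ratio values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(amplified, unamplified):
    """Welch two-sample t-statistic for one gene.

    Each argument is a list of replicate measurements (at least two
    per group, because a sample variance is needed for each).
    """
    n_a, n_u = len(amplified), len(unamplified)
    standard_error = math.sqrt(
        variance(amplified) / n_a + variance(unamplified) / n_u
    )
    return (mean(amplified) - mean(unamplified)) / standard_error


# Hypothetical log ratios for a single gene in one tumor.
amplified_replicates = [1.0, 1.2, 0.8]
unamplified_replicates = [2.0, 2.2, 1.8]
print(welch_t(amplified_replicates, unamplified_replicates))
```

Computing this value for every gene in BC2 and again in BC91 would give the two coordinates of each point in the scatterplot. A negative value means the gene reads lower in the amplified samples.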

            A recent report compared amplified expression profiles of different primary breast tumors using Affymetrix® arrays [30]. Unger et al. found that correlation coefficients between the gene expression profiles of different tumors, measured with aRNA, ranged from 0.71 to 0.89. However, gene expression was not measured using unamplified poly(A)+ RNA from these tumors, raising the question of whether the fairly high correlation observed between diverse tumors was due to amplification bias. To answer this question, we compared the correlation coefficients between gene expression profiles of different tumors determined either with unamplified poly(A)+ RNA or with aRNA amplified using our protocol (Table 7). The correlation coefficient between BC2 and BC91 is the same, 0.55, whether measured using poly(A)+ RNA or aRNA. Moreover, the correlation between the amplified-versus-unamplified differences in gene expression for the two tumors is weak (Figure 4), suggesting that the genes altered by amplification depend on the sample rather than reflecting systematic changes from amplification. Our results demonstrate that different primary breast tumors are not closely related to each other in gene expression profiles, and that amplification does not affect the correlation of gene expression between different tumors. Amplification is therefore suitable for comparison of gene expression profiles among large sample sets.
            Table 7

            Correlation between expression levels of different tumors (BC2 and BC91) determined with both poly(A)+ RNA and aRNA for each tumor (a, b).

            Comparison                             | Virtual correlation coefficient | Number of arrays
            BC2 poly(A)+ RNA vs. BC91 poly(A)+ RNA | 0.55                            | 5 vs. 4
            BC2 aRNA vs. BC91 aRNA                 | 0.55                            | 2 vs. 2
            BC2 aRNA vs. BC2 poly(A)+ RNA          | 0.84                            | 2 vs. 5
            BC91 aRNA vs. BC91 poly(A)+ RNA        | 0.82                            | 2 vs. 4
            BC2 aRNA vs. BC91 poly(A)+ RNA         | 0.52                            | 2 vs. 4
            BC91 aRNA vs. BC2 poly(A)+ RNA         | 0.46                            | 2 vs. 5

            (a) Data were obtained from comparing expression level of 11,929 clones using the described selection criteria. (b) Data were obtained using the Jeffrey lab protocol with G-50 cleanup.

            Evaluation of reproducibility of T7 based linear amplification

            Another important aspect of RNA amplification is the degree of reproducibility. To evaluate this, we calculated the correlation coefficients between individual amplified samples. The correlation coefficients between individual hybridizations done on the same day averaged 0.97 for poly(A)+ RNA and 0.91–0.98 for aRNA amplified on the same day (Table 8). The correlation coefficients between individual hybridizations done on different days averaged 0.89 for poly(A)+ RNA and 0.85–0.90 for aRNA amplified from total RNA on different days. The reproducibility of our protocol is slightly better than that of the other protocols. Notably, when samples are amplified on the same day, the correlations are considerably higher than when samples are amplified on different days. In addition, samples amplified with protocols omitting ligase activity give higher reproducibility regardless of whether they are amplified on the same day or not.
            Table 8

            Evaluation of the reproducibility of T7 based amplification (a, b).

            Protocol                        | Affymetrix (c)      | Affymetrix w/o ligase | Jeffrey lab with G-50 | Jeffrey lab         | poly(A)+ RNA
            Average correlation coefficient
              Same day                      | 0.91 ± 0.04 (n = 3) | 0.98 (n = 2)          | 0.95 ± 0.01 (n = 3)   | 0.97 (n = 2)        | 0.97 ± 0.01 (n = 3)
              Different day                 | 0.84 ± 0.05 (n = 3) | 0.88 ± 0.05 (n = 3)   | 0.88 ± 0.03 (n = 4)   | 0.90 ± 0.03 (n = 3) | 0.89 ± 0.03 (n = 2)
              Overall                       | 0.86 ± 0.05 (n = 6) | 0.89 ± 0.05 (n = 5)   | 0.89 ± 0.03 (n = 7)   | 0.91 ± 0.03 (n = 5) | 0.92 ± 0.05 (n = 5)

            (a) Data were obtained from comparing expression level of 11,123 clones using the described selection criteria. (b) Amplification was done using 3 μg BC2 total RNA. (c) Note that hybridizations were performed on Stanford printed cDNA microarrays, not Affymetrix arrays.
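As a rough illustration of how an average correlation with a spread could be obtained from replicate hybridizations, the sketch below correlates every pair of replicate arrays and summarizes the results. This section does not state whether n in Table 8 counts arrays or pairwise comparisons, so the pair count returned here is an assumption, and the function names and expression values are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_reproducibility(arrays):
    """Mean and SD of Pearson r over all pairs of replicate arrays.

    Returns (mean_r, sd_r, number_of_pairs). The SD is reported as 0.0
    when only one pair exists, since it cannot be estimated.
    """
    rs = [pearson(a, b) for a, b in combinations(arrays, 2)]
    sd = stdev(rs) if len(rs) > 1 else 0.0
    return mean(rs), sd, len(rs)


# Hypothetical expression values for the same four clones on three arrays.
replicates = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 2.0, 3.0, 5.0],
]
mean_r, sd_r, n_pairs = pairwise_reproducibility(replicates)
print(f"{mean_r:.2f} ± {sd_r:.2f} ({n_pairs} pairs)")
```

Grouping the arrays by amplification date before calling `pairwise_reproducibility` would separate same-day from different-day comparisons.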

            The effect of the amount of input total RNA on amplification

            To determine the effect of the amount of input total RNA on single round amplification, we amplified different amounts of total RNA (3 μg, 1 μg, 300 ng, 100 ng, 30 ng and 10 ng), adjusting the amount of T7 primer to the quantity of input total RNA (Table 9). When the input total RNA is lower than 300 ng, the yield of aRNA is lower than the standard quantity required for one hybridization (3 μg). At amounts greater than or equal to 300 ng, the correlation coefficients between amplified and unamplified samples and among amplified samples remain about the same. The fold of amplification increases with smaller quantities of template RNA, but the absolute yield of aRNA decreases. Therefore, within the range of 0.3–3 μg total RNA, decreasing the input RNA does not affect the fidelity and reproducibility of amplification.
            Table 9

            The effect of the amount of template BC2 total RNA on the fidelity, reproducibility and yield of amplification (a).

            Input total RNA                             | 3 μg                | 1 μg                | 300 ng              | 100 ng             | 30 ng              | 10 ng
            T7 primer used (μg)                         | 0.5                 | 0.2                 | 0.1                 | 0.1                | 0.1                | 0.1
            Average correlation coefficient
              Amplified vs. gold standard (fidelity)    | 0.80 ± 0.04 (n = 4) | 0.81 ± 0.05 (n = 3) | 0.84 ± 0.05 (n = 2) | ND                 | ND                 | ND
              Amplified vs. amplified (reproducibility) | 0.92 ± 0.04 (n = 4) | 0.88 ± 0.04 (n = 3) | 0.90 (n = 2)        | ND                 | ND                 | ND
            Yield (μg)                                  | 15.2 ± 5.6 (n = 4)  | 7.4 ± 4.6 (n = 3)   | 3.1 ± 1.6 (n = 2)   | 0.60 ± 0.3 (n = 2) | 0.33 ± 0.2 (n = 3) | 0.11 ± 0.1 (n = 3)
            Fold of amplification                       | 507 ± 186           | 740 ± 460           | 1033 ± 530          | 600 ± 300          | 1100 ± 667         | 1100 ± 1000

            (a) Data were obtained from comparing expression levels of 13,164 clones using the described selection criteria. ND = not determined due to insufficient yield of aRNA for microarray hybridization after single round of amplification.
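The fold values in Table 9 are numerically consistent with dividing the aRNA yield by the estimated mRNA content of the input, taking mRNA as 1% of total RNA. That 1% figure is inferred from the table's numbers and is not stated in this section, so it should be read as an assumption. The sketch below recomputes the fold values on that basis and flags which inputs yield the 3 μg of aRNA needed for one hybridization.

```python
# Assumption inferred from Table 9, not stated in the text:
# mRNA makes up about 1% of total RNA.
MRNA_FRACTION = 0.01
# From the text: 3 μg aRNA is the standard quantity for one hybridization.
ARNA_PER_HYBRIDIZATION_UG = 3.0

# Input total RNA (μg) -> (mean aRNA yield in μg, reported fold), from Table 9.
TABLE_9 = {
    3.0: (15.2, 507),
    1.0: (7.4, 740),
    0.3: (3.1, 1033),
    0.1: (0.60, 600),
    0.03: (0.33, 1100),
    0.01: (0.11, 1100),
}


def fold_amplification(yield_ug, input_total_ug, mrna_fraction=MRNA_FRACTION):
    """aRNA yield divided by the estimated mRNA mass in the input."""
    return yield_ug / (input_total_ug * mrna_fraction)


def enough_for_one_array(yield_ug):
    """True when the yield covers one standard hybridization."""
    return yield_ug >= ARNA_PER_HYBRIDIZATION_UG


for input_ug, (yield_ug, reported) in TABLE_9.items():
    fold = fold_amplification(yield_ug, input_ug)
    print(
        f"{input_ug} μg input: fold ~ {fold:.0f} (reported {reported}), "
        f"enough for one array: {enough_for_one_array(yield_ug)}"
    )
```

Under this assumption the recomputed values agree with the reported ones to within rounding, and only the 3 μg, 1 μg and 300 ng inputs clear the 3 μg threshold, which matches the ND entries in the table.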

            All original microarray data may be accessed at the RNA Amplification for Microarrays website http://genome-www.stanford.edu/breast_cancer/amplification/.

            Conclusions

            In conclusion, T7 based linear amplification generates high fidelity aRNA for gene expression profiling using high-density cDNA microarrays. The average correlation coefficient between amplified and unamplified samples is 0.82 with less than 4% of genes showing changes in expression level by 2-fold or greater using the optimized (Jeffrey lab) protocol. The correlation to unamplified poly(A)+ RNA increases to 0.85 when experimental variability is minimized by configuring multiple amplified samples into a virtual array. Reproducibility between samples amplified with this technique is high, especially when performed on the same day rather than weeks apart. Amplification produces a dampening effect on gene expression variation.

            Methods

            Tissue acquisition

            Two primary breast carcinomas, BC2 and BC91, in which more than 90% of the breast epithelial cells were cancer cells, were chosen for experiments in this study. The specimens were frozen either in liquid nitrogen (BC2) or on dry ice (BC91) within 30 minutes of devascularization and stored at -80°C. Frozen sections were cut from primary breast carcinoma specimens and stained with hematoxylin and eosin to confirm tumor content.

            RNA preparation

            Total RNA was isolated from primary tumor tissue using TRIzol® solution (Invitrogen™) following homogenization using a PowerGen Model 125 (Fisher Scientific). Poly(A)+ RNA was isolated from total RNA with the FastTrack 2.0 kit (Invitrogen™). The concentration of total RNA and poly(A)+ RNA was determined using a GeneSpec I spectrophotometer (Hitachi) and the integrity of total RNA and poly(A)+ RNA was assessed using a 2100 Bioanalyzer (Agilent).

            RNA amplification

            The amplification of total RNA or poly(A)+ RNA was performed based on a previously described protocol [24] with our modifications.

            For first strand cDNA synthesis, 3 μg (unless otherwise specified) tumor total RNA, Universal Human Reference total RNA (Stratagene®), or 150 ng poly(A)+ RNA was mixed with 1 μg Eberwine primer (Operon®) in RNase-free water to a total volume of 9 μl. The RNA/primer mixture was denatured at 70°C for 3 min and cooled on ice for 2 min, followed by adding: 4 μl of 5X first strand buffer (Invitrogen™), 2 μl 0.1 M DTT, 1 μl RNasin® (40 U/μl, Promega™), 2 μl 10 mM dNTP, and 2 μl Superscript™ II (200 U/μl, Invitrogen™), and incubated at 42°C for 1.5 hours.

            Second strand cDNA synthesis was performed by mixing the first strand synthesis reaction with 106 μl RNase-free water, 15 μl 10X Advantage™ PCR buffer (Clontech), 3 μl 10 mM dNTP mix, 3 μl Advantage™ cDNA polymerase mix (Clontech), and 1 μl RNase H (2 U/μl, Invitrogen™). The reaction was incubated at 37°C for 5 min to digest RNA, followed by 94°C for 2 min to activate the Advantage™ cDNA polymerase, 65°C for 1 min to prime and 75°C for 30 min to extend the second strand cDNA. The reaction was stopped by the addition of 7.5 μl 1 M NaOH/2 mM EDTA and incubated at 65°C for 10 min.

            ds cDNA was extracted with an equal volume of phenol:chloroform:isoamyl alcohol (25:24:1), transferred to a Phase Lock Gel™ tube (Eppendorf) and centrifuged at 16,000 g for 5 min. The ds cDNA (aqueous layer) was transferred to a new tube and precipitated by adding 1 μl linear acrylamide (0.1 μg/μl), 70 μl 7.5 M NH4Ac and 1 ml 200 proof ethanol, and centrifuged at 16,000 g for 20 min at room temperature. The pellet was washed in 500 μl 75% ethanol, centrifuged at 16,000 g for 5 min, air dried and resuspended in 16 μl RNase-free water.

            In vitro transcription of ds cDNA was performed using a T7 MEGAscript™ kit (Ambion®). Four microliters of each 75 mM NTP, 4 μl of 10X reaction buffer and 4 μl of T7 polymerase mix were added to the 16 μl of ds cDNA. The reaction was then carried out at 37°C for 5 hours. aRNA was cleaned up using an RNeasy® mini kit (Qiagen®) as described by the manufacturer.
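The reaction volumes given above can be tallied to check the final reaction sizes and working concentrations. The sketch below is a bookkeeping aid only: the component volumes are taken from the text, the four NTPs are listed individually on the assumption that "each 75 mM NTP" means ATP, CTP, GTP and UTP, and the helper names are hypothetical.

```python
# Component volumes in μl, as listed in the protocol text.
FIRST_STRAND_UL = {
    "RNA + primer in water": 9,
    "5X first strand buffer": 4,
    "0.1 M DTT": 2,
    "RNasin": 1,
    "10 mM dNTP": 2,
    "Superscript II": 2,
}
SECOND_STRAND_UL = {
    "first strand reaction": 20,
    "RNase-free water": 106,
    "10X Advantage PCR buffer": 15,
    "10 mM dNTP mix": 3,
    "Advantage cDNA polymerase mix": 3,
    "RNase H": 1,
}
# Assumption: one 4 μl addition for each of ATP, CTP, GTP and UTP.
IN_VITRO_TRANSCRIPTION_UL = {
    "ds cDNA": 16,
    "75 mM ATP": 4,
    "75 mM CTP": 4,
    "75 mM GTP": 4,
    "75 mM UTP": 4,
    "10X reaction buffer": 4,
    "T7 polymerase mix": 4,
}


def mix_volume(components):
    """Total volume of a reaction mix in μl."""
    return sum(components.values())


def final_concentration(stock, added_ul, total_ul):
    """Concentration after dilution, in the same unit as the stock."""
    return stock * added_ul / total_ul


ivt_total = mix_volume(IN_VITRO_TRANSCRIPTION_UL)
print("first strand:", mix_volume(FIRST_STRAND_UL), "μl")
print("second strand:", mix_volume(SECOND_STRAND_UL), "μl")
print("in vitro transcription:", ivt_total, "μl")
print("each NTP:", final_concentration(75.0, 4, ivt_total), "mM")
```

The totals come to 20 μl, 148 μl and 40 μl, so the 5X and 10X buffers end up at or very close to 1X, and each NTP is at 7.5 mM in the transcription reaction.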

            The optimized version of the amplification protocol may also be downloaded from http://www.stanford.edu/group/sjeffreylab/.

            RNA labelling and hybridization

            Three micrograms of aRNA (unless otherwise specified) or 2 μg poly(A)+ RNA were labelled either with Cy5-dUTP (experimental sample) or Cy3-dUTP (reference) in a 30.4 μl reaction. RNA was first mixed with either 8 μg of random primer for aRNA or 5 μg of oligo(dT) primer for poly(A)+ RNA in 16 μl of RNase-free water. The RNA/primer mix was incubated at 70°C for 10 min and cooled on ice for 2 min. The following reagents were added: 6 μl of 5X first strand buffer, 3 μl 0.1 M DTT, 0.7 μl 50X dNTP (25 mM dATP, dCTP, dGTP and 10 mM dTTP), 3 μl 1 mM Cy3-dUTP or Cy5-dUTP and 1.7 μl Superscript™ II (200 U/μl). The labelling reaction was carried out at 42°C for 2 hours, with 1 μl Superscript™ II added to the reaction at the end of the first hour. The input RNA was hydrolyzed by adding 15 μl 0.1 M NaOH/2 mM EDTA and incubating at 65°C for 8 min, followed by neutralization with 15 μl 0.1 M HCl. The Cy5 and Cy3 labelled probes were combined and purified in a Microcon® YM-30 column (Millipore) by washing three times with Tris-EDTA buffer. 15 μg Human Cot-1 DNA was added to the probe before the first wash. The purified probe was adjusted to a total volume of 26 μl and mixed with 5.3 μl 20X saline-sodium citrate (SSC), 1 μl yeast tRNA (10 μg/μl), 2 μl poly(A) DNA (10 μg/μl), and 0.6 μl 10% sodium dodecyl sulfate (SDS). The resulting 35 μl probe solution was denatured at 95°C for 2 min and then incubated at 42°C for 25 min. The probe was then hybridized to cDNA arrays at 65°C for 14–18 hours. Depending on the print batch, the arrays contained from 42,772 to 43,915 spots (41,931–42,602 distinct clones representing 16,907–18,417 named genes, 3946–4145 ESTs with known functions and 19,369–21,384 ESTs with unknown functions), and were manufactured as previously described [31–33]. Following hybridization, the arrays were washed with 2X SSC with 0.05% SDS once for 2 min at room temperature, 1X SSC for 2 min at room temperature, and 0.2X SSC three times for 1 min at 45–50°C.
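As a quick consistency check on the labelling and hybridization mixes, the listed volumes can be summed and the final SSC and SDS concentrations in the probe solution estimated. The sketch below uses only the volumes given in the text; the helper names are hypothetical, and the 1 μl of Superscript II added after the first hour is left out because the stated 30.4 μl refers to the starting reaction.

```python
# Component volumes in μl, as listed in the protocol text.
LABELLING_UL = {
    "RNA + primer in water": 16.0,
    "5X first strand buffer": 6.0,
    "0.1 M DTT": 3.0,
    "50X dNTP": 0.7,
    "1 mM Cy3-dUTP or Cy5-dUTP": 3.0,
    "Superscript II": 1.7,
}
PROBE_UL = {
    "purified probe": 26.0,
    "20X SSC": 5.3,
    "yeast tRNA": 1.0,
    "poly(A) DNA": 2.0,
    "10% SDS": 0.6,
}


def mix_volume(components):
    """Total volume of a mix in μl."""
    return sum(components.values())


def diluted(stock, added_ul, final_ul):
    """Concentration after dilution, in the same unit as the stock."""
    return stock * added_ul / final_ul


probe_total = mix_volume(PROBE_UL)
print(f"labelling reaction: {mix_volume(LABELLING_UL):.1f} μl")
print(f"probe solution: {probe_total:.1f} μl")
print(f"final SSC: {diluted(20.0, 5.3, probe_total):.2f}X")
print(f"final SDS: {diluted(10.0, 0.6, probe_total):.2f}%")
```

The labelling components sum to the stated 30.4 μl, and the probe components sum to 34.9 μl, which the text rounds to 35 μl. That puts the hybridization mix at roughly 3X SSC and 0.17% SDS.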

            Imaging and data analysis

            The arrays with hybridized probes were scanned using an Axon scanner. The scanned images were first analyzed using GenePix® Pro 3.0 software (Axon Instruments), and spots judged to be of poor quality by visual inspection were removed from further analysis. The resulting data collected from each array were submitted to the Stanford Microarray Database (SMD, http://genome-www5.stanford.edu/microarray/SMD) [34]. A total of 97 arrays were submitted (60 experiments done with BC2 and 37 experiments performed with BC91). Only features with a signal intensity >50% above background in both Cy5 and Cy3 channels for all of the samples included in a particular analysis were retrieved from SMD. Pearson's correlation coefficient was calculated using Microsoft® Excel 2000. A components of variance model was used to explain different sources of variation in the amplification protocols.
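The feature selection rule described above can be sketched as a filter applied across all arrays in an analysis. This is an illustration under stated assumptions, not the SMD retrieval itself: ">50% above background" is read here as intensity greater than 1.5 times the local background, and the field names (`cy5`, `cy5_bg`, `cy3`, `cy3_bg`), clone IDs and intensities are hypothetical.

```python
def passes_filter(spot, min_ratio=1.5):
    """True when both channels exceed min_ratio times their background.

    Assumption: ">50% above background" means intensity > 1.5 x background.
    """
    return all(
        spot[channel] > min_ratio * spot[channel + "_bg"]
        for channel in ("cy5", "cy3")
    )


def shared_features(arrays):
    """Clone IDs that pass the filter on every array in the analysis.

    Each array is a dict mapping clone ID to a spot dict. A clone that
    is missing from any array is excluded.
    """
    keep = None
    for array in arrays:
        passing = {cid for cid, spot in array.items() if passes_filter(spot)}
        keep = passing if keep is None else keep & passing
    return sorted(keep or [])


# Hypothetical intensities for three clones on two arrays.
array_1 = {
    "clone_1": {"cy5": 300, "cy5_bg": 100, "cy3": 400, "cy3_bg": 100},
    "clone_2": {"cy5": 120, "cy5_bg": 100, "cy3": 400, "cy3_bg": 100},
    "clone_3": {"cy5": 500, "cy5_bg": 100, "cy3": 200, "cy3_bg": 100},
}
array_2 = {
    "clone_1": {"cy5": 350, "cy5_bg": 100, "cy3": 380, "cy3_bg": 100},
    "clone_3": {"cy5": 500, "cy5_bg": 100, "cy3": 140, "cy3_bg": 100},
}
print(shared_features([array_1, array_2]))
```

Because a feature must pass on every array, the number of retained clones depends on which arrays are included, which may be why the clone counts differ between the tables. Correlation coefficients would then be computed only over the retained clones.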

            Declarations

            Acknowledgements

            We are grateful to Drs. David Botstein and Patrick O. Brown for helpful discussions and Susan Overholser for her invaluable assistance in the preparation of this manuscript. This work was supported by NIH/NCI Grant U01 CA85129 and California Breast Cancer Research Program Grant 5JB-0126. M.L.W. is supported by a National Research Service award from the National Human Genome Research Institute and by funds from the Scleroderma Research Foundation. S.S.J.'s website is Stefanie Jeffrey Lab http://www.stanford.edu/group/sjeffreylab/.

            Authors’ Affiliations

            (1)
            Department of Surgery, Medical School Lab-Surge Bldg P214, Stanford University
            (2)
            Department of Statistics, Sequoia Hall, Stanford University
            (3)
            Department of Genetics, CCSR 2260, Stanford University
            (4)
            Department of Genetics, Norwegian Radium Hospital, University of Oslo

            References

            1. Perou CM, Jeffrey SS, van de Rijn M, Rees CA, Eisen MB, Ross DT, Pergamenschikov A, Williams CF, Zhu SX, Lee JC, et al.: Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc Natl Acad Sci USA 1999, 96:9212–9217.
            2. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, et al.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999, 286:531–537.
            3. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, et al.: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000, 403:503–511.
            4. Perou CM, Sørlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, et al.: Molecular portraits of human breast tumours. Nature 2000, 406:747–752.
            5. Sørlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, van de Rijn M, Jeffrey SS, et al.: Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA 2001, 98:10869–10874.
            6. Hedenfalk I, Duggan D, Chen Y, Radmacher M, Bittner M, Simon R, Meltzer P, Gusterson B, Esteller M, Kallioniemi OP, et al.: Gene expression profiles in hereditary breast cancer. N Engl J Med 2001, 344:539–548.
            7. Garber ME, Troyanskaya OG, Schluens K, Petersen S, Thaesler Z, Pacyna-Gengelbach M, van de Rijn M, Rosen GD, Perou CM, et al.: Diversity of gene expression in adenocarcinoma of the lung. Proc Natl Acad Sci USA 2001, 98:13784–13789.
            8. Hegde P, Qi R, Gaspard R, Abernathy K, Dharap S, Earle-Hughes J, Gay C, Nwokekeh NU, Chen T, Saeed AI, et al.: Identification of tumor markers in models of human colorectal cancer using a 19,200-element complementary DNA microarray. Cancer Res 2001, 61:7792–7797.
            9. van't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, et al.: Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002, 415:530–536.
            10. Jeffrey SS, Fero MJ, Børresen-Dale A-L, Botstein D: Expression array technology: applications for the diagnosis and treatment of breast cancer. Mol Interv 2002, 2:101–109.
            11. Debouck C, Metcalf B: The impact of genomics on drug discovery. Annu Rev Pharmacol Toxicol 2000, 40:193–207.
            12. Clarke PA, te Poele R, Wooster R, Workman P: Gene expression microarray analysis in cancer biology, pharmacology, and drug development: progress and potential. Biochem Pharmacol 2001, 62:1311–1336.
            13. Assersohn L, Gangi L, Zhao Y, Dowsett M, Simon R, Powles TJ, Liu ET: The feasibility of using fine needle aspiration from primary breast cancers for cDNA microarray analyses. Clin Cancer Res 2002, 8:794–801.
            14. Xiang CC, Kozhich OA, Chen M, Inman JM, Phan QP, Chen Y, Brownstein MJ: Amine-modified random primers to label probes for DNA microarrays. Nat Biotechnol 2002, 20:738–742.
            15. Sgroi DC, Teng S, Robinson G, LeVangie R, Hudson JR Jr, Elkahloun AG: In vivo gene expression profile analysis of human breast cancer progression. Cancer Res 1999, 59:5656–5661.
            16. Luo L, Salunga RC, Guo H, Bittner A, Joy KC, Galindo JE, Xiao H, Rogers KE, Wan JS, Jackson MR, et al.: Gene expression profiles of laser-captured adjacent neuronal subtypes. Nat Med 1999, 5:117–122.
            17. Ohyama H, Zhang X, Kohno Y, Alevizos I, Posner M, Wong DT, Todd R: Laser capture microdissection-generated target sample for high-density oligonucleotide array hybridization. Biotechniques 2000, 29:530–536.
            18. Luzzi V, Holtschlag V, Watson MA: Expression profiling of ductal carcinoma in situ by laser capture microdissection and high-density oligonucleotide arrays. Am J Pathol 2001, 158:2005–2010.
            19. Van Gelder RN, von Zastrow ME, Yool A, Dement WC, Barchas JD, Eberwine JH: Amplified RNA synthesized from limited quantities of heterogeneous cDNA. Proc Natl Acad Sci USA 1990, 87:1663–1667.
            20. Eberwine J, Yeh H, Miyashiro K, Cao Y, Nair S, Finnell R, Zettel M, Coleman P: Analysis of gene expression in single live neurons. Proc Natl Acad Sci USA 1992, 89:3010–3014.
            21. Phillips J, Eberwine JH: Antisense RNA amplification: a linear amplification method for analyzing the mRNA population from single living cells. Methods 1996, 10:283–288.
            22. Wodicka L, Dong H, Mittmann M, Ho MH, Lockhart DJ: Genome-wide expression monitoring in Saccharomyces cerevisiae. Nat Biotechnol 1997, 15:1359–1367.
            23. Mahadevappa M, Warrington J: A high-density probe array sample preparation method using 10- to 100-fold fewer cells. Nat Biotechnol 1999, 17:1134–1136.
            24. Wang E, Miller LD, Ohnmacht GA, Liu ET, Marincola FM: High-fidelity mRNA amplification for gene profiling. Nat Biotechnol 2000, 18:457–459.
            25. Pabón C, Modrusan Z, Ruvolo MV, Coleman IM, Daniel S, Yue H, Arnold LJ Jr, Reynolds MA: Optimized T7 amplification system for microarray analysis. Biotechniques 2001, 31:874–879.
            26. Affymetrix GeneChip® Expression Analysis Technical Manual [http://www.affymetrix.com/Download/manuals/expression_manual.pdf]
            27. Hu L, Wang J, Baggerly K, Wang H, Fuller GN, Hamilton SR, Coombes KR, Zhang W: Obtaining reliable information from minute amounts of RNA using cDNA microarrays. BMC Genomics 2002, 3:16.
            28. Sotiriou C, Powles TJ, Dowsett M, Jazaeri AA, Feldman AL, Assersohn L, Gadisetti C, Libutti SK, Liu ET: Gene expression profiles derived from fine needle aspiration correlate with response to systemic chemotherapy in breast cancer. Breast Cancer Res 2002, 4:R3.
            29. Matz M, Shagin D, Bogdanova E, Britanova O, Lukyanov S, Diatchenko L, Chenchik A: Amplification of cDNA ends based on template-switching effect and step-out PCR. Nucleic Acids Res 1999, 27:1558–1560.
            30. Unger MA, Rishi M, Clemmer VB, Hartman JL, Keiper EA, Greshock JD, Chodosh LA, Liebman MN, Weber BL: Characterization of adjacent breast tumors using oligonucleotide microarrays. Breast Cancer Res 2001, 3:336–341.
            31. Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995, 270:467–470.
            32. Shalon D, Smith SJ, Brown PO: A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization. Genome Res 1996, 6:639–645.
            33. DeRisi JL, Iyer VR, Brown PO: Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 1997, 278:680–686.
            34. Sherlock G, Hernandez-Boussard T, Kasarskis A, Binkley G, Matese JC, Dwight SS, Kaloper M, Weng S, Jin H, Ball CA, et al.: The Stanford Microarray Database. Nucleic Acids Res 2001, 29:152–155.

            Copyright

            © Zhao et al 2002

            This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
