A comparative analysis of data generated using two different target preparation methods for hybridization to high-density oligonucleotide microarrays

To generate specific transcript profiles, one must isolate homogenous cell populations using techniques that often yield small amounts of RNA, requiring researchers to employ RNA amplification methods. The data generated by using these methods must be extensively evaluated to determine any technique dependent distortion of the expression profiles. High-density oligonucleotide microarrays were used to perform experiments for comparing data generated by using two protocols, an in vitro transcription (IVT) protocol that requires 5 μg of total RNA and a double in vitro transcription (dIVT) protocol that requires 200 ng of total RNA for target preparation from RNA samples extracted from a normal and a cancer cell line. In both cell lines, about 10% more genes were detected with IVT than with dIVT. Genes were filtered to exclude those that were undetected on all arrays. Hierarchical clustering using the 9,482 genes that passed the filter showed that the variation attributable to biological differences between samples was greater than that introduced by differences in the protocols. We analyzed the behavior of these genes separately for each protocol by using a statistical model to estimate the posterior probability of various levels of fold change. At each level, more differentially expressed genes were detected with IVT than with dIVT. When we checked for genes that had a posterior probability greater than 99% of fold change greater than 2, in data generated by IVT but not dIVT, more than 60% of these genes had posterior probabilities greater than 90% in data generated by dIVT. Both protocols identified the same functional gene categories to be differentially expressed. Differential expression of selected genes was confirmed using quantitative real-time PCR. Using nanogram quantities on total RNA, the usage of dIVT protocol identified differentially expressed genes and functional categories consistent with those detected by the IVT protocol. There was a loss in sensitivity of about 10% when detecting differentially expressed genes using the dIVT protocol. However, the lower amount of RNA required for this protocol, as compared to the IVT protocol, renders this methodology a highly desirable one for biological systems where sample amounts are limiting.


Background
High throughput DNA microarray technology has proved to be a powerful approach for gene expression profiling in various cellular systems and has played a significant role in the molecular classification of cancer [1][2][3][4][5]. To generate a meaningful transcript profiling pattern specific to a desired cell type, it is essential to isolate a homogenous cell population using techniques such as cell sorting or laser capture microdissection [6]. However, such techniques yield low amounts of RNA, usually insufficient to perform DNA microarray experiments. In such cases, it is often necessary to employ RNA amplification methods to generate the microgram quantities of RNA required to perform these experiments. The use of RNA amplification methods warrants a thorough analysis and understanding of the variations introduced due to the methodology employed. It is vital to be able to distinguish between the real effects of the biological system being analyzed and changes introduced due to a difference in the methods used to generate the data.
Here we present a comparative analysis of the data generated using two different target preparation techniques for hybridization to high-density oligonucleotide microarrays (U95Av2 GeneChips, Affymetrix, Santa Clara, CA; [7]). We have performed transcription-profiling experiments with samples extracted from normal human tracheobronchial epithelial cells (NHTBE; Clonetics, San Diego, CA) and human pulmonary mucoepidermoid carcinoma cells (NCI-H292; American Type Culture Collection, Manassas, VA). These profiling experiments used U95Av2 GeneChips (Affymetrix, Santa Clara, CA) and two different methods for target preparation, namely, a standard protocol (involving in vitro transcription, IVT; [7,8]) and an amplification protocol (involving double in vitro transcription, dIVT; [7,9]). In the IVT protocol, 5-40 µg of total RNA is reverse-transcribed to generate cDNA using an oligo-dT primer containing the T7 promoter sequence. The cDNA is converted into double stranded DNA using random hexamers and transcribed in vitro in the presence of biotinylated ribonucleotides to generate biotin-labeled complementary RNA (cRNA). The cRNA is fragmented and hybridized to the arrays. The dIVT protocol, which requires only 50-250 ng of total RNA, is a modification of the IVT protocol where unlabeled cRNA is first synthesized followed by a second round of reverse transcription to generate cDNA and in vitro transcription to synthesize biotin-labeled cRNA.
Our results showed that the variation attributable to biological differences between samples was greater than that introduced by differences in the protocols used for target preparation. To identify differentially expressed genes between the two cell lines, we compared the posterior probability of fold change exceeding various thresholds in either protocol. When we checked for genes that had a posterior probability greater than 99% of fold change greater than 2, in data generated by IVT but not dIVT, more than 60% of these genes had posterior probabilities greater than 90% in data generated by dIVT. Moreover, both protocols identified many genes in the same functional categories to be differentially expressed. Although the dIVT protocol is less sensitive than the IVT (a 10% loss of detection in the number of genes whose expression levels could be measured), the requirement of much lesser amount of sample required for dIVT protocol (20-100 times less than IVT protocol), makes it a highly desirable methodology for target preparation, when sample amounts are limiting.

Results and Discussion
In order to be able to perform DNA microarray experiments using limited amounts of starting samples, several studies in the recent years have used amplification methods to generate sufficient RNA for these experiments. In these cases, it is vital to understand the relative effects of amplification methods on the results generated by these studies. The goal of this study is to perform a comparative analysis of the data generated using two different target preparation techniques for hybridization to high-density oligonucleotide microarrays (U95Av2 GeneChips, Affymetrix, Santa Clara, CA; [7]), an in vitro transcription (IVT; [7,8]) protocol that requires 5 µg of total RNA and a double in vitro transcription amplification protocol (dIVT; [7,9]). Total RNA samples extracted from 2 cell lines, normal human tracheobronchial epithelial cells (NHTBE; Clonetics, San Diego, CA) and human pulmonary mucoepidermoid carcinoma cells (NCI-H292; American Type Culture Collection, Manassas, VA) were used for target preparation using IVT and dIVT protocols. The data set generated, consisted of 2 replicates per cell line per protocol used, resulting in a total of eight experiments that were used to investigate both cell type-based (normal and cancer) and protocol-based effect (IVT and dIVT) on the final results.

Quality Control
As an initial quality control measure to check for variations introduced by differences in target preparation, labeling, hybridization, and handling of individual samples, we compared the data reports for all the arrays as generated by MAS 5.0 [10]. The background levels and the range of percentage of probe sets called present (25-48%) were within acceptable levels as described previously [10]. All the exogenously added prokaryotic hybridization controls showed signal intensities significantly above threshold limits [10] in all reports. Similar results were obtained for housekeeping genes such as GAPDH and β-actin, highlighting the efficiency and accuracy of target preparation and hybridization.
Further quality control was performed at 4 levels (1) automated grid alignment of the probe cells, (2) spatial variation of probe cells, (3) probe outliers, and (4) distributions of the detection p-values. Automated grid alignment and spatial variation of probe cells were examined at the .CEL file level using MATLAB scripts [11]. Briefly, to assess grid alignment, we checked the alternating pattern of positive and negative control cells in the borders of each GeneChip. The borders of the chips consist of alternate probe cells containing oligonucleotide B2 (a DNA sequence complementary to this oligonucleotide is included in the hybridization cocktail) and oligonucleotide B1 that serve as positive and negative controls respectively (Affymetrix, Santa Clara, CA). The intensities for positive controls (blue) and negative controls (red) were plotted as a function of probe cell column position to determine the quality of grid alignment for each chip ( Figure 2). Intersection of the spectra for positive (blue) and negative (red) control intensities would signify an alignment drift out of phase. Spatial variation in the probe cells was assessed by plotting the coefficient of variation of pixel-level intensities within each probe cell on a chip by position ( Figure 3). This analysis is useful in detecting any spatial artifacts on the microarray. The overall quality of the arrays was assessed with dChip [12] to compute the percentage of array outliers and single outliers. For each array, array outliers measure the percent of probe sets with expression patterns that are inconsistent from the rest of the arrays. Single outliers are analogous for single probes, measuring the percent of probe outliers on each array (Table 1). In general, dChip flags an array when either single or array outlier percentages exceed 5% and recommends removal of arrays from further analysis when any of these values is 15% or more. None of the chips were flagged based on these criteria. Finally, we looked at the percentage of probe sets on each array with detection pvalues less than 0.01 or 0.05 (Table 1). Detection p-values are a measure of how likely a transcript was expressed at a level to be called present on the array. As a general rule, chips with less than 10% of the probe sets detected with p < 0.01 are excluded from further analysis. This rule was derived from empirical experience. All GeneChips in this study passed all four quality control checks.  cRNA before cRNA after Marker fragmentation fragmentation

Comparison of the ability to detect genes using IVT and dIVT protocols
The ability of the two protocols to detect expressed genes was examined using detection p-values. Because replicate chips were available, we had two measures of detection for each probe set in each group. We took as the detection p-value the maximum for each probe set over replicates as a conservative estimate of detection. The empirical distribution functions, which measure for any value x the percentage of values in the empirical sample less than or equal to x, of detection p-values were plotted by protocol and sample type ( Figure 4). Regardless of protocol, more probe sets were detected in cancer cells than in normal cells (compare figure 4C with A). Almost all probe sets detected with the dIVT protocol were also detected (at the same p-value cutoff) with the IVT protocol ( Figure 4B and 4D). In both cell lines, significantly more probe sets were detected (at a given cutoff of the detection p-value) using the IVT protocol than were detected using the dIVT protocol ( Figure 4A and 4C). For example, at a detection cutoff of p < 0.01 in the NCI-H292 cancer cell line, approximately 11% of the genes detected using IVT were not detected using dIVT ( Figure 4C). At the same cutoff in the NHTBE cells, 8% of genes detected using IVT were not detected using dIVT ( Figure 4A). At a detection cutoff of p < 0.05, the differences in detection rates were 12% and 10%, respectively. Additional detection in both normal and cancer cell lines with dIVT was less than 2% at these levels.

Signal Intensity
The decrease in the number of genes detected by dIVT is likely due to truncation of the 5' end of the RNA during amplification, as reported by McClintick and colleagues [15]. Luzzi and colleagues [8] found a similar decrease Spatial variation on microarrays  Column Index of Probe Cells (left to right) (11-16%) in the number of genes called present when using an amplification protocol for target preparation for hybridization to HuGeneFL microarray (Affymetrix, Santa Clara, CA). They also found that the overall hybridization signal, as measured by the scaling factor used for normalization in MAS 4.0, was approximately two-fold lower when using amplification.
By contrast, several studies have reported an increase in the number of detectable genes when combining amplification techniques with microarrays that use spotted cDNAs or long oligonucleotides on glass slides [16][17][18][19]. Hu and colleagues [16] suggested that the increase was due primarily to improvements in the signal intensity of low abundance genes, raising them above the noise level on the arrays. The difference in findings on the two different microarray platforms is probably related both to biochemical differences in the probe design and to differences in bioinformatical processing algorithms. According to a recently published thermodynamic model for estimating gene expression levels on oligonucleotide microarrays [20], it is conceivable that focusing on probes closer to the 3' end would find similar increases in the detection of low abundance transcripts. (C) corresponds to about 10% of the genes that were detected using the IVT but not the dIVT protocol. The near coincidence of graphs in (B) and (D) shows that virtually all genes detected using the dIVT protocol are also detected using the IVT protocol.

Determination of overall variation in samples by hierarchical clustering
Next, we used hierarchical clustering to explore the relative overall variation observed due to differences in sample source (cell lines) and protocol (IVT and dIVT) used for target preparation. The hierarchical clustering algorithm is agglomerative in that it joins samples based on a measure of multivariate distance, and once joined these samples are not free to cluster independently again. Pairwise joins in samples are represented as combined branches of a tree in a Dendrogram Plot ( Figure 5). Since the results can be sensitive to the distance measure specified, both Euclidean and Pearson Correlation were used to determine if samples clustered together predominantly by protocol or biology. The probe sets were filtered based on detection p-values. For a probe set to be included in this analysis, the detection p-value had to be less than 0.3 on at least one chip. This choice was liberal enough to include most probe sets and to exclude the noisiest probe sets with the weakest signal. Of the 12,625 probe sets on the U95Av2 microarray, 9,482 passed this filter. We clustered samples by Euclidean distance between the logarithmic expression values of these 9,482 genes ( Figure 5A). The major split in the clusters reflects the biological difference between cancer and normal samples, and the difference in protocols is a secondary effect. Within each group of samples, reproducibility (as measured by Euclidean distance between samples) was better with the IVT protocol than the dIVT protocol. Clustering based on the Pearson correlation between logarithmic expression values led to similar results ( Figure 5B). These findings are consistent Hierarchical Clustering of samples with those from previous reports that used amplification protocols for target preparation for hybridization to microarrays [21,22].
The biologic differences between the two human cell lines, normal tracheobronchial epithelial cells and pulmonary mucoepidermoid carcinoma cells, used in this study were expected to be quite significant. Hierarchical clustering showed that the variation between cell lines exceeded variation between protocols. Similar analysis performed using two more closely related samples may show larger relative variation by protocol.

Comparison of differentially expressed genes detected by IVT and dIVT protocols
The filtered dataset consisting of 9,482 genes (generated above) was used for comparing differentially expressed genes detected by IVT and dIVT protocols. Each of the 9,482 probe sets was modeled separately for each hybridization protocol using a linear model (see equations 1 and 2 below) to account for overall expression, cell line, and other variation on the logarithmic scale.
IVT Protocol (5 µg): log 2 (Y gi1k ) = µ g1 + τ gi1 + ε gi1k ε gi1k ~ N(0, σ 2 1 ) (1) DIVT Protocol (0.2 µg): log 2 (Y gi2k ) = µ g2 + τ gi2 + ε gi2k ε gi2k Ñ (0, σ 2 2 ) (2) g = 1, ..., 12625 probe sets, i = 1, 2 Cancer/Normal k = 1, 2 Replicate Chips The first term in each model, µ, accounted for the overall log mean expression. The second term, τ, accounted for differences between the normal cell line (NHTBE) and the cancer cell line (NCI-H292) on the log scale and represented fold change (FC) on the original scale. The final term, ε, accounted for unexplained variation and was modeled with a normal distribution on the log base 2 scale with mean 0 and variance σ 2 . Since no prior information was available for these parameters, they were assumed to be generated a priori from uninformative prior distributions. The data were used to update our prior knowledge to produce posterior distributions [13]. These distributions were used to compute the posterior probability that FC by cell line exceeded some desired amount ∆. Hence, for a given ∆, a high posterior probability in (3) is strong posterior evidence that the expected FC lies outside of the interval (1/∆, ∆). We examined these posterior probabilities for probe sets meeting the change criterion in each protocol for FC > ∆ = 1.5, 2, 4, 8 and probability cutoffs of 85%, 90%, 95% and 99% (Table 2). At each level of desired FC and probability cutoff, more differentially expressed probe sets were discovered with the IVT protocol than the dIVT protocol (compare columns 3 and 4 in table 2). This finding is consistent with the results presented above (Figure 4). Figure 6A displays a Venn diagram comparing protocol performance for the criterion P(FC < 1/2 ∪ FC > 2) > 99% with the detection criterion implemented. Using these criteria 59 genes were found to be differentially expressed by both protocols, 163 found in IVT alone, and 91 in dIVT alone. The histograms in figures 6B and 6C shows that nearly 60% of probe sets discovered with only one protocol still had posterior probabilities exceeding 90% in the other. For FC = 8, about 31% of probe sets exceeded the top 93 rd rank in both protocols. Therefore we concluded that although fewer probe sets were discovered in the dIVT protocol, many were still detectable for differential expression at a lower threshold. This is a significant extension to the previous report by McClintick and coworkers [15] that relied on MAS probe set specific present/absent calls for detection analysis. They observed that genes at lower levels of detection were less reproducible in comparing treatment groups. We considered a range of cutoffs for detecting signal and change. We found that while reproducibility varied, for those genes showing at least FC of 2 with 99% posterior probability in just one protocol, approximately 60% were detectable in the other for a slightly reduced posterior probability cutoff of 90% (Figure 6B and 6C).

Comparison of functional gene categories identified as differentially expressed by IVT and dIVT protocols
Since the two protocols did not identify identical sets of differentially expressed genes, we investigated whether the functional categories of the genes that were identified as differentially expressed were consistent. Each differentially expressed gene was annotated using functional annotations downloaded from the dChip web site, http:/ /biosun1.harvard.edu/complab/dchip/info_file.htm. The number of differentially expressed genes detected by dIVT and IVT protocols in each functional category (with a posterior probability of 0.99 of having a fold change at least equal to 2) is shown in column 2 in tables 3 and 4, respectively. These tables also show a count of a subset of these genes that were identified as differentially expressed by the alternate protocol for different cutoffs (>0.90, >0.95 and > 0.99; columns 3, 4 and 5 in tables 3 and 4) on the posterior probability. In most functional categories, many genes detected with a posterior probability of 0.99 of having a fold change at least equal to 2 in one protocol were discovered at the slightly lower threshold of 0.90 in the other protocol (compare column 2 with column 3 in tables 3 and 4). These data show that the results generated by the two methods identify the same functional gene categories to be differentially expressed. Although the number of genes identified to be differentially expressed in each category by the two protocols may be different, the usage of any of these protocols will identify the same functional pathways to be affected when comparing the two cell lines.

Confirmation by QRT-PCR
Among the genes that were identified as differentially expressed by at least one protocol with a posterior probability of 0.99 of FC exceeding 2, 10 were selected for confirmation studies using QRT-PCR. Differences in expression of these genes between the NHTBE and NCI-H292 were estimated using the comparative C T method [14], normalizing the values relative to the housekeeping gene, GAPDH. Each gene was tested in duplicate reactions in three independent experiments. Replicates were used to compute 90% confidence intervals on the ∆∆C T scale, which were then converted to confidence intervals on the FC scale (Table 5). For each gene, array FC model estimates are listed for IVT and dIVT protocols in columns 2 and 3, respectively. These are printed in bold by protocol for those meeting the change criterion. Notice that these show array FC estimates well exceeding 2 for their respective protocols. Four probesets were selected which met the change criterion in both protocols (TNF, K6A, SFP and IGFB), from the 59 in the set intersection in Figure 6A (center), two in IVT alone (LOH and AXL) from the 163 in the exclusive IVT set (left), and four in dIVT alone (DMP, GR, EGFR and NFKB), from the 91 in the exclusive dIVT set (right). The following three columns list FC with 90% lower and upper bounds as modeled from the QRT-PCR data. Down regulation is reported as 1/FC. Eight of the 10 differentially expressed genes tested, as determined by the microarray data using either protocol, produced similar results with QRT-PCR. SFP was observed to be up regulated by both array protocols and unchanged (QRT-PCR Est. FC = 1/1.05) by QRT-PCR. QRT-PCR results for EGFR showed down regulation while up on both array protocols. These contradictory results may be due to the presence of degenerate probe sets for these transcripts on the arrays that may cross hybridize with other genes, leading to false results. In a recently published report, out of a set of 66 genes identified as differentially expressed by macroarray screening, only 40 genes could be validated by QRT-PCR [23].

Conclusion
This study was performed to compare the relative effects of using two different protocols for target preparation for hybridization to high-density oligonucleotide arrays, on the final results generated by these experiments. Samples from two different human cell lines, a normal and a cancer cell line, were used for target preparation by the two different methods and hybridized to U95Av2 GeneChips. The conclusions derived from this study were that: (1) the decrease in the number of genes detected with the dIVT protocol was not substantial. For the 0.01 and 0.05 levels, the loss in detection for amplification was 8% and 10% respectively in the normal cell line. The loss was 11% and 12% in the cancer line (see Figure 4), (2) when comparing differentially expressed genes that had a posterior proba- bility greater than 99% of fold change greater than 2, in data generated by IVT but not dIVT, more than 60% of these genes had posterior probabilities greater than 90% in data generated by dIVT. Thus, these genes may still have been discovered using dIVT protocol at a lower threshold based on a search, for instance that stressed functional stratification rather than simple criteria inclusion, (3) the two protocols showed reasonable agreement in the functional categories identified to be differentially expressed based on the gene lists generated by each protocol and (4) hierarchical clustering showed that variation due to biology of the samples was higher than that due to protocols used for target preparation.

Cell Culture and RNA Extraction
NHTBE cells (strain 6178, Clonetics, San Diego, CA) were seeded onto a 24 mm permeable membrane insert (1 ×     For the dIVT protocol, the first and second cDNA strands were synthesized as described above. The first transcription was performed in the absence of biotin-labeled ribonucleotides, resulting in unlabeled cRNA, which was then used as starting material for the second cycle. In the second cycle, the first and second cDNA strands were synthesized as described above. The second transcription was performed in the presence of biotin-labeled-ribonucleotides, resulting in labeled cRNA. The cRNA was fragmented and checked by gel electrophoresis, as reported earlier [7]. A representative picture of such a gel is shown in figure 1. The intact cRNA samples that appear as a smear on gels are reduced to 50-100 bp fragments after treatment.

Hybridization, Staining and Imaging
The Affymetrix GeneChip system was used for hybridization, staining and imaging of the probe arrays. Hybridization cocktails of 300 µl each containing 15 µg of cRNA and exogenous hybridization controls were prepared as described previously [7] and hybridized to U95Av2 Gene-Chips (Affymetrix, Santa Clara, CA) overnight at 42°C. Hybridized fragments were detected using streptavidinlinked to phycoerythrin (Molecular Probes, Eugene, OR; [7]). GeneChips were scanned and imaged using Affymetrix Microarray Analysis Suite (MAS), version 5.0 [10]. The MAS expression data report was generated for each sample and used to judge the quality of sample preparation and hybridization. The report includes information about noise, background and percentage of probe sets called present based on arbitrary threshold values [10]. Information about performance of exogenously added prokaryotic hybridization control genes such as BioB, BioC and BioD [7] of the E. coli biotin synthesis pathway and the ratio of intensities of 3' probes to 5' probes for housekeeping genes such as GAPDH and β-actin are also included in the report.

Microarray Data Analysis
MAS, version 5.0, [10] was used for data acquisition, quantification, and normalization. This software is designed to take as input either the raw binary file containing the TIFF image pixel level information or a .CEL file, a smaller sized file summarizing the raw file. The software output includes summary statistics of probe set intensity such as signal and p-values for detection (detection p-values) testing whether or not signal was significantly greater than 0. Quality of the automated grid alignment was assessed using a MATLAB (The Math-Works, Natick, MA) script [11]. These were developed to perform high-throughput quality control with Affymetrix GeneChips. Overall array quality was assessed using the MAS 5.0 detection p-values along with the outlier detection implemented in dChip [12]. Normalized data was exported from MAS 5.0 to S-Plus (Insightful Corp., Seattle WA), where the gene expression values were transformed by computing the logarithm (base two). We identified differentially expressed genes by constructing a linear model that accounted for known factors in the experimental design and applying Bayesian methods [13] to compute the posterior probability of a desired fold change (FC).

Quantitative Real Time PCR (QRT-PCR)
Differentially expressed genes identified by microarray data analysis were validated by QRT-PCR. Four micrograms of template RNA were used in 210 µl of RT reaction mix according to the manufacturer's instructions using the GeneAmp RNA PCR Core Kit (Applied Biosystems, Foster City, CA, USA). The reaction was performed at 42°C for 30 min followed by denaturation of enzymes at 95°C for 5 min. Each QRT-PCR was performed in quadruplicate in 25 µl volume in iCycler (Bio-Rad, Hercules, CA) using SYBR Green PCR Core Reagents (Applied Biosystems, Foster City, CA). Each PCR reaction used 0.2 mM of each gene-specific primer (Invitrogen Corporation, Carlsbad, CA), 3 mM MgCl 2 , and 3 µl of RT mix. GADPH was used as a reference. Delete sentence The thermo-cycling profile used was 55°C for 3 min, 95°C for 8 min, and 40 cycles of denaturation at 95°C for 20 sec and elongation at 60°C for 60 sec.

QRT-PCR Data Analysis
The QRT-PCR data were analyzed using the comparative CT method [14]. Briefly, the difference in cycle times, ∆CT, was determined as the difference between the tested gene and the reference housekeeping gene, GAPDH. We then obtained ∆∆CT by finding the difference between treatments. The fold change was calculated as . To determine confidence intervals, we used replicate measurements and experiments to model the variability in ∆∆CT with a normal distribution. We then converted the intervals to the fold-change scale.

Authors' Contributions
DG and KC performed the statistical analysis, DM and AR carried out the target preparation and hybridization, ZJ and JSK performed the cell culture, RNA extraction and QRT-PCR, LS, KC and MK conceived the study and participated in its design and coordination. All authors read and approved the final manuscript.