New data on robustness of gene expression signatures in leukemia: comparison of three distinct total RNA preparation procedures
BMC Genomics volume 8, Article number: 188 (2007)
Microarray gene expression (MAGE) signatures allow insights into the transcriptional processes of leukemias and may evolve into a molecular diagnostic test. Introducing MAGE into the clinical practice of leukemia diagnosis will require a comprehensive assessment of the variation introduced by the methodology. Here we systematically assessed the impact of three different total RNA isolation procedures on variation in expression data: method A, lysis of mononuclear cells followed by lysate homogenization and RNA extraction; method B, organic solvent-based RNA isolation; and method C, organic solvent-based RNA isolation followed by purification.
We analyzed 27 pediatric acute leukemias representing nine distinct subtypes and show that method A yielded better RNA quality, was associated with more differentially expressed genes between leukemia subtypes, showed the lowest degree of variation between experiments, was more reproducible, and was characterized by higher precision in technical replicates. Unsupervised and supervised analyses grouped the leukemias according to lineage and clinical features with all three methods, underlining the robustness of MAGE in identifying leukemia-specific signatures.
The signatures of the different leukemia subtypes, regardless of the extraction method used, account for the largest source of variation in the data. Lysis of mononuclear cells followed by lysate homogenization and RNA extraction is the optimal method for generating robust gene expression data and is therefore recommended for obtaining robust classification results in microarray studies of acute leukemias.
Microarrays have been shown to be a powerful technology for identifying novel taxonomies of various types of cancer [1–5], and gene expression signatures have also been associated with clinical outcome [2, 4, 6–9]. These findings indicate that data from different microarray assays are comparable enough to capture the biological heterogeneity between distinct tumor types. Moreover, it has recently been demonstrated that, under properly controlled conditions, tumor microarray analysis can be performed at multiple independent laboratories [10–15]. In addition, sample preparation by different operators has been shown not to impair the robustness of so-called diagnostic gene expression signatures. To avoid possible sources of variation in the data, individual laboratories have developed standardized protocols covering every step of the sample preparation procedure: tumor sample collection, sample processing, total RNA isolation, cDNA synthesis, cRNA synthesis and labeling, target fragmentation, microarray hybridization, and washing and staining. Users are advised to follow specific RNA isolation protocols, because one of the major concerns in microarray technology is the quality of the starting material, and several studies have improved our understanding of the pre-analytical factors that influence gene expression signatures in peripheral blood and bone marrow [17, 18]. However, although it is recognized that different RNA stabilization and isolation techniques introduce varying amounts of analytical noise into the data [19–21], no systematic information has so far been available on the degree of variation in leukemia gene expression profiles that results from different RNA extraction procedures.
Here we present a comparative study of microarray data (HG-U133 Plus 2.0 microarrays, Affymetrix, Inc., Santa Clara, CA, USA) obtained with three different RNA isolation and purification techniques. We performed standardized experiments with total RNA extracted from pediatric acute leukemia patients to investigate whether different extraction protocols (see Methods) yield comparable gene expression data from the same sample source (Figure 1A). Moreover, we assessed the variability of gene expression levels across multiple technical replicates of the same sample (Figure 1B). Leukemia gene expression signatures have been studied by numerous laboratories and have been proposed for use in the routine diagnostic workflow [22–25]. However, it is not clear to what degree method-related changes introduced by the various RNA isolation protocols affect these signatures. We comprehensively addressed the question of RNA preparation for microarray analysis in leukemia and suggest a technique for introduction into the routine laboratory diagnosis of pediatric acute leukemia by gene expression profiling.
Assessment of data quality
In this study we first monitored data quality parameters. All gene expression profiles passed the quality filter and met our criteria for inclusion in further data analyses [see Additional File 2]. Specifically, the cRNA yield was higher than 10.0 μg, the percentage of probe sets on the HG-U133 Plus 2.0 microarray called present was greater than or equal to 20.0%, the scaling factor was below 10, the intensity ratios of the exogenous Bacillus subtilis control transcripts from the Poly-A control kit (lys, phe, thr, and dap) were greater than or equal to 1, and the intensity ratio of the 3' probe set to the 5' probe set for the housekeeping gene GAPD was less than 3.0. Four preparations showed a higher 3'/5' GAPD ratio (#25 method C, two preparations of #26 method B, and #16 method B) but had otherwise acceptable quality parameters.
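The inclusion thresholds above can be expressed as a simple per-array check. The sketch below is only an illustration: the `ArrayQC` container and its field names are hypothetical (not a GCOS report format), the Poly-A criterion is assumed to mean that each spike (lys, phe, thr, dap, added at increasing concentrations) gives at least the signal of the previous one, and, following the text, the 3'/5' GAPD ratio is treated as a flag rather than an exclusion criterion. The example values are invented.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ArrayQC:
    """Per-array quality metrics; the field names are hypothetical."""
    crna_yield_ug: float
    percent_present: float
    scaling_factor: float
    polya_signals: Sequence[float]  # lys, phe, thr, dap (assumed increasing spike concentration)
    gapd_3_5_ratio: float


def polya_ratios(signals: Sequence[float]) -> list:
    """Ratio of each Poly-A spike signal to the previous, lower-concentration spike."""
    return [hi / lo for lo, hi in zip(signals, signals[1:])]


def passes_inclusion(qc: ArrayQC) -> bool:
    """Thresholds stated in the text: cRNA yield, present calls, scaling factor, Poly-A ratios."""
    return (
        qc.crna_yield_ug > 10.0
        and qc.percent_present >= 20.0
        and qc.scaling_factor < 10.0
        and all(r >= 1.0 for r in polya_ratios(qc.polya_signals))
    )


def gapd_flag(qc: ArrayQC, limit: float = 3.0) -> bool:
    """True when the 3'/5' GAPD ratio is not below the limit (reported, not used to exclude)."""
    return qc.gapd_3_5_ratio >= limit


example = ArrayQC(crna_yield_ug=35.2, percent_present=41.0, scaling_factor=2.3,
                  polya_signals=(150.0, 310.0, 640.0, 2100.0), gapd_3_5_ratio=3.4)
print(passes_inclusion(example), gapd_flag(example))  # True True
```

An array like `example` would be kept for analysis but listed among the preparations with an elevated 3'/5' GAPD ratio.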
As illustrated in Figure 2, preparation of total RNA by QIAshredder homogenization followed by RNeasy purification (method A) resulted in acceptable cRNA yields and highly reproducible, low 3'/5' GAPD ratios. Preparation of total RNA with TRIzol (method B) yielded slightly higher amounts of cRNA and a lower image background, as measured by the Q value, but a higher 3'/5' GAPD ratio. When total RNA was prepared with TRIzol followed by RNeasy purification (method C), the cRNA yield was high and the background low, while the 3'/5' GAPD ratio was slightly higher than for method A. All three preparation methods generated an acceptable range of present calls on the whole-genome microarray.
Total RNA quality can also be assessed indirectly by a so-called RNA degradation plot, as implemented in the "Simpleaffy" Bioconductor analysis package. Sample degradation was consistently more severe in the gene expression profiles obtained when total RNA was processed for microarray analysis directly after isolation with TRIzol alone (method B) [see Additional File 1, Supplementary Figure 1]. This might reflect the presence of more impurities, such as phenol, salts, or residual ethanol, in the starting total RNA of method B than in that of method A or method C. Such impurities can reduce the efficiency of the sample preparation reactions, e.g. by inhibiting enzyme activity during cDNA synthesis or the in vitro transcription reaction, and would thereby impair the microarray data generated with method B.
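The idea behind an RNA degradation plot can be sketched as follows: probes within each probe set are ordered from the 5' to the 3' end, the mean log2 PM intensity is computed at each probe position across all probe sets, and a straight line is fitted; a steeper positive slope indicates a stronger 3' bias, as expected for degraded RNA primed with oligo(dT). This is a simplified illustration, not the Simpleaffy or Bioconductor `affy` code (which differs in details such as scaling of the position means); the input layout and the toy data are assumptions.

```python
import numpy as np


def degradation_slope(pm: np.ndarray) -> float:
    """Slope of mean log2 PM intensity versus probe position.

    pm: (n_probesets, n_positions) PM intensities for one array, with probes
    ordered 5' -> 3' within each probe set (assumed layout).
    """
    log_pm = np.log2(pm)
    position_means = log_pm.mean(axis=0)                   # average over probe sets
    position_means = position_means - position_means[0]    # 5'-most position set to 0
    positions = np.arange(pm.shape[1])
    slope, _intercept = np.polyfit(positions, position_means, deg=1)
    return float(slope)


# Toy data: in the "degraded" array the signal doubles every two positions towards 3'.
rng = np.random.default_rng(0)
degraded = 2.0 ** (8 + 0.5 * np.arange(11)) * rng.uniform(0.9, 1.1, size=(500, 11))
intact = 2.0 ** 8 * rng.uniform(0.9, 1.1, size=(500, 11))
print(round(degradation_slope(degraded), 2), round(degradation_slope(intact), 2))
```

Comparing slopes across arrays, rather than judging one array in isolation, is what makes such a plot useful for spotting preparations like those from method B.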
Comparability of gene expression profiles
To assess the comparability of global gene expression data between samples isolated with different preparation methods, it is useful to examine the overall signal distribution of all probe sets as a density curve for each microarray experiment. Outlier experiments would show up as density curves that deviate from the others. As shown in Figure 3A, no substantial shifts in the signal distribution are observed among samples representing different leukemia subtypes. The density curves also overlap when the signal distribution is plotted according to the total RNA preparation method (Figure 3B).
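A minimal sketch of the density-curve comparison, assuming log2 signals in a probe sets × arrays matrix: each array's distribution is estimated with a Gaussian kernel density estimate (`scipy.stats.gaussian_kde`) on a common grid, and arrays whose curve departs from the median curve by more than a tolerance are flagged. The text does not specify a numerical outlier rule, so the `max_dev` tolerance and the flagging rule are hypothetical, and the data are simulated.

```python
import numpy as np
from scipy.stats import gaussian_kde


def density_curves(log_signals: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """Kernel density estimate of each array's signal distribution.

    log_signals: (n_probesets, n_arrays); returns (n_arrays, len(grid)).
    """
    return np.vstack([gaussian_kde(log_signals[:, j])(grid)
                      for j in range(log_signals.shape[1])])


def flag_outlier_arrays(curves: np.ndarray, max_dev: float) -> np.ndarray:
    """Indices of arrays whose density departs from the median curve by more than max_dev."""
    reference = np.median(curves, axis=0)
    deviation = np.abs(curves - reference).max(axis=1)
    return np.flatnonzero(deviation > max_dev)


# Toy data: five comparable arrays and one array shifted by 4 log2 units.
rng = np.random.default_rng(1)
signals = rng.normal(8.0, 2.0, size=(5000, 6))
signals[:, 5] += 4.0
grid = np.linspace(0.0, 20.0, 200)
print(flag_outlier_arrays(density_curves(signals, grid), max_dev=0.05))
```

Using the median curve as the reference keeps a single deviating array from distorting the comparison for the others.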
Unsupervised data analysis
We next investigated the consistency of gene expression measurements of leukemia samples obtained with different total RNA extraction methods by unsupervised hierarchical clustering. Expression data were normalized using the PQN algorithm, and 2,821 genes were selected using the interquartile range (IQR) as the filtering criterion. The resulting dendrogram (Figure 4) grouped the samples first by patient, i.e. the three extraction-method replicates of each patient clustered together, and second by lineage: B lineage ALL (orange), T lineage ALL (blue), and AML (green). In 22 of the 27 patient triplets, the two TRIzol-based preparations (methods B and C) clustered together, with the sample processed by QIAshredder homogenization followed by RNeasy purification (method A) joining them next. In the remaining 5 of 27 triplets, methods A and C clustered together next to method B (TRIzol without further purification). In no case did methods A and B cluster together next to method C. Within each lineage branch, samples from the same leukemia subclass are linked to each other. Within the B lineage cluster, two patients with c-ALL with t(9;22) are linked together, as are two patients with a hyperdiploid karyotype, and three patients with c-ALL with t(12;21) are linked in the same sub-branch. Samples from patients with c-ALL/pre-B ALL with a DNA index of 1 (DI = 1) that were negative for recurrent translocations are distributed over the three sub-branches of the B-ALL cluster, which may reflect the known heterogeneity of this subclass of acute leukemia. The group of T-ALL samples is not further subdivided. The cluster of myeloid leukemias is divided into two branches: AML with t(11q23)/MLL, and AML with a normal karyotype or other abnormalities. This clearly demonstrates that the underlying biology, not the RNA extraction protocol, accounts for the largest source of variation in the data.
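The gene filtering and clustering steps can be sketched with NumPy and SciPy, assuming a genes × samples matrix of normalized log signals; this is not the software used in the study. The text states only that 2,821 genes were selected by IQR, so keeping a fixed number of the most variable genes (`n_keep`) is an assumption, as is Euclidean distance, which is what `scipy.cluster.hierarchy.linkage` uses with `method="ward"`. The data are simulated.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def iqr_filter(expr: np.ndarray, n_keep: int) -> np.ndarray:
    """Indices of the n_keep genes with the largest interquartile range.

    expr: (n_genes, n_samples) matrix of normalized log signals.
    """
    q75, q25 = np.percentile(expr, [75, 25], axis=1)
    return np.argsort(q75 - q25)[::-1][:n_keep]


def ward_clusters(expr: np.ndarray, n_keep: int, n_clusters: int):
    """Ward-linkage clustering of samples on the IQR-filtered genes."""
    samples = expr[iqr_filter(expr, n_keep)].T   # rows = samples, columns = genes
    tree = linkage(samples, method="ward")       # Euclidean distance, Ward criterion
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return tree, labels


# Toy data: three "patients", each measured with three preparation methods.
rng = np.random.default_rng(2)
expr = rng.normal(0.0, 0.2, size=(500, 9))
for patient in range(3):
    expr[patient * 20:(patient + 1) * 20, patient * 3:(patient + 1) * 3] += 5.0
tree, labels = ward_clusters(expr, n_keep=60, n_clusters=3)
print(labels)
```

The returned `tree` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a figure comparable in layout to Figure 4.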
Similarly, in an unsupervised Principal Component Analysis (PCA), the two distinct types of AML are clearly separated from T lineage ALL and from B lineage ALL, and the three total RNA preparations of each patient sample lie close to each other [see Additional File 1, Supplementary Figure 2].
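For reference, an unsupervised PCA of the kind described above can be computed from the singular value decomposition of the gene-centered data matrix. This NumPy sketch assumes a genes × samples matrix of log signals and uses simulated data; it is not the Partek Genomics Suite implementation used in the study.

```python
import numpy as np


def pca_scores(expr: np.ndarray, n_components: int = 3):
    """Sample scores on the leading principal components.

    expr: (n_genes, n_samples). Returns (scores, explained_variance_ratio), where
    scores has shape (n_samples, n_components).
    """
    x = expr.T - expr.T.mean(axis=0)              # samples as rows, each gene centered
    u, s, _vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# Toy data: two groups of six samples that differ in 50 of 1,000 genes.
rng = np.random.default_rng(3)
expr = rng.normal(0.0, 1.0, size=(1000, 12))
expr[:50, 6:] += 4.0
scores, explained = pca_scores(expr)
print(np.sign(scores[:, 0]), explained.round(2))
```

In this toy case the first component separates the two groups; with real data, the three preparations of one patient would be expected to plot close together when method effects are small.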
Supervised data analysis
A supervised analysis was performed to assess the potential impact of different total RNA extraction methods on a leukemia classification approach. An all-pairwise t-test analysis identified the differentially expressed genes that distinguish between the 9 classes of pediatric leukemia represented in our dataset. The resulting set of 1,089 differentially expressed probe sets was then examined by three-dimensional PCA. As shown in Figure 5A, this gene set clearly separates the leukemia lineages (B lineage ALL, T lineage ALL, AML) from each other. Within the AML group, t(11q23)/MLL-positive samples are separated from AML with a normal karyotype or other abnormalities. Within the B lineage ALL group, subclusters can be identified for ALL with the recurrent translocations t(1;19), t(4;11), t(9;22), or t(12;21). Importantly, Figure 5B shows that the three preparations of each patient sample lie close to each other. This again indicates that the preparation method has less influence on the gene expression profiles than the leukemia subclass.
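An all-pairwise t-test selection can be sketched as follows: for every pair of leukemia classes, a Welch t-test is computed per gene with `scipy.stats.ttest_ind`, and a gene is kept if it reaches the threshold in at least one pair. The threshold `alpha`, the use of Welch's test, and the absence of a multiple-testing correction are assumptions; the text does not give the exact settings that produced the 1,089 probe sets. Each class needs at least two arrays (here every patient contributes three, one per method). The data are simulated.

```python
import itertools

import numpy as np
from scipy.stats import ttest_ind


def pairwise_ttest_genes(expr: np.ndarray, labels, alpha: float = 0.001) -> np.ndarray:
    """Indices of genes significant in at least one pairwise class comparison.

    expr: (n_genes, n_samples) log signals; labels: class label per sample.
    """
    labels = np.asarray(labels)
    selected = np.zeros(expr.shape[0], dtype=bool)
    for a, b in itertools.combinations(np.unique(labels), 2):
        _, p = ttest_ind(expr[:, labels == a], expr[:, labels == b],
                         axis=1, equal_var=False)      # Welch's t-test per gene
        selected |= p < alpha
    return np.flatnonzero(selected)


# Toy data: gene 0 is strongly up-regulated in class "X".
rng = np.random.default_rng(4)
expr = rng.normal(0.0, 0.1, size=(100, 9))
expr[0, :3] += 10.0
labels = ["X"] * 3 + ["Y"] * 3 + ["Z"] * 3
print(pairwise_ttest_genes(expr, labels))
```

The selected rows could then be passed to a PCA, as in Figure 5.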
As shown in Figure 1B, three patients were analyzed with three technical replicates per method. To further assess the influence of the total RNA preparation method on a potential leukemia classification approach, a one-way analysis of variance (ANOVA) comparing these three patient samples was performed separately for each method. For each of methods A, B, and C, the number of differentially expressed genes was determined with the following filtering strategy: (i) filtering by present calls, followed by (ii) filtering by fold change, and (iii) filtering by false discovery rate (FDR). In the first step, a probe set was retained only if it was called present on at least 3 of the 9 microarrays entering the ANOVA. In the second step, a probe set had to show a fold change of at least 1.5 in the pairwise comparisons of the samples (#25 vs. #26, #25 vs. #27, and #26 vs. #27). In the third step, the FDR threshold was set at 0.001. The number of differentially expressed genes shared between the three methods was then determined. The results are summarized in Figure 6. Figure 6A shows the FDR curves for the three methods. At an FDR of 0.1%, the number of differentially expressed genes between the leukemia samples is highest with method A (n = 13,010), followed by method B (n = 11,517), and lowest with method C (n = 9,794).
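The three-step filter can be sketched as below for one preparation method (3 patients × 3 replicates = 9 arrays). Several details are assumptions rather than statements from the text: signals are on the log2 scale, a probe set passes step (ii) if any pairwise comparison reaches 1.5-fold (the text could also be read as requiring every comparison), the FDR is controlled with the Benjamini-Hochberg procedure (the text does not name one), and it is computed over the probe sets surviving steps (i) and (ii). The data are simulated.

```python
import itertools

import numpy as np
from scipy.stats import f as f_dist


def one_way_anova_p(log_expr: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """Per-gene one-way ANOVA p-values; log_expr is (n_genes, n_arrays)."""
    levels = np.unique(groups)
    n, k = log_expr.shape[1], len(levels)
    grand = log_expr.mean(axis=1)
    ss_between = np.zeros(log_expr.shape[0])
    ss_within = np.zeros(log_expr.shape[0])
    for g in levels:
        block = log_expr[:, groups == g]
        block_mean = block.mean(axis=1)
        ss_between += block.shape[1] * (block_mean - grand) ** 2
        ss_within += ((block - block_mean[:, None]) ** 2).sum(axis=1)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_dist.sf(f_stat, k - 1, n - k)


def bh_qvalues(p: np.ndarray) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values."""
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]   # enforce monotonicity
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


def de_genes(log_expr, present, groups, min_present=3, min_fc=1.5, fdr=0.001):
    """Steps (i) present calls, (ii) fold change, (iii) FDR."""
    groups = np.asarray(groups)
    keep = present.sum(axis=1) >= min_present                          # (i)
    means = [log_expr[:, groups == g].mean(axis=1) for g in np.unique(groups)]
    max_log_fc = np.max([np.abs(a - b) for a, b in itertools.combinations(means, 2)], axis=0)
    keep &= max_log_fc >= np.log2(min_fc)                                # (ii) any pair
    q = np.ones(log_expr.shape[0])
    if keep.any():
        q[keep] = bh_qvalues(one_way_anova_p(log_expr[keep], groups))    # (iii)
    return np.flatnonzero(keep & (q < fdr))


# Toy data: 3 patients x 3 replicates; genes 0-4 are higher in patient #26.
rng = np.random.default_rng(5)
groups = np.repeat(["#25", "#26", "#27"], 3)
log_expr = rng.normal(8.0, 0.1, size=(200, 9))
log_expr[:5, groups == "#26"] += 3.0
present = np.ones_like(log_expr, dtype=bool)
present[1, :7] = False                      # gene 1 is "present" on only 2 of 9 arrays
print(de_genes(log_expr, present, groups))
```

Running the same function on the arrays of each method and intersecting the resulting index sets gives the kind of overlap counts shown in Figure 6B.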
We next investigated the percentage of differentially expressed genes shared between the three methods. This overlap is another suitable parameter for assessing the impact of different total RNA extraction methods on a leukemia classification approach. Figure 6B shows a Venn diagram of the numbers of differentially expressed genes shared between the three methods at an FDR of 0.1%. In total, n = 7,728 genes are consistently found to be differentially expressed with all three methods. Overall, the gene list of method A showed a greater overlap with the other methods than the lists of method B or method C. This can also be seen in the percentages of shared differentially expressed genes (Figure 6C), where, again at an FDR of 0.1%, the highest overlap is observed for method A: 83.61% of the differentially expressed genes found with method B and 91.91% of those found with method C are also detected with method A. Method B shows the second highest overlap, detecting 74.01% of the genes found with method A and 82.68% of those found with method C. Method C detects only 69.19% of the genes found with method A and 70.31% of those found with method B. Interestingly, n = 2,107 genes are found to be differentially expressed exclusively with method A.
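The directional percentages can be reproduced from gene sets as shown below. Reading each comparison as the share of one method's genes that are also detected by another makes the reported figures mutually consistent: for example, 83.61% of the 11,517 method B genes and 74.01% of the 13,010 method A genes both correspond to about 9,629 shared genes. The gene sets in the sketch are toy placeholders, not the study's gene lists.

```python
def directional_overlap(x: set, y: set) -> float:
    """Percentage of the genes detected with method Y that are also detected with method X."""
    return 100.0 * len(x & y) / len(y)


def exclusive(x: set, *others: set) -> set:
    """Genes detected with method X only."""
    return x - set().union(*others)


# Toy gene lists (placeholders).
a = set(range(0, 10))
b = set(range(5, 13))
c = set(range(8, 15))
print(directional_overlap(a, b), directional_overlap(b, a), sorted(exclusive(a, b, c)))
# 62.5 50.0 [0, 1, 2, 3, 4]
```

Because the denominator is the second method's list, the two directions differ whenever the lists differ in size, which is why each pair of methods is reported twice.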
Functional annotation of these 2,107 genes revealed that the top associated biological functions were cancer; cell cycle; cell signaling; DNA replication, recombination, and repair; gene expression; and RNA post-transcriptional modification [see Additional File 1, Supplementary Figure 3].
Additionally, to further characterize assay performance, a statistical power analysis of RNA preparation methods A, B, and C was performed with the Bioconductor package "ssize". Applied to the technical replicates of identical leukemia samples, the power analysis assesses the precision achieved with each RNA preparation method. The data sets generated from total RNA prepared with methods A and B have a greater average statistical power than the data set based on method C [see Additional File 1, Supplementary Figure 4].
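To illustrate why lower technical variability translates into higher power, the sketch below computes the power of a two-sided, two-sample t-test for a given per-gene standard deviation using the noncentral t distribution. This is a generic calculation written with SciPy, not the `ssize` package's code; the effect size (`log2_fc`), significance level, group size, and SD values are illustrative assumptions.

```python
import numpy as np
from scipy.stats import nct, t


def two_sample_power(sd, n_per_group: int, log2_fc: float = 1.0, alpha: float = 0.05):
    """Power of a two-sided two-sample t-test with equal group sizes and common SD."""
    df = 2 * n_per_group - 2
    ncp = log2_fc / (np.asarray(sd, dtype=float) * np.sqrt(2.0 / n_per_group))
    t_crit = t.ppf(1.0 - alpha / 2.0, df)
    # Probability that |T| exceeds the critical value when T is noncentral t.
    return nct.sf(t_crit, df, ncp) + nct.cdf(-t_crit, df, ncp)


# Per-gene SDs on the log2 scale, e.g. as estimated from technical replicates.
sds = np.array([0.1, 0.3, 0.6])
print(two_sample_power(sds, n_per_group=3).round(3))
```

Averaging such per-gene power values over all probe sets gives one summary number per preparation method, so a method with tighter replicates ends up with higher average power.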
In summary, these analyses indicate that preparation of total RNA by QIAshredder homogenization followed by RNeasy purification is a robust sample preparation method for microarray experiments that outperforms the other two procedures for isolation of total RNA tested here.
Reproducibility and precision of different sample preparation methods
Because three patients were analyzed with three technical replicates per method (Figure 1B), we were also able to assess the technical reproducibility and precision of the gene expression data obtained with the different total RNA extraction methods, using squared correlation coefficients (R²), box plots, scatter plots, and coefficient of variation (CV) analyses. These analyses included all 54,675 probe sets represented on the HG-U133 Plus 2.0 microarray.
As shown in Figure 7, the means and interquartile ranges (IQR) of the probe set level signals (PS) are highly comparable within the technical replicates as well as across the three sample preparation methods. Pairwise scatter plots further show that gene expression data are well correlated within each of the three sample preparation methods [see Additional File 1, Supplementary Figures 5A, 5B, and 5C]. Within methods, R² ranges from 0.985 to 0.989 for method A (QIAshredder homogenization followed by RNeasy purification), from 0.976 to 0.987 for method B (TRIzol isolation), and from 0.967 to 0.988 for method C (TRIzol followed by RNeasy purification). Between methods, R² has a mean of 0.952 (SD 0.005) for method A versus method B, 0.976 (SD 0.005) for method A versus method C, and 0.965 (SD 0.011) for method B versus method C.
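The pairwise R² values can be computed as the squared Pearson correlation between every pair of arrays, as in the sketch below. Whether the study correlated log-scale or linear signals is not stated; the log scale is an assumption here, and the summary by mean and SD mirrors how the between-method values are reported. The toy matrix is invented.

```python
import itertools

import numpy as np


def pairwise_r2(signals: np.ndarray) -> dict:
    """Squared Pearson correlation for every pair of arrays.

    signals: (n_probesets, n_arrays), assumed to be log2 probe set signals.
    """
    r = np.corrcoef(signals, rowvar=False)
    n_arrays = signals.shape[1]
    return {(i, j): float(r[i, j] ** 2) for i, j in itertools.combinations(range(n_arrays), 2)}


def summarize(r2: dict) -> tuple:
    """Mean and sample SD of a set of R² values."""
    values = np.array(list(r2.values()))
    return float(values.mean()), float(values.std(ddof=1))


# Toy matrix: rows are probe sets, columns are three arrays.
toy = np.array([[1.0, 1.0, 4.0],
                [2.0, 3.0, 3.0],
                [3.0, 2.0, 2.0],
                [4.0, 4.0, 1.0]])
print(pairwise_r2(toy), summarize(pairwise_r2(toy)))
```

Restricting the pairs to arrays from the same method gives within-method values; pairing arrays from two different methods gives between-method values.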
Analysis of the coefficient of variation is a useful way to assess the reproducibility and precision of gene expression profiles generated from the three different total RNA preparations. Box plots show the variability in gene expression measurements within the three technical replicates for each sample preparation method [see Additional File 1, Supplementary Figure 6]. The replicates prepared with QIAshredder homogenization followed by RNeasy purification (method A) are tighter and more consistent across the three pediatric leukemia subtypes than those obtained with the other two RNA isolation methods, i.e. the microarray data generated with method A are the least variable, most reproducible, and most precise. Supplementary Figure 7 [see Additional File 1] shows the slopes of scatter plots of the standard deviation versus the mean PS signal, calculated for each probe set on the HG-U133 Plus 2.0 microarray; this slope is referred to as the robust CV. The mean and standard deviation of the slopes are 0.025 and 0.007 for method A, 0.052 and 0.017 for method B, and 0.035 and 0.019 for method C.
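A minimal sketch of the robust CV, assuming it is the least-squares slope of a line through the origin fitted to the per-probe-set standard deviation versus the mean across technical replicates. The text does not say whether an intercept was fitted or whether linear or log-scale signals were used, so both choices here are assumptions. When the SD is proportional to the mean, the slope equals that common CV, as the toy data show.

```python
import numpy as np


def robust_cv_slope(replicates: np.ndarray) -> float:
    """Slope of SD versus mean across probe sets, fitted through the origin.

    replicates: (n_probesets, n_replicates) linear-scale signals for one sample
    and one preparation method.
    """
    mean = replicates.mean(axis=1)
    sd = replicates.std(axis=1, ddof=1)
    return float(np.dot(mean, sd) / np.dot(mean, mean))


# Toy replicates whose SD is exactly 5% of the mean for every probe set.
means = np.array([100.0, 500.0, 2000.0, 8000.0])
reps = np.column_stack([means * 0.95, means, means * 1.05])
print(round(robust_cv_slope(reps), 4))  # 0.05
```

Computing one slope per patient and method and then taking the mean and SD over patients gives summaries in the form reported above.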
Recent investigations have successfully applied gene expression microarrays to the classification of known tumor types, including various hematological malignancies [5, 25, 28–34]. The growing body of data supports the concept that microarray analysis could soon be introduced into the routine classification of cancer [16, 23, 35]. However, several questions about the many sources of variation in gene expression data have not been addressed, and these continue to cast doubt on the performance of gene expression microarrays in clinical laboratory diagnosis. Here, for the first time, we present a study focused on the impact of different RNA preparation procedures on gene expression data from different subtypes of pediatric acute leukemia. The sample preparation and purification methods analyzed here are not only the three protocols currently most used for the isolation of total RNA in laboratory diagnostics but are also used by many laboratories working with different microarray platforms. The protocols examined are: method A, lysis of the mononuclear cells, followed by lysate homogenization using a biopolymer shredding system in a microcentrifuge spin-column format (which reduces the viscosity caused by high-molecular-weight cellular components and cell debris) and total RNA purification; method B, TRIzol RNA isolation; and method C, TRIzol RNA isolation followed by a total RNA purification step using selective binding columns. This purification step, based on a selective silica membrane, retains RNA molecules longer than 200 nucleotides and thereby enriches for mRNA. All three methods were applied to each of 24 samples, and for an additional three samples three technical replicates were performed for each protocol.
The main purposes of this investigation were, first, to determine to what extent distinct total RNA isolation techniques impair the precision and reproducibility of gene expression data from the same sample and, second, whether the characteristic leukemia-specific gene expression signatures are affected by the RNA preparation procedure. Finally, we aimed to identify the most robust sample preparation method for microarray experiments that could, at the same time, be introduced into daily routine laboratory practice.
Since all samples met our quality criteria, an initial analysis of data quality showed that each of the three preparation methods can generate acceptable gene expression profiles of pediatric leukemias. Samples representing different leukemia subclasses and extracted with different RNA preparation methods showed highly comparable gene expression data, demonstrating that the sample preparation procedure does not impair the overall probe set signal intensity distribution. Importantly, although it yielded less cRNA than the TRIzol (method B) and TRIzol plus RNeasy (method C) protocols (A < C < B; P = 5.308e-12), isolation of total RNA by QIAshredder homogenization followed by RNeasy purification (method A) provided better starting material, as shown by the A260/280 ratio of the cRNA (A < B, C < B, A ~ C; P = 0.00227), by highly reproducible low 3'/5' GAPD ratios, and by consistently lower scaling factors (A < B, A < C, B ~ C; P = 1.477e-5). This was further examined with a so-called RNA degradation plot, as implemented in the Simpleaffy Bioconductor analysis package. Although this is an indirect approach to assessing sample quality, it showed that overall quality was consistently lower when total RNA was processed for microarray analysis directly after isolation with TRIzol alone (method B). While Agilent Bioanalyzer measurements showed acceptable total RNA quality profiles for all three methods, RNA degradation plot analysis might be a good way to identify poor-quality samples indirectly, through their global gene expression signatures at the probe level.
The poorer quality of total RNA samples prepared with method B is probably due to impurities, such as salts or residual phenol or ethanol, that are carried over into the sample preparation assay and subsequently impair the enzymatic reactions.
Next, unsupervised hierarchical clustering and principal component analysis demonstrated that samples are grouped first by patient (across the three preparation methods), then by leukemia type, and finally by leukemia lineage. The B lineage ALL samples all cluster together, separately from T-ALL and AML. Moreover, within each lineage cluster, leukemias with the same diagnostic features, e.g. recurrent translocations, are linked to each other. This finding demonstrates that the sample preparation method is a secondary source of variation and that the major splits in the clusters reflect true underlying biological differences between leukemias.
These findings were confirmed by a subsequent supervised analysis of the gene expression data. Considering only the 1,089 genes differentially expressed between the nine leukemia categories studied here, all samples are clearly separated by leukemia lineage, irrespective of the total RNA isolation method. Furthermore, AML with a normal karyotype is separated from the two AML samples with t(11q23)/MLL, demonstrating an intra-lineage distinction within the AML group. The same separation can be observed in the B lineage ALL group, where samples with the chromosomal aberrations t(1;19), t(4;11), t(9;22), or t(12;21) are split into distinct groups. This also independently confirms the cluster structures reported in recent gene expression profiling studies of acute lymphoblastic leukemias [5, 25, 28, 30–33, 36].
The first conclusion we draw from this study is that the biological differences between pediatric acute leukemia classes are pronounced and largely exceed the variation between different total RNA sample preparation protocols. Having shown that, at a false discovery rate of 0.1%, method A yields a higher number of differentially expressed genes than method B or method C, we propose that lysis of the mononuclear cells, followed by lysate homogenization (QIAshredder) and total RNA purification (RNeasy, Qiagen), is the more robust total RNA isolation procedure for microarray-based gene expression experiments. The analysis of the technical replicates further strengthens this result: the gene expression data obtained with method A show the lowest degree of variation and are more reproducible than those obtained with the alternative methods we tested. Taken together, this evidence, combined with the standardized microarray analysis protocol followed in this study, leads us to conclude that initial homogenization of the leukemia cell lysate followed by total RNA purification on spin columns is currently the optimal protocol with respect to the robustness of gene expression data, and that this method is practical for routine laboratory use. We limited this microarray study to pediatric leukemia, but we expect these conclusions to apply to similar cohorts of adult leukemias as well.
Between December 2005 and March 2006, samples from twenty-seven pediatric acute leukemia patients were analyzed at the time of diagnosis. All patients received a laboratory diagnosis based on white blood cell count, cytomorphology, cytochemistry, multiparameter immunophenotyping, cytogenetics, fluorescence in situ hybridization (FISH), and molecular genetics (PCR). The chromosome aberrations t(1;19)(q23;p13)(E2A-PBX1), t(4;11)(q21;q23)(MLL-AF4), t(9;22)(q34;q11)(BCR-ABL), t(12;21)(p13;q22)(TEL-AML1), t(8;21)(q22;q22)(AML1-ETO), t(15;17)(q22;q21)(PML-RARA), inv(16)(p13;q22)(CBFB-MYH11), and t(8;14)(q24;q32) were screened following the BIOMED-1 Concerted Action protocol. In addition, the DNA index (DI) was determined for all samples to distinguish patients with a hyperdiploid karyotype from those with normal ploidy or hypodiploidy, as reported by the Pediatric Oncology Group (POG) and the Berlin-Frankfurt-Münster (BFM) group. Patients with a DI between 1.16 and 1.6, as detected by flow cytometry, were considered hyperdiploid [38, 39]. Based on the laboratory diagnosis, patients were subsequently risk-stratified and enrolled in the AIEOP LAL-2002 or LAM-2002 protocols. This study was conducted after informed consent had been obtained from all patients, following the tenets of the Declaration of Helsinki, and was approved by the ethics committees of the participating institutions before its initiation. All but one sample were drawn from bone marrow (BM); for one infant patient (younger than one year; patient #26), a peripheral blood (PB) specimen was processed. Mononuclear cells were isolated in our laboratory by Ficoll-Hypaque (Pharmacia-LKB, Uppsala, Sweden) density gradient centrifugation. For three myeloid cases (samples #8, #16, and #26), the specimens were processed by hemolysis. Both childhood acute myeloid leukemia (AML) (n = 4) and acute lymphoblastic leukemia (ALL) (n = 23) samples were collected (Table 1).
The AML group included samples with t(11q23)/MLL rearrangement (n = 2; #16 with t(9;11) and #26 with t(1;11)) and AML with a normal karyotype or other abnormalities (n = 2). The ALL group included pro-B-ALL with t(4;11) (n = 1), pro-B-ALL/c-ALL with t(9;22) (n = 2), T-ALL (n = 5), c-ALL with t(12;21) (n = 3), pre-B-ALL with t(1;19) (n = 1), B lineage ALL with a hyperdiploid karyotype (n = 3), and B lineage ALL negative for the screened recurrent translocations and with a DNA index of 1.0 (n = 8). The percentage of blast cells ranged from 70% to 98%.
As outlined in the study concept (Figures 1A and 1B), 15 × 10⁶ fresh mononuclear cells were collected for each of the first twenty-four leukemia samples (#1–24). Total RNA was extracted from aliquots of 5 × 10⁶ cells and 10 × 10⁶ cells following method A and method B, respectively (see "RNA isolation for microarray analysis"). Total RNA obtained with method B was either used for microarray analysis without further purification (method B) or additionally purified following method C (see "RNA isolation for microarray analysis"). Microarray analysis (Affymetrix HG-U133 Plus 2.0) was performed for each sample and each preparation method, so that a total of 72 microarrays were analyzed for the 24 patient samples (Figure 1A). For three additional samples (#25–27), 45 × 10⁶ fresh mononuclear cells each were collected and divided into nine aliquots of 5 × 10⁶ cells. Total RNA was extracted from each aliquot following one of the three methods, with three technical replicates per method (A, A, A, B, B, B, C, C, C), resulting in an additional 27 gene expression profiles on Affymetrix HG-U133 Plus 2.0 microarrays (Figure 1B) [see Additional File 3].
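The array counts follow directly from the design. The short enumeration below, which uses hypothetical (patient, method, replicate) tuple labels, checks that 24 patients × 3 methods plus 3 patients × 3 methods × 3 replicates gives 72 + 27 = 99 arrays, and that the cell aliquots add up (5 + 10 = 15 million cells for patients #1–24; 9 × 5 = 45 million cells for patients #25–27).

```python
from itertools import product

METHODS = ("A", "B", "C")


def study_arrays():
    """(patient, method, replicate) for every microarray in the design."""
    # Patients #1-24: one array per method; B and C share one TRIzol extract.
    arrays = [(p, m, 1) for p, m in product(range(1, 25), METHODS)]
    # Patients #25-27: three technical replicates per method.
    arrays += [(p, m, r) for p, m, r in product(range(25, 28), METHODS, (1, 2, 3))]
    return arrays


# Cells per patient, in millions: A uses 5, the TRIzol extract for B and C uses 10.
cells_primary = 5 + 10
cells_replicate = 9 * 5
arrays = study_arrays()
print(len(arrays), cells_primary, cells_replicate)  # 99 15 45
```

Methods B and C need only one 10 × 10⁶ cell aliquot between them because method C purifies part of the method B extract rather than starting from new cells.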
RNA isolation for microarray analysis
Mononuclear cells were processed immediately or within 24 hours after the biopsy was obtained. The appearance and fluidity of the samples were monitored before RNA isolation was started. Total RNA was isolated using three different methods. Method A: lysis of the mononuclear cells, followed by lysate homogenization using a biopolymer shredding system in a microcentrifuge spin-column format (QIAshredder, Qiagen, Hilden, Germany), followed by total RNA purification using selective binding columns (RNeasy Mini Kit, Qiagen). The lysate homogenization step reduces the viscosity caused by high-molecular-weight cellular components and cell debris. Method B: TRIzol RNA isolation (Invitrogen, Karlsruhe, Germany). Method C: TRIzol RNA isolation (Invitrogen) followed by a purification step (RNeasy Mini Kit, Qiagen). This purification step combines the selective binding properties of a silica-based membrane with the speed of microspin technology. Only RNA longer than 200 nucleotides binds to the silica membrane, so the procedure enriches for mRNA because RNA shorter than 200 nucleotides is selectively excluded. For all three methods we followed the manufacturers' protocols. After extraction, total RNA was stored at -80°C until used for microarray analyses. RNA quality was assessed on the Agilent Bioanalyzer 2100 using the Agilent RNA 6000 Nano Assay kit (Agilent Technologies, Waldbronn, Germany). RNA concentration was determined with the NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Inc., Wilmington, DE, USA). Overall total RNA quality was assessed by the A260/A280 ratio (NanoDrop) and the electropherogram (Agilent Bioanalyzer).
From each RNA preparation, 2.0 μg of total RNA was converted into double-stranded cDNA by reverse transcription using a cDNA Synthesis System kit including an oligo(dT)24-T7 primer (Roche Applied Science, Mannheim, Germany) and the Poly-A control transcripts (Affymetrix, Santa Clara, CA, USA). The cDNA was purified using the GeneChip Sample Cleanup Module (Affymetrix). Labeled cRNA was then generated using the Microarray RNA target synthesis kit (Roche Applied Science) and an in vitro transcription labeling nucleotide mixture (Affymetrix). The cRNA was purified using the GeneChip Sample Cleanup Module (Affymetrix) and quantified with the NanoDrop ND-1000 spectrophotometer. For each preparation, 11.0 μg of cRNA was fragmented with 5× Fragmentation Buffer (Affymetrix) in a final reaction volume of 25 μl. The incubation steps for cDNA synthesis, the in vitro transcription reaction, and target fragmentation were performed using the Hybex Microarray Incubation System (SciGene, Sunnyvale, CA, USA) and Eppendorf ThermoStat plus instruments (Eppendorf, Hamburg, Germany). Hybridization, washing, staining, and scanning were performed on Affymetrix GeneChip instruments (Hybridization Oven 640, Fluidics Station 450Dx, Scanner GCS3000Dx) as recommended by the manufacturer.
Image data analysis
Microarray image files (.cel data) were generated using default Affymetrix microarray analysis parameters (GCOS 1.2 software). Intensity signals were then calculated from the non-central trimmed mean of perfect match intensities with quantile normalization. For each gene expression profile, a detailed data quality report was generated to define the overall quality of each experiment [see Additional File 2]. Besides total cRNA yield and the cRNA A260/A280 ratio, the monitored quality parameters included: (i) background noise (Q value), (ii) percentage of probe sets called present, (iii) scaling factor, (iv) signals of the exogenous Bacillus subtilis control transcripts from the Affymetrix Poly-A control kit (lys, phe, thr, and dap), and (v) the ratio of 3' probe to 5' probe intensities for a housekeeping gene (GAPD).
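Quantile normalization of perfect match intensities followed by a trimmed-mean summary per probe set can be sketched in a few lines of NumPy. This is the generic quantile normalization approach. A symmetric trimmed mean stands in for the non-central trimmed mean, so this is not the PS/PQN implementation used in the study. The function names and the 20% trimming fraction are assumptions.

```python
import numpy as np


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Force every column (array) onto the mean sorted distribution.

    Rows are probes and columns are arrays. Ties are broken by sort order,
    a simplification of the tie handling in published implementations.
    """
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # within-array rank of each value
    reference = np.sort(x, axis=0).mean(axis=1)        # mean intensity at each rank
    return reference[ranks]


def trimmed_mean(values, trim: float = 0.2) -> float:
    """Symmetric trimmed mean (stand-in for the non-central variant)."""
    if not 0.0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    v = np.sort(np.asarray(values, dtype=float))
    k = int(trim * v.size)                              # values dropped at each end
    return float(v[k:v.size - k].mean())


def probe_set_signals(pm: np.ndarray, probe_set_ids, trim: float = 0.2) -> dict:
    """Quantile-normalize PM intensities, then summarize each probe set per array."""
    norm = quantile_normalize(pm)
    ids = np.asarray(probe_set_ids)
    signals = {}
    for ps in np.unique(ids):
        rows = norm[ids == ps]                          # PM probes of one probe set
        signals[str(ps)] = np.array(
            [trimmed_mean(rows[:, j], trim) for j in range(rows.shape[1])]
        )
    return signals
```

After normalization every array shares the same intensity distribution, so differences between arrays that remain at the probe set level reflect the relative ranking of probes rather than overall brightness. Trimming limits the influence of single outlier probes within a probe set.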
Data pre-processing, including summarization to generate probe set level signals for each microarray experiment, was performed using the PS or PQN algorithms as described elsewhere. To analyze the quality and comparability of gene expression measurements, we used a quality control (QC) matrix, density plots of scaled non-central trimmed means of perfect match (PM) probe intensities (PS signal), and unsupervised hierarchical clustering with Ward linkage of quantile-normalized signals (PQN). To analyze the consistency of the gene expression data, we used principal component analysis (PCA). A subset of genes was selected using the interquartile range (IQR) as the filtering criterion and visualized by hierarchical clustering. Data were further analyzed using R software, Spotfire DecisionSite to generate box plots, Ingenuity Pathways Analysis to annotate gene lists by biological function, and Partek Genomics Suite to generate signal density curves and PCA plots. The power analysis was performed using the Bioconductor package "ssize". All microarray raw data are available through the Gene Expression Omnibus database under series accession number GSE7757.
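The IQR-based gene filter and the PCA can likewise be sketched with NumPy. The sketch assumes a genes × samples matrix of already normalized signals. The `top_n` cutoff and the function names are hypothetical, since the text does not state how many genes were retained. PCA is computed here by singular value decomposition of the gene-centred matrix, one standard way to obtain principal components.

```python
import numpy as np


def iqr_filter(signals: np.ndarray, top_n: int) -> np.ndarray:
    """Return row indices of the top_n genes ranked by interquartile range.

    signals: genes x samples matrix of normalized expression values.
    """
    q75, q25 = np.percentile(signals, [75, 25], axis=1)
    iqr = q75 - q25
    return np.argsort(iqr)[::-1][:top_n]        # most variable genes first


def pca_scores(signals: np.ndarray, n_components: int = 3):
    """PCA of samples via SVD of the gene-centred matrix.

    Returns (scores: samples x n_components, explained variance ratios).
    """
    x = signals.T - signals.T.mean(axis=0)      # samples x genes, centred per gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]
```

Plotting the first two or three columns of `scores`, coloured by RNA isolation method and by leukemia subtype, shows whether samples group by subtype or by extraction procedure.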
Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, Staudt LM: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000, 403: 503-511. 10.1038/35000501.
Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M, Loda M, Weber G, Mark EJ, Lander ES, Wong W, Johnson BE, Golub TR, Sugarbaker DJ, Meyerson M: Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci U S A. 2001, 98: 13790-13795. 10.1073/pnas.191502998.
Bittner M, Meltzer P, Chen Y, Jiang Y, Seftor E, Hendrix M, Radmacher M, Simon R, Yakhini Z, Ben Dor A, Sampas N, Dougherty E, Wang E, Marincola F, Gooden C, Lueders J, Glatfelter A, Pollock P, Carpten J, Gillanders E, Leja D, Dietrich K, Beaudry C, Berens M, Alberts D, Sondak V: Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature. 2000, 406: 536-540. 10.1038/35020115.
Garber ME, Troyanskaya OG, Schluens K, Petersen S, Thaesler Z, Pacyna-Gengelbach M, van de RM, Rosen GD, Perou CM, Whyte RI, Altman RB, Brown PO, Botstein D, Petersen I: Diversity of gene expression in adenocarcinoma of the lung. Proc Natl Acad Sci U S A. 2001, 98: 13784-13789. 10.1073/pnas.241500798.
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999, 286: 531-537. 10.1126/science.286.5439.531.
Beer DG, Kardia SL, Huang CC, Giordano TJ, Levin AM, Misek DE, Lin L, Chen G, Gharib TG, Thomas DG, Lizyness ML, Kuick R, Hayasaka S, Taylor JM, Iannettoni MD, Orringer MB, Hanash S: Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med. 2002, 8: 816-824.
Gordon GJ, Jensen RV, Hsiao LL, Gullans SR, Blumenstock JE, Richards WG, Jaklitsch MT, Sugarbaker DJ, Bueno R: Using gene expression ratios to predict outcome among patients with mesothelioma. J Natl Cancer Inst. 2003, 95: 598-605.
Rosenwald A, Wright G, Chan WC, Connors JM, Campo E, Fisher RI, Gascoyne RD, Muller-Hermelink HK, Smeland EB, Giltnane JM, Hurt EM, Zhao H, Averett L, Yang L, Wilson WH, Jaffe ES, Simon R, Klausner RD, Powell J, Duffey PL, Longo DL, Greiner TC, Weisenburger DD, Sanger WG, Dave BJ, Lynch JC, Vose J, Armitage JO, Montserrat E, Lopez-Guillermo A, Grogan TM, Miller TP, Leblanc M, Ott G, Kvaloy S, Delabie J, Holte H, Krajci P, Stokke T, Staudt LM: The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med. 2002, 346: 1937-1947. 10.1056/NEJMoa012914.
van de Vijver MJ, He YD, van't Veer LJ, Dai H, Hart AA, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ, Parrish M, Atsma D, Witteveen A, Glas A, Delahaye L, van , Bartelink H, Rodenhuis S, Rutgers ET, Friend SH, Bernards R: A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002, 347: 1999-2009. 10.1056/NEJMoa021967.
Guo L, Lobenhofer EK, Wang C, Shippy R, Harris SC, Zhang L, Mei N, Chen T, Herman D, Goodsaid FM, Hurban P, Phillips KL, Xu J, Deng X, Sun YA, Tong W, Dragan YP, Shi L: Rat toxicogenomic study reveals analytical consistency across microarray platforms. Nat Biotechnol. 2006, 24: 1162-1169. 10.1038/nbt1238.
Patterson TA, Lobenhofer EK, Fulmer-Smentek SB, Collins PJ, Chu TM, Bao W, Fang H, Kawasaki ES, Hager J, Tikhonova IR, Walker SJ, Zhang L, Hurban P, de Longueville F, Fuscoe JC, Tong W, Shi L, Wolfinger RD: Performance comparison of one-color and two-color platforms within the Microarray Quality Control (MAQC) project. Nat Biotechnol. 2006, 24: 1140-1150. 10.1038/nbt1242.
Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, de Longueville F, Kawasaki ES, Lee KY, Luo Y, Sun YA, Willey JC, Setterquist RA, Fischer GM, Tong W, Dragan YP, Dix DJ, Frueh FW, Goodsaid FM, Herman D, Jensen RV, Johnson CD, Lobenhofer EK, Puri RK, Scherf U, Thierry-Mieg J, Wang C, Wilson M, Wolber PK, Zhang L, Amur S, Bao W, Barbacioru CC, Lucas AB, Bertholet V, Boysen C, Bromley B, Brown D, Brunner A, Canales R, Cao XM, Cebula TA, Chen JJ, Cheng J, Chu TM, Chudin E, Corson J, Corton JC, Croner LJ, Davies C, Davison TS, Delenstarr G, Deng X, Dorris D, Eklund AC, Fan XH, Fang H, Fulmer-Smentek S, Fuscoe JC, Gallagher K, Ge W, Guo L, Guo X, Hager J, Haje PK, Han J, Han T, Harbottle HC, Harris SC, Hatchwell E, Hauser CA, Hester S, Hong H, Hurban P, Jackson SA, Ji H, Knight CR, Kuo WP, Leclerc JE, Levy S, Li QZ, Liu C, Liu Y, Lombardi MJ, Ma Y, Magnuson SR, Maqsodi B, McDaniel T, Mei N, Myklebost O, Ning B, Novoradovskaya N, Orr MS, Osborn TW, Papallo A, Patterson TA, Perkins RG, Peters EH, Peterson R, Philips KL, Pine PS, Pusztai L, Qian F, Ren H, Rosen M, Rosenzweig BA, Samaha RR, Schena M, Schroth GP, Shchegrova S, Smith DD, Staedtler F, Su Z, Sun H, Szallasi Z, Tezak Z, Thierry-Mieg D, Thompson KL, Tikhonova I, Turpaz Y, Vallanat B, Van C, Walker SJ, Wang SJ, Wang Y, Wolfinger R, Wong A, Wu J, Xiao C, Xie Q, Xu J, Yang W, Zhang L, Zhong S, Zong Y, Slikker W: The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol. 2006, 24: 1151-1161. 10.1038/nbt1239.
Shippy R, Fulmer-Smentek S, Jensen RV, Jones WD, Wolber PK, Johnson CD, Pine PS, Boysen C, Guo X, Chudin E, Sun YA, Willey JC, Thierry-Mieg J, Thierry-Mieg D, Setterquist RA, Wilson M, Lucas AB, Novoradovskaya N, Papallo A, Turpaz Y, Baker SC, Warrington JA, Shi L, Herman D: Using RNA sample titrations to assess microarray platform performance and normalization techniques. Nat Biotechnol. 2006, 24: 1123-1131. 10.1038/nbt1241.
Tong W, Lucas AB, Shippy R, Fan X, Fang H, Hong H, Orr MS, Chu TM, Guo X, Collins PJ, Sun YA, Wang SJ, Bao W, Wolfinger RD, Shchegrova S, Guo L, Warrington JA, Shi L: Evaluation of external RNA controls for the assessment of microarray performance. Nat Biotechnol. 2006, 24: 1132-1139. 10.1038/nbt1237.
Dobbin KK, Beer DG, Meyerson M, Yeatman TJ, Gerald WL, Jacobson JW, Conley B, Buetow KH, Heiskanen M, Simon RM, Minna JD, Girard L, Misek DE, Taylor JM, Hanash S, Naoki K, Hayes DN, Ladd-Acosta C, Enkemann SA, Viale A, Giordano TJ: Interlaboratory comparability study of cancer gene expression analysis using oligonucleotide microarrays. Clin Cancer Res. 2005, 11: 565-572.
Kohlmann A, Schoch C, Dugas M, Rauhut S, Weninger F, Schnittger S, Kern W, Haferlach T: Pattern robustness of diagnostic gene expression signatures in leukemia. Genes Chromosomes Cancer. 2005, 42: 299-307. 10.1002/gcc.20126.
Baechler EC, Batliwalla FM, Karypis G, Gaffney PM, Moser K, Ortmann WA, Espe KJ, Balasubramanian S, Hughes KM, Chan JP, Begovich A, Chang SY, Gregersen PK, Behrens TW: Expression levels for many genes in human peripheral blood cells are highly sensitive to ex vivo incubation. Genes Immun. 2004, 5: 347-353. 10.1038/sj.gene.6364098.
Breit S, Nees M, Schaefer U, Pfoersich M, Hagemeier C, Muckenthaler M, Kulozik AE: Impact of pre-analytical handling on bone marrow mRNA gene expression. Br J Haematol. 2004, 126: 231-243. 10.1111/j.1365-2141.2004.05017.x.
Debey S, Schoenbeck U, Hellmich M, Gathof BS, Pillai R, Zander T, Schultze JL: Comparison of different isolation techniques prior gene expression profiling of blood derived cells: impact on physiological responses, on overall expression and the role of different cell types. Pharmacogenomics J. 2004, 4: 193-207. 10.1038/sj.tpj.6500240.
Feezor RJ, Baker HV, Mindrinos M, Hayden D, Tannahill CL, Brownstein BH, Fay A, MacMillan S, Laramie J, Xiao W, Moldawer LL, Cobb JP, Laudanski K, Miller-Graziano CL, Maier RV, Schoenfeld D, Davis RW, Tompkins RG: Whole blood and leukocyte RNA isolation for gene expression analyses. Physiol Genomics. 2004, 19: 247-254. 10.1152/physiolgenomics.00020.2004.
Staal FJ, Cario G, Cazzaniga G, Haferlach T, Heuser M, Hofmann WK, Mills K, Schrappe M, Stanulla M, Wingen LU, van Dongen JJ, Schlegelberger B: Consensus guidelines for microarray gene expression analyses in leukemia from three European leukemia networks. Leukemia. 2006, 20: 1385-1392. 10.1038/sj.leu.2404274.
Bullinger L, Dohner K, Bair E, Frohling S, Schlenk RF, Tibshirani R, Dohner H, Pollack JR: Use of gene-expression profiling to identify prognostic subclasses in adult acute myeloid leukemia. N Engl J Med. 2004, 350: 1605-1616. 10.1056/NEJMoa031046.
Haferlach T, Kohlmann A, Schnittger S, Dugas M, Hiddemann W, Kern W, Schoch C: Global approach to the diagnosis of leukemia using gene expression profiling. Blood. 2005, 106: 1189-1198. 10.1182/blood-2004-12-4938.
Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH, Angelo M, Ladd C, Reich M, Latulippe E, Mesirov JP, Poggio T, Gerald W, Loda M, Lander ES, Golub TR: Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci U S A. 2001, 98: 15149-15154. 10.1073/pnas.211566398.
Yeoh EJ, Ross ME, Shurtleff SA, Williams WK, Patel D, Mahfouz R, Behm FG, Raimondi SC, Relling MV, Patel A, Cheng C, Campana D, Wilkins D, Zhou X, Li J, Liu H, Pui CH, Evans WE, Naeve C, Wong L, Downing JR: Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell. 2002, 1: 133-143. 10.1016/S1535-6108(02)00032-6.
Wilson CL, Miller CJ: Simpleaffy: a BioConductor package for Affymetrix Quality Control and data analysis. Bioinformatics. 2005, 21: 3683-3685. 10.1093/bioinformatics/bti605.
Liu WM, Li R, Sun JZ, Wang J, Tsai J, Wen W, Kohlmann A, Mickey WP: PQN and DQN: Algorithms for expression microarrays. J Theor Biol. 2006, 243 (2): 273-278. 10.1016/j.jtbi.2006.06.017.
Chiaretti S, Li X, Gentleman R, Vitale A, Wang KS, Mandelli F, Foa R, Ritz J: Gene expression profiles of B-lineage adult acute lymphocytic leukemia reveal genetic patterns that identify lineage derivation and distinct mechanisms of transformation. Clin Cancer Res. 2005, 11: 7209-7219. 10.1158/1078-0432.CCR-04-2165.
Ferrando AA, Neuberg DS, Staunton J, Loh ML, Huard C, Raimondi SC, Behm FG, Pui CH, Downing JR, Gilliland DG, Lander ES, Golub TR, Look AT: Gene expression signatures define novel oncogenic pathways in T cell acute lymphoblastic leukemia. Cancer Cell. 2002, 1: 75-87. 10.1016/S1535-6108(02)00018-1.
Kohlmann A, Schoch C, Schnittger S, Dugas M, Hiddemann W, Kern W, Haferlach T: Molecular characterization of acute leukemias by use of microarray technology. Genes Chromosomes Cancer. 2003, 37: 396-405. 10.1002/gcc.10225.
Kohlmann A, Schoch C, Schnittger S, Dugas M, Hiddemann W, Kern W, Haferlach T: Pediatric acute lymphoblastic leukemia (ALL) gene expression signatures classify an independent cohort of adult ALL patients. Leukemia. 2004, 18: 63-71. 10.1038/sj.leu.2403167.
Ross ME, Zhou X, Song G, Shurtleff SA, Girtman K, Williams WK, Liu HC, Mahfouz R, Raimondi SC, Lenny N, Patel A, Downing JR: Classification of pediatric acute lymphoblastic leukemia by gene expression profiling. Blood. 2003, 102: 2951-2959. 10.1182/blood-2003-01-0338.
Ross ME, Mahfouz R, Onciu M, Liu HC, Zhou X, Song G, Shurtleff SA, Pounds S, Cheng C, Ma J, Ribeiro RC, Rubnitz JE, Girtman K, Williams WK, Raimondi SC, Liang DC, Shih LY, Pui CH, Downing JR: Gene expression profiling of pediatric acute myelogenous leukemia. Blood. 2004, 104: 3679-3687. 10.1182/blood-2004-03-1154.
Schoch C, Kohlmann A, Schnittger S, Brors B, Dugas M, Mergenthaler S, Kern W, Hiddemann W, Eils R, Haferlach T: Acute myeloid leukemias with reciprocal rearrangements can be distinguished by specific gene expression profiles. Proc Natl Acad Sci U S A. 2002, 99: 10008-10013. 10.1073/pnas.142103599.
Ebert BL, Golub TR: Genomic approaches to hematologic malignancies. Blood. 2004, 104: 923-932. 10.1182/blood-2004-01-0274.
Kohlmann A, Schoch C, Dugas M, Schnittger S, Hiddemann W, Kern W, Haferlach T: New insights into MLL gene rearranged acute leukemias using gene expression profiling: shared pathways, lineage commitment, and partner genes. Leukemia. 2005, 19: 953-964. 10.1038/sj.leu.2403746.
van Dongen JJ, Macintyre EA, Gabert JA, Delabesse E, Rossi V, Saglio G, Gottardi E, Rambaldi A, Dotti G, Griesinger F, Parreira A, Gameiro P, Diaz MG, Malec M, Langerak AW, San Miguel JF, Biondi A: Standardized RT-PCR analysis of fusion gene transcripts from chromosome aberrations in acute leukemia for detection of minimal residual disease. Report of the BIOMED-1 Concerted Action: investigation of minimal residual disease in acute leukemia. Leukemia. 1999, 13: 1901-1928. 10.1038/sj.leu.2401592.
Smith M, Arthur D, Camitta B, Carroll AJ, Crist W, Gaynon P, Gelber R, Heerema N, Korn EL, Link M, Murphy S, Pui CH, Pullen J, Reamon G, Sallan SE, Sather H, Shuster J, Simon R, Trigg M, Tubergen D, Uckun F, Ungerleider R: Uniform approach to risk classification and treatment assignment for children with acute lymphoblastic leukemia. J Clin Oncol. 1996, 14: 18-24.
Harris MB, Shuster JJ, Carroll A, Look AT, Borowitz MJ, Crist WM, Nitschke R, Pullen J, Steuber CP, Land VJ: Trisomy of leukemic cell chromosomes 4 and 10 identifies children with B-progenitor cell acute lymphoblastic leukemia with a very low risk of treatment failure: a Pediatric Oncology Group study. Blood. 1992, 79: 3316-3324.
Mardia KV, Kent JT, Bibby JM: Multivariate Analysis. 1979, London: Academic Press.
Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998, 95: 14863-14868. 10.1073/pnas.95.25.14863.
The R Project for Statistical Computing: [http://www.R-project.org]
Spotfire DecisionSite Product Suite, Start Page. [http://www.spotfire.com/products/decisionsite.cfm]
Ingenuity Systems, Start Page. [http://www.ingenuity.com]
Partek Incorporated, Start Page. [http://www.partek.com]
Dudoit S, Gentleman RC, Quackenbush J: Open source software for the analysis of microarray data. Biotechniques. 2003, Suppl: 45-51.
Barrett T, Suzek TO, Troup DB, Wilhite SE, Ngau WC, Ledoux P, Rudnev D, Lash AE, Fujibuchi W, Edgar R: NCBI GEO: mining millions of expression profiles--database and tools. Nucleic Acids Res. 2005, 33: D562-D566. 10.1093/nar/gki022.
Supported in part by Fondazione Città della Speranza, CNR, MURST ex 40% and 60% and Roche Molecular Systems, Inc., Pleasanton, CA, USA. The authors would like to thank the European LeukemiaNet gene expression profiling working group members Torsten Haferlach, Ken Mills, and Amanda Gilkes for helpful comments and critical reading of the manuscript.
This study is part of the MILE Study (Microarray Innovations In LEukemia) program, an ongoing collaborative effort headed by the European Leukemia Network (ELN) and sponsored by Roche Molecular Systems, Inc., addressing gene expression signatures in acute and chronic leukemias. This study further supports the AmpliChip Leukemia Test program, a gene expression microarray test for the subclassification of leukemia. Roche Molecular Systems, Inc. has business relationships with Qiagen and is currently validating Qiagen products for the AmpliChip Leukemia Test.
MCDO performed the microarray experiments and wrote the paper; LT helped perform the experiments; AZ, RL, and WML analyzed the microarray data; GB recorded clinical data; GK supervised the study and the writing of the manuscript; and AK provided the original concept of the study and contributed to writing the paper.
Electronic supplementary material
Additional File 1: Supplementary Data. This file contains supplementary figures with additional comments explaining details of analysis, results, and interpretation. (DOC 1 MB)
Additional File 2: This Excel file contains further details about each total RNA isolation method, including cRNA quality and quantity values as well as microarray quality and quantity values for each experiment. (XLS 32 KB)
Additional File 3: This Excel file contains details about each total RNA isolation method and leukemia classification details for each CEL file. All microarray raw data (*.cel files) are available online through the Gene Expression Omnibus database with the series accession number GSE7757. (XLS 25 KB)
About this article
Cite this article
Campo Dell'Orto, M., Zangrando, A., Trentin, L. et al. New data on robustness of gene expression signatures in leukemia: comparison of three distinct total RNA preparation procedures. BMC Genomics 8, 188 (2007). https://doi.org/10.1186/1471-2164-8-188