- Research article
- Open Access
Analyses and interpretation of whole-genome gene expression from formalin-fixed paraffin-embedded tissue: an illustration with breast cancer tissues
© Kibriya et al; licensee BioMed Central Ltd. 2010
- Received: 18 May 2010
- Accepted: 8 November 2010
- Published: 8 November 2010
We evaluated (a) the feasibility of whole genome cDNA-mediated Annealing, Selection, extension and Ligation (DASL) assay on formalin-fixed paraffin-embedded (FFPE) tissue and (b) whether similar conclusions can be drawn by examining FFPE samples as proxies for fresh frozen (FF) tissues. We used a whole genome DASL assay (addressing 18,391 genes) on a total of 72 samples from paired breast tumor and surrounding healthy tissues from both FF and FFPE samples.
Gene detection was very good with comparable success between the FFPE and FF samples. Reproducibility was also high (r2 = 0.98); however, concordance between the two types of samples was low. Only one-third of the differentially expressed genes in tumor tissues (compared to corresponding normal) from FF samples could be detected in FFPE samples and conversely only one-fourth of the differentially expressed genes from FFPE samples could be detected in FF samples. GO-enrichment analysis, gene set enrichment analysis (GSEA) and GO-ANOVA analyses also suggested small overlap between the lead functional groups that were differentially expressed in tumor detectable by examining FFPE and FF samples. In other words, FFPE samples may not be ideal for picking individual target gene(s), but may be used to identify some of the lead functional group(s) of genes that are differentially expressed in tumor. The differentially expressed genes in breast cancer found in our study were biologically meaningful. The "cell cycle" & "cell division" related genes were up-regulated and genes related to "regulation of epithelial cell proliferation" were down-regulated.
Gene expression experiments using the DASL assay can efficiently handle fragmentation issues in the FFPE tissues. However, formalin fixation seems to change RNA and consequently significantly alters gene expression in a number of genes which may not be uniform between tumor and normal tissues. Therefore, considerable caution needs to be taken when interpreting gene expression data from FFPE tissues, especially in relation to specific genes.
- False Discovery Rate
- Adjacent Normal Tissue
- FFPE Tissue
- Enrichment Score
- FFPE Sample
High-throughput microarray technology is a powerful tool for genome-wide genotyping and gene expression analysis. Microarray-based gene expression assessment is a very useful method for prediction of diseases, tumor classification and drug responses. Although good quality RNA can be extracted from fresh frozen (FF) tissues, tissues preserved in RNAlater reagent and primary cell culture, the limited availability of these sources is a problem with regards to the utility of gene expression measurements. As Formalin-Fixed Paraffin-Embedded (FFPE) sample collection and storage are routine practices in pathological laboratories worldwide, there is great interest in the use of RNA extracted from those archived samples.
Integrity of nucleic acid is a very important issue for microarray analysis. It is well known that RNA from FFPE tissue is often degraded and, at the same time, chemically modified [1, 2]. Previous studies using different microarray platforms showed that only approximately 3% or less of the RNA isolated from FFPE tissue is useful for cDNA synthesis, which is an important step for gene expression analysis on a microarray platform. Illumina Inc. introduced a gene expression profiling method, DASL (cDNA-mediated Annealing, Selection, extension and Ligation), specially designed for analysis of fragmented RNA samples [4–6]. In the present study, we used the Whole Genome DASL Assay on paired breast tumor and surrounding healthy tissue from both FFPE tissue and FF tissue to examine: (a) the feasibility of genome-wide gene expression analyses using DASL on FFPE tissue and (b) whether similar conclusions can be drawn by examining widely available FFPE tissue as a proxy for FF tissue.
The study was approved by the Institutional Review Board of The University of Chicago. We obtained four different sets of tissue samples (paired tumor and adjacent normal breast tissue from fresh frozen as well as corresponding FFPE blocks) from the same patients through our Human Tissue Resource Center (HTRC) at The University of Chicago http://pathcore.bsd.uchicago.edu/BSB/BSB_overview.shtml. HTRC collected these breast tissues from mastectomy/lumpectomy specimens that were not needed for pathological diagnosis by the Department of Pathology at The University of Chicago Medical Center. There were a total of 21 such patients for whom all 4 types of tissues were available. For the present study, tissue samples from the first 18 patients were used. These tissue samples were archived for a 3-6 year period.
RNA extraction and quality control
For the extraction of total RNA from both FFPE and FF breast tissue samples, we used the High Pure RNA Paraffin Kit (Roche Applied Science, cat# 03 270 289 001). Xylene was used for deparaffinization of the FFPE samples. Proteinase K digestion was carried out by overnight incubation at 55°C. DNAse treatment was carried out for all of the samples, and samples were eluted with 40 μl of the elution buffer provided with the extraction kit. For quantification, each sample was (a) tested in a NanoDrop 1000 for concentration and 260/280 ratio and (b) run in an Agilent Bioanalyzer using the Eukaryotic total RNA Nano Series II kit. Prequalification of all the RNA samples derived from FFPE and FF tissue was done by qPCR using SYBR Green detection. cDNA was synthesized as recommended by Illumina, and then a 90 bp amplicon was amplified from the highly expressed RPL13a ribosomal protein gene (GenBank Accession # NM_012423.2). The amplification reaction was carried out for all FFPE and FF samples in duplicate in ABI Prism 7900 HT Sequence Detection Systems (Applied Biosystems). The ΔCt was calculated as Ct (test sample) - Ct (control sample). As per the Illumina protocol for DASL, a ΔCt of up to 12 was acceptable as prequalification for the assay. Only 4 out of 72 breast tissue samples had a ΔCt just above 12, and none exceeded 12.5. Only 1 of these 4 samples showed poor performance on the assay (see Results section).
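The ΔCt prequalification rule above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the Ct values are made up, and averaging the duplicate wells before subtracting the control Ct is an assumption rather than something stated in the text.

```python
# Illustrative sketch of the ΔCt prequalification rule; names and values are hypothetical.
MAX_DELTA_CT = 12.0  # acceptance threshold quoted in the text


def mean_ct(duplicates):
    """Average the Ct values of duplicate wells (assumed handling of duplicates)."""
    return sum(duplicates) / len(duplicates)


def delta_ct(ct_test, ct_control):
    """ΔCt = Ct(test sample) - Ct(control sample)."""
    return ct_test - ct_control


def prequalifies(ct_test_duplicates, ct_control, max_delta=MAX_DELTA_CT):
    """Return True when the sample's ΔCt does not exceed the threshold."""
    return delta_ct(mean_ct(ct_test_duplicates), ct_control) <= max_delta


if __name__ == "__main__":
    # Made-up Ct values for two samples against a control Ct of 20.0
    print(prequalifies([30.0, 30.4], 20.0))
    print(prequalifies([32.5, 32.7], 20.0))
```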
Whole genome DASL assay
We used 300 ng of total RNA (5 μL at 60 ng/μL concentration) as starting material for each sample run on the microarray. Each chip accommodates 8 samples. The four types of sample for each patient were processed on one chip: (1) FF tumor tissue (T_FF), (2) FF adjacent normal tissue (N_FF), (3) FFPE tumor tissue (T_FFPE) and (4) FFPE adjacent normal tissue (N_FFPE). Thus, a single chip contained 8 samples from 2 patients, and three such chips (24 samples) were processed in one batch. We also ran four technical replicates of the T_FFPE RNA samples to analyze the reproducibility of the DASL assay. According to the manufacturer's protocol, RNA was first converted to cDNA through reverse transcription with biotinylated primers. The cDNA was then annealed to the assay oligonucleotides and bound to streptavidin-conjugated paramagnetic beads. After the oligo hybridization, non-hybridized and mis-hybridized sequences were washed away, and the hybridized sequences were extended and ligated to start PCR amplification with fluorescently tagged primers. The fluorescently labeled PCR products were hybridized overnight onto the bead chip to be scanned on the Bead Array reader.
Statistical analyses and power
For the combined analysis, the ANOVA model was:
$$Y_{ijkl} = \mu + \mathrm{Tissue}_i + \mathrm{Storage}_j + \mathrm{CaseID}_k + \varepsilon_{ijkl}$$
where Y_{ijkl} represents the gene expression intensity of gene "Y" in the l-th sample with the i-th tissue and the j-th storage for the k-th case ID; μ is the common effect for the whole experiment; and ε_{ijkl} represents the random error, which is assumed to be normally and independently distributed with mean 0 and standard deviation δ for all measurements. Case ID is a random effect in this model.
$$Y_{ijkl} = \mu + \mathrm{Tissue}_i + \mathrm{Batch}_j + \mathrm{BlockAge}_k + \varepsilon_{ijkl}$$
where Y_{ijkl} represents the gene expression intensity of gene "Y" in the l-th sample with the i-th tissue for the j-th Batch no with the k-th Block_Age. As in the model for combined analyses above, μ is the common effect for the whole experiment and ε_{ijkl} represents the random error, which is assumed to be normally and independently distributed with mean 0 and standard deviation δ for all measurements. Batch no is a random effect in this model.
In GO Enrichment analysis, we tested if the genes found to be differentially expressed fall into a Gene Ontology category more often than expected by chance. We used Chi-square test to compare "number of significant genes from a given category/total number of significant genes" vs. "number of genes on chip in that category/total number of genes on the microarray chip". Negative log of the p-value for this test was used as the enrichment score. Therefore, a GO group with a high enrichment score represents the lead functional group. The enrichment scores were analyzed in a hierarchical visualization and in tabular form.
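As a rough illustration of the enrichment score described above, the sketch below runs a Pearson chi-square test on a 2 × 2 table (gene in the significant list or not, gene in the GO category or not) and returns the negative log of the p-value. Several details are assumptions: the 2 × 2 formulation without continuity correction, the natural logarithm (the text does not state the base), and the example category of 200 genes with 40 significant members, which is invented. Only the totals 1,319 and 18,391 come from the text.

```python
import math


def go_enrichment_score(sig_in_cat, sig_total, chip_in_cat, chip_total):
    """Chi-square (1 d.f.) enrichment test; returns (chi2, p_value, score).

    score = -ln(p); the base of the logarithm is an assumption.
    """
    a = sig_in_cat                              # significant, in category
    b = sig_total - sig_in_cat                  # significant, not in category
    c = chip_in_cat - sig_in_cat                # not significant, in category
    d = chip_total - chip_in_cat - b            # not significant, not in category
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0, 0.0                    # degenerate table: no evidence
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    score = -math.log(p_value) if p_value > 0 else float("inf")
    return chi2, p_value, score


if __name__ == "__main__":
    # Hypothetical GO category: 200 genes on the chip, 40 of them significant
    chi2, p, score = go_enrichment_score(40, 1319, 200, 18391)
    print(f"chi2={chi2:.2f} p={p:.3g} enrichment score={score:.2f}")
```

Note that this two-sided test flags depletion as well as enrichment; a real analysis would also check that the observed count exceeds the expected count.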
In addition to looking at differential expression at the individual gene level, we also examined the differential expression of gene sets using Gene Set Enrichment Analysis (GSEA). Given an a priori defined set of genes S (sharing the same GO category), the goal of GSEA was to determine whether the members of S were randomly distributed throughout the ranked list or primarily found at the top or bottom. Because GSEA can consider only a single variable (unadjusted expression), we also used GO-ANOVA, which offers adjustment for other factors such as "person-to-person" variation and "tissue type" variation.
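The idea of asking whether members of S cluster at the top or bottom of a ranked list can be sketched with an unweighted running-sum statistic. This is a simplified, Kolmogorov-Smirnov-like variant with no weighting by correlation and no permutation test, so it is not necessarily what the software used in this study computes. The gene names are placeholders.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for gene_set in ranked_genes.

    Walk down the ranked list, stepping up at members of the set and down at
    non-members; return the signed maximum deviation from zero.
    """
    members = set(gene_set)
    n_hit = sum(1 for g in ranked_genes if g in members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        if gene in members:
            running += 1.0 / n_hit
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    ranked = [f"g{i}" for i in range(1, 11)]          # placeholder gene names
    print(enrichment_score(ranked, {"g1", "g2", "g3"}))    # set at the top
    print(enrichment_score(ranked, {"g8", "g9", "g10"}))   # set at the bottom
```

A score near +1 means the set sits at the top of the ranking, near -1 at the bottom, and near 0 means it is spread throughout.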
$$Y = \mu + T + P + G + S(T*P) + \varepsilon$$
where Y represents expression of a GO-category, μ is the common effect or average expression of the GO-category, T is the tissue-to-tissue (tumor/healthy) effect, P is the patient-to-patient effect, G is the gene-to-gene effect (differential expression of genes within the GO-category independent of tissue types), S(T*P) is the sample-to-sample effect (this is a random effect, nested in tissue and patient) and ε represents the random error.
Cross-validation: For the one-level cross-validation, the data were first divided into 10 random partitions. At each iteration, 10% of the samples were held out for testing while the remaining 90% were used to fit the parameters of the model. We also used a 10 × 10 two-level nested cross-validation. In the outer cross-validation, with 10% of samples held out as test cases, the remaining 90% were used in a 10-fold cross-validation to determine the optimal predictor variables and other classifier parameters. The model that performed best on the inner cross-validation was applied to the held-out test samples in the outer cross-validation. Thus an inner cross-validation was performed in order to select predictor variables and optimal model parameters, and an outer cross-validation was used to produce overall accuracy estimates for the classifier. In the first step, we considered all the differentially expressed genes for inclusion in the model. In the next step(s), to select the top 50 or 100 genes whose expression could be used to differentiate the FFPE samples from the FF samples, we used 3-way ANOVA ["storage" (FFPE or FF) adjusted for "tissue type" (tumor or normal) and "person-to-person" variation]. We tested three classification methods: (a) K-Nearest Neighbor (KNN) with Euclidean distance measure and 1 neighbor, (b) nearest centroid with equal prior probability and (c) linear discriminant analysis with equal prior probability.
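The two-level scheme can be sketched as follows. This is a minimal pure-Python illustration under stated simplifications: the inner loop only tunes the number of neighbours of a KNN classifier (the study also selected predictor genes and compared other classifiers), and the two-feature data set is synthetic. All function names are hypothetical.

```python
import random


def k_folds(indices, n_folds, rng):
    """Shuffle indices and deal them into n_folds roughly equal folds."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = [train_y[i] for i in order[:k]]
    return max(sorted(set(votes)), key=votes.count)


def accuracy(X, y, train_idx, test_idx, k):
    """Fraction of test samples classified correctly by KNN fit on train_idx."""
    tX = [X[i] for i in train_idx]
    ty = [y[i] for i in train_idx]
    hits = sum(knn_predict(tX, ty, X[i], k) == y[i] for i in test_idx)
    return hits / len(test_idx)


def nested_cv(X, y, candidate_ks=(1, 3, 5), n_outer=10, n_inner=10, seed=0):
    """Outer folds estimate accuracy; inner folds choose k on training data only."""
    rng = random.Random(seed)
    outer = k_folds(list(range(len(X))), n_outer, rng)
    correct = 0
    for held_out in outer:
        train = [i for fold in outer if fold is not held_out for i in fold]
        inner = [f for f in k_folds(train, n_inner, rng) if f]

        def inner_score(k):
            scores = []
            for val in inner:
                fit = [i for fold in inner if fold is not val for i in fold]
                scores.append(accuracy(X, y, fit, val, k))
            return sum(scores) / len(scores)

        best_k = max(candidate_ks, key=inner_score)
        # The held-out samples are touched only here, after model selection
        correct += round(accuracy(X, y, train, held_out, best_k) * len(held_out))
    return correct / len(X)


# Synthetic, well-separated two-class data (not from the study)
X = [[0.1 * i, 0.05 * i] for i in range(20)] + [[5 + 0.1 * i, 5 + 0.05 * i] for i in range(20)]
y = ["FF"] * 20 + ["FFPE"] * 20

if __name__ == "__main__":
    print(nested_cv(X, y))
```

Keeping model selection inside the inner loop is what prevents the held-out samples from influencing the choice of parameters, so the outer accuracy is not optimistically biased.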
In genome-wide gene expression experiments requiring multiple testing, it is more powerful and more reasonable to control the false discovery rate (FDR) or positive FDR (pFDR) [12–14] instead of the type I error. We followed that strategy for sample size calculation. When controlling FDR, the traditional approach of estimating sample size by controlling type I error is no longer applicable. In their paper, Liu P and Hwang JTG compared calculations of sample size using four different approaches, all of which were in good agreement, and showed that for a standardized effect size Δ/σ = 1 [e.g., for a fold change of 1.4 with σ = 0.5, Δ/σ = (log2 1.4)/0.5 ≈ 1], if identification of 95% of the truly altered genes is desired (our set target), then the estimated sample size for each group would be between 33 and 34. For a standardized effect size Δ/σ = 2 [in other words, for a fold change of 2 with σ = 0.5, Δ/σ = (log2 2)/0.5 = 2], if identification of 95% of the truly altered genes is desired (our set target), then the estimated sample size for each group would be 11. Therefore, our study was sufficiently powered to detect a 1.4-fold change in the combined analysis and a 2-fold change in the subgroup analysis where FF and FFPE samples were analyzed separately.
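The standardized effect sizes quoted above follow directly from the fold change and σ. The short sketch below reproduces that arithmetic only; the function name is hypothetical, and it does not reproduce the sample-size calculation itself.

```python
import math


def standardized_effect_size(fold_change, sigma):
    """Δ/σ, where Δ is the log2 fold change and σ the standard deviation."""
    return math.log2(fold_change) / sigma


if __name__ == "__main__":
    for fc in (1.4, 2.0):
        print(fc, round(standardized_effect_size(fc, 0.5), 3))
```

For a fold change of 1.4 the ratio is about 0.97, which the text rounds to 1; for a fold change of 2 it is exactly 2.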
Sample characteristics and assay performance
Table columns: 95% Confidence Interval for Mean; Detected Genes (0.05); Difference of CT.
Correlations of log2 signal intensity between a pair of RNA samples extracted from N_FF and N_FFPE tissue (r2 = 0.86) and between T_FF & T_FFPE tissue (r2 = 0.75) from same subject (case ID#50527) are shown in Figure 1B and Figure 1C respectively. Both the graphs show that in normal as well as in tumor tissue, a number of genes were found to be more than 2-fold up- or down-regulated in FFPE samples compared to corresponding FF samples. Similarly, correlations between signal intensity from paired RNA samples extracted from N_FF and corresponding T_FF tissue (r2 = 0.85) and paired RNA samples extracted from N_FFPE and T_FFPE tissue (r2 = 0.72) from the same subject (case ID#50527) are shown in Figure 1D and Figure 1E respectively. As expected, in both the FF and FFPE samples, there were a number of genes that were more than 2-fold up- or down-regulated in the tumor tissue compared to the corresponding normal tissue.
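The two quantities discussed above, the squared correlation of log2 intensities between paired samples and the number of genes differing by more than 2-fold, can be computed as in this sketch. The intensity values are made up and the function names are hypothetical; a difference of more than 1 on the log2 scale corresponds to a more than 2-fold change.

```python
import math


def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def count_two_fold(log2_a, log2_b):
    """Number of genes whose log2 intensities differ by more than 1 (over 2-fold)."""
    return sum(1 for a, b in zip(log2_a, log2_b) if abs(a - b) > 1.0)


if __name__ == "__main__":
    # Made-up log2 intensities for five genes in a paired FF / FFPE sample
    ff = [math.log2(v) for v in (256, 512, 1024, 2048, 4096)]
    ffpe = [math.log2(v) for v in (300, 400, 3000, 1900, 900)]
    print(round(pearson_r2(ff, ffpe), 3), count_two_fold(ff, ffpe))
```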
Sources of variation in the gene expression
In the next step, to further investigate the source of variation in the expression, we used multivariate ANOVA. We did not use any filter for selecting the genes to be included in the analysis. In other words, ANOVA was performed on the log2-transformed intensity values for all 18,391 genes irrespective of their level of detection. The average F-ratio (F-statistics for the test variable/F-statistics for error) for all the genes was considered representative of the signal-to-noise ratio. Figure 2B shows the significance of different sources of variation in the entire data in a model where tissue type (tumor/normal), sample storage (FFPE/FF) and person-to-person variation (case ID#) were entered simultaneously as explanatory variables for gene expression. The figure shows that "sample storage" (F-ratio 16.02) was the most significant source of variation, followed by "disease status" (tumor/normal; F-ratio 7.14) and person-to-person variation (F-ratio 2.22).
Differential gene expression in FFPE samples compared to FF
A 10 × 10 two-level nested cross-validation analysis using all the 1,863 genes differentially expressed in FFPE samples in the present study suggested that the expression of these genes could differentiate FFPE samples from FF samples with an overall accuracy of 95.7%. The same 10 × 10 two-level nested cross-validation analysis suggested that the expression of the top 100 genes could be used in a model that differentiates FFPE samples from FF samples with an overall accuracy of 100%. We also tested the top 50 genes, and again the overall accuracy was 100%. This further supported our ANOVA finding that a number of genes are differentially expressed in FFPE samples compared to their corresponding FF samples.
Differential expression profile at "gene set" level in FFPE samples compared to corresponding FF
Considering the effect of FFPE on the gene expression, we then looked at the differential gene expression in tumor tissue compared to adjacent normal tissue in two ways - (a) looking at the combined data (FF and FFPE) with adjustment for "sample storage" and "person to person variation" and (b) looking at the FF and FFPE data separately with paired comparisons to detect differential expression in tumor and to see if similar conclusion(s) could be made.
Differential gene expression in breast tumor tissue compared to corresponding adjacent normal breast tissue
Among the 18 individuals included in the analyses of breast tissues, one had fibroadenoma, 3 had no abnormality detected and in one the histopathology was unknown. Therefore, to get a clearer picture of genes differentially expressed in breast cancer, we included 52 samples from 13 patients with a known histopathological diagnosis of breast cancer (4 samples from each patient). Of these 52 samples, one FF tumor sample was excluded due to poor performance on the chip (an outlier on PCA, as mentioned above). The analysis was therefore restricted to the remaining 51 arrays. In addition to PCA, we also used unsupervised hierarchical clustering based on all 18,391 genes, which showed clustering by sample storage (FF/FFPE) as well as by histopathology [figure not shown].
In this total set of 51 samples, we looked at the genome-wide (n = 18391) differential gene expression in breast tumor tissue compared to adjacent normal tissue after adjustment for "sample storage" (FF/FFPE), and "person to person variation" (case ID#). There were a total of 1319 genes (7.17% of total 18,391 genes; 604 were up-regulated and the rest 715 were down regulated in tumor) that were differentially expressed at least by 2-fold at FDR 0.05 level.
Figure 5C shows the overlap of differentially expressed genes in the FF and FFPE samples and combined analysis. Only 315 genes were picked in both FF and FFPE samples, and all these 315 were also picked-up in the combined analysis. In other words, only one-third of the differentially expressed genes in tumor detected in FF samples could be picked up in FFPE samples and only one-fourth of the differentially expressed genes in tumor found in FFPE samples would be picked up in FF samples. Having this in mind, we further analyzed the lists of differentially expressed genes for GO-Enrichment to see if genes of similar GO- groups were found to be enriched in these lists derived from FF and FFPE samples.
How do these gene lists compare to breast cancer or multiple cancer signatures from prior studies
Table: Overlap between our gene lists and the lists from other studies and/or sources
Comparison lists (columns): Martin et al (n = 21); Multiple cancer gene list ** (n = 184); SA Bioscience Cancer Panel (n = 83); SA Bioscience Breast Cancer panel (n = 83); Baasiri et al. (BCGD) (n = 51)
Our lists (rows): List from combined analysis, n = 1319; List from analysis of FF samples, n = 1275; List from analysis of FFPE samples, n = 966
GO Enrichment Analyses of the lists of differentially expressed genes in breast cancer tissue compared to normal breast tissue in FF and FFPE samples
Therefore our findings suggest that the analysis of FFPE samples does not identify the exact same genes that would have been identified by analyzing FF samples, but at least, the list shows some similarity in terms of enrichment of GO-terms representing the lead functional groups of genes. In other words, FFPE samples may not be ideal for picking individual target gene(s), but may be used to identify the lead functional group(s) of genes that are differentially expressed in tumor.
For the up-regulated genes in breast tumor that we found by analyzing the FF samples (and also the FFPE samples), if we looked at the enrichment of GO-terms under all three major groups together ("biological process", "molecular function" and "cellular components"), the topmost ranking GO-term was "nucleosome" under "cellular components". All the up-regulated genes under this GO-term are from the same family of genes that are related to HIST1 and HIST2 proteins.
For the down-regulated genes in breast tumor, that we found by analyzing the FF samples, if we look at the enrichment of GO-terms under all the three major groups - "biological process", "molecular function" and "cellular components", the top ranking GO-term was "regulation of epithelial cell proliferation" under the "biological process". This finding is also very much relevant in the context of breast cancer pathogenesis. These down-regulated genes include FGF2, FGF9, APC, IGF1, CDKN2B, NOTCH1, LAMC1, LAMB1, NKX3-1, TBX18, and GAS1.
Interaction between "tissue type" and "sample storage"
List of 178 genes that show significant "Tissue * Storage" interaction and are also differentially expressed in breast cancer tissue at FDR 0.05 level with at least 1.4 fold change in the combined analysis.
Table columns: p-value (Case ID); p-value (Disease * Storage); Fold-Change (Tumor vs. Normal); Fold-Change (FFPE vs. FF).
Expression of known "clinically relevant" genes in our data
From a therapeutic point of view, determination of HER2, estrogen receptor (ESR) and progesterone receptor (PGR) gene expression is very helpful in decision-making for therapeutic regimens. We therefore looked at these genes in particular in our data set (data not shown). Expression of the ESR1 gene was significantly affected by FFPE storage. Expression of the ESR2 and PGR genes was only minimally affected by storage, and that of HER2 was not affected at all. We found HER2 to be up-regulated by 1.5-fold (p = 0.0054) and PGR to be down-regulated by 1.9-fold (p = 0.006) in tumor compared to adjacent normal tissue.
Do we see same "gene sets" to be differentially expressed in breast tumor tissue compared to corresponding adjacent normal breast tissue in FF and FFPE samples?
After looking into the differential expression at the individual gene level, we also looked for differential expression of different "gene sets" in breast cancer tissue by using Gene Set Enrichment Analysis (GSEA) as well as GO-ANOVA. GSEA of the FF samples showed that a total of 100 "gene sets" (each representing a different GO description term) were differentially expressed (p-value ≤ 0.05) in breast cancer tissue compared to adjacent healthy tissue. On the other hand, GSEA of the FFPE samples showed that a total of 440 "gene sets" were differentially expressed at p-value ≤ 0.05 in breast cancer FFPE tissue compared to adjacent healthy FFPE tissue. Only 38 "gene sets" (i.e. 38% of the "gene sets" from the FF analysis and only 8.64% from the FFPE analysis) were common to both analyses. Therefore, GSEA also suggested the need for caution in interpreting gene expression data from FFPE samples.
GO-ANOVA of the FF samples showed that a total of 883 "gene sets" (each representing a different GO description term) were differentially expressed at FDR ≤ 0.05 in breast cancer tissue compared to adjacent healthy tissue. On the other hand, GO-ANOVA of the FFPE samples showed that 1,034 "gene sets" were differentially expressed at FDR ≤ 0.05 in breast cancer FFPE tissue compared to adjacent healthy FFPE tissue. A total of 641 "gene sets" (i.e. 72.59% of the gene sets from the FF analysis and 61.99% from the FFPE analysis) were common to both analyses. This result also suggested the need for caution in interpreting gene expression data from FFPE samples.
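The overlap percentages quoted for GSEA and GO-ANOVA are simple ratios of the number of shared gene sets to the size of each list. The sketch below, with a hypothetical function name, reproduces them from the counts given in the text.

```python
def overlap_percentages(n_common, n_ff, n_ffpe):
    """Percent of the FF list and of the FFPE list that is shared by both."""
    return 100.0 * n_common / n_ff, 100.0 * n_common / n_ffpe


if __name__ == "__main__":
    # Counts taken from the text: (shared, FF, FFPE)
    for label, counts in {"GSEA": (38, 100, 440), "GO-ANOVA": (641, 883, 1034)}.items():
        pct_ff, pct_ffpe = overlap_percentages(*counts)
        print(f"{label}: {pct_ff:.2f}% of FF sets, {pct_ffpe:.2f}% of FFPE sets")
```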
Results from GSEA for breast cancer in FF samples. The following 26 gene sets (GO-terms) were common in all the four lists shown in figure-4(v)
Gene Set Description
# of Genes
transforming growth factor beta receptor signaling pathway
fatty acid metabolic process
vitamin metabolic process
regulation of insulin secretion
regulation of biomineral formation
muscle cell differentiation
regulation of blood vessel size
regulation of tube size
BMP signaling pathway
regulation of cell proliferation
glucose metabolic process
branching morphogenesis of a tube
regulation of hormone secretion
magnesium ion binding
response to drug
small GTPase binding
protein dimerization activity
negative regulation of DNA metabolic process
Gene expression profiling of human cancer has proved valuable in cancer research, leading not only to the identification of targets but also contributing to our understanding of the mechanisms of the process [23, 24]. The application of microarrays is limited by the availability of fresh frozen tissue or tissue preserved in RNAlater reagent. As FFPE samples are available in almost all pathological laboratories and are often available in conjunction with clinical and follow-up data, they would be considered the most valuable source for microarray analysis, provided that similar information can be obtained as would be expected from analyzing FF samples. Because of fragmentation [3, 26, 27] and other chemical modifications of RNA in FFPE samples, gene expression studies are currently largely limited to immunohistochemical (IHC) staining and RT-PCR, which allow only a few genes to be examined at a time [3, 28, 29]. In this paper we mainly focus on the use of FFPE samples in genome-wide gene expression experiments. We have tried to analyze the data for differential expression from different angles: at the individual gene level, at the "gene set" level, and with different statistical methods. In a follow-up paper we will focus more on gene selection and relevance to breast cancer biology.
Like other investigators [5, 28, 30, 31], we observed high reproducibility across technical replicates regardless of the sample type. However, the concordance between the paired FF and FFPE samples was weaker in our study, which is also consistent with other studies [23, 32]. The tissue archival age, or the "FFPE block age", is another factor for consideration. Cronin et al. compared frozen section breast tissue with FFPE samples of "various block ages" by RT-PCR for 92 genes and found a 90% signal loss in FFPE samples. Our data at the genome-wide level also suggested the significance of FFPE "block age" for gene expression data. Srinivasan et al., Karsten et al. and Masuda et al. have reviewed in detail the effect of fixative and tissue processing on the content and integrity of nucleic acids. There are four types of reactions of formaldehyde with nucleic acids: (1) Addition reaction or methylolation - the N-H groups of primarily adenine and thymine are converted to N-CH2-OH groups (methylol groups). The poly(A) tail on RNA is thus heavily methylolated, leading to poor reverse transcription. This methylolation is possibly reversible through hydrolysis. (2) Cross-linking - methylolated bases can react with N-H groups (on proteins and nucleic acids) to form -N-CH2-N- cross-links, which are not easily hydrolyzed. (3) Formation of apyrimidinic/apurinic sites - the N-glycosidic bond between A, C, G, T or U and the sugar backbone is broken, leaving a blank space in the sequence. This is not base-specific and is not reversible. (4) Fragmentation - formaldehyde catalyzes the hydrolysis of phosphodiester bonds, fragmenting strands of nucleic acids, which is also not reversible. Therefore, it was not a surprise to find differential gene expression in FFPE samples compared to FF samples in the present study.
Bibikova et al. used a smaller panel of the DASL assay with 16 pairs of FF and FFPE samples from healthy and tumor breast tissue and from healthy and cancerous colon tissue. They found that FFPE samples had 50% less gene expression compared to matched FF samples, which may be due to RNA degradation related to fixation and storage. The present study is one of the first adequately powered studies using the whole-genome DASL assay, interrogating more than 18,000 genes, to compare the results of paired tumor and normal tissue from FF and FFPE samples.
Our findings suggest that the analysis of FFPE samples does not identify the exact same genes that would have been identified by analyzing FF samples, but at least, the list shows some similarity in terms of enrichment of GO-terms representing the lead functional groups of genes. In other words, FFPE samples may not be ideal for picking individual target gene(s), but may be used to identify the lead functional group(s) of genes that are differentially expressed in tumor. Findings of the differentially expressed genes in breast cancer were biologically meaningful. On the one hand the "cell cycle" & "cell division" related genes were up-regulated and on the other hand, genes related to "regulation of epithelial cell proliferation" were down-regulated. Genes involved in metalloexopeptidase activity, transforming growth factor beta signaling pathway, BMP signaling pathway, were found to be down-regulated in breast cancer.
As mentioned in the Results section, some of the genes (including PPARG and FGF2) were found repeatedly in the lists of different GO-terms. Peroxisome proliferator-activated receptor-γ (PPARG) is expressed in a large number of human cancers, including breast, colon, stomach, prostate, pancreas, bladder, placenta, lung, chondrosarcoma and leukemia [34, 35]. Recently Jiang et al. showed that PPARG expression in immunohistochemistry was positively correlated with estrogen receptor status and inversely associated with histological grade and tumor size, and that in survival analysis patients with higher PPARG expression had significantly better prognosis. In line with this, the present study showed evidence of down-regulation of PPARG in breast cancer tissue. GWAS using germ line DNA showed a significant association of a SNP in the FGFR2 gene with breast cancer. Along the same lines, in the present study we found FGF2 to be down-regulated in breast cancer. Another biologically relevant gene that we found to be differentially expressed in breast cancer was corticotropin releasing hormone binding protein (CRHBP). Corticotropin-releasing hormone (CRH) is a potent stimulator of synthesis and secretion of proopiomelanocortin-derived peptides. Although CRH concentrations in the human peripheral circulation are normally low, they increase throughout pregnancy and fall rapidly after parturition. Maternal plasma CRH probably originates from the placenta. Human plasma contains a CRH-binding protein that inactivates CRH and may prevent inappropriate pituitary-adrenal stimulation in pregnancy.
An apparent weakness of the present study is the lack of gene expression data from breast tissue preserved in RNA stabilization buffer, which would have served as the gold standard against which FFPE samples could have been compared. However, 72% of the genes that we found to be differentially expressed in FFPE breast tissue compared to corresponding FF tissue in the present study were also differentially expressed in FFPE skin tissue compared to the "gold standard", the corresponding RNAlater-preserved skin tissue sample (unpublished data from our group). One of the strengths of the present study is that it was adequately powered to detect differential expression arising from tissue storage (FFPE vs. FF) or disease status (tumor vs. adjacent normal tissue).
In agreement with other studies using the DASL platform, our present study also suggests the usefulness of DASL chemistry for studying gene expression in fragmented RNA samples. DASL can efficiently handle the fragmentation of RNA in FFPE samples. However, formalin fixation used in FFPE induces significant gene expression changes in a number of genes, and these changes may differ in degree or even in direction between tumor and normal tissue. Therefore, FFPE samples should not be directly compared with FF samples, and considerable caution must be taken when interpreting gene expression data from FFPE samples. Despite these constraints, we found a number of biologically meaningful, differentially expressed genes related to HIST1 and HIST2 proteins, and some others such as PPARG, FGF2, APOB, CRHBP, CETP, and RXRG, in breast cancer tissue compared to corresponding adjacent normal breast tissue. The validity of these specific observations, however, needs to be confirmed in future larger studies.
Grant Support and Acknowledgements
The authors are thankful to Olufunmilayo Olopade, Leslie Martin and Maria Tretiakova for providing the breast tissue samples. The authors also express their thanks to Ronald Rahaman and Charlotte Dodsworth for critical reading of the text. This work was partly supported by the National Institutes of Health grants U01 CA122171, P30 CA 014599, P42ES010349, R01CA102484, R01CA107431 and P30CA125183.
- Masuda N, Ohnishi T, Kawamoto S, Monden M, Okubo K: Analysis of chemical modification of RNA from formalin-fixed samples and optimization of molecular biology applications for such samples. Nucleic Acids Res. 1999, 27 (22): 4436-4443. 10.1093/nar/27.22.4436.
- Srinivasan M, Sedmak D, Jewell S: Effect of fixatives and tissue processing on the content and integrity of nucleic acids. Am J Pathol. 2002, 161 (6): 1961-1971.
- Godfrey TE, Kim SH, Chavira M, Ruff DW, Warren RS, Gray JW, Jensen RH: Quantitative mRNA expression analysis from formalin-fixed, paraffin-embedded tissues using 5' nuclease quantitative reverse transcription-polymerase chain reaction. J Mol Diagn. 2000, 2 (2): 84-91.
- April C, Klotzle B, Royce T, Wickham-Garcia E, Boyaniwsky T, Izzo J, Cox D, Jones W, Rubio R, Holton K, et al: Whole-genome gene expression profiling of formalin-fixed, paraffin-embedded tissue samples. PLoS One. 2009, 4 (12): e8162. 10.1371/journal.pone.0008162.
- Bibikova M, Talantov D, Chudin E, Yeakley JM, Chen J, Doucet D, Wickham E, Atkins D, Barker D, Chee M, et al: Quantitative gene expression profiling in formalin-fixed, paraffin-embedded tissues using universal bead arrays. Am J Pathol. 2004, 165 (5): 1799-1807.
- Bibikova M, Yeakley JM, Chudin E, Chen J, Wickham E, Wang-Rodriguez J, Fan JB: Gene expression profiles in formalin-fixed, paraffin-embedded tissues obtained with a novel assay for microarray analysis. Clin Chem. 2004, 50 (12): 2384-2386. 10.1373/clinchem.2004.037432.
- Downey T: Analysis of a multifactor microarray study using Partek genomics solution. Methods Enzymol. 2006, 411: 256-270. 10.1016/S0076-6879(06)11013-7.
- Eisenhart C: The assumptions underlying the analysis of variance. Biometrics. 1947, 3 (1): 1-22. 10.2307/3001534.
- Tamhane AC, Dunlop DD: Statistics and data analysis: from elementary to intermediate. 2000, Prentice Hall.
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005, 102 (43): 15545-15550. 10.1073/pnas.0506580102.
- Tibshirani RJ, Efron B: Pre-validation and inference in microarrays. Stat Appl Genet Mol Biol. 2002, 1: Article1.
- Pounds S, Cheng C: Sample size determination for the false discovery rate. Bioinformatics. 2005, 21 (23): 4263-4271. 10.1093/bioinformatics/bti699.
- Pounds S, Cheng C: Erratum: sample size determination for the false discovery rate. Bioinformatics. 2009, 25 (5): 698-699. 10.1093/bioinformatics/btn661.
- Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci USA. 2003, 100 (16): 9440-9445. 10.1073/pnas.1530509100.
- Liu P, Hwang JT: Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics. 2007, 23 (6): 739-746. 10.1093/bioinformatics/btl664.
- Baasiri RA, Glasser SR, Steffen DL, Wheeler DA: The breast cancer gene database: a collaborative information resource. Oncogene. 1999, 18 (56): 7958-7965. 10.1038/sj.onc.1203335.
- Ein-Dor L, Kela I, Getz G, Givol D, Domany E: Outcome signature genes in breast cancer: is there a unique set?. Bioinformatics. 2005, 21 (2): 171-178. 10.1093/bioinformatics/bth469.
- Martin KJ, Patrick DR, Bissell MJ, Fournier MV: Prognostic breast cancer signature identified from 3D culture model accurately predicts clinical outcome across independent datasets. PLoS One. 2008, 3 (8): e2994. 10.1371/journal.pone.0002994.
- Ramaswamy S, Ross KN, Lander ES, Golub TR: A molecular signature of metastasis in primary solid tumors. Nat Genet. 2003, 33 (1): 49-54. 10.1038/ng1060.
- Su AI, Welsh JB, Sapinoso LM, Kern SG, Dimitrov P, Lapp H, Schultz PG, Powell SM, Moskaluk CA, Frierson HF, et al: Molecular classification of human carcinomas by use of gene expression signatures. Cancer Res. 2001, 61 (20): 7388-7393.
- Yu K, Ganesan K, Tan LK, Laban M, Wu J, Zhao XD, Li H, Leung CH, Zhu Y, Wei CL, et al: A precisely regulated gene expression cassette potently modulates metastasis and survival in multiple solid cancers. PLoS Genet. 2008, 4 (7): e1000129. 10.1371/journal.pgen.1000129.
- Sotiriou C, Pusztai L: Gene-expression signatures in breast cancer. N Engl J Med. 2009, 360 (8): 790-800. 10.1056/NEJMra0801289.
- Haque T, Faury D, Albrecht S, Lopez-Aguilar E, Hauser P, Garami M, Hanzely Z, Bognar L, Del Maestro RF, Atkinson J, et al: Gene expression profiling from formalin-fixed paraffin-embedded tumors of pediatric glioblastoma. Clin Cancer Res. 2007, 13 (21): 6284-6292. 10.1158/1078-0432.CCR-07-0525.
- Hoshida Y, Villanueva A, Kobayashi M, Peix J, Chiang DY, Camargo A, Gupta S, Moore J, Wrobel MJ, Lerner J, et al: Gene expression in fixed tissues and outcome in hepatocellular carcinoma. N Engl J Med. 2008, 359 (19): 1995-2004. 10.1056/NEJMoa0804525.
- Lewis F, Maughan NJ, Smith V, Hillan K, Quirke P: Unlocking the archive-- gene expression in paraffin-embedded tissue. J Pathol. 2001, 195 (1): 66-71. 10.1002/1096-9896(200109)195:1<66::AID-PATH921>3.0.CO;2-F.
- Cronin M, Pho M, Dutta D, Stephans JC, Shak S, Kiefer MC, Esteban JM, Baker JB: Measurement of gene expression in archival paraffin-embedded tissues: development and performance of a 92-gene reverse transcriptase-polymerase chain reaction assay. Am J Pathol. 2004, 164 (1): 35-42.
- Goldsworthy SM, Stockton PS, Trempus CS, Foley JF, Maronpot RR: Effects of fixation on RNA extraction and amplification from laser capture microdissected tissue. Mol Carcinog. 1999, 25 (2): 86-91. 10.1002/(SICI)1098-2744(199906)25:2<86::AID-MC2>3.0.CO;2-4.
- Abrahamsen HN, Steiniche T, Nexo E, Hamilton-Dutoit SJ, Sorensen BS: Towards quantitative mRNA analysis in paraffin-embedded tissues using real-time reverse transcriptase-polymerase chain reaction: a methodological study on lymph nodes from melanoma patients. J Mol Diagn. 2003, 5 (1): 34-41.
- Daigo Y, Chin SF, Gorringe KL, Bobrow LG, Ponder BA, Pharoah PD, Caldas C: Degenerate oligonucleotide primed-polymerase chain reaction-based array comparative genomic hybridization for extensive amplicon profiling of breast cancers: a new approach for the molecular analysis of paraffin-embedded cancer tissue. Am J Pathol. 2001, 158 (5): 1623-1631.
- Fedorowicz G, Guerrero S, Wu TD, Modrusan Z: Microarray analysis of RNA extracted from formalin-fixed, paraffin-embedded and matched fresh-frozen ovarian adenocarcinomas. BMC Med Genomics. 2009, 2: 23. 10.1186/1755-8794-2-23.
- Haller AC, Kanakapalli D, Walter R, Alhasan S, Eliason JF, Everson RB: Transcriptional profiling of degraded RNA in cryopreserved and fixed tissue samples obtained at autopsy. BMC Clin Pathol. 2006, 6: 9. 10.1186/1472-6890-6-9.
- Frank M, Doring C, Metzler D, Eckerle S, Hansmann ML: Global gene expression profiling of formalin-fixed paraffin-embedded tumor samples: a comparison to snap-frozen material using oligonucleotide microarrays. Virchows Arch. 2007, 450 (6): 699-711. 10.1007/s00428-007-0412-9.
- Karsten SL, Van Deerlin VM, Sabatti C, Gill LH, Geschwind DH: An evaluation of tyramide signal amplification and archived fixed and frozen tissue in microarray gene expression analysis. Nucleic Acids Res. 2002, 30 (2): E4. 10.1093/nar/30.2.e4.
- Gupta RA, Brockman JA, Sarraf P, Willson TM, DuBois RN: Target genes of peroxisome proliferator-activated receptor gamma in colorectal cancer cells. J Biol Chem. 2001, 276 (32): 29681-29687. 10.1074/jbc.M103779200.
- Ikezoe T, Miller CW, Kawano S, Heaney A, Williamson EA, Hisatake J, Green E, Hofmann W, Taguchi H, Koeffler HP: Mutational analysis of the peroxisome proliferator-activated receptor gamma gene in human malignancies. Cancer Res. 2001, 61 (13): 5307-5310.
- Jiang Y, Zou L, Zhang C, He S, Cheng C, Xu J, Lu W, Zhang Y, Zhang H, Wang D, et al: PPARgamma and Wnt/beta-Catenin pathway in human breast cancer: expression pattern, molecular interaction and clinical/prognostic correlations. J Cancer Res Clin Oncol. 2009, 135 (11): 1551-1559. 10.1007/s00432-009-0602-8.
- Hunter DJ, Kraft P, Jacobs KB, Cox DG, Yeager M, Hankinson SE, Wacholder S, Wang Z, Welch R, Hutchinson A, et al: A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nat Genet. 2007, 39 (7): 870-874. 10.1038/ng2075.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.