
Tumorigenesis is associated with changes in gene expression and involves many pathways. Dysregulated genes include "housekeeping" genes that are often used for normalization in quantitative real-time RT-PCR (qPCR), which may lead to unreliable results. This study assessed eight stages of hepatitis C virus (HCV)-induced hepatocellular carcinoma (HCC) to search for appropriate genes for normalization. Gene expression profiles using microarrays revealed differential expression of most "housekeeping" genes during the course of HCV-HCC, including glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and beta-actin (ACTB), genes frequently used for normalization. QPCR reactions confirmed the regulation of these genes. Using them for normalization had strong effects on the apparent extent of differential expression, leading to misinterpretation of the results. As shown here in the case of HCV-induced HCC, the most constantly expressed gene is the arginine/serine-rich splicing factor 4 (SFRS4). The utilization of at least two genes for normalization is robust and advantageous, because they can compensate for slight differences in their expression when not co-regulated. The combination of ribosomal protein L41 (RPL41) and SFRS4 used for normalization gave results very similar to SFRS4 alone and is a very good choice of reference in this disease, as shown for four differentially expressed genes.

Many investigations of cancer include multiple comparisons, analyzing different stages of the disease, such as normal tissue, pre-neoplasm, and consecutive stages of cancer [29][30][31][32]. Such an experimental design makes it crucial to find an appropriate gene for normalization. Prerequisites for normalization genes are constant expression throughout all disease stages and no response to treatment. Extensive evidence indicates that all genes can be regulated under some conditions. This study focuses on hepatitis C virus (HCV)-induced hepatocellular carcinoma (HCC), comprising eight pathological stages, including pre-neoplastic lesions (cirrhosis and dysplasia) and four consecutive stages of HCC, and reveals that many of the "housekeeping" genes are indeed differentially expressed. In addition, the effects of different reference genes used for normalization on differentially expressed genes are presented, and genes appropriate for normalization when investigating HCV-induced HCC are introduced.

Typical "housekeeping" genes are deregulated in HCV-induced HCC
Analyzing the expression profile of all stages of HCV-induced HCC, including preneoplastic stages (cirrhosis and dysplasia) and four cancerous stages, with microarrays revealed that almost all pathways were affected [4]. In order to find normalization genes for qPCR verification, we looked for genes that showed no differential expression in any of the eight stages analyzed. First, we selected genes that displayed no change relative to controls in at least one of the 72 samples included. This resulted in a list of over 30,000 genes (Figure 1A). Among these, many genes showed increased expression in cancerous stages compared to normal liver controls or were not expressed in the liver and tumor tissues (absent call). In addition, some genes were down-regulated in certain stages of the disease. Hence, most of these genes were inappropriate for use as reference genes for normalization. In further selection steps, we thus excluded genes that were regulated or not expressed (absent call) in any of the stages of the disease. This procedure led to a list of 46 genes, including 27 genes coding for ribosomal proteins and five genes coding for splicing factors. Thus, excluding differentially expressed genes left only a few genes that were expressed in all stages and not changed during the course of HCV-induced HCC. The best candidates for normalization were RPL41 and SFRS4. Genes from different pathways were chosen to exclude the possibility of co-regulation. Furthermore, when specifically checking housekeeping genes with functions in sugar, nucleotide, lipid, amino acid, or energy metabolism, as well as ribosomal proteins, basal transcription factors, and proteins of the cytoskeleton (Figure 1B), we found that most of them were either differentially expressed during disease progression or not expressed at all. These results clearly show that housekeeping genes are affected in HCV-induced HCC.

Candidate reference genes from multiple comparison microarray data
In a different approach to identifying genes appropriate for normalization from a microarray study comprising multiple comparisons, we calculated the standard deviation (SD) of all fold changes for each gene. Genes with a low SD across all fold changes and with signal intensities similar to those of the genes of interest (or at least a present call) may provide a pool of normalization candidates for qPCR (see below).
Six genes were chosen as candidate reference genes for the purpose of this study: RPL41 and SFRS4 and the commonly used reference genes GAPDH, ACTB and TBP, as well as another gene coding for a ribosomal protein, RPS20. The SD of their fold changes (microarray data) ranks them as follows: RPL41 (0.09), ACTB (0.23), SFRS4 (0.24), TBP (0.28), GAPDH (0.34), and lastly RPS20 (0.43).
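The SD-based ranking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names and fold-change values are hypothetical, the function name `rank_by_fold_change_sd` is invented for this example, and the values are treated as log2 fold changes, although the text does not state whether the SD was taken on linear or log-transformed data.

```python
from statistics import stdev


def rank_by_fold_change_sd(fold_changes):
    """Return (gene, SD) pairs sorted from most to least stable.

    fold_changes maps a gene name to its fold changes versus control,
    one value per comparison (here: one per disease stage).
    """
    ranked = [(gene, stdev(values)) for gene, values in fold_changes.items()]
    return sorted(ranked, key=lambda pair: pair[1])


# Hypothetical log2 fold changes for seven stages; NOT data from the study.
example = {
    "GENE_A": [0.05, -0.02, 0.01, 0.03, -0.04, 0.02, 0.00],
    "GENE_B": [0.10, 0.30, -0.20, 0.60, 0.90, 1.20, 1.50],
    "GENE_C": [-0.10, 0.05, 0.15, -0.05, 0.10, 0.20, -0.15],
}

ranking = rank_by_fold_change_sd(example)
for gene, sd in ranking:
    print(f"{gene}: SD = {sd:.2f}")
```

A low SD alone is not sufficient: as noted above, a candidate should also have a signal intensity comparable to the genes of interest, which this sketch does not check.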
Importantly, GAPDH was significantly up-regulated in advanced stages of HCC, as calculated by the Student's t-test (p = 0.016, control vs. very advanced HCC). Even more obvious was the up-regulation of RPS20 during HCC, which was already significant between control and early HCC (p = 0.003). TBP and ACTB also showed a significant up-regulation between control and very advanced HCC (p = 0.014 and p = 0.011, respectively).
We also used the geNorm program [13] to determine the best normalization gene for HCV-induced HCC by stepwise exclusion of the least stably expressed gene. The most stably expressed genes were RPL41 and SFRS4, resulting in M = 0.65, where M describes the average expression stability (lowest for the most stably expressed genes). The expression stabilities for TBP (M = 0.74), ACTB (M = 0.78), GAPDH (M = 0.82), and RPS20 (M = 0.88) were worse. Hence, again, RPL41 and SFRS4 (Figure 2B) were the best candidates for normalization of HCV-induced HCC.
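To make the stability measure M more concrete, the following is a minimal re-implementation sketch of the pairwise-variation idea behind geNorm, not the geNorm program itself. For each gene, M is taken as the mean, over all other candidate genes, of the SD of the log2 expression ratios across samples; the gene with the highest M is removed and the calculation is repeated until two genes remain. The function names and the relative quantities below are hypothetical and are not data from the study.

```python
from math import log2
from statistics import stdev


def stability_m(quantities):
    """Return gene -> M for linear relative quantities (same sample order)."""
    genes = list(quantities)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # SD across samples of the log2 ratio between gene j and gene k.
            ratios = [log2(a / b) for a, b in zip(quantities[j], quantities[k])]
            pairwise_sd.append(stdev(ratios))
        m_values[j] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


def stepwise_exclusion(quantities):
    """Drop the least stable gene (highest M) until two genes remain."""
    remaining = dict(quantities)
    while len(remaining) > 2:
        m_values = stability_m(remaining)
        worst = max(m_values, key=m_values.get)
        del remaining[worst]
    return sorted(remaining), stability_m(remaining)


# Hypothetical relative quantities for five samples; NOT data from the study.
example = {
    "REF_A": [1.0, 1.1, 0.9, 1.0, 1.05],
    "REF_B": [2.0, 2.1, 1.9, 2.1, 2.0],
    "REF_C": [1.0, 1.5, 2.5, 4.0, 6.0],
    "REF_D": [1.0, 0.8, 1.6, 0.5, 2.2],
}

best_pair, final_m = stepwise_exclusion(example)
print(best_pair, final_m)
```

Because the last two genes are compared only with each other, they end up with the same M, which is consistent with a single M value being reported for the final pair.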

Effects of different genes used for normalization
Normalization is used to adjust for experimental differences. In qPCR, normalization corrects for RNA quantity, overall transcriptional activity, cDNA synthesis, and PCR efficiency. Ideally, a reference gene is an internal endogenous control that shows constant expression in the tissue under investigation and does not respond to the experimental treatment.
Four commonly used "housekeeping" genes (GAPDH, ACTB, RPS20, TBP) and the combined data of RPL41 and SFRS4 (see Figure 2C) were used for normalization to assess the effect that the choice of normalization gene has on the fold changes of differentially expressed genes during the course of HCV-induced HCC. NRG1 was identified by microarray analysis to be decreased in cirrhosis, elevated in dysplasia, and again down-regulated during all four stages of HCC [4]. QPCR was performed on NRG1 to corroborate this expression pattern. Figure 3 shows the effects on relative NRG1 expression depending on which gene was used for normalization. All genes used for normalization roughly confirmed that pattern. However, the magnitude of the resulting fold changes varied greatly. The up-regulation of NRG1 during dysplasia was much smaller when GAPDH was used for normalization than with the other reference genes. Similarly, the levels of down-regulation of NRG1 during the successive stages of HCC varied greatly depending on the reference gene.
Figure 1. Common "housekeeping" genes are deregulated in HCV-induced HCC (multiple comparison microarray data). A) Gene expression of over 30,000 genes that showed no change to controls in at least one of 72 samples studied. B) 323 common "housekeeping" genes whose products have functions in sugar, nucleotide, lipid, amino acid, and energy metabolism, or code for ribosomal proteins, basal transcription factors, and proteins of the cytoskeleton. In A) and B) the columns correspond to the stages of the disease: c = control, ci = cirrhosis, dn = dysplasia, ve = very early HCC, e = early HCC, a = advanced HCC, and aa = very advanced HCC. Genes (in rows) were clustered using the Pearson correlation. Red indicates up-regulation, green down-regulation, and black no change or not expressed.
HMMR was found by microarray analysis to be not differentially expressed during the precancerous stages (cirrhosis, low- and high-grade dysplasia), followed by a significant increase for all HCC stages. QPCR corroboration, when normalized to RPL41 and SFRS4, GAPDH, ACTB, RPS20, or TBP, revealed similar patterns with varying fold changes (Figure 3). However, the increase in gene expression between high-grade dysplasia and very early HCC was very subtle when normalized to RPS20.
In the case of PRIM1, the choice of the normalization gene had dramatic effects on the relative gene expression. PRIM1 was found by microarray analysis to be down-regulated during cirrhosis, dysplasia, and very early HCC, followed by increasing up-regulation in the successive stages of HCC. The most similar expression pattern resulted when the qPCR data were normalized to the combination of RPL41 and SFRS4 (Figure 3). The Student's t-test showed a significant increase between dysplasia and very early HCC (p = 0.011), confirming the significant increase found in the microarray analysis [4]. When ACTB was used for normalization, the resulting fold changes were less evident but the tendency was similar. In contrast, normalization of PRIM1 using either GAPDH, RPS20, or TBP changed the expression pattern dramatically. For example, instead of being up-regulated, PRIM1 would be classified as down-regulated between high-grade dysplasia and very early HCC (p = 0.05, Figure 3).
Figure 2. QPCR: expression of candidate genes for normalization of HCV-induced HCC. Plotted are median fold changes (relative quantification with respect to the median Ct of the control samples, corrected for PCR efficiencies) ± SD for each stage of the disease: c = control (n = 10), ci = cirrhosis (n = 10), lg = low-grade dysplasia (n = 10), hg = high-grade dysplasia (n = 7), ve = very early HCC (n = 8), e = early HCC (n = 10), a = advanced HCC (n = 7), and aa = very advanced HCC (n = 10). A) Expression of RPL41, GAPDH, ACTB, SFRS4, RPS20, and TBP. B) Average of the expression of RPL41 and SFRS4.
Figure 3. Effects of reference genes used for normalization: relative expression of NRG1, HMMR, PRIM1, and IRAK1 for all stages of HCV-induced HCC. QPCR data were normalized to RPL41 and SFRS4 (shown in pink), to GAPDH (yellow), to ACTB (light blue), to RPS20 (green), and to TBP (brown). Microarray data are shown in dark blue. Fold changes are indicated on the y-axis. Disease stages as in Figure 2. The table shows p-values for the change in gene expression from high-grade dysplasia to very early HCC for NRG1, HMMR, PRIM1, and IRAK1 (rows) when normalized to the genes indicated above (columns). Significant (p ≤ 0.05) up-regulation between these stages is indicated in red, down-regulation in green.
A similar, albeit less dramatic, effect is seen in the case of IRAK1. IRAK1 was slightly down-regulated during the precancerous stages of HCV-induced HCC, followed by a small but significant up-regulation in HCC (Figure 3). A similar expression pattern was found when the two genes RPL41 and SFRS4 were used for normalization. Again, GAPDH, RPS20, and TBP changed even the tendency of the expression of IRAK1 in HCC.
These results clearly demonstrate the effects genes used for normalization have on the fold change of qPCR data and on the general direction (up or down) of differentially expressed genes.
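The reason a reference gene can reverse the apparent direction of regulation is simple arithmetic: the reported fold change is the target's fold change divided by the reference's fold change, so a reference that rises more strongly than the target pushes the ratio below 1. The short sketch below illustrates this with hypothetical fold-change values and hypothetical function names; it is not a reconstruction of the measurements in Figure 3.

```python
def apparent_fold_change(target_fc, reference_fc):
    """Target fold change after dividing by the reference gene's fold change."""
    return target_fc / reference_fc


def direction(fold_change):
    """Classify a normalized fold change as up, down, or unchanged."""
    if fold_change > 1.0:
        return "up"
    if fold_change < 1.0:
        return "down"
    return "unchanged"


# Hypothetical values: the target is truly 2-fold up-regulated.
target_fc = 2.0
with_stable_reference = apparent_fold_change(target_fc, 1.0)  # reference unchanged
with_drifting_reference = apparent_fold_change(target_fc, 3.0)  # reference 3-fold up

print(direction(with_stable_reference), direction(with_drifting_reference))
```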

Discussion
The most commonly used reference genes for normalization of qPCR data are GAPDH and ACTB [17][18][19][20][21][22][23][24]. However, these genes can be significantly differentially expressed, as shown in our study of HCV-induced HCC. GAPDH was strongly up-regulated in advanced and very advanced stages of HCC, in some samples up to 7-fold. ACTB was up-regulated two- to three-fold in many advanced and very advanced HCC samples. Ribosomal proteins should also be considered individually, because many of them, e.g. RPS20, were differentially expressed during HCV-induced HCC [4], while RPL41 showed relatively stable expression throughout all stages of the disease.
GAPDH and ACTB have also been reported to be differentially expressed in other cancer types [8,14,33]. In bladder cancer, a study showed that GAPDH, G6PD, and HMBS were significantly changed between malignant and nonmalignant tissues [25]. Similarly, in adenocarcinomas of the colon, the expression of RPLP0, RPS14, and GAPDH varied between primary tumors and corresponding resection margins [34]. Furthermore, in prostate cancer, ACTB, RPL13A, and HMBS showed significant differences between cancerous and noncancerous tissues [6]. Taken together, genes whose products have basic functions in cellular metabolism may well be differentially expressed between tumor and non-tumor tissues.
Normalization is used to adjust for experimental differences. This study presents an easy way to find appropriate candidates for normalization utilizing microarray data, also applicable to multiple comparisons. A pool of candidate genes can be found by selecting genes with low SD across all fold changes and with similar signal intensity to the genes of interest (at least a present call). This identified the same best candidate, RPL41, as the procedure in which differentially expressed genes were excluded.
We compared the qPCR data of six possible reference genes. The SD of the Ct values indicated that SFRS4 and RPL41 may be the best choice for normalization. This was confirmed at the level of fold changes when we compared the CVs. Furthermore, the Student's t-test revealed that GAPDH, RPS20, TBP, and ACTB were significantly regulated between certain stages of HCV-induced HCC. Consistent with these data, the geNorm program also determined that SFRS4 and RPL41 were the most stably expressed genes. Normfinder [35], an additional computer program aimed at identifying normalization genes, ranked TBP as the best choice for normalization. However, we showed that TBP was significantly regulated between control and advanced HCC. In our situation, Normfinder was thus unable to identify the best normalization gene.
The effects of six genes used for normalization were compared on four differentially expressed genes: NRG1, HMMR, PRIM1, and IRAK1. For NRG1 and HMMR, the resulting fold changes were over- or underestimated depending on the gene used for normalization; in contrast, dramatic effects were found for the differential expression of PRIM1 and IRAK1. Normalization using an inappropriate gene could lead to misinterpretation of the data, as was shown for GAPDH, RPS20, and TBP in the context of HCV-induced HCC. Robust results were achieved by using two genes, RPL41 and SFRS4, in combination for normalization. Using at least two genes to normalize qPCR data has the advantage that they can compensate for slight differences in their expression. To profit most, these normalization genes should participate in different pathways.
This study, unlike many cancer studies that compare tumor versus non-tumor tissue, comprised eight stages of HCV-induced HCC. Even though we included 72 tissue samples [4], each stage was represented by only seven to ten samples. This small sample size might be a limitation of the study design when performing statistical tests, such as t-tests between the stages. To find the best normalization gene, however, all samples were considered independently of their stage group.
Microarray data are known to be highly variable [36][37][38][39][40][41]. Due to its higher dynamic range, qPCR is thought to be more accurate and is therefore often used to corroborate microarray results [42,43]. Mostly, the general direction (up- and down-regulation) and rank order of the fold changes are similar, but the levels of the fold changes of microarray experiments differ compared to qPCR data [44-46] and show a marked tendency to be smaller [42,44,46]. This effect is more pronounced when the fold-change ratio is very high [42].
This study shows the effects of reference genes used for normalization on qPCR data. The use of inappropriate genes for normalization can lead to an over- or underestimation of the fold changes or to misinterpretation of the results. The best results were achieved when the two genes RPL41 and SFRS4 were used for normalization.

Conclusion
Many pathways are affected by cancer, as recently shown for HCC. Therefore, typical housekeeping genes or maintenance genes are likely to be differentially expressed during the course of the disease.
Appropriate genes for normalization should show constant expression throughout all comparisons, should be expressed in similar abundance as the differentially expressed genes, and should not respond to the experimental treatment. In microarray experiments, genes that display stable expression across all fold changes are likely to be good candidates for normalization of qPCR. The utilization of at least two genes for normalization is highly recommended and will lead to the most reliable and accurate results.
In HCV-induced HCC, the combination of RPL41 and SFRS4 was best for normalizing qPCR data.

Tissue samples and microarray data
The tissue samples used in this study are described in [4].
To analyze hepatitis C virus (HCV)-induced hepatocellular carcinoma (HCC), 72 tissue samples, including normal liver tissue (n = 10), cirrhotic liver tissue (n = 10), dysplastic nodules [low-grade (n = 10) and high-grade (n = 7)], and four successive stages of HCC [from very early HCC to metastatic tumors with gross vascular invasion (n = 35)], were used to generate gene expression profiles on the human GeneChip whole genome array (U133 Plus 2.0 from Affymetrix). Data were normalized by applying the GC Robust Multi-array Average (GC-RMA) algorithm, and the baseline was calculated as the geometric mean of the data generated from the 10 normal liver tissue samples (up- and down-regulation refers to the comparison with this baseline). Significance analysis of microarrays (SAM) was performed in GeneTraffic (Stratagene, La Jolla, CA). The microarray data are available at GEO (GSE6764).

RNA extraction
The tissue specimens were ground in liquid nitrogen and homogenized in Trizol (Invitrogen, Carlsbad, CA) using a polytron homogenizer. Total RNA was purified following the RNeasy Mini protocol (Qiagen, Valencia, CA), including a DNaseI digestion to avoid contamination with genomic DNA. Samples were included in the study only if their 28S/18S ratio, measured with the Bioanalyzer (Agilent Technologies, Palo Alto, CA), was higher than 0.8. Further quality criteria for inclusion of samples are described in detail elsewhere [4].

Data analyses
The raw data were analyzed using SDS2.2 (Applied Biosystems, Foster City, CA) by subtracting the background and setting the threshold to obtain the Ct value. PCR efficiencies (E) were calculated from dilution series using the formula $E = 10^{-1/\mathrm{slope}} - 1$. The efficiencies of the PCRs were: GAPDH 0.89, ACTB 0.92, TBP 0.89, RPL41 0.78, SFRS4 0.94, RPS20 0.87, PRIM1 0.89, NRG1 0.97, IRAK1 0.96, and HMMR 0.89. All fold changes were calculated based on these efficiencies. Further analyses were done in Excel: the median Ct was taken from triplicate reactions and compared to the median of all normal tissue samples, and results are expressed as fold changes. The qPCR reactions for RPL41, SFRS4, GAPDH, ACTB, RPS20, and TBP were performed twice independently (each in triplicate) to reduce technical variation. The significance of differential expression was calculated using the t-test in Excel. The geNorm analysis was performed as described in the manual [13]. Normfinder was used as in [48].
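As an illustration of the calculation described above, the following Python sketch derives an efficiency from a standard-curve slope, converts Ct values into efficiency-corrected fold changes relative to a control Ct, and normalizes a target gene to two reference genes. Several points are assumptions rather than the authors' documented procedure: the function names are invented, the Ct values are hypothetical, and the two reference genes are combined by their geometric mean (the text speaks only of the "combined data" or "average" of RPL41 and SFRS4). The efficiencies for PRIM1, RPL41, and SFRS4 are the ones listed above.

```python
from statistics import geometric_mean, median


def efficiency_from_slope(slope):
    """E = 10^(-1/slope) - 1; a slope of about -3.32 gives E close to 1."""
    return 10 ** (-1.0 / slope) - 1.0


def fold_change(ct_sample, ct_control, efficiency):
    """Efficiency-corrected fold change of a sample relative to the control Ct."""
    return (1.0 + efficiency) ** (ct_control - ct_sample)


def normalized_fold_change(target, references):
    """Divide the target fold change by the combined reference fold change.

    target and each entry of references are (ct_sample, ct_control, efficiency).
    ASSUMPTION: references are combined by their geometric mean.
    """
    reference_fc = geometric_mean([fold_change(*ref) for ref in references])
    return fold_change(*target) / reference_fc


# Hypothetical triplicate Ct values; NOT measurements from the study.
prim1_sample_ct = median([26.1, 26.0, 26.3])
prim1 = (prim1_sample_ct, 27.4, 0.89)   # efficiency for PRIM1 as listed above
rpl41 = (18.2, 18.3, 0.78)              # efficiency for RPL41 as listed above
sfrs4 = (24.0, 23.9, 0.94)              # efficiency for SFRS4 as listed above

result = normalized_fold_change(prim1, [rpl41, sfrs4])
print(f"PRIM1 normalized fold change: {result:.2f}")
```

Taking the median of the triplicates before computing the fold change mirrors the description above; a value greater than 1 indicates up-regulation relative to the control median.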