The development of a clinically validated test that could determine the risk of recurrence or death from stage II/III colon cancer and the likelihood of benefit from standard chemotherapy regimens is highly desirable but complex. The process begins with biomarker discovery and ends with a clinical validation study with prospectively defined endpoints. Since the method by which an mRNA species is measured will have a profound effect on the success of such a validation study, it is important to characterize and maintain the assay's performance, particularly its reproducibility and quantitative precision. Consequently, we have adopted RT-PCR, the most robust gene expression method available for gene discovery studies. Although nearly 150 genes were found to be significantly related to RFI in this study, some markers will prove to be false positives and true markers will vary in how robustly they correlate with outcome. It is therefore important to evaluate the candidate genes identified here by conducting further independent studies to identify truly useful disease biomarkers. Only after consistent association with clinical outcomes in multiple independent studies should genes be considered for inclusion in an assay used to make clinical decisions. Employing a single technology consistently throughout biomarker discovery and into clinical testing has the advantage of reducing the time required to fully validate and commercialize a multi-gene clinical decision-making tool.
RT-PCR is often carried out using oligo-dT priming to generically reverse transcribe mRNA from the polyA tail. However, this technique is unsuccessful with degraded RNA, such as that extracted from FPE tissue. We have previously shown that RT-PCR using gene-specific priming can be successfully applied to FPE tissue as old as 30 years. As described here, we have now increased the scale of screening with this technique to 761 genes. We further demonstrate that gene-specific priming for 761 assays can be successfully combined into a single RT reaction that yields precise, sensitive and reproducible quantitative PCR for biomarker discovery.
DNA microarrays are a popular technology for biomarker discovery because one can quickly examine the expression of hundreds or thousands of genes. RT-PCR has often been used subsequently to verify microarray results, since it offers much higher sensitivity, specificity and reproducibility, and a greater quantitative dynamic range. The present results demonstrate that RT-PCR can also be applied to highly parallel gene expression analysis if robotic processes and assay miniaturization are used. We were able to extract and quantify RNA and generate expression data for 761 unique assays for more than 300 patients in less than 5 weeks.
While screening these patient specimens, it was important to monitor potential sources of variability such as primer and probe stability and gene-specific primer pool stability. We used an FPE colon RNA pool as a reference sample and generated a baseline CT value for each of the 761 assays. This reference sample was then included during reverse transcription with every patient sample batch and used to monitor process stability throughout the study. Over the 5-week period, variability within the reference sample remained low, indicating that all patient samples were analyzed with a stable assay process. Analysis of one of the reference genes, RPLP0, which was assayed on both plates for every patient sample, also highlighted the internal consistency of the process throughout the study.
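The reference-sample monitoring described above can be illustrated with a short Python sketch. It is not the procedure used in the study: the function names, the assay labels, the CT values and the 0.5-cycle tolerances are all hypothetical, chosen only to show how a per-assay baseline can be compared with the reference measurements from successive batches.

```python
from statistics import mean, stdev


def reference_drift(baseline_ct, batch_cts):
    """Summarize how the reference sample's CT values moved across batches.

    baseline_ct: dict mapping assay name -> baseline CT of the reference sample.
    batch_cts:   list of dicts (one per batch), each mapping assay name -> CT
                 measured for the reference sample in that batch.
    Returns a dict mapping assay name -> mean shift from baseline and SD.
    """
    report = {}
    for assay, base in baseline_ct.items():
        values = [batch[assay] for batch in batch_cts]
        report[assay] = {
            "mean_shift": mean(values) - base,
            # A sample SD needs at least two measurements.
            "sd": stdev(values) if len(values) > 1 else 0.0,
        }
    return report


def flag_unstable(report, max_shift=0.5, max_sd=0.5):
    """Return assays whose drift exceeds the (hypothetical) tolerances."""
    return sorted(
        assay
        for assay, stats in report.items()
        if abs(stats["mean_shift"]) > max_shift or stats["sd"] > max_sd
    )


# Made-up values for two placeholder assays over three batches.
baseline = {"ASSAY_A": 22.0, "ASSAY_B": 28.0}
batches = [
    {"ASSAY_A": 22.1, "ASSAY_B": 28.0},
    {"ASSAY_A": 21.9, "ASSAY_B": 29.5},
    {"ASSAY_A": 22.0, "ASSAY_B": 29.3},
]
unstable = flag_unstable(reference_drift(baseline, batches))
```

With these illustrative numbers only ASSAY_B is flagged, because its mean CT has moved by nearly one cycle from baseline. Appropriate tolerances would in practice depend on the assay and platform.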
The robustness of this technology is evidenced by results from hierarchical clustering of all 761 genes, which identified known pathways and gene group clusters that one would expect to be co-expressed. One of the largest was a "stromal response" gene group containing genes that are associated with wound healing and are thought to be representative of fibroblast activation, or the "stromal response", within tumor stroma. Stromal response is becoming increasingly recognized as a marker of invasion and poor clinical outcome in several different classes of solid tumors [16–20]. A number of genes within this group encode proteins that compose or regulate the extracellular matrix, including BGN, SPARC, CTGF, THBS, VIM, and COL1A1. Several focal adhesion and actin-binding protein genes grouped together to form another distinct cluster, including CALD1, TAGLN, TLN, MYH11, MYLK and CNN. Because MYH11, MYLK and CNN are genes specific to muscle, they could represent a myofibroblast (MF) signature. The myofibroblast cell type has been implicated as a driver of tumor progression. Another cluster might be called an "epithelial/secreted" gene group; within this group are genes known to be markers of epithelial cells or to encode products secreted by them, such as SERPINB5 (maspin), KRP19, KLK10 and LAMB2. A cluster containing GBP1, GBP2, G1P2, IFIT, CD8A, HLA-DRPB1 and CXCR4 represents an immune/interferon-inducible group. Another group contains genes that represent an acute response to stimuli, or "early response" genes, such as EGR1, EGR3, RhoB, FOS and NR4A. An inflammatory response gene group was also identified, containing genes such as ICAM1, IL1B, IL-8, IL6, OSM and S100A8. A small group of intestinal-specific genes such as MUC2, MUC5B, pS2 (TFF1) and TFF3 are also highly correlated in expression. Lastly, it was gratifying to see that the expression levels of CDH1 and CAPN1 were correlated with one another (Pearson's correlation = 0.48), since CAPN1 cleaves CDH1.
Given the biological connection between these genes, one might have expected this correlation to be higher; the modest value possibly indicates that specific post-translational control mechanisms have a role in defining steady-state protein levels.
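For readers less familiar with the statistic quoted above, the following Python sketch computes a Pearson correlation between two expression profiles, together with the distance 1 − r that is commonly used to drive hierarchical clustering of genes. The text does not state which distance measure or linkage method was used in this study, so the distance shown is an assumption, and the function names and input values are purely illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def correlation_distance(x, y):
    """Assumed clustering distance: 0 for perfectly co-expressed genes,
    2 for perfectly anti-correlated genes."""
    return 1.0 - pearson(x, y)


# Made-up normalized expression values for two genes across five samples.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.1, 3.9, 6.2, 8.0, 9.8]
r = pearson(gene_a, gene_b)
d = correlation_distance(gene_a, gene_b)
```

Under this convention a correlation of 0.48, as reported for CDH1 and CAPN1, corresponds to a distance of 0.52, which is consistent with the two genes being related without being tightly co-clustered.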
The aim of this study was to identify gene biomarkers that predict recurrence-free interval in patients with Stage II and Stage III colon cancer. Approximately 19% of the 761 genes showed a significant (p < 0.05) association with RFI by univariate Cox proportional hazards regression analysis. It is highly unlikely that any one gene will be able to predict clinical outcome or response to therapy to the extent that it will be useful to oncologists. A successful diagnostic tool is much more likely to consist of a panel of genes and an algorithm weighting and combining each gene contribution into one value that defines the unique risk of recurrence and potential for therapeutic response in each patient. This concept is supported by the observation that several different biological pathways were shown to be associated with RFI in this study.
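To put the 19% figure in context, the rough calculation below compares the number of genes found significant with the number that would be expected by chance alone. It is a back-of-the-envelope sketch rather than an analysis performed in the study: it assumes that the 761 tests are independent and that no gene is truly associated with RFI, neither of which holds exactly for co-expressed genes, and the variable names are illustrative.

```python
# Figures taken from the text above.
n_genes = 761                # genes screened
alpha = 0.05                 # univariate significance threshold
fraction_significant = 0.19  # approximate fraction with p < 0.05

# Approximate number of genes reported as significant.
observed_significant = round(n_genes * fraction_significant)

# Expected number of chance findings if no gene were associated with RFI
# (assumption: independent tests under a global null hypothesis).
expected_by_chance = n_genes * alpha

# Crude estimate of the share of significant genes that could be chance findings.
rough_false_fraction = expected_by_chance / observed_significant

print(f"significant genes:  {observed_significant}")
print(f"expected by chance: {expected_by_chance:.1f}")
print(f"rough false share:  {rough_false_fraction:.2f}")
```

Under these assumptions roughly 38 of the approximately 145 significant genes, or about a quarter, could be chance findings, which is consistent with the caution expressed earlier that candidate genes need confirmation in independent studies.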