One hundred and eight colorectal patient biopsies comprising 24 adjacent normal mucosa samples, 49 MSS adenomas (16 tubular, 12 tubulovillous and 21 villous), 30 MSS carcinomas (one stage I, 16 stage II and 13 stage III) and five MSI carcinomas (four stage II and one stage III) were examined (see Additional file 4 for a summary of the histopathological characteristics). All carcinomas were classified according to the WHO/UICC-TNM staging system. Immediately after surgery or polypectomy, the biopsies were embedded in Tissue-Tek O.C.T. Compound (Sakura Finetek), snap-frozen in liquid nitrogen and stored at -80°C. All patients gave informed written consent, and the study was approved by the Central Denmark Region Committee on Biomedical Research Ethics according to the Helsinki Declaration.
For recurrence-free survival analysis, IHC was performed on a human colorectal TMA containing 268 biopsies from stage II adenocarcinomas. Recurrence (distant metastasis excluding carcinosis) occurred in 42 patients (16%). The median duration of follow-up was 1709 days (range 1099-1825 days) for the non-recurrence group and 770 days (range 95-1681 days) for the recurrence group. Before progression-free survival analysis, 20 biopsies were excluded because the core was missing or lacked tumor material, leaving 248 samples for the analysis. A TMA containing 51 stage II adenocarcinomas and 50 normal mucosa samples was used to assess the expression of TCF12 and OSBPL1A.
RNA preparation and Human Exon 1.0 ST Array labeling
A hematoxylin and eosin-stained cryostat section from each sample was used to evaluate tissue composition and, when necessary, macroscopic trimming was used to enrich the fraction of tumor cells to a minimum of 60% neoplastic cells (median 85% (60%-90%)). Total RNA was isolated from serial cryosections using the RNeasy MinElute kit (Qiagen). RNA quality was evaluated on the 2100 Bioanalyzer (Agilent); the median RNA Integrity Number (RIN) was 9.1 (6-10), and the median 28S/18S ribosomal peak ratio was 1.8 (1.1-3.1).
One hundred ng of total RNA was labeled according to the GeneChip Whole Transcript (WT) Sense Target Labeling Assay (Affymetrix, Inc., Santa Clara, CA) and hybridized to Human Exon 1.0 ST Arrays (Affymetrix, Inc., Santa Clara, CA) overnight. Scanning was performed in an Affymetrix GCS 3000 7G scanner.
Exon array data analysis
Quantile normalization of the exon array data and all further analyses were performed in the GeneSpring GX10 software (Agilent) using ExonRMA16 with core transcripts (17881 transcripts). For stabilization of variance, 16 was added to the expression values before log2 transformation, which resulted in a minimum value of four in the log2-transformed dataset. Differential gene expression analysis was performed on transcript values based on core probe sets using unpaired t-tests with Benjamini-Hochberg correction. Alternative TSS analysis was limited to genes in the hg18 RefSeq database containing two or more transcription start sites and at least one protein-coding isoform (2176 genes). To cover all exons of these genes with probe sets, both core and extended probe sets of the exon array were included in the analysis of alternative TSSs. Only genes for which more than 50% of the probe sets were expressed above background in at least half of both the normal and the tumor samples were included in the TSS analysis. Genes with potential alternative TSSs or alternative splicing were identified using a multivariate ANOVA (splicing ANOVA); 663 genes had a splicing ANOVA p-value < 0.05 (Benjamini-Hochberg corrected). More stringent filtering on the p-value (< 10^-6, Benjamini-Hochberg corrected) and on the log2 splicing index (SI, the log2 ratio of probe set intensity to transcript expression value; SI > 0.5 or SI < -0.5) resulted in 156 candidate genes. To identify genes expressing isoforms with alternative TSSs, the expression data of the candidate genes were visualized in a genomic context using the UCSC genome browser. This manual curation identified nine candidate genes with apparent alternative TSS usage, which were further analyzed in two independent validation sample sets; candidate genes were required to be significant in at least one of the validation sets (splicing ANOVA p-value < 0.05 and SI > 0.5 or SI < -0.5).
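The variance-stabilizing transformation and the candidate filter described above can be illustrated with a minimal Python sketch. It is not the GeneSpring implementation: the function names are hypothetical, and the splicing index is computed simply as the log2 ratio of probe set intensity to transcript expression value, following the definition given in the text.

```python
import math

OFFSET = 16  # constant added to expression values before log2 transformation (from the text)


def stabilized_log2(intensity):
    """Return log2(intensity + 16); an intensity of 0 maps to log2(16) = 4."""
    return math.log2(intensity + OFFSET)


def splicing_index(probe_set_intensity, transcript_expression):
    """Log2 ratio of probe set intensity to transcript expression (linear-scale inputs)."""
    return math.log2(probe_set_intensity / transcript_expression)


def passes_stringent_filter(si, adjusted_p, si_cutoff=0.5, p_cutoff=1e-6):
    """Apply the stringent thresholds from the text: corrected p < 1e-6 and |SI| > 0.5."""
    return adjusted_p < p_cutoff and abs(si) > si_cutoff
```

With these definitions, a probe set measured at twice the transcript-level value has SI = 1 and passes the filter only if its corrected p-value is also below 10^-6.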
Student's t-test was used when comparing two sample groups, except for the analysis of paired samples, where a paired t-test was used.
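The Benjamini-Hochberg correction applied to the p-values in this section can be sketched in a few lines of Python. This is a generic step-up implementation written for illustration, not the routine used by GeneSpring; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value never
    # exceeds the adjusted value of the next-larger p-value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A gene would then be called significant when its adjusted p-value falls below the chosen threshold (0.05 or 10^-6 in the analyses described above).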
Quantitative real-time reverse-transcription PCR
One μg of total RNA was converted to cDNA using a mixture of oligo(dT) and random nonamer primers and Superscript II Reverse Transcriptase (Invitrogen). Quantitative real-time reverse-transcription PCR (qRT-PCR) was performed in triplicate on a 7500 Fast or 7900HT Real-Time PCR System (Applied Biosystems). See Additional file 5 for the sequences of the primers used. Normalization was performed with UBC as previously described.
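The exact normalization procedure is given in the cited reference and is not restated here. As an illustration only, the sketch below assumes a simple delta-Ct calculation with UBC as the reference gene and 100% amplification efficiency; both the method and the function name are assumptions, not the protocol used in this study.

```python
from statistics import mean


def relative_expression(target_ct, reference_ct):
    """Relative quantity of a target versus a reference gene (assumed delta-Ct method).

    target_ct and reference_ct are the triplicate Ct values for the target
    transcript and for the reference gene (UBC here) from the same sample.
    Assumes the amount of product doubles with every PCR cycle.
    """
    delta_ct = mean(target_ct) - mean(reference_ct)
    return 2 ** (-delta_ct)
```

For example, a target that crosses the threshold two cycles later than UBC would be reported at one quarter of the UBC level under these assumptions.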
Laser capture microdissection
Laser capture microdissection was performed on cryosections from paired cancer and adjacent normal colon mucosa biopsies from six patients (two stage II and four stage III). Briefly, the sections were fixed in 95% EtOH for 120 sec, followed by 15 sec of staining in Arcturus Histogene Staining Solution (DFA Instruments), dehydrated in 95% EtOH (30 sec) and 100% EtOH (120 sec) before a final treatment in xylene for 120 sec. After drying of slides, epithelial and stromal cells were captured on individual caps using the Veritas 704 apparatus (Arcturus). Captured material was incubated with RLT buffer (Qiagen) for 20 minutes at room temperature in the presence of 30 mM β-mercaptoethanol, and, subsequently, RNA was extracted using RNeasy MinElute spin columns (Qiagen). Five ng RNA was used for cDNA synthesis using the Ovation PicoSL WTA System (NuGEN) followed by qRT-PCR as described above.
TMA staining and scoring
The TMAs were stained with the following antibodies: anti-TCF12 (Catalog Number: 14419-1-AP, ProteinTech) in a 1:250 dilution and anti-OSBPL1A (Catalog Number: 18-202-335518, Genway Biotech) in a 1:400 dilution. Indirect staining was used as previously described.
The TMAs were scored independently by two investigators using the VIS software (Visiopharm). Staining intensity was scored as negative (0), weak (1), moderate (2) or strong (3), and the fraction of positive cancer cells as negative (0), less than half (1), 51-80% (2) or >80% (3). The agreement between the investigators was evaluated by Kappa statistics. STATA 9.2 software (StataCorp) was used for univariate analysis with the log-rank test or the Cox proportional hazards model, for Fisher's exact test and for generating Kaplan-Meier plots.
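Agreement between two scorers on a categorical scale such as the 0-3 scores above is commonly summarized with Cohen's kappa. The sketch below computes the unweighted statistic; whether an unweighted or weighted kappa was used in the study is not stated, so the unweighted form is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(scores_a, scores_b):
    """Unweighted Cohen's kappa for two raters scoring the same cores."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both raters must score the same non-empty set of cores")
    n = len(scores_a)
    # Observed agreement: fraction of cores given the same score by both raters.
    observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    # Chance agreement: derived from each rater's marginal score frequencies.
    count_a, count_b = Counter(scores_a), Counter(scores_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)
```

A value of 1 indicates perfect agreement, and a value of 0 indicates agreement no better than chance.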
Wnt-pathway model system
The colon cancer cell lines (Ls174T and DLD1) stably expressing inducible dominant-negative (dn)TCF1 or dnTCF4 were a kind gift from Dr. Hans Clevers (The Hubrecht Laboratory, The Netherlands), and have previously been described. β-catenin knockdown was performed in DLD1 cells by transfecting with 20 nM siRNA targeting β-catenin (Dharmacon) or 20 nM scrambled nontargeting siRNA. Transfections were carried out using Lipofectamine 2000 (Invitrogen). RNeasy (Qiagen) was used for RNA extraction and random nonamer primers were used for cDNA synthesis. qRT-PCR was performed as described above.
Transcription factor binding analysis
For each of the selected genes, 300 bp upstream and 100 bp downstream of the transcription start site were analyzed for transcription factor binding sites. Using the TOUCAN2 program, the regions were scanned for known binding domains from the Transfac Public v7 vertebrate database. Domains overrepresented relative to the Eukaryotic Promoter Database were selected, and tumor and normal samples were compared.
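The promoter window described above depends on the strand of the gene, because "upstream" points toward lower coordinates on the plus strand and toward higher coordinates on the minus strand. The following sketch shows one way to derive the window; the function name is hypothetical, and the returned bounds are expressed in whatever coordinate system the TSS position uses.

```python
def promoter_window(tss, strand, upstream=300, downstream=100):
    """Return (start, end) of the region from 300 bp upstream to 100 bp downstream of a TSS."""
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, upstream lies at higher genomic coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip at the chromosome start for a TSS close to the beginning of a sequence.
    return max(start, 0), end
```

For a TSS at position 1000, this gives 700-1100 on the plus strand and 900-1300 on the minus strand.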
Additional Human Exon 1.0 ST datasets
The external colon cancer validation dataset was downloaded from http://www.affymetrix.com and consisted of 18 paired samples of adenocarcinoma and adjacent normal tissue. The internal independent colon cancer validation set consisted of nine adjacent normal biopsies, six tubular adenomas, 13 MSS cancer samples and six MSI cancer samples and has previously been described. The bladder cancer dataset consisted of 11 samples of normal epithelium, 12 T1 tumors and 12 T2-4 tumors. Five additional datasets covering brain gliomas (GSE9385), gastric (GSE13195), lung (GSE12236), liver (GSE12941) and prostate cancer (GSE21034) were downloaded from the Gene Expression Omnibus. The brain glioma dataset contained 26 glioblastomas, 22 oligodendrogliomas and six control brain samples. The gastric cancer dataset consisted of 44 paired samples of adenocarcinoma and adjacent normal tissue. The lung cancer dataset consisted of 40 paired samples of lung adenocarcinoma and normal lung tissue. The liver cancer dataset contained 20 paired samples of hepatocellular carcinoma and adjacent nontumorous liver, and the prostate cancer dataset consisted of 29 normal adjacent benign samples, 131 primary tumors and 19 metastases.
Bioinformatics analysis of differentially expressed protein features and mRNA regulatory elements
First, the peptides corresponding to the unique sequences of the different isoforms were aligned against the Protein Data Bank database by PSI-BLAST. Then FeatureMap3D, the PyMOL Molecular Graphics System (DeLano Scientific LLC, Palo Alto, CA) and scripts created for the analysis were used to analyze and visualize the structures and/or conserved domains. Secondary structures were predicted by PSIPRED. Hydrophobicity plots based on the Kyte-Doolittle and Hopp-Woods scales were used to predict hydrophilic regions most likely to be exposed on the protein surface. NetPhos was used to predict phosphorylation sites. Furthermore, signal peptide cleavage sites were predicted, along with propeptide cleavage sites, Yin-Yang sites and N-glycosylation and O-glycosylation sites. Protein functions that potentially change between alternative isoforms were compared with ProtFun. Regulatory RNA elements in UTRs related to transcriptional and translational regulation were predicted by RegRNA [22, 50, 51].
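A hydrophobicity plot of the kind mentioned above is a sliding-window average of per-residue scale values along the peptide. The sketch below uses the published Kyte-Doolittle hydropathy values; the window size of nine residues and the function name are assumptions, since the settings used for the analysis are not stated.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean Kyte-Doolittle score for every full window along a peptide sequence."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in seq]  # KeyError on non-standard residues
    return [
        sum(values[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]
```

Stretches with strongly negative window averages are the hydrophilic regions that such plots flag as likely to be surface-exposed.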