Skip to main content

Associating transcriptional modules with colon cancer survival through weighted gene co-expression network analysis



Colon cancer (CC) is a heterogeneous disease influenced by complex gene networks. As such, the relationship between networks and CC should be elucidated to obtain further insights into tumour biology.


Weighted gene co-expression network analysis, a powerful technique used to extract co-expressed gene networks from mRNA expressions, was conducted to identify 11 co-regulated modules in a discovery dataset with 461 patients.

A transcriptional module enriched in cell cycle processes was correlated with the recurrence-free survival of the CC patients in the discovery (HR = 0.59; 95% CI = 0.42–0.81) and validation (HR = 0.51; 95% CI = 0.25–1.05) datasets. The prognostic potential of the hub gene Centromere Protein-A (CENPA) was also identified and the upregulation of this gene was associated with good survival. Another cell cycle phase-related gene module was correlated with the survival of the patients with a KRAS mutation CC subtype. The downregulation of several genes, including those found in this co-expression module, such as cyclin-dependent kinase 1 (CDK1), was associated with poor survival.


Network-based approaches may facilitate the discovery of biomarkers for the prognosis of a subset of patients with stage II or III CC, these approaches may also help direct personalised therapies.


Colon cancer (CC), the third most common cancer worldwide, is one of the major causes of tumour-related death in the United States [1]. CC is a biologically heterogeneous disease characterised by neoplasms defined by discrete pathological properties and consequent clinical results. Tumour, node and metastasis (TNM) staging is a prognostic classification traditionally used in clinical practice to select patients with CC for adjuvant chemotherapy. However, TNM staging fails to accurately predict recurrence in many CC patients who undergo surgical therapy. For instance, approximately 10 to 20% of patients with stage II colorectal cancer and 30 to 40% of patients with stage III colorectal cancer develop recurrence. Thus, prognostic and predictive factors should be developed to provide reliable information as a basis of medical treatment decisions in routine clinical practices.

Our understanding of the extent of CC complexity has been greatly enhanced through comprehensive studies on molecular biomarkers. The molecular classification of CC is based on a few commonly used biomarkers, such as microsatellite instability (MSI), CpG island methylator phenotype (CIMP), chromosomal instability and BRAF and KRAS mutations [2, 3]. In a large population-based study, five CC subtypes are defined on the basis of the combinations of MSI, CIMP, and BRAF and KRAS mutations, and these subtypes are associated with marked differences in survival [4].

Microarray technology has been applied to investigate gene expression profiles (GEPs) in CC; the extraction of transcriptomics-based prognostic signatures has also been extensively studied [5,6,7,8]. To the best of our knowledge, five GEP-related tests, namely, Oncotype DX® Colon Cancer (Genomic Health, Inc.), ColoPrint® (Agendia NV), ColonPRS® (Signal Genetics, LLC), OncoDefender-CRC™ (Everist Genomics, Inc.) and GeneFx Colon (Precision Therapeutics, Inc.), have been developed to examine CC. However, multigene assays for clinical practices, such as risk assessment and adjuvant treatment determination, have yet to be designed [5, 9]. Furthermore, the repeatability of CC-related GEP studies is poor possibly because CC is composed of distinct molecular entities that may be developed through numerous functional biological pathways. As a result, several prognostic signatures may correspond to different entities of CC. Therefore, an early unsupervised consensus hierarchical clustering of genome-wide mRNA levels has prompted researchers to classify CC into six distinct molecular subtypes [10].

The accumulation of large numbers of CC mRNA datasets in several databases, such as GEO, provides an opportunity to reanalyse the gene mRNA expression data derived from different platforms and institutes (termed as meta-analysis) and to define the objective classifications of sample subtypes [11]. Integrated information from multiple studies can highly tolerate the heterogeneity associated with CC and the variability caused by microarray techniques; such information also helps increase statistical power as the number of samples is increased. Meta-analyses have been applied to evaluate single genes in studies on CC; for instance, a meta-analysis has been conducted to identify the correlation of the overexpression of the metastasis associated in colon cancer-1 (MACC1) gene with poor disease-free survival [12]. Agostini et al. conducted an integrative systematic approach for the identification of prognostic biomarkers in rectal cancer [13]. Large CC databases, such as Georgetown Database of Cancer, have been developed to evaluate the association of prognostic biomarkers with recurrence and to identify the subgroup of patients who may benefit from adjuvant chemotherapy [14].

Weighted gene co-expression network analysis (WGCNA) has emerged as an effective method of multigene analysis to discover the relationship between networks/genes and phenotypes. In WGCNA, gene modules are established from mRNA expression data by using unsupervised hierarchical clustering; thus, this technique does not depend on a priori defined gene sets or pathways. The basic concept of WGCNA involves a gene co-expression module, which is a cluster of genes that maintains a consistent expression pattern and possibly shares a common biological regulatory role [15]. WGCNA has been successfully applied to identify networks and biomarkers that can be used to screen, diagnose and treat cancer. This technique has also been used to reveal the microRNA and mRNA expression network implicated in prostate cancer [16] and to identify the co-expression networks related to proastrocytic differentiation in glioma [17]. Wirapati et al. [18] and Clarke et al. [19] conducted WGCNA to identify co-expressed gene modules among breast cancer patients on the basis of multiple microarray-based gene expression datasets; Wirapati et al. [18] and Clarke et al. [19] also explored the relationship of these transcriptional modules with clinical variables, such as tumour size and grade, survival outcomes related to breast cancer as a whole and the corresponding molecular subtypes.

In this study, WGCNA was applied to analyse a dataset obtained from a transcriptome comprising 461 patients with stage II or III CC to identify gene modules and biomarkers (hub genes) for the prognosis of CC patients. Furthermore, our findings were confirmed via a validation dataset with 111 CC patients.


Microarray-based mRNA expression datasets

We analysed two microarray datasets in CC. The raw gene expression data were retrieved from the GEO data repository ( with the accession numbers GSE39582 [10] and GSE17536 [20]. In addition, we have labelled these datasets on the basis of their GEO accession numbers. The discovery dataset GSE39582 was used to generate network; by contrast, GSE17536 was used as an independent validation dataset. Our survival analysis was restricted to the subgroup of the patients with stage II or III CC because the discovery of reliable prognostic biomarkers is used for this subgroup of patients. A large proportion of patients with stage I CC unlikely provide benefits from adjuvant chemotherapy because of their excellent prognosis after surgical treatment is completed. Most of the stage IV patients who are metastatic may die from the disease; therefore, these patients should be analysed independently for progression-free survival. Moreover, patients without the recorded survival time were excluded from our analysis. After filtering was performed, the discovery and validation datasets respectively containing 461 and 111 samples were originally generated using Affymetrix U133 Plus 2.0 chips. Clinicopathological variables (e.g. age, tumour grade), biomarker gene mutation (BRAF and KRAS) and recurrence-free survival (RFS, defined as the time from surgery to the first recurrence and was cut off in 5 years) were gathered for each dataset (Additional file 1: Table S1). The microarray datasets were processed with Robust Multiarray Average algorithms by utilising the ‘affy’ Bioconductor package. We used the ComBat algorithm to adjust the expression data for potential batch effects [21]. Before conducting WGCNA, we filtered the probes without known gene symbols, and the probe-level expression profiles for the datasets were converted to gene-level expressions by using the collapseRows function to merge probes [22].

Colon cancer molecular subtypes

The CC patients in our study were divided into five molecular subtypes in accordance with the report of Phipps et al. [4] In brief, patients are classified on the basis of the following combinations of tumour biomarkers: type 1 (MSI-high, CIMP-positive, BRAF mutation-positive, KRAS mutation-negative); type 2 (MSI-low, CIMP-positive, BRAF mutation-positive, KRAS mutation-negative); type 3 (MSI-low, CIMP-negative, BRAF mutation-negative, KRAS mutation-positive); type 4 (MSI-low, CIMP-negative, BRAF and KRAS mutation-negative); and type 5 (MSI-high, CIMP-negative, negative for BRAF and KRAS mutations). The information related to the MSI and CIMP status was obtained as described in the original publication of the discovery dataset [10]. In this dataset, 26 samples were classified as type 1, 6 samples were classified as type 2, 108 samples were classified as type 3, 151 samples were classified as type 4 and 8 samples were classified as type 5. Subsequent subtype survival analyses were not performed on subtypes 1, 2 and 5 because of their small sample sizes. The rest of the 162 samples in the discovery dataset and all of the samples in the validation dataset were not included in any subtype because of the lack of information.

Co-expression module detection

We selected the top 5000 varying genes from the 461 patients in the discovery dataset after their standard deviations were sorted in an ascending order. The WGCNA in this study was restricted to the 3600 most co-expressed genes from these 5000 genes in the dataset (based on, as described below) by using the R ‘wgcna’ package [23].

The co-expression networks of the selected genes were generated using the following steps. An unsupervised co-expression relationship was initially built on the basis of the adjacency matrix of connection strengths by using Pearson’s correlation coefficients for gene pairs. This matrix was increased to β = 4 based on the scale-free topology criterion (Additional file 1: Figure S1). Based on the scale-free topology criterion, the power β was selected to amplify the strong connections between genes and penalise the weaker connections. The Network connectivity ( of the ith gene was defined as the sum of its adjacency with all of the other genes to generate networks. The intramodular connectivity ( was calculated as the summation of adjacency performed over all of the genes in a particular network; hub genes were those with a high network connectivity in a particular group. Modules were identified as gene sets with a high topologic overlap [24]. Average linkage hierarchical clustering was conducted on the basis of a topological overlap matrix (TOM)-based dissimilarity measure; in this technique, the hybrid dynamic tree­cutting method was used to cut branches by using a minimum gene module size of 30 and a cut height of 0.95.

The module eigengenes (MEs) were generated as the first principal component after principal component analysis was performed with the expression data for each co-expressed modules in the 461 samples. Module membership assignment (kME) was determined as Pearson’s correlation coefficient between gene expression values and MEs.

The WGCNA algorithm was described in detail by Zhang Bin et al. [24].

Survival analysis

The ‘survival’ R package ( was subjected to survival analysis. The hazard ratio (HR) and the corresponding 95% confidence interval (CI) were calculated using a Cox regression model. Kaplan-Meier survival curves were plotted. RFS was used for the survival endpoints. For module associations, each ME was robustly scaled to obtain −1, 0 and +1 for 2.5, 50 and 97.5% quintiles, respectively; using these values, we can compare different modules. The scaled MEs were then treated as continuous variables. For gene associations, each gene expression was treated as a continuous variable. False discovery rate (FDR) method was used to perform multiple testing corrections. The survival-based gene significance (GS) was defined as minus log 10 of Cox regression p-values. Furthermore, hub genes were defined as those with a high network connectivity (, which corresponds to the connect strength (co-expressed) of a specific gene with all the other members in a module. The hub genes highly associated with clinical traits and highly connected to the modules were identified through GS and In particular, hub genes were obtained on the basis of the following criteria: (i) the value of the is in the top 10 of all of the genes in the module and (ii) GS is greater than 2.

Functional annotation modules

The overrepresentation in gene ontology (GO) categories was searched to extract further biological insights into the genes belonging to the modules associated with the survival of CC patients. DAVID ( [25] was employed to evaluate the modules for the enrichment of the genes with particular GO biological processes compared with the background list of the human genes and to calculate the enrichment scores of the GO biological process terms.


Detection of gene co-expression modules

WGCNA was performed to analyse 3600 gene expression profiles derived from 461 CC tumour tissues from the discovery dataset, to investigate the functional organisation of the CC transcriptome and to construct gene co-expression modules. A total of 11 gene modules were identified (Fig. 1a and b) from 35 to 866 genes (Table 1). Correlation or survival analysis was conducted to determine whether these modules are associated with tumour grade or RFS. The MEs, generated through principal component analysis, provide a general measure of the overall expression information in each module. Associations can then be determined on the basis of MEs. The module membership between each of the 3600 genes and the modules where these genes belong to (kMEs) was also calculated. The complete list of the network metrics (MEs and kMEs) and the module membership of each gene is shown in Additional file 2: Dataset 1.

Fig. 1
figure 1

Identification of colon cancer specific modules using WGCNA, and GO enrichment analysis for these modules. a Clustering dendrogram of gene profilers from the testing dataset with 461 stage II or III colon cancer patients. Hierarchical cluster analysis dendrogram used to detect co-expression clusters. Each short vertical line corresponds to a gene, and the branches are expression modules of highly interconnected groups of genes with a colour to indicate its module assignment. A total of 11 modules ranging from 35 to 866 genes in size were identified. b Topological overlap matrix plot. Genes in the rows and columns are sorted by the clustering tree in (a). Clusters correspond to squares along the diagonal. c Bar plot presenting the top biological process enriched in each of the 11 modules. The original significance output from DAVID was transformed into “–log (p-value)” for plotting

Table 1 Association of expression modules with tumour stage and RFS in the discovery and validation datasets

To determine the relationship among the 11 gene modules, we clustered their module eigengenes and associated these eigengenes with the GO annotation. Interestingly, the green and yellow modules contained the genes involved in cell cycle biological processes; among the modules, these two modules exhibited the closest connection in the cluster tree (Additional file 1: Figure S2).

Gene modules are significantly correlated with RFS

The HRs and p-values of the dichotomised MEs were calculated through Cox regression to evaluate the relationship between RFS and co-expression modules (Table 1). The yellow, pink and green modules were significantly associated with RFS as a whole in the discovery dataset. However, only the association between the green module and RFS was confirmed in the validation dataset. The increased expression of genes in the green module indicated good prognosis (HR = 0.59, p = 1.37 × 10−3, FDR = 7.52 × 10−3 in the discovery dataset, HR = 0.51, p = 6.67 × 10−2 in the validation dataset, Fig. 2a, b). These findings suggested that a higher expression of the green module was associated with a low tumour grade (PCC = −0.14). After GO analysis was conducted, cell cycle-related biological processes were the most significantly overrepresented factor in the green co-expression module (Fig. 1c, Additional file 3: Dataset 2).

Fig. 2
figure 2

Associations between recurrence-free survival and the green module. The hub gene CENPA was observed in the testing and validating datasets. Kaplan–Meier survival plots for recurrence-free survival were shown (colon cancer patients were grouped by the tertile of ME). Increased expression (green) of the green module is associated with high recurrence-free survival in the testing (a) and validating (b) datasets. KM survival plot for CENPA suggests that its increased expression (green) indicates good prognosis (colon cancer patients were grouped by the tertile of gene expression level)(c, d)

Hub genes are associated with RFS

A total of 3600 genes were subjected to survival analysis to evaluate the effectiveness of WGCNA in the identification of novel hub genes that can be used as prognosis indicators. The HRs and the corresponding p-values of the genes included in the analysis of RFS are listed in Additional file 2: Dataset 1.

The increased expression of the green co-expression module containing 170 genes indicated an excellent RFS outcome (Table 1). In the single-gene survival analysis against RFS, 110 genes (Additional file 1: Table S2) that were significantly (p < 0.05) associated with good outcomes were identified in the green module. To the best of our knowledge, the relationship of the hub gene CENPA with RFS-related genes in CC has yet to be described. A high expression of the CENPA gene was associated with good RFS (HR = 0.62, p = 4.40 × 10−4 in the discovery dataset; HR = 0.53, p = 1.46 × 10−2 in the validation dataset, Table 2, Fig. c, d). To evaluate the robustness of the hub gene identification, we investigated the RFS-related modules and the corresponding hub genes with different initial gene selections (the most connected 3600 or 1800 genes) and cut height parameters (0.90 or 0.95) when identify gene modules through an average linkage hierarchical clustering method. CENPA could be identified under these conditions (Additional file 1: Table S2).

Table 2 Relationships between hub genes with RFS in the discovery and validation datasets

Gene modules are significantly associated with CC subtype-specific survival

In addition to the survival analysis of CC as a whole, the investigation of the association between gene modules and molecular subtypes was conducted. Survival analysis revealed that the yellow, red and tan modules were associated with type 3 (Table 3). The increased expression of the yellow (HR = 0.38, p = 4.46 × 10−3) and tan (HR = 0.50, p = 3.39 × 10−2) co-expression modules containing 179 and 35 genes were correlated with good RFS outcomes. While the increased expression of the red co-expression module containing 108 genes was correlated with poor RFS outcomes for the type 3 (HR = 1.74, p = 3.02 × 10−2).

Table 3 Relationship between expression modules with RFS within colon cancer molecular subtypes in the discovery dataset

The HRs and the corresponding p-values of the single gene survival for the type 3 and type 4 subtypes (Addtional file 2: Dataset 1) were also calculated. The yellow module genes in the type 3 group were analysed, and the results revealed three hub gene prognosis indicators, including cyclin-dependent kinase 1 (CDK1; HR = 0.52, p = 3.42 × 10−3), kinesin family member 11 (KIF11; HR = 0.57, p = 5.76 × 10−3) and RAD51 associated protein 1 (RAD51AP1; HR = 0.52, p = 1.42 × 10−3). Meanwhile, we analysis the tan module genes in the type 3 group, and found hub gene OTU domain containing 6B (OTUD6B; HR = 0.56, p = 6.24 × 10−3). The increased expression of these genes suggested good prognosis in the type 3 subtype. There was no gene met the hub gene defination criterion in module tan for type 3 subtype.


We applied a systems biology approach, namely, WGCNA, to analyse one mRNA expression dataset comprising 461 CC patients to identify the networks and genes associated with clinical variables and prognosis indicators. We then confirmed our findings by using an independent validation dataset. WGCNA can be applied to determine complex biological mechanisms responsible for the target phenotypes; this method is effective because the algorithm aims to clarify the relationships between genes above noise and maintain consistency among all of the samples. The unsupervised hierarchical clustering method selected by WGCNA avoids potential biases and subjective decisions attributed to the selection of the candidate genes previously reported as associated with CC or to the early distinction of control samples for supervised methods.

In our study, 11 distinct gene modules from 3600 genes that satisfied our pre-filtering standard for the co-expression analysis were identified. The increased expression of the green module containing 170 genes mostly related to the cell cycle was associated with low tumour grade and correlated with positive RFS outcome. The association relationship reached statistical significance in the discovery dataset (p = 1.37 × 10−3, FDR = 7.52 × 10−3), and marginal significance (p = 6.67 × 10−2) in the validation dataset. This marginal significance may account for the small sample size (n = 111) and lack of statistical power. Since we can make conclusions depend on the effect size and its precision rather than just the p-value [26]. The result of HR and 95% CI of module green in the validation dataset suggested that the association relationship between module green and RFS of CC patients was clinically significance. After conducting the single gene survival analysis of each member of the green module, we found that approximately 65% of the genes were significantly related to RFS (p < 0.05), and all of the genes yielded HR < 1. Furthermore, the hub gene CENPA was identified as a potential novel marker. CENPA, a protein-coding gene, is the histone-H3-like variant essential for centromere functioning and structure. This gene is implicated in cell cycle and mitotic pathways; the GO annotations of this gene include protein heterodimerisation activity and chromatin binding. CENPA is also a potential prognostic biomarker of breast cancer, and the increased expression of this gene is associated with the poor survival of breast cancer patients [27, 28]. Among various core markers in neoplasic intratubullar germ cells, such as CD9, CENPA and PODXL, CENPA is overexpressed, and this finding suggests that this gene may be a potential biological marker of human diseases [29]. Tomonaga et al. demonstrated that CENPA is overexpressed at a transcriptional level in all 11 primary human CC tissues [30]. Furthermore, the immunostaining with anti-CENPA antibodies revealed that the CENPA signals in tumour cells increase; therefore, the overexpression of CENPA may be critical in aneuploidy in colorectal cancers [30]. However, the role of CENPA in CC should be further validated.

The yellow module containing 179 genes involved in the cell cycle was correlated with RFS in the type 3 subtypes, including MSI-low, CIMP-negative, negative for BRAF mutation and positive for KRAS mutation. We also identified CDK1 as a marker. The specific activity of CDK1 is a promising biomarker of the metastasis risk in stage II CC [31]. The type 3 subgroup is the only subgroup with KRAS mutation. KRAS, a proto-oncogene, encodes a small 21 kD guanosine triphosphate/guanosine diphosphate binding protein that modulates cellular proliferation and differentiation [32]. Approximately 97% of KRAS mutations are caused by seven different DNA base-pair substitutions in codons 12 and 13 of exon 2; as a result, an amino acid substitution in the protein occurs [33]. Therefore, KRAS may affect cell cycle processes. In a recent study, CDK1 is reported as a synthetic lethality target for KRAS mutation in colon cancer [34]. Our study highlisht RAD51AP1 as a prognostic marker and therapeutic target. It has been reported that Overexpression of RAD51 is a negative prognostic marker for colorectal adenocarcinoma [35]. However, the roles of the hub gene KIF11 in module yellow and OTUD6B in module tan in CC have yet to be determined.

As a retrospective study, the current study is characterised by several limitations. Firstly, relevant information, such as the accurate definition of RFS, the MSI/CIMP status and the molecular subtypes of the validation dataset, was unavailable. As such, the associations between modules and RFS in each subtype could not be validated. Secondly, although the direction of the association between the green module and RFS in the validation dataset was similar to that in the discovery dataset, the p-value was marginally significant. Thus, the significance and robustness of the network and hub genes should be confirmed in prospective clinical trials, ideally with large prospective patient cohorts.


In summary, 11 gene co-expression modules were identified from an mRNA microarray-based study through WGCNA. We associated these gene modules to tumour grade and RFS. We also evaluated the prognostic ability of single genes through Cox-regression analysis. Moreover, a co-expression module indicative of patients’ RFS for a particular molecular marker-based subtype, such as KRAS mutation group, was identified. Indeed, WGCNA is an effective technique that can be applied to investigate the underlying biological mechanisms and identify the genes indicative of patient outcome. The practical utility of this approach is exemplified through the identification of novel prognostic makers, such as CENPA. Our investigation could contribute to personalised therapies. Nevertheless, multicenter randomised controlled clinical trials and in vivo/in vitro experiments should be performed to evaluate the possible applications of molecular signatures to predict survival and to functionally characterise the hub genes for clinical applications.



Colon cancer


Confidence interval


CpG island methylator phenotype


False discovery rate


Gene expression profile


Gene ontology


Hazard ratio


Module eigengene


Microsatellite instability


Recurrence-free survival


Tumour, node and metastasis


Topological overlap matrix


Weighted gene co-expression network analysis


  1. Siegel R, Naishadham D, Jemal A. Cancer statistics, 2013. CA Cancer J Clin. 2013;63(1):11–30.

    Article  PubMed  Google Scholar 

  2. Shen L, Toyota M, Kondo Y, Lin E, Zhang L, Guo Y, Hernandez NS, Chen X, Ahmed S, Konishi K, et al. Integrated genetic and epigenetic analysis identifies three different subclasses of colon cancer. Proc Natl Acad Sci U S A. 2007;104(47):18654–9.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  3. Kang GH. Four molecular subtypes of colorectal cancer and their precursor lesions. Arch Pathol Lab Med. 2011;135(6):698–703.

    PubMed  Google Scholar 

  4. Phipps AI, Limburg PJ, Baron JA, Burnett-Hartman AN, Weisenberger DJ, Laird PW, Sinicrope FA, Rosty C, Buchanan DD, Potter JD, et al. Association Between Molecular Subtypes of Colorectal Cancer and Patient Survival. Gastroenterology. 2014;30(14):01192–5.

    Google Scholar 

  5. Eschrich S, Yang I, Bloom G, Kwong KY, Boulware D, Cantor A, Coppola D, Kruhoffer M, Aaltonen L, Orntoft TF, et al. Molecular staging for survival prediction of colorectal cancer patients. J Clin Oncol. 2005;23(15):3526–35.

    CAS  Article  PubMed  Google Scholar 

  6. Wang Y, Jatkoe T, Zhang Y, Mutch MG, Talantov D, Jiang J, McLeod HL, Atkins D. Gene expression profiles and molecular markers to predict recurrence of Dukes’ B colon cancer. J Clin Oncol. 2004;22(9):1564–71.

    CAS  Article  PubMed  Google Scholar 

  7. Jorissen RN, Gibbs P, Christie M, Prakash S, Lipton L, Desai J, Kerr D, Aaltonen LA, Arango D, Kruhoffer M, et al. Metastasis-Associated Gene Expression Changes Predict Poor Outcomes in Patients with Dukes Stage B and C Colorectal Cancer. Clin Cancer Res. 2009;15(24):7642–51.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  8. Garman KS, Acharya CR, Edelman E, Grade M, Gaedcke J, Sud S, Barry W, Diehl AM, Provenzale D, Ginsburg GS, et al. A genomic approach to colon cancer risk stratification yields biologic insights into therapeutic opportunities. Proc Natl Acad Sci U S A. 2008;105(49):19432–7.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  9. Salazar R, Roepman P, Capella G, Moreno V, Simon I, Dreezen C, Lopez-Doriga A, Santos C, Marijnen C, Westerga J, et al. Gene expression signature to improve prognosis prediction of stage II and III colorectal cancer. J Clin Oncol. 2011;29(1):17–24.

    Article  PubMed  Google Scholar 

  10. Marisa L, de Reynies A, Duval A, Selves J, Gaub MP, Vescovo L, Etienne-Grimaldi MC, Schiappa R, Guenot D, Ayadi M, et al. Gene expression classification of colon cancer into molecular subtypes: characterization, validation, and prognostic value. PLoS Med. 2013;10(5):21.

    Article  Google Scholar 

  11. Schmid PR, Palmer NP, Kohane IS, Berger B. Making sense out of massive data by going beyond differential expression. Proc Natl Acad Sci U S A. 2012;109(15):5594–9.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  12. Wang G, Fu Z, Li D. MACC1 overexpression and survival in solid tumors: a meta-analysis. Tumour Biol. 2014;19:19.

    Google Scholar 

  13. Agostini M, Janssen KP, Kim IJ, D’Angelo E, Pizzini S, Zangrando A, Zanon C, Pastrello C, Maretto I, Digito M, et al. An integrative approach for the identification of prognostic and predictive biomarkers in rectal cancer. Oncotarget. 2015;6(32):32561–74.

    PubMed  PubMed Central  Google Scholar 

  14. G-DOC Plus - an integrative bioinformatics platform for precision medicine.

  15. Stuart JM, Segal E, Koller D, Kim SK. A gene-coexpression network for global discovery of conserved genetic modules. Science. 2003;302(5643):249–55.

    CAS  Article  PubMed  Google Scholar 

  16. Wang L, Tang H, Thayanithy V, Subramanian S, Oberg AL, Cunningham JM, Cerhan JR, Steer CJ, Thibodeau SN. Gene networks and microRNAs implicated in aggressive prostate cancer. Cancer Res. 2009;69(24):9490–7.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  17. Ivliev AE, t Hoen PA, Sergeeva MG. Coexpression network analysis identifies transcriptional modules related to proastrocytic differentiation and sprouty signaling in glioma. Cancer Res. 2010;70(24):10060–70.

    CAS  Article  PubMed  Google Scholar 

  18. Wirapati P, Sotiriou C, Kunkel S, Farmer P, Pradervand S, Haibe-Kains B, Desmedt C, Ignatiadis M, Sengstag T, Schutz F, et al. Meta-analysis of gene expression profiles in breast cancer: toward a unified understanding of breast cancer subtyping and prognosis signatures. Breast Cancer Res. 2008;10(4):28.

    Article  Google Scholar 

  19. Clarke C, Madden SF, Doolan P, Aherne ST, Joyce H, O’Driscoll L, Gallagher WM, Hennessy BT, Moriarty M, Crown J, et al. Correlating transcriptional networks to breast cancer survival: a large-scale coexpression analysis. Carcinogenesis. 2013;34(10):2300–8.

    CAS  Article  PubMed  Google Scholar 

  20. Freeman TJ, Smith JJ, Chen X, Washington MK, Roland JT, Means AL, Eschrich SA, Yeatman TJ, Deane NG, Beauchamp RD. Smad4-mediated signaling inhibits intestinal neoplasia by inhibiting expression of beta-catenin. Gastroenterology. 2012;142(3):562–71.

    CAS  Article  PubMed  Google Scholar 

  21. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8(1):118–27.

    Article  PubMed  Google Scholar 

  22. Miller J, Cai C, Langfelder P, Geschwind D, Kurian S, Salomon D, Horvath S. Strategies for aggregating gene expression data: The collapseRows R function. BMC Bioinformatics. 2011;12(1):322.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  23. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559.

    Article  PubMed  PubMed Central  Google Scholar 

  24. Zhang B, Horvath S. A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol. 2005;4:Article17.

    PubMed  Google Scholar 

  25. Dennis G, Sherman B, Hosack D, Yang J, Gao W, Lane HC, Lempicki R. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol. 2003;4(5):3.

    Article  Google Scholar 

  26. Halsey LG, Curran-Everett D, Vowler SL, Drummond GB. The fickle P value generates irreproducible results. Nat Methods. 2015;12(3):179–85.

    CAS  Article  PubMed  Google Scholar 

  27. Zhang C, Han Y, Huang H, Min L, Qu L, Shou C. Integrated analysis of expression profiling data identifies three genes in correlation with poor prognosis of triple-negative breast cancer. Int J Oncol. 2014;44(6):2025–33.

    CAS  PubMed  Google Scholar 

  28. Rajput AB, Hu N, Varma S, Chen CH, Ding K, Park PC, Chapman JA, Sengupta SK, Madarnas Y, Elliott BE, et al. Immunohistochemical Assessment of Expression of Centromere Protein-A (CENPA) in Human Invasive Breast Cancer. Cancers. 2011;3(4):4212–27.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  29. Biermann K, Heukamp LC, Steger K, Zhou H, Franke FE, Guetgemann I, Sonnack V, Brehm R, Berg J, Bastian PJ, et al. Gene expression profiling identifies new biological markers of neoplastic germ cells. Anticancer Res. 2007;27(5a):3091–100.

    CAS  PubMed  Google Scholar 

  30. Tomonaga T, Matsushita K, Yamaguchi S, Oohashi T, Shimada H, Ochiai T, Yoda K, Nomura F. Overexpression and mistargeting of centromere protein-A in human primary colorectal cancer. Cancer Res. 2003;63(13):3511–6.

    CAS  PubMed  Google Scholar 

  31. Zeestraten EC, Maak M, Shibayama M, Schuster T, Nitsche U, Matsushima T, Nakayama S, Gohda K, Friess H, van de Velde CJ, et al. Specific activity of cyclin-dependent kinase I is a new potential predictor of tumour recurrence in stage II colon cancer. Br J Cancer. 2012;106(1):133–40.

    CAS  Article  PubMed  Google Scholar 

  32. Schubbert S, Shannon K, Bollag G. Hyperactive Ras in developmental disorders and cancer. Nat Rev Cancer. 2007;7(4):295–308.

    CAS  Article  PubMed  Google Scholar 

  33. Roth AD, Tejpar S, Delorenzi M, Yan P, Fiocca R, Klingbiel D, Dietrich D, Biesmans B, Bodoky G, Barone C, et al. Prognostic role of KRAS and BRAF in stage II and III resected colon cancer: results of the translational study on the PETACC-3, EORTC 40993, SAKK 60-00 trial. J Clin Oncol. 2010;28(3):466–74.

    CAS  Article  PubMed  Google Scholar 

  34. Costa-Cabral S, Brough R, Konde A, Aarts M, Campbell J, Marinari E, Riffell J, Bardelli A, Torrance C, Lord CJ, et al. CDK1 Is a Synthetic Lethal Target for KRAS Mutant Tumours. PLoS One. 2016;11(2):e0149099.

    Article  PubMed  PubMed Central  Google Scholar 

  35. Tennstedt P, Fresow R, Simon R, Marx A, Terracciano L, Petersen C, Sauter G, Dikomey E, Borgmann K. RAD51 overexpression is a negative prognostic marker for colorectal adenocarcinoma. Int J Cancer. 2013;132(9):2118–26.

    CAS  Article  PubMed  Google Scholar 

Download references




This study was partially supported by the Scientific Foundation of Xiang Ya hospital (2016Q04), the National Scientific Foundation of China (No. 81273595, 81522048, 81573511), the National High Technology Research and Development Program (grant nos. 2012AA02A518), and the National Key Research and Development Program (No. 2016YFC0905000, 2016YFC0905001).

Availability of data and materials

The raw gene expression data were retrieved from the GEO data repository ( with the accession numbers GSE39582 and GSE17536.

Authors’ contributions

RL designed the study, analyzed the data, wrote the paper and final approval of the version to be published. WZ, Z-QL and H-HZ revised the whole manuscript.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

Not applicable.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Rong Liu.

Additional files

Additional file 1: Table S1.

Summary of colon cancer microarray datasets used in the study. Table S2. Identified significant module when use different number of genes and cut height parameter. Figure S1. Criteria for choosing the beta parameter. Figure S2. Clustering plot of module eigengenes. (DOC 81 kb)

Additional file 2: Dataset 1.

WGCNA and survival analysis for the 3600 genes contained in the 11 co-expression modules. The kME and with the parent module and the survival calculation for RFS and molecular subtypes (type 3 and type 4) are presented. (XLS 3080 kb)

Additional file 3: Dataset 2.

GO biological process enrichment analysis for the 11 modules. (XLS 406 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Liu, R., Zhang, W., Liu, ZQ. et al. Associating transcriptional modules with colon cancer survival through weighted gene co-expression network analysis. BMC Genomics 18, 361 (2017).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI:


  • Colon cancer
  • Gene expression profiling
  • Systems biology
  • Biomarker