ENCAPP: elastic-net-based prognosis prediction and biomarker discovery for human cancers
- Jishnu Das†1, 2,
- Kaitlyn M Gayvert†3,
- Florentina Bunea4,
- Marten H Wegkamp4 and
- Haiyuan Yu1, 2Email author
© Das et al.; licensee Biomed Central. 2015
Received: 3 October 2014
Accepted: 13 March 2015
Published: 3 April 2015
With the explosion of genomic data over the last decade, there has been a tremendous amount of effort to understand the molecular basis of cancer using informatics approaches. However, this has proven to be extremely difficult primarily because of the varied etiology and vast genetic heterogeneity of different cancers and even within the same cancer. One particularly challenging problem is to predict prognostic outcome of the disease for different patients.
Here, we present ENCAPP, an elastic-net-based approach that combines the reference human protein interactome network with gene expression data to accurately predict prognosis for different human cancers. Our method identifies functional modules that are differentially expressed between patients with good and bad prognosis and uses these to fit a regression model that can be used to predict prognosis for breast, colon, rectal, and ovarian cancers. Using this model, ENCAPP can also identify prognostic biomarkers with a high degree of confidence, which can be used to generate downstream mechanistic and therapeutic insights.
ENCAPP is a robust method that can accurately predict prognostic outcome and identify biomarkers for different human cancers.
KeywordsCancer genomics Gene expression Protein interaction network Prognosis prediction Elastic net
The genetic complexity of cancer and its widely varying etiology and outcome make it extremely difficult to treat. It has been realized that rather than being a single disease, different cancers have widely diverse molecular bases [1,2]. There has been a tremendous amount of effort in the literature to understand molecular signatures underlying cancer . A significant number of these efforts have been informatics-based approaches that try to leverage genomic information such as expression alterations, mutations in genomes, copy number changes and epigenetic modifications to elucidate the mechanistic basis of cancer . Global collaborative research endeavors such as The Cancer Genome Atlas (TCGA)  and the International Cancer Genome Consortium (ICGC)  are trying to assimilate these genome-scale datasets for different kinds of cancers across many countries.
One of the key challenges has been to use genomic information to understand the basis for different outcomes for the same cancer. However, this has been difficult because it is unclear as to which parameters contain the most information regarding disease outcome. One of the first attempts at predicting cancer prognosis using genome-scale transcriptomic datasets was undertaken by van de Vijver et al. . Using microarrays, they obtained tissue-specific gene-expression profiles for breast cancer patients. They then clustered these expression profiles and correlated them with prognostic outcome to identify a 70-gene ‘prognosis profile’ for breast cancer. One of the key limitations in using only expression datasets to predict cancer prognosis is the assumption of independence between genes in hypothesis testing. The protein products encoded by these genes are not independent but part of a complex interactome network. The dependencies of this network have been shown to be of great importance in understanding the genetic and molecular bases of disease [7-11]. Chuang et al. , Taylor et al.  and Wu and Stein  used a functional interactome network to predict breast cancer prognosis. Recently, Hofree et al. reported a network-based stratification approach that can use somatic mutations to predict cancer subtypes . However, their method is primarily designed to work with mutation data and is less accurate for expression data . Given the much wider availability of expression datasets as compared to whole genome or exome sequences, it is of paramount importance to have a robust method that can use gene expression to accurately predict prognosis across different types of cancer. To this end, in this manuscript, we report ENCAPP, an elastic-net-based cancer prognosis prediction method. We use tissue-specific gene expression data from patients along with the reference human protein interaction network to construct a regression model that can predict disease outcome for breast, colon, rectal, and ovarian cancers. Our approach outperforms previous methods in terms of accuracy of prognosis prediction. Moreover, ENCAPP can also accurately identify genes that can serve as prognostic biomarkers for different cancers.
Results and discussion
ENCAPP – a schematic
Since the data is inherently high dimensional (i.e., the number of expression modules is greater than the number of patients), ordinary least squares regression cannot be used and a regularization term is essential (see Methods). While ridge regression (L2 regularization term)  uses all input variables to fit the model, the least absolute shrinkage and selection operator (LASSO, L1 regularization term)  attempts to find the most optimal sparse fit. Ridge regression can lead to inflated variance but has low bias, while LASSO can have high bias but ensures low variance. To optimize the bias-variance tradeoff, the elastic net [19,20] has been proposed and is our choice of regression model (see Methods).
Prognosis prediction using differentially expressed functional modules
To identify functional modules, we tried three separate algorithms – hierarchical clustering , affinity propagation clustering  and ClusterOne . We constructed modules from all three algorithms using default parameters (the module creation is independent of any expression data). Using expression values from the van de Vijver dataset, we used modules generated by all three algorithms to construct expression modules and used them to predict prognosis. We find that ClusterOne has the best performance (Figure 2C; see Methods). One possible reason for this is that the protein interactome network is binary (1 corresponding to an interaction between two proteins, while 0 corresponds to no interaction between the two proteins) and sparse. Thus, the number of discrete values (equal to 1 + the graph diameter) the graph distance used for hierarchical clustering can take is limited. Affinity propagation clustering is more suited to identifying hub-group-like topological structures as hubs fit the definition of exemplars. On the other hand, ClusterOne was designed to identify functional modules that capture pleiotropic relationships. Thus, ClusterOne was used for all further analyses.
To compare the performance of our method to previous attempts, we first compared our classification accuracy (i.e., the fraction of patients for which we were able to accurately predict prognosis, see Methods) and AUC to Taylor et al. (Figure 4B). Using expression data in conjunction with the protein network, ENCAPP achieves a median AUC of 0.777, significantly higher than the value of 0.71 reported by Taylor et al (P < 10−3). We then compared the performance of ENCAPP to that of Chuang et al. At a fixed sensitivity of 90%, ENCAPP has a significantly higher accuracy (75.1% vs 70.1%, P = 0.025). Finally, we compared ENCAPP to the results reported by Wu and Stein. Since they do not directly report ROC curves, we adopted a slightly different approach for this comparison. We trained a generalized linear model (GLM) using expression values from van de Vijver for the significant modules identified by them and attempted to predict prognosis for the Wang dataset. We found that the median AUC is 0.510. We then used the same modules and constructed the features that ENCAPP uses to train a GLM. The median AUC goes up to 0.561, significantly higher (P < 10−3) than the earlier median AUC.
We then sought to assess the changes that cause the performance boost over previous methods. We used ENCAPP on an experimentally verified subset of the Ophid interactome used in the Taylor et al. study. We obtained a median AUC of 0.750, which is significantly higher (P = 0.040) than the AUC of 0.71 obtained by them. This confirms that a large portion of the increase in performance is solely due to the core methodology underlying ENCAPP – our approach captures more information regarding the topology of the protein interactome than Taylor et al because of the differences in hub groups and modules outlined earlier. The rest of the increase is due to a higher quality protein network used in our study. The improvement in the protein network can be attributed to two factors – a methodological enhancement: we employ a series of stringent filtering steps  to identify a set of high-quality interactions and an increase in the available data. Thus, ENCAPP is a robust and reliable method that combines expression data with protein network modules to accurately predict cancer prognosis; it works efficiently even in the absence of clinical data.
Robustness of ENCAPP
Is ENCAPP robust to changes of the response variable or the incompleteness of the reference protein network? To systematically test this, we first focused on how the performance of ENCAPP changes when the response variable is altered. For the van de Vijver dataset, he outcome variable (survival) is right censored, i.e., if a patient survives for > =5 years, she is considered to have good prognosis, else bad prognosis. To test the robustness of ENCAPP to the right censoring cutoff, we varied it from 3–14 years i.e, a patient is defined to have good prognosis if she survived for > =k years, where k varies from 3 to 14. We find that ENCAPP performs consistently well for all values of k (Figure 4C), with the highest median AUC being 0.778 and the lowest median AUC being 0.730. This confirms that ENCAPP is robust across a wide range of cutoff values for right censoring.
To further validate the robustness of ENCAPP to alternate definitions of prognosis, we modified the outcome definition. We defined a patient to have a good prognosis, if she does not have metastases for > = k years, where k varies from 3 to 10. Here too, ENCAPP performs consistently well (Figure 4D), with the highest median AUC being 0.744 and the lowest median AUC being 0.652, confirming that it is also robust across prognosis definitions.
To address the robustness of ENCAPP to incompleteness of the protein network, we generated sets of 50 random networks for each of the following scenarios: 5%, 10%, 15% and 20% of the total edges randomly removed. We then generated modules for all these random networks using the same ClusterOne parameters as the original network. We then re-calculated the performance of ENCAPP on the van de Vijver dataset for each of these networks with a certain fraction of the edges removed. We find that ENCAPP still performs well, with median AUCs of 0.744, 0.740, 0.743 and 0.742 at 5%, 10%, 15% and 20% edge deletions respectively (Figure 4E), confirming that it highly robust to network incompleteness.
Pan-cancer prognosis prediction
Another key goal of prognosis prediction algorithms is to be applicable across data collected from different cohorts of patients. To test whether ENCAPP can be trained on a certain dataset and then used to predict outcome for a completely different set of patients, we used the Wang et al. dataset to train the model and then predicted outcomes for the van de Vijver dataset using it. While we originally analyzed the van de Vijver dataset in terms of overall survival, clinical information on metastasis was available. Since, the Wang et al. dataset uses metastasis-free survival as the prognostic outcome, we used this as the outcome for the cross-dataset prediction. ENCAPP was accurate in predicting outcomes (median AUC = 0.649; 95% CI for AUC = 0.649-0.650; P = 0.019; Additional file 1: Table S1), showing that our approach is highly robust and successful in incorporating major differences in clinical parameters (Figure 5B). Here too, we perform better than Chuang et al. who report a classification accuracy of 55.8% at 90% sensitivity (for predictions on the Wang dataset using the van de Vijver sub-network markers). ENCAPP achieves a significantly higher classification accuracy of 62.6% at 90% sensitivity (P = 0.009).
We then used ENCAPP to analyze other kinds of cancer – a colon cancer  and an ovarian cancer  expression dataset published by the TCGA. The ovarian cancer dataset that we analyzed consisted of platinum-resistant cancer patients, which occurs in approximately 25% of patients within 6 months of therapy. For each dataset, we looked to see how well our method could predict overall survival. ENCAPP was able to predict prognostic outcome successfully for both colon and ovarian cancer (median AUCs = 0.666 and 0.766 respectively; 95% CIs for AUC = 0.658-0.674 and 0.760-0.771 respectively; P = 0.001 and 0.097 respectively; Additional file 1: Table S1) confirming that it works robustly across different cancers (Figures 5C, 5E).
Finally, we tried using ENCAPP to predict prognosis across cancer types when they are related. We tried predicting rectal cancer prognosis  having trained ENCAPP using colon cancer data . ENCAPP is very successful (median AUC = 0.803; 95% CI for AUC = 0.782-0.823; Figure 5D; P < 10−3; Additional file 1: Table S1) at predicting rectal cancer prognosis showing that ENCAPP is able to predict prognosis across related cancers.
Identifying prognostic markers using ENCAPP
We also compared the average degree of proteins in these significant modules with that of cancer-associated proteins (11.2) and all proteins in the network (8.2). The average degree of proteins in significant modules is not, in general, skewed towards the average degree of cancer-associated proteins. For the van de Vijver and Wang breast cancer datasets, the average degree of proteins in significant modules are 12.0 and 10.5 respectively, similar to the average degree of cancer-associated proteins. However, for the colon and ovarian cancer datasets, they are 8.3 and 8.8 respectively, similar to the overall average degree. These findings are also consistent with Figure 6B, which shows that the enrichment of cancer genes in significant modules for the 2002 and 2005 breast cancer datasets is higher than the enrichment for the colon and ovarian cancer datasets. This could be due to higher noise for the colon and ovarian cancer datasets or due to the list of cancer genes being incomplete with varying degrees of incompleteness for different tissue types.
To examine whether the significant modules that we find agree with what has been previously reported, we compared the significant modules that we obtained for the van de Vijver dataset with the significant modules that Wu and Stein  obtained for the same dataset. 29/85 (34.1%) of the modules are overlapping. Thus, ENCAPP does find a large number of signatures concordant with what has been reported earlier, but it also finds a significant number of potentially novel signatures. We then compared the significant modules that we obtained for the Wang dataset with the significant modules that Wu and Stein obtained for the van de Vijver dataset. This is a comparison both across methods and cancer sub-types. 25/268 (9.3%) of the modules are still overlapping, showing that there are a number of stable signatures across cancer sub-types. We also find that 3 significant modules for the van de Vijver dataset contain 13 proteins of which 5 have been previously implicated in cancer (Figure 6C) and 3 significant modules for the colon cancer dataset contain 9/21 known cancer genes (Figure 6C). A number of these genes are known to be good prognostic markers.
As a further validation, we examined prognostic biomarkers detected by ENCAPP that were unknown at the time of publication of the expression dataset, but have since been clinically validated. Conceptually, these correspond to novel biomarkers detected by ENCAPP. For example, we detected NFKB2 and BCL3 in a significant module for the breast cancer (2002) dataset (Figure 6C). In 2005, it was shown that the NFκB complex, of which NFKB2 is one of the subunits, can be used a well-known prognostic marker for breast cancer . More recently, it has also been shown that suppression of the NFκB co-factor BCL3 correlates with poor prognosis as it inhibits apoptosis of mammary cells . GATA2 was present in a significant module for the colon cancer (published in 2011) dataset (Figure 6C). In 2013, GATA2 was shown to be a useful prognostic marker for colorectal cancer – patients with high expression levels of GATA2 are likely to have worse disease-free survival outcomes than those with lower expression levels of GATA2 . These confirm that the significant modules identified by ENCAPP contain numerous prognostic markers.
We also found a number of modules with proteins that have not yet been validated as prognostic biomarkers but are excellent candidates for hypothesis-driven follow-up experiments. For example, one of the significant modules for the breast cancer (2002) dataset contains CKS1B, SKP2 and DUSP1 (Figure 6D). It has been shown that CKS1B is required for the SKP2-mediated ubiquitination of PSMD9 (p27) . A recent study shows that PSMD9 expression is altered in breast cancer patients irrespective of the BRCA mutation state . Together, these results suggest that this module and especially CKS1B and SKP2 could be reliable prognostic markers across breast cancer subtypes as altered expression of these genes will lead to mis-regulation of PSMD9, whose expression is altered in breast cancer patients with or without mutations in BRCA1.
For the colon cancer dataset, one of the significant modules contains FAM175B, BARD1, CSTF1, BRE, and UIMC1 (Figure 6D). It is well known that BARD1 interacts with BRCA1 to form a ubiquitin ligase complex [36,37] and the interaction can be disrupted by breast cancer mutations on BRCA1 [36,37]. A blood test based on BARD1 has been proposed as a potential way to diagnose breast cancer . FAM175B (ABRO1) and BRE are two of the 4 subunits of the BRISC deubiquitinating enzyme complex . BRE has already been shown to be a reliable prognostic marker for acute myeloid leukemia [40,41]. In the context of these studies, our results suggest that this module and especially FAM175B, BARD1 and BRE can be potential prognostic markers for colon cancer as altered expression of these genes can modify ubiquitination activity in the cell.
Here we have described ENCAPP, a robust prognosis predictor of different human cancers. Since ENCAPP uses differentially expressed modules between patients with good and bad prognosis to accurately predict disease outcome, the decision boundaries used to make this prediction correspond to functional changes in the cell. This is potentially extremely useful in generating mechanistic hypotheses regarding cancer causation and progression that can then be experimentally tested. Conceptually, the ENCAPP algorithm uses interaction dynamics, a combination of gene expression data with the topological structure of the network, to predict prognosis. Previous studies have shown that interaction dynamics is also useful in understanding the organization and evolutionary modes of biological networks [42,43]. Together, these suggest that approaches using interaction dynamics may be successful in elucidating the mechanistic basis of a wide range of biological phenomena, by combining two discrete layers of information – gene expression and protein networks.
Another key feature of ENCAPP is its ability to identify prognostic markers from the regression model itself. While some previous methods show examples of prognostically relevant genes identified by their method [13,15], the key difference is that such detections are typically anecdotal. On the other hand, we demonstrate that the significant modules in ENCAPP are systematically enriched for cancer genes. Thus, our model identifies biologically relevant genes and uses these for determining prognostic outcome. We also show that significant modules identified by ENCAPP contain known prognostic markers and hypothesize that they may contain novel biomarkers. Follow-up studies may want to validate these putative prognostic markers. Since ENCAPP identifies modules containing these genes, any positive results emerging from such studies will directly tie in to a pathway-level understanding of the mechanistic basis of that specific cancer type.
One limitation of ENCAPP is that the accuracy of the prognosis prediction is highly dependent on the quality of the expression dataset, which is why the AUCs vary across the different cancers. Future approaches may want to combine gene expression and protein networks with other data such as somatic mutations, epigenetic modifications and copy number alterations to make the overall prediction accuracy less dependent on the quality of an individual dataset.
Expression data and the human protein interactome network
Sample size, number of good and bad prognosis patients, and breakdown by stage and grade for the different expression datasets used are available in Additional file 5: Table S3. Expression data were RMA-normalized as described in Additional file 2: Supplementary Notes. High-quality binary and co-complex human protein interactome networks were obtained from HINT . The final network used for this study was the union of the binary and co-complex networks. It comprises 42,604 interactions between 9,985 proteins. All datasets used in this study are obtained from papers that have already been published and required no ethics approval.
Identifying functional modules using clustering
ClusterOne identifies overlapping functional modules based on the topological properties of the protein interactome network . We did a sweep for the ‘s’ (size) and ‘d’ (minimum cluster density) parameters in ClusterOne . The default parameters are s = 3 and d = 0.35. We examined the parameter space around these values. Since the modules were identified independently of the expression datasets, situations occasionally arose in which some modules had missing gene expression values. In these cases, a module was included only if at least 1/3 of the genes in that module had corresponding expression values. For each cancer type, we report the highest AUC value obtained in the parameter sweep.
The elastic-net-based regression model
Here, α is a number between 0 and 1 with α = 0 corresponding to ridge regression alone and α = 1 corresponding to LASSO alone. We choose the best α using cross validation.
Here, s m is the number of genes in module m. E ik corresponds to the expression value of the k th gene for the i th patient. < E kg > and < E kb > represent the average (mean) expression values of the k th gene across all patients with good and bad prognosis respectively. σ kg and σ kb represent the standard deviation of the expression values of the k th gene across all patients with good and bad prognosis respectively. For all the cross-validations, < E kg >, < E kb >, σ kg and σ kb are calculated using only the samples in the training set. However, while using G im and B im as features derived from every module generally gives the most optimum performance, we noticed that in certain cases it is possible to obtain a slight increase in performance by not averaging over each module. There all P ik and Q ik values are used as input. While training the ENCAPP classifier, it is necessary to check which of the two approaches performs better.
For the datasets where clinical information was also available, we incorporated it using a logistic regression model. A description of the available clinical data is given in Additional file 6: Clinical information. Since the clinical data is not high dimensional, elastic net regression is not a suitable choice for it. The final predicted outcome was a weighted linear combination of the two outputs – one predicted by the elastic-net-based model (using expression and protein network data) and the other predicted by the logistic regression model. Thus, Y 1 = f(X 1 ) and Y 2 = g(X 2 ) and Y = k x Y 1 + (1 - k) x Y 2 . Here, X 1 is the set of expression derived features, f the function obtained from the elastic-net based classifier and Y 1 the corresponding outcome variable, X 2 the set of clinical features, f the function obtained from the logistic-regression based classifier and Y 2 the corresponding outcome variable. Y is the final outcome obtained by a linear combination of Y 1 and Y 2 . An optimal value of k, the relative weight parameter is obtained by grid search.
ENCAPP code and associated datasets are available in the Supplementary information (Additional file 7: ENCAPP_Code_Datasets).
The performance of our model is evaluated in a five-fold cross validation framework (Additional file 2: Supplementary Notes). We split the patients into five subsets such that four subsets are used for training and the fifth one is the test set. The prognostic outcomes for the training set were used to determine the regression coefficients. These coefficients were then used to predict outcomes for patients in the test set. We repeated this procedure five times so that each subset served as a test set. The predicted outcomes were compared to the actual outcomes using a receiver operating characteristic (ROC) curve . The area under the ROC curve (AUC) and classification accuracy were used as measure of the quality of the prediction . The cross validation is process was repeated 50 times with a set of random seeds. For all comparisons, each method was run with the same set of random seeds, which ensured that the cross-validation dataset splits were identical across methods. Thus, all observed differences are solely due to one method being superior to the other and not because of how the dataset was split into the 5 folds. P-values evaluating the significance of difference in performance between different methods (two sets of AUC values) were calculated using a Mann–Whitney U test.
Enrichment of cancer genes in modules with significant regression coefficients
P-values were calculated using a cumulative binomial test.
Availability of supporting data
The datasets supporting the results of this article are included within its additional files.
JD was supported in part by a fellowship from the Center for Vertebrate Genomics, Cornell University. KMG was supported in part by NIH Training Grant T32GM083937, Tri-Institutional Training Program in Computational Biology & Medicine, awarded by the National Institute of General Medical Sciences. FB and MHW were supported by NSF-DMS award 1310119. HY was supported by National Cancer Institute grant R01 CA167824, National Institute of General Medical Sciences grants R01 GM104424, R01 GM108716, R01 GM097358 and a Weill Cornell Medical College Clinical and Translational Science Center Pilot Award. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
- Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144(5):646–74.View ArticlePubMedGoogle Scholar
- Lawrence MS, Stojanov P, Mermel CH, Robinson JT, Garraway LA, Golub TR, et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature. 2014;505(7484):495–501.View ArticlePubMed CentralPubMedGoogle Scholar
- Chin L, Hahn WC, Getz G, Meyerson M. Making sense of cancer genomic data. Genes Dev. 2011;25(6):534–55.View ArticlePubMed CentralPubMedGoogle Scholar
- Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, Ellrott K, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. 2013;45(10):1113–20.View ArticlePubMed CentralPubMedGoogle Scholar
- Hudson TJ, Anderson W, Artez A, Barker AD, Bell C, Bernabe RR, et al. International network of cancer genome projects. Nature. 2010;464(7291):993–8.View ArticlePubMedGoogle Scholar
- van de Vijver MJ, He YD, van't Veer LJ, Dai H, Hart AA, Voskuil DW, et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002;347(25):1999–2009.View ArticlePubMedGoogle Scholar
- Vidal M, Cusick ME, Barabasi AL. Interactome networks and human disease. Cell. 2011;144(6):986–98.View ArticlePubMed CentralPubMedGoogle Scholar
- Wang X, Wei X, Thijssen B, Das J, Lipkin SM, Yu H. Three-dimensional reconstruction of protein networks provides insight into human genetic disease. Nat Biotechnol. 2012;30(2):159–64.View ArticlePubMed CentralPubMedGoogle Scholar
- Guo Y, Wei X, Das J, Grimson A, Lipkin SM, Clark AG, et al. Dissecting disease inheritance modes in a three-dimensional protein network challenges the "guilt-by-association" principle. Am J Hum Genet. 2013;93(1):78–89.View ArticlePubMed CentralPubMedGoogle Scholar
- Das J, Fragoza R, Lee HR, Cordero NA, Guo Y, Meyer MJ, et al. Exploring mechanisms of human disease through structurally resolved protein interactome networks. Mol Biosyst. 2014;10(1):9–17.View ArticlePubMed CentralPubMedGoogle Scholar
- Das J, Lee HR, Sagar A, Fragoza R, Liang J, Wei X, et al. Elucidating common structural features of human pathogenic variations using large-scale atomic-resolution protein networks. Hum Mutat. 2014;35(5):585–93.View ArticlePubMedGoogle Scholar
- Chuang HY, Lee E, Liu YT, Lee D, Ideker T. Network-based classification of breast cancer metastasis. Mol Syst Biol. 2007;3:140.View ArticlePubMed CentralPubMedGoogle Scholar
- Taylor IW, Linding R, Warde-Farley D, Liu Y, Pesquita C, Faria D, et al. Dynamic modularity in protein interaction networks predicts breast cancer outcome. Nat Biotechnol. 2009;27(2):199–204.View ArticlePubMedGoogle Scholar
- Wu G, Stein L. A network module-based method for identifying cancer prognostic signatures. Genome Biol. 2012;13(12):R112.View ArticlePubMed CentralPubMedGoogle Scholar
- Hofree M, Shen JP, Carter H, Gross A, Ideker T. Network-based stratification of tumor mutations. Nat Methods. 2013;10(11):1108–15.View ArticlePubMedGoogle Scholar
- Das J, Yu H. HINT: High-quality protein interactomes and their applications in understanding human disease. BMC Syst Biol. 2012;6(1):92.View ArticlePubMed CentralPubMedGoogle Scholar
- Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55–67.View ArticleGoogle Scholar
- Tibshirani R. Regression shrinkage and selection via the Lasso. J Roy Stat Soc B Met. 1996;58(1):267–88.Google Scholar
- Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Stat Methodol). 2005;67(2):301–20.View ArticleGoogle Scholar
- Bunea F. Honest variable selection in linear and logistic regression models via l(1) and l(1) + l(2) penalization. Electron J Stat. 2008;2:1153–94.View ArticleGoogle Scholar
- Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL. Hierarchical organization of modularity in metabolic networks. Science. 2002;297(5586):1551–5.View ArticlePubMedGoogle Scholar
- Ahn YY, Bagrow JP, Lehmann S. Link communities reveal multiscale complexity in networks. Nature. 2010;466(7307):761–4.View ArticlePubMedGoogle Scholar
- Ward JH. Hierarchical grouping to optimize an objective function. J Am Stat Assoc. 1963;58(301):236.View ArticleGoogle Scholar
- Frey BJ, Dueck D. Clustering by passing messages between data points. Science. 2007;315(5814):972–6.View ArticlePubMedGoogle Scholar
- Nepusz T, Yu H, Paccanaro A. Detecting overlapping protein complexes in protein-protein interaction networks. Nat Methods. 2012;9(5):471–2.View ArticlePubMed CentralPubMedGoogle Scholar
- Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015.Google Scholar
- Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F, et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005;365(9460):671–9.View ArticlePubMedGoogle Scholar
- Atlas TCG. Comprehensive molecular characterization of human colon and rectal cancer. Nature. 2012;487(7407):330–7.View ArticleGoogle Scholar
- Atlas TCG. Integrated genomic analyses of ovarian carcinoma. Nature. 2011;474(7353):609–15.View ArticleGoogle Scholar
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50.View ArticlePubMed CentralPubMedGoogle Scholar
- Zhou Y, Eppenberger-Castori S, Marx C, Yau C, Scott GK, Eppenberger U, et al. Activation of nuclear factor-kappaB (NFkappaB) identifies a high-risk subset of hormone-dependent breast cancers. Int J Biochem Cell Biol. 2005;37(5):1130–44.View ArticlePubMedGoogle Scholar
- Wakefield A, Piggott L, Croston D, Jiang WG, Clarkson R. Suppression of the NF-κB cofactor Bcl3 inhibits mammary epithelial cell apoptosis and, in breast tumours, correlates with poor prognosis. Breast Cancer Res. 2008;10 Suppl 2:O4.View ArticlePubMed CentralGoogle Scholar
- Chen L, Jiang B, Wang Z, Liu M, Ma Y, Yang H, et al. Expression and prognostic significance of GATA-binding protein 2 in colorectal cancer. Med Oncol. 2013;30(2):498.View ArticlePubMedGoogle Scholar
- Ganoth D, Bornstein G, Ko TK, Larsen B, Tyers M, Pagano M, et al. The cell-cycle regulatory protein Cks1 is required for SCF(Skp2)-mediated ubiquitinylation of p27. Nat Cell Biol. 2001;3(3):321–4.View ArticlePubMedGoogle Scholar
- Dressler AC, Hudelist G, Fink-Retter A, Gschwantler-Kaulich D, Pfeiler G, Rosner M, et al. Tuberin and p27 expression in breast cancer patients with or without BRCA germline mutations. J Cancer Res Clin Oncol. 2013;139(8):1349–55.View ArticlePubMedGoogle Scholar
- Hashizume R, Fukuda M, Maeda I, Nishikawa H, Oyake D, Yabuki Y, et al. The RING heterodimer BRCA1-BARD1 is a ubiquitin ligase inactivated by a breast cancer-derived mutation. J Biol Chem. 2001;276(18):14537–40.View ArticlePubMedGoogle Scholar
- Brzovic PS, Keeffe JR, Nishikawa H, Miyamoto K, Fox 3rd D, Fukuda M, et al. Binding and recognition in the assembly of an active BRCA1/BARD1 ubiquitin-ligase complex. Proc Natl Acad Sci U S A. 2003;100(10):5646–51.View ArticlePubMed CentralPubMedGoogle Scholar
- Irminger-Finger I. BARD1, a possible biomarker for breast and ovarian cancer. Gynecol Oncol. 2010;117(2):211–5.View ArticlePubMedGoogle Scholar
- Cooper EM, Cutcliffe C, Kristiansen TZ, Pandey A, Pickart CM, Cohen RE. K63-specific deubiquitination by two JAMM/MPN+ complexes: BRISC-associated Brcc36 and proteasomal Poh1. EMBO J. 2009;28(6):621–31.View ArticlePubMed CentralPubMedGoogle Scholar
- Noordermeer SM, Sanders MA, Gilissen C, Tonnissen E, van der Heijden A, Dohner K, et al. High BRE expression predicts favorable outcome in adult acute myeloid leukemia, in particular among MLL-AF9-positive patients. Blood. 2011;118(20):5613–21.View ArticlePubMedGoogle Scholar
- Noordermeer SM, Monteferrario D, Sanders MA, Bullinger L, Jansen JH, van der Reijden BA. Improved classification of MLL-AF9-positive acute myeloid leukemia patients based on BRE and EVI1 expression. Blood. 2012;119(18):4335–7.View ArticlePubMedGoogle Scholar
- Das J, Mohammed J, Yu H. Genome-scale analysis of interaction dynamics reveals organization of biological networks. Bioinformatics. 2012;28(14):1873–8.View ArticlePubMed CentralPubMedGoogle Scholar
- Das J, Vo TV, Wei X, Mellor JC, Tong V, Degatano AG, et al. Cross-species protein interactome mapping reveals species-specific wiring of stress response pathways. Sci Signal. 2013;6(276):ra38.View ArticlePubMed CentralPubMedGoogle Scholar
- Hastie T, Tibshirani R, Friedman JH. The elements of statistical learning: data mining, inference, and prediction. 2nd ed. New York, NY: Springer; 2009.View ArticleGoogle Scholar
- Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, et al. A census of human cancer genes. Nat Rev Cancer. 2004;4(3):177–83.View ArticlePubMed CentralPubMedGoogle Scholar
- Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 2011;39(Database issue):D945–50.View ArticlePubMed CentralPubMedGoogle Scholar
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.