- Open Access
Mining the bladder cancer-associated genes by an integrated strategy for the construction and analysis of differential co-expression networks
© Deng et al.; licensee BioMed Central Ltd. 2015
- Published: 29 January 2015
Bladder cancer is the most common malignant tumor of the urinary system and it is a heterogeneous disease with both superficial and invasive growth. However, its aetiological agent is still unclear. And it is indispensable to find key genes or modules causing the bladder cancer. Based on gene expression microarray datasets, constructing differential co-expression networks (DCNs) is an important method to investigate diseases and there have been some relevant good tools such as R package 'WGCNA', 'DCGL'.
Employing an integrated strategy, 36 up-regulated differentially expressed genes (DEGs) and 356 down-regulated DEGs were selected and main functions of those DEGs are cellular physiological precess(24 up-regulated DEGs; 167 down-regulated DEGs) and cellular metabolism (19 up-regulated DEGs; 104 down-regulated DEGs). The up-regulated DEGs are mainly involved in the the pathways related to "metabolism". By comparing two DCNs between the normal and cancer states, we found some great changes in hub genes and topological structure, which suggest that the modules of two different DCNs change a lot. Especially, we screened some hub genes of a differential subnetwork between the normal and the cancer states and then do bioinformatics analysis for them.
Through constructing and analyzing two differential co-expression networks at different states using the screened DEGs, we found some hub genes associated with the bladder cancer. The results of the bioinformatics analysis for those hub genes will support the biological experiments and the further treatment of the bladder cancer.
- Bladder Cancer
- Microarray Dataset
- Large Relative Difference
- Topological Overlap Matrice
- Regulatory Impact Factor
The morbidity of bladder cancer is in the first place among the cancers of urinary system. The bladder cancer cells can spread by breaking away from the original tumor. They can spread through the blood vessels to the liver, lungs and bones. However, its causes are not yet clear. The bladder cancer is a heterogeneous disease that shows both superficial and invasive growth [1, 2]. Superficial tumors frequently recur and may progress to invasive growth. A part that warrants better treatment regimes bladder cancer is also a good model system to study tumor initiation and progression. To gain insights into the molecular biology of these processes, we performed gene expression analyses to get some important information about bladder cancer-associated genes.
Systems biology is an emerging approach applied to biomedical and biological scientific research. It is a biology-based inter-disciplinary field of study that focuses on complex interactions within biological systems, using a holistic approach to biological and biomedical research [3–5]. Network biology is a new way of representation and analysis of biological information processing, which understands life as a network. In fact, the network biology is a branch of the systems biology.
Differential co-expression network (DCN) is one of biological networks. A gene co-expression network has emerged as a novel holistic approach for microarray analysis [6–9]. Stuart et al.  and Bergmannet al. separately constructed the gene co-expression network that connected genes whose expression profiles were similar across different organisms. A human network was analyzed by Leeet al.  with functional grouping and cluster analysis. van Noort et al.  demonstrated the small-world and scale-free architecture of the yeast co-expression network. They showed that functionally related genes are frequently co-expressed across organisms constituting conserved transcription modules.
We wanted to explore transcriptional changes in terms of gene interactions rather than at the level of individual genes. In the end, we constructed two gene co-expression networks and sought to find cancer-induced changes in the network. The identification of co-expressed pairs in tumor and normal tissues led to the construction of two distinct networks that represent tumor and normal states, respectively. We expected that biological changes would be reflected in transcriptional changes, which could be identified by comparing the two co-expression networks. In the transcriptome analysis, differential co-expression analysis (DCEA) is emerging as a unique complement to traditional differential expression analysis. DCEA investigates differences in gene interconnection by calculating the expression correlation changes of gene pairs between two conditions. The rationale behind differential co-expression analysis is that changes in gene co-expression patterns between two contrasting phenotypes (e.g., healthy and disease) provide hints regarding the disrupted regulatory relationships or affected regulatory subnetworks specific to the phenotype of interest (in this case, the disease phenotype). Therefore, among the many growing directions of DCEA, there is the so-called "differential regulation analysis"(DRA), which integrates the transcription factor (TF)-to-target information to probe upstream regulatory events that account for the observed co-expression changes. Recently, many researchers have integrated differential co-expression and differential expression concepts to propose a novel Regulatory Impact Factor (RIF) that can be used to prioritize disease-causative TFs [10, 11].
In addition, a lot of researchers have begun to perform differential co-expression analyses of microRNAs [12, 13]. Currently, some tools have been developed for differential expression analysis based on microarray, such as R packages "LIMMA", "SAMR", "WGCNA" and so on.
In our study, we collected microarray datasets of bladder cancer from GEO http://www.ncbi.nlm.nih.gov/geo/ to analyze the datasets by an integrated strategy including some functions of SAMR, WGCNA, Cytoscape and other packages. We selected some DEGs at two different state (normal and cancer) and constructed two DCNs. Through the comparisons between two DCNs of two different states, we found some hub genes associated with the bladder cancer.
The datasets of gene expression Affymetrix microarray of bladder cancer [GEO: GSE3167] were download from the GEO database of NCBI. It has 18 samples, [GSM71019-GSM71027] are from the normal tissues and the other 9 samples are from the cancer tissues. The datasets were processed by an integrated strategy. Some DEGs were selected and constructed two DCNs. In the end, simple analysis was applied to the two DCNs and some hub genes were found through comparing two DCNs at different states.
Normalization of the microarray datasets
Selection and bioinformatical analysis of DEGs
After preprocessing the microarray datasets including the above normalization, some DEGs were selected using the R package "SAMR" (nperms(Number of permutations used to estimate false discovery rates) = 100; del (Value of delta to define cutoff rule) = 2.5). 36 up-regulated genes and 356 down-regulated genes were picked out (See Additional file 2).
Construction and analysis of two DCNs
The development of molecular markers for tumor classification and expression signatures that predict outcome will greatly improve diagnosis treatment of bladder cancer. We employed the R package "SAMR" to select 392 DEGs including 36 up-regulated and 356 down-regulated. In the GO function enrichment results (Additional file 6 Additional file 7), it is showed that the main functions of the DEGs are cellular physiological precess(24 up-regulated DEGs; 167 down-regulated DEGs) and cellular metabolism (19 up-regulated DEGs; 104 down-regulated DEGs), which is reasonable for the bladder. However, it is not clear why the difference between the number of the up-regulated and the down-regulated is so large.
We used the R packages "WGCNA" to construct two DCNs under different conditions. And we used the tool "Cytoscape" to visualize and analyze the two different DCNs. Some hub genes were found and analyzed in view of bioinformatics. The hub genes of the normal DCN mainly involve the neuroactive ligand-receptor interaction pathway and their GO functions mostly are response to biotic stimulus,response to stimulus and response to external stimulus. The hub genes of the cancer DCN involve the following three KEGG pathways: gamma-Hexachlorocyclohexane degradation, Fatty acid metabolism and Tryptophan metabolism. Their main GO functions are cellular physiological process, surface receptor linked signal transport and signal transduction. In addition, we found some difference of the two DCNs in modules from the clustering plots and the heatmaps. But it is unknown for us that the functions of the different modules, which is our future work.
We found several hub genes from the selected differential co-expression subnetwork of two different DCNs. Two of them have been reported to be associated with the bladder cancer. Then whether are the other hub genes associated with the bladder cancer? It need to be validated through biological experiments.
In the work, we adopted an integrated strategy to analyzing the bladder cancer-associated genes by combining several R packages, Gene Ontology and KEGG. In the experimental results, it shows that the bladder cancer results from the abnormal signaling pathways caused by many genes. Through the data mining for gene expression microarrays, we found differential co-expression subnetwork and the hub genes of the subnetwork. Through the main GO functions and pathways of the hub genes, we can better understand the development of the bladder cancer, which will support the wet biological experiment and even further promote the prevention, treatment,diagnosis and cure of the bladder cancer in the future.
Selecting differential expressed genes
Where ∑ m and ∑ n are summations of the expression measurements in states I and U, respectively, a = (1/nl + 1/n2)/(n1 + n2 - 2), and n1 and n2 are the numbers of measurements in states I and U.
To find significant change in gene expression, genes were ranked by magnitude of their d(i) values, so that d(i) was the i th largest relative difference. For each of the N balanced permutations relative differences d p (i) were also calculated, and the genes were again ranked such that d p (i) was the i th largest relative difference for permutation p. The expected relative difference, d E (i), was defined as the average over the N balanced permutations, .
To identify potentially significant changes in expression, they used a scatter plot of the observed relative difference d(i) vs. the expected relative difference d E (i). For the vast majority of genes, d(i) ≅ d E (i), but some genes are represented by points displaced from the d(i) = d E (i) line by a distance greater than a threshold Δ and these genes were called "significant genes".
The method for setting thresholds provides asymmetric cutoffs for induced and repressed genes. The alternative is the standard t test, which improves a symmetric horizontal cutoff, with d(i) >c for induced genes and d(i) <c for repressed genes.
Detecting modules in differential co-expression networks
We first needed to construct topological overlap matrices(TOM)[33, 34]. The topological overlap is for measuring pair-wise similarity. They start with a network encoded by its corresponding adjacency matrix A = [a ij ] which is a symmetric with binary entries. By convention, the diagonal elements are assumed to be zero.
Where, , and the index u runs across all nodes of the network.
Where N1(i) denotes the set of neighbors of I excluding I itself and |·| denotes the number of elements (cardinality) in its argument. The quantity |N1(i)∩N1(j)| measures the number of common neighbors that nodes i and j shares whereas |N1(i)| gives the number of neighbors of i.
The matrix is called the m - th order generalized topological overlap matrix (GTOMm). This quantity simply measures the agreement between the nodes that are reachable from i and from j within m steps.
This work was supported by the grants of the National Science Foundation of China(61133010, 61303111, 61373105,30711140358, 61472280 and 6140050774), the National High-Tec h R&D Program (863) (2014AA021502).
The publication costs for this article were partly funded by the grants of the National Science Foundation of China(61133010, 61303111, 61373105, 30711140358, 61472280 and 6140050774) and partly supported by the National High-Tec h R&D Program (863) (2014AA021502). And Publication costs for this article were also funded by the corresponding author's institution.
This article has been published as part of BMC Genomics Volume 16 Supplement 3, 2015: Selected articles from the 10th International Symposium on Bioinformatics Research and Applications (ISBRA-14): Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/16/S3.
- Tosini I, Wagner U, Sauter G, Egloff M, Knogagil H, Alund G, Bannwart F, Mihatshg MJ, Gasser TC, Maurer R: Clinical significance of interobserver differences in the staging and grading of superficial bladder cancer. BJU Int. 2000, 85: 48-53. 10.1046/j.1464-410x.2000.00356.x.View ArticleGoogle Scholar
- Oosterlinck W, Lobel B, Jakse G, Malmstrom P-U, Stockle M, Sternberg C: Guidelines on bladder cancer. European Urology. 2002, 41: 105-112. 10.1016/S0302-2838(01)00026-4.View ArticlePubMedGoogle Scholar
- Wang B, Huang DS, Jiang C: A new strategy for protein interface identification using manifold learning method. IEEE Transactions on NanoBioscience. 2014, 13 (2): 118-123.View ArticlePubMedGoogle Scholar
- Huang DS, Zhang L, Han K, Deng S, Yang K, Zhang H: Prediction of protein-protein interactions based on protein-protein correlation using least squares regression. Current Protein & Peptide Science. 2014, 15 (6): 553-560. 10.2174/1389203715666140724084019.View ArticleGoogle Scholar
- Xia J-F, Zhao X-M, Song J, Huang DS: APIS: accurate prediction of hot spots in protein interfaces by combining protrusion index with solvent accessibility. BMC Bioinformatics. 2010, 11: 174-10.1186/1471-2105-11-174.PubMed CentralView ArticlePubMedGoogle Scholar
- Stuart JM, et al: A gene-coexpression network for global discovery of conserved genetic modules. Science. 2003, 302: 249-255. 10.1126/science.1087447.View ArticlePubMedGoogle Scholar
- Bergmann S, et al: Similarities and differences in genome-wide expression data of six organisms. PLoS Biol. 2004, 2: E9-10.1371/journal.pbio.0020009.PubMed CentralView ArticlePubMedGoogle Scholar
- Lee HK, et al: Coexpression analysis of human genes across many microarray data sets. Genome Res. 2004, 14: 1085-1094. 10.1101/gr.1910904.PubMed CentralView ArticlePubMedGoogle Scholar
- Noort VV, et al: The yeast coexpression network has a small-world, scale-free architecture and can be explained by a simple model. EMBO rep. 2004, 5: 280-284. 10.1038/sj.embor.7400090.PubMed CentralView ArticlePubMedGoogle Scholar
- Hudson NJ, Reverter A, Dalrymple BP: A differential wiring analysis of expression data correctly identifies the gene containing the causal mutation. PLoS Comput Biol. 2009, 5: e1000382-10.1371/journal.pcbi.1000382.PubMed CentralView ArticlePubMedGoogle Scholar
- Reverter A, Hudson NJ, Nagaraj SH, Perez-Enciso M, Dalrymple BP: Regulatory impact factors: unraveling the transcriptional regulation of complex traits from expression data. Bioinformatics. 2010, 26: 896-904. 10.1093/bioinformatics/btq051.View ArticlePubMedGoogle Scholar
- Staehler CF, Keller A, Backes C, Chandran A, et al: Whole miRNome-wide differential co-expression of microRNAs. Genomics Proteomics Bioinformatics. 2012, 10: 285-294. 10.1016/j.gpb.2012.08.003.View ArticlePubMedGoogle Scholar
- Bhattacharyya M, Bandyopadhyay S: Studying the Differential Coexpression of MicroRNAs Reveals Significant Role of White Matter in Early Alzheimer's Progression. Molecular BioSystems. 2013, 9: 457-466. 10.1039/c2mb25434d.View ArticlePubMedGoogle Scholar
- Smyth GK: Limma: linear models for microarray data. Bioinformatics and Computational Biology Solutions using R and Bioconductor, R. 2005, Springer New York, 397-420.View ArticleGoogle Scholar
- Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. PNAS. 2001, 98: 5116-5121. 10.1073/pnas.091062498.PubMed CentralView ArticlePubMedGoogle Scholar
- Langfelder P, Horvath S: WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008, 9: 559-10.1186/1471-2105-9-559.PubMed CentralView ArticlePubMedGoogle Scholar
- Cytoscape. [http://www.cytoscape.org/]
- GEO. [http://www.ncbi.nlm.nih.gov/geo/]
- AmiGO. [http://amigo.geneontology.org/amigo]
- GATHER. [http://gather.genome.duke.edu/]
- Altayli E, Gunes S, Yilmaz AF, Goktas S, Bek Y: CYP1A2, CYP2D6, GSTM1, GSTP1, and GSTT1 gene polymorphisms in patients with bladdercancer in a Turkish population. Int Urol Nephrol. 2009, 41 (2): 259-66. 10.1007/s11255-008-9444-6.View ArticlePubMedGoogle Scholar
- Pavanello S, Mastrangelo G, Placidi D, Campagna M, Pulliero A, Carta A, Arici C, Porru S: CYP1A2 polymorphisms, occupational and environmental exposures and risk of bladder cancer. Eur J Epidemiol. 2010, 25 (7): 491-500. 10.1007/s10654-010-9479-8.View ArticlePubMedGoogle Scholar
- Obara W1, Ohsawa R, Kanehira M, Takata R, Tsunoda T, Yoshida K, Takeda K, Katagiri T, Nakamura Y, Fujioka T.: Cancer peptide vaccine therapy developed from oncoantigens identified through genome-wide expression profile analysis for bladder cancer. Jpn J Clin Oncol. 2012, 42 (7): 591-600. 10.1093/jjco/hys069. doi: 10.1093/jjco/hys069. Epub 2012 May 25View ArticlePubMedGoogle Scholar
- Wang LJ, Matoso A, Sciandra KT, Yakirevich E, Sabo E, Zhang Y, Meitner PA, Tavares R, Noble L, Pareek G, DeLellis RA, Resnick MB: Expression of S100A4 in renal epithelial neoplasms. Appl Immunohistochem Mol Morphol. 2012, 20 (1): 71-6. 10.1097/PAI.0b013e31821fc8b7. doi: 10.1097/PAI.0b013e31821fc8b7View ArticlePubMedGoogle Scholar
- Wang S-L, Zhu Y, Jia W, Huang DS: Robust classification method of tumor subtype by using correlation filters. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2012, 9 (2): 580-591.View ArticlePubMedGoogle Scholar
- Zheng C-H, Zhang L, To-Yee Ng V, Shiu SC, Huang DS: Molecular pattern discovery based on penalized matrix decomposition. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2011, 8 (6): 1592-1603.View ArticlePubMedGoogle Scholar
- Zheng C-H, Zhang L, To-Yee Ng V, Shiu SC, Huang DS: Metasample-based sparse representation for tumor classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2011, 8 (5): 1273-1282.View ArticlePubMedGoogle Scholar
- Zheng C-H, Huang DS, Zhang L, Kong X-Z: Tumor clustering using non-negative matrix factorization with gene selection. IEEE Transactions on Information Technology in Biomedicine. 2009, 13 (4): 599-607.View ArticlePubMedGoogle Scholar
- Liu K-H, Huang DS: Cancer classification using rotation forest. Computers in Biology and Medicine. 2008, 38 (5): 601-610. 10.1016/j.compbiomed.2008.02.007.View ArticlePubMedGoogle Scholar
- Shun P, Huang DS: Cooperative competition clustering for gene selection. Journal of Cluster Science. 2006, 17 (4): 637-651. 10.1007/s10876-006-0077-6.View ArticleGoogle Scholar
- Zheng C-H, Huang DS, Shang L: Feature selection in independent component subspace for microarray data classification. Neurocomputing. 2006, 69 (16-18): 2407-2410. 10.1016/j.neucom.2006.02.006.View ArticleGoogle Scholar
- Huang DS, Zheng C-H: Independent component analysis based penalized discriminant method for tumor classification using gene expression data. Bioinformatics. 2006, 22 (15): 1855-1862. 10.1093/bioinformatics/btl190.View ArticlePubMedGoogle Scholar
- Yip AM, Horvath S: The generalized Topological Overlap Matix for Detecting Modules in Gene Networks. Proceedings of the 2006 international conference on bioinformatics & computational biology, BIOCOMP'06, Las Vegas, Nevada,USA, June 26-29. 2006Google Scholar
- Zhang B, Horvath S: A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applications in Genetics and Molecular Biology. 2005, 4 (1): Article 17-View ArticleGoogle Scholar
- Ravasz E, Somera A, Mongru D, Oltvai Z, Barabasi A: Hierarchical organization of modularity in metabolic networks. Science. 2002, 297 (5586): 1551-1555. 10.1126/science.1073374.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.