Concordance of copy number loss and down-regulation of tumor suppressor genes: a pan-cancer study
© The Author(s). 2016
- Published: 22 August 2016
Tumor suppressor genes (TSGs) encode the guardian molecules that control cell growth. Genomic alterations of TSGs may cause tumorigenesis and promote cancer progression. So far, investigators have mainly studied the functional effects of somatic single nucleotide variants in TSGs. Copy number variation (CNV) is another important form of genetic variation and is often involved in cancer biology and drug treatment, but studies of CNV in TSGs are underrepresented in the literature. In addition, a combined analysis of gene expression and CNV in this important gene set is lacking. Such a study may provide more insight into the relationship between gene dosage and tumorigenesis. To address this gap, we performed a systematic pan-cancer analysis of CNVs and gene expression changes in TSGs.
We identified 1170 TSGs with copy number gain or loss in 5846 tumor samples. Among them, 207 TSGs tended to have copy number loss (CNL), from which fifteen CNL hotspot regions were identified. Functional enrichment analysis revealed that the 207 TSGs were enriched in cancer-related pathways such as the P53 signaling pathway and the P53 interactome. We further performed integrative analyses of CNV and gene expression using data from matched tumor samples. We found 81 TSGs with concordant CNL events and decreased gene expression in the tumor samples we examined. Remarkably, seven TSGs displayed concordant CNL and gene down-regulation in at least 50 tumor samples: MTAP (212 samples), PTEN (139), MCPH1 (85), FBXO25 (67), SMAD4 (64), TRIM35 (57), and RB1 (54). For MTAP specifically, this concordance was found in 14 cancer types, an observation that has rarely been reported in the literature. Further network-based analysis revealed that these TSGs with concordant CNL and gene down-regulation were highly connected.
This study provides a draft pan-cancer landscape of CNV in TSGs. Our findings of systematic concordance between CNL and down-regulation of gene expression may help to better understand TSG biology in tumorigenesis and cancer progression.
- Tumor suppressor gene
- Copy number variation
- Copy number loss
- Gene expression
Cancer is characterized by unconstrained cell proliferation. In normal cells, cell division is precisely controlled by mechanisms such as cell cycle checkpoints. Within the cell, tumor suppressor genes (TSGs) are important guardian genes that protect a normal cell from taking a step on the path to uncontrolled growth [2, 3]. In cancer cells, TSGs may lose their normal functions because of mutations at critical sites. Single nucleotide variants and small insertions/deletions (indels), including nonsense, splice-site, and frameshift mutations, often lead to truncated transcripts or proteins. Similar effects can be caused by larger-scale mutations, such as copy number variations (CNVs), gene fusions, or structural variants (SVs) [4, 5]. Mutated TSGs often act in concert with oncogenes in cancer progression [4, 6, 7]. Therefore, identifying and understanding TSGs is important for developing diagnostic biomarkers and effective drugs for cancer therapy.
CNVs are DNA fragments that are present in a variable number of copies in the human genome. Their lengths typically range from a kilobase to a megabase. CNVs are divided into two major groups: copy number loss (CNL) and copy number gain (CNG). CNL denotes a decreased number of copies of a gene (or sequence fragment) in the genome, while CNG denotes the gain of additional copies. With the development of high-throughput technologies such as comparative genomic hybridization (CGH) arrays and next-generation sequencing, a very large number of CNVs, as well as other types of mutations and genomic data (e.g., gene expression), have been unveiled, especially in cancer genomes [9, 10]. This allows us to systematically study cancer mutation signatures, heterogeneity, and other molecular features. Deleted or duplicated DNA fragments often have profound effects on gene expression, which subsequently affect gene function.
Although a number of studies have explored CNVs and gene expression in various cancers, there has been no systematic study of these features in TSGs. Moreover, results from a single cancer type may not be representative of other cancer types, or they may vary among subtypes of the same cancer. To overcome these limitations, we conducted a pan-cancer CNV analysis of TSGs to explore the landscape of CNV features and cross-validate some observations. This study may help to elucidate the relationship between CNV and gene expression change in this important gene category in cancer.
The curated TSGs from thousands of publications
To conduct a systematic CNV survey of TSGs, we downloaded all 1207 curated human TSGs from the TSGene database (version 2.0) in plain text format, together with their Entrez Gene IDs and official symbols. In this version of the TSGene database, there were 1088 protein-coding and 198 non-coding TSGs. All of these TSGs were manually curated by us from over 9000 PubMed abstracts. Annotating TSGs with CNVs requires their genomic locations for mapping. Therefore, we downloaded the corresponding mapping information for the TSGs from the RefSeq database. We implemented an in-house script to extract the genomic location information from the complete human genome RefSeq sequences (accession numbers starting with NC). In total, 1207 TSGs were annotated with accurate genomic locations in GRCh38.
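The location-extraction step can be illustrated with a minimal Python sketch. The in-house script itself is not described in detail, so the record layout below (gene ID, accession, start, end), the function name, and the toy accessions and coordinates are all assumptions made for illustration only; the sketch simply keeps the location reported on a chromosome-level accession (prefix `NC_`) for each gene.

```python
def chromosome_level_locations(records):
    """Keep one genomic location per gene, taken from NC_ accessions only.

    `records` is a hypothetical iterable of (gene_id, accession, start, end)
    tuples; this layout is an assumption, not the actual RefSeq file format.
    """
    locations = {}
    for gene_id, accession, start, end in records:
        if not accession.startswith("NC_"):
            continue  # skip scaffolds, patches, and other non-chromosome records
        # Normalise so that start <= end regardless of strand orientation.
        locations.setdefault(gene_id, (accession, min(start, end), max(start, end)))
    return locations


# Toy records with made-up accessions and coordinates.
toy_records = [
    ("geneA", "NC_000000.1", 500, 100),
    ("geneA", "NW_000000.1", 10, 50),
    ("geneB", "NT_000000.1", 1, 20),
]
toy_locations = chromosome_level_locations(toy_records)
```

In this toy input only `geneA` has a chromosome-level record, so `geneB` would be left without a location and would need to be checked by hand.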
The pan-cancer CNV data from The Cancer Genome Atlas (TCGA)
To explore CNVs in pan-cancer systematically, we downloaded all the prepared TCGA CNV data with GRCh38 genomic coordinates from the Catalogue of Somatic Mutations in Cancer (COSMIC) database (V73). When integrating TCGA data, COSMIC introduced thresholds to define copy number loss and gain. CNG was defined by the following criteria: (average genome ploidy ≤ 2.7 AND total DNA segment copy number ≥ 5) OR (average genome ploidy > 2.7 AND total DNA segment copy number ≥ 9). Similarly, the criteria for CNL were: (average genome ploidy ≤ 2.7 AND total DNA segment copy number = 0) OR (average genome ploidy > 2.7 AND total DNA segment copy number < (average genome ploidy − 2.7)). In this study, we followed the COSMIC criteria and overlapped all the CNV regions with TSGs using the GRCh38 coordinates. By intersecting all the copy number gain and loss information with the 1207 TSGs, we annotated 1170 TSGs with precise gain or loss information. For each cancer type, we calculated the number of samples with CNL and with CNG, respectively. Since TSGs typically undergo loss of function during cancer progression, we selected the TSGs with a higher frequency of CNLs than CNGs. Specifically, we applied a cut-off ratio of 2, retaining only TSGs for which the number of tumor samples with CNLs was at least twice the number of tumor samples with CNGs. This process resulted in 207 TSGs with evidence of overall copy number loss. These genes were used for the subsequent gene expression analysis.
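The thresholds and the two-fold filter described above can be written down as a minimal Python sketch. The function names and the toy counts are hypothetical, and requiring at least one CNL sample in the filter is an added assumption (otherwise a gene with no CNV at all would trivially pass); this is not the COSMIC or in-house implementation.

```python
PLOIDY_CUTOFF = 2.7  # average genome ploidy threshold quoted in the text


def classify_segment(avg_ploidy, total_copy_number):
    """Return 'gain', 'loss', or None for one DNA segment in one sample."""
    if avg_ploidy <= PLOIDY_CUTOFF:
        if total_copy_number >= 5:
            return "gain"
        if total_copy_number == 0:
            return "loss"
    else:
        if total_copy_number >= 9:
            return "gain"
        if total_copy_number < avg_ploidy - PLOIDY_CUTOFF:
            return "loss"
    return None


def overlaps(seg_start, seg_end, gene_start, gene_end):
    """True if two closed intervals on the same chromosome share a base."""
    return seg_start <= gene_end and gene_start <= seg_end


def loss_biased_genes(sample_counts, min_ratio=2):
    """Keep genes with at least `min_ratio` times as many CNL as CNG samples.

    `sample_counts` maps gene -> {"loss": n_samples, "gain": n_samples}.
    Requiring loss > 0 is an assumption made for this sketch.
    """
    kept = []
    for gene, counts in sample_counts.items():
        loss = counts.get("loss", 0)
        gain = counts.get("gain", 0)
        if loss > 0 and loss >= min_ratio * gain:
            kept.append(gene)
    return sorted(kept)


# Toy sample counts (made up for illustration).
toy_counts = {
    "geneA": {"loss": 4, "gain": 2},
    "geneB": {"loss": 3, "gain": 2},
    "geneC": {"gain": 5},
}
toy_kept = loss_biased_genes(toy_counts)
```

With these toy counts only `geneA` passes the filter, because 4 is at least twice 2, whereas 3 is less than twice 2 for `geneB`.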
Gene expression analysis for TSGs with CNL
To examine CNV-correlated gene expression changes in TSGs, we downloaded the TCGA pan-cancer gene expression data from the COSMIC database (V73). We focused only on gene expression changes in the matched TCGA samples with TSG CNLs. For gene expression quantification, COSMIC started from FPKM values calculated from trimmed short reads generated by the RNA-Seq platform and from the RSEM quantification results of the RNAseq V2 platform. Here, FPKM denotes Fragments Per Kilobase of transcript per Million mapped reads, a measure of the relative expression of a transcript, and RSEM is a widely used method for accurate transcript quantification from RNA-Seq data. For each gene, the mean and standard deviation of expression were computed from the tumor samples that are diploid for that gene.
A standard Z-score was used to characterize whether a TSG was over- or under-expressed, with an absolute value of 2 as the threshold: a Z-score above 2 was defined as over-expression, while a Z-score below −2 represented decreased gene expression. For the 81 TSGs with CNL-associated gene expression change, we further systematically examined their somatic CNV patterns in pan-cancer TCGA samples using cBioPortal.
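As a rough illustration of the Z-score step, the Python sketch below scores the expression of a gene in CNL samples against the samples that are diploid for that gene. The function names and the toy values are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not say which estimator COSMIC applies.

```python
from statistics import mean, stdev

Z_THRESHOLD = 2.0  # absolute Z-score threshold stated in the text


def z_score(value, diploid_values):
    """Z-score of one expression value against the diploid reference samples."""
    mu = mean(diploid_values)
    sigma = stdev(diploid_values)  # sample standard deviation (assumption)
    if sigma == 0:
        raise ValueError("reference expression has zero variance")
    return (value - mu) / sigma


def expression_call(z, threshold=Z_THRESHOLD):
    """Map a Z-score to 'over', 'under', or 'unchanged'."""
    if z > threshold:
        return "over"
    if z < -threshold:
        return "under"
    return "unchanged"


def concordant_samples(cnl_expression, diploid_values):
    """Return the IDs of CNL samples whose expression is called 'under'."""
    return sorted(
        sample
        for sample, value in cnl_expression.items()
        if expression_call(z_score(value, diploid_values)) == "under"
    )


# Toy expression values for one gene (made up for illustration).
diploid = [10.0, 12.0, 11.0, 9.0, 13.0]
cnl = {"S1": 5.0, "S2": 10.5, "S3": 7.0}
concordant = concordant_samples(cnl, diploid)
```

Counting the samples returned by `concordant_samples` for each gene would give per-gene tallies of concordant CNL and down-regulation of the kind reported for MTAP, PTEN, and the other TSGs.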
Sub-network extraction for the TSGs with high frequency CNLs
To explore the biological mechanisms related to the TSGs with frequently observed CNLs and consistent gene down-regulation, we extracted a PPI network connecting the 81 TSGs with the remaining human genes. To this end, we started from a non-redundant human interactome extracted from the Pathway Commons database [15, 16], containing 3629 proteins and 36,034 PPIs. It is worth noting that this integrated human interactome is based on well-curated pathway databases (HumanCyc, Reactome, and the KEGG pathway database). Therefore, the links in the interactome represent biologically meaningful pathway relationships rather than merely physical interactions. Based on these pathway-based interactions, we used an approach similar to that implemented in our previous studies to extract a sub-network related to the 81 TSGs [16, 18, 19]. In this sub-network extraction strategy, all 81 seed TSGs were first mapped onto the human pathway-based interactome. Then, a sub-network containing the maximum number of seed TSGs was formed by connecting the TSGs through shortest paths. To characterize the function of the network, we relied on its topological properties (degree and shortest path). In practice, we used the NetworkAnalyzer plugin in Cytoscape 2.8 to compute the topological properties of the TSG network. The degree of a node is defined as the number of its direct connections with other nodes in the TSG network [21, 22]. The network layout was generated with Cytoscape 2.8.
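One simple way to read the sub-network extraction strategy is sketched below in plain Python: seeds are mapped onto the interactome, every pair of mapped seeds is joined by one shortest path found with breadth-first search, and node degrees are counted on the resulting edge set. This is an illustrative approximation and not the authors' pipeline (which used Cytoscape 2.8 and NetworkAnalyzer); the function names and the toy network with `linkerA` and `linkerB` nodes are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def shortest_path(adj, source, target):
    """Breadth-first search; return one shortest path as a node list, or None."""
    if source == target:
        return [source]
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in sorted(adj[node]):  # sorted for reproducible output
            if nxt in parent:
                continue
            parent[nxt] = node
            if nxt == target:
                path = [nxt]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


def seed_subnetwork(edges, seeds):
    """Union of one shortest path between every pair of seeds found in the network."""
    adj = build_adjacency(edges)
    mapped = sorted(s for s in set(seeds) if s in adj)
    sub_edges = set()
    for a, b in combinations(mapped, 2):
        path = shortest_path(adj, a, b)
        if path:
            sub_edges.update(frozenset(pair) for pair in zip(path, path[1:]))
    return sub_edges


def node_degrees(sub_edges):
    """Degree = number of direct connections of each node in the sub-network."""
    deg = {}
    for edge in sub_edges:
        for node in edge:
            deg[node] = deg.get(node, 0) + 1
    return deg


# Toy interactome; the linker and "other" nodes are placeholders, not real genes.
toy_edges = [
    ("PTEN", "linkerA"), ("linkerA", "RB1"), ("linkerA", "linkerB"),
    ("linkerB", "SMAD4"), ("other1", "other2"),
]
toy_sub = seed_subnetwork(toy_edges, ["PTEN", "RB1", "SMAD4", "MTAP"])
toy_deg = node_degrees(toy_sub)
```

In the toy example `MTAP` is absent from the interactome and is therefore dropped, while `linkerA` ends up with the highest degree because all three seed-to-seed paths pass through it. Joining every pair of seeds is quadratic in the number of seeds, which is manageable for 81 seeds but is a design choice of this sketch rather than something stated in the text.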
Genomic regions with frequent copy number loss in TSGs in multiple cancer types
The 15 genomic regions associated with 207 tumor suppressor genes (TSGs) with frequent copy number losses (CNLs)
CTDSPL, CYB561D2, LIMD1, MST1R, NPRL2, PTPN23, RASSF1, RBM5, RBM6, RHOA, SEMA3B, SEMA3F, TUSC2, ZMYND10
CCAR2, DLC1, LZTS1, MIR383, MTUS1, SOX7, ZDHHC2
CDKN1C, H19, MIR210, MIR483, NUP98, RNH1, SIRT3, TRIM3, TSPAN32, TSSC4
ALOX15B, BCL6B, GABARAP, MIR195, MIR497, TNK1, TP53, XAF1, ZBTB4
AMH, DAPK3, DIRAS1, FZR1, GADD45B, PLK5, SIRT6, STK11, TCF3, TNFSF9
DOK2, MIR320A, PIWIL2, PPP3CC, RHOBTB2
CDCP1, LTF, MIR1226, SETD2, SMARCC1, TDGF1
ACY1, CACNA2D3, MIR135A1, MIRLET7G
MAP3K4, IGF2R, PACRG
BNIP3L, EXTL3, TNFRSF10A
ALOX15, MNT, MYBBP1A, PAFAH1B1, VPS53
FBLN1, MIRLET7B, MIRLET7A3, PPARA
CDKN2A, CDKN2B, MTAP
GNAT1, MST1, PBRM1
On chromosome 8, the 8p22 locus contained 7 neighbouring TSGs (CCAR2, DLC1, LZTS1, MIR383, MTUS1, SOX7, and ZDHHC2), while another 8 TSGs (BNIP3L, DOK2, EXTL3, MIR320A, PIWIL2, PPP3CC, RHOBTB2, and TNFRSF10A) clustered at 8p21. These 15 TSGs at 8p21-22 had CNLs detected in 219 TCGA patients. The cancer tissues with the most frequent CNLs in TSGs at this locus were breast (61 samples), lung (42 samples), large intestine (30 samples), ovary (23 samples), and prostate (11 samples). Another CNL hot region is at 17p13.1-3, which covers 14 TSGs, including the most studied TSG, TP53. This region on chromosome 17 had detectable CNLs in a total of 50 TCGA tumor samples. Interestingly, the above three genomic regions with frequent CNLs in TSGs harbour not only well-known protein-coding TSGs such as TP53, but also six microRNAs (MIRLET7G, MIR135A1, MIR195, MIR320A, MIR383, and MIR497). By cross-referencing the TSGene database, we confirmed that all six microRNAs are tumor suppressor microRNAs. Collectively, our systematic examination of CNL in TSG cluster regions provides precise information on such CNLs in multiple cancers. The results may be useful for further study of the similar or different roles of CNL in different cancer types, as well as of cancer heterogeneity.
Correlation of CNL with gene expression decrease using the matched tumor samples
A connected biological map of TSGs with concordant CNL and decreased gene expression
This study revealed some important somatic mutational features of TSGs in multiple cancer types, particularly with respect to CNVs and their effects on gene expression. Since loss of function is the typical mechanism by which TSGs are involved in cancer initiation and progression, a large-scale change in gene copy number may induce an alteration in gene expression. In this scenario, a critical regulatory change is that CNL in a TSG leads to the over-expression of its guardian genes. Although previous studies have explored the balance of germline CNVs and gene expression, direct evidence linking somatic CNVs to gene expression dosage compensation is still lacking. In this study, we focused only on concordant patterns between CNL and gene down-regulation because TSGs typically act through loss of function. Our results provide insight only into the correlation between gene dosage and somatic CNV; a more systematic examination of expression quantitative trait loci may shed more light on the relationship between CNV and gene expression.
This study was mainly based on TCGA genomic data. The cohort size of some cancer types is relatively small (e.g., ~100 samples), and with a small sample size many low-frequency CNVs may be missed. In addition, TCGA mainly relies on CGH arrays comparing normal and tumor tissues to characterize CNVs, which may miss signals outside the pre-designed probes. These undetected CNVs may also contribute to the role of TSGs in cancer progression. Another limitation of this study is that we incorporated only protein-coding gene expression and not non-coding gene expression. Further integration of large-scale CNV data with the expression of non-coding RNAs (microRNAs and long non-coding RNAs) may provide new insight into the roles of non-coding TSGs.
In this study, we made an effort to construct a biological map of the genes with consistent CNL and gene down-regulation in cancer. Although the majority of genes in the reconstructed map are linked with each other, the network is relatively small. It therefore has limited power for exploring overall network functions based on topological features. For example, we found that the node degrees of the network might follow a power-law distribution. This feature differs from the whole human PPI network, in which the majority of nodes (genes) are sparsely connected, with an exponent b of 2.9. Because of its small size, the constructed network is not sufficient to support a claim of scale-free properties. For the same reason, it is not appropriate to define hub nodes based on high connectivity. Nevertheless, the nodes with multiple connections in our network should provide some clues to the common CNL events related to gene down-regulation. Further experimental validation may provide more insight into the potential molecular mechanisms of the CNL events that were detected in multiple cancers.
In conclusion, our systematic exploration of copy number variations in human TSGs revealed that copy number losses of TSGs cluster in a few genomic regions. The TSGs with frequent copy number loss often have profound roles in cancer-related pathways. The loss of copy number in a number of TSGs may contribute to the gene expression changes involved in tumorigenesis.
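To make the power-law remark concrete, the sketch below estimates an exponent b for a degree distribution of the form P(k) ∝ k^(−b) by an ordinary least-squares fit on log-log axes. The function names and toy degrees are hypothetical, and the log-log fit is a deliberately crude estimator chosen for brevity; it is not the method used by NetworkAnalyzer or by the authors, and on a network of a few dozen nodes its estimate would be unreliable, which is consistent with the caution expressed above.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Return {k: fraction of nodes with degree k} for positive degrees."""
    positive = [d for d in degrees if d > 0]
    counts = Counter(positive)
    total = len(positive)
    return {k: counts[k] / total for k in sorted(counts)}


def loglog_exponent(degrees):
    """Estimate b in P(k) ~ k^(-b) by least squares on (log k, log P(k))."""
    dist = degree_distribution(degrees)
    if len(dist) < 2:
        raise ValueError("need at least two distinct degrees to fit a slope")
    xs = [math.log(k) for k in dist]
    ys = [math.log(p) for p in dist.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx  # the slope is -b


# Toy degrees built so that node counts fall off exactly as k^-2.
toy_degrees = [1] * 16 + [2] * 4 + [4] * 1
toy_b = loglog_exponent(toy_degrees)
```

The toy degrees are constructed so that the fitted exponent is 2; real degree lists from a small sub-network would give a noisy estimate that should not be over-interpreted.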
CNG, copy number gain; CNL, copy number loss; CNV, copy number variation; TCGA, The Cancer Genome Atlas; TSG, tumor suppressor gene
We thank the investigators of The Cancer Genome Atlas (TCGA) whose effort of data generation and analyses made this work possible.
Publication charges for this article were paid from the faculty retention funds to Z.Z. from Vanderbilt University.
This article has been published as part of BMC Genomics Volume 17 Supplement 7, 2016: Selected articles from the International Conference on Intelligent Biology and Medicine (ICIBM) 2015: genomics. The full contents of the supplement are available online at http://bmcgenomics.biomedcentral.com/articles/supplements/volume-17-supplement-7.
This work was partially supported by National Institutes of Health (NIH) grant (R01LM011177) and Ingram Professorship Funds (to Z.Z.). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Availability of data and materials
MZ and ZZ conceived the project. MZ collected the data and carried out all the analyses. MZ wrote the manuscript draft and MZ and ZZ finalized the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Malumbres M, Barbacid M. Cell cycle, CDKs and cancer: a changing paradigm. Nat Rev Cancer. 2009;9(3):153–66.
- Zhao M, Kim P, Mitra R, Zhao J, Zhao Z. TSGene 2.0: an updated literature-based knowledgebase for tumor suppressor genes. Nucleic Acids Res. 2016;44(D1):D1023–1031.
- Sherr CJ. Principles of tumor suppression. Cell. 2004;116(2):235–46.
- Haber DA, Settleman J. Cancer: drivers and passengers. Nature. 2007;446(7132):145–6.
- Pellman D. Cell biology: aneuploidy and cancer. Nature. 2007;446(7131):38–9.
- Zhao M, Sun J, Zhao Z. Distinct and competitive regulatory patterns of tumor suppressor genes and oncogenes in ovarian cancer. PLoS One. 2012;7(8):e44175.
- Zhao M, Sun J, Zhao Z. Synergetic regulatory networks mediated by oncogene-driven microRNAs and transcription factors in serous ovarian cancer. Mol Biosyst. 2013;9(12):3187–98.
- Alkan C, Coe BP, Eichler EE. Genome structural variation discovery and genotyping. Nat Rev Genet. 2011;12(5):363–76.
- Zhao M, Wang Q, Wang Q, Jia P, Zhao Z. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives. BMC Bioinformatics. 2013;14 Suppl 11:S1.
- Zhao M, Zhao Z. CNVannotator: a comprehensive annotation server for copy number variation in the human genome. PLoS One. 2013;8(11):e80170.
- Jia P, Pao W, Zhao Z. Patterns and processes of somatic mutations in nine major cancers. BMC Med Genomics. 2014;7:11.
- Henrichsen CN, Chaignat E, Reymond A. Copy number variants, diseases and gene expression. Hum Mol Genet. 2009;18(R1):R1–8.
- Lu TP, Lai LC, Tsai MH, Chen PC, Hsu CP, Lee JM, Hsiao CK, Chuang EY. Integrated analyses of copy number variations and gene expression in lung adenocarcinoma. PLoS One. 2011;6(9):e24829.
- Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, Sun Y, Jacobsen A, Sinha R, Larsson E, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal. 2013;6(269):pl1.
- Cerami EG, Gross BE, Demir E, Rodchenkov I, Babur O, Anwar N, Schultz N, Bader GD, Sander C. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res. 2011;39(Database issue):D685–690.
- Zhao M, Austin ED, Hemnes AR, Loyd JE, Zhao Z. An evidence-based knowledgebase of pulmonary arterial hypertension to identify genes and pathways relevant to pathogenesis. Mol Biosyst. 2014;10(4):732–40.
- Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T, et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res. 2008;36(Database issue):D480–484.
- Zhao M, Li X, Qu H. EDdb: a web resource for eating disorder and its application to identify an extended adipocytokine signaling pathway related to eating disorder. Sci China Life Sci. 2013;56(12):1086–96.
- Zhao M, Kong L, Qu H. A systems biology approach to identify intelligence quotient score-related genomic regions, and pathways relevant to potential therapeutic treatments. Sci Rep. 2014;4:4176.
- Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics. 2011;27(3):431–2.
- Barabasi AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004;5(2):101–13.
- Zhao M, Qu H. High similarity of phylogenetic profiles of rate-limiting enzymes with inhibitory relation in Human, Mouse, Rat, budding Yeast and E. coli. BMC Genomics. 2011;12 Suppl 3:S10.
- Zhao M, Sun J, Zhao Z. TSGene: a web resource for tumor suppressor genes. Nucleic Acids Res. 2013;41(Database issue):D970–976.
- Rowley H, Jones A, Spandidos D, Field J. Definition of a tumor suppressor gene locus on the short arm of chromosome 3 in squamous cell carcinoma of the head and neck by means of microsatellite markers. Arch Otolaryngol Head Neck Surg. 1996;122(5):497–501.
- Killary AM, Wolf ME, Giambernardi TA, Naylor SL. Definition of a tumor suppressor locus within human chromosome 3p21-p22. Proc Natl Acad Sci U S A. 1992;89(22):10877–81.
- Jin Y, Turaev D, Weinmaier T, Rattei T, Makse HA. The evolutionary dynamics of protein-protein interaction networks inferred from the reconstruction of ancient networks. PLoS One. 2013;8(3):e58134.