The literature frequency for various brain cancer subtypes
Based on our comprehensive literature curation, we cleaned all the associations between brain cancer genes and the literature before conducting further analyses. As shown in Fig. 3A, we found 27 genes that were each supported by more than 20 PubMed abstracts. However, 883 of the 1421 genes implicated in brain cancer (62%) were supported by only a single evidentiary mention in the literature; the functions of those genes therefore need further experimental validation. Using cancer subtype keywords, we assigned the 1421 genes to different subtypes. A gene could be associated with multiple cancer subtypes, and each subtype association has its own literature-based evidence (Table S2). As shown in Fig. 3B, the top three keywords were glioma (associated with 582 genes), lymphoma (associated with 450 genes), and medulloblastoma (associated with 245 genes). To explore the genetic heterogeneity of brain cancer, we grouped the curated subtype information. For example, astrocytoma, oligodendroglioma, ependymoma, GBM, LGG, ganglioglioma, and oligoastrocytoma were all grouped as gliomas, and medulloblastoma was grouped with neuroectodermal tumors. We then identified 809 glioma-related genes and 354 neuroectodermal tumor-related genes in those two major subtype groups.
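The grouping step described above can be sketched in Python as a keyword-to-group lookup followed by a set union. This is only an illustrative sketch: the function name `group_genes`, the placeholder gene identifiers, and the inclusion of the generic keywords "glioma" and "neuroectodermal tumor" in the mapping are assumptions and are not taken from the BCGene pipeline.

```python
from collections import defaultdict

# Subtype keywords and their high-level groups, as listed in the text.
# Mapping the generic keywords "glioma" and "neuroectodermal tumor" to
# themselves is an assumption made for this sketch.
SUBTYPE_TO_GROUP = {
    "astrocytoma": "glioma",
    "oligodendroglioma": "glioma",
    "ependymoma": "glioma",
    "GBM": "glioma",
    "LGG": "glioma",
    "ganglioglioma": "glioma",
    "oligoastrocytoma": "glioma",
    "glioma": "glioma",
    "medulloblastoma": "neuroectodermal tumor",
    "neuroectodermal tumor": "neuroectodermal tumor",
}


def group_genes(gene_to_subtypes):
    """Return {group: set of genes} from {gene: iterable of subtype keywords}."""
    groups = defaultdict(set)
    for gene, subtypes in gene_to_subtypes.items():
        for subtype in subtypes:
            group = SUBTYPE_TO_GROUP.get(subtype)
            if group is not None:  # keywords outside the mapping are skipped
                groups[group].add(gene)
    return dict(groups)


# Hypothetical placeholder genes, not curated data.
example = {
    "GENE_A": ["GBM", "astrocytoma"],
    "GENE_B": ["medulloblastoma"],
    "GENE_C": ["LGG", "medulloblastoma"],
}
grouped = group_genes(example)
```

Because each group is stored as a set, a gene linked to several subtypes of the same group (such as GENE_A here) is counted only once in that group.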
After we curated 227 and 25 genes for GBM and LGG, respectively, we summarized all the GBM and LGG CNVs on the gene pages in BCGene. To demonstrate how well our data identifies potential tumor suppressors and oncogenes, we first identified 85 GBM-associated tumor suppressors with more copy number loss (the ratio between copy number loss and copy number gain > 2.0) and 39 GBM-associated oncogenes with more copy number gain (the ratio between copy number gain and copy number loss > 2.0). Then, by cross mapping to the tumor suppressor and oncogene databases (TSGene 2.0 [16] and ONGene [8], respectively) (Fig. 3C), we found that 23 GBM genes with more frequent copy number loss are known tumor suppressor genes, and another 15 GBM genes with more frequent copy number gain are known oncogenes.
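The ratio criterion used above (loss-to-gain or gain-to-loss ratio > 2.0) can be written as a small classifier. The sketch below is a minimal illustration, assuming that per-gene counts of copy number loss and gain events are available; the function name `classify_by_cnv`, the labels, and the handling of genes without any events of the other type are assumptions rather than the authors' implementation.

```python
def classify_by_cnv(loss_count, gain_count, threshold=2.0):
    """Label a gene from its copy number loss and gain counts.

    The comparison loss > threshold * gain is equivalent to
    loss / gain > threshold when gain > 0 and avoids division by zero.
    Treating a gene with losses but no gains (or the reverse) as
    passing the threshold is an assumption of this sketch.
    """
    if loss_count > threshold * gain_count:
        return "candidate tumor suppressor"
    if gain_count > threshold * loss_count:
        return "candidate oncogene"
    return "unclassified"


# Hypothetical counts, not values from BCGene.
labels = [
    classify_by_cnv(30, 10),  # loss/gain = 3.0
    classify_by_cnv(10, 30),  # gain/loss = 3.0
    classify_by_cnv(20, 10),  # ratio is exactly 2.0, which is not > 2.0
]
```

Genes labelled this way would then be cross-mapped to reference lists such as TSGene 2.0 and ONGene, as described in the text.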
Functional enrichment of those genes shared by different subtype groups
To check the genetic heterogeneity of the high-level cancer subtype groups, we overlapped their associated genes to compare the common and unique genetic features of the five subtype groups (glioma, lymphoma, meningioma, neuroectodermal tumor, and pituitary tumor) (Fig. 4A) and found 44 genes belonging to four or more groups. Gene ontology enrichment analysis revealed that those 44 genes are highly associated with 12 functional categories (Fig. 4B). Some of those categories are highly related to cancer, such as negative regulation of programmed cell death (Benjamini and Hochberg false discovery rate (FDR) corrected p-value = 4.35E-05), DNA metabolism regulation (Benjamini and Hochberg FDR corrected p-value = 1.42E-04), and regulation of the mitotic G1/S transition (Benjamini and Hochberg FDR corrected p-value = 3.79E-04). The most interesting finding was the response to hypoxia (Benjamini and Hochberg FDR corrected p-value = 3.31E-04). In general, hypoxia is important in drug resistance and poor survival [17]. Therefore, targeting hypoxia might be a practical way to improve the survival rate of patients with astrocytoma and GBM [18].
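The overlap step at the start of this analysis amounts to counting, for each gene, the number of subtype groups that contain it and keeping the genes found in at least four groups. A minimal Python sketch follows; the function name `genes_shared_by` and the placeholder gene identifiers are hypothetical, and only the five group names come from the text.

```python
from collections import Counter


def genes_shared_by(group_to_genes, min_groups=4):
    """Return the genes that occur in at least `min_groups` groups."""
    counts = Counter(
        gene
        for genes in group_to_genes.values()
        for gene in set(genes)  # count each gene once per group
    )
    return {gene for gene, n in counts.items() if n >= min_groups}


# Hypothetical placeholder genes, not curated data.
groups = {
    "glioma": ["GENE_A", "GENE_B", "GENE_C"],
    "lymphoma": ["GENE_A", "GENE_B"],
    "meningioma": ["GENE_A", "GENE_B"],
    "neuroectodermal tumor": ["GENE_A", "GENE_C"],
    "pituitary tumor": ["GENE_A", "GENE_B"],
}
shared = genes_shared_by(groups)
```

The resulting gene set is what would be passed on to the enrichment analysis.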
Our KEGG pathway [10] analysis based on ToppFun [11] further highlighted a few important cancer-related signaling pathways, such as the PI3K-Akt signaling pathway (corrected p-value = 8.04E-05), pathways in cancer (corrected p-value = 5.32E-10), proteoglycans in cancer (corrected p-value = 3.33E-06), and the advanced glycation end products-receptor for advanced glycation end products pathway (corrected p-value = 1.201E-05). More interestingly, signaling by interleukins (corrected p-value = 3.7E-05) and cytokine signaling in the immune system (corrected p-value = 1.06E-03) highlighted the importance of interleukins in the progression of brain cancer. Previous observations confirmed that many cytokines (mainly interleukins) are involved in brain cancer aggressiveness and the generation of disease-associated pain [19]. In summary, all our functional analyses demonstrated that subtype-specific gene mining using the BCGene database can be used to identify common genes in different brain cancer subtypes and to explore potential common molecular mechanisms.
Identification of top-ranked genes with evidence mentioned only once in the literature
To further explore the relevance of the curated genes to brain cancer, we ranked the 1421 genes using the 27 most reliable brain cancer genes as a training set. These 27 genes were considered reliable because each has 20 or more evidentiary mentions in the literature. The ranking assigns a relative importance to the remaining 1394 (1421 minus 27) genes in our database (Table S3). Because they have functions similar to those of the 27 genes in the training set, the 100 top-ranked genes are likely important in brain cancer development. Within those 100 top-ranked genes, 33 were supported by only a single mention in the literature. Thus, we consider that the roles of those 33 genes in brain cancer development are likely underestimated.
To investigate the potential oncogenic roles of those 33 genes, we used the large-scale cancer genomics datasets in cBioPortal [12]. Altogether, we combined 2997 samples from 14 independent studies, including four datasets related to medulloblastoma, two datasets related to glioma, two GBM studies, two LGG studies, a study of anaplastic oligodendroglioma and anaplastic oligoastrocytoma, a study of a brain tumor patient-derived xenograft, an investigation of pilocytic astrocytoma, and a dataset of pheochromocytoma and paraganglioma. As shown in Figure S1, sample-based mutational patterns revealed 536 samples (18% of the total 2997 samples) that had at least one genetic mutation in one of the 33 genes. After closely scrutinizing their subtype information (Fig. 5A), we found that the 33 genes were highly mutated in the glioma and GBM datasets but had relatively low mutational rates in the four datasets related to medulloblastoma. Interestingly, mutations in those 33 genes were associated with markedly shorter patient survival (Fig. 5B). Among the 2303 patients with survival information, 467 had one or more genetic mutations in the 33 genes. The median survival of those 467 patients was 24.59 months, whereas that of the remaining 1836 patients was 42.20 months, a highly significant difference (log rank test, p = 2.30E-08).
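For readers unfamiliar with the log rank test used for the survival comparison above, the following Python sketch implements the standard two-group version with one degree of freedom. It is an illustration under stated assumptions, not the cBioPortal implementation: the function name `logrank_test` is hypothetical, and the survival times are toy values, not data from the 14 studies.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log rank test; returns (chi-square statistic, p-value).

    times_*  : follow-up times (for example, months)
    events_* : 1 if death was observed at that time, 0 if censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still at risk in each group just before time t.
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        # Deaths observed at time t.
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance of the deaths in group A
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy data: group A dies early, group B dies late (all deaths observed).
chi2, p_value = logrank_test(
    [1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
    [10, 11, 12, 13, 14], [1, 1, 1, 1, 1],
)
```

In practice, an established survival analysis package would normally be preferred over a hand-written routine; the sketch only shows what the test compares, namely the observed and expected numbers of deaths in one group at each event time.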
Among the 536 samples with genetic mutations in one or more of the 33 genes, the top-ranked gene, CDK4, was mutated in 202 samples (8% of the 2997 samples), and the second-ranked gene, MAP3K1, was mutated in 79 samples (2.8%); 8 of those samples also had a CDK4 mutation. Since the mutated genes in that mutational pattern are almost mutually exclusive, they may have complementary roles in the progression of brain cancer [20]. As shown in Fig. 6A, amplified CDK4 in five samples coincided with mRNA up-regulation, and four of the five samples had low methylation, which could have caused the increased mRNA expression (Fig. 6C). However, the correlation patterns of MAP3K1 were strikingly different from those of CDK4 (Fig. 6B, D). Altogether, CDK4 provides a good example of consistent mRNA up-regulation based on both amplification and methylation patterns, and MAP3K1 may be a good candidate for evaluating the progression of some brain cancers, but those possibilities need further study.
In summary, the functional similarity-based gene prioritization identified 33 top-ranked brain cancer-implicated genes with evidence mentioned only once in the literature. By focusing on 2997 samples from 14 independent brain cancer genetic datasets, we found that these 33 genes are highly mutated in hundreds of brain cancer samples and significantly associated with survival time. In addition, we found a mutually exclusive mutational pattern between the two top-ranked genes, CDK4 and MAP3K1, which affected more than 200 brain cancer patients. Therefore, we consider that these two genes might be the most promising genes and might play important roles in the development of brain cancer.