TSGs have the highest frequency of mutations
In this study, we selected 50 TSGs, 50 OCGs, 145 essential genes, 171 target genes, and 12,315 other genes to investigate mutation patterns. To compare the mutation frequencies of tumor samples among the five gene sets, we performed Kolmogorov-Smirnov (K-S) tests [31].
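For readers less familiar with the two-sample K-S test, the following sketch shows what it computes. The statistic D is the largest vertical gap between the empirical cumulative distributions of two samples (for example, the per-gene mutation frequencies of two gene sets). It is a minimal, stdlib-only illustration. The function name `ks_two_sample` is hypothetical, and the P-value uses the asymptotic Kolmogorov distribution, which is an approximation for small samples and not necessarily the procedure used in the study.

```python
import math
from bisect import bisect_right


def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov test with an asymptotic P-value.

    Returns (D, p): D is the largest absolute gap between the empirical
    CDFs of x and y; p comes from the limiting Kolmogorov distribution,
    which is only approximate for small samples.
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # The supremum of |F_x - F_y| is reached at one of the observed values.
    d = max(
        abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
        for v in xs + ys
    )
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        return d, 1.0  # Q(lam) is indistinguishable from 1 for small lam
    # Q(lam) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lam^2)
    p = 2 * sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
                for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)
```

In practice, `scipy.stats.ks_2samp` would normally be used instead. By default it switches to an exact P-value for small samples, so its P-values can differ from this asymptotic sketch.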
Figure 1A compares the overall mutation percentage across all samples for each gene set, and Figure 1B lists the average values and K-S test P-values for the five gene sets. The TSGs had the highest average mutation frequency (4.34%), significantly higher than that of the OCGs (2.36%, P = 0.002), target genes (1.32%, P = 1.04 × 10⁻¹⁰), essential genes (0.59%, P = 2.08 × 10⁻²⁰), and other genes (0.98%, P = 7.29 × 10⁻¹⁷). The OCGs had the second highest average mutation frequency (2.36%), significantly higher than that of the target genes (P = 0.007), essential genes (P = 8.46 × 10⁻¹³), and other genes (P = 4.53 × 10⁻¹⁷). The target genes had the third highest average mutation frequency (1.32%), significantly higher than that of the essential genes (P = 2.79 × 10⁻¹³) and other genes (P = 4.15 × 10⁻⁶). Interestingly, the essential genes had the lowest mutation frequency of the five gene sets.
We further examined the mutation frequency of the five gene sets in each of the 12 cancer types (Figure 1C and Figure 1D). In all cancer types except GBM, LAML, LUSC, and OV, the mutation frequency of the TSGs was significantly higher than that of the other gene sets (P < 0.05). For the TSGs, LAML had the lowest average mutation frequency (1.33%) and UCEC the highest (8.23%). The mutation frequency of the OCGs was significantly higher than that of both the essential genes and the other genes (P < 0.05), but only in BRCA and LAML was it significantly higher than that of the target genes. For the OCGs, OV had the lowest average mutation frequency (0.40%) and UCEC the highest (6.20%).
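Here the mutation frequency of a gene is the percentage of samples carrying a mutation in it. The averages above can be reproduced by computing that percentage per gene within each cancer type and then averaging over the genes in a set. The sketch below assumes a simple input layout: a mapping from sample ID to cancer type, a list of (sample, gene) mutation calls, and named gene lists. The function name and the data layout are hypothetical illustrations, not the study's pipeline.

```python
from collections import defaultdict


def mean_mutation_frequency(sample_types, mutations, gene_sets):
    """Average per-gene mutation frequency (%) of each gene set in each cancer type.

    sample_types: dict mapping sample ID -> cancer type label (e.g. "BRCA")
    mutations:    iterable of (sample ID, gene) pairs, one per mutation call
    gene_sets:    dict mapping set name (e.g. "TSG") -> list of genes
    """
    samples_by_type = defaultdict(set)
    for sample, ctype in sample_types.items():
        samples_by_type[ctype].add(sample)

    # Samples carrying at least one mutation in each gene (duplicates collapse).
    mutated = defaultdict(set)
    for sample, gene in mutations:
        mutated[gene].add(sample)

    result = {}
    for ctype, samples in samples_by_type.items():
        for name, genes in gene_sets.items():
            # Mutation frequency of a gene = % of this type's samples mutated in it.
            freqs = [100.0 * len(mutated[g] & samples) / len(samples) for g in genes]
            result[(ctype, name)] = sum(freqs) / len(freqs)
    return result
```

Mapping every sample to a single label (for example "PANCAN") gives the pooled, all-sample averages instead of per-type values.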
In summary, these results indicate that the TSGs had the highest mutation frequency in most tumor types, followed by the OCGs, and that the essential genes had the lowest mutation frequency in all tumor types.
Network properties
To explore network properties, we mapped the five gene sets onto the human protein-protein interaction (PPI) network and obtained 48 TSG proteins, 49 OCG proteins, 161 target proteins, 141 essential proteins, and 12,315 other proteins. We then calculated four properties for each node in the network: degree, betweenness, clustering coefficient, and shortest-path distance. To compare these network properties among the five protein sets, we again performed K-S tests.
TSGs and OCGs tended to have higher degree and betweenness
Figure 2A shows the degree distributions of the five protein sets, and Figure 2B lists their average degrees and K-S test P-values. The average degree of the TSG proteins was 87.48, significantly higher than that of the target proteins (48.34, P = 5.60 × 10⁻⁵), essential proteins (41.81, P = 3.58 × 10⁻⁶), and other proteins (14.47, P = 5.92 × 10⁻²²). Similarly, the average degree of the OCG proteins was 79.31, also significantly higher than that of the target proteins (P = 9.81 × 10⁻⁵), essential proteins (P = 9.05 × 10⁻⁵), and other proteins (P = 2.87 × 10⁻¹⁹). However, we did not observe a significant difference between the TSG and OCG proteins (P = 0.417). The average degrees of the TSG and OCG proteins were roughly 1.6 to 2.1 times those of the target and essential proteins and 5.5 to 6.0 times that of the other proteins. The latter ratio is higher than the ratio (3.1 times) reported for cancer proteins in a previous study [27].
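The fold differences quoted above follow directly from the reported average degrees. This short, hypothetical helper recomputes them so the ranges can be checked. It uses only the five averages given in the text.

```python
# Average degrees as reported in the text above (Figure 2B).
AVG_DEGREE = {"TSG": 87.48, "OCG": 79.31, "target": 48.34,
              "essential": 41.81, "other": 14.47}


def fold_changes(avg, groups=("TSG", "OCG"), refs=("target", "essential", "other")):
    """Ratio of each group's average degree to each reference set's average degree."""
    return {(g, r): avg[g] / avg[r] for g in groups for r in refs}


for (g, r), ratio in fold_changes(AVG_DEGREE).items():
    print(f"{g} / {r}: {ratio:.2f}")
```

This prints ratios of 1.81, 2.09, and 6.05 for the TSG proteins and 1.64, 1.90, and 5.48 for the OCG proteins, relative to the target, essential, and other proteins, respectively.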
Figure 2C shows the betweenness distributions, and Figure 2D lists the average values and K-S test P-values for the five protein sets. The betweenness results were consistent with those for the degree. Together, these observations indicate that, of the five protein sets, the TSG and OCG proteins had the highest degree and betweenness in the human PPI network.
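The betweenness of a node counts how often it lies on shortest paths between other pairs of nodes. For an unweighted PPI network, it is usually computed with Brandes' algorithm, sketched below in plain Python. The adjacency format (a dict of neighbour sets) and the choice of unnormalized values are assumptions for illustration. The section does not state which implementation or normalization the study used. `networkx.betweenness_centrality`, for instance, normalizes by default.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm: unnormalized betweenness on an unweighted, undirected graph.

    adj: dict mapping each node to the set of its neighbours (symmetric).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies in order of decreasing distance.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}
```

On a three-node path a-b-c, only b lies between another pair, so it gets betweenness 1 and the endpoints get 0.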
TSGs and OCGs tended to have a lower clustering coefficient
For each node, the clustering coefficient reflects how densely its interactors are connected to one another: the higher the clustering coefficient, the more interconnected its neighbors. Figure 3 shows the distribution of clustering coefficients, the average value of each protein set, and the K-S test P-values among the five protein sets. The average clustering coefficient of the TSG proteins was 0.095, significantly lower than that of the essential proteins (0.131, P = 1.32 × 10⁻⁵) and the other proteins (0.155, P = 0.020). Similarly, the average clustering coefficient of the OCG proteins was 0.118, significantly lower than that of the essential proteins (P = 0.001), whereas its difference from the other proteins did not reach significance (P = 0.087). We also found that the clustering coefficient of the essential proteins was significantly lower than that of the other proteins (P = 0.004). To examine the distribution of clustering coefficients in detail, we divided the values into bins of width 0.1 and calculated the proportion of proteins in each bin. The proportion of TSG proteins in bin (0, 0.1] (68.8%) was higher than that of OCG proteins (55.1%). In contrast, in bin (0.1, 0.2], the proportion of TSG proteins (18.8%) was lower than that of OCG proteins (32.7%).
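The sketch below shows the local clustering coefficient, which is the fraction of pairs of a node's neighbours that are themselves linked, followed by the 0.1-wide binning described above. Several details are assumptions, since the text does not specify them. Nodes with fewer than two neighbours are assigned 0, and a value of exactly 0 is placed in the first bin. The function names are hypothetical.

```python
import math
from collections import Counter


def clustering_coefficient(adj, v):
    """Local clustering coefficient of v in an undirected graph.

    adj maps each node to the set of its neighbours. Nodes with fewer than
    two neighbours get 0 by convention.
    """
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Count edges among the neighbours of v.
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def bin_proportions(values, width=0.1):
    """Share of values in each bin (i*width, (i+1)*width], keyed by bin index i.

    A value of exactly 0 is placed in bin 0 (an assumption).
    """
    # round() absorbs float error such as 0.3 / 0.1 == 2.9999999999999996.
    idx = [max(math.ceil(round(x / width, 9)) - 1, 0) for x in values]
    counts = Counter(idx)
    return {i: counts[i] / len(values) for i in sorted(counts)}
```

In a triangle a-b-c with an extra node d attached to a, node a has three neighbours but only one linked pair (b-c), so its coefficient is 1/3, while b and c score 1.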
TSGs and OCGs tended to have shorter shortest-path distances
For each node, we calculated the shortest-path distance (SPD) from the node to every other node in the human PPI network and used the mean of these distances as the node's shortest-path distance. Figure 4 shows the distribution of SPD values, the average value of each protein set, and the K-S test P-values among the five protein sets. The average SPD of the TSG proteins was 2.93, significantly shorter than that of the target proteins (3.18, P = 1.0 × 10⁻⁴) and the other proteins (3.47, P = 5.03 × 10⁻¹⁸). Interestingly, the average SPD of the TSG proteins (2.93) was slightly but significantly shorter than that of the OCG proteins (2.98, P = 0.040). The average SPD of the target proteins (3.18) was significantly longer than that of the essential proteins (3.00, P = 5.80 × 10⁻⁷) but significantly shorter than that of the other proteins (3.47, P = 6.29 × 10⁻¹⁷). Although the proportion of proteins at each distance differed considerably among the protein sets, the sets shared some features; for example, most proteins in every set had a shortest-path distance of 3.
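Because PPI edges are unweighted, a node's shortest-path distances to all other nodes can be obtained with a single breadth-first search, and the node's SPD is then their mean. The sketch below assumes a dict-of-neighbour-sets graph. It also assumes that unreachable nodes are simply left out of the mean, because the text does not say how disconnected nodes were handled. The function name is hypothetical.

```python
from collections import deque


def average_spd(adj, source):
    """Mean hop distance from source to every other node reachable from it.

    adj maps each node to the set of its neighbours. Unreachable nodes are
    ignored (an assumption); an isolated node returns infinity.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others) if others else float("inf")
```

On a four-node path a-b-c-d, the end node a has distances 1, 2, and 3 (mean 2.0), while the inner node b has distances 1, 1, and 2 (mean 4/3). This matches the intuition that central nodes have shorter average distances.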
From targets to TSGs or OCGs in the human PPI network
Most drugs exert their therapeutic actions through interactions with specific protein targets, and TSGs and OCGs play important roles in cancer development. We therefore compared the shortest-path distances from targets to TSG or OCG proteins with those from targets to essential proteins and other proteins. Figure 5A shows the fraction of each protein set in the drug-target neighborhood at shortest-path distances from 0 to 8. Among the 161 drug target proteins, 13 also belong to the OCGs and 8 to the essential proteins. The remaining OCG proteins (73%) and all TSG proteins (100%) were concentrated at shortest-path distances of 1 and 2 from the target proteins, which is consistent with previous results on the distances from drug targets to cancer genes [31]. Additionally, most TSG proteins (75%), OCG proteins (61%), and target proteins (75%) interacted directly with target proteins, whereas only 22% of the other proteins did (Figure 5B).
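One way to build a distance profile like the one in Figure 5A is a multi-source breadth-first search that starts from all targets at once. It gives every protein its distance to the nearest target, and the fraction of a protein set at each distance from 0 to 8 follows from that. This sketch rests on some assumptions. The section does not say whether "distance from targets" means the nearest target. Unreachable proteins are counted only in the denominator here. A separate direct-interaction check excludes the protein itself, so a target counts only if it touches another target. All function names are hypothetical.

```python
from collections import Counter, deque


def distance_to_nearest_target(adj, targets):
    """Multi-source BFS: hop distance from every reachable node to its closest target."""
    dist = {t: 0 for t in targets if t in adj}
    queue = deque(dist)
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def distance_profile(dist, proteins, max_d=8):
    """Fraction of a protein set at each distance 0..max_d (unreachable ones count in the denominator only)."""
    counts = Counter(dist[p] for p in proteins if p in dist)
    return {d: counts[d] / len(proteins) for d in range(max_d + 1)}


def interacts_with_target(adj, protein, targets):
    """True if the protein has at least one direct interactor that is a target (other than itself)."""
    return any(n in targets and n != protein for n in adj[protein])
```

The share of a set that interacts directly with a target would then be the mean of `interacts_with_target` over the proteins in that set.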
In summary, compared with the target proteins, essential proteins, and other proteins, both TSG and OCG proteins tended to have higher degree, higher betweenness, lower clustering coefficients, and shorter shortest-path distances. Moreover, the TSG and OCG proteins did not differ significantly in most network topological properties. Both TSG and OCG proteins were also more likely than other proteins to interact directly with target proteins.
TSGs and OCGs are highly connected
To further understand the relationship between TSG and OCG proteins in their local network organization and environment, we hypothesized that exploring a TSG-OCG network would provide novel insights. We therefore constructed a TSG-OCG network from the human PPI network, the 50 TSG proteins, and the 50 OCG proteins.
The TSG-OCG network consisted of 106 nodes and 303 edges (Figure 6). Of the 106 nodes, 48 were TSG proteins (96% of all TSG proteins), 49 were OCG proteins (98% of all OCG proteins), and 9 were linkers, indicating that the network consisted mainly of TSG and OCG proteins. Of the 303 edges, 89 linked 42 TSG proteins to one another, 51 linked 36 OCG proteins to one another, 117 linked TSGs to OCGs (involving 38 TSGs and 33 OCGs), and 46 linked the 9 linkers to 15 TSGs or 26 OCGs. Thus, 257 edges (84.8%) lay among TSG and OCG proteins, suggesting that TSG and OCG proteins are highly connected to each other in the PPI network. Moreover, the proportion of TSG-OCG edges (38.7%) was higher than that of TSG-TSG edges (29.5%) and OCG-OCG edges (16.9%). Most TSGs (38, 79%) had at least one edge to an OCG; similarly, most OCGs (33, 67%) had at least one edge to a TSG.
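The edge breakdown above amounts to labelling each edge by the roles of its two endpoints. The sketch below does that and also computes the share of one group with at least one edge to the other. It assumes that the TSG and OCG sets do not overlap and that any node in neither set is a linker. The function names and the toy edge list are hypothetical.

```python
from collections import Counter


def edge_composition(edges, tsg, ocg):
    """Classify undirected edges as TSG-TSG, OCG-OCG, TSG-OCG or linker edges.

    Any node in neither set is treated as a linker. Returns (counts, shares).
    """
    def role(node):
        return "TSG" if node in tsg else "OCG" if node in ocg else "linker"

    counts = Counter()
    for u, v in edges:
        roles = {role(u), role(v)}
        if "linker" in roles:
            counts["linker"] += 1
        elif roles == {"TSG"}:
            counts["TSG-TSG"] += 1
        elif roles == {"OCG"}:
            counts["OCG-OCG"] += 1
        else:
            counts["TSG-OCG"] += 1
    total = sum(counts.values())
    return counts, {k: c / total for k, c in counts.items()}


def fraction_linked_to(edges, group, partner):
    """Fraction of `group` nodes with at least one edge to a node in `partner`."""
    linked = {u for u, v in edges if v in partner} | {v for u, v in edges if u in partner}
    return len(linked & set(group)) / len(group)
```

Applied to the counts reported above, the edges among TSG and OCG proteins total 89 + 51 + 117 = 257 of 303, or 257/303 ≈ 84.8%.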
To further explore the joint contribution of mutations in TSGs and OCGs, we overlaid the Pan-Cancer mutation frequency of each gene on the TSG-OCG network (Figure 6), where larger nodes indicate a higher percentage of Pan-Cancer samples with mutations. The mutation frequencies of the 106 genes encoding the network nodes ranged from 0.33% to 46.15%, with an average of 3.14%. We then examined the relationship between mutation frequency and protein degree using Pearson's correlation and found a significant positive correlation (r = 0.30, P = 0.002). This observation suggests that directly interacting genes with high mutation frequencies might jointly contribute to cancer development. For example, TP53 had the highest mutation frequency across all samples and had 26 interactors, of which 21 were TSGs and four were OCGs. Among these 21 TSGs, PTEN also had a high mutation frequency (11.27%), suggesting that the two genes might contribute to cancer development together. Indeed, several studies have demonstrated that PTEN and TP53 jointly participate in the carcinogenesis of many malignancies [32]. Another example is ARID1A, which interacts with TP53 and also had a high mutation frequency (11.27%). A previous study showed that alterations in ARID1A were associated with mismatch repair deficiency and normal p53 expression [33].
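As a reminder of what the correlation test involves, the sketch below computes Pearson's r and the usual t statistic, t = r·√((n − 2)/(1 − r²)). Under the null hypothesis of no correlation, this statistic follows Student's t distribution with n − 2 degrees of freedom. The function names are hypothetical. Converting t to a P-value needs the t distribution's CDF, which `scipy.stats.pearsonr` handles directly and which this stdlib sketch omits.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)); compare against Student's t with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

With r = 0.30 and n = 106 nodes, t ≈ 3.21 on 104 degrees of freedom. This corresponds to a two-sided P between 0.001 and 0.002, which is consistent with the reported P = 0.002.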