A machine learning classifier trained on cancer transcriptomes detects NF1 inactivation signal in glioblastoma

Background We have identified molecules that exhibit synthetic lethality in cells with loss of the neurofibromin 1 (NF1) tumor suppressor gene. However, recognizing tumors that have inactivation of the NF1 tumor suppressor function is challenging because the loss may occur via mechanisms that do not involve mutation of the genomic locus. Degradation of the NF1 protein, independent of NF1 mutation status, phenocopies inactivating mutations to drive tumors in human glioma cell lines. NF1 inactivation may alter the transcriptional landscape of a tumor and allow a machine learning classifier to detect which tumors will benefit from synthetic lethal molecules. Results We developed a strategy to predict tumors with low NF1 activity and hence tumors that may respond to treatments that target cells lacking NF1. Using RNAseq data from The Cancer Genome Atlas (TCGA), we trained an ensemble of 500 logistic regression classifiers that integrates mutation status with whole transcriptomes to predict NF1 inactivation in glioblastoma (GBM). On TCGA data, the classifier detected NF1 mutated tumors (test set area under the receiver operating characteristic curve (AUROC) mean = 0.77, 95% quantile = 0.53 – 0.95) over 50 random initializations. On RNA-Seq data transformed into the space of gene expression microarrays, this method produced a classifier with similar performance (test set AUROC mean = 0.77, 95% quantile = 0.53 – 0.96). We applied our ensemble classifier trained on the transformed TCGA data to a microarray validation set of 12 samples with matched RNA and NF1 protein-level measurements. The classifier’s NF1 score was associated with NF1 protein concentration in these samples. Conclusions We demonstrate that TCGA can be used to train accurate predictors of NF1 inactivation in GBM. The ensemble classifier performed well for samples with very high or very low NF1 protein concentrations but had mixed performance in samples with intermediate NF1 concentrations. Nevertheless, high-performing and validated predictors have the potential to be paired with targeted therapies and personalized medicine. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3519-7) contains supplementary material, which is available to authorized users.


Background
Genomic tools allow investigators to devise therapies targeting specific molecular abnormalities in tumors. One such alteration is the loss of neurofibromin 1 (NF1), an important tumor suppressor that regulates the activity of RAS GTPases [1,2]. Heterozygous mutation or deletion of NF1 causes neurofibromatosis type 1 (NF), one of the most frequently inherited genetic disorders [3]. NF patients often develop plexiform neurofibromas (PNs), benign nerve tumors for which the only therapy is surgery. However, resection is often impossible due to the tumor's intimate association with peripheral and cranial nerves [4]. PNs can transform to malignant peripheral nerve sheath tumors (MPNSTs), which are chemo-and radiation-resistant sarcomas with a dismal 20% 5-year survival [5]. In addition, patients with NF are susceptible to a broad spectrum of other tumors including low-grade/pilocytic astrocytomas, pheochromocytomas, optic nerve gliomas, and juvenile myelomonocytic leukemias [6]. Many aggressive non-NF associated (sporadic) tumors have recently been shown to harbor NF1 mutations, including glioblastoma (GBM), neuroblastoma, melanoma, thyroid, ovarian, breast, and lung cancers [7]. Therefore, somatic and inherited loss of NF1 function is emerging as a driver of tumors from different organ sites.
Several groups including our own have been working to develop therapeutic approaches to target tumors with loss of NF1. Previously, our lab developed a high throughput approach using yeast and mammalian screening platforms to identify tool compounds and drug targets for cancer cells in which NF1 loss drives tumor formation. Our pipeline identified small molecules that selectively kill or stop the growth of MPNST cells carrying a mutation in NF1 or yeast lacking the NF1 homolog IRA2 [8]. We also developed an assay in yeast to identify the targets of our lead tool compounds and found that one of these compounds (UC-1) shares a mechanism (phosphorylation of RNA Pol II CTD Ser2/5) with experimental drugs in clinical trials [8]. UC-1 impacts CTD phosphorylation, which is regulated by the CTD kinase Ctk1, the yeast homolog of human Cdk9. We showed that deletion of CTK1 was synthetic lethal with loss of the yeast NF1 homolog IRA2. Furthermore, we have found that inhibitors of this process (dinaciclib, SNS-032) can inhibit other types of RASdysregulated tumor cells [9].
However, relying on genetic data alone to identify tumors that may be susceptible to therapies targeting NF1 loss may leave a proportion of potentially actionable tumors unrecognized. NF1 tumor suppressor activity can be lost via mutation of the genomic locus, proteasomemediated degradation, inhibition by miRNA, de novo insertion of an Alu element, and C → U editing of the NF1 mRNA [10][11][12][13][14]. This complexity presents challenges when trying to identify tumors that will benefit from molecules that exert synthetic lethality with dysregulation of NF1/RAS pathways.
The Cancer Genome Atlas (TCGA) has released a large volume of data on several cancer tissues measured on a variety of genomic platforms. In the present study, we leverage TCGA GBM RNAseq expression data with matched mutation calls to construct a classifier capable of identifying an NF1 inactivation signature. This strategy sidesteps the problem of functional characterization of mutations by evaluating a regulator's downstream gene expression activity. We applied this signature to predict NF1 inactivation in a cohort of biobanked GBMs. In general, this approach can be translatable to any gene producing measurable downstream transcriptome-wide effects.

Methods
The Cancer Genome Atlas Data used for building the classifier We downloaded RNAseq and mutation data from TCGA Pan Cancer project from the UCSC Xena data portal [15] and subset each dataset to only the GBMs [16]. The data consists of 607 GBMs; of which 291 have mutation calls, 172 have RNAseq measurements, and 149 have both RNAseq and mutation calls. Of these 149 samples, 15 have inactivating NF1 mutations (10.1%) and were used as gold standard positives in building the classifier (Additional file 1: Table S1). Additionally, to reduce dimensionality while avoiding unexpressed and invariant genes, we subset to the top 8,000 most variably expressed genes by median absolute deviation. We z-scored all gene expression measurements. This resulted in the final input matrix with dimension 149 samples by 8,000 genes. For use in platform independent predictions, we used Training Distribution Matching (TDM) to transform the TCGA RNAseq data to match a microarray expression distribution [17].
Since we are also aware of the NF1 mutation status for each of the samples, we form a supervised learning taskpredicting when a sample has loss of NF1 activity. Our "X" matrix is formed by the RNAseq measurements for all 149 samples measured by 8,000 genes, which are the features in the model. Our "y" vector consist of {0, 1} elements where a 1 corresponds to a sample with an inactivating NF1 mutation and a 0 is an NF1 wildtype sample. The machine learning task is to find the feature weights, or gene coefficients, that best minimize our objective function. Along with these feature weights corresponding to the genes' importance in the learning task, we also output a probability estimate for each sample that they have loss of NF1 activity.

Hyperparameter optimization of the logistic regression classifier
Using the GBM RNAseq data, we trained logistic regression classifiers with an elastic net penalty using stochastic gradient descent to detect tumors with NF1 inactivation. We chose a penalized regression model because it is simple to train and has easily interpretable outputs including importance scores for each gene (feature weights) associated with the downstream consequences of NF1 loss of function and a probability for each sample that NF1 is lost. An elastic net logistic regression model has also been successfully implemented in similar studies [18][19][20].
We identified high-performing alpha and L1 mixing parameters using 5-fold cross validation ensuring balanced membership of NF1 mutations in each fold. Briefly, alpha controls how weight penalty and the L1 mixing parameter tunes the amount of test set regularization by controlling the sparsity of the features. An L1 mixing parameter value of zero corresponds to the L2 penalty and a value of one corresponds to the L1 penalty, with L1 bringing a sparser solution. We used python 3.5.1 and Sci-kit Learn for machine learning implementations [21].

Ensemble classifier construction and application to the validation set
After selecting optimal hyperparameters, we constructed 500 classifiers that would compose our ensemble model. Specifically, across 100 different random initializations, we subset the full TCGA GBM data into five folds and trained a single classifier for each training fold.
We borrowed terminology from the epidemiology field to describe data partitioning. We trained our models on a "training" partition and assessed model performance on a "test" partition, which refers to the held out crossvalidation fold. The independent "validation set" refers to the GBM dataset generated in a different lab (see Fig. 1a).
Because of the small number of gold standard positive training examples, we were concerned about the stability of our model solutions. Therefore, we constructed an ensemble classifier from the 500 models. Specifically, we assigned each classifier a weight using the specific randomization's "test set" cross-validation AUROC. Lastly, for the final NF1 inactivation prediction, we used the mean of the weighted predictions across all iterations as the NF1 inactivation prediction. We applied this ensemble classifier to the validation set in which NF1 protein levels were directly measured.

Effect sizes and power analysis
We calculated the decision function of each ensemble classifier applied to all samples in the training and testing 5-fold cross validation folds to calculate Cohen's D effect size between predicted NF1 wildtype and NF1 inactive samples [22]. The Cohen's D metric quantifies the difference between NF1 wildtype and NF1 inactive samples according to the mean classifier score and directly demonstrates how different the ensemble model predicts the two groups to be.
Moreover, we were also concerned that our relatively small validation set would not provide us with enough power to observe a detectable effect in the ensemble model's final prediction. We performed a one-tailed Welch's two-sample t-test comparing the NF1 protein concentration of our validation samples that were predicted to be either NF1 wildtype or NF1 deficient. Using the given sample size, Cohen's D effect size, and a significance threshold of α = 0.05, we calculated the power of the prediction scores on the validation set. The power analysis was two-sample, one-tailed and incorporated unequal sample sizes in each group.

Validation sample acquisition
Thirteen flash-frozen, de-identified GBM samples were obtained from the Maine Medical Center Biobank. Samples were received on dry ice and stored at −80°C until isolation of DNA/RNA/protein. To isolate DNA, tumor fragments of approximately 20 mg in mass were harvested on an aluminum block pre-chilled on dry ice. Samples were then immediately transferred to a mortar and pestle containing a small volume of liquid nitrogen. The fragments were pulverized in the mortar and pestle, and the liquid nitrogen was allowed to evaporate. Next, samples were immediately processed with a DNA/RNA/Protein Purification Plus kit (Norgen Biotek) following the standard operating protocol for animal tissue. DNA concentration and quality were assessed using an ND-1000 (Nanodrop), a Qubit Fluorometer (Thermo Scientific), and a Fragment Analyzer (Advanced Analytical Technologies). To isolate RNA, −80 C tumor fragments were placed in 5-10 volumes of RNAlater-ICE Frozen Tissue Transition Solution (Ambion) and placed at −20°C until RNA extraction with a mirVana miRNA isolation kit, without phenol, following the standard operating protocol (Thermo Scientific). Samples were homogenized using a manual homogenizer in the presence of mirVana lysis buffer. RNA concentration and quality were determined using a Qubit Fluorometer (Thermo Scientific) and a Fragment Analyzer (Advanced Analytical Technologies). To isolate protein, small tumor fragments were pulverized and lysed in approximately three volumes of ice-cold radioimmunoprecipitation assay (RIPA) buffer (150 mM sodium chloride, 1% v/v nonidet P40, 0.5% w/v sodium deoxycholate, 0.05% w/v sodium dodecyl sulfate, 50 mM Tris pH 8.0) containing 1 mM sodium orthovanadate, 1 mM sodium fluoride, 1 mM phenylmethylsulfonyl fluoride, and 1X protease inhibitor cocktail (0.1 μg/mL leupeptin, 100 μM benzamidine HCl, 1 μM aprotinin, 0.1 μg/mL soybean trypsin inhibitor, 0.1 μg/mL pepstatin, 0.1 μg/mL antipain). Samples were passed through a 25 5/8 g needle and subsequently sonicated on ice to promote efficient lysis and DNA shearing. After a 30 min incubation on ice, lysates were cleared by centrifuging at 16100 × g for 20 min. HEK293T, U87-MG, and U87-MG cells treated for two hours with one micromolar bortezomib (Selleckchem) and ten micromolar MG132 (Selleckchem) were also prepared in RIPA buffer. Protein samples were stored at −80°C until analysis.
Cell culture U87-MG and HEK293T cells were purchased from ATCC. Cell lines were regularly passaged and were cultured in Dulbecco's Modified Eagle Medium (Corning) with 10% v/v fetal bovine serum (Gibco) at 37°C in 5% CO 2 .
Recent data regarding the U87MG cell line published by Allen et al. suggest that the U87MG cell line distributed by ATCC is not from the same tumor as the cell line that was originally isolated in Uppsala. Transcriptome analysis comparing ATCC U87MG cell line to known tumor transcriptomes indicate that the ATCC U87MG cell line is a central nervous system tumor and is likely a glioblastoma cell line [23].
In the present study, we employ this cell line as a control representing an NF1-deficient tumor cell line. Previous studies have shown that the U87MG cell line has elevated proteasome-mediated degradation of NF1 and that this cell line required the loss of NF1 protein to promote tumorigenesis in xenograft tumor models [10]. Given that the ATCC U87MG cell line is a well-characterized and broadly-used model of NF1 deficient tumor cells [10,[24][25][26], we propose that the use of the ATCC U87MG cell line is an appropriate control for Fig. 2.

RNA microarray
After RNA isolation and QC, samples were labeled for the GeneChip Human Transcriptome Array 2.0 (HTA 2.0, Affymetrix). Labeling was performed with Affymetrix Proprietary DNA Label (biotin-linked) using a WT Plus Kit (Affymetrix) provided with the HTA 2.0, following the standard operating protocol for HTA 2.0, including PolyA controls. Hybridization, washing, and staining were performed with the WT Plus Kit, following the standard operating protocol for HTA 2.0. Washing and staining were performed using a GeneChip Fluidics 450. Scanning was performed with a GeneChip Scanner 3000. These data were deposited in the Gene Expression Omnibus under accession GSE85033.

Validation sample processing
We applied a quality control pipeline [27] to all CEL files generated by the HTA 2.0. All validation samples passed processing quality control, which included an inspection of spatial artifacts, MA plots, probe distributions, and sample comparison boxplots. We summarized transcript intensities using robust multi-array analysis (RMA) [28]. We determined batch normalization was unnecessary after a guided principal components analysis (gPCA) using sample processing date and array plate ID as potential batch effect confounders [29]. Lastly, we collapsed HTA2.0 transcripts into gene level measurements using the 'collapseRows()' function with the "maxmean" method from the R package WGCNA [30]. We used the pd.hta.2.0 platform design file (version 3.12.1) and the Bioconductor package "hta20sttranscriptcluster.db" (version 8.3.1) to map manufacturer transcript IDs to genes. We performed all preprocessing steps using R version 3.2.3.
The binding of the primary antibodies was detected by incubation with secondary antibodies goat anti-rabbit HRP 1:20000 or goat anti-mouse HRP 1:10000 (Jackson Immunoresearch Laboratories Inc.) at room temperature in 2% milk in TBST and detection of HRP activity using Pierce ECL Western Blotting substrate (Thermo Scientific), or in the case of NF1, SuperSignal West Femto Maximum Sensitivity Substrate (Thermo Scientific). The chemiluminescent signal was captured with MED-B medical x-ray film (Med X Ray Company Inc.). Between primary antibodies, the membrane was stripped twice for 10 min at room temperature using a mild stripping buffer containing 1.5% w/v glycine, 0.1% w/v SDS, 1% v/ v Tween 20 at pH 2.2 (Abcam). One sample was eliminated due to low yield, and apparent degradation as determined by western blotting (all proteins examined were undetectable with the exception of tubulin, not shown). Densitometry was performed using Li-COR Image Studio Lite 5.0. Briefly, intensity measurements for NF1 and tubulin were taken using equally-sized regions for all bands. The background was subtracted using the local median intensity from the left and right borders (size = 2) of each measurement region. NF1 values were divided by tubulin intensity to adjust for protein loading. All measurement ratios were then normalized by dividing values by the "U87 + PI" measurement for each blot, respectively.

Reproducibility of computational analyses
We provide software with a permissive open source license to reproduce all computational analyses [31]. Ensuring a stable compute environment, we performed all analyses in a Docker image [32]. This image and source code can be used to freely confirm, modify, and build upon this work.

Identification and characterization of NF1 deficient glioblastoma tumor samples
We characterized NF1 protein concentrations as well as other molecules involved in RAS signaling in the 12 GBM samples (Fig. 2a). Two samples (CB2, 3HQ) had no apparent NF1 protein. Eight other samples had similar or less NF1 signal than the U87-MG NF1-low control (H5M, LNA, YXL, VVN, R7K, TRM, UNY, W31). Two samples (PBH, RIW) had equal or greater NF1 than the positive control, U87-MG + proteasome inhibitors (preventing NF1 degradation). We also observed variable EGFR content in these samples, with non-existent to low levels (3HQ, YXL, R7K), or medium to large EGFR signal (CB2, H5M, PBH, LNA, YXL, VVN, RIW, TRM, UNY, W31). All GBM samples had high concentrations of phospho-ERK1/2 signal relative to cell line controls. Samples with increased phospho-ERK1/2 may have greater Ras pathway activation. This can be attributed to multiple factors, including increased EGFR expression and/or NF1 inactivation.
We performed a one-tailed Welch's t-test to determine if NF1 protein concentrations were significantly higher in NF1 wildtype versus NF1 deficient samples based on our classifier predictions (Fig. 2d). We did not observe a significant difference across groups (t = −1.38, p = 0.098, effect size = 0.699). Additionally, while the effect size was fairly large, a power analysis indicated that 22 samples per group would be required to achieve a power = 0.8 at that effect size. With a lack of glioblastoma samples with quantified NF1 protein available, the trend of less protein present in samples scored as NF1 inactivated by the classifier nevertheless remains promising.
One of the samples predicted to be NF1 inactive contains detectable NF1 protein (R7K), suggesting that this sample may have NF1 inactivation not detectable by assaying protein, have a different alteration that phenocopies NF1 loss, or is incorrectly predicted by the classifier. Conversely, there are three samples predicted to be NF1 wildtype that have low or undetectable protein (YXL, VVN, W31), which either indicates unknown elements that confound the detection of some NF1 dysregulated tumors or a classification error.

Highly contributing genes
We observed several genes that consistently contributed to the ensemble classifier performance (Fig. 3). Since we applied several classifiers to the validation set as an ensemble, we took the sum of all classifier's gene weights across all 500 iterations to define these consistently contributing genes. While the data indicate that these genes have an impact on classifier performance, the data do not indicate whether changes in the expression of these genes are a direct consequence in changes in NF1 signaling. Expression of genes such as TXNIP, ARRDC4, ISPD, C10orf107, and DUSP18 appear to be predictive of intact NF1 signaling. Among the list of genes that appear to be expressed in tumors with loss of NF1 function are QPRT, ATF5, HUS1B, PEG10, HMGA2, RSL1D1, and NRG1. A full list of positive and negative weight genes that were two standard deviations beyond the gene weight distribution is provided in Additional file 6: Table S2.
We also performed over-representation analysis of the most influential genes in the classifier to identify gene ontology (GO) sets and pathways that may be predictive of NF1 status [33][34][35][36]. For high-weight genes predictive of intact NF1 signaling, we observed GO sets involved in plasma membrane-localized proteins (GO:0005886, GO:0071944, GO:0016324) and homeostasis (GO:0048871, GO:0001659, GO:0048873, GO:0031224), among others. Annotated pathways associated with genes from this dataset include hematopoietic stem cell differentiation, thyroid cancer, voltage-gated potassium channels, and RHO GTPase functional pathways.
For high-weight genes predictive of NF1 loss of function, we observed GO sets related to cellular adhesion (GO:0007155, GO:0098742), negative regulation of signaling (GO:0009968, GO:0023507, GO:0010648), and nervous system development (GO:0051962, GO0007416, GO: 0050808), among others. These genes were also enriched for elements of the phototransduction cascade and thyroxine production pathways.

Discussion
A machine learning classifier, based on gene expression data, can capture signal associated with the inactivation of a tumor suppressor. Our classifier is able to detect subtle downstream changes in gene expression as a result of the tumor responding to NF1 loss of function. This finding supports using mRNA as a summary measurement capable of capturing system-wide responses to molecular events beyond transcription factor alterations. Machine learning has been applied to gene expression in a variety of studies with various goals [37][38][39][40][41]. In a similar study, Guinney et al. trained a classifier to model RAS activity in colorectal cancer and demonstrated its clinical utility by predicting response to MEK inhibitors and anti-EGFR based treatments [18]. With a wealth of signal embedded in gene expression and a rapidly growing library of datasets, the performance of machine learning models is likely to rapidly improve. An increase in performance leads to more reliable clinical applications that would Fig. 3 Genes that contribute to the classifier performance. Genes are shown ranked by their weighted contribution to the ensemble classifier. Weights are scaled to unit norm. The top ten positive and top ten negative contributing high weight genes are given on the right potentially predict the effectiveness of pathwayspecific targeted therapies.
While our classifier was able to predict NF1 inactivation status to an extent, its performance is far from being clinically actionable. A major difficulty in developing a reliable classifier in this case is contamination in gold standard positives and negatives. While we aim to detect NF1 inactivation events, our gold standard positives can only include samples with known NF1 mutation status. Conversely, we expect that negative samples (about 90% of the data) are also contaminated with NF1 inactivated samples due to protein loss and other mechanisms. We cannot determine scenarios where NF1 is inactivated beyond mutation at scale in the TCGA data. Another challenge with the construction of classifiers from such data is overfitting. Even after hyperparameter optimization we observed substantial overfitting (Fig. 2), which has also been observed in competitions (see, for example, supplementary figure S2 of Noren et al. 2016 [42] in which the best performing algorithms also overfit). Finally with a small number of positive examples the model performance is unstable, which demonstrates high variability in gold standard samples used to train the model [43]. We employed ensemble classification to mitigate this issue as averaging over heterogeneous models would result in a relatively stable classifier (see Fig. 2b). In summary, our results are promising but these challenges are substantial and significant work remains to reach a robust classifier with clinical utility.
The performance of the classifier appears to be impacted by many cancer related genes. For example, genes such as TXNIP and ARRDC4, which are both indicative of lactic acidosis, correlate with better clinical outcomes, and contribute to predicting tumors with intact NF1 signaling [44]. We also observed transcripts that are more highly expressed in brain tissue than either other normal tissue (ISPD, C10orf107), or more highly expressed in normal brain tissue than glioma (EPHA5) [45][46][47]. DUSP18 contributes to the prediction of NF1 wildtype status and is a negative regulator of ERK phosphorylation, possibly by regulating SHP2 phosphorylation [48]. It is unclear whether the expression of these genes is a direct result of NF1 expression, the result of signaling downstream of NF1, or a consequence of other phenomena (such as expression of SPRED1, an NF1 binding partner that is essential for NF1 signaling). Future studies could elucidate the potential connections between NF1 and the genes identified as important for the performance of this classifier.
Over-representation analysis of these data highlighted changes in potassium channel expression. It was previously demonstrated that NF1 wild-type Schwann cells have altered K+ channel activity as compared to NF1 −/− Schwann cells suggesting that this may be one factor by which NF1 mutant and wild-type cells can be distinguished [49].
Regarding prediction of NF1 inactivated tumors, we observed several genes that have been linked to cancer such as QPRT, which is highly expressed in malignant pheochromocytomas as compared to benign; RSL1D1 (CSIG), which stabilizes c-myc in hepatocellular carcinoma; PPEF, which is highly expressed in astrocytic gliomas as compared to normal brain tissue [50][51][52]; and PEG10, a poor prognostic marker and regulator of proliferation, migration, and invasion in several tumor types [53][54][55]. We also observed ATF5, a gene for which expression in malignant glioma is correlated with poor survival [56]. Knockdown of ATF5 in GBM cells causes cell death in vitro and in vivo [57]. Analysis of genes that contribute to the prediction of NF1 inactivation yielded several GO terms related to neural development. It is well established that loss of NF1 can result in abnormal neural development and/or tumorigenesis [14,58,59]. We also observed genes associated with the mesodermal commitment pathway, components of which are linked to the epithelial to mesenchymal transition in human cancer cells [60][61][62]. Analysis of this pathway may be informative in identifying tumors with NF1 loss because mesenchymal GBMs are enriched for tumors with NF1 loss [63].
Our ensemble classifier was able to robustly detect the samples with the highest and lowest NF1 protein concentrations, but it struggled with samples of intermediate NF1 concentrations. This could be a result of an enrichment of mechanisms causing NF1 inactivation beyond protein abundance, an overrepresentation of mesenchymal tumors in NF1 inactivated samples contaminating dataset splits [63], poor classifier generalizability, or incomplete data transformation between RNAseq and microarray data. Because training and testing performance were similar between transformed and non-transformed data (see Fig. 1 and Additional file 4: Figure S3) we don't anticipate performance to be impacted much by platform differences or classifier generalizability. Nevertheless, we demonstrated the ability of system-wide gene expression measurements to capture downstream consequences of a complex biological mechanism that would otherwise require several different types of data acquisition to capture.

Conclusions
A machine learning classifier for transcriptomic data was able to detect signal associated with the inactivation of NF1, a tumor suppressor gene. The gene is an important regulator of the oncogene RAS and is inactivated frequently in GBM and in other tumors. The measurement of NF1 inactivity cannot be comprehensively captured by any single genomic characterization such as targeted sequencing or fluorescence in situ hybridization. This