Remodeling of central metabolism in invasive breast cancer compared to normal breast tissue – a GC-TOFMS based metabolomics study

Background Changes in energy metabolism of the cells are common to many kinds of tumors and are considered a hallmark of cancer. Gas chromatography followed by time-of-flight mass spectrometry (GC-TOFMS) is a well-suited technique to investigate the small molecules in the central metabolic pathways. However, the metabolic changes between invasive carcinoma and normal breast tissues were not investigated in a large cohort of breast cancer samples so far. Results A cohort of 271 breast cancer and 98 normal tissue samples was investigated using GC-TOFMS-based metabolomics. A total number of 468 metabolite peaks could be detected; out of these 368 (79%) were significantly changed between cancer and normal tissues (p<0.05 in training and validation set). Furthermore, 13 tumor and 7 normal tissue markers were identified that separated cancer from normal tissues with a sensitivity and a specificity of >80%. Two-metabolite classifiers, constructed as ratios of the tumor and normal tissues markers, separated cancer from normal tissues with high sensitivity and specificity. Specifically, the cytidine-5-monophosphate / pentadecanoic acid metabolic ratio was the most significant discriminator between cancer and normal tissues and allowed detection of cancer with a sensitivity of 94.8% and a specificity of 93.9%. Conclusions For the first time, a comprehensive metabolic map of breast cancer was constructed by GC-TOF analysis of a large cohort of breast cancer and normal tissues. Furthermore, our results demonstrate that spectrometry-based approaches have the potential to contribute to the analysis of biopsies or clinical tissue samples complementary to histopathology.


Background
In a recent update, Hanahan and Weinberg added reprogramming of energy metabolism to the list of hallmarks of cancer [1]. Historically, cancer was regarded as metabolic disease long before being decoded as disease of genes and mutations. More than 80 years ago, Otto Warburg reported an increased anaerobic glycolysis in cancer cells compared to normal cells [2]. Additional to the Warburg effect [3,4], cancer cells exhibit increased protein and nucleotide synthesis [5,6], increased fatty acid synthesis and changes in fatty acid metabolism [7][8][9]. Integrating data of different sources, models of the altered metabolism in cancer were developed, see for example [4,10,11]. In recent years, there has been a renewed interest in the altered metabolism of cancer cells coupled with progress in development of new metabolic drugs [12]. In order to obtain a comprehensive view on the metabolic changes between invasive carcinomas and normal breast tissues, herein a metabolomics study of breast cancer was carried out with a focus on the low molecular weight molecules of central metabolism. In a separate project, a breast cancer cohort was investigated by lipidomics with a focus on altered membrane lipid metabolism [13].
Analytical chemistry methods allow the investigation of the metabolic changes that occur in cancer tissues. Using gas chromatography (GC), liquid chromatography (LC) or capillary electrophoresis (CE) coupled with mass spectrometry (MS) hundreds of molecules in a tissue sample can be analyzed simultaneously. In two preceding studies using gas chromatography followed by time-of-flight mass spectrometry (GC-TOFMS) we successfully profiled ovarian and colon cancer [14,15]. This approach, allowed the monitoring of hundreds of small molecules with masses of up to 500 Da. Using spectral libraries like BinBase [16] many of these metabolite peaks can be mapped to metabolites with known chemical structures and functions.
Breast cancer is a public health issue of global relevance with more than one million new cases diagnosed annually and more than 400,000 death cases worldwide [17]. After surgery, adjuvant chemotherapy is offered to most breast cancer patients to reduce the risk of relapse. However, about 40% of the early breast cancer patients have a low risk of developing distant metastases and of dying because of the disease [18]. On the other hand, aggressive subtypes like triple-negative breast cancer have a poor prognosis and are difficult to treat [19]. New targeted therapies including small molecule inhibitors and therapeutic antibodies are currently under development and being tested in clinical trials [20], however many of the new approaches achieved only limited response rates. Therefore, as a step to personalized medicine, a better understanding of the functional pathway alterations in breast cancer is needed to avoid over-treatment and select patients for individualized and targeted therapies.
GC-TOFMS based metabolomics provides a wide coverage of the central part of the cellular metabolism including glycolysis, citrate cycle, amino acid and nucleotide metabolism. These pathways are altered in cancer cells and can be targeted by metabolic drugs. Herein, we report on the comparison of 275 invasive breast cancer samples with 94 normal tissue samples using GC-TOFMS. The purpose of this study is two-fold: (i) to analyze the metabolic changes in the central pathways between invasive carcinoma and normal breast tissues on a global scale and (ii) to identify key metabolic markers that separate cancer from normal tissues with high sensitivity and specificity.

Metabolomics analysis of breast cancer
The entire cohort of breast cancer and normal tissues was divided into a training set (TS, 226 samples, of which 184 were tumor samples) and a validation set (VS, 143 samples, of which 87 were tumor samples). Analysis of the GC-TOFMS spectra of the TS samples led to the detection of 468 most abundant metabolite peaks that were present in breast cancer tissues. Subsequently, these metabolites were measured in the VS. 162 of the detected metabolite peaks could confidently be mapped to known chemical structures and metabolite names.
Using unsupervised analysis methods we investigated the contribution of the malignancy of the tissues to the total variance of the dataset. The results of a principal components analysis (PCA) of TS and VS are shown in Figure 1. In the TS, the first two PCs captured 34.9% and 5.2% of the total variance of the metabolomics data. In the TS, the 1 st PC was significantly decreased in cancer compared to normal tissues (p = 2.2E-22). In the VS, the first two PCs captured 34.6% and 3.5% of the total variance. Further, the 1 st PC could be validated as being decreased in cancer compared to normal tissues (p = 8.4E-41). In summary, in TS and VS, the 1 st PC captured more than one-third of the total variance of the metabolomics data and correlated strongly with the malignancy of the studied tissue samples.

Metabolic changes between breast cancer and normal tissue
Metabolite-by-metabolite analysis of the training set (TS) led to detection of 427 significantly changed metabolites between cancer (T) and normal tissues (N). Out of these, 363 (85%) remained significant after Bonferroni correction and 368 (86%) could also be validated by analysis of the validation set (VS). Among the validated metabolites, 247 were decreased and 121 increased in tumors, corresponding to 53% and 26% of the entire set of metabolites detected by GC-TOFMS.
For functional analysis, PROFILE clustering [14] was used to order the metabolites according to their mutual proximity in the metabolic network. Figure 2 shows the fold changes of 129 metabolites that are present in the KEGG data base. By proceeding through the cluster map form left to right, the metabolic changes include upregulation of many amino acids, changes in TCA cycle, changes in glycerophospholipid metabolism, downregulation of the benzoic acid family, up-regulation of most of the nucleotides and their phosphates, downregulation of the sugar cluster including sucrose, fructose and glucose, and down-regulation of most of the free fatty acids.
A metabolic network was constructed using Cytoscape with MetScape plug-in (Figure 3). Two metabolites were connected by an edge, if they can be converted into each other by an enzymatic reaction. In breast cancer tissues, glucose and other sugars were decreased while intermediates of the glycolysis pathway such as glucose-6phosphate and 3-phospho-glycerate were increased. The equilibrium between pyruvate and lactate was shifted towards lactate. Many compounds of the TCA cycle were increased, with the exception of alpha-keto-glutarate that was decreased in cancer tissues.
Strong deregulation occurred in glutamate metabolism with a shift of equilibrium from alpha-ketoglutarate (fold change = -2.1, p = 2.4E-17) towards glutamate (fold change = 6.5, p = 1.6e-40). Nineteen proteinogenic amino acids (all except histidine) could be detected by the GC-TOFMS approach. Out of these 16 were strongly increased in the cancer tissues (all fold changes >1.9, all p-values <5.0e-10). Only one of the detected proteinogenic amino acids, asparagine, was decreased between cancer a normal tissues (fold change = -1.6, p = 1.5e-07), while glutamine and arginine remained unchanged.
Amphiphilic phospholipids are the building blocks of the cell membrane and are synthesized from choline and ethanolamine via the Kennedy pathway. Within this pathway, we have detected a shift of the equilibrium from ethanolamine (fold change = -2.0, p = 0.00041) to phosphoethanolamine (fold change = 16.3, p = 1.8e-44). Nucleotides, nucleosides and their phosphates were generally increased in the cancer tissues with the strongest regulations belonging to CMP (fold change = 10.3, p = 1.4e-57) and AMP (fold change = 7.8, p = 3.4e-49).

Metabolite based separation of cancer and normal tissues
In order to develop a metabolite-based molecular approach for the detection of breast cancer, we analyzed each metabolite for its classifying power. 50 metabolites, of which 20 have a known chemical structure, separated tumor (T) from normal breast tissues (NB) as well as tumor from normal adipose-rich tissues (NA) with sensitivity and specificity >80%. The fold changes of these marker metabolites are shown in Figure 4. We found 13 tumor markers (increased in T) and 7 normal tissue markers (increased in NB and NA).
In order to enhance the separation, each of the tumors markers was divided by each of the normal tissues markers yielding 13 × 7 = 91 metabolite ratios that served as classifiers. For each classifier, we defined a cut-off value for the classification in cancer and normal tissues by maximizing the sum of sensitivity and specificity. Then, we applied a voting system counting each of the classifiers above the cut-off +1, and each of the classifier below the cut-off -1. The votes for 91 classifiers were summed and Figure 3 Metabolic map of breast cancer showing alterations in energy metabolism, catabolism of amino acids, nucleotide metabolism and other pathways. Networks were generated using Cytoscape with the Metscape plug-in. Significantly increased metabolites in tumor compared to normal tissue (red nodes), significantly decreased metabolites (green nodes), unchanged metabolites (yellow nodes) and metabolites not in the GC-TOFMS screen (white nodes). The size of the nodes is proportional to the absolute value of the fold change.
the results are shown in Figure 5. A strong correlation was observed between the votes for different classifiers: For 217 tissue samples (58.8%) there was unisonous agreement among all classifiers, for 319 tissue samples (86.4%) more than 90% of the classifiers agreed on malignancy status.
Next, we compared the classification result obtained by the majority of the classifiers with the histopathological classification that is the gold standard for cancer detection. Only 8 out of 271 tumors and 6 out of 98 normal tissues were classified incorrectly, leading to a sensitivity of 97.0% and a specificity of 93.9% of the molecular test. Further, restricting the analysis to the 319 tissues where more than 90% of the classifiers agreed, only one tumor and four normal tissues were classified incorrectly, leading to a sensitivity of 99.6% and a specificity of 95.5%.
Among all classifiers the ratio of cytidine-5monophosphate / pentadecanoic acid had the highest significance for changing between cancer and normal tissues (T vs. N: p = 8.3E-74) and at the same time had the highest significance for changing between malignant and normal breast tissues (T vs. NB: p = 7.8E-14). Maximizing the sum of sensitivity and specificity lead to a cut-off value = 0.39. Using this cut-off point, only 14 out of 271 tumors and 6 out of 98 normal tissues were incorrectly classified leading to 94.8% sensitivity and 93.9% specificity. Histogram and ROC curve of the classifier are shown in Figure 6.

Nearest centroid classification
We investigated the performance of classifiers including 2, 3, . . ., 162 metabolites with known structure for the separation of cancer from normal tissues. To estimate sensitivity and specificity of the classifiers, we combined nearest centroid classification with a multiple random validation strategy, as in [21]. It turned out that on average a sensitivity of about 96% and a specificity of about 94% was obtained independent of the number of metabolites. The classification results did not improve when a larger number of metabolites was used for classification (Table 1 and data not shown).

Discussion
Our results show that tissues collected during breast surgery have reproducible metabolite profiles that can be analyzed by using GC-TOFMS. Using this approach, 478 metabolite peaks could be detected and quantified; 79% of these were changed between cancer and normal tissues. Thus, there is a difference in the concentrations of many metabolites between cancer and normal tissues including changes as high as tenfold and more for some metabolites. The high rate of validated metabolites in a predefined training-validation analysis, even after the conservative Bonferroni correction, shows that the GC-TOFMS is a robust approach to detect the metabolite changes associated with malignant progression. Data acquisition at different time points is considered as a main source of variance in many GC-MS studies. However, the high degree of reproducibility (86% of metabolic changes) between training and test data set underscores the robustness GC-TOFMS platform and of the detected metabolic changes in the current study.
A metabolic map of breast cancer was constructed by visualizing the metabolite changes in the metabolic network including pathways like glycolysis, TCA cycle, nucleotide metabolism and catabolism of amino acids. The metabolic map can serve as a tool for hypothesis building about the metabolic processes in breast cancer and help to develop strategies for the therapeutic targeting of metabolism in cancer cells. Several metabolic drugs, some of them targeting the central energy metabolism, are currently under development and investigated in clinical studies [12].
Cancer is commonly considered as a genetic disease that is driven by mutations of oncogenes and tumor suppressor genes. However, one of the major underlying purposes of those genetic and gene expression changes is to create a metabolic phenotype for cancer cells that is essential for tumor cell growth and survival [8,22]. The metabolic phenotype of cancer includes alterations in glycolysis, amino acid metabolism, nucleotide metabolism and glycerophospholipid metabolism that were confirmed in the current study. Glutamine, the most abundant amino acid in blood plasma, can be metabolized to pyruvate and lactate through glutamate, alpha-keto-glutarate and via the citric acid cycle. This process, termed glutaminolysis is an important source of energy, carbon and nitrogen in cancer cells [23]. In the present study, we have observed strong regulation of this pathway including a strong up-regulation of glutamate (fold change = 6.5) while glutamine was unchanged. This suggests that glutamine metabolism and in particular glutaminase, the enzyme that converts glutamine to glutamate, should be a potential target for intervention. In the 1980s three glutamine analogs raised great expectations as possible antineoplastic agents. But the promising results obtained in model  Histogram showing the distribution of the ratio R in cancer and normal tissues (x-axis: log2 scale). R was significantly higher in cancer than in normal tissues (p = 8.3e-74, Welch's t-test). A cut-off at R = 0.39 maximizes the sum of sensitivity and specificity. (D) ROC curve of R with confidence intervals for sensitivity (red lines) and specificity (green lines). The cut-off leads to at sensitivity of 94.8% and a specificity of 93.9% for the separation of cancer from normal tissues. systems could not be confirmed in clinical trials due to dose limiting side effects and ineffectiveness of treatment [24]. However, recently a small molecule inhibitor was shown to target GLS (kidney-type glutaminase) and to have antiproliferative activity in breast cancer cells while being unharmful to normal cells [25]. Furthermore, a connection between MYC (v-myc myelocytomatosis viral oncogene homolog), a master transcription factor and oncogene that is deregulated in many cancers and glutamine metabolism has been described [11]. MYC is known to be amplified in about 5% of breast cancers and associated with a more aggressive subtype and shortened survival [26].
In the GC-TOFMS metabolomics based approach we observed a two-to four-fold down-regulation of almost all detected free fatty acids ( Figure 2). The downregulation of fatty acids seems to contradict the upregulation of fatty acid synthase (FASN) and increased de novo fatty acid synthesis that is found in many cancers [5,9]. Interestingly, a lipidomics study of breast cancer showed an up-regulation of many membrane lipids in cancer compared to normal tissues [13]. Thus, the metabolomics described in this paper together with the previously reported lipidomics data support the hypothesis that de novo fatty acid synthesis is potentially increased in breast cancer, but free fatty acids are rapidly metabolized to synthesize membrane phospholipids.
The GC-TOFMS data also showed a shift of the equilibrium from ethanolamine that was decreased to phospho-ethanolamine that was highly increased in the cancer tissues possibly indicating a stimulation of the Kennedy pathway. To correlate these changes with the content of membrane lipids, we have extracted the total content of different kinds of membrane lipids from the UPLC-MS data published before [13]. However, there were neither pronounced correlations between phospho-ethanolamine and the total content of phosphatidylethanolamine (PE) nor between phosphoethanolamine and the total content of phosphatidylcholine (PC), see Figure 7A. The mechanism behind might be that, in tumors tissues, the Kennedy pathway is regulated in such a way, that a sufficient concentration of phospho-ethanolamine is always available. In fact, the same choline kinases that catalyse the reaction of choline to phospho-choline also catalyse the reaction of ethanolamine to phospho-ethanolamine. Choline kinases were detected to be up-regulated in tumors and represent potential targets for therapeutic intervention [27].
In order to develop a classifier that separates between cancer and normal tissues, we identified 13 increased metabolites and 7 decreased metabolites that separated cancer from normal tissues with sensitivity and specificity >80%. We did not detect any perfect biomarker in the sense that a single metabolite was abundant and specific to cancer tissue, but absent in normal tissues or vice versa. Therefore, we built ratios of metabolites in order to construct scale-invariant tissue classifiers that are independent of the amount of tissue under investigation and do not depend on data normalization. These two-metabolite classifiers separated cancer from normal tissues with high sensitivity and specificity and had a high agreement among each other. In an extensive additional analysis we investigated the performance of classifiers that include a larger number of metabolites using a multiple random validation strategy. It turned out that classifiers including more metabolites did not outperform classifiers including only two metabolites (Table 1). This result underscores the relevance of the two-metabolite ratio classifiers as a promising strategy for diagnostic classification.
The metabolite ratio cytidine-5-monophosphate/ pentadecanoic acid showed the strongest discrimination between cancer and normal tissues and permitted detection of cancer with a sensitivity of 94.8% and a specificity of 93.9%. Pentadecanoic acid (C15:0) is a known fatty acid present in milk and it has been found to be a good marker of milk fat intake when found in serum. However, a heatmap of the free fatty acids shows a high correlation of the free fatty acids C14:0, C15:0, C16:0, C17:0, C18:0, C20:0 in the METAcancer cohort (see Figure 7B, all Pearson correlations >0.82). Thus, in the classifier, it is possible to replace pentadecanoic acid by CMP : cytidine-5-monophosphate. The metabolite ratio method was developed in the current work. Nearest centroid classification was executed using the R package cancerclass [37]. For the latter method, classification rates including 95% confidence intervals were estimated using multiple random validation.
some other saturated fatty acids. Indeed, classifiers including ligoneric acid (24:0) or heptadecanoic acid (17:0) instead of pentadecanoic acid to a large extend coincide with the classifiers that include pentadecanoic acid.
For the detection of metabolite changes, we worked with a predefined training set (TS) and validation set (VS). All of the 20 most discriminatory metabolites are among the validated changed metabolites between cancer (T) and normal tissues (N). It should be noted, though, that normal breast tissue might consist of normal epithelium as well as various amounts of adipose tissue. Therefore, it was important to distinguish different kinds of normal tissues (NA and NB) and to develop a classifier that is independent of the fat content of the tissue. Thus, only metabolites that separated cancer from both, normal breast (NB) and normal adipose-rich tissues (NA) were included in the final classifier. This part of the analysis could only be performed in the pooled TS and VS due to a low number of normal breast samples. Another limitation of the current study is its retrospective character and the use of preselected samples that were enriched for cancer or normal cells. The cancer classifiers developed here need to be further validated using random samples from biopsying or surgery. In doing so, concordance with histopathology (accuracy) and reproducibility in repeated analyses of the same tissue (precision) of the classifiers should be further evaluated.
A critical factor determining whether a method is applicable for routine cancer diagnostics, for example during surgery, is the time needed for sample analysis. In order to cope with the inherent heterogeneity nature of the tissues, it is necessary to probe various areas of the tissue and still keep the total analysis time in the range of minutes. There is recent progress in automation and miniaturization of MS devices and such tools are expected to become cheaper and easier to handle during the next few years [28,29]. First hand-portable, automated GC-MS devices is commercially available today [30]. Additionally, an automated LC/CE-MS device in the size of a personal computer is being developed in the European ARROWS FP7 project [31]. Automation of sample acquisition, preparation and data generation will be important to translate MS based tissue investigations approaches into clinical applications.

Conclusions
For the first time, a metabolic map of breast cancer was constructed by GC-TOF analysis of a large cohort of breast cancer and normal tissues. In future, MS based devices could contribute to the analysis of biopsies and surgical samples. While metabolite analyses will not replace microscopic evaluation, metabolomics could be used complementary to histopathology, e.g. for an initial quality control of biopsies directly after sampling or as additional assessment method for biopsies or surgical samples.

Study cohort
The study cohort consisted of 369 fresh frozen breast tissues that were collected at the Charité Hospital in the tissue bank of the European FP7 METAcancer consortium. The project was approved by the institutional review board of the Charité Hospital (EA1/139/05). For histopathological quality control, an H&E section of each frozen tissue was evaluated by a pathologist. 271 samples contained cancer tissue and had a tumor area ≥40% (T), the remaining 98 samples contained exclusively normal tissue (N). The normal tissues included 27 samples with ≥40% normal breast tissue (NB) and 71 adipose-rich samples with <40% normal breast tissue (NA).
As a basis for a predefined training-validation strategy, the entire cohort was divided in a training set (TS) of 184 + 42 (T + N) samples and a validation set (VS) of 87 + 56 samples. In detail, tumor samples were split randomly in a way that training (2/3) and validation (1/3) sets did not differ with respect to the following parameters: tumor stage, tumor grade, age (≤50 and >50 years) and estrogen receptor status. To this end, 100,000 splits were randomly drawn and the one with the lowest Kullback-Leibler divergence was chosen. All 42 normal tissues in the TS were paired with tumor tissues in the TS from the same patients. The VS included 24 normal tissues that were paired with tumor tissues and 32 additional unpaired normal tissues.

GC-TOFMS based metabolomics
GC-TOFMS data acquisition of the 369 tissues was performed as previously published [16], following the guidelines of the metabolomics standard initiative [32]. Briefly, 20 mg frozen breast tissue samples were homogenized and extracted with 1 ml degassed isopropanol/acetonitrile/water (3/3/2) at 4°C for 5 min. The extracts were subsequently dried down and re-suspended in 50% aqueous acetonitrile to remove most of the complex lipids. After dry evaporation, extracts were derivatized and subjected to GC-TOFMS (Leco Pegasus IV) fitted with automatic liner exchange-cold injection (Gerstel). Raw data were deconvoluted using ChromaTOF (Leco) and mass spectra were exported for further data processing by the BinBase database, including identification of metabolites [16]. The TS and VS samples were analyzed as separate cohorts at two different time points (11/2008 and 1/2009). Data analysis was carried out using the statistical computing and graphics environment R [33]. For data normalization, each sample was divided by the sum over the measurements of the metabolites with known structure. Prior to analysis, data were transformed by log-2 scaling.

Principal components analysis
Using unsupervised multivariate data analysis methods, we investigated whether the malignancy of tissues was among the main contributors to the variance of metabolites. To this end, the measurements of each identified metabolite were zero centered and principal components analysis (PCA) was performed. The rotation matrix that connects the 162 metabolites with the principal components was determined from the TS data. By multiplication with the rotation matrix, the principal components were calculated for the TS and VS. Finally, the principal components were investigated for changes between cancer and normal tissue samples. The significance of changes was assessed using Welch's t-test.

Detection of metabolite changes
We performed a supervised analysis and investigated each metabolite for change between cancer and normal tissues. A metabolic change was termed as detected, if there was a significant change in the TS (p < 0.05, Welch's t-test, two-sided). A metabolic change was termed as validated, if there was a significant change in the VS in the same direction (p < 0.05, Welch's t-test, one-sided). Using PROFILE clustering, the detected metabolite changes were analyzed in the context of the underlying metabolic pathways. PROFILE clustering is a bioinformatic method that groups metabolites with respect to the topologic distance within the metabolic network [14]. The distances for clustering are calculated from the network of enzymatic reactions as it is available from the KEGG database [34].

Network reconstruction and visualization
A metabolic network was generated using Cytoscape [35] and MetScape, version 1.01 [36]. MetScape is a plug-in for Cytoscape that integrates reaction and pathway information from KEGG. The metabolites detected by GC-TOFMS were imported as the root for the network. MetScape automatically expanded the network with a depth of 2 reactions around each of the metabolites. Metabolites are mapped to nodes and reactions are mapped to edges connecting the nodes. The resulting network was manually curated by erasing nodes at the periphery of the network and by removing reactions that are not part of the human metabolism. Additional metabolites were added where the expansion depth was not sufficient. Information on the metabolic changes between breast cancer and normal tissues was imported as node attributes. The direction of regulation was visualized by the color of nodes and the fold change by the size of nodes.

Marker metabolites and cancer detection
Separation of tumors (T) from normal breast tissues (NB) and tumors from adipose-rich tissues (NA) were investigated using receiver operator characteristics (ROC) curves. Metabolites that exceeded 80% sensitivity and specificity for both comparisons were considered as classifying. Classifying metabolites were divided in tumor markers (increased in T) and normal tissues markers (increased in NB and NA). Classifiers were constructed by the ratio of each possible combination of a tumor and a normal tissue marker. The quality of each classifier was investigated in an ROC analysis. For each classifier, an optimal cut-off point was determined by maximizing the sum of sensitivity and specificity. A voting system was applied counting each of the classifiers above the cut-off +1, and each of the classifier below the cut-off -1. Finally, each of the tissues was predicted to belong to the class with the majority of votes.

Multiple random validation strategy
We investigated the performance of classifiers including 2, 3, . . ., 162 of the metabolites with known structure for the separation of cancer from normal tissues. For each of the metabolite numbers 200 random splits of the GC-TOFMS data were drawn. All training data sets included the same number of tumor and normal tissues (65 of each tissue type). Classification was performed by the nearest centroid method with respect to euclidean distance after feature selection based on Welch's t-test. All analyses were conducted using the R package cancerclass that is available from the open source project Bioconductor [37].