Integrated metabolic modelling reveals cell-type specific epigenetic control points of the macrophage metabolic network

Background The reconstruction of context-specific metabolic models from easily and reliably measurable features such as transcriptomics data will be increasingly important in research and medicine. Current reconstruction methods suffer from high computational effort and arbitrary threshold setting. Moreover, understanding the underlying epigenetic regulation might allow the identification of putative intervention points within metabolic networks. Genes under high regulatory load from multiple enhancers or super-enhancers are known key genes for disease and cell identity. However, their role in regulation of metabolism and their placement within the metabolic networks has not been studied. Methods Here we present FASTCORMICS, a fast and robust workflow for the creation of high-quality metabolic models from transcriptomics data. FASTCORMICS is devoid of arbitrary parameter settings and due to its low computational demand allows cross-validation assays. Applying FASTCORMICS, we have generated models for 63 primary human cell types from microarray data, revealing significant differences in their metabolic networks. Results To understand the cell type-specific regulation of the alternative metabolic pathways we built multiple models during differentiation of primary human monocytes to macrophages and performed ChIP-Seq experiments for histone H3 K27 acetylation (H3K27ac) to map the active enhancers in macrophages. Focusing on the metabolic genes under high regulatory load from multiple enhancers or super-enhancers, we found these genes to show the most cell type-restricted and abundant expression profiles within their respective pathways. Importantly, the high regulatory load genes are associated to reactions enriched for transport reactions and other pathway entry points, suggesting that they are critical regulatory control points for cell type-specific metabolism. Conclusions By integrating metabolic modelling and epigenomic analysis we have identified high regulatory load as a common feature of metabolic genes at pathway entry points such as transporters within the macrophage metabolic network. Analysis of these control points through further integration of metabolic and gene regulatory networks in various contexts could be beneficial in multiple fields from identification of disease intervention strategies to cellular reprogramming. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-1984-4) contains supplementary material, which is available to authorized users.

reduces the reconstruction time of context-specific networks to the order of seconds [1]. In order to adapt FASTCORE for the integration of transcriptomics data from microarrays, we have developed a new workflow named FASTCORMICS (Additional Figure S1). As inputs FASTCORMICS requires microarray data and a GENRE of the organism of interest. Like FASTCORE, FASTCORMICS is devoid of arbitrary parameter settings and has a low computational demand with overall building times in the order of a few minutes. FASTCORMICS pre-processes microarray data with the discretization tool Barcode [2]. Barcode uses prior knowledge on the intensity distribution of each probe set for a given microarray platform to segregate between expressed and non-expressed genes. The preprocessing step with Barcode allows circumventing setting an arbitrary expression threshold to segregate between expressed and non-expressed genes, which is still commonly done [3][4][5]. As such a threshold is arbitrary and critical for the output metabolic models since due to this threshold complete branches, alternative pathways, or subsystems might be included or excluded, thereby significantly changing the functionalities of the model. Furthermore, Barcode shows a better correlation between predicted expression and protein expression than competing discretization methods and decreases batch and lab-effects that affect measurements [2].
To validate FASTCORMICS we performed an essentiality assay on two generic cancer models that are based on Recon 1 and Recon 2 (cancer1 and cancer2, respectively) and generated by the FASTCORMICS workflow using existing microarray expression data from 59 cancer cell lines [6,7]. The first model (cancer1) is composed of 810 reactions and is therefore bigger than the cancer model previously  Table S1) [5]. The second model (cancer2) is composed of 1322 reactions. All reconstructed models are available in SBML format (Additional File S6). The assays performed on cancer1 and cancer2 predict 183 and 78 genes essential for cell growth, respectively (Additional Table S1).
The predicted essential genes were compared to a list of 8000 genes ranked for essentiality by Luo et al. using a shRNA knockdown screen in several different cancer cell lines [8] to assess the predictive power of the FASTCORMICS models. In general, metabolic genes are slightly overrepresented in the top of the list as shown by Folger et al [5,8], suggesting that metabolic genes are more essential than nonmetabolic genes on average. As expected, the Recon 1 and Recon 2 models, even when further constrained by the medium composition (Additional Table S2, medium composition sheet), allowed identification of only a smaller set of essential genes and their distribution along the ranked list of essential genes was not significantly different from the distribution of all metabolic genes (Additional Table S1). Therefore, the predictive power of the reconstructed context-specific models is much better than either of the original GENREs. In contrast, the distribution of essential genes in the FASTCORMICS cancer models is different from the remaining metabolic genes and shifted towards the top of the ranked list as shown by a one-side KS-test (p-value=0.0314 for cancer1 and p-value=0.0502 for cancer2), demonstrating that FASTCORMICS predictions are much more coherent with the experimental data.
Moreover, comparison of the p-values to those obtained previously using the MBA algorithm (p-value=0.0284) [5,9] suggests that FASTCORMICS performs with similar accuracy but with significantly lower running time (Additional Table S1) Consistently, a permutation test showed that the likelihood of finding a gene set of the same size with a better KS-score by chance is low (p-value=0.0063 for cancer1 and p-value=0.0351 for cancer2). In order to benchmark our workflow we also built cancer models using GIMME [3], iMAT [4] and mCADRE [10]. For GIMME and iMAT, the implementation of the Cobra toolbox [11] was run using as thresholds respectively the 75 and the 25 percentile for high and low expressed genes. For mCADRE the data was first discretized using Barcode [12] and then the implementation provided in the supplementary files of [10] was run. We also compared our workflow to PRIME [13]. PRIME uses microarray data and respective growth rate information to adapt the bounds of the input generic reconstruction. Thus it does not extract a context-specific sub-network from a general reconstruction and thereby differs from FASTCORMICS and the others algorithms discussed in this paper. Building a generic cancer model using PRIME was not possible as there is no generic growth rate. Instead the 32 models built by [14], were used to perform KO assays. 112 genes were essential in at least 90% of the 32 models (in fact these 112 genes were essential in all models). Out of the 112 genes, 81 were found in the ranked list of essential genes by [8] and used for p-value calculation.
We also tested iMat [4], but the algorithm does not guarantee that the biomass function is included in the model and therefore the knockout experiment could not be performed here.
In general, (Additional Table S1), more compact models, i.e. mCADRE cancer model, MBA cancer model, and cancer1 generated with FASTCORMICS, tend to predict a higher number of essential genes, respectively 169, 178 and 183, compared to models with a larger number of reaction, i.e. the GIMME cancer model that includes twice as many reactions as cancer1 and only 69 predicted essential genes.
The aforementioned models also tend to perform better in the KO assay with the exception of mCADRE that identifies essential with a lower rank in the ranked essentiality list of Luo et al [8].
Context-specific models were built for the 59 cell lines integrating Recon1 and the cell line specific expression data with the FASTCORMICS workflow. The medium composition was used to constrain the inputs of the models (only input reactions for metabolites present in the medium were allowed to carry a flux). To obtain lactate secretion rates predictions in fmol/cell/h, the biomass coefficients were multiplied by 550 as described in [15]. Further, the bounds of the obtained models were multiplied with 1.5 to obtain a flux range consistent with the measured lactate rate. In order to guarantee lactate, glucose, oxygen and glutamine exchanges, the respective exchange reactions were added to the core set. To allow quantitative predictions for each context-specific models, the bounds of the inputs reactions of glucose and glutamine were fixed to match the experimental data. Additionally, the maximal uptake respectively production rate of alanine, serine, leucine, lysine, isoleucine, valine, arginine, threonine, tyrosine, phenylalanine, methionine, asparagine, choline, glycine, and tryptophan were constrained according to the experimental data. The uptake rates of cysteine, histidine, and myo-Inositol, which were not reported in the table, were set to zero. Random sampling was performed while optimizing for biomass production.
A solution could not be found for 7 cancer models, with these settings. For the other models a R2 value of 0.7 was obtained, indicating a good correlation of context specific predicted and measured lactate secretion rates.
As a second quality control step, a hypergeometric test showed that the neoplasiaassociated genes retrieved from the DisGeNet database [16] are over-represented in the essential genes of both FASTCORMICS models (Additional Table S3). This indicates that FASTCORMICS can help to identify medically relevant genes. Further, among essential genes predicted in cancer1 and cancer 2 130 (71%) and 46 (59%) were known to be associated to cancer, respectively (DisGeNET [16], CCGD database [17]) or to be already predicted as essential by the generic model from which they were extracted.
Taken together, FASTCORMICS outperforms competing algorithms in speed and therefore allows generating robust high-quality models in a high-throughput manner.
This will enable the use of metabolic modelling as a routine process for the analysis of large microarray data sets across different cell types and contexts.  Table S1: Essentiality testing of different cancer models. Comparison of the number of essential genes found by an in silico essentiality assay to a ranked gene list established by Luo et al. based on the effect of shRNA knock-downs on the proliferation of cancer cells [8].

Confidence levels of the reactions of the macrophage model
In Folger et al. [5] a gene is considered as essential if its knock-down results in a decrease of the growth rate of at least 1%. To allow for a comparison of the different methods the 1% criteria was applied here as well. *The number of essential genes was taken from Additional Table 3 Cancer Cytostatic Genes column KO Growth Rate (relative to WT) of [8].

Output model
Generic model

Additional Figures
Additional Figure S1. FASTCORMICS workflow. Microarray data are discretized with Barcode in expressed (z-score > 5) and unexpressed genes (z-score < 0) that are mapped to the input model according to the Gene-Protein-Reactions rules. The can optionally be run to assign a confidence score to each reaction of the contextspecific output model.
Additional Figure S2: Correlation plot of the predicted lactate secretion rates by context-specific cancer cell models and the lactate secretion rates measured by [19].

Context-speci c m od el
Reaction con dence score composition (optional)