 Research
 Open Access
Deep in the Bowel: Highly Interpretable Neural Encoder-Decoder Networks Predict Gut Metabolites from Gut Microbiome
BMC Genomics volume 21, Article number: 256 (2020)
Abstract
Background
Technological advances in next-generation sequencing (NGS) and chromatographic assays [e.g., liquid chromatography-mass spectrometry (LC-MS)] have made it possible to identify thousands of microbe and metabolite species, and to measure their relative abundance. In this paper, we propose a sparse neural encoder-decoder network to predict metabolite abundances from microbe abundances.
Results
Using paired data from a cohort of inflammatory bowel disease (IBD) patients, we show that our neural encoder-decoder model outperforms linear univariate and multivariate methods in terms of accuracy, sparsity, and stability. Importantly, we show that our neural encoder-decoder model is not simply a black box designed to maximize predictive accuracy. Rather, the network’s hidden layer (i.e., the latent space, comprised only of sparsely weighted microbe counts) actually captures key microbe-metabolite relationships that are themselves clinically meaningful. Although this hidden layer is learned without any knowledge of the patient’s diagnosis, we show that the learned latent features are structured in a way that predicts IBD and treatment status with high accuracy.
Conclusions
By imposing a non-negative weights constraint, the network becomes a directed graph where each downstream node is interpretable as the additive combination of the upstream nodes. Here, the middle layer comprises distinct microbe-metabolite axes that relate key microbial biomarkers with metabolite biomarkers. By preprocessing the microbiome and metabolome data using compositional data analysis methods, we ensure that our proposed multi-omics workflow will generalize to any pair of omics data. To the best of our knowledge, this work is the first application of neural encoder-decoders for the interpretable integration of multi-omics biological data.
Background
The human gut is a complex ecosystem in which host cells and foreign organisms coexist, cooperate, and compete. Suspended in this ecosystem is a milieu of nutrient metabolites that act like a currency, being exchanged and converted by the organisms living in the environment. Technological advances in next-generation sequencing (NGS) and chromatographic assays [e.g., liquid chromatography-mass spectrometry (LC-MS)] have made it possible to identify thousands of microbe and metabolite species, and to measure their relative abundance. By applying NGS and LC-MS on fecal samples, one gains two complementary “views” on the complex ecosystem in which gut bacteria produce, consume, and induce the metabolic milieu. These data modalities have each advanced our understanding of elusive gut pathologies like inflammatory bowel disease [1], and are increasingly being collected in parallel [2–4].
Once rarely encountered, inflammatory bowel disease (IBD) has become a major health burden in developed countries, with its incidence steadily rising since the second world war [5]. IBD is an umbrella term for two distinct clinical syndromes, Crohn’s disease (CD) and ulcerative colitis (UC), that are both characterised by a chronic immunological disturbance in the gastrointestinal (GI) tract, caused by genetic and environmental factors [6, 7]. While CD presents with patchy transmural (deep) inflammation in any part of the GI tract, UC is marked by diffuse mucosal (superficial) inflammation that extends from the rectum through the colon [8]. Although the inflammation in IBD does not have an infectious origin, patients with CD and UC exhibit an irregular gut microbiome, having less bacterial diversity, a depletion of healthy bacteria, and an excess of unhealthy bacteria [5, 9, 10]. These changes have been partly attributed to an abnormal immune response to benign commensal organisms [5]. Microbiome irregularity, called dysbiosis, also associates with concurrent changes in microbial functions [11] and in metabolic profiles [12], that together can disrupt normal gut physiology. For example, Marchesi et al. found low levels of short chain fatty acids in IBD fecal samples [13], which may be related to changes in how the gut bacteria metabolize carbohydrates [1].
Franzosa et al. studied the paired microbial and metabolic profiles of 164 IBD patients and 56 healthy controls, producing one of the largest publicly available multi-omics data sets of its kind [14]. In their multi-omics analysis, the authors report that only 6% of all possible pairwise associations were statistically significant, concluding that metabolites “tend not to associate mechanistically” with the microbiome [14]. However, multi-omics data integration can be approached in several ways, ranging from simple to complex. We organize these approaches into four tiers. The first, and simplest, uses iterative univariate-univariate regressions, e.g., measuring the Pearson’s correlation between a single bacterium and a single metabolite (as done by Franzosa et al. [14]). This straightforward method is implemented in several microbiome-specific software tools [15]. Although pairwise associations can be easy to interpret, they lack the ability to model the additive effect of bacteria (or metabolite) co-occurrence. The second uses iterative univariate-multivariate regressions, e.g., measuring a single bacterium as a function of all metabolites (and vice versa). This method is still easy to interpret, and has been applied to infer gene expression from DNA mutations [16], as well as metabolite variables from the microbiome [2]. The third uses a single multivariate-multivariate regression, such as a canonical correlation (CanCor) analysis. CanCor is a powerful tool that can find a combination of microbes that maximally correlate with a combination of metabolites. This technique has been applied previously to study the relationship between volatile breath metabolites and gut microbiome in IBD patients [17], but its widespread application is limited by high dimensionality [1] (at least without regularization [18]).
Stepping further toward deeper modeling, we explore the fourth tier which leverages a multilayer neural network to model a single multivariate-multivariate regression. The hidden layers of these networks act like a switchboard to connect the input layer with the output layer through a set of intermediate nodes that can learn complex (nonlinear) patterns between the layers. In biology, deep neural networks have been used in many applications, including predicting the expression of 20,000 genes from 1,000 hallmark genes [19]. While useful, the hidden layers of these networks do not have a natural meaning that connects the model to real biological processes. To address this limitation, many neural networks follow the encoder-decoder pattern in which the network has an hourglass shape, featuring a narrow middle layer that compresses the input-output relationship. This layer is regularized to be low dimensional so that it only has enough information space to describe the input-output transformation. As such, all other information is filtered out. This layer divides the network into two specialized parts: the encoder and the decoder. One could think of a generic encoder-decoder network as a neural “signal translator”, designed to turn one data set X into another data set Y, through a middle representation Z.
Encoder-decoder networks have been studied in computer vision in the form of fully convolutional models [20], where an image is encoded into a compact representation that is then decoded into the desired feature map. In biomedical imaging, U-Net [21] has succeeded in segmenting cell images with the encoder and decoder forming the two interacting shafts of a U shape. In biology, autoencoders – a special type of encoder-decoder architecture where the input and output are identical – have been used to cluster yeast, Pseudomonas, and cancer cells, where the hidden layer supposedly provides a biologically meaningful abstraction of the data [22]. However, generic encoder-decoders that transform one data domain to another have not apparently been used for multi-omics data integration. In this generalized form, encoder-decoders can act like a CanCor, predicting multiple outputs from multiple inputs through a latent space. Unlike CanCor, encoder-decoders learn deep nonlinear relationships between the features.
Using the encoder-decoder architecture, we seek to find a good model that can provide simple, descriptive, and verifiable patterns within the multi-omics data. In this manuscript, we introduce our highly interpretable neural encoder-decoder, designed to learn the nonlinear and synergistic relationship between the gut microbiome and their surrounding metabolites. In doing so, we demonstrate that (a) neural networks outperform linear models in microbiome-metabolome predictions, and that (b) network sparsification, along with a non-negative weights constraint, improves the accuracy, stability, and interpretability of the encoder-decoder model. Importantly, we show that our neural encoder-decoder model is not simply a black box designed to maximize predictive accuracy. Rather, the network’s hidden layer (i.e., the latent space, comprised only of sparsely weighted microbe counts) actually captures key microbe-metabolite relationships that are themselves clinically meaningful. Although this hidden layer is learned without any knowledge of the patient’s diagnosis, we show that the learned latent features are structured in a way that predicts IBD and treatment status with high accuracy. Taken together, our work demonstrates that paired multi-omics data can be integrated using a neural network whose hidden layer abstracts a clinically meaningful representation of the input data without any supervision. By preprocessing the data using standard compositional methods, we ensure that our encoder-decoder workflow will generalize to any pair of omics data.
Methods
Data acquisition and processing
We acquired paired microbiome and metabolome data as raw proportions from the supplement of Franzosa et al. [14]. To reduce the dimensionality of the data, we removed features in which more than 50% of the measurements were zero. We replaced the remaining zeros with a very small number using the cmultRepl zero-replacement function implemented in zCompositions, an imputation tool which explicitly models the relative nature of NGS and LC-MS data [23]. Next, we processed the data using one of two pipelines: “Complete” or “Summarized”.
In the “Complete” pipeline, we applied the centered log-ratio (clr) transformation directly to the bacteria species-level and metabolite cluster-level abundances. The clr is a cornerstone in the analysis of compositional data [24–27]:

\[ \text{clr}(x_{i}) = \left[ \ln\frac{x_{i1}}{g(x_{i})}, \ldots, \ln\frac{x_{iD}}{g(x_{i})} \right], \quad g(x_{i}) = \left( \prod_{j=1}^{D} x_{ij} \right)^{1/D} \]

where x_{i} is a sample vector of bacteria or metabolite abundances and g(x_{i}) is its geometric mean. Beyond transforming the data into real numbers, the clr is also convenient for machine learning applications because the “normalization factor” is applied to each sample independent of all other samples, thus preserving test set independence.
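As a minimal sketch of this transformation (function and variable names are our own, not from the original pipeline), the clr can be computed per sample:

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform of one sample of positive abundances.

    clr(x) = log(x) - mean(log(x)): equivalent to dividing by the
    geometric mean before taking logs. Each sample is normalized
    independently of all others, which preserves test-set independence.
    """
    logx = np.log(x)
    return logx - logx.mean()

# A toy sample of four relative abundances (zeros already replaced):
sample = np.array([0.1, 0.2, 0.3, 0.4])
z = clr(sample)
```

Because the transform only uses the sample’s own geometric mean, clr(c·x) = clr(x) for any positive constant c, so proportions and rescaled counts give identical results.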
In the “Summarized” pipeline, we aggregated the bacteria species-level abundance into genus-level abundance by summing across the respective genus members. We also aggregated the metabolite cluster-level abundance into class-level abundance by summing across the respective class members. We retrieved the species-to-genus conversion table from the Integrated Taxonomic Information System (ITIS) (via the R package taxize [28]), and the metabolite-to-class conversion table from the supplement of Franzosa et al. [14]. Features that did not belong to any genus or class were dropped. Table 1 describes the dimensionality of the data before and after processing.
Our motivation: predicting metabolites from microbes
The nature of the relationship between the two data modalities can be explicitly represented if we can find a way to predict one from the other. The ideal model would not only identify correlations between the pairs of data, but would also reveal the mechanism through which they influence each other. These complex processes can be considered from the point of view of regression models that can predict metabolite abundance from bacteria abundance.
We formulate the prediction problem as a search for the parametric function f, with parameter set θ, that takes as input the (clr-transformed) microbe abundances X to estimate the predicted metabolite abundances \(\hat {Y}\) of the real (clr-transformed) observations Y:

\[ \hat{Y} = f_{\theta}(X) \qquad (1) \]

where \(X\in \mathbb {R}^{D_{x}\times N}\) contains D_{x} microbes and \(Y,\hat {Y}\in \mathbb {R}^{D_{y}\times N}\) contain D_{y} metabolites for the same N subjects.
The parameter set θ is estimated by minimizing the error between the predicted and real observations, e.g., using the mean square error (MSE):

\[ \theta^{*} = \arg\min_{\theta} \frac{1}{N} \left\| Y - f_{\theta}(X) \right\|_{F}^{2} \qquad (2) \]
The choice of the function f and the optimization strategy used to solve Equation 2 is the key to having a predictive model with better accuracy, stability, and interpretability. In the next section, we will propose a deep neural network that is specifically designed for this goal. First, let us consider the most direct and straightforward baseline to model the function f: a linear regression (LR) model, where the abundance of each metabolite is predicted as a linear combination of all available microbes:

\[ \hat{Y} = WX + b \]

where \(W\in \mathbb {R}^{D_{y}\times D_{x}}\) is the linear transformation matrix and b is the bias term. The problem in Equation 2 now has the form:

\[ \{W^{*}, b^{*}\} = \arg\min_{W,b} \frac{1}{N} \left\| Y - (WX + b) \right\|_{F}^{2} \]
While being the simplest model, LR suffers from operating on a full transformation matrix. The large number of parameters (i.e., weights) not only makes the LR prone to overfit when the data set is small and high-dimensional, but also makes the model difficult to interpret. To reduce the density of the linear transformation matrix, Lasso regularization constraints can be added to reduce the number of active (i.e., nonzero) weights in the transformation matrix W [29]. With this regularization, the objective function turns into

\[ \{W^{*}, b^{*}\} = \arg\min_{W,b} \frac{1}{N} \left\| Y - (WX + b) \right\|_{F}^{2} + \alpha \left\| W \right\|_{1} \]

where the hyperparameter α controls the sparsity of the model and ∥W∥_{1} is the l_{1}-norm of the linear weights. An example Lasso model is illustrated in Fig. 1b.
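To make the baseline concrete, here is a didactic NumPy sketch of this Lasso objective solved by proximal gradient descent (ISTA); the solver, learning rate, and toy dimensions are our own choices, not the implementation used in the study:

```python
import numpy as np

def lasso_fit(X, Y, alpha, lr=0.01, iters=2000):
    """Sparse multivariate linear regression via proximal gradient (ISTA).

    Minimizes (1/N)||Y - (WX + b)||_F^2 + alpha*||W||_1, with
    X: (D_x, N) microbes and Y: (D_y, N) metabolites (clr-transformed).
    """
    (Dx, N), Dy = X.shape, Y.shape[0]
    W, b = np.zeros((Dy, Dx)), np.zeros((Dy, 1))
    for _ in range(iters):
        R = W @ X + b - Y                               # residuals, (D_y, N)
        W -= lr * (2.0 / N) * (R @ X.T)                 # gradient step on W
        b -= lr * (2.0 / N) * R.sum(axis=1, keepdims=True)
        W = np.sign(W) * np.maximum(np.abs(W) - lr * alpha, 0.0)  # soft-threshold
    return W, b

# Toy data: 5 metabolites driven by only 3 of 20 microbes.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
W_true = np.zeros((5, 20))
W_true[:, :3] = rng.normal(size=(5, 3))
Y = W_true @ X
W, b = lasso_fit(X, Y, alpha=0.1)
```

The soft-thresholding step drives weights whose influence is smaller than α to exactly zero, which is what makes the fitted transformation matrix sparse and hence easier to read.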
Our model: a sparse neural encoder-decoder for data integration
Although simple and easy to interpret, a univariate multi-omics model depends on the major assumption that the processes in which microbes affect metabolites are singular and independent from each other. On the other hand, neural networks can learn many sub-processes of a global process that governs the dynamic, multi-stage interaction between the two modalities.
The traditional method for learning multivariate-multivariate relationships is canonical correlation (CanCor) analysis, though like LR it can only find linear relationships between the data modalities. Instead, we propose to construct a robust and interpretable deep neural network. Our model is designed to relax the key assumption behind LR and CanCor: that there exists a direct and linear relationship between the two data modalities. This relaxation will extend our representation of the predictive model via two hypotheses:

1. There exist intermediate factors that act in the middle of the process that transforms microbes to metabolites.

2. The transformation between these factors may contain nonlinear parts.
The neural encoder-decoder (NED) network
The neural encoder-decoder (NED) architecture aims to predict a multivariate random process using the information from another multivariate process through an intermediate representation called the latent feature space. The part of the network that extracts relevant information from the input (i.e., microbes) into the latent space is called the encoder. The part of the network that predicts the output (i.e., metabolites) from the latent space is called the decoder. The latent feature space is realized as the narrow hidden layer lying between the encoder and decoder. Compared with a direct linear model like linear regression, the encoder-decoder network should have a more robust representation because the weights undergo nonlinear activation. As such, we expect the NED to perform better.
To maintain the interpretability of the model, and to make it more robust to small amounts of training data, we restrict the number of hidden layers to one. Our initial experiments show that a large number of hidden layers does not improve the predictive performance of the NED because the model is easily overfitted on the limited training data.
In operation, the two parts of the model work together sequentially. First, the encoder extracts the relevant information from the microbes X to store in the latent variables Z by an encoding function consisting of a linear kernel W_{e} followed by a fixed nonlinear activation function σ_{e}:

\[ Z = \sigma_{e}(W_{e}X + b_{e}) \]
Second, the decoder decrypts the latent content in Z to predict the value of Y as \(\hat {Y}\) using a similar decoding function:

\[ \hat{Y} = \sigma_{d}(W_{d}Z + b_{d}) \]
The final predictive model is the composition of the encoder and decoder:

\[ \hat{Y} = f_{\theta}(X) = \sigma_{d}\left( W_{d}\, \sigma_{e}(W_{e}X + b_{e}) + b_{d} \right) \]
This model is trained using the loss function:

\[ L(\theta; X, Y) = \frac{1}{N} \left\| Y - f_{\theta}(X) \right\|_{F}^{2} \qquad (3) \]
with the optimized parameter set θ:

\[ \theta^{*} = \arg\min_{\theta} L(\theta; X, Y) \]
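A minimal NumPy sketch of this forward pass follows (the paper fixes tanh for the hidden layer; the linear decoder output is our assumption, since clr-transformed targets are unbounded):

```python
import numpy as np

def ned_forward(X, We, be, Wd, bd):
    """Forward pass of the one-hidden-layer encoder-decoder.

    Encoder: Z = tanh(We @ X + be) maps D_x microbes to D_z latent nodes.
    Decoder: Yhat = Wd @ Z + bd maps latent nodes to D_y metabolites
    (linear output is our assumption; the paper leaves sigma_d generic).
    """
    Z = np.tanh(We @ X + be)
    Yhat = Wd @ Z + bd
    return Z, Yhat

# "Summarized" dimensions: 51 genera -> 70 latent nodes -> 143 classes.
Dx, Dz, Dy, N = 51, 70, 143, 8
rng = np.random.default_rng(0)
We, be = 0.1 * rng.normal(size=(Dz, Dx)), np.zeros((Dz, 1))
Wd, bd = 0.1 * rng.normal(size=(Dy, Dz)), np.zeros((Dy, 1))
Z, Yhat = ned_forward(rng.normal(size=(Dx, N)), We, be, Wd, bd)
```

The tanh saturation keeps every latent node in (−1, 1), which is why the latent space can later be analyzed directly as a bounded feature set.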
Figure 1 compares the computation network of NEDs with their direct linear counterparts. Let \(\{X_{i}\}_{i=1,2,...,D_{x}}\) denote the D_{x} components of X, and let \(\{Y_{j}\}_{j=1,2,...,D_{y}}\) denote the D_{y} components of Y. The number of latent variables D_{z} is a hyperparameter that can be chosen by heuristics (though we set D_{z}=70 a priori). To increase robustness and interpretability, models with fewer connections are preferred. An LR model has C_{LR}=D_{x}·D_{y} connections, while an NED has C_{NED}=D_{x}·D_{z}+D_{z}·D_{y} connections. Thus, we see that when D_{z}≪D_{x},D_{y}, then C_{NED}<C_{LR}. This makes the NED less likely to overfit and easier to interpret. We can further reduce the density of NED networks using a sparsity procedure that eliminates redundant connections between the layers.
Sparsifying the NED network
To further improve the stability and interpretability of the model, we seek to learn an NED model with the fewest number of active weights: a sparser neural network where most weights equal zero. Research into sparser neural networks has achieved major advancements recently. This line of work is motivated by the intuition that deep networks are generally overcomplete, and sparse networks could reduce computational overhead [30]. Furthermore, sparser deep networks are also known to ease the explainability of the underlying process [31, 32]. The major approaches to sparser networks work by either pruning unnecessary weights [33] or enforcing sparsity constraints as an additional regulatory loss [34]. To sparsify NED into a new model, called SparseNED, we use a pre-training screening method that is similar to the one-shot pruning strategy first proposed by Lee et al. in [35].
The learning of SparseNED consists of two stages: screening and training. In the screening stage, we identify the connections that are most useful in extracting the information needed to predict metabolite abundance from microbe abundance. All other links are marked as unnecessary. In the training stage, the network is trained with these redundant links deactivated from the forward and backward operations.
The screening stage starts with one training iteration on the fully connected model, where the derivative g(w;D) of the loss L (from Equation 3) is estimated through backpropagation on a sample set of data D. These derivatives are the key to evaluating the importance of a connection because when the derivative at a connection has a high magnitude, that connection will have a measurable effect on the loss, and hence be more salient to the prediction of metabolite abundance. Slightly different from [35] – where only a minibatch is used as D for sensitivity calculation – we use all of the available training data instead. The saliency of each connection s_{c} is calculated as the normalized magnitude of the derivatives:

\[ s_{c} = \frac{\left| g_{c}(w; D) \right|}{\sum_{c'=1}^{C} \left| g_{c'}(w; D) \right|} \]
Next, the connections in the network are sorted in descending order of s_{c}, and only the top-k connections are kept for network training and inference. The number of kept connections k is controlled through the sparsity level β, defined relative to the total number of connections in the fully connected network. The hyperparameter β balances the accuracy and sparsity of the model. In our experiment, we choose a β so that the number of active (i.e., nonzero) connections is on par with other sparse models in the comparison. Using the remaining connections, the SparseNED is trained via backpropagation. An example SparseNED is illustrated in Fig. 1d.
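The screening stage can be sketched as follows, using random arrays as stand-ins for the gradients obtained from the single backpropagation pass (function names and the toy β are our own):

```python
import numpy as np

def screen_connections(grads, beta):
    """One-shot screening: keep the top beta fraction of connections.

    grads: per-layer weight-gradient arrays from a single backward pass
    over all training data. Saliency s_c is the normalized gradient
    magnitude; connections below the top-k cutoff are masked to zero.
    """
    total = float(sum(np.abs(g).sum() for g in grads))
    flat = np.concatenate([np.abs(g).ravel() for g in grads]) / total
    k = int(beta * flat.size)                  # number of connections kept
    cutoff = np.sort(flat)[-k]
    return [(np.abs(g) / total >= cutoff).astype(float) for g in grads]

# Encoder (70 x 51) and decoder (143 x 70) gradients, here random stand-ins:
rng = np.random.default_rng(0)
grads = [rng.normal(size=(70, 51)), rng.normal(size=(143, 70))]
masks = screen_connections(grads, beta=0.05)
kept = int(sum(m.sum() for m in masks))        # ~5% of the 13,580 connections
```

During training, the masks would multiply the weight matrices in both the forward and backward passes, deactivating the redundant links.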
Interestingly, a hidden node could lose all incoming connections, causing it to become isolated. For example, this could happen when there were more hidden nodes than needed to represent all of the latent information. Although we did not expect this for our experiment, node isolation would be favorable because it would mean that the multimodal relationship discovered by the model is of a lower rank (i.e., represented by fewer variables in total); a lower rank makes for a more interpretable model. Node isolation could also make the SparseNED model more resilient to changes in the size of the latent space: if a practitioner specifies too many hidden nodes, node isolation could cause the excess nodes to automatically deactivate during training, leaving only the necessary nodes.
The non-negative weights constraint
With a small number of active connections, SparseNED is significantly easier to understand than a fully connected NED. However, among the remaining connections, many of them spontaneously have negative weights. These negative connections in neural networks have been understood to inhibit an analysis of how the factors from one layer affect another [36]. To improve the interpretability of a network, non-negative constraints can be used to prevent the weights from falling below zero [37]. In our application, the non-negative weights provide a clearer meaning to how microbe abundances contribute to metabolite abundances, both in the encoder and the decoder, because the contribution is always positive.
To implement this insight, we apply the non-negative constraint on the SparseNED model by clamping the intermediate estimation of the parameters to [0,∞) at every training iteration:

\[ \theta_{i}^{t} \leftarrow \max\left( 0, \theta_{i}^{t} \right) \]

where \(\theta _{i}^{t}\) is the i-th parameter of the model at iteration t in training.
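A minimal sketch of this projection step, applied to the parameter arrays after every gradient update (names hypothetical):

```python
import numpy as np

def clamp_nonnegative(params):
    """Project parameters onto [0, inf) after each gradient update,
    implementing theta_i^t <- max(0, theta_i^t) element-wise."""
    return [np.maximum(p, 0.0) for p in params]

# Toy weight matrix before and after the projection:
W = np.array([[0.3, -0.2],
              [-0.1, 0.5]])
W_pos, = clamp_nonnegative([W])
```

Because the projection zeroes any weight that a gradient step pushes negative, it also interacts with sparsification: competing nodes lose connections outright rather than flipping sign.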
In our experiment, we observe that non-negative constraints not only promote interpretability, but also have a beneficial effect on the overall fitness and sparsity of the model. When weights are forced to be positive or zero, the nodes in each layer tend to compete with each other for influence. This actually makes the SparseNED even more sparse.
Model evaluation
With the goal of deeply understanding the bacteria-metabolite relationship, we want a model that is not only accurate, but also highly interpretable, and whose interpretation is stable across different folds of the data. In this section, we discuss the three criteria used to evaluate our predictive models: accuracy, sparsity, and stability.
Accuracy
The accuracy of a predictive model f_{θ} is calculated by the Pearson correlation coefficient between the predicted metabolite abundance \(\hat {Y}=f_{\theta }(X)\) and the measured abundance Y for the top 10 best predicted metabolites. Compared to intensity-based criteria, the correlation coefficient is not influenced by the scales used in the signal and is reliable across different data normalization methods. To mitigate the influence of spurious correlation, correlations are always calculated using the clr-transformed data. For all experiments, we use a five-fold cross-validation scheme that validates the model on five different 80%-20% training-test set splits, and report the average accuracy.
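A sketch of this accuracy criterion (function names and toy data are our own):

```python
import numpy as np

def top10_accuracy(Y, Yhat):
    """Mean Pearson correlation over the 10 best-predicted metabolites.

    Y, Yhat: (D_y, N) measured and predicted clr-transformed abundances.
    Each metabolite gets its own per-row correlation; the score averages
    the 10 largest.
    """
    r = np.array([np.corrcoef(Y[j], Yhat[j])[0, 1] for j in range(Y.shape[0])])
    return float(np.sort(r)[-10:].mean())

# Toy example: 12 metabolites over 30 samples, predictions = truth + noise.
rng = np.random.default_rng(0)
Y = rng.normal(size=(12, 30))
acc = top10_accuracy(Y, Y + 0.1 * rng.normal(size=Y.shape))
```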
Sparsity
The sparsity of f_{θ} is measured by the number of active linear connections \(C_{f_{\theta }}\) in the model. In the case of NED and its variations, it includes the weights from both the encoder and decoder network sections.
Stability
While accuracy and sparsity are desired properties of a predictive model, their performance and interpretation are only reliable when they are consistent across changes in the training set composition. For each model family, we evaluate the stability of predictive models by measuring the average pairwise similarity [38] of the model parameters θ across different training sets. Specifically, we divide the data set into 5 folds and learn an instance of the model for each fold. Then, for each pair of model instances, we calculate the Pearson’s correlation coefficient between the model parameters. The main measurement for the stability of a method, the stability index, is calculated as the average similarity between all pairs of model instances within the model family.
To concentrate on the stability of the model architecture, we calculate the stability index using the binary adjacency matrix of the connections between each layer in the model. For fully connected models (e.g., a linear regression or CanCor) – where every factor in a layer affects every factor in the subsequent layer – the binary adjacency matrix contains all ones, so the stability index trivially equals one and is uninformative. For sparse methods (e.g., Lasso and NED), the stability index measures the consistency and reliability of the cross-layer connections. For multi-layer models like NED and its variants, we use the first model instance to initialize the training of the other instances so that we preserve the correspondence of the mid-layer variables (e.g., the latent variable “V5” in fold 1 is the same as the latent variable “V5” in fold 2).
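The stability index can be sketched as follows; a family of identical sparse architectures scores exactly one (function names and toy data are our own):

```python
import numpy as np

def stability_index(adjacencies):
    """Average pairwise Pearson correlation between the flattened binary
    adjacency matrices of model instances trained on different folds."""
    flats = [a.ravel().astype(float) for a in adjacencies]
    corrs = [np.corrcoef(flats[i], flats[j])[0, 1]
             for i in range(len(flats)) for j in range(i + 1, len(flats))]
    return float(np.mean(corrs))

# Toy case: 5 folds that all learned the same sparse encoder architecture.
rng = np.random.default_rng(0)
mask = (rng.random((70, 51)) < 0.05).astype(float)   # one sparse adjacency
identical = [mask.copy() for _ in range(5)]
idx = stability_index(identical)
```

Folds that keep different sets of connections would pull the index toward zero, flagging an architecture whose interpretation does not replicate.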
Interpretation of network layers
The neural network contains three layers: the microbe input layer, the hidden (i.e., latent) layer, and the metabolite output layer. In order to understand the nature of the latent space, we performed a separate analysis of each layer and compared their results. All microbe and metabolite analyses were performed on the clr-transformed data, while the latent space analyses were performed on the unaltered (i.e., tanh-compressed) data.
Differential abundance (DA) analysis
We performed an analysis of variance (ANOVA) of the microbe, metabolite, and latent space feature sets (separately) for the three experimental groups [ulcerative colitis (UC), Crohn’s disease (CD), and healthy control (HC)]. We consider any feature with an FDR-adjusted p-value p<0.05 to be significant. Note that the prior clr-transformation makes univariate statistical testing of relative data valid, so long as the results get interpreted with regard to the reference used [26, 27].
Redundancy analysis (RDA)
We performed a redundancy analysis (RDA) of the microbe, metabolite, and latent space feature sets (separately) using the rda function from the vegan R package [39].
Random forests
We used the microbe, metabolite, and latent space features to train a random forest classifier to predict several two-factor outcomes (see Results and Discussion). Random forest models were trained using the randomForest function from the randomForest R package [40] without any feature selection or hyperparameter tuning. For each feature space, and for each outcome, we compared the average “out-of-the-box” AUC across 25 randomly subsampled test set splits using the plMonteCarlo function from the exprso R package [41].
Results and discussion
Summarization preserves data structure
The data sets produced by high-throughput molecular assays like NGS and LC-MS often have many more features than samples. In statistics, high dimensionality is a problem because the likelihood of a false discovery increases with each additional test. In machine learning, high dimensionality increases the likelihood of an overfit and also makes the resultant model more difficult to interpret. Therefore, it is sensible to reduce the number of features before training a model. After removing zero-laden features, we used domain knowledge to aggregate the remaining features into biologically meaningful groups. For metabolites, we used functional classes; for bacteria, we used assigned genus. Table 1 describes the dimensionality of the data before and after processing.
Figure 2 shows the first two principal components for the metabolite data (top) and microbiome data (bottom), as processed using the “Complete” (left) and “Summarized” (right) pipelines. Here, we see that using domain knowledge to summarize the metabolite clusters into classes, and the bacteria species into genera, does not appear to alter the fundamental structure of the data. To quantify the agreement between the two pipelines, we calculated the Pearson’s correlation between the “Complete” inter-sample distances and the “Summarized” inter-sample distances. We find that the “Summarized” inter-sample distances agree with the “Complete” distances for both metabolites (ρ=.903) and microbes (ρ=.674). The greater incoherence observed for the microbe data is consistent with the understanding that different bacteria species within the same genus can occupy distinct ecological niches.
Otherwise, we note that the IBD patients cluster more distinctly according to their gut metabolites than their gut bacteria. Indeed, a differential abundance analysis of the metabolic data reveals 128 (out of 143) significant metabolite classes, compared with 15 (out of 51) significant bacterial genera (see Supplement for a complete list). Table 2 shows the average clr-transformed abundances for select bacteria genera, chosen because they were found previously to associate with IBD. Two genera, Ruminococcus and Fusobacterium, show an association that agrees with past literature [42].
Microbe abundance predicts metabolite abundance
Since metabolites and bacteria both associate with IBD, it is meaningful to explore their mutual dependence. In the Introduction, we described four approaches to integrating multi-omics data: univariate-univariate regression, univariate-multivariate regression, multivariate-multivariate regression, and neural networks. Franzosa et al. reported weak univariate-univariate regressions for these data [14]. Here, we evaluate the performance of the other multi-omics approaches by benchmarking the performance of microbe-metabolite predictive models with 5-fold cross-validation. The quantitative evaluation is done using the three criteria described in “Model evaluation” section.
Neural encoder-decoders outperform linear regression
We do not expect that the microbiome alone can predict the abundance of all metabolites. Rather, we are motivated to answer two research questions: (1) Which metabolites can be predicted by the microbiome? and (2) How reliable is that prediction? Table 3 and Table 4 show the performance of the microbe-metabolite prediction models for the “Complete” and “Summarized” data, respectively. The tables are organized by the multi-omics integration scheme used: univariate-multivariate, multivariate-multivariate, or neural network. To answer the two research questions, we compare the accuracy of each model for the top 10 best predictions, along with its sparsity and stability.
Sparsity improves accuracy and interpretability
Without regularization, it is easy for a model to overfit. Therefore, it is no surprise that the sparse linear regression (i.e., LASSO) and the sparse neural encoder-decoder outperform their fully connected counterparts. Nevertheless, in both cases, the neural encoder-decoder outperforms linear regression (although LASSO performs impressively well).
Non-negative weights improve interpretability
With the default linear regression and the neural encoder-decoder network, weights can take on any value. Here, we propose using a non-negative weights constraint to improve the interpretability of the model. For a neural network, this constraint means that when an input node contributes to a hidden node, that contribution is always additive. This allows us to interpret the hidden layer as an aggregation of the input activity. Likewise, each output node is computed as an aggregation of the hidden node activity. Combined with sparsity, the non-negative weights constraint ensures that each node is the simple sum of just a few elements. Table 3 and Table 4 show that this constraint does not seriously reduce accuracy, though it advantageously does force the network to be even more sparse (see Active links column).
The latent space is clinically coherent
The latent space of the encoder-decoder network is a learned abstraction of the input data, designed to describe how gut microbes associate with gut metabolites across all patients. As such, variance within the hidden layer reflects inter-patient variance along the microbe-metabolite axis. We focus this section on the sparse and non-negative neural encoder-decoder model, trained using the “Summarized” data, because we think it nicely balances accuracy with interpretability. For this neural network, the value of each hidden node equals tanh(x·w + b), where x is the per-sample microbe abundances, w is the weights associated with each microbe, and b is an offset. Together, these hidden nodes comprise a new feature space, learned in a fully unsupervised way, that can be analyzed directly with routine statistical modelling.
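Concretely, the latent features for a batch of samples come from one matrix product and a squashing function (toy dimensions and random values here; in the actual model x would hold the clr-transformed microbe abundances):

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_microbes, n_hidden = 5, 8, 3

x = rng.normal(size=(n_samples, n_microbes))         # per-sample microbe abundances
w = np.abs(rng.normal(size=(n_microbes, n_hidden)))  # non-negative encoder weights
b = np.zeros(n_hidden)                               # per-node offset

h = np.tanh(x @ w + b)   # hidden layer: one latent feature per node, per sample
```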
The latent space associates with IBD
For each of the 70 nodes in the hidden layer, we can compute the variance across all patients: some nodes are more variable than others. We can also compute the amount of weight coming in and out of each node: some nodes have more “traffic” than others. High-traffic nodes strongly relate multiple microbes to multiple metabolites, while low-traffic nodes describe fewer or subtler relationships. Figure 3 plots the node variance against the node traffic. Here, we see that the high-variance nodes are usually the high-traffic nodes, suggesting that the nodes which model the strongest microbe-metabolite interactions also vary the most between individual patients. An ANOVA of the latent features reveals that the high-variance, high-traffic nodes significantly associate with IBD (FDR-adjusted p-value < 0.05). Since the latent space is an abstraction of the relationship between microbes and metabolites, its association with IBD suggests that the microbe-metabolite relationship is itself associated with IBD.
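Given the trained weight matrices, both quantities reduce to simple array reductions. The sketch below uses random stand-ins, and takes “traffic” to mean the total weight entering and leaving a node, which is our reading of the text:

```python
import numpy as np

rng = np.random.default_rng(2)
H = np.tanh(rng.normal(size=(20, 70)))     # latent features: 20 patients x 70 nodes
W_in = np.abs(rng.normal(size=(50, 70)))   # microbe -> hidden weights (toy)
W_out = np.abs(rng.normal(size=(70, 40)))  # hidden -> metabolite weights (toy)

node_variance = H.var(axis=0)                        # spread across patients
node_traffic = W_in.sum(axis=0) + W_out.sum(axis=1)  # weight in + weight out
```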
The latent space is a noise filter
While an ANOVA allows us to measure the statistical association between the latent space and IBD, we can further understand the clinical relevance of the latent space with a redundancy analysis (RDA). RDA is a principal component analysis that constrains the principal axes so that they describe the part of the latent space that is also explained by a “conditioning matrix”, L. Here, the conditioning matrix contains the clinical covariates: age, fecal calprotectin, diagnosis, antibiotic use, immunosuppressant use, mesalamine use, and steroid use. In the left panel of Fig. 4, we see that the first RDA axis involves multiple correlated nodes that all associate strongly with the IBD diagnosis (CD vs. HC vs. UC). Although medication use is confounded by diagnosis, we see that the off-axis node “V21” is perfectly anti-correlated with antibiotic use (and CD), while the off-axis node “V66” is correlated with it. The right panel shows box plots for some of the longest first-axis and off-axis arrows. By looking at the RDA eigenvalues, we know that 26.9% of the latent space can be explained by the clinical covariates in L. This is more than the 12% of the microbe data, and the 23.6% of the metabolite data, explained by L. In other words, a higher fraction of the hidden layer is explained by the clinical covariates than of the input or output layers. This finding supports two hypotheses. First, the relationship between microbes and metabolites is itself associated with IBD. Second, the encoder-decoder latent space acts like a noise filter that can extract the clinically relevant part of the source data in a fully unsupervised way.
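The “fraction explained” part of an RDA amounts to an ordinary least-squares fit of the latent features on the covariates, followed by a variance ratio. A numpy sketch with random data (this skips the constrained-PCA step of a full RDA, which R's vegan package implements):

```python
import numpy as np

rng = np.random.default_rng(3)
H = rng.normal(size=(30, 10))   # latent features: 30 patients x 10 nodes
L = rng.normal(size=(30, 4))    # clinical covariates (age, diagnosis, ...)

# Fit H from L by least squares (with intercept), then compare variances.
L1 = np.column_stack([np.ones(len(L)), L])
beta, *_ = np.linalg.lstsq(L1, H, rcond=None)
H_hat = L1 @ beta

ss_total = ((H - H.mean(axis=0)) ** 2).sum()
ss_explained = ((H_hat - H.mean(axis=0)) ** 2).sum()
explained_fraction = ss_explained / ss_total   # the analogue of the reported 26.9%
```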
The latent space is discriminative
From our differential abundance analyses, we know that the microbes, metabolites, and latent features all associate with IBD (though there are more metabolite associations than microbe associations). Although we have shown that the latent space is clinically coherent, we want to further demonstrate its discriminative power in classification tasks. Table 5 shows the average “out-of-the-box” AUC for binary classifiers trained on 25 randomly subsampled training sets. For most outcomes, the latent space classifier performs at least as well as the microbe classifier. However, when predicting antibiotic use and immunosuppressant use, the hidden layer is actually more predictive than either the microbe or metabolite abundances.
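AUC has a simple rank interpretation (the probability that a randomly chosen positive sample scores above a randomly chosen negative one), which can be computed directly. This is a self-contained helper, not the classifier pipeline used for Table 5:

```python
import numpy as np

def auc(scores, labels):
    """Rank-based AUC: P(random positive outscores random negative); ties count half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum() \
         + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

perfect = auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])   # 1.0
chance = auc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0])    # 0.5
```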
The latent space is interpretable
Figure 5 shows a three-layer graph relating microbes to metabolites, built using the edge overlap across all 5 training-set folds. The middle layer contains latent variables that weight the microbe abundances so that they maximally predict the metabolite abundances. The graph reveals a general structure: the top half describes how the microbes that are enriched in healthy guts predict the metabolites that are also enriched in healthy guts, while the bottom half describes how the microbes that are depleted in healthy guts predict the metabolites that are also depleted in healthy guts. Ruminococcus and Fusobacterium are both replicated IBD biomarkers, and the latent variables that relate them to metabolites also associate with IBD.
In the upper graph, we see how Ruminococcus (among others) contributes to 6 latent variables which go on to predict several healthy metabolite signatures, including tropane alkaloids and steroidal saponins, as well as other plant-derived compounds. It is interesting, though perhaps not surprising, that some of the plant-derived compounds enriched in the healthy gut have known medicinal properties [43, 44]. In the lower graph, we see how Fusobacterium (among others) contributes to a single latent variable; this node, V61, is highly predictive of the abundance of bile acids, alcohols, and their derivatives. This finding is consistent with the literature, which suggests that the bile acid conjugate taurine is a substrate for bacterial metabolism, and that a defect in the detoxification of taurine by-products is associated with ulcerative colitis [45].
Conclusions
Inflammatory bowel disease (IBD) presents a major health burden to developed countries. Although IBD is not infectious, patients with Crohn’s disease (CD) and ulcerative colitis (UC) exhibit an abnormal gut microbiome as well as an altered gut metabolome. In this manuscript, we propose a neural encoder-decoder model to learn a set of weighted connections that can predict metabolite abundances using only microbe abundances. We show that this neural network outperforms linear models for microbiome-metabolome predictions, and that sparsification, along with a non-negative weights constraint, further improves the accuracy, stability, and interpretability of the encoder-decoder model. Importantly, the neural encoder-decoder model is not simply a black box designed to maximize predictive accuracy. Rather, the hidden layer of the model can help visualize the predictive relationship between microbes and metabolites. Moreover, the learned latent feature space (i.e., the hidden nodes themselves) appears to structure the data in a clinically coherent way: the latent space associates with, and predicts, IBD diagnosis and medication use. Our finding suggests that the microbe-metabolite axis itself, not just the microbes and metabolites alone, is an IBD-specific biomarker signature. To the best of our knowledge, this work is the first application of neural encoder-decoders for the interpretable integration of multi-omics biological data.
Availability of data and materials
The neural encoder-decoder model is available from https://github.com/vuongle2/BiomeNED. The transformed data used to train the model, the latent feature space, and the differential abundance results are available from https://doi.org/10.5281/zenodo.3255151. The raw data are available from the supplement of Franzosa et al. [14].
Abbreviations
ANOVA: Analysis of variance
AUC: Area under receiver operating curve
CanCor: Canonical correlation
CD: Crohn’s disease
clr: Centered log-ratio
FDR: False discovery rate
GI: Gastrointestinal
HC: Healthy control
IBD: Inflammatory bowel disease
LC-MS: Liquid chromatography-mass spectrometry
LR: Linear regression
MSE: Mean square error
NED: Neural encoder-decoder
NGS: Next-generation sequencing
RDA: Redundancy analysis
UC: Ulcerative colitis
References
 1
Segal JP, Mullish BH, Quraishi MN, Acharjee A, Williams HRT, Iqbal T, Hart AL, Marchesi JR. The application of omics techniques to understand the role of the gut microbiota in inflammatory bowel disease. Ther Adv Gastroenterol. 2019; 12:175628481882225. https://doi.org/10.1177/1756284818822250.
 2
Tang ZZ, Chen G, Hong Q, Huang S, Smith HM, Shah RD, Scholz M, Ferguson JF. Multi-omic analysis of the microbiome and metabolome in healthy subjects reveals microbiome-dependent relationships between diet and metabolites. Front Genet. 2019; 10:454. https://doi.org/10.3389/fgene.2019.00454.
 3
Lloyd-Price J, Arze C, Ananthakrishnan AN, Schirmer M, Avila-Pacheco J, Poon TW, Andrews E, Ajami NJ, Bonham KS, Brislawn CJ, Casero D, Courtney H, Gonzalez A, Graeber TG, Hall AB, Lake K, Landers CJ, Mallick H, Plichta DR, Prasad M, Rahnavard G, Sauk J, Shungin D, Vázquez-Baeza Y, White RA, Braun J, Denson LA, Jansson JK, Knight R, Kugathasan S, McGovern DPB, Petrosino JF, Stappenbeck TS, Winter HS, Clish CB, Franzosa EA, Vlamakis H, Xavier RJ, Huttenhower C. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature. 2019; 569(7758):655.
 4
Yachida S, Mizutani S, Shiroma H, Shiba S, Nakajima T, Sakamoto T, Watanabe H, Masuda K, Nishimoto Y, Kubo M, Hosoda F, Rokutan H, Matsumoto M, Takamaru H, Yamada M, Matsuda T, Iwasaki M, Yamaji T, Yachida T, Soga T, Kurokawa K, Toyoda A, Ogura Y, Hayashi T, Hatakeyama M, Nakagama H, Saito Y, Fukuda S, Shibata T, Yamada T. Metagenomic and metabolomic analyses reveal distinct stage-specific phenotypes of the gut microbiota in colorectal cancer. Nat Med. 2019; 25(6):968.
 5
Manichanh C, Borruel N, Casellas F, Guarner F. The gut microbiota in IBD. Nat Rev Gastroenterol Hepatol. 2012; 9(10):599–608.
 6
Hansen JJ, Sartor RB. Therapeutic Manipulation of the Microbiome in IBD: Current Results and Future Approaches. Curr Treat Options Gastroenterol. 2015; 13(1):105–20.
 7
Jostins L, Ripke S, Weersma RK, Duerr RH, McGovern DP, Hui KY, Lee JC, Schumm LP, Sharma Y, Anderson CA, Essers J, Mitrovic M, Ning K, Cleynen I, Theatre E, Spain SL, Raychaudhuri S, Goyette P, Zhi Wei Z, Abraham C, Achkar JP, Ahmad T, Amininejad L, Ananthakrishnan AN, Andersen V, Andrews JM, Baidoo L, Balschun T, Bampton PA, Bitton A, Boucher G, Brand S, Büning C, Cohain A, Cichon S, D’Amato M, Jong DD, Devaney KL, Dubinsky M, Edwards C, Ellinghaus D, Ferguson LR, Franchimont D, Fransen K, Gearry R, Georges M, Gieger C, Glas J, Haritunians T, Hart A, Hawkey C, Hedl M, Hu X, Karlsen TH, Kupcinskas L, Kugathasan S, Latiano A, Laukens D, Lawrance IC, Lees CW, Louis E, Mahy G, Mansfield J, Morgan AR, Mowat C, Newman W, Palmieri O, Ponsioen CY, Potocnik U, Prescott NJ, Regueiro M, Rotter JI, Russell RK, Sanderson JD, Sans M, Satsangi J, Schreiber S, Simms LA, Sventoraityte J, Targan SR, Taylor KD, Tremelling M, Verspaget HW, De Vos M, Wijmenga C, Wilson DC, Winkelmann J, Xavier RJ, Zeissig S, Zhang B, Zhang CK, Zhao H, Silverberg MS, Annese V, Hakonarson H, Brant SR, RadfordSmith G, Mathew CG, Rioux JD, Schadt EE, Daly MJ, Franke A, Parkes M, Vermeire S, Barrett JC, Cho JH. Hostmicrobe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature. 2012; 491(7422):119–24.
 8
Xavier RJ, Podolsky DK. Unravelling the pathogenesis of inflammatory bowel disease. Nature. 2007; 448(7152):427–34.
 9
Duvallet C, Gibbons SM, Gurry T, Irizarry RA, Alm EJ. Meta-analysis of gut microbiome studies identifies disease-specific and shared responses. Nat Commun. 2017; 8(1):1784.
 10
Halfvarson J, Brislawn CJ, Lamendella R, VázquezBaeza Y, Walters WA, Bramer LM, D’Amato M, Bonfiglio F, McDonald D, Gonzalez A, McClure EE, Dunklebarger MF, Knight R, Jansson JK. Dynamics of the human gut microbiome in inflammatory bowel disease. Nat Microbiol. 2017; 2:17004.
 11
Morgan KC, Tickle TL, Sokol H, Gevers D, Devaney KL, Ward DV, Reyes JA, Shah SA, LeLeiko N, Snapper SB, Bousvaros A, Korzenik J, Sands BE, Xavier RJ, Huttenhower C. Dysfunction of the intestinal microbiome in inflammatory bowel disease and treatment. Genome Biol. 2012; 13(9):R79.
 12
Larsen PE, Dai Y. Metabolome of human gut microbiome is predictive of host dysbiosis. GigaScience. 2015; 4:42.
 13
Marchesi JR, Holmes E, Khan F, Kochhar S, Scanlan P, Shanahan F, Wilson ID, Wang Y. Rapid and Noninvasive Metabonomic Characterization of Inflammatory Bowel Disease. J Proteome Res. 2007; 6(2):546–51.
 14
Franzosa EA, Sirota-Madi A, Avila-Pacheco J, et al. Gut microbiome structure and metabolic activity in inflammatory bowel disease. Nat Microbiol. 2018; 4(2):293–305. https://doi.org/10.1038/s41564-018-0306-4.
 15
You Y, Liang D, Wei R, Li M, Li Y, Wang J, Wang X, Zheng X, Jia W, Chen T. Evaluation of metabolite-microbe correlation detection methods. Anal Biochem. 2019; 567:106–11.
 16
Gamazon ER, Wheeler HE, Shah K, Mozaffari SV, Aquino-Michaels K, Carroll RJ, Eyler AE, Denny JC, Nicolae DL, Cox NJ, Im HK. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet. 2015; 47(9):1091–8.
 17
Smolinska A, Tedjo DI, Blanchet L, Bodelier A, Pierik MJ, Masclee AAM, Dallinga J, Savelkoul PHM, Jonkers DMAE, Penders J, van Schooten FJ. Volatile metabolites in breath strongly correlate with gut microbiome in CD patients. Analytica Chimica Acta. 2018; 1025:1–11.
 18
Meng C, Zeleznik OA, Thallinger GG, Kuster B, Gholami AM, Culhane AC. Dimension reduction techniques for the integrative analysis of multi-omics data. Brief Bioinform. 2016; 17(4):628–41.
 19
Chen Y, Li Y, Narayan R, Subramanian A, Xie X. Gene expression inference with deep learning. Bioinformatics (Oxford, England). 2016; 32(12):1832–39.
 20
Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE: 2015. https://doi.org/10.1109/cvpr.2015.7298965.
 21
Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. In: Lecture Notes in Computer Science. Springer: 2015. p. 234–41. https://doi.org/10.1007/978-3-319-24574-4_28.
 22
Ching T, Himmelstein DS, BeaulieuJones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow PM, Zietz M, Hoffman MM, Xie W, Rosen GL, Lengerich BJ, Israeli J, Lanchantin J, Woloszynek S, Carpenter AE, Shrikumar A, Xu J, Cofer EM, Lavender CA, Turaga SC, Alexandari AM, Lu Z, Harris DJ, DeCaprio D, Qi Y, Kundaje A, Peng Y, Wiley LK, Segler MHS, Boca SM, Swamidass SJ, Huang A, Gitter A, Greene CS. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interf. 2018; 15:141.
 23
Palarea-Albaladejo J, Martín-Fernández JA. zCompositions: R package for multivariate imputation of left-censored data under a compositional approach. Chemometr Intell Lab Syst. 2015; 143:85–96. https://doi.org/10.1016/j.chemolab.2015.02.019.
 24
Aitchison J. The Statistical Analysis of Compositional Data. London: Chapman & Hall, Ltd.; 1986.
 25
van den Boogaart KG, Tolosana-Delgado R. Introduction. In: Analyzing Compositional Data with R. Berlin: Springer; 2013. p. 1–12.
 26
Fernandes AD, Reid JNS, Macklaim JM, McMurrough TA, Edgell DR, Gloor GB. Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. Microbiome. 2014; 2:15.
 27
Quinn TP, Erb I, Richardson MF, Crowley TM. Understanding sequencing data as compositions: an outlook and review. Bioinformatics. 2018; 34(16):2870–78.
 28
Chamberlain SA, Szöcs E. taxize: taxonomic search and retrieval in R. F1000Research. 2013; 2:191.
 29
Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol). 1996; 58(1):267–88.
 30
Frankle J, Carbin M. The lottery ticket hypothesis: Finding sparse, trainable neural networks. In: International Conference on Learning Representations: 2019.
 31
Subramanian A, Pruthi D, Jhamtani H, Berg-Kirkpatrick T, Hovy E. SPINE: Sparse interpretable neural embeddings. In: Thirty-Second AAAI Conference on Artificial Intelligence: 2018.
 32
Ribeiro MT, Singh S, Guestrin C. “Why should I trust you?”: Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM: 2016. p. 1135–44.
 33
Han S, Pool J, Tran J, Dally W. Learning both weights and connections for efficient neural network. In: Advances in neural information processing systems: 2015. p. 1135–1143.
 34
Louizos C, Welling M, Kingma DP. Learning sparse neural networks through l_0 regularization. In: International Conference on Learning Representations: 2018.
 35
Lee N, Ajanthan T, Torr P. SNIP: Single-shot network pruning based on connection sensitivity. In: International Conference on Learning Representations: 2019.
 36
Chorowski J, Zurada JM. Learning understandable neural networks with nonnegative weight constraints. IEEE Trans Neural Netw Learn Syst. 2014; 26(1):62–9.
 37
Zeng X, He Z, Yu H, Qu S. Bidirectional nonnegative deep model and its optimization in learning. J Optim. 2016; 2016:1–8. https://doi.org/10.1155/2016/5975120.
 38
Kalousis A, Prados J, Hilario M. Stability of feature selection algorithms. In: Fifth IEEE International Conference on Data Mining (ICDM’05). IEEE: 2005.
 39
Oksanen J, Blanchet FG, Friendly M, Kindt R, Legendre P, McGlinn D, Minchin PR, O’Hara RB, Simpson GL, Solymos P, Stevens MHH, Szoecs E, Wagner H. vegan: Community Ecology Package. 2019.
 40
Liaw A, Wiener M. Classification and Regression by randomForest. R News. 2002; 2(3):18–22.
 41
Quinn T, Tylee D, Glatt S. exprso: an R-package for the rapid implementation of machine learning algorithms. F1000Research. 2017; 5:2588.
 42
Kostic AD, Xavier RJ, Gevers D. The microbiome in inflammatory bowel disease: current status and the future ahead. Gastroenterology. 2014; 146(6):1489–99.
 43
Kohnen-Johannsen KL, Kayser O. Tropane alkaloids: chemistry, pharmacology, biosynthesis and production. Molecules. 2019; 24(4).
 44
Sahu NP, Banerjee S, Mondal NB, Mandal D. Steroidal Saponins. In: Kräutler B, Sahu NP, Banerjee S, Mondal NB, Mandal D, editors. Fortschritte der Chemie organischer Naturstoffe / Progress in the Chemistry of Organic Natural Products. Vienna: Springer; 2008. p. 45–141.
 45
Ridlon JM, Kang DJ, Hylemon PB. Bile salt biotransformations by human intestinal bacteria. J Lipid Res. 2006; 47(2):241–59.
Acknowledgements
Not applicable.
About this supplement
This article has been published as part of BMC Genomics Volume 21 Supplement 4, 2020: Proceedings of the Joint International GIW & ABACBS-2019 Conference: genomics (part 2). The full contents of the supplement are available online at https://bmcgenomics.biomedcentral.com/articles/supplements/volume21supplement4.
Funding
Publication costs have been funded by university funds.
Author information
Contributions
VL and TPQ conceptualized the project and drafted the manuscript. VL implemented the neural encoderdecoder model and performed the benchmark analyses. TPQ reviewed the IBD literature, prepared the data, and performed the analysis of the latent space. TT helped design the neural network. TT and SV supervised the project. All authors revised and approved the final version of the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Le, V., Quinn, T.P., Tran, T. et al. Deep in the Bowel: Highly Interpretable Neural Encoder-Decoder Networks Predict Gut Metabolites from Gut Microbiome. BMC Genomics 21, 256 (2020). https://doi.org/10.1186/s12864-020-6652-7
Keywords
 Metabolomics
 Multi-omics
 Machine learning
 Deep learning
 Interpretability