Fig. 2 | BMC Genomics

From: Metabolomic biosignature differentiates melancholic depressive patients from healthy controls

Analytic workflow designed to maximize both predictive power and biological interpretability. a Workflow steps, from metabolite preprocessing to interpretation: (1) correction of each individual metabolite for storage-time effects, (2) imputation of missing data, (3) feature clustering, (4) classification using an ensemble learning framework, (5) selection of top clusters/features, and (6) pathway analysis and biological interpretation. b Example of a cluster containing 5 metabolites. Highly correlated features are grouped into clusters using K-means or hierarchical clustering. For each cluster, the cluster centroid is computed and used as a feature for ensemble learning. Subsequent pathway analysis includes all members of the top clusters. c Illustration of the ensemble learning framework. Because the training data are imbalanced, we randomly undersample the training data k times and then perform feature selection and classification on each undersampled dataset. Finally, we combine all k classifiers to make the final prediction and report the top clusters/features.
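Steps (1) and (2) of panel a can be sketched as follows. The caption does not say how the storage-time correction or the imputation were done. This minimal sketch therefore assumes two things: a per-metabolite ordinary least-squares fit of level on storage time, keeping the residuals re-centred on the original mean, and median imputation. Missing values are represented as `None`, and the function names `correct_storage_time` and `impute_median` are hypothetical.

```python
from statistics import mean, median

def correct_storage_time(values, storage_days):
    """Remove a linear storage-time trend from one metabolite (assumed method).

    Fits value ~ storage time by ordinary least squares on the non-missing
    entries, then returns residuals shifted back to the original mean.
    Missing entries (None) stay None so they can be imputed afterwards.
    """
    pairs = [(t, v) for t, v in zip(storage_days, values) if v is not None]
    mt = mean(t for t, _ in pairs)
    mv = mean(v for _, v in pairs)
    sxx = sum((t - mt) ** 2 for t, _ in pairs)
    slope = sum((t - mt) * (v - mv) for t, v in pairs) / sxx if sxx else 0.0
    # v - slope * (t - mt) equals the OLS residual plus the mean mv.
    return [None if v is None else v - slope * (t - mt)
            for t, v in zip(storage_days, values)]

def impute_median(values):
    """Replace missing entries (None) with the median of observed values (assumed method)."""
    med = median(v for v in values if v is not None)
    return [med if v is None else v for v in values]
```

Correcting before imputing mirrors the order given in panel a, so the imputed value is drawn from storage-corrected levels.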
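A small, dependency-free sketch can illustrate the clustering and centroid step in panel b. It uses single-linkage hierarchical clustering on the correlation distance 1 - r, cut at a fixed height. This cut is equivalent to taking connected components of the graph that links any two features with Pearson r at or above a threshold. Several choices here are illustrative assumptions rather than the authors' settings: the threshold of 0.8, the use of r rather than |r| (so that anti-correlated metabolites do not cancel out in the centroid), the z-scoring before averaging, and all function names.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length feature columns."""
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)

def cluster_features(columns, threshold=0.8):
    """Single-linkage clustering: link features with r >= threshold,
    then return connected components as sorted lists of feature indices."""
    parent = list(range(len(columns)))

    def find(i):  # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if pearson(columns[i], columns[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(columns)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def zscore(col):
    """Standardize one feature column to mean 0 and unit population SD."""
    m, s = mean(col), pstdev(col)
    return [(v - m) / s if s else 0.0 for v in col]

def cluster_centroids(columns, clusters):
    """Per-sample mean of the z-scored members of each cluster."""
    z = [zscore(c) for c in columns]
    n_samples = len(columns[0])
    return [[mean(z[f][s] for f in members) for s in range(n_samples)]
            for members in clusters]
```

Each centroid becomes one input feature for the classifier, while the full membership list from `cluster_features` is what would be carried forward to pathway analysis.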
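The undersampling ensemble in panel c can be sketched as below. The caption does not name the feature selector or the base classifier, so this sketch swaps in stand-ins for both: a simple score (absolute difference in class means divided by the overall SD) and a nearest-centroid classifier. It combines the k models by majority vote and ranks features by how often they were selected. Labels are assumed to be 0/1, and all names and defaults (`k=25`, `m`, `seed`) are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pstdev

def undersample(X, y, rng):
    """Randomly drop samples so every class has the minority-class size."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(rows) for rows in by_class.values())
    Xs, ys = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            Xs.append(row)
            ys.append(label)
    return Xs, ys

def select_features(X, y, m):
    """Top-m features by |mean(class 1) - mean(class 0)| / overall SD (stand-in selector)."""
    scores = []
    for f in range(len(X[0])):
        a = [x[f] for x, t in zip(X, y) if t == 1]
        b = [x[f] for x, t in zip(X, y) if t == 0]
        sd = pstdev(a + b) or 1.0
        scores.append((abs(mean(a) - mean(b)) / sd, f))
    return [f for _, f in sorted(scores, reverse=True)[:m]]

def fit_nearest_centroid(X, y, feats):
    """Class centroids restricted to the selected features (stand-in classifier)."""
    return {c: [mean(x[f] for x, t in zip(X, y) if t == c) for f in feats]
            for c in (0, 1)}

def predict_one(model, feats, x):
    """Label of the closest class centroid (squared Euclidean distance)."""
    dist = {c: sum((x[f] - mu) ** 2 for f, mu in zip(feats, cent))
            for c, cent in model.items()}
    return min(dist, key=dist.get)

def undersampled_ensemble(X, y, k=25, m=2, seed=0):
    """Train k models on independent balanced undersamples; return a
    majority-vote predictor and (feature, times selected) pairs."""
    rng = random.Random(seed)
    models, counts = [], Counter()
    for _ in range(k):
        Xs, ys = undersample(X, y, rng)
        feats = select_features(Xs, ys, m)
        counts.update(feats)
        models.append((feats, fit_nearest_centroid(Xs, ys, feats)))

    def predict(x):
        votes = sum(predict_one(model, feats, x) for feats, model in models)
        return int(votes * 2 > len(models))  # majority of 1-votes

    return predict, counts.most_common()
```

Counting selection frequency across the k models is only one simple way to rank "top" features. Averaging classifier scores instead of hard votes would be an equally plausible way to combine the k models.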
