 Research
 Open Access
 Published:
ENIGMA: an enterotypelike unigram mixture model for microbial association analysis
BMC Genomicsvolume 20, Article number: 191 (2019)
Abstract
Background
One of the major challenges in microbial studies is detecting associations between microbial communities and a specific disease. A specialized feature of microbiome count data is that intestinal bacterial communities form clusters called as “enterotype”, which are characterized by differences in specific bacterial taxa, making it difficult to analyze these data under health and disease conditions. Traditional probabilistic modeling cannot distinguish between the bacterial differences derived from enterotype and those related to a specific disease.
Results
We propose a new probabilistic model, named as ENIGMA (Enterotypelike uNIGram mixture model for Microbial Association analysis), which can be used to address these problems. ENIGMA enabled simultaneous estimation of enterotypelike clusters characterized by the abundances of signature bacterial genera and the parameters of environmental effects associated with the disease.
Conclusion
In the simulation study, we evaluated the accuracy of parameter estimation. Furthermore, by analyzing the realworld data, we detected the bacteria related to Parkinson’s disease. ENIGMA is implemented in R and is available from GitHub (https://github.com/abikoushi/enigma).
Background
More than 100 trillion microbes live on and within human beings and form of complex microbial communities (microbiota). Most microbes cannot be cultured in laboratories, making it difficult to understand how individual microorganisms mediate vital microbiomehost interactions under health and disease conditions. However, recent important advances in highthroughput sequencing technology have enabled observation of the composition of these intestinal microbes. For each sample drawn from an ecosystem, the number of occurrences of each operational taxonomic units (OTUs) is measured and the resulting OTU abundance can be summarized at any level of the bacterial phylogeny. Discovering recurrent microbial compositional patterns that are related to a specific disease is a significant challenge, as individuals with the same disease typically harbor different microbial community structures.
Recent largescale sequencing surveys of the human intestinal microbiome, such as the US NIH Human Microbiome Project (HMP) and the European Metagenomics of the Human Intestinal Tract project (MetaHIT), have revealed considerable variations in microbiota composition among individuals [1, 2]. Particularly, community clusters characterized by differences in the abundance of signature taxa, referred to as enterotypes, were first reported in humans [3]. Later, other studies identified enterotypelike clusters that may reflect features of the hostmicrobial physiology and homeostasis in different species [4, 5] or at different human body sites [6–9]. This microbial stratification has motivated the development of methods for examining unknown clusters of microbial communities.
Probabilistic modeling of microbial metagenomics data often provides a powerful framework for characterizing the microbial community structures [10–12]. For example, Knights et al. [10] applied a Dirichlet prior to a singlelevel hierarchy and proposed a Bayesian approach for estimating the proportion of microbial communities. Holmes et al. [11] extended the Dirichlet prior to Dirichlet multinomial mixtures to facilitate clustering of microbiome samples. Shafiei et al. [12] proposed a hierarchical model for Bayesian inference of microbial communities (BioMiCo) to identify clusters of OTUs related to environmental factors of interest.
However, such models are not suitable for discovering enterotypelike clusters of microbial communities and associations between microbes and a specific disease for the following two reasons. First, the frameworks of Knights et al. [10] and Holmes et al. [11] do not explicitly address the association between the microbial compositional patterns and environmental depend on the interest. Second, the framework of Shafiei et al. [12] models the structure of each sample using a hierarchical mixture of multinomial distributions that are depends on the factors of interest. Individual host properties such as body mass index, age, or gender cannot explain the observed enterotypes [3]. Thus, such enterotypelike clusters that describes interindividual variability among humans do not always to directly affect host probabilities such as diseases ranging from localized gastroenterologic disorders to neurologic, respiratory, metabolic hepatic, and cardiovascular illnesses.
Here, we introduce a novel probabilistic model of a microbial community structures, named as ENIGMA (Enterotypelike uNIGram mixture model for Microbial Association analysis), to address these problems. ENIGMA includes the following contributions:

1.
ENIGMA uses OTU abundances as input and models each sample by the underlying unigram mixture whose parameters are represented by unknown group effects and known effects of interest. The group effects are represented by baseline parameters that change with a latent group of microbial communities. One of the most important features of our model is that the group effects are independent of the effects of interest. This enables the separation of interindividual variability and fixed effects of the host properties related to disease risk.

2.
ENIGMA is regarded as Bayesian learning for detecting associations between a community structure and factors of interest. Our model can be used to simultaneously learn how enterotypelike clusters of OTUs contribute to the microbial structure and how microbial compositional patterns may be related to known features of the sample.

3.
We provide an efficient learning procedure for ENIGMA by using a Laplace approximation to integrate latent variables and estimate the evidence of the complete model and credible intervals of the parameters. The software package that implements ENIGMA in the R environment is available from https://github.com/abikoushi/enigma.
We describe our proposed framework and algorithm in the “Methods” section. We evaluate the performance of ENIGMA using simulated data in terms of its accuracy to estimate parameters and identify clusters in the “Simulation study” section. We apply ENIGMA to clinical metagenomics data and demonstrate how ENIGMA simultaneously identifies enterotypelike clusters and gut microbiota related to Parkinson’s disease (PD) in the “Results on real data” section.
Methods
The key idea of ENIGMA is to adjust for the effects of the enterotype and evaluate the increases and decreases of bacterial abundance associated with environmental factors. Figure 1 shows a conceptual view of ENIGMA. The formal definition of the model is described in the following Mode section. Here we introduce several notations.
Suppose that we observe microbiome count data of K taxa for N samples with M individual host properties, (y_{nk},x_{nm}) (n=1,…,n;k=1,…,K;m=1,…,M) where \(y_{nk}\in \mathbb {N}\) represents the abundance of the kth taxa in the nth sample and x_{nm} represents a binary variable such that x_{nm}=1 if the nth sample has the mth host property and is otherwise x_{nm}=0. Here the word taxa can represent any level of the bacterial phylogeny, e.g., species, genes, family, order, etc.
Model
Figure 2 shows a plate diagram of the proposed model for metagenome sequencing, where y_{n} is the read count vector of the nth sample, x_{n} is the vector of the host properties of the nth sample and z_{n}∈{1,…,L} is a latent class of the nth sample. Our model is a simple extension of the unigram mixture model. We assume that each sample is generated from a multinomial distribution with the parameter vector p_{n}=(p_{n1},…,p_{nK})^{⊤}. The elements of p_{n} and p_{nk} (k=1,…,K) are probabilities of the occurrence of the K taxa for the nth sample. We also assume that p_{nk} can be influenced independently by the environmental factor on the taxa that is common to all latent classes and the interindividual factor on the latent enterotypelike classes. More specifically, the generative process of ENIGMA is defined as follows:
where γ_{l} is baseline parameter (Kdimensional vector) that changes with the latent class, M×K matrix \(\boldsymbol {B}=\left (\beta _{mk}\right)\) is effect of a environmental factor common to all enterotypelike clusters, β_{m} is a mth rowvector of B, \(\boldsymbol {\pi }=\left (\pi _{1},\ldots,\pi _{L}\right)\) is a mixing ratio of components, O_{K} is a Kdimensional zero matrix and I_{K} is Kdimensional identity matrix. Here, the softmax function is defined by \(\text {softmax}(\boldsymbol {x})=\frac {\exp (\boldsymbol {x})}{\sum _{k=1}^{K}{\exp (x_{k})}}\) for a vector x=(x_{1},…,x_{K})^{⊤} using an elementwise exponential function and the probability function of categorical distribution is parameterized as \(\Pr (z=l \boldsymbol {\pi }) = \pi _{l}\), l∈{1,…,L}. In a Bayesian approach, the prior distributions for π, β, and γ_{l} must be defiend. We set a prior based on the Dirichlet distribution for π, and flat priors to the hyperparameters σ and τ for β and γ, respectively. For the convenience of later section, let pl′=softmax(γ_{l}) be the probabilities of the occurrence of bacteria in the latent classes l.
Parameter estimation
Let us denote observed matrix by Y=(y_{nk}), X=(x_{nm}), the unknown parameters by θ=(α,B,γ_{1},…,γ_{L},σ,τ), and their prior by ϕ(θ). The posterior distribution is represented as follows:
First, latent variable z_{n} must be marginalized. The likelihood is described by
The posterior distribution is proportional to the product of the likelihood and prior density:
Let \(\boldsymbol {\hat \theta }\) be the MAP estimator of θ, found by maximizing \(\log p(\boldsymbol {\theta },\boldsymbol {Y},\boldsymbol {X})\).
We use a Laplace approximation [13] for parameter estimation. A Taylor expansion around \(\boldsymbol {\hat \theta }\) gives
where \(H(\boldsymbol {\hat \theta })\) is the Hessian of \(\log p(\boldsymbol {\theta }  \boldsymbol {Y}, \boldsymbol {X})\) evaluated at \(\boldsymbol {\hat \theta }\). Eq. 4 gives
where C is a normalizing constant. This relationship shows that p(θY,X) can be approximated by the normal distribution \(N\left (\boldsymbol {\hat \theta }, H^{1}\left (\boldsymbol {\hat \theta }\right)\right)\). Credible intervals can be calculated from this multivariate normal distribution.
We used the stochastic programming language Stan (http://mcstan.org/) for its implementation. The MAP estimators were obtained by the LBFGS method. Credible intervals were computed from the using a Stan function to compute the Hessian at the MAP estimates.
After fitting the model, the enterotypelike cluster of each sample must be classified. The conditional probability of z_{n}=l is
This is the probability that the nth sample belongs to cluster l. Next, the nth sample is then classified into the lth cluster that maximizes the conditional probability given by Eq. 5.
Model Selection
We also examined whether or not the whole set rather than individual bacteria is related to the environmental factors of interest. We compared between the two models when B≠0 and B=0. We used the log marginal likelihood as the goodness of fit for model comparison. The marginal likelihood is given by
From Eq. 4, we have
Thus, the log marginal likelihood is approximated by the following formula:
where D is the number of free parameters. In model comparison, we choose the model showing larger log marginal likelihood.
Simulation study
To demonstrate the performance of ENIGMA, we conducted several simulation experiments. The synthetic data were naturally produced via our generative process given by Eq. 1. We set M=2000, L=3, π_{l}=1/3, and α=(1,1,1)^{T}. We first generated B and γ_{l} from the standard normal distribution. The variables x_{n}, z_{n}, and y_{n} are then sampled from the Bernoulli distribution with probability of 0.5, the categorical distribution, and the multinomial distribution, respectively. For the above parameter settings, we randomly generated a count dataset of 100 taxa for 100 samples for evaluation.

Coverage probability (CP): The coverage probability is the proportion of the time over which the interval contains the true value. A discrepancy between the coverage probability and the nominal coverage probability frequently occurs. When the actual coverage is greater than the nominal coverage, the interval is referred to as conservative. If the interval is conservative, there is no inconsistency in interpretation.

Bias: The bias of B is defined by the difference between true value and estimated value \(E[\hat {\boldsymbol {B}}]\boldsymbol {B}\).

Standard error (SE): The standard error is the standard deviation from the estimate. A smaller standard error indicates the higher accuracy of estimation.

Root mean squared error (RMSE): The RMSE is defined by
\(\sqrt {E[\left (\boldsymbol {\hat {B}}  \mathbf {B}\right)^{2}]}\). A smaller RMSE indicates the higher accuracy of the estimation.

Accuracy: The accuracy is the percentage of samples correctly classified into original group.
To calcurate these metrics, we detrmined that we calculated the sample means and standard deviations of \(\hat {\boldsymbol {B}}\) and \(\left (\hat {\boldsymbol {B}}\boldsymbol {B}\right)^{2}\) from the 10,000 synthetic datasets.
Figure 3 shows a comparison of the true B and the mean and standard deviation of estimates \(\hat {\boldsymbol {B}}\) obtained from the 10,000 simulations. We observed that the points were arranged diagonally, indicating that the estimator of ENIGMA was unbiased. We also calculated the proportion of the time for which the 95% credible interval contains the true value of B. We found that this proportion was greater than nominal value of 0.95 for all B in Fig. 4. Table 1 shows the coverage probability (CP), bias, standard error (SE), and RMSE of \(\hat {\boldsymbol {B}}\), respectively. We observed that the bias and standard error decreased when β_{mk} was large (i.e. the corresponding abundance was large). We also found that the accuracy of classification given by Eq. 5 was exactly 100%. Thus, these results indicate that ENIGMA can produce reasonable estimates.
Results on real data
Arumugam et al. (2011)’s data
We demonstrated that the enterotypelike cluster can be estimated using the data of Arumugam et al. [3]. This data is N=33, K=55. The data of Arumugam et al. [3] does not disclose the total read count. Thus, We used the relative abundance multiplied by 10,000 as y_{nk}. Based on the result of Arumugam et al. [3], the number of latent classes in ENIGMA was chosen to be L=3. We estimated the parameters using the ENIGMA and setting all β_{mk}=0 in Eq. 1. We set the hyperparameters of Dirichlet prior α=(1,…,1)^{⊤}, which is equivalent to a noninformative prior.
Arumugam et al. [3] showed that the enterotype is characterized by the differences in the abundance of Bacteroides, Prevotella, and Ruminococcus. Estimates of the probability of occurrence of those bacteria in three clusters are shown in the Fig. 5. Class 1 contains highlevel Ruminococcus, class 2 contains highlevel Bacteroides, and class 3 contains highlevel Prevotella. This result is consistent with that of Arumugam et al. (2011) [3].
Parkinson’s disease data
To validate the performance of ENIGMA in discovering clusters of microbial communities and associations between microbes and a specific disease, we applied ENIGMA to the real metagenomic sequencing data from Scheperjans et al. [14], HillBurns et al. [15], HeintzBuschart et al. [16] and Hopfner et al. [17]. The data was analyzed by sequencing the bacterial 16S ribosomal RNA genes sampled from patients with Parkinson’s disease (PD) and controls in Finland, USA, and Germany. Table 2 shows the summary statistics of the data. The OTUs were mapped to the SILVA taxonomic reference, version 132 (https://www.arbsilva.de/) and the abundances of familylevel taxa were calculated.
To assess the optimal number of clusters, we used the CalinskiHarabasz (CH) Index. It is defined as:
where BC_{l} is the betweencluster sum of squares (i.e. the squared distances between all points i and j, for which i and j are not in the same cluster) and WC_{l} is the withinclusters sum of squares (i.e. the squared distances between all points i and j, for which i and j are in the same cluster). Here, we used JensenShannon divergence (JSD) as the distance. The JSD between samples a=(a_{1},…,a_{K}) and b=(b_{1},…,b_{K}) is defined as follows:
When calculating the JSD, we used the normalized abundance obtained by dividing y_{nk} by the total read count, and 0 was replaced with pseudo count 10^{−6}. We chose the number of clusters L such that CH_{l} was maximal. To evaluate the CH Index, we use the function index.G1() from the R library clusterSim. The number of latent classes in ENIGMA was chosen to be L=3 in Finland and Germany and L=2 in USA by the CH indexes (Fig. 6).
First, we evaluated whether the model assumption was satisfied when using this data. According to Arumugam et al. (2011) [3], the gender of the host is not related with the enterotype. The genders of the subjects were published in the Finland study. We examined the relationship between gender and enterotypelike cluster using the data from Finland. Table 3 shows there was no correlation between them. We conducted a Chisquared test for independence as shown in Table 3 and the pvalue was 0.66.
We evaluated whether the model showing that bacteria were associated with PD is better than the model without the associations in terms of marginal likelihood. Marginal likelihood represents the model evidence expressing the preference of the data for different models. Let \(\mathcal {M}_{1}\) be the model which is described by Eq. 1 and \(\mathcal {M}_{0}\) be the model setting all β_{mk}=0 in Eq. 1. Table 4 shows that the marginal likelihood of \(\mathcal {M}_{1}\) was greater than \(\mathcal {M}_{0}\). It is preferrd to explain the data by considering the association between the microbiota and PD.
Figure 7 shows the estimated probabilities of the occurrences of bacteria for the three latent classes, pl′, (l=1,2,3). Bacteria detected in fewer three countries were removed. Arumugam et al. [3] showed that enterotype is characterized by the differences in the abundance of Bacteroides, Prevotella, and Ruminococcus. The results of ENIGMA showed the same tendency as the previous survey. Figure 8 shows the \((\boldsymbol {\hat \gamma }_l)'\) values and their credible intervals. The top three microbes in each enterotypelike cluster are shown in excerpts for this plot. According to the results of ENIGMA, the abundance of Enterobacteriaceae and Lachnospiraceae also differed greatly among clusters. Bacterial abundance differed between countries. In the USA, there was a high abundance of Verrucomicrobiaceae, while in Finland, few of these bacteria were detected. In contrast, Finland showed more Prevotellaceae, with fewer in the in USA it is less.
Table 5 shows the coefficients whose 95% credible intervals did not contain zero in more than two countries. The microbes with these coefficients indicates that the corresponding microbial composition patterns were significantly related to PD. We found that at the family levels, Clostridiaceae, Comamonadaceae, Prevotellaceae, Actinomycetaceae, Bifidobacteriaceae, Enterococcaceae, Synergistaceae, Verrucomicrobiaceae and Victivallaceae, the signs of the coefficients matched in all countries. These results are consistent with those of previous studies. HillBurns et al. [15] reported that patients with PD contained high levels of Bifidobacteriaceae and Verrucomicrobiaceae. Scheperjans et al. [14] reported PD patients contained high levels of Verrucomicrobiaceae and low levels of Prevotellaceae. Hopfner et al. reported that patients with PD have high levels of Enterococcaceae.
We compared ENIGMA to the Wilcoxon rank sum test, a classical methods for identifying bacteria related with an environmental factor of interest [16]. Table 6 shows the bacteria significantly related to PD with pvalue <0.05 in more than two countries. We observed that the bacteria detected by the Wilcoxon test were mostly included in those of ENIGMA (Table 5). Notably, all of the corrected pvalues in Table 6 are larger than 0.05. This result shows that ENIGMA was superior to the Wilcoxon rank sum test in terms of identifying a larger number of associations between microbiota and PD.
Finally, we combined the results of ENIGMA to those of PICRUSt (version 1.1.3) [18] in order to evaluate which functions are related to PD. In the present study, PICRUSt was performed using the default settings. The Fig. 9 shows the functions exhibiting an increase and decrease from the median, which matched in all countries and clusters with respect to PD and control (CO). This result indicates that ENIGMA is a valuable tool for discovering new diseaserelated functions.
The analyses using realworld data thus show that ENIGMA can identify enterotypelike clusters and the associations between the gut microbiota and PD. Some of the results were strongly supported by those of previous studies.
Conclusion
We proposed a novel hierarchical Bayesian model, ENIGMA, for discovering the underlying microbial community structures and associations between microbiota and their environmental factors from microbial metagenome data. ENIGMA is based on a probabilistic model of a microbial community structures and supplied with labels for one or more environmental factors of interest for each sample. The structures of each sample are modeled by a multinomial distribution whose parameters are represented independently by group and environmental effects of each sample, which prevents mixing of individual differences and the effects of interest. This framework enables the model to simultaneously learn (i) how microbes contribute to an underlying community structures (cluster) and (ii) how microbial compositional patterns are explained by environmental factors of interest. The effectiveness of ENIGMA was evaluated through experiments involving both synthetic and readworld datasets. These newly discovered clusters and associations estimated using ENIGMA can provide insight into the the mechanisms of a microbial communities.
The major limitation of ENIGMA is its scalability and efficiency, as the number of the parameters in the model increase proportionally with the number of taxa when the number of environmental factors of interest is large. Further studies should focus on developing a scalable probabilistic model of microbial compositions to analyze underlying microbial structures with a large number of these effects by using sparse parameter estimation [19]. We are also interested in developing a dynamic probabilistic model similar to that reported by Blei and Lafferty [20] for analyzing timevarying bacteria compositions during disease progression.
Abbreviations
 CH:

CalinskiHarabasz
 CO:

Control
 CP:

Coverage probability
 JSD:

JensenShannon divergence
 MAP:

Maximum a posteriori
 OTU:

Operational taxonomic units
 PD:

Parkinson’s disease
 RMSE:

Root mean squared error
 SE:

Standard error
References
 1
Huttenhower C, Gevers D, Knight R, Abubucker S, Badger JH, Chinwalla AT, et al. Human Mircrobiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature. 2012; 486:207–14.
 2
Le Chatelier E, Nielsen T, Qin J, Prifti E, Hildebrand F, Falony G, et al. Richness of human gut microbiome correlates with metabolic markers. Nature. 2013; 500:541–6.
 3
Arumugam M, Raes J, Pelletier E, Le Paslier D, Yamada T, Mende DR, et al. Enterotypes of the human gut microbiome. Nature. 2011; 473(7346):174.
 4
Moeller AH, Degnan PH, Pusey AE, Wilson ML, Hahn BH, Ochman H. Chimpanzees and humans harbour compositionally similar gut enterotypes. Nat Commun; 3:1179.
 5
Hildebrand F, Nguyen TL, Brinkman B, Yunta RG, Cauwe B, Vandenabeele P, et al. Inflammationassociated enterotypes, host genotype, cage and interindividual effects drive gut microbiota variation in common laboratory mice. Genome Biol. 2013; 14(1):R4.
 6
Ravel J. Vaginal microbiome of reproductiveage women. Proc Natl Acad Sci USA. 2011; 108.Supplement 1:4680–7.
 7
Koren O, Knights D, Gonzalez A, Waldron L, Segata N, Huttenhower C, et al. A guide to enterotypes across the human body: metaanalysis of microbial community structures in human microbiome datasets. PLoS Comput Biol. 2013; 9(1):e1002863.
 8
Ding T, Schloss PD. Dynamics and associations of microbial community types across the human body. Nature. 2014; 509:357–60.
 9
Zhou Y, Mihindukulasuriya KA, Gao H, La Rose PS, Wylie KM, Martin JC, et al. Exploration of bacterial community classes in major human habitats. Genome Biol. 2014; 15:R66.
 10
Knights D1, Costello EK, Knight R. Supervised classification of human microbiota. FEMS Microbiol Rev. 2011; 35(2):343–59.
 11
Holmws I, Harris K, Quince C. Dirichlet multinomial mixtures: generative models for microbial metagenomics. PloS ONE. 2012; 7(2):e30126.
 12
Shafiei M, Dunn KA, Boon E, MacDonald SM, Walsh DA, Gu H, et al. BioMiCo: a supervised Bayesian model for inference of microbial community structure. Microbiome. 2015; 3(1):8.
 13
Bishop C. Pattern recognition and machine learning. New York: SpringerVerlag; 2006.
 14
Scheperjans F, Aho V, Pereira PA, Koskinen K, Paulin L, Pekkonen E, et al. Gut microbiota are related to Parkinson’s disease and clinical phenotype. Mov Disord. 2015; 30(3):350–8.
 15
HillBurns EM, Debelius JW, Morton JT, Wissemann WT, Lewis MR, Wallen ZD, et al. Parkinson’s disease and Parkinson’s disease medications have distinct signatures of the gut microbiome. Mov Disord. 2017; 32(5):739–49.
 16
HeintzBuschart A, Pandey U, Wicke T, SixelDöring F, Janzen A, SittigWiegand E, et al. The nasal and gut microbiome in Parkinson’s disease and idiopathic rapid eye movement sleep behavior disorder. Mov Disord. 2018; 33(1):88–98.
 17
Hopfner F, Künstner A, Müller SH, Künzel S, Zeuner KE, Margraf NG, et al. Gut microbiota in Parkinson disease in a northern German cohort. Brain Res; 1667:41–5.
 18
Langille MG, Zaneveld J, Caporaso JG, McDonald D, Knights D, Reyes JA, et al. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat Biotechnol. 2013; 31(9):814–21.
 19
Yang Y, Chen N, Chen T. mLDM: a new hierarchical Bayesian statistical model for sparse microbioal association discovery. bioRxiv. 2016;:042630. https://doi.org/10.1101/042630.
 20
Blei DM, Lafferty JD. Dynamic topic models. In: Proceedings of the 23rd international conference on Machine learning. New York: ACM: 2006. p. 113–20.
Acknowledgements
Not applicable.
Funding
This work was supported by GrantsinAid from the Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT); Ministry of Health, Labour and Welfare of Japan (MHLW); Japan Agency for Medical Research and Development (AMED), and the Hori Sciences and Arts Foundation. Publication of this article was sponsored by AMED CREST JP18gm1010002.
Availability of materials
ENIGMA is implemented with R and is available from GitHub (https://github.com/abikoushi/enigma).
About this supplement
This article has been published as part of BMC Genomics Volume 20 Supplement 2, 2019: Selected articles from the 17th Asia Pacific Bioinformatics Conference (APBC 2019): genomics. The full contents of the supplement are available online at https://bmcgenomics.biomedcentral.com/articles/supplements/volume20supplement2.
Author information
Affiliations
Contributions
KA and TS designed the proposed algorithm. KO and MH designed the experiments. All authors have read and approved the final manuscript.
Corresponding author
Correspondence to Teppei Shimamura.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Published
DOI
Keywords
 Enterotype
 Topic model
 Unigram mixture
 Bayesian inference
 Metagenomics