Fig. 2 | BMC Genomics

Fig. 2

From: HIPPI: highly accurate protein family classification with ensembles of HMMs

Algorithm for generating the ensemble of HMMs. The input is a seed alignment and a maximum likelihood (ML) tree estimated for that alignment. The algorithm begins by adding the HMM built on the seed alignment to the ensemble. If the seed alignment has more than 10 sequences, the ML tree is decomposed into two subtrees by deleting the centroid edge (i.e., the edge that produces a maximally balanced split of the taxon set into two sets). The seed alignment is restricted to the sequences in each subtree to produce induced alignments. An HMM is built for each induced alignment and added to the ensemble. The process iterates on those subtrees that meet the criterion for decomposition: subset size greater than max(10, n/10), where n is the number of sequences in the seed alignment, and mean pairwise sequence identity less than 40%.
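
The decomposition described in this caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the HIPPI implementation. The tree is assumed to be an adjacency dictionary whose leaf names match the keys of the alignment dictionary. The names `component`, `centroid_split`, `mean_pairwise_identity`, and `build_ensemble` are hypothetical. The `build_hmm` argument is a placeholder for a real HMM builder such as HMMER's `hmmbuild`. The identity measure (matches over columns where both sequences are ungapped) is one plausible definition that the caption does not specify, and removal of all-gap columns from induced alignments is omitted.

```python
from itertools import combinations


def component(adj, start, blocked):
    """Nodes reachable from `start` without stepping onto `blocked`."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nb in adj[node]:
            if nb != blocked and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen


def centroid_split(adj, leaves):
    """Delete the edge giving the most balanced leaf split.

    Returns two (sub_adjacency, sub_leaves) pairs. Brute force over all
    edges, which is quadratic but adequate for a sketch.
    """
    best = None
    for u in adj:
        for v in adj[u]:
            side = component(adj, u, v)
            imbalance = abs(len(leaves) - 2 * len(side & leaves))
            if best is None or imbalance < best[0]:
                best = (imbalance, side)
    side = best[1]
    parts = []
    for nodes in (side, set(adj) - side):
        sub_adj = {x: [y for y in adj[x] if y in nodes] for x in nodes}
        parts.append((sub_adj, leaves & nodes))
    return parts


def mean_pairwise_identity(rows):
    """Assumed definition: matches / columns where both rows are ungapped."""
    def identity(a, b):
        pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        return sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0
    scores = [identity(a, b) for a, b in combinations(rows, 2)]
    return sum(scores) / len(scores) if scores else 1.0


def build_ensemble(adj, alignment, build_hmm=lambda aln: sorted(aln)):
    """Return the list of models: one for the seed, one per induced alignment.

    `build_hmm` is a stand-in; by default it just records the sequence names.
    """
    n = len(alignment)
    ensemble = [build_hmm(alignment)]          # HMM on the full seed alignment
    queue = [(adj, set(alignment))] if n > 10 else []
    while queue:
        sub_adj, sub_leaves = queue.pop()
        for part_adj, part_leaves in centroid_split(sub_adj, sub_leaves):
            induced = {name: alignment[name] for name in part_leaves}
            ensemble.append(build_hmm(induced))
            # Decomposition criterion from the caption.
            if (len(part_leaves) > max(10, n / 10)
                    and mean_pairwise_identity(list(induced.values())) < 0.40):
                queue.append((part_adj, part_leaves))
    return ensemble
```

Passing the tree as a plain adjacency dictionary keeps the sketch free of third-party dependencies. In practice the tree would more likely be parsed from a Newick file with a phylogenetics library, and `build_hmm` would wrap an external HMM builder.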
