HMMEditor: a visual editing tool for profile hidden Markov model
© Dai and Cheng; licensee BioMed Central Ltd. 2008
Published: 20 March 2008
The profile hidden Markov model (HMM) is a powerful statistical model for representing a family of DNA, RNA, or protein sequences. Profile HMMs have been widely used in bioinformatics research, including sequence alignment, gene structure prediction, motif identification, protein structure prediction, and biological database search. However, few comprehensive visual editing tools for profile HMMs are publicly available.
We developed a visual editor for profile hidden Markov models (HMMEditor). HMMEditor can visualize the architecture, transition probabilities, and emission probabilities of a profile HMM. It also provides functions to edit and save an HMM and its parameters. Furthermore, HMMEditor allows users to align a sequence against the profile HMM and to visualize the corresponding Viterbi path.
HMMEditor provides a set of unique functions to visualize and edit a profile HMM. It is a useful tool for biological sequence analysis and modeling. Both the HMMEditor software and the web service are freely available.
The hidden Markov model (HMM) is a widely used statistical model for biological sequence analysis [1–6]. It has been used in many areas of bioinformatics, such as motif identification [5, 6], gene structure prediction [7], multiple sequence alignment [1–4], profile-profile alignment [8, 9], protein sequence database search [1, 3], protein fold recognition [1, 3, 9], and protein and gene family modeling (profile HMM) [1–4].
Several powerful profile HMM tools, such as HMMer, SAM, and HMMpro, have been developed for analyzing biological sequences. The popular tool HMMer can build a profile HMM from a family of aligned sequences (hmmbuild), search a profile HMM against a sequence database (hmmsearch), search a sequence against a profile HMM database (hmmpfam), and align a group of sequences against a profile HMM (hmmalign).
In contrast, only a few profile HMM visualization tools exist, and they offer little or no editing functionality. HMMpro can visualize HMM architecture and probabilities, but it is not publicly available. HMMviewer [10] can visualize profile HMMs produced by HMMer, but its visualization functionality is limited. Similarly, SAM can visualize profile HMMs but has only limited editing functions. A different type of visualization tool, HMM Logo [11, 12], is designed to display emission and transition probabilities in the popular logo style. However, HMM Logo does not provide functions to edit the HMM architecture and parameters.
We also note that some general hidden Markov model software includes visualization tools [13, 14]. However, these tools use general input and visualization formats that are not well suited to visualizing the specialized profile HMMs of biological sequences.
Here we develop a visual editor for profile HMMs in the HMMer format. HMMs produced by other tools, such as SAM and HHSearch, can be visualized after being converted into the HMMer format.
In this section, we first briefly introduce the profile hidden Markov model generated by HMMer. We then describe HMMEditor's visualization and editing functions.
Profile hidden Markov model
The whole profile HMM starts from the start (S) state and ends at the terminal (T) state. The core of the HMM, between the beginning (B) and ending (E) states, consists of matching (M) states, insertion (I) states, and deletion (D) states. A matching state (e.g., M1-M3 in Figure 1) represents a fairly conserved position. Each matching state has a deletion state (e.g., D1-D3 in Figure 1) associated with it, allowing the deletion of the matching state (or position). Each matching state except the last one also has an insertion state (e.g., I1-I2 in Figure 1) associated with it, allowing the insertion of additional positions after it. Unlike the profile HMM in , transitions between I and D states are not allowed.
N and C are two special states that accommodate additional insertions before and after the conserved regions of a family of sequences. They allow local alignment between a sequence and the HMM (i.e., matching part of a sequence against the core of the HMM between states B and E). The J state joins the end of a profile HMM to its beginning. It allows a sequence to be aligned against the core of a profile HMM multiple times, which is called multi-domain alignment (domain duplication), and it can model the linker (insertion) region between two domains.
Another interesting feature of the profile HMM is that there is a transition from B to each M state, and a transition from each M state directly to the E state. These transitions make it possible to match only part of the model against a sequence, allowing local alignment with respect to the HMM.
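The architecture described above can be summarized as a set of allowed transitions. The following Python sketch is illustrative only and is not HMMEditor's internal representation. The function name `allowed_transitions` and the state labels are hypothetical. The self-loops on N, C, and J and the left-to-right transitions among M, I, and D states are assumptions based on the usual profile HMM layout. The text does not say how deletion states connect to B and E, so those transitions are left out.

```python
def allowed_transitions(length):
    """Return a set of (source, target) state pairs for a profile HMM
    with `length` matching states, following the description in the text."""
    edges = set()
    # Flanking and joining states. Self-loops on N, C and J are an assumption,
    # added so that these states can emit more than one residue.
    edges |= {("S", "N"), ("N", "N"), ("N", "B"),
              ("E", "C"), ("C", "C"), ("C", "T"),
              ("E", "J"), ("J", "J"), ("J", "B")}
    for k in range(1, length + 1):
        # Local entry and exit: B reaches every M state, every M state reaches E.
        edges.add(("B", f"M{k}"))
        edges.add((f"M{k}", "E"))
        if k < length:
            # Assumed left-to-right core transitions. Note that there is
            # no I -> D and no D -> I transition.
            edges |= {(f"M{k}", f"M{k+1}"), (f"M{k}", f"I{k}"), (f"M{k}", f"D{k+1}"),
                      (f"I{k}", f"I{k}"), (f"I{k}", f"M{k+1}"),
                      (f"D{k}", f"M{k+1}"), (f"D{k}", f"D{k+1}")}
    return edges


if __name__ == "__main__":
    for src, dst in sorted(allowed_transitions(3)):
        print(src, "->", dst)
```

For a model with three matching states, this yields insertion states I1 and I2 only, matching the statement that the last matching state has no insertion state.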
Each M, I, N, C, and J state has an emission probability vector derived from the input sequences. When a sequence is aligned against the profile HMM, HMMer reports a log-odds score: the logarithm of the ratio between the probability that the sequence is generated by the profile HMM and the probability that it is generated by a null model. In the null model, the residues of a sequence are emitted according to the background distribution.
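To make the log-odds idea concrete, the sketch below scores a short DNA sequence against a toy model. It is a deliberate simplification and not HMMer's scoring code: it assumes an ungapped alignment in which residue i is emitted by matching state i, it ignores transition probabilities, and it uses base-2 logarithms so that the score is in bits. The function name and the toy probabilities are hypothetical.

```python
import math


def log_odds_score(seq, match_emissions, background):
    """Sum of per-position log2 odds for an ungapped alignment in which
    residue i of `seq` is emitted by matching state i (transitions ignored)."""
    if len(seq) != len(match_emissions):
        raise ValueError("sequence and model must have the same length")
    return sum(math.log2(emit[res] / background[res])
               for res, emit in zip(seq, match_emissions))


# Toy two-position DNA model (hypothetical numbers).
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
match_emissions = [
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},   # position 1 prefers A
    {"A": 0.125, "C": 0.5, "G": 0.25, "T": 0.125},   # position 2 prefers C
]

if __name__ == "__main__":
    for s in ("AC", "AG", "TT"):
        print(s, log_odds_score(s, match_emissions, background))
```

A positive score means the sequence is more likely under the model than under the null model, and a negative score means the opposite.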
We developed a profile HMM visualization and editing tool called HMMEditor (profile Hidden Markov Model Visual Editor). HMMEditor is written in Java, so it runs on all major operating systems (UNIX, Linux, Windows, and Mac). Users can run HMMEditor from a web browser through Java Web Start, or download and install the software locally.
Figure 2 shows the graphical user interface (GUI) of HMMEditor. HMMEditor has four main features: (1) visualizing a profile HMM in different views; (2) editing a profile HMM; (3) showing the Viterbi path; and (4) drawing an HMM Logo. Features (1) and (3) are unique and cannot be found in other tools. The visualized HMM and the HMM Logo can be saved to files through the GUI.
1. HMM visualization
The HMM layout view shows the structure of a profile HMM. The thickness of a transition line is proportional to the probability of the transition. The thickness of the border of an M state indicates its level of conservation. The label of each matching state denotes the most probable (consensus) residue emitted from that state.
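The two display rules above that the text states precisely can be sketched in Python as follows. This is only an illustration under stated assumptions, not HMMEditor's drawing code: the function names and the maximum width of 8 units are hypothetical. Border thickness is not covered because the text does not say how conservation is measured.

```python
def consensus_residue(emissions):
    """Return the most probable residue of an emission probability vector."""
    return max(emissions, key=emissions.get)


def line_width(prob, max_width=8.0):
    """Line width proportional to a transition probability.
    `max_width` (the width drawn for probability 1.0) is an assumed value."""
    return prob * max_width


if __name__ == "__main__":
    emissions = {"A": 0.1, "L": 0.7, "V": 0.2}   # toy emission vector
    print(consensus_residue(emissions), line_width(0.5))
```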
In the layout view, users can zoom in and out and drag any node to improve the layout. The layout view can be saved as a JPG or PNG file.
The HMM text view shows the profile HMM as text, in the same format as HMMer. The text view is dynamically linked to the layout view: if a user edits the profile HMM in the layout view, the changes are reflected in the text view instantly.
2. HMM editing
3. Viterbi path visualization
HMMEditor can read sequence files in many popular formats, such as FASTA and ClustalW [15]. Once sequences are loaded into HMMEditor, users can view the Viterbi path of each sequence. Furthermore, users can edit a sequence, i.e., add or remove residues, to see how the path changes. This provides a useful means for users to adjust sequence alignments manually.
Users can also align multiple sequences against a profile HMM to produce a multiple sequence alignment and save it to a file from HMMEditor, just as hmmalign does.
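The Viterbi path is the most probable sequence of states for a given sequence. The sketch below shows the dynamic programming idea in Python on a toy two-state model. It is a simplified illustration, not HMMEditor's or HMMer's implementation: it assumes that every state emits a residue, whereas a real profile HMM also contains silent states such as D, B, and E that need extra handling. The state names `flank` and `match` and all probabilities are hypothetical.

```python
import math


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Most probable state path for `seq` in an HMM whose states all emit."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # score[s]: best log probability of any path that ends in state s.
    score = {s: log(start_p.get(s, 0)) + log(emit_p[s].get(seq[0], 0))
             for s in states}
    back = []  # one back-pointer table per residue after the first
    for x in seq[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log(trans_p[r].get(s, 0)))
            score[s] = (prev[best] + log(trans_p[best].get(s, 0))
                        + log(emit_p[s].get(x, 0)))
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=score.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy model: a background-like flanking state and an A-rich matching state.
states = ["flank", "match"]
start_p = {"flank": 0.5, "match": 0.5}
trans_p = {"flank": {"flank": 0.8, "match": 0.2},
           "match": {"flank": 0.2, "match": 0.8}}
emit_p = {"flank": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
          "match": {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01}}

if __name__ == "__main__":
    print(viterbi("CCAAAA", states, start_p, trans_p, emit_p))
```

Rerunning the function after adding or removing residues mimics, in a very reduced form, the edit-and-observe workflow described above.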
4. HMM logo
We have developed HMMEditor, a visual editor for profile hidden Markov models. HMMEditor provides a convenient and appealing user interface for visualizing and editing profile HMMs. It also allows users to visually adjust and align sequences against an HMM. HMMEditor is thus a useful tool for HMM-based biological sequence analysis in the post-genomic era. The software, source code, and web service are freely available at the HMMEditor web site [16].
JC is supported by a faculty start-up grant at the University of Missouri, Columbia.
This article has been published as part of BMC Genomics Volume 9 Supplement 1, 2008: The 2007 International Conference on Bioinformatics & Computational Biology (BIOCOMP'07). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/9?issue=S1.
- Krogh A., Brown M., Mian I. S., Sjölander K., Haussler D.: Hidden Markov models in computational biology: applications to protein modeling. Journal of Molecular Biology. 1994, 235: 1501-1531. 10.1006/jmbi.1994.1104.
- Baldi P., Chauvin Y., Hunkapiller T., McClure M. A.: Hidden Markov models of biological primary sequence information. Proc Natl Acad Sci U S A. 1994, 91: 1059-1063. 10.1073/pnas.91.3.1059.
- Karplus K., Barrett C., Hughey R.: Hidden Markov models for detecting remote protein homologies. Bioinformatics. 1998, 14: 846-856. 10.1093/bioinformatics/14.10.846.
- Eddy S. R.: Profile hidden Markov models. Bioinformatics. 1998, 14: 755-763. 10.1093/bioinformatics/14.9.755.
- Churchill G. A.: Stochastic models for heterogeneous DNA sequence. Bull. Math. Biol. 1989, 51: 79-94.
- Durbin R., Eddy S. R., Krogh A., Mitchison G.: Biological sequence analysis: probabilistic models of proteins and nucleic acids. 1999, Cambridge University, London
- Burge C., Karlin S.: Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 1997, 268: 78-94. 10.1006/jmbi.1997.0951.
- Edgar R. C., Sjölander K.: COACH: Profile-profile alignment of protein families using hidden Markov models. Bioinformatics. 2004, 20: 1309-1318. 10.1093/bioinformatics/bth091.
- Söding J.: Protein homology detection by HMM-HMM comparison. Bioinformatics. 2005, 21: 951-960. 10.1093/bioinformatics/bti125.
- Dowell R., Eddy S. R.: Interactive visualization of HMMER models. Intelligent Systems for Molecular Biology. 1999, (Poster)
- Schuster-Böckler B., Schultz J., Rahmann S.: HMM Logos for visualization of protein families. BMC Bioinformatics. 2004, 5: 7. 10.1186/1471-2105-5-7.
- Schneider T. D., Stephens R. M.: Sequence Logos: a new way to display consensus sequences. Nucleic Acids Res. 1990, 18: 6097-6100. 10.1093/nar/18.20.6097.
- GHMM Hidden Markov Model Editor. [http://ghmm.sourceforge.net/hmmed.html]
- JaHMMViz – A GUI for the Jahmm. [http://www.run.montefiore.ulg.ac.be/~francois/software/jahmm/jahmmViz/]
- Thompson J. D., Higgins D. G., Gibson T. J.: CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680. 10.1093/nar/22.22.4673.
- HMMEditor. [http://babbage.cs.missouri.edu/~chengji/cheng_software.html]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.