COMUS: Clinician-Oriented locus-specific MUtation detection and deposition System
© Jho et al; licensee BioMed Central Ltd. 2009
Published: 3 December 2009
A disease-causing mutation is a heritable genetic change that is associated with a specific phenotype (disease). Detecting a mutation in a patient's sample is critical for the diagnosis, treatment, and prognosis of the disease. Numerous databases and applications exist for archiving mutation data. However, none of them integrates automated bioinformatics tools for detecting and analyzing mutations starting from patients' raw data. We present a locus-specific mutation database (LSDB) construction system that supports both mutation detection and deposition in one package.
COMUS (Clinician-Oriented locus-specific MUtation detection and deposition System) is a mutation detection and deposition system for developing specific LSDBs. COMUS contains 1) a DNA sequence mutation analysis method for clinicians' mutation data identification and deposition and 2) a curation system for variation detection from clinicians' input data. To embody the COMUS system and to validate its clinical utility, we chose hemophilia as the disease for a test database. A set of data files from bench experiments and clinical information from hemophilia patients was tested on the resulting LSDB, KoHemGene (http://www.kohemgene.org), which proved to offer a clinician-friendly interface for mutation detection and deposition.
COMUS is a bioinformatics system for detecting and depositing new mutations from patient DNA with a clinician-friendly interface. LSDBs made using COMUS will promote the clinical utility of LSDBs. COMUS is available at http://www.comus.info.
Genetic mutations fall into two major types: large mutations (deletion, insertion, duplication, and inversion) and point mutations (nonsense, missense, and frameshift). Some mutations induce DNA transcription and translation errors, eventually causing protein dysfunction that leads to disease [1, 2]. Currently, many genome-wide association studies between disease and variation are being published [3]. However, medical researchers still have to examine the variations in patient DNA to detect the mutations that may cause a disease [4, 5].
Many human disease gene databases hold disease-causing mutation information in the form of locus-specific databases (LSDBs). In addition, large databases, such as Online Mendelian Inheritance in Man (OMIM) [6] and the Human Gene Mutation Database (HGMD) [7], comprehensively collect and describe all disease-related genes. In contrast, LSDBs usually describe variations in a small number of genes and aim to provide specific genetic mutation information for disease-causing genes. The Human Genome Variation Society (HGVS) has incorporated information from many LSDBs for rare human disorders. The key activities of HGVS for LSDB construction were: 1) collecting mutations and databases by inviting reviewers of mutations, 2) creating guidelines for mutation nomenclature, 3) initiating quality control of LSDB content, and 4) specifying the minimum content of LSDBs [8].
To improve mutation collection, several programs were created for automated LSDB creation. UMD (Universal Mutation Database) [9], LOVD (Leiden Open Variation Database) [10], MuStaR (Mutation Storage and Retrieval) [11], and MUTbase [12] are the major LSDB creation programs and resulting databases. Curators wishing to construct an LSDB use these programs according to their specific disease targets. However, none of these programs supports a bioinformatics sequence analysis method for variation deposition.
Generally, variations are detected by sequencing patient DNA. However, clinicians who study disease-causing mutations are usually not experts in sequence analysis. To encourage them to submit data to LSDBs, a simpler and more convenient program interface is necessary. We have developed a simple LSDB construction system (http://www.comus.info) that supports mutation detection and deposition to make mutation data submission and maintenance easier.
As a test database, we built a hemophilia LSDB. Hemophilia (hemophilia A, HA [MIM #306700]; hemophilia B, HB [MIM #306900]) is one of the best-known historical and archetypal Mendelian disorders in humans. Patients with hemophilia suffer from uncontrolled bleeding caused by a deficiency of factor VIII or IX, due to a mutation in the F8 (HA) or F9 (HB) gene, respectively, on Xq27.1~q28. The clinical utility of COMUS was validated in this test LSDB, called KoHemGene (http://www.kohemgene.org), using a set of raw data files from direct sequencing, as well as clinical information from hemophilia patients.
Our system consists of a database and a web application. The web application handles mutation candidate prediction, submission, and registration. The system was built with JavaServer Pages (JSP) and a MySQL database.
Predicting mutation candidates
The user can submit data as an AB1 file (the chromatogram format used by Applied Biosystems instruments) or as a FASTA file. When the user submits an AB1 file, the web application checks the quality of the sequence and converts it to FASTA format using the Phred program [13, 14]. Once the FASTA sequence files are prepared, the patient sequences are aligned to the reference genomic region of the gene locus using the BLAT program [15]. To compare patient data with known sequences, we extracted the genome sequence and gene structure information from UCSC Hg18 [16]. From the alignments, we then identify differences of several types, such as mismatches, insertions, and deletions, as mutation candidates. To identify novel mutations, we compare the genomic positions of the mutation candidates to known variations from public databases. To explore the evolutionary constraints on the mutation candidates, we calculate evolutionary conservation scores using the UCSC phastCons scores [17]. Finally, the amino acid changes caused by each mutation candidate are analyzed.
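The candidate-calling, novelty-check, and amino acid steps described above can be illustrated with a minimal Python sketch. It is not the COMUS implementation, which is built on JSP, Phred, and BLAT. It assumes the alignment has already been reduced to two gapped rows of equal length, and the names Candidate, call_candidates, novel_only, and amino_acid_change are hypothetical, introduced only for this illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Candidate:
    pos: int   # 1-based reference coordinate (for insertions: the base before the insertion)
    kind: str  # "mismatch", "insertion", or "deletion"
    ref: str
    alt: str


def call_candidates(ref_aln, qry_aln, ref_start=1):
    """Walk two gapped alignment rows and report every column where they differ."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("alignment rows must have equal length")
    candidates = []
    pos = ref_start - 1  # coordinate of the last reference base consumed
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r != "-":
            pos += 1
        if r == q:
            continue
        if r == "-":
            candidates.append(Candidate(pos, "insertion", "-", q))
        elif q == "-":
            candidates.append(Candidate(pos, "deletion", r, "-"))
        else:
            candidates.append(Candidate(pos, "mismatch", r, q))
    return candidates


def novel_only(candidates, known_positions):
    """Keep candidates whose genomic position is absent from the known variations."""
    return [c for c in candidates if c.pos not in known_positions]


# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def amino_acid_change(cds, cds_index, alt):
    """Return (reference, mutated) amino acids for a substitution at a 0-based CDS index."""
    offset = cds_index % 3
    start = cds_index - offset
    codon = cds[start:start + 3].upper()
    mutated = codon[:offset] + alt.upper() + codon[offset + 1:]
    return CODON_TABLE[codon], CODON_TABLE[mutated]


if __name__ == "__main__":
    found = call_candidates("ACGT-ACGT", "ACCTGAC-T", ref_start=100)
    for c in novel_only(found, known_positions={102}):
        print(c)
    print(amino_acid_change("ATGGAAGTT", 4, "T"))
```

In this sketch, a substitution that turns a codon into a stop codon is reported with "*" as the mutated amino acid, which corresponds to a nonsense mutation. Insertions and deletions would need a separate frameshift check that is not shown here.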
To curate submitted data, we created a curator account in the web application. Curators can approve any user's account (submitter account) and can see all the submitted sequence data. When curators register submitted data, they can use the mutation prediction system to check whether the submitted mutations are appropriate to deposit into their target LSDB. If a submission is not appropriate, curators can return it with a return message. With a curator's approval, the submitted sequences and mutations are deposited into the LSDB.
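The curation flow described above can be modeled as a small state machine. The following Python sketch is only an illustration under stated assumptions: the class and method names (Status, Submission, CurationQueue, approve_account, submit, review) are hypothetical, the in-memory list stands in for the MySQL-backed LSDB, and requiring account approval before submission is an assumption that the text does not state explicitly.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    DEPOSITED = "deposited"
    RETURNED = "returned"


@dataclass
class Submission:
    submitter: str
    sequence_file: str
    status: Status = Status.PENDING
    return_message: str = ""


class CurationQueue:
    def __init__(self):
        self.approved_submitters = set()
        self.submissions = []
        self.lsdb = []  # stand-in for the LSDB tables

    def approve_account(self, submitter):
        """A curator approves a submitter account."""
        self.approved_submitters.add(submitter)

    def submit(self, submitter, sequence_file):
        """Queue a submission (assumption: only approved accounts may submit)."""
        if submitter not in self.approved_submitters:
            raise PermissionError(f"account {submitter!r} is not approved")
        submission = Submission(submitter, sequence_file)
        self.submissions.append(submission)
        return submission

    def review(self, submission, appropriate, message=""):
        """Deposit an appropriate submission, or return it with a message."""
        if submission.status is not Status.PENDING:
            raise ValueError("submission was already reviewed")
        if appropriate:
            submission.status = Status.DEPOSITED
            self.lsdb.append(submission)
        else:
            submission.status = Status.RETURNED
            submission.return_message = message
        return submission.status
```

In a real deployment, the appropriate flag would come from the curator's judgment after running the mutation prediction step on the submitted sequence.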
Results and discussion
We have constructed a mutation candidate detection and deposition system, COMUS. COMUS addresses two disadvantages of common LSDB systems such as LOVD and UMD. First, it incorporates a mutation prediction system that supports clinicians' mutation data identification and submission. Second, it alleviates the time-consuming bottleneck of relying on specialized curators to maintain LSDB systems. Because COMUS provides an integrated mutation prediction system, anyone can be a curator, especially the clinicians and field workers who detect most variations. To construct a useful LSDB, some private patient information is necessary. However, because the specifics of patient information vary depending on the disease, COMUS supports only a fundamental set of patient fields: patient ID, gender, age, country, geographic origin, ethnic origin, disease severity, forced vital capacity (FVC), motor ability, and comments.
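The fundamental patient fields listed above map naturally onto a simple record type. The Python sketch below is an assumption-laden illustration, not the COMUS schema: the field names follow the list in the text, but the types, the optional defaults, and the helper filled_fields are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class PatientRecord:
    patient_id: str
    gender: Optional[str] = None
    age: Optional[int] = None
    country: Optional[str] = None
    geographic_origin: Optional[str] = None
    ethnic_origin: Optional[str] = None
    disease_severity: Optional[str] = None
    forced_vital_capacity: Optional[float] = None  # FVC; unit is not specified in the text
    motor_ability: Optional[str] = None
    comments: str = ""


def filled_fields(record):
    """Return only the fields a submitter actually provided."""
    return {k: v for k, v in asdict(record).items() if v not in (None, "")}
```

Making every field except the patient ID optional reflects the point that the required specifics differ between diseases, so a disease-specific LSDB can leave unused fields empty.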
Recently, Next Generation Sequencing (NGS) technologies have been developing quickly, and several complete human genomes have been sequenced with NGS. However, NGS is not yet widely used in clinics because of the difficulty of processing enormous amounts of data and its current high cost. As Sanger sequencing machines are still the main facilities in clinics and small-scale wet labs, COMUS was built to take Sanger sequencing data as input. When the cost of NGS decreases in the near future, clinicians will use NGS as the method of patient mutation detection.
COMUS is a comprehensive bioinformatics system that has been developed to efficiently bridge genetic data from benchwork to clinics and bedsides. Tailored to have a clinician-friendly interface, COMUS is believed to promote the clinical utility of LSDBs and thereby facilitate translational research in the field of medical genetics, particularly in terms of genotype-phenotype correlations.
Other papers from the meeting have been published as part of BMC Bioinformatics Volume 10 Supplement 15, 2009: Eighth International Conference on Bioinformatics (InCoB2009): Bioinformatics, available online at http://www.biomedcentral.com/1471-2105/10?issue=S15.
We thank our colleagues at KOBIC and the Korea Hemophilia Foundation. This research was supported by a grant from the KRIBB Research Initiative Program. This project was also supported by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MEST) (No.M10869030001-08N6903-00110). We thank Maryana Bhak for editing the manuscript.
This article has been published as part of BMC Genomics Volume 10 Supplement 3, 2009: Eighth International Conference on Bioinformatics (InCoB2009): Computational Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/10?issue=S3.
1. Cruts M, Van Broeckhoven C: Presenilin mutations in Alzheimer's disease. Hum Mutat. 1998, 11 (3): 183-190. 10.1002/(SICI)1098-1004(1998)11:3<183::AID-HUMU1>3.0.CO;2-J.
2. Tartaglia M, Mehler EL, Goldberg R, Zampino G, Brunner HG, Kremer H, Burgt van der I, Crosby AH, Ion A, Jeffery S, et al: Mutations in PTPN11, encoding the protein tyrosine phosphatase SHP-2, cause Noonan syndrome. Nat Genet. 2001, 29 (4): 465-468. 10.1038/ng772.
3. Franke B, Neale BM, Faraone SV: Genome-wide association studies in ADHD. Hum Genet. 2009.
4. Yamaguchi Y, Watanabe H, Yrdiran S, Ohtsubo K, Motoo Y, Okai T, Sawabu N: Detection of mutations of p53 tumor suppressor gene in pancreatic juice and its application to diagnosis of patients with pancreatic cancer: comparison with K-ras mutation. Clin Cancer Res. 1999, 5 (5): 1147-1153.
5. Sprecher E, Chavanas S, DiGiovanna JJ, Amin S, Nielsen K, Prendiville JS, Silverman R, Esterly NB, Spraker MK, Guelig E, et al: The spectrum of pathogenic mutations in SPINK5 in 19 families with Netherton syndrome: implications for mutation detection and first case of prenatal diagnosis. J Invest Dermatol. 2001, 117 (2): 179-187. 10.1046/j.1523-1747.2001.01389.x.
6. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA: Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005, 33 (Database issue): D514-517.
7. Stenson PD, Mort M, Ball EV, Howells K, Phillips AD, Thomas NS, Cooper DN: The Human Gene Mutation Database: 2008 update. Genome Med. 2009, 1 (1): 13. 10.1186/gm13.
8. Horaitis O, Talbot CC, Phommarinh M, Phillips KM, Cotton RG: A database of locus-specific databases. Nat Genet. 2007, 39 (4): 425. 10.1038/ng0407-425.
9. Beroud C, Collod-Beroud G, Boileau C, Soussi T, Junien C: UMD (Universal mutation database): a generic software to build and analyze locus-specific databases. Hum Mutat. 2000, 15 (1): 86-94. 10.1002/(SICI)1098-1004(200001)15:1<86::AID-HUMU16>3.0.CO;2-4.
10. Fokkema IF, den Dunnen JT, Taschner PE: LOVD: easy creation of a locus-specific sequence variation database using an "LSDB-in-a-box" approach. Hum Mutat. 2005, 26 (2): 63-68. 10.1002/humu.20201.
11. Brown AF, McKie MA: MuStaR and other software for locus-specific mutation databases. Hum Mutat. 2000, 15 (1): 76-85. 10.1002/(SICI)1098-1004(200001)15:1<76::AID-HUMU15>3.0.CO;2-8.
12. Riikonen P, Vihinen M: MUTbase: maintenance and analysis of distributed mutation databases. Bioinformatics. 1999, 15 (10): 852-859. 10.1093/bioinformatics/15.10.852.
13. Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998, 8 (3): 175-185.
14. Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998, 8 (3): 186-194.
15. Kent WJ: BLAT--the BLAST-like alignment tool. Genome Res. 2002, 12 (4): 656-664.
16. Kuhn RM, Karolchik D, Zweig AS, Wang T, Smith KE, Rosenbloom KR, Rhead B, Raney BJ, Pohl A, Pheasant M: The UCSC Genome Browser Database: update 2009. Nucleic Acids Res. 2009, 37 (Database issue): D755-761. 10.1093/nar/gkn875.
17. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al: Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005, 15 (8): 1034-1050. 10.1101/gr.3715005.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.