Cruxome: a powerful tool for annotating, interpreting and reporting genetic variants

Han, Qingmei; Yang, Ying; Wu, Shengyang; Liao, Yingchun; Zhang, Shuang; Liang, Hongbin; Cram, David S.; Zhang, Yu

doi:10.1186/s12864-021-07728-6

Software
Open access
Published: 03 June 2021

BMC Genomics volume 22, Article number: 407 (2021)

Abstract

Background

Next-generation sequencing (NGS) is an efficient tool used for identifying pathogenic variants that cause Mendelian disorders. However, the lack of bioinformatics training of researchers makes the interpretation of identified variants a challenge in terms of precision and efficiency. In addition, the non-standardized phenotypic description of human diseases also makes it difficult to establish an integrated analysis pathway for variant annotation and interpretation. Solutions to these bottlenecks are urgently needed.

Results

We developed a tool named “Cruxome” to automatically annotate and interpret single nucleotide variants (SNVs) and small insertions and deletions (InDels). Our approach greatly simplifies the burdensome task that clinical geneticists and scientists currently face in identifying causative pathogenic variants and building personal knowledge bases. The integrated architecture of Cruxome offers key advantages such as an interactive, user-friendly interface and the assimilation of the patient’s electronic health records. By incorporating a natural language processing algorithm, Cruxome can efficiently convert clinical descriptions of diseases into standardized HPO vocabulary. By using machine learning, in silico predictive algorithms, multiple integrated databases and supplementary tools, Cruxome can automatically process SNV and InDel variants (trio-family or proband-only cases) together with clinical diagnosis records, then annotate, score, identify and interpret pathogenic variants, and finally generate a standardized clinical report following the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines. Cruxome also provides supplementary tools to examine and visualize the genes or variants in historical cases, which can help users better understand the genetic basis of the disease.

Conclusions

Cruxome is an efficient tool for annotation and interpretation of variations and dramatically reduces the workload for clinical geneticists and researchers to interpret NGS results, simplifying their decision-making processes. We present an online version of Cruxome, which is freely available to academics and clinical researchers. The site is accessible at http://114.251.61.49:10024/cruxome/.

Background

Genetic diseases that follow an autosomal dominant, autosomal recessive, X-linked dominant, X-linked recessive or mitochondrial pattern of inheritance are known as Mendelian disorders [1,2,3]. Currently, on the order of 7,000–9,600 Mendelian disorders have been recorded by the Global Genes (https://globalgenes.org/), Online Mendelian Inheritance in Man (OMIM, https://omim.org/) and Orphadata (http://www.orphadata.org/) databases, and approximately 300 new Mendelian phenotypes are added each year [4]. Of all the Mendelian disorders, approximately 80% now have a defined genetic cause [5, 6], whereas for the remaining 20% the genes and genetic lesions remain unknown [7,8,9]. Thus, clinical research is ongoing to fully characterize the causative genes, develop a better understanding of the underlying disease mechanisms and explore potential treatment options [10].

Next-generation sequencing (NGS) has emerged as an innovative tool for medical genetics, and has led to a paradigm shift in medical research and clinical practice [11,12,13]. With the decreasing cost of sequencing, methods such as whole exome sequencing (WES) have become affordable and are widely used for the diagnosis of Mendelian disorders, with typical positive diagnostic yields of 25–40% [14, 15]. With the rapid development of different NGS techniques, the gap in data yield, quality and gene coverage between platforms is closing quickly. The challenge now is to systematically analyze the hundreds of thousands of high-quality variant calls, including single nucleotide variants (SNVs), short insertions or deletions (InDels) and large copy number variants (CNVs), that are revealed in WES data files [16,17,18,19]. Even after rigorous filtering, tens to hundreds of candidate causal variants remain to be considered [19,20,21,22]. Thus, an important step is to choose appropriate analysis tools to efficiently and precisely mine the causative variants, especially when the analysis team lacks training in the use of sophisticated bioinformatic programs. In addition, secondary confirmatory analyses are required for verification or support when candidate causative variants are related to the phenotype.

Several open-source analysis tools for variant annotation and functional effect prediction have been reported, including SpliceAI [20], ANNOVAR [21], SnpEff [23], PolyPhen-2 [24], CADD [25] and InterVar [26]. For example, SpliceAI is a deep learning-based tool specifically designed to identify splice variants. Combined Annotation Dependent Depletion (CADD) is used to score the deleteriousness of SNVs as well as InDels in the human genome. InterVar can be used for clinical interpretation of genetic variants according to the ACMG/AMP 2015 guidelines [26]. However, almost all of these are command-line tools that lack a user-friendly interface and require a strong background in bioinformatics to analyze the data comprehensively.

Once a set of candidate variants has been identified, the aim of follow-up analysis is to establish a strong relationship between the candidate genes and known diseases by using information in the published literature and databases. However, this information is sometimes incomplete or fragmented and scattered across many databases, which makes this step very time-consuming and inefficient. Several reported tools integrate various databases and simplify the search. These tools include IPAD (integrated pathway analysis database for systematic enrichment analysis) [27], SIDD (semantically integrated database towards a global view of human disease) [28], VariED (integrated database of gene annotation and expression profiles for variants related to human diseases) [29], DisGeNET (integrated information on human disease-associated genes and variants) [30] and Human Disease Insight (integrated knowledge-based platform for disease-gene-drug information) [31]. However, while useful, these tools focus only on specific applications. Thus, comprehensive integration of different databases for relevant knowledge is urgently needed to increase the yield of positive diagnoses.

Electronic health records (EHRs) have been widely implemented by clinical geneticists and include patient information such as name, age, gender, laboratory test results, phenotypic description, diagnosis and medication details. Almost all tools and databases adopt the Human Phenotype Ontology (HPO) as the reference. HPO provides a standardized vocabulary for describing phenotypic abnormalities in human disease, drawing on over 13,000 terms and over 156,000 annotations to hereditary diseases (https://hpo.jax.org/) [29, 32]. For clinical geneticists, it is almost impossible to accurately describe all of a patient’s phenotypes using standard terms, and diagnosis records are often colloquial and not directly usable for computation [32, 33]. Benefiting from the development of big data techniques, large-scale EHR data mining has become widely used in data-driven medical studies, clinical decision making and health management [34,35,36]. Since the phenotypic description of patients is a critical factor for precise variant interpretation, new algorithms are urgently needed to efficiently and accurately transform colloquial descriptions into standardized vocabulary.

Based on these challenges, we developed Cruxome, an automated and user-friendly tool for variant interpretation that is designed to efficiently and precisely handle Variant Call Format (VCF) files (from either WES or gene panel data) and generate standardized clinical reports. By mining hundreds of thousands of literature reports and integrating appropriate databases, Cruxome maintains a comprehensive and regularly updated biomedical knowledge base to support precise variant interpretation. Cruxome uses named entity recognition (NER), a natural language processing technique, to transform colloquial descriptions of phenotype into standard HPO vocabulary. Cruxome also supports building a personal knowledge base to efficiently manage patient information and interpretation results, with a traceable evidence record of interpretation decisions. Above all, Cruxome provides an overall solution for variant interpretation, dramatically reducing workload and facilitating better decision-making.

Implementation

Construction of Cruxome and main features

Cruxome was designed with a user-friendly interface and developed using a Browser/Server architecture to facilitate easy access and to minimize incompatibility with different computer operating systems. Cruxome runs in a Docker container (https://www.docker.com/), a standard unit of software that packages up code and all its dependencies so that the application runs quickly and reliably from one computing environment to another. Thus, Cruxome can easily be deployed on either a cloud server (for example Amazon Web Services or Microsoft Azure) or a local server. To enhance the functionality of Cruxome, improve efficiency and simplify code maintenance, a layered pattern was used in the basic architecture of Cruxome (Fig. 1). Cruxome consists of six layers: a user interface layer (UIL), a model layer (ML), a controller layer (CL), a support layer (SL), a data exchange layer (DEL) and a data storage layer (DSL). UIL, ML and CL provide interaction and data presentation to users; SL provides support to CL; DEL provides compatibility with various database types and a connection to laboratory information management systems (LIMS) and other software; and DSL is responsible for reading and writing data to the database (MySQL by default, https://www.mysql.com/) and for storing the information.
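The separation of concerns in a layered pattern can be illustrated with a minimal Python sketch. It is not Cruxome’s code (the tool itself is written in Java): all class and method names are hypothetical, and an in-memory dictionary stands in for the database-backed data storage layer. The point is only that the controller reaches storage through a narrow interface, so the backend can be swapped without touching the upper layers.

```python
from typing import Dict, Optional, Protocol


class StorageLayer(Protocol):
    """Hypothetical storage interface: the only contract upper layers rely on."""

    def save(self, key: str, record: dict) -> None: ...
    def load(self, key: str) -> Optional[dict]: ...


class InMemoryStorage:
    """Stand-in for a database-backed storage layer (MySQL by default in the paper)."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def save(self, key: str, record: dict) -> None:
        self._rows[key] = dict(record)

    def load(self, key: str) -> Optional[dict]:
        return self._rows.get(key)


class SampleController:
    """Hypothetical controller layer: validates input, delegates persistence."""

    def __init__(self, storage: StorageLayer) -> None:
        self._storage = storage

    def add_sample(self, sample_id: str, phenotype: str) -> None:
        if not sample_id:
            raise ValueError("sample_id must not be empty")
        self._storage.save(sample_id, {"phenotype": phenotype})

    def get_sample(self, sample_id: str) -> Optional[dict]:
        return self._storage.load(sample_id)


controller = SampleController(InMemoryStorage())
controller.add_sample("S001", "global developmental delay")
```

Replacing InMemoryStorage with a class that talks to a real database would leave SampleController unchanged, which is the maintenance benefit the layered design aims for.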

Minimum requirements of Cruxome (available on all modern computers):

A modern browser (Chrome, Firefox, Safari or Edge).
A 24-core server with 64 GB of memory and a 1 TB hard disk.
An internet or intranet connection of 10 Mbit/s.

Results

Cruxome pipeline

The overall workflow of Cruxome is shown in Fig. 2. The workflow commences with the upload of VCF files listing the genetic variants identified from gene panel or WES data, together with the phenotypic records of each patient. Next, Cruxome performs variant annotation, phenotype processing and interpretation, and then generates a standardized report (PDF or Word format) that summarizes the candidate genetic variants and provides conclusions and relevant references. A user manual with step-by-step instructions on how to use the Cruxome software is available for download on the Cruxome website.

Typical application scenario

After logging in to Cruxome, the user first creates the patient’s record with detailed information about phenotype, age and family relationship, or directly imports the records from existing patient databases. Next, the VCF files are uploaded. Cruxome supports VCF files that meet the VCF 4.2 standard or higher, and supports both the GRCh37 (hg19) and GRCh38 reference genomes. Cruxome then automatically checks the file format and standardizes the files.
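A pre-flight check of this kind could look like the Python sketch below. It assumes the version is read from the mandatory ##fileformat line and that the genome build is guessed from the optional ##reference header line. How Cruxome actually validates and standardizes files is not described in the text, and the function name check_vcf_header is hypothetical.

```python
import re

# Assembly names mapped to one canonical label per build.
BUILD_ALIASES = {"grch37": "GRCh37", "hg19": "GRCh37",
                 "grch38": "GRCh38", "hg38": "GRCh38"}


def check_vcf_header(lines, min_version=(4, 2)):
    """Return (version, build) for a VCF header, or raise ValueError.

    build is None when no ##reference line names a known assembly.
    """
    lines = [line.rstrip("\r\n") for line in lines]
    if not lines:
        raise ValueError("empty VCF")
    match = re.fullmatch(r"##fileformat=VCFv(\d+)\.(\d+)", lines[0])
    if match is None:
        raise ValueError("first line must be ##fileformat=VCFvX.Y")
    version = (int(match.group(1)), int(match.group(2)))
    if version < min_version:
        raise ValueError(f"VCF {version[0]}.{version[1]} is older than required")

    build = None
    for line in lines[1:]:
        if not line.startswith("##"):
            break  # reached the #CHROM column header; meta-information is over
        if line.startswith("##reference="):
            lowered = line.lower()
            for alias, canonical in BUILD_ALIASES.items():
                if alias in lowered:
                    build = canonical
    return version, build
```

A file whose build cannot be inferred returns None for the build, leaving it to the caller to ask the user rather than guess.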

After checking the patient information and VCF file format, Cruxome launches its annotation module. For the most comprehensive evaluation and interpretation of variants, Cruxome integrates multiple databases, including sequence databases for gene functional information, population databases for calculating variant allele frequencies and disease databases for defining clinical significance and relationships to the disease phenotype. Multiple tools are then applied to evaluate the effect of variants on protein function and to finally generate a variant score [37].
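To make the role of population frequency and in silico prediction concrete, the following Python sketch shortlists annotated variants. It is an illustration under stated assumptions, not Cruxome’s scoring method: the 1% allele frequency cutoff, the field names and the example records are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AnnotatedVariant:
    gene: str                       # placeholder gene symbol
    hgvs_c: str                     # coding-level change
    population_af: Optional[float]  # None if absent from population databases
    predicted_damaging: bool        # consensus of in silico predictors


def shortlist(variants: List[AnnotatedVariant],
              max_af: float = 0.01) -> List[AnnotatedVariant]:
    """Keep rare variants and list predicted-damaging ones first.

    max_af is an assumed cutoff, not a value taken from the paper.
    """
    rare = [v for v in variants
            if v.population_af is None or v.population_af <= max_af]
    # False sorts before True, so predicted-damaging variants lead;
    # ties are broken by ascending allele frequency, absent counting as 0.
    return sorted(rare, key=lambda v: (not v.predicted_damaging,
                                       v.population_af or 0.0))


# Hypothetical annotated calls, for illustration only.
calls = [
    AnnotatedVariant("GENE_A", "c.100A>G", 0.25, False),
    AnnotatedVariant("GENE_B", "c.200C>T", None, True),
    AnnotatedVariant("GENE_C", "c.300G>A", 0.002, False),
]
candidates = shortlist(calls)
```

Here the common variant in GENE_A is dropped, and the novel, predicted-damaging variant in GENE_B is ranked ahead of the rare but benign-looking one in GENE_C.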

Next, Cruxome uses named entity recognition (NER), a natural language processing technique, to transform clinical diagnosis records into standard HPO terms (Fig. 2). Using our newly developed algorithm, Cruxome automatically performs variant interpretation and clinical classification by combining the phenotypic diagnosis description and the hot gene panel according to the American College of Medical Genetics and Genomics and Association for Molecular Pathology (ACMG/AMP) guidelines [38]. At the end of the process, Cruxome outputs the variant interpretation results with the corresponding evidence, ordered by pathogenicity criterion. Users can further review the interpreted variants, add more clinical information if required and then generate a clinical report (PDF or Word format) summarizing the relevant genetic variants, conclusions and references (Fig. 2).
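The final classification step can be made concrete with the combining rules published in the 2015 ACMG/AMP guidelines [38]. The Python sketch below encodes only those combining rules. How Cruxome assigns the individual evidence codes, and whether it adjusts their strength, is not described in the text, and the function name classify_acmg is hypothetical. Conflicting pathogenic and benign evidence is reported as uncertain significance, as the guidelines specify.

```python
def classify_acmg(codes):
    """Combine ACMG/AMP evidence codes (e.g. "PVS1", "PM2") into a class."""
    def count(prefix):
        return sum(1 for code in codes if code.startswith(prefix))

    pvs, ps, pm, pp = count("PVS"), count("PS"), count("PM"), count("PP")
    ba, bs, bp = count("BA"), count("BS"), count("BP")

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps == 1 and (pm >= 3
                         or (pm == 2 and pp >= 2)
                         or (pm == 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm == 1)
        or (ps == 1 and 1 <= pm <= 2)
        or (ps == 1 and pp >= 2)
        or pm >= 3
        or (pm == 2 and pp >= 2)
        or (pm == 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs == 1 and bp >= 1) or bp >= 2

    # Contradictory evidence falls back to uncertain significance.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "Uncertain significance"
```

The sketch counts codes purely by prefix, so strength-modified codes (for example a strong criterion downgraded to moderate) would need an explicit strength field in a fuller implementation.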

Management of your own knowledge base

Once variant interpretation is complete, Cruxome automatically updates your personal knowledge base and stores all the information generated during the interpretation process, including candidate variants, ACMG evidence, literature and the versions of modules and databases, thus making the interpretation decisions traceable (Fig. 2). The personal knowledge base greatly facilitates data tracking, data management and re-interpretation of variants using updated databases. Users can also manually modify or update literature records in their own knowledge base, including the clinical level of variants or other fields. When the same variants are found again in the analysis of new samples, they are automatically highlighted together with the information from previous records, giving users greater confidence in the case at hand.
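The traceability and recurrence behaviour described here can be sketched as a small Python store keyed by variant. This is a minimal illustration, not Cruxome’s data model: the class name, record fields, coordinates and version labels are all hypothetical, and an in-memory dictionary replaces the real database.

```python
from collections import defaultdict


class PersonalKnowledgeBase:
    """Minimal in-memory stand-in for a per-user variant knowledge base."""

    def __init__(self):
        # (build, chrom, pos, ref, alt) -> list of past interpretation records
        self._history = defaultdict(list)

    def record(self, variant, sample_id, classification, evidence, versions):
        """Store one interpretation together with the versions that produced it."""
        self._history[variant].append({
            "sample_id": sample_id,
            "classification": classification,
            "evidence": list(evidence),
            "versions": dict(versions),  # module and database versions used
        })

    def previous(self, variant):
        """Return earlier records so a recurring variant can be highlighted."""
        return list(self._history.get(variant, []))


kb = PersonalKnowledgeBase()
# Placeholder coordinates and version labels, not taken from the paper.
variant = ("GRCh37", "1", 12345, "C", "T")
kb.record(variant, "S001", "Pathogenic", ["PVS1", "PM2"],
          {"annotation_db": "2021-01"})
seen_before = kb.previous(variant)
```

Storing the database versions alongside each decision is what allows a later re-interpretation to show which conclusions were reached with outdated sources.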

User case demonstration

A representative proband-only case is presented to demonstrate the functionality of the Cruxome pipeline (Fig. 3). The clinical diagnosis of the six-month-old proband was “decreased fetal movement in the prenatal period and increased head circumference (45.7 cm), global developmental delay, periventricular leukomalacia, hip dysplasia, motor deterioration and impaired pursuit initiation and maintenance post birth”. After logging in to Cruxome, the home page is loaded (Fig. 3A). The left panel of the home page shows the modules of Cruxome, whereas the right panel shows the list of patient records. After clicking the “Add” button in the Sample Management module, the patient’s information, such as name, gender, age and clinical phenotype, is entered into the pop-up window (Fig. 3B). The VCF file is then uploaded (by clicking the “import” button), and Cruxome automatically performs variant interpretation. The progress of VCF uploading, analysis and interpretation can be followed in real time via the progress bar on the home page (Fig. 3A). The final interpretation results can be accessed in the “Sample Interpretation” module (Fig. 3C). Supporting information about variants or their interpretation can be examined or reviewed by clicking the corresponding button. If candidate pathogenic gene variants are found (in the AHDC1 gene in this case), users should mark the corresponding variants as “Positive” in the conclusion column (Fig. 3C). Clicking the “Generate Report” icon in the upper right of the interpretation results (Fig. 3B) loads the report generation page (Fig. 3D). After the “Positive conclusion” or “Negative conclusion” selection box is chosen, the variants, references and an automated conclusion are displayed in the corresponding section (Fig. 3D). In this example case, Cruxome successfully identified a pathogenic variant (NM_001029882.3: c.2773C>T: p.R925*) in the AHDC1 gene, which has been reported to be responsible for autosomal dominant Xia-Gibbs syndrome [39]. Users can then export a standardized clinical report by clicking the “Generate Report” button below (Fig. 3D). The new report can be accessed in the Report Management module.
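The free-text diagnosis quoted in this case shows the kind of input that has to be mapped to HPO terms. The Python sketch below uses a plain dictionary lookup as a stand-in for Cruxome’s NER model, which is not described in detail. The three-entry lexicon is illustrative only, and the HPO identifiers and the synonym assignment should be verified against the current HPO release before any real use.

```python
import re

# Tiny illustrative lexicon: lowercase phrase -> (HPO ID, preferred label).
# A real system would load the full ontology with all of its synonyms.
HPO_LEXICON = {
    "decreased fetal movement": ("HP:0001558", "Decreased fetal movement"),
    "increased head circumference": ("HP:0000256", "Macrocephaly"),
    "global developmental delay": ("HP:0001263", "Global developmental delay"),
}


def map_to_hpo(text):
    """Return (HPO ID, label) pairs in the order the phrases occur in text."""
    normalized = re.sub(r"\s+", " ", text.lower())
    hits = []
    for phrase, (hpo_id, label) in HPO_LEXICON.items():
        position = normalized.find(phrase)
        if position != -1:
            hits.append((position, hpo_id, label))
    return [(hpo_id, label) for _, hpo_id, label in sorted(hits)]


diagnosis = ("decreased fetal movement in the prenatal period and increased "
             "head circumference (45.7 cm), global developmental delay")
terms = map_to_hpo(diagnosis)
```

Exact phrase matching misses paraphrases, negation and misspellings, which is the gap a trained NER model is meant to close.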

In another representative trio-family case, the clinical diagnosis of the proband was hyperhomocysteinemia, methylmalonic acidemia, anemia, megaloblastic anemia, proteinuria, occult blood and feeding difficulties. Cruxome successfully identified a likely pathogenic variant (NM_015506.2: c.80A>G: p.Q27R) and a pathogenic variant (NM_015506.2: c.217C>T: p.R73X) in the MMACHC gene, which is responsible for methylmalonic aciduria and homocystinuria [40] (Supplemental Table 1). The proband was a compound heterozygote for variants p.Q27R and p.R73X, whereas the father and mother were each confirmed to be heterozygous for one of the two variants.
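The inheritance check behind a compound heterozygous call in a trio can be sketched in Python as follows. This is a simplified illustration rather than Cruxome’s algorithm: genotypes are given as alternate-allele counts, the function name is hypothetical, and de novo variants or phasing from read data are not handled. The text does not state which parent carries which MMACHC variant, so the assignment in the example is arbitrary.

```python
def is_compound_het(gene_variants):
    """Check for one paternally and one maternally inherited heterozygous variant.

    gene_variants: list of dicts with alt-allele counts (0, 1 or 2) under
    the keys 'proband', 'father' and 'mother', all within the same gene.
    """
    from_father = any(
        v["proband"] == 1 and v["father"] == 1 and v["mother"] == 0
        for v in gene_variants
    )
    from_mother = any(
        v["proband"] == 1 and v["mother"] == 1 and v["father"] == 0
        for v in gene_variants
    )
    return from_father and from_mother


# The two MMACHC variants reported above. Which parent carries which variant
# is not stated in the text, so the assignment here is arbitrary.
mmachc = [
    {"hgvs": "c.80A>G", "proband": 1, "father": 1, "mother": 0},
    {"hgvs": "c.217C>T", "proband": 1, "father": 0, "mother": 1},
]
result = is_compound_het(mmachc)
```

Requiring each variant to be present in exactly one parent is what establishes that the two variants sit on different copies of the gene.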

Extra Tools

Cruxome also provides other useful tools to help clinical geneticists visualize their data. First, the “getting sequence” tool can display the DNA sequence of a given region (Fig. 4A). Second, the “examine bam file” tool can be used to schematically display NGS reads aligned to the reference genome (Fig. 4B). Third, the “locus searching” tool can be used to calculate the frequency of variants across all samples in the personal knowledge base (Fig. 4C). Lastly, the “gene coverage and depth” tool can search the coverage, depth and number of variants of a given gene in all samples (Fig. 4D).
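The calculation performed by a locus-search tool of this kind can be sketched in Python. The data layout and function name below are hypothetical, and the sketch assumes diploid autosomal loci, so each sample contributes two alleles. It reports both the fraction of samples carrying the variant and the allele frequency within the stored cohort.

```python
def internal_frequency(variant, cohort):
    """Frequency of a variant across stored samples.

    cohort: {sample_id: {variant_key: alt_allele_count}}; a variant that is
    absent from a sample's dict counts as 0 alternate alleles.
    """
    n = len(cohort)
    if n == 0:
        return {"carriers": 0, "carrier_frequency": 0.0, "allele_frequency": 0.0}
    counts = [genotypes.get(variant, 0) for genotypes in cohort.values()]
    carriers = sum(1 for c in counts if c > 0)
    return {
        "carriers": carriers,
        "carrier_frequency": carriers / n,
        "allele_frequency": sum(counts) / (2 * n),
    }


# Hypothetical cohort of four stored samples.
v = ("GRCh37", "1", 12345, "C", "T")
cohort = {
    "S001": {v: 1},   # heterozygous
    "S002": {v: 2},   # homozygous
    "S003": {},       # variant absent
    "S004": {("GRCh37", "2", 500, "A", "G"): 1},
}
stats = internal_frequency(v, cohort)
```

A variant that recurs in many unrelated in-house samples is more likely to be a platform artifact or a local polymorphism than a rare disease cause, which is why this number is useful alongside public population databases.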

Update and version options

Cruxome is frequently updated to incorporate the latest clinical genetic research findings with options for adding new algorithms and new annotation sources and analysis modules. Benefitting from version updates, Cruxome users can easily re-analyze cases stored in the knowledge base, and potentially identify novel pathogenic variants.

Comparison of Cruxome with other software

A range of commercial software has been reported to perform variant annotation and interpretation [26, 41,42,43]. Compared with the above-mentioned software, Cruxome offers unique advantages (Table 1). It facilitates (i) transformation of colloquial descriptions of phenotype into standard HPO vocabulary using a natural language processing algorithm (NER), (ii) automatic variant annotation and interpretation, which greatly reduces the workload of users, and (iii) export of a standard clinical report summarizing the relevant genetic variants, conclusions and references. However, in the current version of Cruxome, only variants from WES and panel data are supported, and the variant file format is restricted to standard VCF. This limitation prevents its use in annotating and interpreting variants from whole genome sequencing (WGS), and reduces the flexibility of input files. Accordingly, further development of Cruxome is planned to include modules for annotation and interpretation of WGS variants, and modules that accept other files containing structured variant records (e.g. Excel or txt format) as input.

Table 1 Functional comparison of different software

Conclusions

By using in-house algorithms and multiple databases, Cruxome can effectively perform variant annotation and interpretation. A user-friendly interface combined with the NER natural language processing algorithm makes Cruxome easy to use and, importantly, users do not need to change the phenotype descriptions that they write in clinical diagnosis records. Although Cruxome is designed for users with limited bioinformatics knowledge, those with a more solid grounding in bioinformatics can also use Cruxome as a convenient and time-saving option. These features make Cruxome versatile for use by clinical geneticists, and it can also provide important information to genetic counselors for discussing results with patients. Above all, Cruxome is a powerful solution for annotating and interpreting variants and for managing personal knowledge bases, and it overcomes the current bottleneck of clinical geneticists spending valuable time mining and evaluating causative variants.

Availability and requirements

Project name: Cruxome.

Project home page: http://114.251.61.49:10024/cruxome/.

Operating system(s): Platform independent.

Programming language: Java.

Other requirements: Java (version >= 1.8.1), Tomcat (version >= 8.0), Docker (version >= 18.03.1-ce), MySQL (version >= 5.7).

License: Free for academic and research use.

Availability of data and materials

The example used in this paper (Fig. 3B and D, Supplemental Table 1) is available in the free trial account as a demonstration case.

Abbreviations

NGS: next-generation sequencing
SNVs: single nucleotide variants
InDels: insertions or deletions
CNVs: copy number variants
HPO: Human Phenotype Ontology
ACMG/AMP: American College of Medical Genetics and Genomics/Association for Molecular Pathology
WES: whole exome sequencing
EHR: electronic health records
NER: named entity recognition (a natural language processing technique)
VCF: Variant Call Format
MAF: minor allele frequency

References

Kennedy MA. Mendelian Genetic Disorders. eLS. 2005. https://doi.org/10.1038/npg.els.0003934.
Antonarakis SE, Beckmann JS. Mendelian disorders deserve more attention. Nat Rev Genet. 2006;7(4):277–82. https://doi.org/10.1038/nrg1826.
Chakravarti A. Genomic contributions to Mendelian disease. Genome Res. 2011;21(5):643–4. https://doi.org/10.1101/gr.123554.111.
Hartley T, Balci TB, Rojas SK, Eaton A, Canada CR, Dyment DA, et al. The unsolved rare genetic disease atlas? An analysis of the unexplained phenotypic descriptions in OMIM(R). Am J Med Genet C Semin Med Genet. 2018;178(4):458–63. https://doi.org/10.1002/ajmg.c.31662.
Field MJ, Boat TF, editors. Rare Diseases and Orphan Products: Accelerating Research and Development. Washington (DC): National Academies Press (US); 2010. https://doi.org/10.17226/12953.
Wright CF, FitzPatrick DR, Firth HV. Paediatric genomics: diagnosing rare disease in children. Nat Rev Genet. 2018;19(5):253–68. https://doi.org/10.1038/nrg.2017.116.
Wright CF, Fitzgerald TW, Jones WD, Clayton S, McRae JF, van Kogelenberg M, et al. Genetic diagnosis of developmental disorders in the DDD study: a scalable analysis of genome-wide research data. Lancet. 2015;385(9975):1305–14. https://doi.org/10.1016/S0140-6736(14)61705-0.
Deciphering Developmental Disorders Study. Prevalence and architecture of de novo mutations in developmental disorders. Nature. 2017;542(7642):433–8. https://doi.org/10.1038/nature21062.
Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, Dent KM, et al. Exome sequencing identifies the cause of a mendelian disorder. Nat Genet. 2010;42(1):30–5. https://doi.org/10.1038/ng.499.
Oti M, Brunner HG. The modular nature of genetic diseases. Clin Genet. 2007;71(1):1–11. https://doi.org/10.1111/j.1399-0004.2006.00708.x.
Kaname T, Yanagi K, Naritomi K. A commentary on the promise of whole-exome sequencing in medical genetics. J Hum Genet. 2014;59(3):117–8. https://doi.org/10.1038/jhg.2014.7.
Rabbani B, Tekin M, Mahdieh N. The promise of whole-exome sequencing in medical genetics. J Hum Genet. 2014;59(1):5–15. https://doi.org/10.1038/jhg.2013.114.
Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol. 2008;26(10):1135–45. https://doi.org/10.1038/nbt1486.
Dragojlovic N, Elliott AM, Adam S, van Karnebeek C, Lehman A, Mwenifumbo JC, et al. The cost and diagnostic yield of exome sequencing for children with suspected genetic disorders: a benchmarking study. Genet Med. 2018;20(9):1013–21. https://doi.org/10.1038/gim.2017.226.
Trujillano D, Bertoli-Avella AM, Kumar Kandaswamy K, Weiss ME, Koster J, Marais A, et al. Clinical exome sequencing: results from 2819 samples reflecting 1000 families. Eur J Hum Genet. 2017;25(2):176–82. https://doi.org/10.1038/ejhg.2016.146.
Hwang S, Kim E, Lee I, Marcotte EM. Systematic comparison of variant calling pipelines using gold standard personal exome variants. Sci Rep. 2015;5:17875. https://doi.org/10.1038/srep17875.
Liu M, Zhong Y, Liu H, Liang D, Liu E, Zhang Y, et al. REDBot: Natural language process methods for clinical copy number variation reporting in prenatal and products of conception diagnosis. Mol Genet Genomic Med. 2020;8(11):e1488. https://doi.org/10.1002/mgg3.1488.
Chen J, Li X, Zhong H, Meng Y, Du H. Systematic comparison of germline variant calling pipelines cross multiple next-generation sequencers. Sci Rep. 2019;9(1):9345. https://doi.org/10.1038/s41598-019-45835-3.
MacArthur DG, Balasubramanian S, Frankish A, Huang N, Morris J, Walter K, et al. A systematic survey of loss-of-function variants in human protein-coding genes. Science. 2012;335(6070):823–8. https://doi.org/10.1126/science.1215040.
Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, et al. Predicting Splicing from Primary Sequence with Deep Learning. Cell. 2019;176(3):535–48.e24. https://doi.org/10.1016/j.cell.2018.12.015.
Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38(16):e164. https://doi.org/10.1093/nar/gkq603.
Smedley D, Jacobsen JO, Jager M, Kohler S, Holtgrewe M, Schubach M, et al. Next-generation diagnostics and disease-gene discovery with the Exomiser. Nat Protoc. 2015;10(12):2004–15. https://doi.org/10.1038/nprot.2015.124.
Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. 2012;6(2):80–92. https://doi.org/10.4161/fly.19695.
Adzhubei I, Jordan DM, Sunyaev SR. Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet. 2013;76(1):7.20.21-27.20.41. https://doi.org/10.1002/0471142905.hg0720s76.
Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 2019;47(D1):D886-D94. https://doi.org/10.1093/nar/gky1016.
Li Q, Wang K. InterVar: Clinical Interpretation of Genetic Variants by the 2015 ACMG-AMP Guidelines. Am J Hum Genet. 2017;100(2):267–80. https://doi.org/10.1016/j.ajhg.2017.01.004.
Zhang F, Drabier R. IPAD: the Integrated Pathway Analysis Database for Systematic Enrichment Analysis. BMC Bioinformatics. 2012;13(15):S7. https://doi.org/10.1186/1471-2105-13-S15-S7.
Cheng L, Wang G, Li J, Zhang T, Xu P, Wang Y. SIDD: a semantically integrated database towards a global view of human disease. PLoS One. 2013;8(10):e75504. https://doi.org/10.1371/journal.pone.0075504.
Kohler S, Carmody L, Vasilevsky N, Jacobsen JOB, Danis D, Gourdine JP, et al. Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources. Nucleic Acids Res. 2019;47(D1):D1018-D27. https://doi.org/10.1093/nar/gky1105.
Pinero J, Ramirez-Anguita JM, Sauch-Pitarch J, Ronzano F, Centeno E, Sanz F, et al. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res. 2020;48(D1):D845-D55. https://doi.org/10.1093/nar/gkz1021.
Tasleem M, Ishrat R, Islam A, Ahmad F, Hassan MI. Human Disease Insight: An integrated knowledge-based platform for disease-gene-drug information. J Infect Public Health. 2016;9(3):331–8. https://doi.org/10.1016/j.jiph.2015.10.018.
Robinson PN, Kohler S, Bauer S, Seelow D, Horn D, Mundlos S. The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. Am J Hum Genet. 2008;83(5):610–5. https://doi.org/10.1016/j.ajhg.2008.09.017.
Kohler S, Doelken SC, Mungall CJ, Bauer S, Firth HV, Bailleul-Forestier I, et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014;42(Database issue):D966-74. https://doi.org/10.1093/nar/gkt1026.
Lei J, Tang B, Lu X, Gao K, Jiang M, Xu H. A comprehensive study of named entity recognition in Chinese clinical text. J Am Med Inform Assoc. 2014;21(5):808–14. https://doi.org/10.1136/amiajnl-2013-002381.
Wu Y, Xu J, Jiang M, Zhang Y, Xu H. A Study of Neural Word Embeddings for Named Entity Recognition in Clinical Text. AMIA Annu Symp Proc. 2015;2015:1326-33.
Habibi M, Weber L, Neves M, Wiegandt DL, Leser U. Deep learning with word embeddings improves biomedical named entity recognition. Bioinformatics. 2017;33(14):i37–48. https://doi.org/10.1093/bioinformatics/btx228.
1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature. 2015;526(7571):68–74. https://doi.org/10.1038/nature15393.
Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17(5):405–24. https://doi.org/10.1038/gim.2015.30.
Jiang Y, Wangler MF, McGuire AL, Lupski JR, Posey JE, Khayat MM, et al. The phenotypic spectrum of Xia-Gibbs syndrome. Am J Med Genet A. 2018;176(6):1315–26. https://doi.org/10.1002/ajmg.a.38699.
Liu MY, Yang YL, Chang YC, Chiang SH, Lin SP, Han LS, et al. Mutation spectrum of MMACHC in Chinese patients with combined methylmalonic aciduria and homocystinuria. J Hum Genet. 2010;55(9):621–6. https://doi.org/10.1038/jhg.2010.81.
Dahary D, Golan Y, Mazor Y, Zelig O, Barshir R, Twik M, et al. Genome analysis and knowledge-driven variant interpretation with TGex. BMC Med Genomics. 2019;12(1):200. https://doi.org/10.1186/s12920-019-0647-8.
Caspar SM, Dubacher N, Kopps AM, Meienberg J, Henggeler C, Matyas G. Clinical sequencing: From raw data to diagnosis with lifetime value. Clin Genet. 2018;93(3):508–19. https://doi.org/10.1111/cge.13190.
Hintzsche JD, Robinson WA, Tan AC. A Survey of Computational Tools to Analyze and Interpret Whole Exome Sequencing Data. Int J Genomics. 2016;2016:7983236. https://doi.org/10.1155/2016/7983236.

Acknowledgements

Not applicable.

Funding

This work was supported by an Innovation Capability Support Plan of Shaanxi province (Grant number 2019KJXX-055). The funding body played no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Author information

Qingmei Han and Ying Yang contributed equally to this work.

Authors and Affiliations

Berry Genomics Company Limited, Building 5, Courtyard 4, Shengmingyuan Road, ZGC Life Science Park, Changping District, 102200, Beijing, China
Qingmei Han, Shengyang Wu, Yingchun Liao, Shuang Zhang, Hongbin Liang, David S. Cram & Yu Zhang
Xian Children’s Hospital, 710003, Xian, China
Ying Yang

Authors

Qingmei Han
Ying Yang
Shengyang Wu
Yingchun Liao
Shuang Zhang
Hongbin Liang
David S. Cram
Yu Zhang

Contributions

YZ designed and supervised the development of Cruxome. QH, YL and SZ developed the software. YY provided WES data and thoroughly tested Cruxome for clinical use. QH, SW and HL drafted the manuscript. DC and YZ were major contributors to writing and revising the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to David S. Cram or Yu Zhang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

All authors except Ying Yang are employees of Berry Genomics Co., Ltd. None of the authors holds stocks or bonds.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Han, Q., Yang, Y., Wu, S. et al. Cruxome: a powerful tool for annotating, interpreting and reporting genetic variants. BMC Genomics 22, 407 (2021). https://doi.org/10.1186/s12864-021-07728-6

Received: 22 January 2021
Accepted: 20 May 2021
Published: 03 June 2021
DOI: https://doi.org/10.1186/s12864-021-07728-6
