Volume 15 Supplement 3
A community-based resource for automatic exome variant-calling and annotation in Mendelian disorders
© Mutarelli et al.; licensee BioMed Central Ltd. 2014
Published: 6 May 2014
Mendelian disorders are mostly caused by single mutations in the DNA sequence of a gene, leading to a phenotype with pathological consequences. Whole Exome Sequencing of patients can be a cost-effective alternative to standard genetic screenings for finding the causative mutations of genetic diseases, especially when the number of cases is limited. Analyzing exome sequencing data requires specific expertise, substantial computational resources and a reference variant database to identify pathogenic variants.
We developed a database of variations collected from patients with Mendelian disorders, which is automatically populated by an associated exome-sequencing pipeline. The pipeline automatically identifies, annotates and stores insertions, deletions and single nucleotide variants in the database. The resource is freely available online at http://exome.tigem.it. The exome-sequencing pipeline automates the analysis workflow (quality control and read trimming, mapping on the reference genome, post-alignment processing, variant calling and annotation) using state-of-the-art software tools. It has been designed to run on a computing cluster in order to analyse several samples simultaneously. The detected variants are annotated by the pipeline not only with standard variant annotations (e.g. allele frequency in the general population, the predicted effect on gene product activity, etc.) but, more importantly, with allele frequencies across the samples progressively collected in the database itself, stratified by Mendelian disorder.
We aim to provide a resource for the genetic disease community to automatically analyse whole exome-sequencing samples with a standard and uniform analysis pipeline, thus collecting variant allele frequencies by disorder. This resource may become a valuable tool to help dissect the genotype underlying the disease phenotype through an improved selection of putative patient-specific causative or phenotype-associated variations.
Mendelian disorders are inherited diseases caused by inborn defects in the DNA sequence of one or a few genes. Most inherited genetic disorders are rare, but taken collectively they are estimated to affect ~4% of newborns. About 7000 disease phenotypes are described in the Online Mendelian Inheritance in Man (OMIM) database, but the cause of about half of them is still unknown. Thanks to High-Throughput Sequencing (HTS) technologies, Whole Exome Sequencing (WES) of patients makes it possible to find the causative mutations of genetic diseases. WES is an effective alternative to standard genetic screenings when only a few patients are available, as is often the case for Mendelian disorders. Compared with Whole Genome Sequencing (WGS), WES is still preferable because the targeted region comprises only 1-2% of the genome sequence, so far fewer reads are required to reach the sequencing depth needed to reliably identify mutations. Furthermore, the potentially damaging effect of a coding-region mutation on gene product activity can be predicted with good accuracy [5–10], whereas this is much more difficult for a non-coding mutation [11, 12].
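The sequencing-depth argument can be made concrete with back-of-the-envelope arithmetic: the number of reads needed is roughly target size × mean depth / read length. The sketch below assumes a 3.1 Gb genome, an exome target of 1.6% of it, 100 bp reads and the same 30x mean depth for both strategies; it ignores off-target reads, duplicates and uneven capture, all of which raise the real requirements (exomes are usually sequenced deeper than 30x).

```python
def reads_required(target_bp: float, mean_depth: float, read_len: int) -> float:
    """Reads needed so that (reads x read length) / target size = mean depth.

    Off-target reads, PCR duplicates and uneven capture are ignored, so real
    requirements are higher for both WES and WGS.
    """
    return target_bp * mean_depth / read_len


GENOME_BP = 3.1e9              # approximate human genome size (assumption)
EXOME_BP = 0.016 * GENOME_BP   # ~1.6% of the genome, within the 1-2% range above

for label, size in [("WGS", GENOME_BP), ("WES", EXOME_BP)]:
    print(f"{label}: {reads_required(size, 30, 100):.2e} reads for 30x mean depth")
```

Under these assumptions the exome needs about 1/0.016 ≈ 62 times fewer reads than the whole genome for the same mean depth.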
WES has been successfully used to find candidate causative mutations with as few as one affected individual [13–18]. One limitation of WES is that the proportion of samples in which no candidate causative mutation is found is still high. This may happen when the causative mutation lies outside the targeted region or in a position that is difficult to sequence, or it may be due to incomplete penetrance and the presence of modifier genes [20, 21]. Another factor affecting the outcome is the bioinformatic analysis pipeline and its stringency level, since no standard operating procedure is currently available. Therefore, to compare the results of different WES samples, it is important to use a uniform analysis pipeline and a common reference database to prioritise the detected variants.
Moreover, despite the ever-decreasing cost of sequencing experiments, the bioinformatic analysis of WES data requires substantial computational resources, trained experts and a reference variant database to select and prioritise the best candidate pathogenic variants.
Our aim was to build a community-based resource providing a disease-oriented allele variant frequency repository for Mendelian disorders populated by means of an automatic exome-sequencing analysis pipeline. The expansion and usefulness of this resource will be driven by user-submitted WES samples collected from Mendelian disorder patients.
The analysis pipeline is fully automated and has a modular structure, as detailed below and in Additional file 1. Each module performs its task using custom scripts and state-of-the-art tools (Additional file 2). The pipeline was designed to run on a high-performance computing cluster using the Torque resource manager, but can easily be ported to other job managers. The exome.tigem.it website uses a cluster of 8 computing nodes, each equipped with dual Xeon E5-2670 processors, for a total of 128 computing cores and 376 GB of RAM.
Read quality assessment and trimming module
Read sequences are submitted by the user in FastQ format and are first assessed for overall quality using FastQC. Reads are then trimmed to remove the Illumina adapter sequence and low-quality ends (quality score threshold of 20) using Trim Galore and cutadapt; a FastQC report is also generated for the trimmed sequences.
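A minimal sketch of the two trimming steps, assuming Phred+33 qualities and applying quality trimming before adapter removal. The quality step follows the BWA-style running-sum rule that cutadapt documents for 3' quality trimming; the adapter step is a deliberate simplification (exact matching only, with a hypothetical minimum overlap of 3 bases), whereas cutadapt tolerates mismatches. AGATCGGAAGAGC is the common Illumina adapter prefix; the exact parameters used by the pipeline beyond the threshold of 20 are not stated here.

```python
def quality_trim_index(quals: str, cutoff: int = 20, base: int = 33) -> int:
    """Return the 3' cut position for a Phred+33 quality string.

    Walking from the 3' end, accumulate (cutoff - q); stop as soon as the sum
    becomes negative and cut where the running sum was largest.
    """
    running = 0
    best = 0
    cut = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += cutoff - (ord(quals[i]) - base)
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


def adapter_trim_index(seq: str, adapter: str = "AGATCGGAAGAGC", min_overlap: int = 3) -> int:
    """Return where the adapter starts (exact matching only, a simplification)."""
    pos = seq.find(adapter)
    if pos != -1:
        return pos
    # Adapter read only partially: longest adapter prefix that is a suffix of the read.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return len(seq)


def trim_read(seq: str, quals: str, cutoff: int = 20) -> tuple[str, str]:
    """Quality-trim the 3' end first, then remove adapter sequence."""
    end = quality_trim_index(quals, cutoff)
    seq, quals = seq[:end], quals[:end]
    end = adapter_trim_index(seq)
    return seq[:end], quals[:end]


# A high-quality read that runs into the first 7 bases of the adapter.
print(trim_read("ACGTACGTAGATCGG", "I" * 15))  # ('ACGTACGT', 'IIIIIIII')
```

The running-sum rule tolerates an isolated low-quality base inside an otherwise good read, rather than cutting at the first base below the threshold.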
Alignment to the reference, post-alignment processing and summary statistics modules
Paired sequencing reads are aligned to the reference genome (UCSC hg19 build) using BWA. Post-alignment processing, including SAM format conversion, sorting and duplicate removal, is performed using Picard and SAMtools. The Genome Analysis Toolkit (GATK) is then used to prepare the raw alignment for variant calling through local realignment around small insertions-deletions (INDELs) and Base Quality Score Recalibration. This module is followed by a small module that computes read summary, target enrichment and target coverage statistics with SAMtools and BEDTools.
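Duplicate removal can be illustrated with a simplified, single-end sketch of the idea behind Picard MarkDuplicates: reads sharing chromosome, 5' position and strand are treated as copies of one fragment, and only the copy with the highest summed base quality is kept. This is not Picard's implementation: it ignores soft-clipping, mate positions and library information, and the minimum quality of 15 counted in the score is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    name: str
    chrom: str
    start: int        # leftmost aligned position (1-based); clipping ignored
    end: int          # rightmost aligned position
    reverse: bool     # True if the read maps to the reverse strand
    quals: tuple      # Phred base qualities


def five_prime(read: Read) -> int:
    # The 5' end of a reverse-strand read is its rightmost aligned base.
    return read.end if read.reverse else read.start


def mark_duplicates(reads, min_q: int = 15) -> set:
    """Return the names of reads to flag as duplicates."""
    best = {}          # (chrom, 5' position, strand) -> (score, read kept so far)
    duplicates = set()
    for read in reads:
        key = (read.chrom, five_prime(read), read.reverse)
        score = sum(q for q in read.quals if q >= min_q)
        if key not in best:
            best[key] = (score, read)
        elif score > best[key][0]:
            duplicates.add(best[key][1].name)   # previous representative loses
            best[key] = (score, read)
        else:
            duplicates.add(read.name)
    return duplicates


reads = [
    Read("a", "chr1", 100, 199, False, (30,) * 100),
    Read("b", "chr1", 100, 180, False, (20,) * 81),   # same 5' end and strand as "a"
    Read("c", "chr1", 120, 199, True, (30,) * 80),    # reverse strand: 5' end is 199
]
print(mark_duplicates(reads))  # {'b'}
```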
SNV and INDEL calling and annotation module
The identification of Single Nucleotide Variants (SNVs) and INDELs is performed separately using the GATK UnifiedGenotyper, followed by Variant Quality Score Recalibration when applicable. The SNV and INDEL calls are then merged and annotated using ANNOVAR to add the following information: the position in genes and the amino acid change relative to the RefSeq gene model, presence in dbSNP and OMIM, frequency in the NHLBI Exome Variant Server and in the 1000 Genomes Project stratified by population, the predicted damaging effect on protein activity from different algorithms [5–10], and evolutionary conservation scores [40, 41]. The annotated results are then imported into the variation database.
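Because SNVs and INDELs are called separately and then merged, each record needs a variant type and a consistent genomic order. A minimal sketch, assuming VCF-style records (chromosome, position, REF, ALT) with a single ALT allele and a karyotypic chromosome order chosen for illustration; the names `Call`, `variant_type` and `merge_calls` are hypothetical and ANNOVAR itself is not invoked.

```python
from typing import NamedTuple


class Call(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


# Karyotypic order chosen for illustration; unknown contigs sort last.
CHROM_ORDER = {f"chr{c}": i for i, c in enumerate([str(n) for n in range(1, 23)] + ["X", "Y", "M"])}


def variant_type(call: Call) -> str:
    if len(call.ref) == 1 and len(call.alt) == 1:
        return "SNV"
    if len(call.ref) < len(call.alt):
        return "insertion"
    if len(call.ref) > len(call.alt):
        return "deletion"
    return "complex"  # equal-length multi-base substitution


def sort_key(call: Call):
    return (CHROM_ORDER.get(call.chrom, len(CHROM_ORDER)), call.pos, call.ref, call.alt)


def merge_calls(snvs, indels):
    """Union of the two call sets, without duplicates, in genomic order."""
    return sorted(set(snvs) | set(indels), key=sort_key)


snvs = [Call("chr2", 50, "A", "G"), Call("chr1", 300, "C", "T")]
indels = [Call("chr1", 120, "AT", "A"), Call("chr10", 5, "G", "GCA")]
for call in merge_calls(snvs, indels):
    print(call.chrom, call.pos, variant_type(call))
```

Sorting on a chromosome index rather than on the name avoids placing chr10 before chr2, as plain string ordering would.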
Variation database and report generation module
The variation database is implemented in PostgreSQL; its structure, with the main tables and relationships, is shown in Additional file 3. A variations table contains an entry for each variation progressively collected in the database, each uniquely identified by its genomic coordinates and its reference and alternative alleles. Separate tables store the call statistics, the annotations, and the analysis and sample details. Finally, the diseases table contains the MEDIC hierarchical disease terms. Once all the detected variants have been imported, the report generation module creates a report listing all the variations found in the samples together with the available annotations. Importantly, this module also dynamically computes allele frequencies stratified by disease group, using the hierarchical disease ontology. In this way, even if few or no samples are available in the database for a specific Mendelian disorder, a sufficient number of samples can be reached by grouping samples at higher levels of the disease ontology. The variation reports of all archived analyses are periodically refreshed so that their allele frequencies reflect the analyses progressively added to the database.
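The rule that each distinct variation is stored once can be expressed as a uniqueness constraint on coordinates and alleles. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL; apart from the variations table named in the text, the `calls` table and all column names are hypothetical and do not reproduce the schema in Additional file 3. In PostgreSQL (9.5 or later) the same effect is obtained with INSERT ... ON CONFLICT DO NOTHING.

```python
import sqlite3

SCHEMA = """
CREATE TABLE variations (
    id    INTEGER PRIMARY KEY,
    chrom TEXT NOT NULL,
    pos   INTEGER NOT NULL,
    ref   TEXT NOT NULL,
    alt   TEXT NOT NULL,
    UNIQUE (chrom, pos, ref, alt)          -- one row per distinct variation
);
CREATE TABLE calls (
    variation_id INTEGER NOT NULL REFERENCES variations(id),
    sample_id    TEXT NOT NULL,
    genotype     TEXT NOT NULL,            -- e.g. '0/1' or '1/1'
    PRIMARY KEY (variation_id, sample_id)
);
"""


def store_call(conn, sample_id, chrom, pos, ref, alt, genotype):
    """Insert the variation if it is new, then record this sample's call."""
    conn.execute(
        "INSERT OR IGNORE INTO variations (chrom, pos, ref, alt) VALUES (?, ?, ?, ?)",
        (chrom, pos, ref, alt),
    )
    (var_id,) = conn.execute(
        "SELECT id FROM variations WHERE chrom = ? AND pos = ? AND ref = ? AND alt = ?",
        (chrom, pos, ref, alt),
    ).fetchone()
    conn.execute(
        "INSERT INTO calls (variation_id, sample_id, genotype) VALUES (?, ?, ?)",
        (var_id, sample_id, genotype),
    )
    return var_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
v1 = store_call(conn, "S1", "chr1", 1000, "A", "G", "0/1")
v2 = store_call(conn, "S2", "chr1", 1000, "A", "G", "1/1")   # same variation, new sample
print(v1 == v2, conn.execute("SELECT COUNT(*) FROM calls").fetchone()[0])
```

Keeping per-sample calls in a separate table is what allows frequencies to be recomputed for any grouping of samples later.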
Results and discussion
We developed a variation database for Mendelian disorders, with an associated WES analysis pipeline, to annotate and store insertions, deletions and single nucleotide variants found in targeted resequencing projects, with a focus on patients affected by Mendelian disorders. The pipeline automates the analysis workflow using state-of-the-art tools, starting from raw sequences and producing the final list of annotated variants found in each sample. The pipeline allows the simultaneous analysis of multiple samples from related individuals. This option is recommended when analysing members of the same family, who are expected to share the same causative mutation. In this case, the variant calling algorithm uses a multi-sample model that takes the global allele count into account when calling individual genotypes, which can substantially improve sensitivity. Unaffected family members can also be analysed by designating them as controls. In this case, the variants called in the unaffected members can be used directly to filter out shared variants that are not relevant to the proband phenotype.
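Filtering with affected and unaffected relatives reduces to set operations on the called variants. A minimal sketch, assuming each person's calls are a set of (chromosome, position, REF, ALT, genotype) tuples; `family_candidates` is a hypothetical name. Including the genotype in the key is a design choice made here, not necessarily the pipeline's: under a recessive model an unaffected parent is expected to carry the causative variant in the heterozygous state, and filtering on variant presence alone would discard it.

```python
def family_candidates(affected, unaffected):
    """Variants (with genotype) shared by all affected members and seen in no control.

    affected, unaffected: lists of sets, one set of variant tuples per person.
    """
    if not affected:
        return set()
    shared = set.intersection(*affected)
    in_controls = set().union(*unaffected)
    return shared - in_controls


aff1 = {("chr1", 100, "A", "G", "1/1"), ("chr2", 5, "C", "T", "0/1"), ("chr3", 9, "G", "A", "0/1")}
aff2 = {("chr1", 100, "A", "G", "1/1"), ("chr2", 5, "C", "T", "0/1")}
parent = {("chr1", 100, "A", "G", "0/1"), ("chr2", 5, "C", "T", "0/1")}
print(family_candidates([aff1, aff2], [parent]))
```

Here the homozygous variant on chr1 survives because the unaffected parent carries it only heterozygously, while the heterozygous variant on chr2 is removed because the parent has the same genotype.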
This resource is complementary to free and commercial databases of known mutations associated with specific diseases or phenotypes, such as HGMD or ClinVar, or locus-specific databases (LSDBs), since it focuses on patients affected by Mendelian disorders. It also differs from other large-scale databases providing population frequencies because the collected samples are not from phenotypically normal individuals. Moreover, the associated WES analysis pipeline presented here should be considered only as an accompanying tool to uniformly populate the database, not as a general-purpose exome analysis pipeline such as those recently presented in the literature [45–47].
The aim of this resource is to provide a standardised analysis of WES samples through a state-of-the-art pipeline and a standardised output of the variant calls and annotations, including the allele frequency in the anonymised samples already analysed in the database, stratified by disease.
Uniformity of the calling quality is ensured by analysing all samples with the same pipeline. The analysis was implemented to have a low stringency for the initial variant calling, in order to minimise the false negatives, but it relies heavily on intersection filters for controls and general population frequency to rule out non-causative mutations.
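The intersection filters on controls and general-population frequency can be sketched as follows, assuming per-variant allele frequencies from sources such as 1000 Genomes and the Exome Variant Server, a set of variants called in control samples, and a hypothetical 1% frequency cutoff. Variants absent from every population source are treated as novel and kept; the function name and the placeholder keys are illustrative only.

```python
def filter_candidates(calls, pop_af, controls, max_af=0.01):
    """Keep calls that are absent from controls and rare in every population source.

    pop_af: variant -> {source name: allele frequency}; a variant missing from
    every source is treated as novel (frequency 0).
    """
    kept = []
    for v in calls:
        if v in controls:
            continue
        freqs = pop_af.get(v, {})
        if max(freqs.values(), default=0.0) > max_af:
            continue
        kept.append(v)
    return kept


calls = ["v1", "v2", "v3", "v4"]                 # placeholder variant keys
pop_af = {"v1": {"1000G": 0.002, "EVS": 0.004}, "v2": {"1000G": 0.15}}
print(filter_candidates(calls, pop_af, controls={"v4"}))  # ['v1', 'v3']
```

Taking the maximum across sources is the conservative choice: a variant common in any one population is unlikely to cause a rare Mendelian disorder.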
Submission of whole exome sequencing samples
Automated analysis workflow
As detailed in the Implementation section, the pipeline workflow follows a state-of-the-art implementation of exome sequencing analysis (Additional file 1). The analysis is initialised by a master script that configures the modules performing the actual analysis steps and submits them to the computing cluster. The modules are configured with predefined parameter sets to ensure uniform sensitivity across analyses. The user can only choose the number of samples to analyse, either as a single case or as a group analysis by selecting the Family option. In the latter case, control samples can also be included, but these are analysed separately.
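A master script of this kind can be sketched as Torque submissions chained by job dependencies: each sample's modules run in sequence, different samples run in parallel, and the joint steps of a Family analysis wait for all samples. The module script names and the SAMPLE/SAMPLES variables are hypothetical; the qsub options used (-N, -v, -W depend=afterok:...) are standard Torque options. The submit function is passed in so the logic can be dry-run without a cluster.

```python
import subprocess

# Hypothetical module script names; the real module names are not given in the text.
PER_SAMPLE = ["qc_trim", "align", "post_align"]
JOINT = ["call_annotate", "db_import"]


def submit_analysis(samples, submit):
    """Submit per-sample chains, then joint steps that wait for every chain.

    `submit` takes a qsub argument list and returns the new job id.
    """
    chain_ends = []
    for sample in samples:
        previous = None
        for module in PER_SAMPLE:
            cmd = ["qsub", "-N", f"{module}_{sample}", "-v", f"SAMPLE={sample}"]
            if previous is not None:
                # Start only after the previous module of this sample succeeded.
                cmd += ["-W", f"depend=afterok:{previous}"]
            cmd.append(f"{module}.sh")
            previous = submit(cmd)
        chain_ends.append(previous)

    waiting_on = chain_ends
    for module in JOINT:
        cmd = [
            "qsub", "-N", module,
            # -v splits its argument on commas, so sample names are joined with ':'.
            "-v", "SAMPLES=" + ":".join(samples),
            "-W", "depend=afterok:" + ":".join(waiting_on),
            f"{module}.sh",
        ]
        waiting_on = [submit(cmd)]
    return waiting_on[0]


def torque_submit(cmd):
    """Run qsub for real; qsub prints the new job identifier on stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For a dry run, pass a function that records each command and returns a fake job id; on a cluster, pass `torque_submit`.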
The first module in the pipeline assesses read quality and trims read ends to remove the adapter sequence or trailing low-quality bases. Reads are then aligned to the reference genome (UCSC hg19), and the alignment is prepared for variant calling through a series of steps: format conversion, sorting, local realignment around INDELs and Base Quality Score Recalibration. Local realignment around INDELs is an important step: it finds a consensus alignment among all the reads spanning a deletion or an insertion, both to improve INDEL detection sensitivity and accuracy and to reduce false SNV calls due to misalignment of the flanking bases. Base Quality Score Recalibration is a procedure in which the raw quality scores provided by the instrument are recalibrated according to an empirical error model derived from the sequence data. SNV and INDEL calling are then performed, and the calls are merged and annotated with information collected from several sources (Figure 2). The pipeline is designed to run on a cluster and can submit jobs in parallel to analyse several samples simultaneously. The annotated variant calls are then imported into the variant database.
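The core idea of Base Quality Score Recalibration can be shown with a single covariate: group bases by their reported quality, skip known polymorphic sites, and convert the observed mismatch rate back to a Phred score, Q = -10 log10(error rate). This is a sketch under stated assumptions, not GATK's model, which also uses covariates such as read group, machine cycle and sequence context; the +1/+2 pseudocounts are an assumption made here to avoid log10(0).

```python
import math
from collections import defaultdict


def recalibration_table(observations, known_sites):
    """Map each reported quality to an empirical Phred quality.

    observations: iterable of (chrom, pos, reported_q, is_mismatch) per aligned base.
    known_sites: set of (chrom, pos) of known polymorphisms (e.g. from dbSNP),
    skipped so that real variants are not counted as sequencing errors.
    """
    counts = defaultdict(lambda: [0, 0])   # reported_q -> [bases, mismatches]
    for chrom, pos, q, mismatch in observations:
        if (chrom, pos) in known_sites:
            continue
        counts[q][0] += 1
        counts[q][1] += int(mismatch)
    table = {}
    for q, (n, mm) in counts.items():
        error_rate = (mm + 1) / (n + 2)     # pseudocounts (assumption) avoid log10(0)
        table[q] = -10 * math.log10(error_rate)
    return table


# 1000 bases reported at Q30 with 8 mismatches; one mismatch lies at a known SNP.
obs = [("chr1", i, 30, i < 8) for i in range(1000)]
print(round(recalibration_table(obs, {("chr1", 0)})[30], 1))  # 21.0
```

In this toy example the instrument reported Q30, but the observed error rate corresponds to roughly Q21, so downstream callers would weight these bases less.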
Variant annotation and reporting
The variation database is used to store the annotated exonic/splicing variants and to calculate allele frequencies stratified by groups of patients presenting the same, or similar, disease or phenotype according to OMIM identifiers and MeSH terms, implementing the MEDIC hierarchical disease ontology. Importantly, the internal allele frequency among the samples progressively collected in the database, stratified by Mendelian disorder, is also estimated, enabling a better selection of putative disease-specific causative variations.
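The disease-stratified frequency can be sketched as follows, assuming the ontology is available as a child-to-parents map (MEDIC terms may have more than one parent), one disease term per sample, and alternative-allele counts of 0, 1 or 2 per sample. Samples without a call are treated as homozygous reference, a simplification: in practice a missing call may reflect low coverage. The ontology terms in the example are hypothetical placeholders.

```python
def ancestors(term, parents):
    """The term itself plus every term above it in the ontology DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def group_allele_frequency(group, sample_disease, alt_count, parents):
    """Alternative-allele frequency of one variant among samples in `group`.

    sample_disease: sample -> disease term; alt_count: sample -> 0, 1 or 2.
    A sample belongs to `group` if its term is `group` or a descendant of it.
    """
    members = [s for s, d in sample_disease.items() if group in ancestors(d, parents)]
    if not members:
        return None
    alt = sum(alt_count.get(s, 0) for s in members)   # missing call = hom-ref (assumption)
    return alt / (2 * len(members))


# Hypothetical three-level ontology and four samples.
parents = {
    "disorder_A1": ["group_A"],
    "disorder_A2": ["group_A"],
    "group_A": ["all_disorders"],
    "disorder_B": ["all_disorders"],
}
sample_disease = {"S1": "disorder_A1", "S2": "disorder_A2", "S3": "disorder_B", "S4": "disorder_A1"}
alt_count = {"S1": 1, "S4": 2}
for g in ["disorder_A1", "group_A", "all_disorders"]:
    print(g, group_allele_frequency(g, sample_disease, alt_count, parents))
```

Moving up the hierarchy enlarges the sample set (2, 3 and 4 samples here) at the cost of mixing related phenotypes, which is the trade-off the disease grouping is meant to manage.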
The database also includes annotations of variants from external sources (e.g. dbSNP, 1000 Genomes, the Exome Variant Server and prediction algorithms), which are stored in a separate table and updated periodically whenever a new version of an external source database is released.
We give priority to the frequency criterion because, when dealing with rare Mendelian disorders, the causative mutation is unlikely to be common in the general population. These categories should be regarded as guides for prioritising the variants called in the analysis and can help to quickly highlight the best candidate(s).
We developed a resource for the analysis of WES samples for researchers studying Mendelian disorders. We believe this resource will be useful not only for those who do not have the hardware resources or the necessary expertise to run the analysis, but, more importantly, as a common reference for the community to collect and compare variants across patients with the same, or similar, disease.
By submitting data to the resource, each researcher will enrich the database and thus improve the frequency estimates of the variations potentially associated with Mendelian diseases. For this reason, we require all samples to be annotated with the OMIM/MeSH terms corresponding to the patient phenotype, so that the corresponding group allele frequencies can be updated with the variant calls of the new samples.
The analysis report classifies variants into classes to help the user prioritise candidate mutations. These classes should be regarded as prioritisation guides, not as hard filters, because low-quality calls (e.g. due to low coverage or other technical problems in the region) may be true mutations that can be validated but would be lost in a highly stringent analysis.
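Soft prioritisation can be sketched as ranking rather than filtering, so that no variant is discarded. The class definitions below (rarity, a boolean "damaging" prediction, a low-quality flag) and the 1% cutoff are hypothetical and do not reproduce the classes in the pipeline's report; they only illustrate how frequency can take precedence while low-quality calls stay visible.

```python
def priority_class(af, damaging, low_quality, rare_af=0.01):
    """Hypothetical tiers: lower is higher priority; nothing is discarded."""
    rare = af is None or af <= rare_af      # None = not seen in any population source
    if rare and damaging and not low_quality:
        return 1
    if rare and damaging:
        return 2   # plausible candidate, but the call should be validated
    if rare:
        return 3
    return 4


def prioritise(variants):
    """Sort by tier, then by population frequency (novel variants first)."""
    return sorted(
        variants,
        key=lambda v: (priority_class(v["af"], v["damaging"], v["low_quality"]), v["af"] or 0.0),
    )


vs = [
    {"id": "a", "af": 0.3, "damaging": True, "low_quality": False},
    {"id": "b", "af": None, "damaging": True, "low_quality": True},
    {"id": "c", "af": 0.001, "damaging": True, "low_quality": False},
    {"id": "d", "af": 0.005, "damaging": False, "low_quality": False},
]
print([v["id"] for v in prioritise(vs)])  # ['c', 'b', 'd', 'a']
```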
The resource provides variant frequencies by disease group, thus helping to detect modifier or secondary mutations that tend to be over-represented in patients affected by the same phenotype. The power to detect statistically significant associations will increase with the number of patients with a homogeneous phenotype collected in the resource.
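One common way to test whether an allele is over-represented in a disease group is a one-sided Fisher's exact test on allele counts, comparing the group with all other samples in the database. The sketch below computes it from the hypergeometric distribution using only the standard library; treating the two alleles of each sample as independent observations is a simplifying assumption, and the text does not state that the resource uses this particular test.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test, P(X >= a), for the 2x2 table [[a, b], [c, d]].

    Rows: disease group / all other samples; columns: alt / ref allele counts.
    X follows a hypergeometric distribution given the row and column totals.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    upper = min(row1, col1)
    # comb() returns 0 when k > n, so impossible tables contribute nothing.
    hits = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return hits / total


# Group of 10 patients (20 alleles, 6 alt) vs 90 other samples (180 alleles, 2 alt).
print(fisher_one_sided(6, 14, 2, 178))
```

With few patients per disease group even large frequency differences give modest p-values, which is why the power of such tests grows with the number of samples collected.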
The TIGEM Exome Mendelian Disorder Pipeline is a new community-based resource available to the Mendelian disease research community, built with the aim of helping to dissect the genotype underlying the disease phenotype in patients affected by rare diseases.
Availability and requirements
Project name: TIGEM Exome Mendelian Disorder Pipeline
Project home page: http://exome.tigem.it
Operating system(s): Platform independent
Programming language: bash, perl, R, SQL, PHP
List of abbreviations
BAM: Binary Alignment Map
GATK: Genome Analysis Toolkit
INDEL: small insertion or deletion
NGS: Next Generation Sequencing
SNP: Single Nucleotide Polymorphism
SNV: Single Nucleotide Variation
WES: Whole Exome Sequencing
WGS: Whole Genome Sequencing
VCF: Variant Call Format
This work was supported by the Fondazione Telethon, the Italian National Research Center Flagship Project EPIGEN, PONa3 00311 and PON01 00862 (Programma Operativo Nazionale "Ricerca & Competitività" 2007-2013 Regioni Convergenza ASSE I). The authors would like to thank Vincenzo Nigro and Sandro Banfi for critical discussion.
The publication costs for this article were funded by the Italian National Research Center (CNR -Flagship Project EPIGEN).
This article has been published as part of BMC Genomics Volume 15 Supplement 3, 2014: Italian Society of Bioinformatics (BITS): Annual Meeting 2013: Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/15/S3
- Amberger J, Bocchini C, Hamosh A: A new face and new challenges for Online Mendelian Inheritance in Man (OMIM®). Human Mutation. 2011, 32 (5): 564-567. [http://onlinelibrary.wiley.com/doi/10.1002/humu.21466/abstract]
- Online Mendelian Inheritance in Man. [http://omim.org]
- Ng SB, Turner EH, Robertson PD, Flygare SD, Bigham AW, Lee C, Shaffer T, Wong M, Bhattacharjee A, Eichler EE, Bamshad M, Nickerson DA, Shendure J: Targeted capture and massively parallel sequencing of 12 human exomes. Nature. 2009, 461 (7261): 272-276. [PMID: 19684571]
- Robinson P, Krawitz P, Mundlos S: Strategies for exome and genome sequence data analysis in disease-gene discovery projects. Clinical Genetics. 2011, 80 (2): 127-132. [http://onlinelibrary.wiley.com/doi/10.1111/j.1399-0004.2011.01713.x/abstract]
- Ng PC, Henikoff S: Predicting Deleterious Amino Acid Substitutions. Genome Research. 2001, 11 (5): 863-874. [PMID: 11337480 PMCID: PMC311071], [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC311071/]
- Kumar P, Henikoff S, Ng PC: Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature protocols. 2009, 4 (7): 1073-1081. [PMID: 19561590]
- Chun S, Fay JC: Identification of deleterious mutations within three human genomes. Genome Research. 2009, 19 (9): 1553-1561. [PMID: 19602639], [http://www.ncbi.nlm.nih.gov/pubmed/19602639]
- Schwarz JM, Rödelsperger C, Schuelke M, Seelow D: MutationTaster evaluates disease-causing potential of sequence alterations. Nature Methods. 2010, 7 (8): 575-576. [PMID: 20676075], [http://www.ncbi.nlm.nih.gov/pubmed/20676075]
- Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR: A method and server for predicting damaging missense mutations. Nature Methods. 2010, 7 (4): 248-249. [PMID: 20354512], [http://www.ncbi.nlm.nih.gov/pubmed/20354512]
- Liu X, Jian X, Boerwinkle E: dbNSFP: A lightweight database of human nonsynonymous SNPs and their functional predictions. Human Mutation. 2011, 32 (8): 894-899. [http://onlinelibrary.wiley.com/doi/10.1002/humu.21517/abstract]
- Ward LD, Kellis M: Interpreting noncoding genetic variation in complex traits and human disease. Nature biotechnology. 2012, 30 (11): 1095-1106. [PMID: 23138309]
- Li X, Montgomery SB: Detection and impact of rare regulatory variants in human disease. Frontiers in genetics. 2013, 4: 67. [PMID: 23755067]
- Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, Dent KM, Huff CD, Shannon PT, Jabs EW, Nickerson DA, Shendure J, Bamshad MJ: Exome sequencing identifies the cause of a mendelian disorder. Nature genetics. 2010, 42: 30-35. [PMID: 19915526]
- Ng SB, Bigham AW, Buckingham KJ, Hannibal MC, McMillin MJ, Gildersleeve HI, Beck AE, Tabor HK, Cooper GM, Mefford HC, Lee C, Turner EH, Smith JD, Rieder MJ, Yoshiura Ki, Matsumoto N, Ohta T, Niikawa N, Nickerson DA, Bamshad MJ, Shendure J: Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nature Genetics. 2010, 42 (9): 790-793. [http://www.nature.com/ng/journal/v42/n9/full/ng.646.html]
- Gilissen C, Arts HH, Hoischen A, Spruijt L, Mans DA, Arts P, Lier Bv, Steehouwer M, Reeuwijk Jv, Kant SG, Roepman R, Knoers NVAM, Veltman JA, Brunner HG: Exome Sequencing Identifies WDR35 Variants Involved in Sensenbrenner Syndrome. The American Journal of Human Genetics. 87 (3): [http://www.cell.com/AJHG/abstract/S0002-9297(10)00417-9]
- Worthey EA, Mayer AN, Syverson GD, Helbling D, Bonacci BB, Decker B, Serpe JM, Dasu T, Tschannen MR, Veith RL, Basehore MJ, Broeckel U, Tomita-Mitchell A, Arca MJ, Casper JT, Margolis DA, Bick DP, Hessner MJ, Routes JM, Verbsky JW, Jacob HJ, Dimmock DP: Making a definitive diagnosis: Successful clinical application of whole exome sequencing in a child with intractable inflammatory bowel disease. Genetics in Medicine. 2011, 13 (3): 255-262. [http://www.nature.com/gim/journal/v13/n3/full/gim9201146a.html]
- Peluso I, Conte I, Testa F, Dharmalingam G, Pizzo M, Collin RW, Meola N, Barbato S, Mutarelli M, Ziviello C, Barbarulo AM, Nigro V, Melone MA, Simonelli F, Banfi S: The ADAMTS18 gene is responsible for autosomal recessive early onset severe retinal dystrophy. Orphanet journal of rare diseases. 2013, 8: 16. [PMID: 23356391]
- Torella A, Fanin M, Mutarelli M, Peterle E, Del Vecchio Blanco F, Rispoli R, Savarese M, Garofalo A, Piluso G, Morandi L, Ricci G, Siciliano G, Angelini C, Nigro V: Next-Generation Sequencing Identifies Transportin 3 as the Causative Gene for LGMD1F. PLoS ONE. 2013, 8 (5). [PMID: 23667635 PMCID: PMC3646821], [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3646821/]
- Stitziel NO, Kiezun A, Sunyaev S: Computational and statistical approaches to analyzing variants identified by exome sequencing. Genome Biology. 2011, 12 (9): 227. [PMID: 21920052], [http://genomebiology.com/2011/12/9/227/abstract]
- Lupski JR: Digenic inheritance and Mendelian disease. Nature genetics. 2012, 44 (12): 1291-1292. [PMID: 23192179]
- Schäffer AA: Digenic inheritance in medical genetics. Journal of medical genetics. 2013. [PMID: 23785127]
- O'Rawe J, Jiang T, Sun G, Wu Y, Wang W, Hu J, Bodily P, Tian L, Hakonarson H, Johnson WE, Wei Z, Wang K, Lyon GJ: Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Medicine. 2013, 5 (3): 28. [PMID: 23537139], [http://genomemedicine.com/content/5/3/28/abstract]
- Davis AP, Wiegers TC, Rosenstein MC, Mattingly CJ: MEDIC: a practical disease vocabulary used at the Comparative Toxicogenomics Database. Database. 2012, 2012: bar065. [http://database.oxfordjournals.org/content/2012/bar065.abstract]
- Cock PJA, Fields CJ, Goto N, Heuer ML, Rice PM: The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic acids research. 2010, 38 (6): 1767-1771. [PMID: 20015970]
- FastQC. [http://www.bioinformatics.babraham.ac.uk/projects/fastqc]
- Trim Galore. [http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/]
- Martin M: Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011, 17: 10-12. [http://journal.embnet.org/index.php/embnetjournal/article/view/200]
- Meyer LR, Zweig AS, Hinrichs AS, Karolchik D, Kuhn RM, Wong M, Sloan CA, Rosenbloom KR, Roe G, Rhead B, Raney BJ, Pohl A, Malladi VS, Li CH, Lee BT, Learned K, Kirkup V, Hsu F, Heitner S, Harte RA, Haeussler M, Guruvadoo L, Goldman M, Giardine BM, Fujita PA, Dreszer TR, Diekhans M, Cline MS, Clawson H, Barber GP, Haussler D, Kent WJ: The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Research. 2012, 41 (D1): D64-D69. [http://nar.oxfordjournals.org/content/41/D1/D64.abstract]
- Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England). 2009, 25 (14): 1754-1760. [PMID: 19451168]
- Picard. [http://picard.sourceforge.net]
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup: The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, England). 2009, 25 (16): 2078-2079. [PMID: 19505943]
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA: The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome research. 2010, 20 (9): 1297-1303. [PMID: 20644199]
- Quinlan AR, Hall IM: BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010, 26 (6): 841-842. [PMID: 20110278 PMCID: PMC2832824], [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2832824/]
- DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M, McKenna A, Fennell TJ, Kernytsky AM, Sivachenko AY, Cibulskis K, Gabriel SB, Altshuler D, Daly MJ: A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature genetics. 2011, 43 (5): 491-498. [PMID: 21478889]
- Wang K, Li M, Hakonarson H: ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic acids research. 2010, 38 (16): e164. [PMID: 20601685]
- Pruitt KD, Tatusova T, Brown GR, Maglott DR: NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic acids research. 2012, 40 (Database): D130-135. [PMID: 22121212]
- Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K: dbSNP: the NCBI database of genetic variation. Nucleic acids research. 2001, 29: 308-311. [PMID: 11125122]
- NHLBI Exome Variant Server. [http://evs.gs.washington.edu/EVS]
- The 1000 Genomes Project Consortium: A map of human genome variation from population-scale sequencing. Nature. 2010, 467 (7319): 1061-1073. [http://www.nature.com/nature/journal/v467/n7319/full/nature09534.html]
- Goode DL, Cooper GM, Schmutz J, Dickson M, Gonzales E, Tsai M, Karra K, Davydov E, Batzoglou S, Myers RM, Sidow A: Evolutionary constraint facilitates interpretation of genetic variation in resequenced human genomes. Genome research. 2010, 20 (3): 301-310. [PMID: 20067941]
- Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A: Detection of nonneutral substitution rates on mammalian phylogenies. Genome research. 2010, 20: 110-121. [PMID: 19858363]
- Cooper DN, Ball EV, Krawczak M: The human gene mutation database. Nucleic Acids Research. 1998, 26: 285-287. [PMID: 9399854], [http://nar.oxfordjournals.org/content/26/1/285]
- ClinVar. [http://www.ncbi.nlm.nih.gov/clinvar]
- LSDB list. [http://www.hgvs.org/dblist/glsdb]
- Blanca JM, Pascual L, Ziarsolo P, Nuez F, Cañizares J: ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence. BMC Genomics. 2011, 12: 285. [PMID: 21635747], [http://www.biomedcentral.com/1471-2164/12/285/abstract]
- Asmann YW, Middha S, Hossain A, Baheti S, Li Y, Chai HS, Sun Z, Duffy PH, Hadad AA, Nair A, Liu X, Zhang Y, Klee EW, Kalari KR, Kocher JPA: TREAT: a bioinformatics tool for variant annotations and visualizations in targeted and exome sequencing data. Bioinformatics. 2012, 28 (2): 277-278. [PMID: 22088845 PMCID: PMC3259432], [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3259432/]
- D'Antonio M, Meo PDD, Paoletti D, Elmi B, Pallocca M, Sanna N, Picardi E, Pesole G, Castrignanò T: WEP: a high-performance analysis pipeline for whole-exome data. BMC Bioinformatics. 2013, 14 (Suppl 7): S11. [PMID: 23815231], [http://www.biomedcentral.com/1471-2105/14/S7/S11/abstract]
- Coletti MH, Bleich HL: Medical Subject Headings Used to Search the Biomedical Literature. Journal of the American Medical Informatics Association. 2001, 8 (4): 317-323. [PMID: 11418538], [http://jamia.bmj.com/content/8/4/317]
- VCF format specifications. [http://vcftools.sourceforge.net/specs.html]
- Pabinger S, Dander A, Fischer M, Snajder R, Sperk M, Efremova M, Krabichler B, Speicher MR, Zschocke J, Trajanoski Z: A survey of tools for variant analysis of next-generation genome sequencing data. Briefings in bioinformatics. 2013, 15 (2): 256-278. [PMID: 23341494]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.