- Open Access
Disease-associated variants in different categories of disease located in distinct regulatory elements
- Meng Ma†1, 4,
- Ying Ru†1, 3,
- Ling-Shiang Chuang1,
- Nai-Yun Hsu1,
- Li-Song Shi1,
- Jörg Hakenberg1,
- Wei-Yi Cheng1,
- Andrew Uzilov1,
- Wei Ding1,
- Benjamin S Glicksberg1, 2 and
- Rong Chen1 (corresponding author)
© Ma et al.; licensee BioMed Central Ltd. 2015
Published: 18 June 2015
The invention of high-throughput sequencing technologies has led to the discovery of hundreds of thousands of genetic variants associated with thousands of human diseases. Many of these genetic variants are located outside protein-coding regions, which makes their function difficult to interpret with traditional genetic approaches. Recent genome-wide functional genomics studies, such as FANTOM5 and ENCODE, have uncovered a large number of regulatory elements across hundreds of different tissues or cell lines in the human genome. These findings provide an opportunity to study the interaction between regulatory elements and disease-associated genetic variants. Identifying these disease-related regulatory elements will shed light on how these variants regulate gene expression and ultimately result in disease formation and progression.
In this study, we curated and categorized 27,558 Mendelian disease variants, 20,964 complex disease variants, 5,809 cancer predisposing germline variants, and 43,364 recurrent cancer somatic mutations. Comparing these variants against nine different types of regulatory regions from the FANTOM5 and ENCODE projects, we found that different types of disease variants show distinctive propensities for particular regulatory elements. Mendelian disease variants and recurrent cancer somatic mutations are significantly enriched in promoter regions, by 22-fold and 10-fold respectively (q<0.001), compared with an allele-frequency-matched genomic background. In contrast to these two categories, cancer predisposing germline variants are 27-fold enriched in histone modification regions (q<0.001), 10-fold enriched in chromatin physical interaction regions (q<0.001), and 6-fold enriched in transcription promoters (q<0.001). Furthermore, Mendelian disease variants and recurrent cancer somatic mutations share very similar distributions across types of functional effects.
We further found that regulatory regions are located within over 50% of coding exon regions. Transcription promoters, methylation regions, and transcription insulators have the highest density of disease variants, with 472, 239, and 72 disease variants per one million base pairs, respectively.
Disease-associated variants in different disease categories are preferentially located in particular regulatory elements. These results will be useful for an overall understanding about the differences among the pathogenic mechanisms of various disease-associated variants.
With the wide application of high-throughput technologies, hundreds of millions of genetic variants have been identified, with dramatic growth of dbSNP after 2007. From these resources and studies, ~97% of all identified variants were found to be noncoding, consistent with the notion that 98% of human genome sequence is noncoding. Studies from the ENCODE project show that over 80% of the human genome is functional, participating in at least one biochemical RNA- or chromatin-associated event in at least one cell type. Any variant located within a functional genomic region can potentially dysregulate gene expression by modifying regulatory elements, possibly resulting in disease pathogenesis [4, 5]. Many well-annotated disease variants have been collected in the Human Gene Mutation Database (HGMD); these variants are organized into three groups of significant functional disease SNPs, namely coding SNPs (cSNPs), splicing SNPs (sSNPs), and regulatory SNPs (rSNPs), which account for ~86%, ~10%, and ~3% of variants in HGMD, respectively [6–9]. Coding variants are well characterized, but knowledge of noncoding variants is limited. In recent years, genome-wide association studies (GWAS) have identified over ten thousand variants associated with various diseases/traits, ~90% of which localize outside known protein-coding regions. This highlights the substantial gap between the plethora of disease- or trait-associated noncoding variants and our understanding of how most of these variants contribute to diseases/traits (Figure S1).
Gene expression is a tightly regulated process involving various regulatory elements, including promoters, enhancers, insulators, and silencers. Moreover, chemical modifications (e.g., methylation and acetylation) of histone proteins in chromatin have been shown to change the accessibility of chromatin for transcription and thus influence gene expression [11, 12]. Projects such as ENCODE and FANTOM5 [13, 14] adopted various experimental technologies, including ChIP-seq, DNase-seq, ChIA-PET, and CAGE [18–21], and identified many regulatory regions throughout the human genome across hundreds of tissues and cell types. These experimentally validated regulatory region data provide an opportunity to investigate the underlying pathogenic mechanisms of disease-associated variants.
A possible mechanism underlying the pathogenesis of disease-associated variants is the disruption of transcription factor binding, local chromatin structure, and/or co-factor recruitment, ultimately altering the expression of the target genes. Some published studies support this hypothesis by analyzing the distribution of regulatory complex disease variants identified by GWAS [3, 23–30]. In the current study, we focus on the dissimilarity of the underlying pathogenic regulatory mechanisms of disease-associated variants in different disease categories, including Mendelian diseases, complex diseases, cancer predisposing germline variants, and recurrent cancer somatic mutations.
Results and discussion
Distinct densities of disease-associated variants within different types of regulatory regions
Curation of disease-associated variants and regulatory regions
Summary of disease variants.
- Mendelian disease variants
- Complex disease variants (GWAS catalog, VarDi)
- Cancer predisposing germline variants
- Recurrent cancer somatic mutations
Summary of regulatory regions from FANTOM5 and ENCODE, including the percent of human genome (%) for each.
- HudsonAlpha Institute for Biotechnology, Yale University, Harvard University
- HudsonAlpha Institute for Biotechnology
- Histone modification region: Broad Institute, Massachusetts General Hospital, Harvard Medical School
- Chromatin physical interaction regions: Genome Institute of Singapore, Stanford University
- DNA binding sites of protein: HudsonAlpha Institute for Biotechnology, Yale University, Harvard University
- Open chromatin regions (DNase I hypersensitive sites): Washington University, Duke University
- Open chromatin regions by FAIRE-seq: Duke University, University of North Carolina at Chapel Hill, University of Texas at Austin, European Bioinformatics Institute, University of Cambridge
Regulatory regions are widely located within coding and noncoding regions
Percentage of each type of regulatory region overlapping different human genomic regions.
Columns: overlapping coding exon (bp), overlapping upstream (bp), overlapping 3'-UTR (bp), overlapping 5'-UTR (bp), overlapping introns (bp), overlapping downstream (bp), overlapping intergenic regions (bp).
Rows include chromatin physical interaction regions, DNA binding sites of protein, open chromatin regions (DNase I hypersensitive sites), and open chromatin regions by FAIRE-seq.
Illumina SureSelect TruSeq and Nimblegen SeqCap EZ are two popular exome DNA sequencing technologies that can be used to identify Mendelian disease variants, cancer predisposing germline mutations, and cancer somatic mutations. The target regions of these two exome sequencing platforms fall within various human genomic regions (Table S2, S3). Moreover, these target regions also overlap with various regulatory regions (Table S4, S5), suggesting that disease variants identified by such exome sequencing platforms can be located within any type of regulatory region.
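The overlap between platform target regions and regulatory regions described above can be illustrated with a small interval computation. The following is a minimal sketch, not the pipeline used in this study: it assumes half-open coordinates on a single chromosome, and the names `merge_intervals`, `overlap_bp`, and the toy coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome (assumption)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_bp(a: List[Interval], b: List[Interval]) -> int:
    """Count base pairs covered by both interval sets (two-pointer sweep)."""
    a, b = merge_intervals(a), merge_intervals(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


# Toy coordinates, for illustration only.
exome_targets = [(100, 200), (150, 300), (500, 600)]
regulatory = [(250, 520), (590, 700)]

shared = overlap_bp(exome_targets, regulatory)
covered = sum(end - start for start, end in merge_intervals(exome_targets))
print(f"{shared} of {covered} target bp overlap regulatory regions")
```

Merging each set first keeps bases from being double counted when regions from different cell types overlap one another.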
Highest density of disease-associated variants within transcription promoter
Summary of disease variants residing within regulatory regions in seven types of human genomic regions.
Columns: number of disease variants within upstream, 5'UTR, coding exons, introns, 3'UTR, downstream, and intergenic regions; total (unique variants).
Rows include open chromatin regions (DNase I hypersensitive sites), DNA binding sites of protein, chromatin physical interaction regions, open chromatin regions by FAIRE-seq, and histone modification region.
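The density of disease variants is reported in this study as variants per one million base pairs of a regulatory region type. A minimal sketch of that normalization follows; the function name and the toy numbers are hypothetical and are not taken from the tables.

```python
def variants_per_million_bp(n_variants: int, region_length_bp: int) -> float:
    """Normalize a variant count by total region length, per 1,000,000 bp."""
    if region_length_bp <= 0:
        raise ValueError("region_length_bp must be positive")
    # Multiply before dividing to limit floating-point rounding.
    return n_variants * 1_000_000 / region_length_bp


# Toy example: 50 variants in regions spanning 250,000 bp in total.
print(variants_per_million_bp(50, 250_000))
```

Normalizing by length allows region types that cover very different fractions of the genome to be compared directly.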
Similar pattern of functional effects between Mendelian disease variants and recurrent cancer somatic mutations
The majority of complex disease variants are noncoding variants. Intron_variant (46%), upstream_gene_variant (10%), downstream_gene_variant (10%), and intergenic_variant (8%) sum to ~75% of all complex disease variants. Because complex disease variants identified via GWAS are not necessarily the causal variants, and functional annotation of GWAS SNPs may not reflect the nature of complex disease causal variants, we repeated the annotation on the complex disease variants that were replicated in at least two different ethnicities and are therefore more likely to be causal than mere markers. This produced a similar annotation result for complex disease causal variants (Figure S2). Intron_variant (39%), upstream_gene_variant (19%), downstream_gene_variant (19%), and intergenic_variant (4%) sum to ~80% of all complex disease causal variants, supporting the view that complex disease causal variants are mainly located within noncoding regions.
More deleterious functional effects are found for Mendelian disease variants, cancer predisposing germline variants, and recurrent cancer somatic mutations than for complex disease variants. Deleterious functional effects, such as stop_gained and frameshift_variant, make up a substantial part of recurrent cancer somatic mutations, cancer predisposing germline variants, and Mendelian disease variants. We generated a histogram of the functional effects of the four types of disease variants (Figure 1E). Roughly 5% of cancer predisposing germline variants change splice sites, suggesting that abnormal splicing isoforms caused by variants might lead to cancer formation. Stop_gained variants, which may result in a prematurely truncated protein product, are notable among the consequences of Mendelian disease variants, cancer predisposing germline variants, and recurrent cancer somatic mutations. The top eleven most serious consequences, specifically transcript_ablation, splice_donor_variant, splice_acceptor_variant, stop_gained, frameshift_variant, stop_lost, initiator_codon_variant, transcript_amplification, inframe_insertion, inframe_deletion, and missense_variant (Figure 1E, Table S4), account for 32.39%, 31.95%, 30.55%, and 1.8% of cancer predisposing germline variants, recurrent cancer somatic mutations, Mendelian disease variants, and complex disease variants, respectively. The majority of complex disease variants were annotated with the bottom fifteen consequence categories, suggesting milder functional effects of complex disease variants compared to the other three types of disease variants. Accordingly, cancer predisposing germline variants, recurrent cancer somatic mutations, and Mendelian disease variants tend to cause more serious consequences than complex disease variants.
A limitation of this analysis is that SNPs in linkage disequilibrium with complex disease variants were not considered in the functional effect annotation. Even so, we consider that the complex disease-associated variants reflect the main properties of the disease-associated linkage disequilibrium regions in which the complex disease causal variants may be located. Therefore, this functional effect annotation analysis can help clarify the dissimilarity among the functional effects of the four types of disease-associated variants.
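The share of variants falling in the eleven most serious consequence categories can be computed from per-variant consequence labels. The sketch below is illustrative only: the term list is the one named in the text, while the function name, the one-label-per-variant input format, and the toy labels are assumptions rather than the annotation pipeline used here.

```python
from collections import Counter
from typing import Iterable

# The eleven most serious consequence terms listed in the text.
SEVERE_CONSEQUENCES = frozenset({
    "transcript_ablation", "splice_donor_variant", "splice_acceptor_variant",
    "stop_gained", "frameshift_variant", "stop_lost",
    "initiator_codon_variant", "transcript_amplification",
    "inframe_insertion", "inframe_deletion", "missense_variant",
})


def severe_fraction(consequences: Iterable[str]) -> float:
    """Fraction of variants annotated with one of the severe terms.

    Assumes one (most severe) consequence label per variant.
    """
    counts = Counter(consequences)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no consequences supplied")
    severe = sum(n for term, n in counts.items() if term in SEVERE_CONSEQUENCES)
    return severe / total


# Toy labels, for illustration only.
toy = ["missense_variant", "intron_variant", "stop_gained", "intron_variant"]
print(f"{severe_fraction(toy):.2%} of toy variants are in severe categories")
```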
Positive correlation between functionality of disease variants and evolutionary constraints on the disease variants
A series of bioinformatics tools have been developed to predict whether variants are functional or deleterious. We applied GWAVA, Mutation Assessor, CADD, and GERP [46, 47] to score the functionality of the four types of disease variants.
Functional disease-associated variants tend to be under evolutionary constraint. GERP [46, 47] produces position-specific estimates of evolutionary constraint. Negative GERP scores indicate that a site is most likely evolutionarily neutral, whereas positive scores suggest that a site may be under evolutionary constraint. Positive scores scale with the level of constraint: the greater the score, the greater the level of evolutionary constraint on that site. We found that 82.41% of cancer predisposing germline variants, 86.06% of Mendelian disease variants, and 70.22% of recurrent cancer somatic mutations have a positive GERP score, while ~60% of complex disease variants have a negative GERP score (Figure 2F), indicating that variants in the former three groups are under evolutionary constraint, while the majority of complex disease variants are evolutionarily neutral. Moreover, GWAVA, Mutation Assessor, and CADD annotations of the four types of disease variants all suggest that the functionality of cancer predisposing germline variants, Mendelian disease variants, and recurrent cancer somatic mutations is greater than that of complex disease variants. By and large, the GERP scores of the disease variants gradually decrease in the order of Mendelian disease variants, cancer predisposing germline variants, recurrent cancer somatic mutations, and complex disease variants. Together, these observations suggest that the greater the functionality of a disease variant, the greater the level of evolutionary constraint on it.
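The percentages above rest on the sign of each variant's GERP score. As a minimal sketch, assuming per-variant scores are already available as plain numbers, the share of putatively constrained sites could be tallied as follows; the function name and the toy scores are hypothetical.

```python
from typing import Iterable


def fraction_positive(gerp_scores: Iterable[float]) -> float:
    """Fraction of sites with a positive GERP score (putatively constrained)."""
    scores = list(gerp_scores)
    if not scores:
        raise ValueError("no scores supplied")
    return sum(1 for s in scores if s > 0) / len(scores)


# Toy scores, for illustration only.
toy_scores = [4.2, -1.3, 2.0, 0.5, -0.7]
print(f"{fraction_positive(toy_scores):.0%} of toy sites have positive scores")
```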
Disease-associated variants in different disease categories are located within particular regulatory regions
Overall, the enrichment of each type of disease variant differs across regulatory regions. Enrichment of Mendelian disease variants, recurrent cancer somatic mutations, and cancer predisposing germline variants within transcription promoter regions is 21 times (log value 3.04), 10.57 times (log value 2.36), and 6.1 times (log value 1.8) higher than that of the genome variant background, respectively, in contrast to only 1.9 times (log value 0.64) for complex disease variants. This implies that disruption of transcription promoters might be an efficient pathogenic mechanism for Mendelian disease and cancer (germline or somatic), but not for complex disease. Additionally, the enrichment profile of the four types of disease variants within methylation regions resembles that within transcription promoters. Mendelian disease variants, recurrent cancer somatic mutations, and complex disease variants show higher enrichment within transcription insulator regions than cancer predisposing germline variants. Most disease variants are enriched within methylation and histone modification regions, suggesting a strong correlation between epigenetic marks and diseases, a pattern that some recent studies support [48–50]. In fact, cancer predisposing germline variants are over ten times enriched within histone modification regions and chromatin physical interaction regions. Complex disease variants show no prominently enriched regulatory regions and present a fairly even enrichment distribution across all types of regulatory regions. Interestingly, complex disease variants show a positive enrichment within transcription enhancers, while other types of disease variants have low negative enrichment, suggesting that transcription enhancers might play a more important role in complex disease development than in other types of diseases.
All four types of disease variants are enriched within DNA binding sites of protein by ChIP-seq, DNase I hypersensitive sites by DNase-seq, and open chromatin regions by FAIRE-seq. Disease-associated variants in different disease categories show dissimilar enrichment patterns within diverse regulatory elements, implying distinct priorities of regulatory pathogenic mechanisms for different types of disease variants.
Most regulatory regions are located outside coding regions, and the four disease categories have distinct ratios of coding to noncoding variants, which may introduce an acquisition bias into the enrichment analysis. To eliminate this potential bias, we recalculated the enrichment using only the noncoding disease variants in the four disease categories (Figure 3B, Table S8). By and large, the enrichment profile of noncoding disease variants is similar to that of all disease variants. Noncoding disease variants for Mendelian disease and cancer (germline or somatic) show high enrichment within transcription promoters. Noncoding cancer germline variants are over ten times enriched within chromatin physical interaction regions. The highest enrichment within transcription enhancers is from complex disease variants. The most striking difference between all disease variants and noncoding disease variants occurs within histone modification regions, where enrichment decreases dramatically; this in turn implies a tight association between histone modification epigenetic marks and disease variants located within coding regions. A recent study showed that histone modification marks can be used to predict coding exon inclusion levels, which supports the idea that if histone modification regions are altered by disease variants, a change in the expression of target exons can be expected, potentially leading to disease formation. On the whole, noncoding disease variants and all disease variants show similar enrichment profiles within various regulatory regions.
The two types of enrichment analyses of disease variants, based on a dbSNP control group and on 1,000 equal-size specific control groups, both suggest that disease-associated variants in different disease categories are preferentially located within particular regulatory regions.
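The log values quoted for the promoter enrichments appear consistent with natural logarithms of the fold values. This is an observation from the reported numbers, not a statement of the method, and the short check below is only a sketch of that arithmetic.

```python
import math

# (fold enrichment, reported log value) for transcription promoters, from the text.
reported = {
    "Mendelian disease variants": (21.0, 3.04),
    "Recurrent cancer somatic mutations": (10.57, 2.36),
    "Cancer predisposing germline variants": (6.1, 1.8),
    "Complex disease variants": (1.9, 0.64),
}

for category, (fold, log_value) in reported.items():
    natural_log = math.log(fold)  # natural logarithm (assumed base)
    print(f"{category}: ln({fold}) = {natural_log:.2f}, reported {log_value}")
```

On this scale a value of 0 corresponds to no enrichment, and negative values correspond to depletion relative to the background.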
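Enrichment against many equal-size control groups can be sketched as a resampling procedure. The example below is a simplified illustration under stated assumptions: control groups are drawn uniformly at random from a background set, whereas the study describes specific (allele-frequency-matched) controls; the function name, the boolean "in region" encoding, the empirical p-value formula, and the toy data are all hypothetical.

```python
import random
from typing import Sequence, Tuple


def empirical_enrichment(
    disease_in_region: Sequence[bool],
    background_in_region: Sequence[bool],
    n_controls: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Fold enrichment and empirical p-value versus random control groups.

    Each control group has the same size as the disease set and is drawn
    without replacement from the background (uniform sampling is a
    simplification; it does not match on allele frequency).
    """
    rng = random.Random(seed)
    k = len(disease_in_region)
    observed = sum(disease_in_region)
    control_counts = [
        sum(rng.sample(background_in_region, k)) for _ in range(n_controls)
    ]
    mean_control = sum(control_counts) / n_controls
    fold = observed / mean_control if mean_control > 0 else float("inf")
    # Add-one correction keeps the empirical p-value strictly above zero.
    p_value = (1 + sum(c >= observed for c in control_counts)) / (1 + n_controls)
    return fold, p_value


# Toy data: 5% of background variants and 40% of disease variants fall in the region.
background = [True] * 50 + [False] * 950
disease = [True] * 20 + [False] * 30
fold, p_value = empirical_enrichment(disease, background)
print(f"fold enrichment ~{fold:.1f}, empirical p = {p_value:.4f}")
```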
We curated 27,558 Mendelian disease variants, 20,964 complex disease variants, 5,809 cancer predisposing germline variants, and 43,364 recurrent cancer somatic mutations, and compared them against nine types of regulatory regions. Mendelian disease variants and recurrent cancer somatic mutations are 22- and 10-fold significantly enriched in promoter regions with q<0.001 respectively, compared to allele-frequency-matched genomic background. Different from these two categories, cancer predisposing germline variants are 27-fold enriched in histone modification regions (q<0.001), 10-fold enriched in chromatin physical interaction regions (q<0.001), and 6-fold enriched in transcription promoter (q<0.001). However, we observed a dramatic enrichment drop for noncoding cancer predisposing germline variants, with only 3-fold and 2-fold enrichment in chromatin physical interaction regions and transcription promoter regions with q<0.001, respectively. Furthermore, Mendelian disease variants and recurrent cancer somatic mutations share very similar distributions across types of functional impacts, suggesting the discovery of Mendelian disease variants might be broad enough to cover major pathways.
We also found that nine types of regulatory regions are located within over 50% of coding exon regions, suggesting the regulatory role of coding regions during gene expression. Transcription promoters, methylation regions, and transcription insulators have the highest density of disease variants, with 472, 239, and 72 disease variants per one million base pairs, respectively.
We recommend that different types of regulatory regions should be investigated for different categories of diseases, and the disease variants curated in this study provide a valuable resource for researchers to investigate the functional impact of disease variants.
This study applied computational analytical methods to explore the pathogenic mechanism of disease-associated variants in different disease categories primarily at the regulatory level.
2 × 2 contingency table containing the number of Mendelian disease variants and control group SNPs located within or outside promoters for odds ratio calculation.
Lastly, a Pearson chi-squared test was performed on the 2 × 2 contingency table using the Perl module Statistics::ChisqIndep from CPAN.
This study and publication were supported by National Natural and Scientific Funding (61300057, 81000321), Anhui Province Natural and Scientific Funding (1208085QF120), and the 48th Scientific Research Starting Foundation for the Returned Overseas Chinese Scholars, Ministry of Education of China (1685).
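For readers who want to reproduce this kind of calculation, the odds ratio and Pearson chi-squared statistic for a 2 × 2 table can be sketched as follows. This is an illustrative Python version, not the Perl code used in the study; it assumes no continuity correction, and the cell layout, function names, and toy counts are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the table [[a, b], [c, d]].

    Assumed layout: a = disease variants inside the region, b = disease
    variants outside, c = control SNPs inside, d = control SNPs outside.
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined when b or c is zero")
    return (a * d) / (b * c)


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared statistic (no continuity correction) and p-value."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy counts, for illustration only.
a, b, c, d = 30, 70, 10, 90
chi2, p_value = pearson_chi2_2x2(a, b, c, d)
print(f"OR = {odds_ratio(a, b, c, d):.2f}, chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

An odds ratio above 1 indicates that disease variants fall inside the region more often than control SNPs do; the chi-squared test assesses whether that difference exceeds what chance would explain.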
This article has been published as part of BMC Genomics Volume 16 Supplement 8, 2015: VarI-SIG 2014: Identification and annotation of genetic variants in the context of structure, function and disease. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/16/S8.
- Sherry ST, Ward M-H, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K: dbSNP: the NCBI database of genetic variation. Nucleic Acids Research. 2001, 29 (1): 308-311. 10.1093/nar/29.1.308.
- Elgar G, Vavouri T: Tuning in to the signals: noncoding sequence conservation in vertebrate genomes. Trends in Genetics. 2008, 24 (7): 344-352. 10.1016/j.tig.2008.04.005.
- Consortium EP: An integrated encyclopedia of DNA elements in the human genome. Nature. 2012, 489 (7414): 57-74. 10.1038/nature11247.
- Ward LD, Kellis M: Interpreting noncoding genetic variation in complex traits and human disease. Nature Biotechnology. 2012, 30 (11): 1095-1106. 10.1038/nbt.2422.
- Li MJ, Yan B, Sham PC, Wang J: Exploring the function of genetic variants in the non-coding genomic regions: approaches for identifying human regulatory variants affecting gene expression. Briefings in Bioinformatics. 2014, bbu018.
- Stenson PD, Mort M, Ball EV, Howells K, Phillips AD, Thomas N, Cooper DN: The human gene mutation database: 2008 update. Genome Med. 2009, 1 (1): 13. 10.1186/gm13.
- Ponomarenko JV, Merkulova TI, Vasiliev GV, Levashova ZB, Orlova GV, Lavryushev SV, Fokin ON, Ponomarenko MP, Frolov AS, Sarai A: rSNP_Guide, a database system for analysis of transcription factor binding to target sequences: application to SNPs and site-directed mutations. Nucleic Acids Research. 2001, 29 (1): 312-316. 10.1093/nar/29.1.312.
- Wray GA: The evolutionary significance of cis-regulatory mutations. Nature Reviews Genetics. 2007, 8 (3): 206-216. 10.1038/nrg2063.
- Cooper DN, Chen JM, Ball EV, Howells K, Mort M, Phillips AD, Chuzhanova N, Krawczak M, Kehrer-Sawatzki H, Stenson PD: Genes, mutations, and human inherited disease at the dawn of the age of personalized genomics. Human Mutation. 2010, 31 (6): 631-655. 10.1002/humu.21260.
- Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, Klemm A, Flicek P, Manolio T, Hindorff L: The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Research. 2014, 42 (D1): D1001-D1006. 10.1093/nar/gkt1229.
- Gräff J, Tsai L-H: Histone acetylation: molecular mnemonics on the chromatin. Nature Reviews Neuroscience. 2013, 14 (2): 97-111. 10.1038/nrn3427.
- Haberland M, Montgomery RL, Olson EN: The many roles of histone deacetylases in development and physiology: implications for disease and therapy. Nature Reviews Genetics. 2009, 10 (1): 32-42. 10.1038/nrg2485.
- FANTOM Consortium and the RIKEN PMI and CLST (DGT), Forrest AR, Kawaji H, Rehli M, Baillie JK, de Hoon MJ, et al: A promoter-level mammalian expression atlas. Nature. 2014, 507 (7493): 462-470. 10.1038/nature13182.
- Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, Chen Y, Zhao X, Schmidl C, Suzuki T: An atlas of active enhancers across human cell types and tissues. Nature. 2014, 507 (7493): 455-461. 10.1038/nature12787.
- Landt SG, Marinov GK, Kundaje A, Kheradpour P, Pauli F, Batzoglou S, Bernstein BE, Bickel P, Brown JB, Cayting P: ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Research. 2012, 22 (9): 1813-1831. 10.1101/gr.136184.111.
- Song L, Crawford GE: DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells. Cold Spring Harbor Protocols. 2010, 2010 (2): pdb.prot5384.
- Fullwood MJ, Liu MH, Pan YF, Liu J, Xu H, Mohamed YB, et al: An oestrogen-receptor-α-bound human chromatin interactome. Nature. 2009, 462 (7269): 58-64. 10.1038/nature08497.
- Kodzius R, Kojima M, Nishiyori H, Nakamura M, Fukuda S, Tagami M, et al: CAGE: cap analysis of gene expression. Nature Methods. 2006, 3 (3): 211-222. 10.1038/nmeth0306-211.
- Valen E, Pascarella G, Chalk A, Maeda N, Kojima M, Kawazu C, et al: Genome-wide detection and analysis of hippocampus core promoters using DeepCAGE. Genome Research. 2009, 19 (2): 255-265.
- Salimullah M, Mizuho S, Plessy C, Carninci P: NanoCAGE: a high-resolution technique to discover and interrogate cell transcriptomes. Cold Spring Harbor Protocols. 2011, 2011 (1): pdb.prot5559.
- Kanamori-Katayama M, Itoh M, Kawaji H, Lassmann T, Katayama S, Kojima M, et al: Unamplified cap analysis of gene expression on a single-molecule sequencer. Genome Research. 2011, 21 (7): 1150-1159. 10.1101/gr.115469.110.
- Kellis M, Wold B, Snyder MP, Bernstein BE, Kundaje A, Marinov GK, et al: Defining functional DNA elements in the human genome. Proceedings of the National Academy of Sciences. 2014, 111 (17): 6131-6138. 10.1073/pnas.1318948111.
- Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, et al: Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012, 337 (6099): 1190-1195. 10.1126/science.1222794.
- Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, et al: Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011, 473 (7345): 43-49. 10.1038/nature09906.
- Ernst J, Kellis M: Discovery and characterization of chromatin states for systematic annotation of the human genome. Nature Biotechnology. 2010, 28 (8): 817-825. 10.1038/nbt.1662.
- Schaub MA, Boyle AP, Kundaje A, Batzoglou S, Snyder M: Linking disease associations with regulatory information in the human genome. Genome Research. 2012, 22 (9): 1748-1759. 10.1101/gr.136127.111.
- Bryzgalov LO, Antontseva EV, Matveeva MY, Shilov AG, Kashina EV, Mordvinov VA, Merkulova TI: Detection of Regulatory SNPs in Human Genome Using ChIP-seq ENCODE Data. PLoS One. 2013, 8 (10): e78833. 10.1371/journal.pone.0078833.
- Karczewski KJ, Dudley JT, Kukurba KR, Chen R, Butte AJ, Montgomery SB, Snyder M: Systematic functional regulatory assessment of disease-associated variants. Proceedings of the National Academy of Sciences. 2013, 110 (23): 9607-9612. 10.1073/pnas.1219099110.
- Farh KK-H, Marson A, Zhu J, Kleinewietfeld M, Housley WJ, Beik S, et al: Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature. 2014, 518: 337-343. 10.1038/nature13835.
- Ward LD, Kellis M: Interpreting non-coding variation in complex disease genetics. Nature Biotechnology. 2012, 30 (11): 1095-1106. 10.1038/nbt.2422.
- Online Mendelian Inheritance in Man, OMIM. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD), [http://omim.org]
- Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, Maglott DR: ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Research. 2013, 42 (Database issue): D980-D985.
- Glicksberg BSLL, Castellanos RZ, Hakenberg J, Cheng W, Khader S, Ma M, et al: An integrative pipeline for multi-modal discovery of disease relationships. Pac Symp Bio. 2015, 20: 407-418.
- Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, et al: COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Research. 2010, 39 (Database issue): D945-D950.
- St Laurent G, Shtokalo D, Tackett MR, Yang Z, Eremina T, Wahlestedt C, et al: Intronic RNAs constitute the major fraction of the non-coding RNA in mammalian cells. BMC Genomics. 2012, 13 (1): 504. 10.1186/1471-2164-13-504.
- Relle M, Becker M, Meyer RG, Stassen M, Schwarting A: Intronic promoters and their noncoding transcripts: A new source of cancer-associated genes. Molecular Carcinogenesis. 2014, 53 (2): 117-124. 10.1002/mc.21955.
- Ingolia NT, Brar GA, Stern-Ginossar N, Harris MS, Talhouarne GJ, Jackson SE, Wills MR, Weissman JS: Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes. Cell Reports. 2014, 8 (5): 1365-1379. 10.1016/j.celrep.2014.07.045.
- Stergachis AB, Haugen E, Shafer A, Fu W, Vernot B, Reynolds A, et al: Exonic transcription factor binding directs codon choice and affects protein evolution. Science. 2013, 342 (6164): 1367-1372. 10.1126/science.1243490.
- Phillips JE, Corces VG: CTCF: master weaver of the genome. Cell. 2009, 137 (7): 1194-1211. 10.1016/j.cell.2009.06.001.
- McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F: Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics. 2010, 26 (16): 2069-2070. 10.1093/bioinformatics/btq330.
- Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, Ashburner M: The Sequence Ontology: a tool for the unification of genome annotations. Genome Biology. 2005, 6 (5): R44. 10.1186/gb-2005-6-5-r44.
- Flicek P, Amode MR, Barrell D, Beal K, Billis K, Brent S, et al: Ensembl 2014. Nucleic Acids Research. 2013, 42 (Database issue): D749-D755.
- Ritchie GR, Dunham I, Zeggini E, Flicek P: Functional annotation of noncoding sequence variants. Nature Methods. 2014, 11 (3): 294-296. 10.1038/nmeth.2832.
- Reva B, Antipin Y, Sander C: Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Research. 2011, 39 (17): e118. 10.1093/nar/gkr407.
- Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J: A general framework for estimating the relative pathogenicity of human genetic variants. Nature Genetics. 2014, 46 (3): 310-315. 10.1038/ng.2892.
- Goode DL, Cooper GM, Schmutz J, Dickson M, Gonzales E, Tsai M, et al: Evolutionary constraint facilitates interpretation of genetic variation in resequenced human genomes. Genome Research. 2010, 20 (3): 301-310. 10.1101/gr.102210.109.
- Davydov EV, Goode DL, Sirota M, Cooper GM, Sidow A, Batzoglou S: Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Computational Biology. 2010, 6 (12): e1001025. 10.1371/journal.pcbi.1001025.
- Aran D, Sabato S, Hellman A: DNA methylation of distal regulatory sites characterizes dysregulation of cancer genes. Genome Biol. 2013, 14 (3): R21. 10.1186/gb-2013-14-3-r21.
- Portela A, Esteller M: Epigenetic modifications and human disease. Nature Biotechnology. 2010, 28 (10): 1057-1068. 10.1038/nbt.1685.
- Zaidi S, Choi M, Wakimoto H, Ma L, Jiang J, Overton JD, et al: De novo mutations in histone-modifying genes in congenital heart disease. Nature. 2013, 498 (7453): 220-223. 10.1038/nature12141.
- Enroth S, Bornelöv S, Wadelius C, Komorowski J: Combinations of histone modifications mark exon inclusion levels. PLoS One. 2012, 7 (1): e29911. 10.1371/journal.pone.0029911.
- 1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, et al: An integrated map of genetic variation from 1,092 human genomes. Nature. 2012, 491 (7422): 56-65. 10.1038/nature11632.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.