Epigenome overlap measure (EPOM) for comparing tissue/cell types based on chromatin states
© Li et al. 2015
Published: 11 January 2016
The dynamics of epigenomic marks in their relevant chromatin states regulate distinct gene expression patterns, biological functions and phenotypic variations in biological processes. The availability of high-throughput epigenomic data generated by next-generation sequencing technologies allows a data-driven approach to evaluate the similarities and differences of diverse tissue and cell types in terms of epigenomic features. While ChromImpute has allowed for the imputation of large-scale epigenomic information to yield more robust data to capture meaningful relationships between biological samples, widely used methods such as hierarchical clustering and correlation analysis cannot adequately utilize epigenomic data to accurately reveal the distinction and grouping of different tissue and cell types.
We use a three-step testing procedure (ANOVA, t test and overlap test) to identify tissue/cell-type-associated enhancers and promoters and to calculate a newly defined Epigenomic Overlap Measure (EPOM). EPOM results in a clear correspondence map of biological samples from different tissue and cell types through comparison of epigenomic marks evaluated in their relevant chromatin states.
Correspondence maps by EPOM show strong capability in distinguishing and grouping different tissue and cell types and reveal biologically meaningful similarities between Heart and Muscle, Blood & T-cell and HSC & B-cell, Brain and Neurosphere, etc. The gene ontology enrichment analysis both supports and explains the discoveries made by EPOM and suggests that the associated enhancers and promoters demonstrate distinguishable functions across tissue and cell types. Moreover, the tissue/cell-type-associated enhancers and promoters show enrichment in the disease-related SNPs that are also associated with the corresponding tissue or cell types. This agreement suggests the potential of identifying causal genetic variants relevant to cell-type-specific diseases from our identified associated enhancers and promoters.
The proposed EPOM measure demonstrates superior capability in grouping and finding a clear correspondence map of biological samples from different tissue and cell types. The identified associated enhancers and promoters provide a comprehensive catalog to study distinct biological processes and disease variants in different tissue and cell types. Our results also find that the associated promoters exhibit more cell-type-specific functions than the associated enhancers do, suggesting that the non-associated promoters have more housekeeping functions than the non-associated enhancers.
While all human tissue and cell types largely preserve the biological information in the DNA sequence of the human genome, the epigenomic landscapes of different tissue and cell types vary considerably, resulting in distinct gene expression programs, biological functions and phenotypic variations. Epigenomic information, such as DNA methylation, covalent histone modifications and DNA accessibility in each tissue and cell type, can be investigated using high-throughput sequencing technologies such as Bisulfite-seq, ChIP-seq and DNase-seq. The genome-wide dynamics of epigenomic marks in their relevant chromatin states are considered to bridge genotypes and phenotypes, and they can promote the discovery of biologically meaningful relationships between vast cell types, tissues and lineages.
Previous research mostly relied on gene expression profiles to study the relationships of samples from different tissue and cell types. The 111 reference epigenomes from the NIH Roadmap Epigenomics Program, together with the 16 epigenomes reported by the ENCODE project, provided a global view of the epigenomic information covering a large variety of human tissue and cell types. ChromHMM utilized them to build a genome-wide annotation of chromatin states. These large-scale datasets enabled us to study the relationships among tissue and cell types from a new perspective: the similarity of tissue and cell types in terms of histone modification marks evaluated in relevant chromatin states.
Histone modifications at enhancers and promoters in the human genome were found to both reflect and explain global cell-type-specific gene expression. Kundaje et al. showed that pairwise similarity matrices of diverse histone marks could be used to distinguish different subsets of the samples. The similarity matrices consisted of pairwise Pearson correlation values calculated separately for a variety of epigenomic marks. In the same work, they also performed hierarchical clustering of the 111 reference epigenomes using H3K4me1 signal in enhancers (identified by a 15-state HMM) and showed consistent grouping of biologically similar cell and tissue types, including ES cells, iPS cells, T cells, B cells, adult brain, fetal brain, digestive systems, smooth muscle and heart. Heintzman et al. performed k-means clustering on chromatin modifications from both promoters and enhancers. Their results suggested that the chromatin states at promoters are largely invariant across different cell types. In contrast, enhancers reveal cell-type specificity in clustering and correlate with cell-type-specific gene expression programs on a global scale.
The recent large-scale imputation of epigenomic datasets provided a more consistent and robust resource for capturing sample relationships and dynamic epigenomic information across cell types. Ernst et al. found that, compared with the original data, the imputed data led to a correlation matrix of epigenomic features with a more strongly pronounced block structure, suggesting that the imputed data provided a stronger basis for clustering samples into their true tissue or cell type.
Although hierarchical clustering and correlation analysis have proven useful in studying the relationships of biological samples across tissue and cell types, both have notable limitations. In the tree representation of hierarchical clustering, it is often difficult to identify the number of groups. In correlation analysis, both Pearson and Spearman correlation coefficients usually provide a noisy correlation matrix of samples, which makes the detection of sample groups difficult as well. Therefore, new methods are needed to find a clear correspondence map and distinct grouping of samples based on epigenomic features. Here we propose a new measure, the Epigenome Overlap Measure (EPOM), to distinguish different tissue and cell groups by performing a three-step testing procedure on large-scale epigenomic datasets.
Selection of chromatin states
We study and evaluate the capacity of our method in mapping and grouping different tissue and cell types using histone marks at both enhancer and promoter regions.
Chromatin state description
Candidate enhancer states: Transcription 5’ enhancer; Transcription 3’ enhancer; Transcription weak enhancer; Active enhancer 1; Active enhancer 2; Active enhancer flank; Weak enhancer 1; Weak enhancer 2; Enhancer acetylation only
Candidate promoter states: Promoter upstream TSS; Promoter downstream TSS with DNase; Promoter downstream TSS
Selection of histone modification marks
H3 lysine 4 monomethylation (H3K4me1) was observed to be distributed in a cell-type-specific manner and to associate with enhancer regions: predicted enhancers showed H3K4me1 enrichment. It was also verified that candidate enhancer states all showed higher frequencies of H3K4me1 than of other methylation marks. Another histone modification mark, H3 lysine 27 acetylation (H3K27ac), was associated with active promoters in mammalian cells and with predicted enhancers. Hence, we examined the signals of H3K4me1 and H3K27ac in the candidate enhancer (or promoter) regions and attempted to identify the regions where the signals can distinguish different tissue/cell types (we also extended our method to include a third mark, H3K4me3; the results are in Additional file 1). The signals of each mark are −log10-transformed p-values, which represent the enrichment of ChIP-seq read counts based on a Poisson distribution. A stronger signal represents a more statistically significant enrichment of the histone modification. The original signals are at 25 bp resolution. We compressed the signals to 200 bp resolution by averaging every eight consecutive 25 bp windows, so that the signals and our candidate enhancer and promoter regions are perfectly aligned as 200 bp windows.
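The compression from 25 bp to 200 bp resolution can be illustrated with a minimal Python sketch. The function name `compress_signal` and the decision to drop a trailing incomplete group of windows are assumptions made for illustration; the text does not state how partial windows at the ends of chromosomes were handled.

```python
def compress_signal(signal_25bp, factor=8):
    """Average every `factor` consecutive 25 bp windows into one 200 bp window.

    Assumption: a trailing group with fewer than `factor` windows is dropped.
    """
    n_full = len(signal_25bp) // factor  # number of complete 200 bp windows
    return [
        sum(signal_25bp[i * factor:(i + 1) * factor]) / factor
        for i in range(n_full)
    ]


# Toy signal: 17 windows at 25 bp resolution (the last one is incomplete).
toy_signal = list(range(16)) + [99]
compressed = compress_signal(toy_signal)
print(compressed)
```

Each output value is the mean of eight input values, so the compressed track lines up with the 200 bp candidate regions.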
Given the signals of H3K4me1 and H3K27ac on 124 reference epigenomes divided into 16 tissue and cell types (we excluded the three tissue and cell types that only contain one sample) and the locations of candidate enhancers and candidate promoters, we used a three-step testing procedure (please see Fig. 2) to calculate pairwise EPOM scores and study the relationships among different tissue and cell types. The 16 tissue and cell type groups are: embryonic stem cells (ESC), induced pluripotent stem cells (iPSC), ESC-derived cells (ES-deriv.), Blood & T-cells, HSC & B-cells, Mesenchymal stem cells (Mesench.), Epithelial, Neurosphere (Neurosph.), Thymus, Brain, Muscle, Heart, Smooth Muscle (Sm. Muscle), Digestive, Other, and ENCODE cell lines (ENCODE2012).
Step 1 (ANOVA): We apply a threshold $\alpha_1$ to the resulting Bonferroni-corrected p-values and refer to region $k$ as a candidate associated enhancer (or promoter) if the null hypothesis $H_{0,k}$ is rejected.
Step 2 (t test): We apply a threshold $\alpha_2$ to the resulting p-values, and we consider the signal of tissue/cell type $i$ to be significantly higher than that of tissue/cell type $j$ on region $k$ if the null hypothesis $H_{0,ijk}$ is rejected. For the $i$-th tissue/cell type, if $H_{0,ijk}$ is rejected more than $m$ times among all $j \neq i$, we define region $k$ as an associated enhancer (or promoter) of tissue/cell type $i$. We separately identify the H3K4me1-based and H3K27ac-based associated enhancers and promoters of each tissue/cell type. We then combine the information of the two histone marks: for each tissue/cell type, we take the union of the two marks’ associated enhancers (or promoters) and use it as the associated enhancers (or promoters) of that tissue/cell type.
Step 3 (overlap test): We perform the overlap test, described in the next subsection, on the discovered associated enhancers (or promoters) to calculate EPOM scores between every pair of tissue/cell types.
In this paper, we set the thresholds as $\alpha_1 = 10^{-10}$, $\alpha_2 = 0.01$, and $m = 13$ or $14$. In our testing procedure, the ANOVA procedure in Step 1 aims to filter out the candidate enhancer (or promoter) regions whose HM signals do not vary significantly across biological conditions (i.e., tissue and cell types). Step 2 consists of pairwise two-sample t-tests, which aim to find associated regions for each biological condition, such that these regions’ HM signals in this condition are significantly higher than in at least $m$ other conditions. Steps 1 and 2 are not redundant but complementary. Step 1 greatly reduces the number of candidate associated regions to be tested in Step 2, so that Step 2 finds associated regions that not only have high signals in one biological condition but also have strong signal variations across conditions. In addition, Step 1 greatly reduces the computational time of Step 2 and thereby increases the computational efficiency of the EPOM method. Step 2 is necessary to identify associated regions that carry cell-type-specific characteristics, because it centers on each biological condition in its search for associated regions. Together, the two steps ensure that the identified associated regions have a strong capability of differentiating biological conditions and thus serve as good candidates for the overlap test in Step 3.
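The selection logic of Steps 1 and 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the ANOVA and t-test p-values are taken as given inputs, the function names and data layout are hypothetical, and the counting rule uses "at least m" rejections (the text states the rule both as "more than m" and as "at least m", so the comparison may need adjusting).

```python
def candidate_regions(anova_p, alpha1=1e-10):
    """Step 1: keep regions whose Bonferroni-corrected ANOVA p-value is below alpha1."""
    n = len(anova_p)
    return {k for k, p in enumerate(anova_p) if min(1.0, p * n) < alpha1}


def associated_regions(pairwise_p, candidates, n_types, alpha2=0.01, m=14):
    """Step 2: region k is associated with type i if type i is significantly
    higher than at least m other types (assumed reading of the rule).

    pairwise_p[k][i][j] is the p-value of the test 'signal of i > signal of j'
    at region k (hypothetical layout).
    """
    assoc = {i: set() for i in range(n_types)}
    for k in candidates:
        for i in range(n_types):
            wins = sum(
                1 for j in range(n_types)
                if j != i and pairwise_p[k][i][j] < alpha2
            )
            if wins >= m:
                assoc[i].add(k)
    return assoc


def union_marks(assoc_mark1, assoc_mark2):
    """Combine two marks by taking, per tissue/cell type, the union of regions."""
    return {i: assoc_mark1[i] | assoc_mark2[i] for i in assoc_mark1}


# Toy example with 3 regions and 3 tissue/cell types (values are made up).
anova_p = [1e-15, 0.5, 1e-13]
pairwise_p = {
    0: [[1, 0.001, 0.002], [0.9, 1, 0.5], [0.8, 0.3, 1]],
    2: [[1, 0.5, 0.5], [0.5, 1, 0.5], [0.001, 0.2, 1]],
}
cands = candidate_regions(anova_p)
print(associated_regions(pairwise_p, cands, n_types=3, m=2))
```

With 16 tissue/cell types there are 15 possible comparisons per type, so lowering `m` relaxes the specificity requirement and yields more associated regions per type.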
Overlap test in the three-step testing procedure
The larger the EPOM score is, the more likely that A and B are dependent and the more epigenomic characteristics they share, and vice versa.
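To make the idea of an overlap-based score concrete, the following Python sketch computes a score from a one-sided hypergeometric test. This is an assumption made for illustration only: the exact definition of the EPOM score is not reproduced in this text, and the sketch assumes that A and B are the sets of associated regions of two tissue/cell types drawn from a common pool of candidate regions. The function name `overlap_score` is hypothetical.

```python
import math


def overlap_score(set_a, set_b, n_candidates):
    """Return -log10 of P(overlap >= observed) under a hypergeometric null.

    Assumed null: set_b is a uniformly random subset (of its observed size)
    of the n_candidates regions, independent of set_a.
    """
    if len(set_a | set_b) > n_candidates:
        raise ValueError("sets contain more regions than the candidate pool")
    size_a, size_b = len(set_a), len(set_b)
    observed = len(set_a & set_b)
    total = math.comb(n_candidates, size_b)
    # Number of size_b subsets that share at least `observed` regions with A.
    tail = sum(
        math.comb(size_a, x) * math.comb(n_candidates - size_a, size_b - x)
        for x in range(observed, min(size_a, size_b) + 1)
    )
    # Work with logs of exact integers to avoid floating-point underflow.
    return math.log10(total) - math.log10(tail)


# Toy example: 10 candidate regions, identical versus disjoint sets.
print(overlap_score({0, 1, 2}, {0, 1, 2}, 10))
print(overlap_score({0, 1, 2}, {3, 4, 5}, 10))
```

Under this assumed definition, two sets with a larger-than-expected overlap receive a high score, while disjoint sets receive a score of zero, which matches the qualitative behavior described above.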
[Table: Numbers and proportions of enhancer/promoter regions associated with various tissue/cell types, based on the union of the two HMs. Rows: numbers of associated regions; % of associated regions among candidate regions. Columns include Blood & T-cell and HSC & B-cell.]
EPOM between different tissue/cell types
If in Step 2 (t test) of the testing procedure we use a lower threshold m=13 instead of m=14, the discovered associated enhancers and associated promoters become less cell-type-specific. The resulting EPOM scores are consequently less distinguishable, and the correspondence maps (see Fig. 3c, d) reveal subtler similarities between different tissue and cell types. The discovered off-diagonal mappings reveal biologically meaningful relationships. For example, Heart, Muscle and Smooth Muscle are grouped together; Blood & T cells and HSC & B cells are grouped together; Neurosphere is mapped to both Brain and ES-derived cells; Thymus is mapped to Blood & T cells, consistent with the fact that the thymus is a specialized organ of the immune system in which T cells mature; and Thymus is also mapped to HSC & B cells, consistent with the fact that a small population of B cells develops in the thymus and some HSCs colonize the thymus. As the associated regions become less specific from Fig. 3a, b to Fig. 3c, d, the correspondence maps based on enhancers and promoters, although slightly different, remain consistent with each other, suggesting that our identified associated promoters and enhancers have similar levels of cell/tissue specificity in terms of grouping capability.
We also calculated the EPOM matrices for each of the two histone modification marks separately to see how the marks differ in their ability to capture cell type characteristics. Instead of taking the union of the two marks’ associated enhancers (or promoters) in Step 2, we used the associated enhancers (or promoters) of H3K4me1 and of H3K27ac separately to perform the overlap test in Step 3. With the higher threshold (m=14), the results from the two marks are generally the same; with the lower threshold (m=13), the results from the two marks are still consistent, but with different scores for certain off-diagonal patterns (see Additional file 3). To further study how different histone modification marks affect the EPOM scores, we added a third mark, histone H3 lysine 4 tri-methylation (H3K4me3), to our study because H3K4me3 is acknowledged to be characteristic of actively transcribed protein-coding promoters. We calculated EPOM scores based on associated enhancers or promoters identified from the three histone modification marks (see Additional file 1). The EPOM matrices still exhibit a strong diagonal pattern that is highly consistent with what we observed from H3K4me1 and H3K27ac.
Another case worth attention is how the EPOM scores change if we summarize the associated enhancers (or promoters) in Step 2 of the testing procedure by taking the intersection, rather than the union, of the associated enhancers (or promoters) identified for each mark (see Additional file 4). As expected, the diagonal pattern of the EPOM matrices becomes stronger, since fewer associated enhancers (or promoters) are shared among different tissue/cell types. Nevertheless, the significant off-diagonal mappings were still successfully identified.
Potential target genes of the associated enhancers and promoters
Gene expression programs are controlled and regulated by cell-specific changes in the activity of cis-regulatory elements, including enhancers and promoters. Although identifying and annotating these regulatory elements remains a great challenge, it is possible to infer the biological functions of these regions by analyzing the functions of their neighboring genes, which are potential target genes under their regulation. Here we study the possible functions of the identified associated enhancers and promoters by analyzing the functions of their nearby genes, which we refer to as the potential target genes of the associated enhancers and promoters.
Noticing that real enhancers and promoters can span across regions much longer than 200 bp, we merged the adjacent associated enhancers or promoters and re-identified the potential target genes of the merged associated enhancers or promoters (see Additional file 5). With decreasing numbers of the associated enhancers and promoters, the proportions of the target genes increase (see Fig. 4 and Additional file 5); however, the distribution of the proportions across tissue/cell types remains largely the same.
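The merging of adjacent 200 bp associated regions can be sketched as a simple interval merge. This Python sketch assumes that "adjacent" means directly abutting 200 bp windows on the same chromosome and that regions are given as (chromosome, start) pairs in half-open coordinates; both the representation and the function name `merge_adjacent` are illustrative assumptions.

```python
def merge_adjacent(windows, width=200):
    """Merge abutting fixed-width windows into longer (chrom, start, end) regions.

    `windows` is an iterable of (chrom, start) pairs; coordinates are assumed
    to be half-open, so a window covers [start, start + width).
    """
    merged = []
    for chrom, start in sorted(set(windows)):
        if merged and merged[-1][0] == chrom and merged[-1][2] == start:
            # The window starts exactly where the previous region ends: extend it.
            merged[-1] = (chrom, merged[-1][1], start + width)
        else:
            merged.append((chrom, start, start + width))
    return merged


# Toy example: three abutting windows on chr1, one isolated, one on chr2.
toy = [("chr1", 400), ("chr1", 0), ("chr1", 200), ("chr1", 1000), ("chr2", 1200)]
print(merge_adjacent(toy))
```

Merging reduces the number of regions without changing the genomic positions they cover, which is consistent with the observation that the proportions of target genes increase while their distribution across tissue/cell types stays largely the same.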
Gene ontology enrichment analysis of associated enhancers and promoters
The annotations of the top enriched GO terms in each tissue and cell type (see Additional files 6 and 7) verify and explain the similarity patterns discovered through the EPOM score matrices. For instance, we observe a mapping between Heart and Muscle through the EPOM scores (Fig. 3a). Heart and Muscle share six GO terms among their top 20 enriched GO terms in associated enhancers. The common GO terms are muscle filament sliding, sarcomere organization, fibroblast growth factor receptor signaling pathway, adenosine to inosine editing, positive regulation of GTPase activity and HAC1-type intron splice site recognition and cleavage (see Fig. 5). As another example, in accordance with the mapping of Blood & T-cell and HSC & B-cell in Fig. 3c, d, these two types share six top enriched GO terms, including toll-like receptor signaling pathway and cytokine-mediated signaling pathway. In addition, consistent with the mapping of Neurosphere and Brain, these two types have six top enriched GO terms in common, including synaptic transmission, positive regulation of GTPase activity and axon guidance.
GWAS and disease ontology (DO) enrichment analysis of associated enhancers and promoters
Genome-wide association studies (GWAS) have identified millions of genetic variants associated with common traits and diseases. However, selecting informative single-nucleotide polymorphisms (SNPs) that have main effects on diverse diseases remains a great challenge. It was observed that many non-coding variants associated with common diseases are concentrated in regulatory sequences of the human genome. As a consequence, the associated enhancers and associated promoters discovered by EPOM carry important information on cell-type-specific diseases and may serve as a potential source to promote the identification of pathogenic tissue/cell types of diverse diseases and the understanding of regulatory mechanisms of human disease.
[Table: GWAS enrichment scores, reported as −log(Bonferroni-corrected p-values), for tissue/cell types including Blood & T-cell and HSC & B-cell.]
A series of biologically meaningful relationships between diseases and tissue/cell types are identified and verified in the enrichment analysis (see Fig. 8 and Additional file 8). In terms of associated enhancer regions, DO terms corresponding to hypersensitivity reaction disease (celiac disease), hematopoietic system disease (lymphopenia) and immune system cancer (lymphoma and leukemia) are enriched in Blood & T-cell and HSC & B-cell; DO terms representing hepatocellular carcinoma, pancreatic cancer and a series of gastrointestinal system diseases (such as ulcerative colitis and esophageal cancer) are enriched in Digestive; DO terms representing diseases of mental health (such as attention deficit hyperactivity disorder, alcohol dependence and schizophrenia), major depressive disorder and neurodegenerative diseases (such as Alzheimer’s disease and Parkinson’s disease) are enriched in Brain; cardiovascular system disease is enriched in both Muscle and Heart; and gastric adenocarcinoma (which derives from epithelial cells of glandular origin) is enriched in Epithelial. In terms of associated promoter regions, diseases similar to those found for associated enhancers are enriched in Blood & T-cell, HSC & B-cell, Digestive and Epithelial. In addition, type 1 diabetes mellitus is enriched in Digestive, and cardiomyopathy (characterized by deterioration of the function of the heart muscle) is enriched in Muscle. Moreover, some more complicated relationships between diseases and tissue/cell types are also recovered in the DO enrichment analysis. For example, diabetes mellitus and kidney disease are found to be enriched in Heart, and research has shown that both diabetes and kidney disease are high risk factors for heart disease.
Discussion and conclusions
In this work, we propose a new measure for comparing and grouping biological samples from different tissue and cell types: Epigenomic Overlap Measure (EPOM). EPOM compares different tissue and cell types based on the similarity of histone modification marks evaluated in their relevant chromatin states. The proposed measure is calculated via a three-step testing procedure consisting of ANOVA, t test and overlap test. Compared to traditional correlation analysis, EPOM is able to create a much clearer mapping pattern across 16 tissue and cell types. By tuning the thresholds in the testing procedure, EPOM can perform either grouping or identity mapping of biological samples based on epigenomic features. The associated enhancers and associated promoters identified by EPOM are good indicators of tissue/cell epigenomic characteristics, and they are important genomic regions for downstream analyses such as regulatory network analysis, GO enrichment analysis and GWAS. Results under different settings (i.e., taking the union or the intersection of the associated regions identified for different marks; using two or three HMs together or using individual marks separately; using 200 bp associated regions or merged longer associated regions) all demonstrate the effectiveness of our approach compared with correlation analysis in finding clear correspondence maps of biological samples. Moreover, the resulting EPOM scores reveal biologically meaningful patterns between similar tissue/cell types and confirm the belief that epigenomic landscapes are powerful resources for understanding cellular identity. These results imply the great potential of using EPOM to study tumor heterogeneity based on single-cell epigenomic data.
The EPOM method can be easily extended to study the relationships between diverse tissue/cell types based on signals of any epigenetic marks in genomic regions of interest. Here we suggest an efficient approach to systematically select epigenetic marks for EPOM if no specific marks are of prior interest. The selection is based on the number of regions where each mark has differential signals across biological conditions. The differential regions of each mark can be found by Step 1 (ANOVA) of our testing procedure given a specified p-value threshold, and the marks that have large numbers of differential regions are good candidates for EPOM. The rationale behind this selection approach is that EPOM prefers the marks carrying more cell-type-specific information on the genomic regions of interest. We implement this selection approach in Additional file 9, which shows that among the eight epigenetic marks studied by the Roadmap Consortium, the three marks used in this work (H3K4me1, H3K27ac and H3K4me3) are among the top ones in terms of the numbers of differential enhancer and promoter regions.
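The mark selection approach can be sketched as a ranking of marks by their number of differential regions. In this Python sketch the per-region ANOVA p-values are assumed to be given, Bonferroni correction is assumed to be applied as in Step 1, and the function name `rank_marks`, the mark names and the p-values are purely illustrative.

```python
def rank_marks(anova_p_by_mark, alpha=1e-10):
    """Rank marks by the number of regions with a significant ANOVA result.

    anova_p_by_mark maps a mark name to a list of raw per-region p-values
    (hypothetical layout); p-values are Bonferroni-corrected before thresholding.
    """
    counts = {}
    for mark, p_values in anova_p_by_mark.items():
        n = len(p_values)
        counts[mark] = sum(1 for p in p_values if min(1.0, p * n) < alpha)
    # Marks with more differential regions come first.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Toy p-values for two hypothetical marks over three regions.
toy = {
    "mark_B": [0.5, 0.2, 1e-12],
    "mark_A": [1e-20, 1e-15, 0.3],
}
print(rank_marks(toy))
```

Marks near the top of the ranking vary across biological conditions in more regions and are therefore more informative inputs for the subsequent t tests and overlap test.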
We identified the potential target genes of the associated enhancers/promoters in each tissue and cell type and used the top enriched GO terms in these genes to predict the biological functions of the associated enhancers and promoters. The results of the GO enrichment analysis confirm the similarities of tissues and cell types found by EPOM and provide functional explanations for the underlying regulatory mechanisms leading to these patterns. The EPOM scores, together with the GO enrichment results, suggest that the associated enhancers and promoters capture the epigenomic characteristics of their corresponding tissue and cell types well. An important future direction is to incorporate three-dimensional (3D) chromatin structures into the identification of the target genes of associated enhancers/promoters. The Hi-C technology makes it possible to decipher 3D chromatin structures and thus to reveal more accurate and complete interactions between genes and regulatory regions. However, Hi-C data are not yet available for the human tissue and cell types in our study, and without these data it is difficult to accurately infer potential target genes of associated enhancers/promoters from 3D chromatin structures. In addition, better computational tools are needed for accurate 3D genome reconstruction from Hi-C data.
Despite the previous belief that chromatin states at promoters are largely invariant across diverse cell types, our functional analyses of the potential target genes of the associated promoters in different tissue/cell types suggest that the non-housekeeping promoters carry cell-type-specific functions. We also found that the potential target genes of the associated enhancers are enriched with functions that are either specific to a single tissue/cell type or shared by a subgroup of tissue/cell types. The associated regulatory regions identified by EPOM are key elements for understanding differential gene expression, cell differentiation and phenotypic variations.
Further functional analyses based on disease ontology confirm that the discovered associated regions carry important disease-relevant characteristics of their corresponding tissue/cell types. The identified associated enhancers and promoters can be good resources for understanding the epigenomic mechanisms of different tissue and cell types. Interpreting the biological mechanisms and effects of the large number of identified SNPs is now a great challenge. A common approach has been to simply study the overlap between the SNPs and regulatory elements such as histone modification marks, binding sites of transcription factors and promoter regions. However, given that the dynamics of trait-associated variants can vary significantly across tissue and cell types, we should carefully evaluate the enrichment of trait-associated variants in their most relevant tissues or cell types. Knowing that our associated enhancers and promoters carry significant regulatory epigenomic features and thus represent the genomic context of their corresponding tissue and cell types better than other non-coding genomic regions, we highlight three important ways to make use of associated enhancers and promoters in GWAS. First, the identified associated enhancers and promoters provide a unique source for studying cell-type-specific disease variants and exploring the functions of disease-associated SNPs. Although previous research showed SNP and GWAS enrichment in diverse chromatin states and studied SNPs for certain selected traits, it did not provide a method to test the enrichment of genome-wide SNPs in cell-type-specific contexts. Second, the enriched DO terms can help researchers understand the dynamics of disease-related regulatory elements across diverse tissue/cell types. We can identify the potential target genes of the associated enhancers and promoters that are highly enriched with disease-related SNPs. Then, by comparing the distinct and common target genes of each tissue/cell type and studying the regulatory networks between those genes and their associated enhancers or promoters, it is possible to shed light on the causes of cell-type-specific diseases as well as multi-factorial disorders. Last, the results of our study provide useful information to refine the disease ontology. Once we verify the potential target genes of the associated enhancers (or promoters) enriched with disease variants, we can update the DO terms to reflect these newly discerned genes.
Availability of supporting data
The epigenomic datasets supporting the results of this article are available at the web portal of the Roadmap Epigenomics Project. Both the data of the 25-state Imputation Based Chromatin State Model and the imputed signals of histone modification marks are available at http://egg2.wustl.edu/roadmap/web_portal/imputed.html#chr_imp. The data for SNP annotation is available at http://jjwanglab.org/gwasdb. The associated enhancer and promoter regions identified by EPOM are available at http://www.stat.ucla.edu/~jingyi.li/software-and-data.html or http://www.stat.ucla.edu/~jingyi.li/data/EpOM/associated_enhancers_and_promoters.tar.gz.zip.
This article has been published as part of BMC Genomics Volume 17 Supplement 1, 2016: Selected articles from the Fourteenth Asia Pacific Bioinformatics Conference (APBC 2016): Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/17/S1.
This work was supported by the start-up fund of Department of Statistics at University of California, Los Angeles, and the Hellman Fellowship from the Hellman Foundation. The publication costs for this article were funded by the Hellman Foundation. The authors would like to thank Yu-Cheng T. Yang for processing the imputed data of histone modification marks and for his ideas, indispensable advice and wise guidance. We also thank the anonymous reviewers for their valuable comments and suggestions that helped improve the manuscript.
- Pellegrini M, Ferrari R: Epigenetic analysis: Chip-chip and chip-seq. Methods Mol Biol. 2012, 802: 377-87. 10.1007/978-1-61779-400-1_25.
- Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, et al: Integrative analysis of 111 reference human epigenomes. Nature. 2015, 518 (7539): 317-30. 10.1038/nature14248.
- Bernstein BE, Meissner A, Lander ES: The mammalian epigenome. Cell. 2007, 128 (4): 669-81. 10.1016/j.cell.2007.01.033.
- Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A, et al: The nih roadmap epigenomics mapping consortium. Nat Biotechnol. 2010, 28 (10): 1045-8. 10.1038/nbt1010-1045.
- Lee Y-s, Krishnan A, Zhu Q, Troyanskaya OG: Ontology-aware classification of tissue and cell-type signals in gene expression profiles across platforms and technologies. Bioinforma. 2013, 29 (23): 3036-44. 10.1093/bioinformatics/btt529.
- Pettit J-B, Tomer R, Achim K, Richardson S, Azizi L, Marioni J: Identifying cell types from spatially referenced single-cell expression datasets. PLoS Comput Biol. 2014, 10 (9): e1003824. 10.1371/journal.pcbi.1003824.
- ENCODE Project Consortium: An integrated encyclopedia of dna elements in the human genome. Nature. 2012, 489 (7414): 57-74. 10.1038/nature11247.
- Ernst J, Kellis M: Chromhmm: automating chromatin-state discovery and characterization. Nat Methods. 2012, 9 (3): 215-6. 10.1038/nmeth.1906.
- Heintzman ND, Hon GC, Hawkins RD, Kheradpour P, Stark A, Harp LF, et al: Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature. 2009, 459 (7243): 108-12. 10.1038/nature07829.
- Ernst J, Kellis M: Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues. Nature Biotechnol. 2015, 33 (4): 364-76. 10.1038/nbt.3157.
- Koch CM, Andrews RM, Flicek P, Dillon SC, Karaöz U, Clelland GK, et al: The landscape of histone modifications across 1 % of the human genome in five human cell lines. Genome Res. 2007, 17 (6): 691-707. 10.1101/gr.5704207.
- Ernst J, Kellis M: Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat Biotechnol. 2010, 28 (8): 817-25. 10.1038/nbt.1662.
- Creyghton MP, Cheng AW, Welstead GG, Kooistra T, Carey BW, Steine EJ, et al: Histone h3k27ac separates active from poised enhancers and predicts developmental state. Proc Natl Acad Sci. 2010, 107 (50): 21931-6. 10.1073/pnas.1016071107.
- Li JJ, Huang H, Bickel PJ, Brenner SE: Comparison of d. melanogaster and c. elegans developmental stages, tissues, and cells by modencode rna-seq data. Genome Res. 2014, 24 (7): 1086-101. 10.1101/gr.170100.113.
- Johansson CB, Svensson M, Wallstedt L, Janson AM, Frisén J: Neural stem cells in the adult human brain. Exp Cell Res. 1999, 253 (2): 733-6. 10.1006/excr.1999.4678.
- Kissa K, Murayama E, Zapata A, Cortés A, Perret E, Machu C, et al: Live imaging of emerging hematopoietic stem cells and early thymus colonization. Blood. 2008, 111 (3): 1147-56. 10.1182/blood-2007-07-099499.
- Hon GC, Hawkins RD, Ren B: Predictive chromatin signatures in the mammalian genome. Hum Mol Genet. 2009, 18 (R2): 195-201. 10.1093/hmg/ddp409.
- Nègre N, Brown CD, Ma L, Bristow CA, Miller SW, Wagner U, et al: A cis-regulatory map of the drosophila genome. Nature. 2011, 471 (7339): 527-31. 10.1038/nature09990.
- Li G, Ruan X, Auerbach RK, Sandhu KS, Zheng M, Wang P, et al: Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell. 2012, 148 (1): 84-98. 10.1016/j.cell.2011.12.014.
- Cotney J, Leng J, Oh S, DeMare LE, Reilly SK, Gerstein MB, et al: Chromatin state signatures associated with tissue-specific gene expression and enhancer activity in the embryonic limb. Genome Res. 2012, 22 (6): 1069-80. 10.1101/gr.129817.111.
- Gene Ontology Consortium: Gene ontology consortium: going forward. Nucleic Acids Res. 2015, 43 (D1): 1049-56. 10.1093/nar/gku1179.
- Bilic J, Belmonte JCI: Concise review: Induced pluripotent stem cells versus embryonic stem cells: close enough or yet too far apart?. Stem Cells. 2012, 30 (1): 33-41. 10.1002/stem.700.View ArticlePubMedGoogle Scholar
- He B, Chen C, Teng L, Tan K: Global view of enhancer–promoter interactome in human cells. Proc Natl Acad Sci. 2014, 111 (21): 2191-9. 10.1073/pnas.1320308111.View ArticleGoogle Scholar
- Liang Y, Kelemen A: Statistical advances and challenges for analyzing correlated high dimensional snp data in genomic study for complex diseases. Stat Surveys. 2008, 2: 43-60. 10.1214/07-SS026.View ArticleGoogle Scholar
- Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, et al: Systematic localization of common disease-associated variation in regulatory dna. Science. 2012, 337 (6099): 1190-5. 10.1126/science.1222794.View ArticlePubMedPubMed CentralGoogle Scholar
- Li MJ, Wang P, Liu X, Lim EL, Wang Z, Yeager M, et al: Gwasdb: a database for human genetic variants identified by genome-wide association studies. Nucleic Acids Res. 2012, 40 (D1): 1047-54. 10.1093/nar/gkr1182.View ArticleGoogle Scholar
- Kibbe WA, Arze C, Felix V, Mitraka E, Bolton E, Fu G, et al: Disease ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data. Nucleic Acids Res. 2015, 43 (D1): 1071-8. 10.1093/nar/gku1011.View ArticleGoogle Scholar
- Heart Outcomes Prevention Evaluation (HOPE) Study investigators: Effects of ramipril on cardiovascular and microvascular outcomes in people with diabetes mellitus: results of the hope study and micro-hope substudy. The Lancet. 2000, 355 (9200): 253-9. 10.1016/S0140-6736(99)12323-7.View ArticleGoogle Scholar
- Sarnak MJ, Levey AS, Schoolwerth AC, Coresh J, Culleton B, Hamm LL, et al: Kidney disease as a risk factor for development of cardiovascular disease a statement from the american heart association councils on kidney in cardiovascular disease, high blood pressure research, clinical cardiology, and epidemiology and prevention. Circulation. 2003, 108 (17): 2154-69. 10.1161/01.CIR.0000095676.90936.80.View ArticlePubMedGoogle Scholar
- Lang AH, Li H, Collins JJ, Mehta P: Epigenetic landscapes explain partially reprogrammed cells and identify key reprogramming genes. PLoS Comput Biol. 2014, 10 (9): e1003734-10.1371/journal.pcbi.1003734.View ArticlePubMedPubMed CentralGoogle Scholar
- Barrero MJ, Boué S, Belmonte JCI: Epigenetic mechanisms that regulate cell identity. Cell Stem Cell. 2010, 7 (5): 565-70. 10.1016/j.stem.2010.10.009.View ArticlePubMedGoogle Scholar
- Shapiro E, Biezuner T, Linnarsson S: Single-cell sequencing-based technologies will revolutionize whole-organism science. Nat Rev Genet. 2013, 14 (9): 618-30. 10.1038/nrg3542.View ArticlePubMedGoogle Scholar
- Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, et al: Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009, 326 (5950): 289-93. 10.1126/science.1181369.View ArticlePubMedPubMed CentralGoogle Scholar
- Van Berkum NL, Lieberman-Aiden E, Williams L, Imakaev M, Gnirke A, Mirny LA, et al: Hi-c: a method to study the three-dimensional architecture of genomics. J Vis Exp. 2010, 39: 1869-PubMedGoogle Scholar
- Nagano T, Lubling Y, Stevens TJ, Schoenfelder S, Yaffe E, Dean W, et al: Single-cell hi-c reveals cell-to-cell variability in chromosome structure. Nature. 2013, 502 (7469): 59-64. 10.1038/nature12593.View ArticlePubMedGoogle Scholar
- Lesne A, Riposo J, Roger P, Cournac A, Mozziconacci J: 3d genome reconstruction from chromosomal contacts. Nat Methods. 2014, 11 (11): 1141-3. 10.1038/nmeth.3104.View ArticlePubMedGoogle Scholar
- Hawkins RD, Hon GC, Yang C, Antosiewicz-Bourget JE, Lee LK, Ngo Q-M, et al: Dynamic chromatin states in human es cells reveal potential regulatory sequences and genes involved in pluripotency. Cell Res. 2011, 21 (10): 1393-409. 10.1038/cr.2011.146.View ArticlePubMedPubMed CentralGoogle Scholar
- Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, et al: Annotation of functional variation in personal genomes using regulomedb. Genome Res. 2012, 22 (9): 1790-7. 10.1101/gr.137323.112.View ArticlePubMedPubMed CentralGoogle Scholar
- Schmidt EM, Zhang J, Zhou W, Chen J, Mohlke KL, Chen YE, et al: Gregor: evaluating global enrichment of trait-associated variants in epigenomic features using a systematic, data-driven approach. Bioinforma. 2015, 16 (31): 2601-6. 10.1093/bioinformatics/btv201.View ArticleGoogle Scholar
- Farh KK-H, Marson A, Zhu J, Kleinewietfeld M, Housley WJ, Beik S, et al: Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature. 2015, 518 (7539): 337-43. 10.1038/nature13835.View ArticlePubMedGoogle Scholar
- Schriml LM, Mitraka E: The disease ontology: fostering interoperability between biological and clinical human disease-related data. Mamm Genome. 2015, 26: 584-9. 10.1007/s00335-015-9576-9.View ArticlePubMedPubMed CentralGoogle Scholar
This article is published under license to BioMed Central Ltd. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.