Genome-wide Enhancer Maps Differ Significantly in Genomic Distribution, Evolution, and Function

Non-coding gene regulatory loci are essential to transcription in mammalian cells. As a result, a large variety of experimental and computational strategies have been developed to identify cis-regulatory enhancer sequences. However, in practice, most studies consider enhancer candidates identified by a single method alone. Here we assess the robustness of conclusions based on such a paradigm by comparing enhancer sets identified by different strategies. Because the field currently lacks a comprehensive gold standard, our goal was not to identify the best identification strategy, but rather to quantify the consistency of enhancer sets identified by ten representative identification strategies and to assess the robustness of conclusions based on one approach alone. We found significant dissimilarity between enhancer sets in terms of genomic characteristics, evolutionary conservation, and association with functional loci. This substantial disagreement between enhancer sets within the same biological context is sufficient to influence downstream biological interpretations, and to lead to disparate scientific conclusions about enhancer biology and disease mechanisms. Specifically, we find that different enhancer sets in the same context vary significantly in their overlap with GWAS SNPs and eQTL, and that the majority of GWAS SNPs and eQTL overlap enhancers identified by only a single identification strategy. Furthermore, we find limited evidence that enhancer candidates identified by multiple strategies are more likely to have regulatory function than enhancer candidates identified by a single method. The difficulty of consistently identifying and categorizing enhancers presents a major challenge to mapping the genetic architecture of complex disease, and to interpreting variants found in patient genomes. To facilitate evaluation of the effects of different annotation approaches on studies’ conclusions, we developed a database of enhancer annotations in common biological contexts, creDB, which is designed to integrate into bioinformatics workflows. Our results highlight the inherent complexity of enhancer biology and argue that current approaches have yet to adequately account for enhancer diversity.


21
of conclusions based on such a paradigm by comparing enhancer sets identified by different strategies. Because the field 22 currently lacks a comprehensive gold standard, our goal was not to identify the best identification strategy, but rather to 23 quantify the consistency of enhancer sets identified by ten representative identification strategies and to assess the 24 robustness of conclusions based on one approach alone. We found significant dissimilarity between enhancer sets in terms 25 of genomic characteristics, evolutionary conservation, and association with functional loci. This substantial disagreement 26 between enhancer sets within the same biological context is sufficient to influence downstream biological interpretations, 27 and to lead to disparate scientific conclusions about enhancer biology and disease mechanisms. Specifically, we find that 28 different enhancer sets in the same context vary significantly in their overlap with GWAS SNPs and eQTL, and that the 29 majority of GWAS SNPs and eQTL overlap enhancers identified by only a single identification strategy. Furthermore, we 30 find limited evidence that enhancer candidates identified by multiple strategies are more likely to have regulatory function 31 than enhancer candidates identified by a single method. The difficulty of consistently identifying and categorizing 32 enhancers presents a major challenge to mapping the genetic architecture of complex disease, and to interpreting variants 33 found in patient genomes. To facilitate evaluation of the effects of different annotation approaches on studies' 34 conclusions, we developed a database of enhancer annotations in common biological contexts, creDB, which is designed 35 to integrate into bioinformatics workflows. Our results highlight the inherent complexity of enhancer biology and argue 36 that current approaches have yet to adequately account for enhancer diversity. 37 38 INTRODUCTION 1 have been proposed 1,11 , it is not possible to compare them all; so for each context, we considered methods that represent 2 the diversity of experimental and computational approaches in common use.

3
For all contexts, we considered three enhancer sets derived solely from chromatin immunoprecipitation followed 4 by sequencing (ChIP-seq) for histone modifications informative about enhancer activity from the ENCODE Project 30 . The 5 "H3K27ac" set includes all H3K27ac ChIP-seq peaks without additional refinement [31][32][33][34][35] . The "H3K27acPlusH3K4me1" 6 set includes all H3K27ac ChIP-seq peaks that also overlap an H3K4me1 peak, and the "H3K27acMinusH3K4me3" set 7 contains H3K27ac peaks that do not overlap an H3K4me3 peak [36][37][38] . In liver only, we have an additional set of enhancers 8 identified from H3K27ac ChIP-seq peaks without overlapping H3K4me3 peaks for comparison ("Villar15") 38 . We also 9 considered a method that incorporates DNase I hypersensitive sites (DHSs) with histone modifications to generate the 10 "DNasePlusHistone" enhancer set, which is composed of DHSs where the ratio of H3K4me1 to H3K4me3 is less than 11 one 39 . Since transcriptional signatures are increasingly used to identify enhancers, we consider "FANTOM" enhancers 12 identified from bidirectionally transcribed eRNA detected via cap analysis of gene expression (CAGE) by the FANTOM5 13 Project 25,40,41 . We also include several methods that combine machine learning with functional genomics data, such as the 14 ENCODE consortium's "EncodeEnhancerlike" made by combining DHSs and H3K27ac peaks using an unsupervised 15 ranking method and the "ChromHMM" predictions generated by a hidden Markov model trained on ChIP-seq data from 16 eight histone modifications, CTCF, and RNA Pol II 15, [42][43][44] . For the K562 and Gm12878 cell lines we also include enhancer 17 predictions made by two supervised machine learning methods trained to identify enhancers based on ChIP-seq data in 18 conjunction with other functional genomic features. We will refer to these sets as "Yip12" and "Ho14" 45,46 . An overview 19 of the data and computational approaches used by each method is given in Figure 1   assay different aspects of enhancer biology (e.g., co-factor binding, histone modification, enhancer RNAs), and therefore 27 we expected to find variation among enhancer sets. Nevertheless, the magnitude of differences we observed is striking. 28 For each attribute we considered, the enhancer sets identified differ by several orders of magnitude (Table 1). For 29 instance, FANTOM identifies 326 kilobases (kb) of sequence with liver enhancer activity, EncodeEnhancerlike identifies 30 89 megabases (Mb), and H3K27acMinusH3K4me3 identifies almost 138 megabases (Mb). In addition, methods based on 31 similar approaches often differ substantially; e.g., Villar15, which uses the same enhancer definition as 32 H3K27acMinusH3K4me3, only annotates 86.1 Mb with enhancer function in liver. The enhancer sets also vary in their 33 relative distance to other functional genomic features, such as transcription start sites (TSSs). For example, in liver, the 34 average distance to the nearest TSS ranges from 14 kb for EncodeEnhancerlike to 64 kb for DNasePlusHistone 35 (Supplementary Table 1). Overall, methods based on histone modifications tend to identify larger numbers of longer 36 enhancers compared with CAGE data, and machine learning methods are variable. We highlight these trends in liver, but 37 they are similar in other contexts (Table 1).

39
Enhancer sets have low similarity: Given the diversity of the enhancer sets identified by different methods, we evaluated 40 the extent of overlap between different sets. The magnitude of overlap between enhancer sets is typically low: less than 41 50% for nearly all pairs of methods (median 31% bp overlap for liver; Figure 3A). Furthermore, the largest overlaps are in 42 comparisons including one set with high genome coverage, or in comparisons of sets that were identified based on similar 43 data. To further quantify overlaps, we calculated the Jaccard similarity index-the number of shared bp between two 44 enhancer sets divided by the size of their union-for each pair of methods. Overall, the Jaccard similarities are also replicates to have high of overlap and serve as an "upper bound" of similarity in practical applications. The replicates 1 overlap 65-80% at the bp level. Thus, while there is some variation, the amount of overlap observed between enhancers 2 identified by different methods almost always falls far below the variation between replicates.

4
Enhancer sets have different levels of evolutionary conservation: Enhancers identified by different methods also differ in 5 their levels of evolutionary constraint. Using primate and vertebrate evolutionarily conserved elements defined by 6 PhastCons 47 , we calculated the percentage of conserved base pairs present in each enhancer set. All enhancer sets have 7 more conserved base pairs than expected from length-matched regions drawn at random from the genome. However, 8 enhancers identified by some methods are more conserved than others ( Figure 4A). Across each context, approximately 9 10% of the base pairs in histone-based, ChromHMM, and Ho14 enhancer sets are conserved. Adding DNaseI 10 hypersensitivity data, as in the DNasePlusHistone and EncodeEnhancerlike sets, increases the level of conservation a few 11 percent compared to solely histone-derived enhancers to ~14%. In contrast, the FANTOM and Yip12 enhancers have 12 nearly twice as many conserved base pairs as the histone-based sets (17.8-21.6%). This is similar to levels of conservation 13 among experimentally validated enhancers from the VISTA database (20.5%) 48  as enhancers (VISTA positives), and 832 did not exhibit enhancer activity (VISTA negatives). All genome-wide liver 20 enhancer sets are significantly enriched for overlap with the VISTA positives ( Figure 4B; p < 0.001 for all), and with the 21 exception of FANTOM, each set is ~2x more likely to overlap validated enhancers than expected if they were randomly 22 distributed across the genome. The FANTOM set is 7x enriched; however, given its smaller size, this was based on 6 23 overlaps compared to an expected ~0. We also tested for depletion of VISTA negatives among the enhancer sets; we 24 found that all liver sets, except FANTOM and DNasePlusHistone, are also enriched for overlap with VISTA negatives (p 25 < 0.001). This is not surprising as the regions tested by VISTA were largely selected based on some evidence of enhancer 26 activity, and they may have enhancer activity in other contexts not tested by VISTA. Nonetheless, to evaluate the ability 27 of different methods to distinguish VISTA positives from negatives, we computed the relative enrichment for VISTA 28 positives vs. negatives. Again, with the exception of FANTOM, the enhancer sets from all contexts do not demonstrate 29 clear ability to distinguish the positives and negatives (Supplementary Figure 4). For all contexts, except GM12878, the 30 FANTOM enhancers are more than 2x more enriched for VISTA positives than negatives, but again we note that these 31 FANTOM/VISTA results are based on relatively small numbers of enhancers (Supplementary Table 3). Furthermore, 32 among VISTA positives overlapped by one or more enhancers in liver, 37.9% are unique to a single method and 28.8% 33 are overlapped by two methods, suggesting that different strategies identify different subsets of validated enhancers.

35
Interpretation of GWAS hits and eQTL is contingent on the enhancer identification strategy used  Table 4). Nonetheless, 44 GWAS tag SNPs are significantly enriched at similar levels in most enhancer sets and contexts, with the exception of 45 FANTOM, which has roughly twice the enrichment of other methods ( Figure 4C). Since the tag SNPs may not be the 46 functional variants, we also considered SNPs in high linkage disequilibrium (LD) with the GWAS SNPs (r 2 > 0.9). While the overall enrichments were lower, the variability between enhancer sets remained small (Supplementary Figure 5).

48
Most GWAS tag SNPs are predicted to overlap an enhancer by only a single method ( Figure 4D Figure 5). This illustrates that the annotation of sequences harboring GWAS SNPs varies greatly 1 depending on the enhancer identification strategies used. Since the GWAS catalog contains regions associated with 2 diverse traits, we manually curated the set of GWAS SNPs into subsets associated with phenotypes relevant to liver or 3 heart (Supplementary Table 5). As in the full GWAS set, the majority of curated liver SNPs are overlapped by a single 4 method (58.5%) and only 3.7% are shared by all four methods. The heart and liver enhancer sets are almost universally 5 more enriched for overlap with GWAS SNPs that influence relevant phenotypes compared to GWAS SNPs overall (p = 6 0.002 for liver, p = 0.026 for heart, Mann-Whitney U test). FANTOM enhancers are the exception to this trend due to the 7 small number of overlapping context-specific SNPs (Supplementary Table 4). This suggests that the different methods, in 8 spite of their lack of agreement, all identify regulatory regions relevant to the target context.

9
To test if these patterns hold for functional genetic variants that are not tied to complex disease, we analyzed the 10 overlap of enhancer sets with expression quantitative trait loci (eQTL) identified by the GTEx Consortium. These  dominate any other variation introduced by biological, technical, or methodological differences. Second, similarity in 51 genomic distribution of enhancer sets does not necessarily translate to similarity in functional space, and vice versa. For the machine-learning-based sets in genomic comparisons than in functional analyses. Finally, comparing enhancer sets by 1 performing hierarchical clustering within and between biological contexts reveals that genomic distributions are generally 2 more similar within biological contexts, compared to other sets defined by the same method in a different context (Figure 3 5E). For example, the ChromHMM set from heart is more similar to other heart enhancer sets than to ChromHMM sets 4 from other contexts. In contrast, the enhancer set similarities in functional space are less conserved by biological context 5 ( Figure 5F). Here, the heart ChromHMM set is functionally more similar to the H3K27acMinusH3K4me3 set from liver 6 cells than other heart enhancer sets. In general, cell line enhancer sets (red and green) show more functional continuity 7 than heart and liver sets (blue and purple). However, the FANTOM enhancers are exceptions to these trends; FANTOM 8 enhancers from each context form their own cluster based on their genomic distribution, underscoring their uniqueness.

10
Combining enhancer sets does not increase evidence for regulatory function

11
Although there are large discrepancies in genomic and functional attributes between enhancer sets identified by different 12 methods in the same context, we hypothesized that the subset of regions shared by two or more sets would have stronger 13 enrichment for markers of gene regulatory function. To test this, we analyzed enhancers to determine whether those 14 identified by multiple methods have increased "functional support" compared to regions identified by only a single 15 method. We evaluated three signals of functional importance for each enhancer:

53
Because enhancer identification strategies have such substantial differences, one strategy cannot and should not 54 be used as a proxy for another. Importantly, using different strategies can yield substantially different biological interpretations and conclusions, e.g., about the gene regulatory potential of a SNP or the degree of evolutionary constraint 1 on enhancers. This is particularly important, given that studies of gene regulation commonly use only a single approach to 2 identify enhancers. For example, GWAS have identified thousands of non-coding loci associated with risk for complex 3 disease, and a common first step in the interpretation of a trait-associated locus is to view it in the context of genome-wide 4 maps of regulatory enhancer function 34-36,41,44,54,55,58 . Thus, our findings complicate the use of annotated enhancers to study 5 the mechanisms of gene regulation and to elucidate the molecular underpinnings of disease, most notably in non-coding 6 variant prioritization 29,59,60 .

7
Our goal was not to identify the best method for identifying enhancers, but rather to evaluate the congruence of 8 the diverse strategies in use today. Given their differences in assumptions, motivations and protocols, it is not surprising 9 that different assays and algorithms identify somewhat different sets of enhancers; however, the degree of difference we 10 observe is striking. The consistent lack of agreement between methods demonstrates that the respective working 11 definitions of "enhancer" have little overlap. Focusing on functional annotations, we find agreement between methods 12 about basic functions, but substantial differences in more specific annotations. This suggests that different strategies 13 contribute unique information towards the identification of functionally important enhancers.

44
We downloaded broad peak ChIP-seq data for three histone modifications, H3K27ac, H3K4me1, and H3K4me3 45 from the ENCODE project 30 for two cell lines, K562 and Gm12878, and from the Roadmap Epigenomics Consortium 65 46 for two primary tissues, liver and heart. Also from the ENCODE Encyclopedia (version 3.0), we downloaded the "enhancer-like" annotations; these combine DHSs and H3K27ac ChIP-seq peaks using an unsupervised machine learning To represent enhancer identification strategies in common use, we created three additional enhancer sets for this 7 study using histone modification ChIP-seq peaks and DNase-seq peaks downloaded from ENCODE and Roadmap 8 Epigenomics. The H3K27acPlusH3K4me1 track is a combination of H3K27ac and H3K4me1 ChIP-seq peak files 4,19,37 . If 9 both types of peaks were present (i.e., the regions overlap by at least 50% of the length of one of the regions) the 10 intersection was classified as an enhancer. Similarly, to create the H3K27acMinusH3K4me3 set for each context, we 11 intersected H3K27ac and H3K4me3 ChIP-seq peak files and kept regions where H3K27ac regions did not overlap a 12 H3K4me3 peak by at least 50% of their length. We derived the combination of H3K27ac and H3K4me3 and the 50% 13 overlap criterion from previous studies 14, 37,38 . The DNasePlusHistone track is based on the pipeline described in Hay et al.

28
We also evaluated the similarity of each pair of enhancer sets using the Jaccard similarity index. The Jaccard 29 index is equal to cardinality of the intersection of two sets divided by cardinality of the union. In our analyses, we 30 calculated the Jaccard index at the base pair level. We also computed the relative Jaccard similarity as the observed

52
We intersected the GWAS Catalog SNPs with each of the enhancer sets and calculated the total number of SNPs 53 overlapped as well as the total number of regions overlapped by at least one SNP. We compared those values between 54 enhancer sets and contexts. To determine whether there is enrichment of GWAS SNPs in a given enhancer set, we used the following protocol: count the number of SNPs that overlap enhancers; generate 1,000 random sets of regions that 1 maintain the length distribution of the enhancer set and avoid ENCODE blacklist regions; count the number of SNPs that 2 overlap each of the random regions; compare the observed SNP count with the SNP counts from the simulations and 3 calculate an empirical p-value. We expanded the set of GWAS Catalog SNPs to include SNPs in high LD (r 2 > 0.9) using 4 individuals of European ancestry from the 1000 Genomes Project (phase 3) 72 . We intersected this larger set of SNPs with 5 each of the enhancer sets and calculated the amount of overlap. We also identified the SNP with the maximum number of 6 enhancer set overlaps for each GWAS SNP's LD block. 7 We downloaded all GTEx eQTL from the GTEx Portal (v6p, downloaded 09-07-2016) 73 . We concatenated the 8 data from all 44 represented tissues and ran the enrichment analysis on unique eQTL, filtering at four increasingly strict 9 significance thresholds: 10 -6 , 10 -10 , 10 -20 , and 10 -35 . We interpreted the results from the p-value threshold of 10 -10 . To 10 determine enrichment of eQTL in a given enhancer annotation set, we used the same protocol as for the GWAS SNP 11 enrichment above. We also performed a tissue-specific enrichment analysis with only eQTL from liver and heart at p-12 value thresholds of 10 -6 and 10 -10 , presenting the results from p < 10 -10 , although our main results are consistent across 13 thresholds.

15
Relative enrichment for functionally validated enhancers 16 We downloaded all positive and negative enhancer sequences from the VISTA enhancer browser. For each method, we 17 separately determined enrichment for the positive and negative enhancer sets using the random permutation approach

23
Enhancer set GO term annotation and similarity 24 We used GREAT to find Gene Ontology annotations enriched among genes nearby the enhancer sets. GREAT assigns 25 each input region to regulatory domains of genes and uses both a binomial and a hypergeometric test to discover 26 significant associations between regions and associated genes' ontology terms 74 . Due to the large number of reported 27 regions in each enhancer set, we consider significance based only on the binomial test with the Bonferroni multiple testing 28 correction (<0.05). We downloaded up to 1,000 significant terms for each enhancer set from two of the GO ontologies: 29 Molecular Function (MF) and Biological Process (BP). We calculated the similarity between lists of GO terms using the 30 GOSemSim package in R 57 . GOSemSim uses Wang's sematic similarity metric which accounts for the hierarchical 31 organization and relatedness of ontology terms when calculating the similarity score 56 . For each pair of enhancer sets, we 32 calculated the similarity between GO terms, and converted the resulting similarity matrix into a distance matrix for 33 hierarchical clustering. We also calculated the number of shared GO terms between pairs of methods, and manually 34 compared the top ten significant terms for each enhancer set.

37
To identify groups of similar enhancers in genomic and functional space, we performed hierarchical clustering on the 38 enhancer sets. For functional similarity, we clustered the lists of GO terms returned for each enhancer set. We calculated 39 Wang's similarity metric using the GoSemSim package in R and converted it to distance by subtracting the similarity 40 score from 1. For genomic similarity, we converted the pairwise Jaccard similarity to distance in the same way and 41 clustered the enhancer methods based on this measure. For both distance matrices we used agglomerative hierarchical 42 clustering in R function with the default complete linkage method to iteratively combine clusters 75 . We visualized the 43 cluster results as dendrograms using ggplot2 and dendextend 69,76 . We performed multidimensional scaling (MDS) on the 44 Jaccard and GO term distance matrices using default options in R 75 .

46
We stratified enhancers by the number of identification strategies that annotate them in order to determine whether 48 enhancers predicted by increasing numbers of methods show a higher percentage of evolutionarily conserved base pairs or 49 a higher fraction of SNPs/eQTL per bp. For each predicted enhancer, we calculated the number of base pairs that overlap 50 a given reference set, and used that to compute the relevant statistics. We considered three different reference sets: 51 evolutionarily conserved base pairs as defined by PhastCons elements, GWAS SNPs, and GTEx eQTL to calculate the 52 percent bases conserved and GWAS SNPs and GTEx eQTL per base pair for predicted enhancers.

Confidence distribution of enhancer sets
For enhancer sets with region-level scores available, we ranked each enhancer by its score, and then analyzed whether 1 regions that have higher scores are more likely to be predicted by other identification methods. We calculated the rank 2 using the ChIP-seq or CAGE-seq signal scores for a subset of methods (H3K27ac, H3K27acPlusH3K4me1, 3 H3K27acMinusH3K4me3, DNasePlusHistone, FANTOM), and the machine learning derived score for 4 EncodeEnhancerlike regions. Within each set, we sorted the enhancer regions by score and assigned ranks starting at 1 for 5 the top-scoring region. We then partitioned the enhancer regions in each set by the number of other enhancer sets that 6 overlap at least one base pair in that region.

8
Enhancer database 9 We created creDB, a database of cis-regulatory elements. It currently contains all 1,371,867 enhancers analyzed in this 10 study and representative sets from other common contexts. It is implemented as an SQLite database with an R interface 11 and distributed as an R package. Our design facilitates the integration of creDB into a wide range of genome-wide 12 bioinformatics workflows, alleviating the vetting and processing that is necessary with flat file downloads. This sets it 13 apart from other efforts that collect enhancers but focus on providing interfaces for searching a small number of

5
The middle columns describe the computational processing (if any) performed on the raw data (ML: machine learning).

6
The rightmost columns give the contexts in which the sets were available.