Chromatin states reveal functional associations for globally defined transcription start sites in four human cell lines
© Rye et al.; licensee BioMed Central Ltd. 2014
Received: 21 May 2013
Accepted: 7 December 2013
Published: 26 March 2014
Deciphering the most common modes by which chromatin regulates transcription, and how this is related to cellular status and processes is an important task for improving our understanding of human cellular biology. The FANTOM5 and ENCODE projects represent two independent large scale efforts to map regulatory and transcriptional features to the human genome. Here we investigate chromatin features around a comprehensive set of transcription start sites in four cell lines by integrating data from these two projects.
Transcription start sites can be distinguished by chromatin states defined by specific combinations of both chromatin mark enrichment and the profile shapes of these chromatin marks. The observed patterns can be associated with cellular functions and processes, and they also show association with expression level, location relative to nearby genes, and CpG content. In particular we find a substantial number of repressed inter- and intra-genic transcription start sites enriched for active chromatin marks and Pol II, and these sites are strongly associated with immediate-early response processes and cell signaling. Associations between start sites with similar chromatin patterns are validated by significant correlations in their global expression profiles.
The results confirm the link between chromatin state and cellular function for expressed transcripts, and also indicate that active chromatin states at repressed transcripts may poise transcripts for rapid activation during immune response.
KeywordsFantom Encode Cage Transcription start sites Chromatin states Gene expression
The transcriptional landscape of human cells is tightly linked to chromatin structure. By modulating chromatin, transcription factors (TFs) and chromatin modifying enzymes decides which transcripts, and the amount of each that are produced by a cell [1, 2]. Deciphering the most common modes by which chromatin regulates transcription, and how this is related to cellular status and processes, represents an ongoing endeavor towards our understanding of human cellular biology. However, the diversity of the transcriptional landscapes among different cell types in the human organism, and the complex mechanisms that account for this diversity are just beginning to be understood. Recently two large scale efforts with the goal to map and understand the regulatory and transcriptional landscape of human cells and tissues have been undertaken. Using single molecule Cap Analysis of Gene Expression (CAGE ) technology to profile 975 human tissues, cell lines and primary cells, the FANTOM5 consortium has generated a comprehensive map of transcription start sites (TSSs) and their relative expression across the human genome  The amount of TSS data produced by this consortium have been condensed into a global set of 184 827 defined Robust clusters of Transcription Start Sites (here abbreviated as RTSSs) throughout the human genome. A robust cluster is defined as groups of TSSs which are in close proximity of each other in the genome, have the same direction of transcription, share a similar global expression pattern across all cells and tissues, and have sufficient support in the number of CAGE sequence tags . The ENCODE project  has generated data on a huge amount of features that participate in the regulation of gene expression in human cell lines. Among the several approaches taken by ENCODE to investigate the different aspects of transcript regulation, the mapping of chromatin modifications and transcription factor binding sites in selected human cell lines using ChIP-Seq [6, 7] is probably the most comprehensive. To facilitate comparison and utilize the efforts made by both projects, the four cell lines K562, GM12878, HeLa-S3 and HepG2 used by ENCODE where specifically subjected to CAGE in FANTOM5.
One of the important findings in the ENCODE project was the impact on gene expression by different combinations of chromatin modifications at regulatory elements throughout the genome. Chromatin modifications are post-translational chemical modifications, most commonly methylations and acetylations, on the N-terminal tails of the eight histone proteins constituting the nucleosome core. These modifications affect the interaction between the core and the DNA wrapped around it, as well as interactions with chromatin-binding proteins, resulting in configurations of open and closed chromatin . In addition, variants of the histone proteins and modifications to the DNA itself also impact the chromatin configuration. The general distribution of chromatin and other DNA-binding proteins can be analyzed by DNase Hypersensitivity (DNase HS) , which can identify regions of open chromatin. Overall features like chromatin modifications, histone variants and open chromatin are referred to as chromatin marks. Studies made by ENCODE and others have shown that different combinations of chromatin marks can separate the chromatin landscape of the genome into states of open and closed chromatin, where closed chromatin generally corresponds to repression of transcription, and open chromatin corresponds to active transcription. Active chromatin can further be separated into additional states, depending on the enrichment of various active chromatin modifications [10–13]. Two examples of such states are found in promoters and enhancers, which affect transcription from proximal and distal genomic locations, respectively. Other studies have shown that the actual shapes of enrichment for individual chromatin marks also differ between genomic locations. However, the functional implications of these differences have been less investigated [14–17].
For studies of chromatin profile shapes, a crucial step is the definition of anchor points throughout the genome, which are used as reference positions for the study of shapes in the neighborhood of the anchor points. TSSs of annotated genes are examples of such anchor points. However, these generally represent a too limited selection of genomic sites for general analysis, considering that a large amount of regulation takes place distal from any annotated gene TSS. Other strategies for anchor point definitions have thus included binding sites for clusters of transcription factors  or for specific transcription factors, like the enhancer associated protein p300 [14, 18, 19]. One challenge with this approach is the lack of directionality of such data. Directionality is important, because individual chromatin shapes have been shown to display asymmetry around anchor points, especially if the anchor points are transcript-producing . Another challenge is the functional heterogeneity of various transcription factors, which can make the anchor points difficult to compare. In contrast to transcription factors, RTSSs as defined in FANTOM5 are both directional and represent a set of genomic sites associated with the same function, that is, activation of transcription. In addition, due to aggregation of data across multiple cell types, a lot of RTSS regions will have zero expression in any individual cell type, since the general FANTOM5 RTSS regions are defined over a comprehensive set of human cells and tissues. This information represents a novel opportunity to investigate chromatin marks genome-wide at locations where transcription is known to be repressed, which could previously be investigated only for TSS positions of annotated genes. TSSs from CAGE were previously used to analyze states for a single chromatin mark (H3K9ac) in few cell lines during FANTOM4 .
It has now become well established that the regulatory landscape of the human genome includes much more than the genomic regions surrounding the approximately 22 000 currently well annotated genes. The 184 827 globally defined transcripts from FANTOM5, as well as the mapping of chromatin states and transcription factors in ENCODE are both attempts to map the characteristics and diversity of these transcriptional events, and the mechanism that regulate them. In contrast to most previously known genes, the function of these novel transcripts is mostly unknown. However, several have been shown to correlate with transcriptional outputs of nearby genes [4, 19, 21–30]. Whether this correlation is due to direct spatial interaction between regulatory elements, co-transcription from the same promoter, assisted recruitment of factors promoting transcription, or establishment of favorable chromatin domains remains to be determined , but should nevertheless encourage the association of such non-coding transcripts to nearby genes.
The four cell lines K562, GM12878, HeLa-S3 and HepG2 common to ENCODE and FANTOM5 all include the complete set of 12 chromatin marks mapped by ChIP-Seq in ENCODE. In addition, the 184 827 RTSSs from FANTOM5 defined globally over 975 human tissues, cell lines and primary cells represent an opportunity to investigate chromatin marks at RTSSs repressed in the respective cell-lines, as well as the expressed ones. In this study we have used globally defined RTSSs from FANTOM5 as anchor points, and investigated combinations of enrichment and shape profiles for chromatin marks around these anchor points. Most RTSSs are intra- or inter-genic, rather than being located at or close to currently annotated TSSs. These RTSSs are mostly repressed in the four cell lines studied, however, we also discovered a substantial number of such repressed inter- and intra-genic RTSSs harboring activating chromatin marks and Pol II, indicative of regulatory elements poised for transcription. Using a tool for ontology analysis in genomic regions, we found that these RTSSs were strongly associated with immediate-early responses and cell signaling. Shape profiles for chromatin marks around expressed RTSSs were subjected to a two-level clustering procedure, identifying metaclusters with combinatorial characteristics of enrichment and shape. These metaclusters differed substantially in functional ontology annotations, average RTSS expression, location of RTSSs with respect to nearby genes, and CpG content, indicating that the clusters are biologically relevant. Finally we validated the associations between RTSSs within metaclusters, showing that the global expression levels of corresponding RTSSs are correlated.
This work is part of the FANTOM5 project. Data downloads, genomic tools and co-published manuscripts are summarized at http://fantom.gsc.riken.jp/5/.
Globally defined RTSSs are mostly located in intra- and intergenic regions, and repressed in individual cell lines
Cell line specific expressed and repressed RTSSs and their association with genomic regions
Markers for active chromatin show enrichment at both expressed and repressed RTSSs
Active chromatin marks at expressed and repressed RTSSs show distinct chromatin profiles, and differ in nucleosome positioning at their TSSs
Because expressed and repressed RTSSs displayed such distinct profiles characteristics of active chromatin marks, we chose to analyze these two classes of RTSSs separately throughout the rest of our study. We start with the analyses of expressed RTSSs, and continue with the repressed RTSSs afterwards.
Clustering of chromatin profiles around expressed RTSSs identifies combinatorial subsets of various asymmetric chromatin shapes
Metaclusters of RTSSs differ in functional associations, average expression level, localization with respect to nearby genes and enrichment of CpG-islands
The efforts made by FANTOM5, ENCODE and others have confirmed the huge landscape of transcriptional events existing in addition to the well-known catalogue of protein coding genes. Most of this landscape consists of non-coding transcripts, whose exact functions have yet to be determined. One commonly described property of these non-coding transcripts is their tendency to affect the regulation of nearby genes. To determine the biological relevance of the defined metaclusters, we linked the metaclusters to functional annotations using the publicly available Genomic Regions Enrichment of Annotations Tool (GREAT) . GREAT is a tool that assigns functionality to a set of genomic regions based on nearby genes, and is thus well suited for analyses of RTSS metaclusters with an abundance of intra- and inter-genic elements. In addition, we investigated whether the metaclusters differed with respect to number of associated RTSSs, average RTSS expression level, localization with respect to nearby genes and CpG content.
To determine functional associations of the different metaclusters, we analyzed functional terms extracted from GREAT for each metacluster. Due to possible confounding of RTSSs located close to each other in the genome, we used two strategies referred to as permissive and conservative for analysis in GREAT (Methods). For both strategies we used the total set of RTSSs in all metaclusters as background data. By doing this, we identify terms significantly over-represented in one metacluster compared to other metaclusters, rather than compared to a general genomic background. A total of 5229 and 3671 genes significantly related to 2114 and 1293 terms were retrieved from GREAT for all metaclusters by this approach, for the permissive and conservative strategy respectively, while no significant genes or terms were retrieved for random selections of RTSSs. Using the permissive strategy, all metaclusters were associated with many, often related, significant terms, while this was only true for a subset of metaclusters in the conservative strategy. Terms associated with individual metaclusters were considerably more different between metaclusters in the same cell line than between metaclusters in different cell lines [Additional file 1: Figure S8]. Metaclusters with similar functional terms between the cell lines also shared individual RTSSs, as well as chromatin configurations in these cell lines, showing that chromatin configurations are reproducible. Both the enrichment of individual chromatin marks and the profile shape of these marks were important for separating metaclusters. For example metacluster c4 in K562 is separated from metacluster c2 by enrichment differences for H3K9ac, H3K36me3 and H3K4me3/2, while the separation from metacluster c8 is mainly due to differences in the profile shapes of H3K9ac, H3K27ac and H3K4me2/3.
The annotation terms identified by GREAT imply a functional association between RTSSs and their nearby annotated genes. To validate that sensible RTSS-to-gene interactions are represented in these associations, we used the global expression profiles over all 975 FANTOM5 samples and calculated intra-correlations between RTSSs in windows of increasing size, anchored on annotated TSSs of genes associated with RTSSs through GREAT (Figure 6b, Methods). For each window and each cluster, we compared intra-correlation in expression profiles between RTSSs present in each cluster to the correlation observed when all globally defined RTSSs within the window were considered. We generally observed a higher correlation between RTSSs within clusters than within all globally defined RTSSs, especially in windows representing the distal RTSSs. The high correlations observed in all calculations indicate that co-expression of nearby genes within clusters is substantial. Overall the results supports that many of the RTSS-to-gene associations identified by GREAT are sensible, and validate that strategies such as those applied by GREAT to attach possible functions to non-coding transcripts that currently lack functional annotations are feasible.
In general, all observations described above show that the identified metaclusters differ in several properties and associated functions, that properties, functions and chromatin states are related, and that these relations are reproducible across cell lines. All these results when taken together show that the subclusters that were identified by the different chromatin configurations through the metaclustering approach are biologically relevant.
Repressed RTSSs enriched for active marks are linked to immune response by gene ontology terms, and contain additional enrichment of polymerase II
Genes responding rapidly to environmental stimuli, for example in immune response, have been shown in several studies to display only low levels of transcription, but with marks characteristic of an active chromatin state, also when the cell is unstimulated [38–41]. These genes were defined as being in a poised state, where the active chromatin poises the gene for rapid activation in response to external stimuli. Similar poised states were also recently shown to exist for enhancers . An additional common feature described in these studies was the additional enrichment of polymerase II (Pol II) in the poised genes and enhancers, where the initiating form of Pol II, but not the elongating form, was generally observed in the poised regions. The existence of initiating Pol II was shown to transform to the elongating form rapidly in response to stimuli . As the region changed from poised to active, more Pol II was also shown to be recruited to the region. To investigate whether our selected RTSSs displayed characteristics of such poised regions, we downloaded data on Pol II in all four cell lines from ENCODE (Methods), and investigated the Pol II enrichment in our poised RTSSs. Indeed, we found that our poised RTSSs showed enrichment of Pol II in all four cell lines (Figure 8b; [Additional file 1: Figure S12]). The enrichment was less than for expressed RTSSs, but considerably higher than the general average over all repressed RTSSs.
We also tried to investigate whether our selected regions would respond to external stimuli. For this analysis, we could only find one relevant dataset from ENCODE. The data was for Pol II enrichment in the K562 cell line after stimulation with interferon alpha (IFNα) and gamma (IFNγ). We observed similar levels of Pol II for our selected RTSSs relative to Pol II levels for expressed RTSSs before and after stimulation of both IFNα and IFNγ (details in S13, [Additional file 1: Figure S14 and S15]). This was in contrast to the previous studies which reported a general increase in Pol II levels after stimulation .
Several studies have now shown that the transcriptional landscape of human cells is far more diverse than previously anticipated [5, 45, 46]. In addition to the well-known protein coding transcripts, an abundance of intra- and intergenic non-coding transcripts are also produced, whose functions have yet to be determined. Based on results from previously published studies, we assume that one role of these non-coding transcripts is to affect the expression of genes in their neighborhood, and have used GREAT to assign functional relationships to these non-coding transcripts through their association with nearby genes. Moreover we have assumed that transcripts sharing similar patterns of chromatin enrichment and profile shape are associated with similar functions, and have thus grouped the transcripts into distinct clusters based on chromatin features. This strategy has revealed strong non-overlapping functional associations for the different clusters, many of which are reproducible across the four studied cell lines. Some of the identified clusters also display chromatin configurations which, to our knowledge, are yet uncharacterized. The two most prominent of these are clusters with a non-canonical H3K79me2 profile associated with cell division, nucleosome assembly and histone proteins found in all four cell lines, and clusters with sole H3K36me3 enrichment related to G-protein coupled receptor signaling found in three of the cell lines.
Intra- and intergenic RTSSs have correlated expression with nearby genes
In this study we have assumed that an important function of regulatory elements, including RTSSs producing non-coding transcripts, is to affect the transcription of nearby genes. The RTSSs is a subset of the general collection of regulatory elements available to a cell, which also includes enhancers and insulators, many of which do not produce their own transcripts. There are individual examples of situations where regulatory elements affect flanking genes [47–50], and where the regulatory element and the affected gene are separated by several unaffected genes [51–53]. However, it less known how common these modes of regulation are on a global scale. Some studies have found the association of regulatory elements to flanking genes to be substantial [11, 54, 55], while other studies using 5C technology  to identify spatial genomic interactions have concluded that the association between a distal regulatory element and its closest gene is less common . Studies of spatial genomic interactions have also revealed that a single regulatory element may affect several genes, and a gene may be affected by several regulatory elements, complicating the picture further . However, while spatial interaction is necessary for some regulatory elements, like enhancers, to execute their function, transcript-producing regulatory elements do not necessarily need to interact directly with the genes to affect their transcription. GREAT allows associations with both upstream and downstream genes at the same time, but not beyond the closest gene or a genomic distance limit. Considering GREAT’s dependence on closest gene associations, the strong functional relations observed in this study may seem somewhat surprising. We acknowledge that several of the individual RTSS-to-gene associations identified by GREAT may be false positives. However, we see several reasons why the functional analysis still might work. i) The analysis in GREAT is based on statistical overrepresentation, which makes it robust against low levels of misclassification. ii) Several of the clusters are enriched for RTSSs that are proximal to, and often coincide with, their nearest gene. These RTSS-to-gene associations are thus very likely to be true. iii) Genes with similar function may have a tendency to be located in the same genomic region [59, 60]. So, even if a specific RTSS-to-gene association is wrong, the RTSS may still be affiliated with a gene with similar function, resulting in a correct functional association. iv) Validation of RTSS-to-gene associations using global expression profiles shows that the correlations for RTSSs within the same clusters are higher than for closely located RTSSs in general, and that this improved correlation is most visible for distal RTSSs. This indicates that the RTSS-to-gene associations used by GREAT are more likely to be correct than a random association between two RTSSs within the same genomic region. We have not validated individual RTSS-to-gene associations in this study. However, several associations are interesting candidates for further investigation. Overall we have shown that clustering RTSSs based on chromatin configuration, and using GREAT for ontology annotation of each cluster, has produced functional annotations for these clusters that seem to be reasonable and are reproducible across cell lines.
Inter- and intragenic RTSSs enriched for active chromatin marks and Pol II are poised for activation
The investigation of average chromatin profiles around repressed and expressed RTSSs has revealed a subtle relationship between open and closed chromatin, and between transcript repression and expression. Especially our analyses of a selected subset of poised RTSSs that are substantially enriched for active chromatin marks, but with zero expression levels, shows that chromatin state is not always directly correlated with active transcription. In addition we also observe slight enrichment of repressive marks, especially H3K9me3, at expressed RTSSs. Possible reasons for H3K9me3 enrichment in gene bodies have been described previously , but their potential effect on TSSs has to our knowledge not been studied. Our poised RTSSs were selected by quite conservative criteria, but should still make up a representative subset for this category of RTSSs. As evidence for this, an alternative selection procedure resulted in sets of poised RTSSs which were highly overlapping with the sets used for these analyses. Poised regulatory elements reside generally in regions of open chromatin, and are used by the cell to respond rapidly to environmental cues. Because they reside in open chromatin, their function can be initiated with at most a limited degree of chromatin remodeling, and often also without any de novo production of transcription factors , and this ensures rapid activation. We observed characteristics for our RTSSs which indicate that they represent such poised regulatory elements. First we observed substantial enrichment of Pol II at the RTSSs, which is a typical hallmark for poised regulatory elements. This enrichment was observed independently of the selection procedure. Previous reports have discussed the role of stalled Pol II at poised regulatory elements, and it has been suggested that one role of these elements is to load Pol II onto the gene promoter through the activation of long-range spatial interactions [62, 63]. In our analysis, since the RTSSs actually represent transcription events, we find it more likely that Pol II initiates transcription at the respective RTSSs, though we cannot exclude that at least some RTSSs also function through other mechanisms. Second, the selected RTSSs are located in regions that are also occupied by genes associated with immune responses, cell signaling and general immediate cell responses, all of which are activated rapidly in response to environmental cues. Several of the poised RTSSs are located proximal to, or coincide with, the actual genes, creating quite robust RTSS-to-gene associations. For the distal RTSSs we again observed a higher global correlation of expression between the poised RTSSs in the genomic region than for general RTSSs in the same region, confirming that many of the RTSS-to-gene associations are also likely to be relevant. Third, common immediate response genes like FOS and JUN were associated with poised RTSSs in all cell lines, and we observed a higher CpG content in genes associated with all cell types than genes associated with three or less cell types, in accordance with other data on subsets of immediate-early response genes .
The predefined number of clusters reveals functional features despite lack of fine-structure in data
In our study we set the predefined number of clusters for the k-means clustering to 5 for clustering of individual chromatin marks, and 10 for the combination of marks. Other studies have identified higher number of profiles, both for each chromatin mark and for combinations of marks [10, 17], where the final number of states has been determined through various optimizations of a clustering procedure. Visual inspection of score plots from PCA revealed no obvious separation of groups of profiles for any chromatin marks, leaving no suggestion for an initial estimation of the number of clusters. However, the variation in RTSS profile shapes should still warrant that a separation into groups is meaningful. Thus the number of clusters was chosen to be suitable for keeping the number of states equal for all chromatin marks and cell lines, and at a level convenient for interpretation. The numbers chosen turned out to be sufficient for producing relevant functional associations, and thus this works as a proof of principle. We anticipate that more sophisticated ways of selecting clusters will probably produce stronger and more detailed functional associations than the ones observed in this study.
The regulatory landscape governing transcription in different cell types is highly complex. However, it is also predictable, in that the same cell type responds similarly every time it is subjected to the same environmental cue, and coordinated, in that several transcriptional elements respond in the same manner to stimulation. Genome wide mapping of various features, whether it is expression level measurements, TSS activity level, chromatin configuration, DNase HS or transcription factor binding and activity, all leave traces of this coordinated action. Thus, it is an important challenge to integrate such data and determine at what level it is meaningful to look for general patterns that are robust and predictable on a global scale, to investigate what these patterns mean in terms of function and phenotype, and what the main components that govern these patterns are. In this study we have used combinations of chromatin marks around a global set of experimentally defined TSSs, and identified subsets of TSSs with similar chromatin configuration, several of which have functional associations. Hopefully this and related strategies, together with integration of even more genome wide features, will continue to reveal patterns of ubiquitous and cell type specific gene regulation, expression and function.
We have integrated chromatin data from the ENCODE consortium with the robust set of globally defined TSSs from FANTOM5 to investigate how chromatin features can be used to distinguish TSSs with different properties in four cell lines analyzed by both consortia. We find that most TSSs are repressed in the cell lines studied here, however, a substantial number of the repressed TSSs are enriched with active chromatin marks. These TSSs are strongly associated with immediate-early response processes and cell signaling. Expressed TSSs can be clustered into subsets based on combinations of both enrichment and profile shape of individual chromatin marks. We identified three main groups of clusters which differ in average TSS expression, CpG island enrichment, TSS location with respect to nearby genes and functional GO terms. Interestingly, groups with clusters enriched for TSSs distal to nearby genes show the strongest functional associations. Finally we show that nearby TSSs with similar chromatin configuration show better correlation in global expression profiles than nearby TSSs in general, thus validating the link between chromatin states and cellular function.
ChIP-Seq mapped tag libraries, and enrichment regions for the 10 histone modifications H3K4me1, H3K4me2, H3K4me3, H3K27me3, H3K36me3, H3K9me3, H3K27ac, H3K9ac, H3K79me2, H4K20me1, histone variant H2A.Z (Broad Histone, Broad Institute), DNase hypersensitivity (DNase HS, Duke DNaseI HS, Duke University) and Pol II (SYDH TFBS, Stanford/Yale/USC/Harvard) were downloaded from ENCODE for the four cell lines K562, GM12878, HeLa-S3 and HepG2 . In addition, 4 ChIP-Seq datasets of Pol II after stimulation with IFNα and IFNγ measured after 6 and 30 hours were also downloaded from ENCODE (SYDH TFBS, Stanford/Yale/USC/Harvard). Nucleosome position sequencing data for the cell lines K562 and GM12878 were downloaded as bigWig files from ENCODE (Stanf Nucleosome, Stanford/BYU). For RTSS from FANTOM5, we started with a preliminary global CAGE RTSS dataset of 180 338 robust RTSS (this set was later expanded to 184 827 for the FANTOM5 main paper , the additional RTSS were not included in this study), in addition to cell-type specific expression of these clusters in the four selected cell lines. Three RTSS expression replicates were pooled into a single expression profile in each cell line. RTSS overlapping with unmappable regions from ENCODE  (mapability, exludable regions from Duke University and Stanford) were removed prior to analysis, reducing the number of global RTSS to 179 369. In addition, we encountered unexpected profiles for histone modification H3K27me3 in HepG2. These profiles were not confirmed by a second H3K27me3 dataset (UW Histone, University of Washington)) from ENCODE. We thus decided to discard this dataset from the analysis. An overview of all datasets used in the analysis are listed in [Additional file 1: Table S17].
Overlap of chromatin marks with expressed/repressed RTSSs
To define expressed RTSS, we used a mapped tag threshold of 5 for the cell lines GM12878, HeLaS3 and HepG2 and 3 for K562. The reason for the lower threshold in K562 was that the CAGE tag library for K562 contained a lower total number of tags (10.7 m) than the other three cell lines (30.2 m, 26.5 m and 33.1 m respectively). Only RTSSs regions with zero tag count were defined as repressed. RefSeq genes (UCSC Genome Browser 18.10.2011) [65, 66] were used for gene annotations. RTSSs in the category ±150 bp proximal to annotated RefSeq TSS were required to have the same strand directionality as the annotated TSS. Overlaps between RTSSs and chromatin marks were calculated for each mark individually using downloaded enrichment peak-profiles from ENCODE (filename extension .broadPeak for histone modifications and histone variant H2A.Z and .narrowPeak for DNase HS). An overlap between an enrichment peak and a RTSS was identified if the enrichment profile overlapped the RTSS plus a 500 bp extension from each end of the RTSS region. The extension was used because some chromatin marks associate with RTSSs up- or downstream, rather than at the exact position of the RTSS. Isolated RTSSs were defined as RTSSs with a genomic distance of at least 2kbp from any other RTSSs. The p-value for each overlap was calculated by the Genomic Hyperbrowser  using a Monte Carlo scheme with 100 permutations [Additional file 1: Table S18]. Details of the calculations can also be found at .
Processing of chromatin marks around RTSSs
Profiles around each of the 179 369 globally defined RTSSs in each cell line for all chromatin marks were calculated from ChIP-Seq mapped sequence read libraries downloaded from ENCODE. Replicates for each chromatin mark were pooled. RTSS center positions were used as genomic anchor points for profile regions spanning 3 kbp in both directions from the anchor point. Because the average sequence read length was estimated to be around 200 bp (ENCODE, Broad Histone, Broad Institute), start positions for the mapped reads were shifted by +100 bp for reads mapped to the positive strand, and -100 bp for reads mapped to the negative strand. Each profile was then calculated by summing all start positions in 100 bp intervals up and downstream of the anchor point, extending 3kbp in each direction. To limit the impact of noise, only RTSSs overlapping with ENCODE-defined significantly enriched regions for each chromatin mark were used to calculate the average profiles. Nucleosome data for K562 and GM12878 were downloaded as bigWig files from ENCODE, and profiles were calculated by summing values in 100 bp intervals ±3kbp around RTSSs as described for the chromatin marks.
Clustering of individual chromatin marks within each cell line
Clustering was performed for each chromatin mark in each cell line individually using k-means clustering with number of clusters set to 5. We chose this number both because it generally produced subprofiles that were clearly distinct in shape, and to avoid too much combinatorial variation for the subsequent meta-clustering (see below). For clustering we used profiles for all expressed RTSSs in each cell line. The profiles were calculated using the RTSS center position as anchor point, and averaging sequence read intensities in 100 bp windows extending 3kbp both up and downstream. RTSSs with profiles containing less than 100 reads for a chromatin mark were filtered out for the clustering of this mark. Applying this filter resulted in between 15 000 and 50 000 profiles clustered for each active mark, and between 1000 and 15 000 profiles for each repressive mark. All profiles selected for clustering were smoothed prior to clustering using Gaussian convolution with window size of 7 bins. Removing edge effects caused by the smoothing reduced the number of measuring points in each profile from 60 to 48. Clustering was performed using the kcluster function in the Python Bio.Cluster package. We used Pearson Correlation as distance measure rather than Euclidian Distance to emphasize profile shape rather than intensity differences, and also to reduce the effect of normalization. We did repeated analyses with number of passes, npass, set to 10, and found that this number made each clustering fairly reproducible as evaluated by visual inspection of the resulting profiles. We thus chose 200 passes in the final clustering of each chromatin mark, which should be sufficient to produce robust cluster profiles. All other parameters were set to default. To investigate the effect of confounding, we also selected profiles from isolated expressed RTSSs and clustered them separately. Clustered profiles using only the isolated clusters were generally comparable to cluster profiles from the full sets of expressed RTSSs. We thus used results from the full set clustering for further analysis.
Meta-clustering of correlation coefficients for multiple chromatin marks within each cell line
The initial clustering produced 5 average subprofiles for each of the 12 chromatin marks in each cell-line. For each expressed RTSS, a Pearson correlation coefficient was calculated between each of the RTSS chromatin profiles and the five subprofiles for this chromatin profile. This resulted in 60 correlation coefficients for each of the expressed RTSSs (55 for HepG2, because H3K27me3 was excluded from this cell line). Correlation coefficients where the total chromatin signal was below the predefined threshold of 100 where set to zero. The matrix of expressed RTSSs and chromatin profiles was then subjected to k-means clustering, with number of clusters set to 10 and number of passes to 1000. Other cluster parameters were the same as previously defined. The clustering was performed independently in each cell line.
Robust subset of repressed RTSSs with active marks
In each cell line profiles for all active marks around each repressed RTSS were pooled to create a general active profile for each RTSS. A RTSS was selected for the robust subset if it i) overlapped with a peak-region of significant enrichment for any chromatin mark, ii) the total signal for the pooled profile was above a threshold set to 1000 reads, iii) the correlation of the RTSS profile to the average pooled repressed profile was above 0.5, and iv) the correlation of the RTSS profile to the average repressed profile was significantly better (p-value ≤ 0.05) than the correlation to the average expressed profile. For the last criteria we implemented a statistical test for comparing dependent correlations . A total number of 6184 RTSSs for K562, 3813 for GM12878, 4345 for HeLa-S3 and 4303 for HepG2 passed these filtering criteria [Additional file 1: Table S9]. To confirm the robustness of the selected subsets, we also applied a second procedure to select repressed RTSSs with active marks. Instead of pooling the samples, we now used the five criteria described above on each chromatin mark individually, using a threshold of 100 (instead of 1000) on each individual mark. Then only repressed RTSSs which passed all criteria in at least three active marks were selected. This resulted in slightly fewer RTSSs for each cell line compared to the other selection procedure. Between 60% and 80% of the RTSSs selected by the second procedure were also selected by the first procedure. This overlap is high, considering that the selected RTSSs only constitute around 5% of the total number of repressed RTSSs enriched for any chromatin mark. We thus conclude that the selected subsets represent a robust selection of repressed RTSSs with active marks in each cell line.
Gene ontology enrichment analysis by GREAT
RTSS regions for each of the 40 metaclusters (10 in each cell-line) were individually submitted to the Genomic Regions Enrichment of Annotations Tool (GREAT)  using default parameters and the full set of expressed RTSSs from each cluster’s corresponding cell line as background. Terms and associated genes for each term were extracted for the most relevant categories, which we determined to be Molecular Function, Biological Process, PANTHER Pathway, Pathway Commons, BioCyc Pathway and MSigDB Pathway. Only terms displayed by GREAT were included in the analysis. In default mode, GREAT only displays the top 20 terms for each category which pass two statistical tests (p-value < = 0.05): A binomial test which accounts for over-representation in genomic regions, and a hypergeometric test which accounts for over-representation in functionally associated gene sets. In addition, the region fold enrichment must be larger than 2 for a term to be reported. For each cluster, localization enrichment with respect to associated genes was retrieved from Region-Gene Association Graphs (Binned by orientation and distance to TSS) displayed by GREAT. An issue with the initial analysis using RTSS locations in each cluster was the possibility of confounding of nearby RTSSs. RTSSs located less than a few hundred bp apart may have a confounded chromatin signature, which may bias the significance of some terms in GREAT. To deal with this issue, we applied two strategies for CAGE analysis. In the permissive strategy we used all RTSSs in each metacluster as input, thus allowing more weight to be put on regions where many RTSSs are located close together, while in the conservative strategy we merged all RTSSs within a 100 bp window surrounding anchor RTSSs. The RTSSs used as anchors for merging were the ones having the highest proximity to other RTSSs. The matrix used for PCA on GREAT terms was constructed by first listing all genes associated with significant terms for all metaclusters. Then, for each gene and each cluster, a value of 1 was assigned if significant terms for this cluster contained the gene, and 0 if the gene was not contained in the significant terms. This procedure resulted in a matrix where each of the 40 clusters is a sample, each gene is a variable, and each elements in the matrix have the value 0 or 1. For the poised RTSSs we collected terms from the same GREAT categories as for the metaclusters, but now we used the general human genomic background provided by GREAT rather than a customized background for the analysis. To evaluate the GREAT terms for the selected repressed RTSSs we compared them to terms generated using random sets of RTSSs with set sizes equal to the selected RTSS sets, and drawn randomly from i) the total set of all repressed RTSSs, and ii) the set of repressed RTSSs with not overlapping significantly enriched chromatin regions from ENCODE. To enumerate the terms related to immune response, we counted terms containing variants of the words immune, interferon, interleukin, cytokine, inflammation, TNF, NFkB and TCF. For terms related to signaling we only counted variants of the word signaling. Associated genes were also extracted from the GREAT reports.
CpG island enrichment analysis
CpG island coverage and enrichment in RTSS promoters (defined as the region 200 bp upstream of a RTSS) in 40 metaclusters, as well as promoters for genes affected by our selection of poised RTSSs, was computed using the Genomic HyperBrowser . A track of genomic locations for CpG islands was downloaded from the UCSC genome browser, and enrichment factors for each metacluster and gene set were computed as the ratio of observed bp overlap with the CpG island track versus the expected bp overlap across all promoters in each metacluster or gene set. For CpG content of genes affected by our poised RTSSs, we constructed a hypothesis test to investigate whether genes affected in all four cell lines (case) where more significantly enriched for CpG islands than genes affected in one, two or three cell lines (control). The p-value was computed using a Monte Carlo scheme where case and control marks were permuted randomly across all promoter regions of the analysis. Further details on the analysis, including the possibility to reproduce results, is given in a Galaxy page at .
Validation of RTSS-to-gene associations
The correlation between two RTSSs was calculated as Pearson correlation between expression levels over all 975 cell types and tissues analyzed in FANTOM5. For each cluster and the set of poised RTSSs in each cell-line, we used annotated TSSs for genes associated with each cluster from GREAT, and calculated intra-correlations between all RTSSs in increasing distances of 0.05, 0.2, 0.5, 1, 5, 10, 50 and 150 kbp upstream and downstream from the annotated gene TSS. Correlations were calculated for all globally defined RTSSs within the region, and RTSSs contained in each cluster only for the same region. Random correlations were calculated as all intra-correlations between 100 randomly selected RTSSs from the global set.
Robust clusters of transcription start site(s)
Transcription start site(s)
Cap analysis of gene expression
Principal component analysis.
- RIKEN Omics Science Center ceased to exist as of April 1st:
2013, due to RIKEN reorganization.
This project has been supported by the Research Council of Norway through the FUGE bioinformatics technology platform (183438/S10) and the Norwegian infrastructure for bioinformatics (208481/F50). This work was supported by a Research Grant from the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT) to the RIKEN Center for Life Science Technologies, a research grant from the Japanese MEXT through RIKEN Preventive Medicine and Diagnosis Innovation Program to Yoshihide Hayashizaki (YH) and a research grant for RIKEN Omics Science Center from MEXT to YH. We would like to thank all members of the FANTOM5 consortium for contributing to generation of samples and analysis of the data-set and thank GeNAS for data production. Jostein Johansen is acknowledged for suggestions on the PCA for functional genes.
- Li B, Carey M, Workman JL: The role of chromatin during transcription. Cell. 2007, 128: 707-719. 10.1016/j.cell.2007.01.015.PubMedView ArticleGoogle Scholar
- Kouzarides T: Chromatin modifications and their function. Cell. 2007, 128: 693-705. 10.1016/j.cell.2007.02.005.PubMedView ArticleGoogle Scholar
- Kanamori-Katayama M, Itoh M, Kawaji H, Lassmann T, Katayama S, Kojima M, Bertin N, Kaiho A, Ninomiya N, Daub CO, et al: Unamplified cap analysis of gene expression on a single-molecule sequencer. Genome Res. 2011, 21: 1150-1159. 10.1101/gr.115469.110.PubMed CentralPubMedView ArticleGoogle Scholar
- Forrest ARR, Kawaji H, Rehli M, Baillie JK, de Hoon MJL, Haberle V, Lassmann T, Kulakovskiy IV, Lizio M, Itoh M, et al: A promoter level mammalian expression atlas. Nature. 2014,http://dx.doi.org/10.1038/nature13182,Google Scholar
- Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, Epstein CB, Frietze S, Harrow J, Kaul R, et al: An integrated encyclopedia of DNA elements in the human genome. Nature. 2012, 489: 57-74. 10.1038/nature11247.View ArticleGoogle Scholar
- Johnson DS, Mortazavi A, Myers RM, Wold B: Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007, 316: 1497-1502. 10.1126/science.1141319.PubMedView ArticleGoogle Scholar
- Park PJ: ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet. 2009, 10: 669-680.PubMed CentralPubMedView ArticleGoogle Scholar
- Bernstein BE, Meissner A, Lander ES: The mammalian epigenome. Cell. 2007, 128: 669-681. 10.1016/j.cell.2007.01.033.PubMedView ArticleGoogle Scholar
- Boyle AP, Davis S, Shulha HP, Meltzer P, Margulies EH, Weng Z, Furey TS, Crawford GE: High-resolution mapping and characterization of open chromatin across the genome. Cell. 2008, 132: 311-322. 10.1016/j.cell.2007.12.014.PubMed CentralPubMedView ArticleGoogle Scholar
- Ernst J, Kellis M: Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat Biotechnol. 2010, 28: 817-825. 10.1038/nbt.1662.PubMed CentralPubMedView ArticleGoogle Scholar
- Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, et al: Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011, 473: 43-49. 10.1038/nature09906.PubMed CentralPubMedView ArticleGoogle Scholar
- Kharchenko PV, Alekseyenko AA, Schwartz YB, Minoda A, Riddle NC, Ernst J, Sabo PJ, Larschan E, Gorchakov AA, Gu T, et al: Comprehensive analysis of the chromatin landscape in drosophila melanogaster. Nature. 2011, 471: 480-485. 10.1038/nature09725.PubMed CentralPubMedView ArticleGoogle Scholar
- Wang Z, Zang C, Rosenfeld JA, Schones DE, Barski A, Cuddapah S, Cui K, Roh TY, Peng W, Zhang MQ, Zhao K: Combinatorial patterns of histone acetylations and methylations in the human genome. Nat Genet. 2008, 40: 897-903. 10.1038/ng.154.PubMed CentralPubMedView ArticleGoogle Scholar
- Heintzman ND, Hon GC, Hawkins RD, Kheradpour P, Stark A, Harp LF, Ye Z, Lee LK, Stuart RK, Ching CW, et al: Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature. 2009, 459: 108-112. 10.1038/nature07829.PubMed CentralPubMedView ArticleGoogle Scholar
- Hon G, Wang W, Ren B: Discovery and annotation of functional chromatin signatures in the human genome. PLoS Comput Biol. 2009, 5: e1000566-10.1371/journal.pcbi.1000566.PubMed CentralPubMedView ArticleGoogle Scholar
- Wang J, Lunyak VV, Jordan IK: Chromatin signature discovery via histone modification profile alignments. Nucleic Acids Res. 2012, 40: 10642-10656. 10.1093/nar/gks848.PubMed CentralPubMedView ArticleGoogle Scholar
- Kundaje A, Kyriazopoulou-Panagiotopoulou S, Libbrecht M, Smith CL, Raha D, Winters EE, Johnson SM, Snyder M, Batzoglou S, Sidow A: Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements. Genome Res. 2012, 22: 1735-1747. 10.1101/gr.136366.111.PubMed CentralPubMedView ArticleGoogle Scholar
- Won KJ, Ren B, Wang W: Genome-wide prediction of transcription factor binding sites using an integrated model. Genome Biol. 2010, 11: R7-10.1186/gb-2010-11-1-r7.PubMed CentralPubMedView ArticleGoogle Scholar
- Zentner GE, Tesar PJ, Scacheri PC: Epigenetic signatures distinguish multiple classes of enhancers with distinct cellular functions. Genome Res. 2011, 21: 1273-1283. 10.1101/gr.122382.111.PubMed CentralPubMedView ArticleGoogle Scholar
- Kalari KR, Rossell D, Necela BM, Asmann YW, Nair A, Baheti S, Kachergus JM, Younkin CS, Baker T, Carr JM, et al: Deep sequence analysis of non-small cell lung cancer: integrated analysis of gene expression, alternative splicing, and single nucleotide variations in lung adenocarcinomas with and without oncogenic KRAS mutations. Front Oncol. 2012, 2: 12-PubMed CentralPubMedView ArticleGoogle Scholar
- Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, Rinn JL: Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 2011, 25: 1915-1927. 10.1101/gad.17446611.PubMed CentralPubMedView ArticleGoogle Scholar
- De Santa F, Barozzi I, Mietton F, Ghisletti S, Polletti S, Tusi BK, Muller H, Ragoussis J, Wei CL, Natoli G: A large fraction of extragenic RNA pol II transcription sites overlap enhancers. PLoS Biol. 2010, 8: e1000384-10.1371/journal.pbio.1000384.PubMed CentralPubMedView ArticleGoogle Scholar
- Flynn RA, Almada AE, Zamudio JR, Sharp PA: Antisense RNA polymerase II divergent transcripts are P-TEFb dependent and substrates for the RNA exosome. Proc Natl Acad Sci U S A. 2011, 108: 10460-10465. 10.1073/pnas.1106630108.PubMed CentralPubMedView ArticleGoogle Scholar
- Guttman M, Donaghey J, Carey BW, Garber M, Grenier JK, Munson G, Young G, Lucas AB, Ach R, Bruhn L, et al: lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature. 2011, 477: 295-300. 10.1038/nature10398.PubMed CentralPubMedView ArticleGoogle Scholar
- Ho Y, Elefant F, Liebhaber SA, Cooke NE: Locus control region transcription plays an active role in long-range gene activation. Mol Cell. 2006, 23: 365-375. 10.1016/j.molcel.2006.05.041.PubMedView ArticleGoogle Scholar
- Kim A, Zhao H, Ifrim I, Dean A: Beta-globin intergenic transcription and histone acetylation dependent on an enhancer. Mol Cell Biol. 2007, 27: 2980-2986. 10.1128/MCB.02337-06.PubMed CentralPubMedView ArticleGoogle Scholar
- Orom UA, Derrien T, Beringer M, Gumireddy K, Gardini A, Bussotti G, Lai F, Zytnicki M, Notredame C, Huang Q, et al: Long noncoding RNAs with enhancer-like function in human cells. Cell. 2010, 143: 46-58. 10.1016/j.cell.2010.09.001.PubMed CentralPubMedView ArticleGoogle Scholar
- Ponjavic J, Oliver PL, Lunter G, Ponting CP: Genomic and transcriptional co-localization of protein-coding and long non-coding RNA pairs in the developing brain. PLoS Genet. 2009, 5: e1000617-10.1371/journal.pgen.1000617.PubMed CentralPubMedView ArticleGoogle Scholar
- Wang KC, Yang YW, Liu B, Sanyal A, Corces-Zimmerman R, Chen Y, Lajoie BR, Protacio A, Flynn RA, Gupta RA, et al: A long noncoding RNA maintains active chromatin to coordinate homeotic gene expression. Nature. 2011, 472: 120-124. 10.1038/nature09819.PubMed CentralPubMedView ArticleGoogle Scholar
- Kim TK, Hemberg M, Gray JM, Costa AM, Bear DM, Wu J, Harmin DA, Laptewicz M, Barbara-Haley K, Kuersten S, et al: Widespread transcription at neuronal activity-regulated enhancers. Nature. 2010, 465: 182-187. 10.1038/nature09033.PubMed CentralPubMedView ArticleGoogle Scholar
- Bulger M, Groudine M: Functional and mechanistic diversity of distal transcription enhancers. Cell. 2011, 144: 327-339. 10.1016/j.cell.2011.01.024.PubMed CentralPubMedView ArticleGoogle Scholar
- Bertilsson H, Tessem MB, Flatberg A, Viset T, Gribbestad I, Angelsen A, Halgunset J: Changes in gene transcription underlying the aberrant citrate and choline metabolism in human prostate cancer samples. Clin Cancer Res. 2012, 18: 3261-3269. 10.1158/1078-0432.CCR-11-2929.PubMedView ArticleGoogle Scholar
- Creyghton MP, Cheng AW, Welstead GG, Kooistra T, Carey BW, Steine EJ, Hanna J, Lodato MA, Frampton GM, Sharp PA, et al: Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc Natl Acad Sci U S A. 2010, 107: 21931-21936. 10.1073/pnas.1016071107.PubMed CentralPubMedView ArticleGoogle Scholar
- Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K: High-resolution profiling of histone methylations in the human genome. Cell. 2007, 129: 823-837. 10.1016/j.cell.2007.05.009.PubMedView ArticleGoogle Scholar
- Ozsolak F, Song JS, Liu XS, Fisher DE: High-throughput mapping of the chromatin structure of human promoters. Nat Biotechnol. 2007, 25: 244-248. 10.1038/nbt1279.PubMedView ArticleGoogle Scholar
- Schones DE, Cui K, Cuddapah S, Roh TY, Barski A, Wang Z, Wei G, Zhao K: Dynamic regulation of nucleosome positioning in the human genome. Cell. 2008, 132: 887-898. 10.1016/j.cell.2008.02.022.PubMedView ArticleGoogle Scholar
- McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM, Bejerano G: GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol. 2010, 28: 495-501. 10.1038/nbt.1630.PubMedView ArticleGoogle Scholar
- Barski A, Jothi R, Cuddapah S, Cui K, Roh TY, Schones DE, Zhao K: Chromatin poises miRNA- and protein-coding genes for expression. Genome Res. 2009, 19: 1742-1751. 10.1101/gr.090951.109.PubMed CentralPubMedView ArticleGoogle Scholar
- John S, Sabo PJ, Thurman RE, Sung MH, Biddie SC, Johnson TA, Hager GL, Stamatoyannopoulos JA: Chromatin accessibility pre-determines glucocorticoid receptor binding patterns. Nat Genet. 2011, 43: 264-268. 10.1038/ng.759.PubMedView ArticleGoogle Scholar
- Lim PS, Hardy K, Bunting KL, Ma L, Peng K, Chen X, Shannon MF: Defining the chromatin signature of inducible genes in T cells. Genome Biol. 2009, 10: R107-10.1186/gb-2009-10-10-r107.PubMed CentralPubMedView ArticleGoogle Scholar
- Lim PS, Shannon MF, Hardy K: Epigenetic control of inducible gene expression in the immune system. Epigenomics. 2010, 2: 775-795. 10.2217/epi.10.55.PubMedView ArticleGoogle Scholar
- Ghisletti S, Barozzi I, Mietton F, Polletti S, De Santa F, Venturini E, Gregory L, Lonie L, Chew A, Wei CL, et al: Identification and characterization of enhancers controlling the inflammatory gene expression program in macrophages. Immunity. 2010, 32: 317-328. 10.1016/j.immuni.2010.02.008.PubMedView ArticleGoogle Scholar
- Hargreaves DC, Horng T, Medzhitov R: Control of inducible gene expression by signal-dependent transcriptional elongation. Cell. 2009, 138: 129-145. 10.1016/j.cell.2009.05.047.PubMed CentralPubMedView ArticleGoogle Scholar
- Ramirez-Carrozzi VR, Braas D, Bhatt DM, Cheng CS, Hong C, Doty KR, Black JC, Hoffmann A, Carey M, Smale ST: A unifying model for the selective regulation of inducible transcription by CpG islands and nucleosome remodeling. Cell. 2009, 138: 114-128. 10.1016/j.cell.2009.04.020.PubMed CentralPubMedView ArticleGoogle Scholar
- Carninci P, Sandelin A, Lenhard B, Katayama S, Shimokawa K, Ponjavic J, Semple CA, Taylor MS, Engstrom PG, Frith MC, et al: Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet. 2006, 38: 626-635. 10.1038/ng1789.PubMedView ArticleGoogle Scholar
- Volders PJ, Helsens K, Wang X, Menten B, Martens L, Gevaert K, Vandesompele J, Mestdagh P: LNCipedia: a database for annotated human lncRNA transcript sequences and structures. Nucleic Acids Res. 2013, 41: D246-251. 10.1093/nar/gks915.PubMed CentralPubMedView ArticleGoogle Scholar
- Emison ES, McCallion AS, Kashuk CS, Bush RT, Grice E, Lin S, Portnoy ME, Cutler DJ, Green ED, Chakravarti A: A common sex-dependent mutation in a RET enhancer underlies hirschsprung disease risk. Nature. 2005, 434: 857-863. 10.1038/nature03467.PubMedView ArticleGoogle Scholar
- Visser M, Kayser M, Palstra RJ: HERC2 rs12913832 modulates human pigmentation by attenuating chromatin-loop formation between a long-range enhancer and the OCA2 promoter. Genome Res. 2012, 22: 446-455. 10.1101/gr.128652.111.PubMed CentralPubMedView ArticleGoogle Scholar
- Wasserman NF, Aneas I, Nobrega MA: An 8q24 gene desert variant associated with prostate cancer risk confers differential in vivo activity to a MYC enhancer. Genome Res. 2010, 20: 1191-1197. 10.1101/gr.105361.110.PubMed CentralPubMedView ArticleGoogle Scholar
- Zhang X, Cowper-Sal lari R, Bailey SD, Moore JH, Lupien M: Integrative functional genomics identifies an enhancer looping to the SOX9 gene disrupted by the 17q24.3 prostate cancer risk locus. Genome Res. 2012, 22: 1437-1446. 10.1101/gr.135665.111.PubMed CentralPubMedView ArticleGoogle Scholar
- Dean A: On a chromosome far, far away: LCRs and gene expression. Trends Genet. 2006, 22: 38-45. 10.1016/j.tig.2005.11.001.PubMedView ArticleGoogle Scholar
- Kikuta H, Laplante M, Navratilova P, Komisarczuk AZ, Engstrom PG, Fredman D, Akalin A, Caccamo M, Sealy I, Howe K, et al: Genomic regulatory blocks encompass multiple neighboring genes and maintain conserved synteny in vertebrates. Genome Res. 2007, 17: 545-555. 10.1101/gr.6086307.PubMed CentralPubMedView ArticleGoogle Scholar
- Lettice LA, Heaney SJ, Purdie LA, Li L, de Beer P, Oostra BA, Goode D, Elgar G, Hill RE, de Graaff E: A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum Mol Genet. 2003, 12: 1725-1735. 10.1093/hmg/ddg180.PubMedView ArticleGoogle Scholar
- Teng L, Firpi HA, Tan K: Enhancers in embryonic stem cells are enriched for transposable elements and genetic variations associated with cancers. Nucleic Acids Res. 2011, 39: 7371-7379. 10.1093/nar/gkr476.PubMed CentralPubMedView ArticleGoogle Scholar
- Visel A, Blow MJ, Li ZR, Zhang T, Akiyama JA, Holt A, Plajzer-Frick I, Shoukry M, Wright C, Chen F, et al: ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature. 2009, 457: 854-858. 10.1038/nature07730.PubMed CentralPubMedView ArticleGoogle Scholar
- Dostie J, Richmond TA, Arnaout RA, Selzer RR, Lee WL, Honan TA, Rubio ED, Krumm A, Lamb J, Nusbaum C, et al: Chromosome conformation capture carbon copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res. 2006, 16: 1299-1309. 10.1101/gr.5571506.PubMed CentralPubMedView ArticleGoogle Scholar
- Sanyal A, Lajoie BR, Jain G, Dekker J: The long-range interaction landscape of gene promoters. Nature. 2012, 489: 109-113. 10.1038/nature11279.PubMed CentralPubMedView ArticleGoogle Scholar
- Li G, Ruan X, Auerbach RK, Sandhu KS, Zheng M, Wang P, Poh HM, Goh Y, Lim J, Zhang J, et al: Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell. 2012, 148: 84-98. 10.1016/j.cell.2011.12.014.PubMed CentralPubMedView ArticleGoogle Scholar
- Cohen BA, Mitra RD, Hughes JD, Church GM: A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat Genet. 2000, 26: 183-186. 10.1038/79896.PubMedView ArticleGoogle Scholar
- Hurst LD, Pal C, Lercher MJ: The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet. 2004, 5: 299-310.PubMedView ArticleGoogle Scholar
- Blahnik KR, Dou L, Echipare L, Iyengar S, O’Geen H, Sanchez E, Zhao Y, Marra MA, Hirst M, Costello JF, et al: Characterization of the contradictory chromatin signatures at the 3’ exons of zinc finger genes. PLoS One. 2011, 6: e17121-10.1371/journal.pone.0017121.PubMed CentralPubMedView ArticleGoogle Scholar
- Leach KM, Nightingale K, Igarashi K, Levings PP, Engel JD, Becker PB, Bungert J: Reconstitution of human beta-globin locus control region hypersensitive sites in the absence of chromatin assembly. Mol Cell Biol. 2001, 21: 2629-2640. 10.1128/MCB.21.8.2629-2640.2001.PubMed CentralPubMedView ArticleGoogle Scholar
- Zhu X, Ling J, Zhang L, Pi W, Wu M, Tuan D: A facilitated tracking and transcription mechanism of long-range enhancer function. Nucleic Acids Res. 2007, 35: 5532-5544. 10.1093/nar/gkm595.PubMed CentralPubMedView ArticleGoogle Scholar
- Giskeodegard GF, Bertilsson H, Selnaes KM, Wright AJ, Bathen TF, Viset T, Halgunset J, Angelsen A, Gribbestad IS, Tessem MB: Spermine and citrate as metabolic biomarkers for assessing prostate cancer aggressiveness. PLoS One. 2013, 8: e62375-10.1371/journal.pone.0062375.PubMed CentralPubMedView ArticleGoogle Scholar
- Pal S, Gupta R, Davuluri RV: Alternative transcription and alternative splicing in cancer. Pharmacol Ther. 2012, 136: 283-294. 10.1016/j.pharmthera.2012.08.005.PubMedView ArticleGoogle Scholar
- Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009, 10: 57-63. 10.1038/nrg2484.PubMed CentralPubMedView ArticleGoogle Scholar
- Sandve GK, Gundersen S, Rydbeck H, Glad IK, Holden L, Holden M, Liestol K, Clancy T, Ferkingstad E, Johansen M, et al: The genomic HyperBrowser: inferential genomics at the sequence level. Genome Biol. 2010, 11: R121-10.1186/gb-2010-11-12-r121.PubMed CentralPubMedView ArticleGoogle Scholar
- Calculated p-values for repressed RTSS overlapping with active chromatin marks.http://hyperbrowser.uio.no/hb/u/sandve/h/rtss-vs-reference-biofeature,
- Steiger JH: Tests for comparing elements of a correlation matrix. Psychol Bull. 1980, 87: 245-251.View ArticleGoogle Scholar
- Calculated p-value for CpG-island enrichment in poised RTSS shared by all cell-lines.http://hyperbrowser.uio.no/test/u/sandve/p/CTSS-cpg-analysis,
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.