Characteristics of functional enrichment and gene expression level of human putative transcriptional target genes

Osato, Naoki

doi:10.1186/s12864-017-4339-5

Research
Open access
Published: 19 January 2018


BMC Genomics volume 19, Article number: 957 (2018)


Abstract

Background

Transcriptional target genes show functional enrichment. However, how many functional enrichments transcriptional target genes include, and how significant these enrichments are, is still unclear. To address these issues, I predicted human transcriptional target genes using open chromatin regions, ChIP-seq data and DNA binding sequences of transcription factors in databases, and examined the functional enrichment and gene expression level of the putative transcriptional target genes.

Results

Gene Ontology annotations showed four times larger numbers of functional enrichments in putative transcriptional target genes than gene expression information alone, independent of transcriptional target genes. To compare the number of functional enrichments of putative transcriptional target genes between cells or search conditions, I normalized the number of functional enrichments by calculating its ratio to the total number of transcriptional target genes. With this analysis, native putative transcriptional target genes showed the largest normalized number of functional enrichments, compared with target genes including 5–60% of randomly selected genes. The normalized number of functional enrichments changed according to the criteria of enhancer-promoter interactions, such as distance from transcriptional start sites and orientation of CTCF-binding sites. Forward-reverse orientation of CTCF-binding sites showed a significantly higher normalized number of functional enrichments than the other orientations. Published papers indicated that the five most frequent functional enrichments were related to cellular functions in the three cell types. The median expression level of transcriptional target genes changed according to the criteria of enhancer-promoter assignments (i.e. interactions) and was correlated with the changes in the normalized number of functional enrichments of transcriptional target genes.

Conclusions

Human putative transcriptional target genes showed significant functional enrichments. Functional enrichments were related to cellular functions. The normalized number of functional enrichments of human putative transcriptional target genes changed according to the criteria of enhancer-promoter assignments and correlated with the median expression level of the target genes. These analyses and characteristics of human putative transcriptional target genes would be useful for examining the criteria of enhancer-promoter assignments and for predicting novel mechanisms and factors of enhancer-promoter interactions, such as DNA binding proteins and DNA sequences.

Background

More than 400 types of cells have been found in the human body. Human development is accompanied by the differentiation of stem cells into various cell types, leading to a diversification of their phenotypes and functions. For example, the development of the immune system involves differentiation and diversification of stem cells into various types of mature immune cells. The functions of monocytes include phagocytosis and antigen presentation. CD4⁺ T cells, however, play a central role in cell-mediated immunity and are involved in the activation of phagocytes and antigen-specific cytotoxic T-lymphocytes, and the release of various cytokines in response to an antigen. The CD20⁺ B cells are involved in the production of antibodies against antigens.

Differentiation of cells is often triggered by the expression of transcription factors (TF) followed by the expression of their target genes, which results in the transformation of cells into other cell types. For example, the transcription factors PU.1 and CCAAT enhancer-binding protein α (C/EBPα) play a critical role in the expression of myeloid-specific genes and the generation of monocytes and macrophages [1, 2]. The transcription factor GATA-3 is essential for early T cell development and the differentiation of naive CD4⁺ T cells into Th2 effector cells [3]. E2A, EBF1, PAX5, and Ikaros are among the most important transcription factors that control early development in mice, thereby conditioning homeostatic B cell lymphopoiesis [4].

We previously examined the differentiation of monocytes and macrophages in mice, and discovered that the transcription factor IRF8 was essential for cellular differentiation [5]. An analysis of transcription factor-binding sites (TFBS) revealed that IRF8 regulated the expression of KLF4 through the IRF8 transcriptional cascade. Functional enrichment analyses revealed that the target genes of IRF8 showed functional enrichment for antigen presentation, whereas those of KLF4 showed functional enrichments for phagocytosis and locomotion. These results suggested that the transcriptional cascades of IRF8 and KLF4 included different functional modules of target genes.

Functional enrichments of transcriptional cascades of IRF8 and KLF4 appeared to be related to the cellular functions of monocytes and macrophages. Although several transcription factors were expressed in monocytes and macrophages, the number of these transcriptional target genes that resulted in functional enrichments remains unknown. Whether transcriptional target genes in other human cells show functional enrichments also remains unclear. If the transcriptional target genes showed significant functional enrichment, analyzing transcriptional target genes would be useful in identifying genes involved in a specific cellular function. In budding yeast, previous studies examined functional enrichments on a genome-scale genetic interaction map using the GeneMANIA algorithm [6,7,8]. In bacterial systems, functional enrichments of predicted regulatory networks were analyzed using Gene Ontology annotations [9]. Various databases of functional annotations of genes and pathways exist. Analysis of functional enrichments is expected to be useful for understanding the association of genes involved in similar functions and the same pathways, and for predicting the functions of uncharacterized genes such as non-protein-coding RNAs. In addition, the extent of enhancer contribution to functional enrichments of transcriptional target genes remains unknown.

In this study, transcriptional target genes were predicted using public databases of open chromatin regions of human monocytes, naive CD4⁺ T, CD20⁺ B cells, HUVEC, IMR90, MCF-7, HMEC, H1-hESC, iPSC, and ChIP-seq data of human H1-hESC cells and known transcription factor binding sequences. Functional enrichment analyses of putative transcriptional target genes were conducted using 10 different annotation databases of functional annotations and pathways. The gene expression level of transcriptional target genes was examined in the cells.

Results

Prediction of transcriptional target genes

To examine functional enrichments of transcriptional target genes on a genome scale, transcriptional target genes were predicted in human monocytes, CD4⁺ T cells, and CD20⁺ B cells. Known transcription factor binding sequences, collected from various databases and papers, were searched for in open chromatin regions of the promoter sequences of RefSeq transcripts (Fig. 1, see Methods). Among 6277 transcription factor binding sequences derived from vertebrates, 4373 were linked to 1018 TF transcripts computationally (see Methods). To maintain the sensitivity of the searches for transcription factor binding sites, and because some transcription factors recognize multiple distinctly different sequence motifs, transcription factor binding sequences that targeted the same genes were treated as redundant, and only one of them was used [10] (see Methods). In total, 3337 transcription factor binding sequences in human monocytes, 3652 in CD4⁺ T cells, and 3187 in CD20⁺ B cells were identified with their target genes, which were selected from highly expressed genes in each cell type (top 30% expression level, see Methods).
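The "top 30% expression level" filter mentioned above can be illustrated with a minimal Python sketch. The function name, the toy expression values, and the handling of ties and rounding are assumptions made for illustration; the exact procedure is the one described in Methods.

```python
def top_expressed_genes(expression, percent=30):
    """Return the genes in the top `percent` of expression levels.

    `expression` maps gene symbol -> expression level (e.g. RPKM).
    Rounding down and arbitrary tie-breaking are assumptions of this sketch.
    """
    ranked = sorted(expression, key=expression.get, reverse=True)
    n_keep = len(ranked) * percent // 100  # integer arithmetic avoids float rounding
    return set(ranked[:n_keep])


# Hypothetical toy data: ten genes with expression levels 0..9.
toy_expression = {f"gene{i}": float(i) for i in range(10)}
highly_expressed = top_expressed_genes(toy_expression)
```

With ten toy genes, the three most highly expressed genes are retained as candidate target genes.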

Using promoters, the total numbers of unique highly expressed target genes of transcription factor binding sequences were 4481, 7558, and 4753 in monocytes, CD4⁺ T cells, and CD20⁺ B cells, respectively. The mean numbers of target genes per transcription factor were 124, 164, and 144, and the corresponding medians were 24, 33, and 24. With regard to the genomic localization of TFBS, 51%, 65%, and 61% of TFBS were located within promoters (±5 kb of TSS) of target genes in monocytes, CD4⁺ T cells, and CD20⁺ B cells, respectively (according to association rule 1, see Methods).
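The summary statistics in this paragraph (unique target genes, and mean and median numbers of target genes per binding sequence) could be computed along the following lines. This is a sketch only: the mapping from binding-sequence IDs to target genes and the names used are hypothetical, not the author's data structures.

```python
from statistics import mean, median

# Hypothetical toy mapping: binding-sequence ID -> set of predicted target genes.
targets_by_motif = {
    "motif_A": {"IRF8", "KLF4", "SPI1", "CEBPA"},
    "motif_B": {"GATA3"},
    "motif_C": {"PAX5", "EBF1"},
}


def target_count_summary(targets_by_motif):
    """Summarize how many target genes each binding sequence has."""
    counts = [len(genes) for genes in targets_by_motif.values()]
    unique_targets = set().union(*targets_by_motif.values())
    return {
        "mean": mean(counts),
        "median": median(counts),
        "unique_targets": len(unique_targets),
    }


summary = target_count_summary(targets_by_motif)
```

A mean well above the median, as in the reported values (for example 124 versus 24 in monocytes), indicates a skewed distribution in which a few binding sequences have very many target genes.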

Functional enrichments of putative transcriptional target genes

Functional enrichments of the putative target genes were examined. The distribution of functional enrichments in transcriptional target genes was predicted using genome sequences of promoters in the three cell types (Fig. 1, Table 1 and Additional file 1: Figure S3, see Methods). Furthermore, the effect of transcriptional target genes including randomly selected genes on functional enrichments was investigated using DNase-DGF data of monocytes, CD4⁺ T and CD20⁺ B cells, HUVEC, IMR90, MCF-7, HMEC, and ChIP-seq data of H1-hESC (Fig. 2a and b, and Additional file 1: Figure S1 and S2, see Methods). The native putative transcriptional target genes not including randomly selected genes showed the highest functional enrichments using Gene Ontology, GO Slim, KEGG, Pathway Commons, WikiPathways, InterPro and UniProt functional regions (Domains) in both DNase-DGF and ChIP-seq data of the five types of cells. Of the 10 databases used in this analysis, the Gene Ontology database consists of three types of functional annotations, i.e., 20,836 biological processes, 9020 molecular functions, and 2847 cellular components. The numbers of functional enrichments of Gene Ontology annotations in target genes of a transcription factor were 2902, 4077, and 2778 in monocytes, CD4⁺ T cells, and CD20⁺ B cells, respectively. An examination of functional enrichments of highly expressed genes (top 30% expression level), independent of transcriptional target genes, revealed 237, 301, and 239 ‘unique’ Gene Ontology annotations in monocytes, CD4⁺ T cells, and CD20⁺ B cells, respectively (Table 1). 
Further, the examination of functional enrichments of highly expressed target genes (top 30% expression level) revealed 1271, 1654, and 1192 ‘unique’ Gene Ontology annotations in monocytes, CD4⁺ T cells, and CD20⁺ B cells, respectively. These numbers were four times larger than the numbers of functional enrichments identified by gene expression information alone, independent of transcriptional target genes, suggesting that transcriptional target genes were frequently associated with similar functions or pathways (Additional file 1: Figure S8 and S9).
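For readers unfamiliar with functional enrichment, a common way to test whether a set of target genes is enriched for an annotation is a one-sided hypergeometric test. The sketch below shows that generic test with hypothetical parameter names; it is not necessarily the statistic or tool used in this study, which is specified in Methods.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    population: number of background genes
    annotated:  background genes carrying the annotation
    sample:     number of target genes of one transcription factor
    overlap:    target genes carrying the annotation
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, 5 annotated, 5 targets, all 5 annotated.
p_toy = hypergeom_enrichment_p(population=10, annotated=5, sample=5, overlap=5)
```

In practice such p-values are computed for every annotation and every transcription factor, so a multiple-testing correction would normally follow.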

Table 1 Number of functional enrichments and unique functional enrichments of putative transcriptional target genes. (see the colored table in Additional file 1: Figure S3)


Functional enrichments of transcriptional target genes from other databases were also examined (Table 1). KEGG, Target genes of transcription factors, Disease Ontology, GO Slim, Pathway Commons, Cellular biomarkers, Target genes of microRNAs, Protein domains, and WikiPathways had 95, 16, 127, 12, 242, 17, 97, 303, and 105 unique functional annotations, respectively. As with Gene Ontology annotations, the numbers of functional enrichments of transcriptional target genes in the other annotation databases, except for microRNAs and Protein domains, were significantly higher than those obtained from gene expression information alone, independent of transcriptional target genes (Table 1). The functional enrichments of transcriptional target genes from Pathway Commons for monocytes, CD4⁺ T cells, and CD20⁺ B cells are shown in Table 2 and Additional file 1: Figure S10. Functional enrichments were found to be related to cellular functions, e.g., interferon signaling, GMCSF (granulocyte-macrophage colony-stimulating factor, a kind of cytokine)-mediated signaling events, and antigen processing-cross presentation in monocytes; TCR (T-cell receptor) signaling in naive CD4⁺ T cells, IL-12 (interleukin-12, a kind of cytokine)-mediated signaling events, and downstream signaling in naive CD8⁺ T cells in CD4⁺ T cells; and interferon alpha/beta signaling, IL8- and CXCR2 (C-X-C chemokine receptor type 2)-mediated signaling events, and BCR (B cell antigen receptor) signaling pathway in CD20⁺ B cells. WikiPathways, KEGG and GO also revealed that functional enrichments were associated with cellular functions (Additional file 1: Figure S11, S12 and S13).

Table 2 Functional enrichments of putative transcriptional target genes using Pathway Commons


Effect of enhancer-promoter association rules on functional enrichments

To understand the effect of ‘promoter and extended regions for enhancer-promoter association (EPA)’ on the functional enrichments of target genes, the rule of extended regions was modified according to four criteria (Fig. 3a and see Methods) [11], and functional enrichments were investigated.

According to association rule (1), the mean numbers of target genes were 177, 217, and 175 in monocytes, CD4⁺ T cells, and CD20⁺ B cells, respectively, whereas the corresponding medians were 55, 58, and 37 (Additional file 1: Figure S14). The numbers of functional enrichments of Pathway Commons annotations using promoter regions were 1005, 1806, and 821 in monocytes, CD4⁺ T cells, and CD20⁺ B cells, respectively (Additional file 1: Figure S15). With the use of EPA (association rule 1), the numbers of functional enrichments of Pathway Commons annotations were 3087, 7216, and 3900, representing 3.07-, 4.00-, and 4.75-fold increases, respectively, in the three cell types. Additionally, the numbers of ‘unique’ Pathway Commons annotations with promoter regions were 321, 415, and 329 in monocytes, CD4⁺ T cells, and CD20⁺ B cells, respectively; the corresponding numbers with the use of EPA (association rule 1) were 364, 437, and 364, representing 1.13-, 1.05-, and 1.11-fold increases, respectively, in the three cell types. The normalized numbers of functional enrichments of Pathway Commons annotations were 44.75, 84.51, and 59.32, representing 1.84-, 2.80-, and 3.32-fold increases, respectively, in the three cell types (association rule 1, Table 3 and Additional file 1: Figure S15). Other cell types also showed the same tendencies (Table 3 and Additional file 1: Figure S15).
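The fold increases quoted in this paragraph follow directly from the reported counts. As a small worked check in Python (the dictionary and function names are illustrative only; the counts are those given in the text):

```python
# Numbers of Pathway Commons functional enrichments reported in the text.
promoter_counts = {"monocytes": 1005, "CD4+ T cells": 1806, "CD20+ B cells": 821}
epa_rule1_counts = {"monocytes": 3087, "CD4+ T cells": 7216, "CD20+ B cells": 3900}


def fold_increase(before, after):
    """Ratio after/before for every cell type present in `before`."""
    return {cell: after[cell] / before[cell] for cell in before}


folds = fold_increase(promoter_counts, epa_rule1_counts)
for cell, value in folds.items():
    print(f"{cell}: {value:.2f}-fold")
```

The same function applied to the ‘unique’ annotation counts (321, 415, 329 versus 364, 437, 364) reproduces the 1.13-, 1.05-, and 1.11-fold increases.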

Table 3 Normalized number of functional enrichments of putative transcriptional target genes using promoter and extended regions for enhancer-promoter association (see the colored table in Additional file 1: Figure S4)


The normalized numbers of functional enrichments of transcriptional target genes were highest with association rule (4), followed by association rules (1) and (2), in the three cell types. Although association rule (3) covered the longest distance among the four criteria, it showed the lowest number of functional enrichments in the three cell types (Fig. 3a and Table 3). ChIP-seq data of 19 TF in H1-hESC (human embryonic stem cells) showed almost the same tendency, although the difference between association rules (4) and (1) was not statistically significant, probably because of the large number of transcriptional target genes predicted using the 19 TF ChIP-seq data. Several thousand target genes were predicted for each TF, and some of them would reflect indirect interactions between TF and genomic DNA identified by ChIP-seq experiments (Additional file 1: Figure S16, see Additional file 1).

Differences in functional enrichments using Pathway Commons were examined between promoters versus EPA (association rule 1) (Table 4 [12,13,14,15,16,17,18,19,20,21,22,23,24,25,26] and Additional file 1: Figure S17). A comparison of 321 and 364 functional enrichments using the promoters and EPA, respectively, in monocytes revealed that 152 (47% in promoters, 42% in extended regions) of them were common. For example, IFN-gamma (Interferon gamma) pathway, GMCSF (Granulocyte-macrophage colony-stimulating factor, a kind of cytokine)-mediated signaling events, and PDGF (Platelet-derived growth factor) receptor signaling network were enriched using extended regions (association rule 1) as opposed to promoters (Additional file 1: Figure S17). The comparison of 415 (promoters) and 437 (extended regions) functional enrichments in CD4⁺ T cells revealed that 163 of them (39% in promoters, 37% in extended regions) were common. IFN-gamma pathway, TCR (T-cell receptor) signaling in naive CD4⁺ T cells, and IL3 (Interleukin-3, a kind of cytokine)-mediated signaling events were enriched using extended regions. The comparison of 329 (promoters) and 364 (extended regions) functional enrichments in CD20⁺ B cells revealed that 171 of them (52% in promoters, 47% in extended regions) were common. IL5-mediated signaling events, IL4-mediated signaling events, and cytokine signaling in immune system were enriched in CD20⁺ B cells using extended regions. Only about 40% of functional enrichments of Pathway Commons annotations were unchanged between promoters and EPA. EPA significantly affected the functional enrichments of transcriptional target genes. Journal papers showed that frequent functional enrichments were related to the cellular functions in the three cell types (Table 4). These results showed that new functional enrichments related to cellular functions were identified using extended regions for enhancer-promoter association.
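The shared percentages reported above can be reproduced from the counts of common and total functional enrichments. The following sketch uses the numbers from the text; the function and variable names are illustrative.

```python
def shared_percentages(n_common, n_promoter, n_epa):
    """Percentage of promoter and of EPA enrichments that are common to both."""
    return 100 * n_common / n_promoter, 100 * n_common / n_epa


# (common, promoters, EPA association rule 1) as reported in the text.
reported = {
    "monocytes": (152, 321, 364),
    "CD4+ T cells": (163, 415, 437),
    "CD20+ B cells": (171, 329, 364),
}

shared = {cell: shared_percentages(*counts) for cell, counts in reported.items()}
for cell, (pct_promoter, pct_epa) in shared.items():
    print(f"{cell}: {pct_promoter:.0f}% of promoter, {pct_epa:.0f}% of EPA enrichments")
```

When the enriched terms themselves are available as sets, the common count is simply the size of their intersection.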

Table 4 Differences in functional enrichments between EPA and promoters using Pathway Commons


Effect of CTCF-binding sites on functional enrichments

CTCF has insulator activity, blocking the interaction between enhancers and promoters [27]. Recent studies identified a correlation between the orientation of CTCF-binding sites and chromatin loops (Fig. 3b) [28]. CTCF-binding sites in forward–reverse (FR) orientation are frequently found in chromatin loops. To examine the effect of forward–reverse orientation of CTCF-binding sites on functional enrichments of target genes, ‘promoter and extended regions for enhancer-promoter association (EPA)’ were shortened at the genomic locations of forward–reverse orientation of CTCF-binding sites, and transcriptional target genes were predicted from the shortened regions using TFBS (see Methods). The numbers of functional enrichments of target genes were then investigated. With EPA (association rule 4) shortened at the genomic locations of forward–reverse orientation of CTCF-binding sites, the mean numbers of target genes were 67, 64, and 77 in monocytes, CD4⁺ T cells, and CD20⁺ B cells, respectively, whereas the corresponding medians were 23, 21, and 20 (Additional file 1: Figure S18). The normalized numbers of functional enrichments of Pathway Commons annotations using EPA were 71.42, 108.08, and 90.99 in monocytes, CD4⁺ T cells, and CD20⁺ B cells, respectively (Table 5 and Additional file 1: Figure S19). With the use of EPA shortened at forward–reverse orientation of CTCF-binding sites, the normalized numbers of functional enrichments of Pathway Commons annotations were 196.58, 220.54, and 220.77, representing 2.75-, 2.04-, and 2.43-fold increases, respectively, in the three cell types. Additionally, the normalized numbers of functional enrichments of ‘unique’ Pathway Commons annotations with EPA were 5.09, 5.34, and 6.00 in monocytes, CD4⁺ T cells, and CD20⁺ B cells, respectively; the corresponding normalized numbers with the use of EPA shortened at forward–reverse orientation of CTCF-binding sites were 9.88, 10.72, and 9.10, representing 1.94-, 2.01-, and 1.52-fold increases, respectively, in the three cell types (Additional file 1: Figure S19). Other cell types also showed the same tendencies (Table 5 and Additional file 1: Figure S19). The normalized numbers of functional enrichments increased significantly from EPA to EPA shortened at forward–reverse orientation of CTCF-binding sites in Gene Ontology, Disease Ontology, Pathway Commons, GO Slim, WikiPathways, KEGG, InterPro and UniProt functional regions (Domains) annotations. These increases were also significant compared with EPA shortened at CTCF-binding sites without consideration of their orientation. Transcriptional target genes predicted from EPA shortened at forward–reverse orientation of CTCF-binding sites thus tended, to a significant degree, to include genes with similar functions.
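One possible reading of "shortening" an extended region at forward–reverse CTCF-binding sites is sketched below: the region around a transcriptional start site is clipped at the nearest forward-oriented site upstream and the nearest reverse-oriented site downstream. This interpretation, the function name, and the coordinate conventions are assumptions made for illustration; the procedure actually used is the one described in Methods.

```python
def shorten_at_fr_ctcf(region_start, region_end, tss, ctcf_sites):
    """Clip [region_start, region_end] around `tss` at a forward-reverse CTCF pair.

    ctcf_sites: iterable of (position, strand), where '+' is a forward and
    '-' a reverse motif orientation. If no suitable site exists on one side,
    that side of the region is left unchanged (an assumption of this sketch).
    """
    upstream_forward = [
        pos for pos, strand in ctcf_sites
        if strand == "+" and region_start <= pos < tss
    ]
    downstream_reverse = [
        pos for pos, strand in ctcf_sites
        if strand == "-" and tss < pos <= region_end
    ]
    new_start = max(upstream_forward) if upstream_forward else region_start
    new_end = min(downstream_reverse) if downstream_reverse else region_end
    return new_start, new_end


# Hypothetical toy coordinates.
toy_sites = [(100, "+"), (300, "+"), (400, "-"), (700, "-"), (900, "-")]
shortened = shorten_at_fr_ctcf(0, 1000, tss=500, ctcf_sites=toy_sites)
```

In this toy case the region 0 to 1000 is clipped to 300 to 700, so TFBS outside the clipped region would no longer be assigned to the gene.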

Table 5 Normalized number of functional enrichments of putative transcriptional target genes using CTCF binding sites (see the colored table in Additional file 1: Figure S5)


Differences in functional enrichments obtained using EPA versus EPA shortened at forward–reverse orientation of CTCF-binding sites were examined using the functional enrichments of Pathway Commons (Table 6 [29,30,31,32,33,34,35,36,37,38,39,40,41,42,43] and Results in Additional file 1). Transcriptional target genes predicted from EPA shortened at the CTCF-binding sites tended to include genes with similar functions. About 40–80% of functional enrichments were unchanged between promoters and EPA shortened at forward–reverse orientation of CTCF-binding sites, and the functional enrichments observed in EPA shortened at forward–reverse orientation of CTCF-binding sites, as opposed to promoters, included various immunological terms. Published papers indicated that the five most frequent functional enrichments were related to cellular functions in the three cell types (Table 6). These results showed that new functional enrichments related to cellular functions were identified using forward–reverse orientation of CTCF-binding sites.

Table 6 Differences in functional enrichments between EPA shortened at FR CTCF and EPA without CTCF using Pathway Commons


Comparison of expression levels of putative transcriptional target genes

To examine the relationship between functional enrichments and expression levels of target genes, the expression levels of target genes predicted from promoters and three types of ‘promoter and extended regions for enhancer-promoter assignment (EPA)’ were investigated in monocytes, CD4⁺ T, H1-hESC and iPSC (Fig. 4). Median expression levels of the target genes of the same transcription factor binding sequences were compared between promoters and three types of EPA. Red and blue dots in Fig. 4 show statistically significant difference of the distribution of expression levels of target genes between promoters and EPA. Additionally, “red dots” show the median expression level of target genes of a TFBS was ‘higher’ in EPA than promoters, and “blue dots” show the median expression level of target genes of a TFBS was ‘lower’ in EPA than promoters. The ratios of red dots were higher in EPA (association rule 4) that were shortened at forward–reverse orientation of CTCF-binding sites versus promoters (left graph in Fig. 4) than EPA (association rule 4) versus promoters (right graph) in monocytes and CD4⁺ T cells. The ratios of blue dots were higher in EPA (association rule 4) that were shortened at forward–reverse orientation of CTCF-binding sites versus promoters (left graph) than EPA (association rule 4) versus promoters (right graph) in H1-hESC and iPSC. Moreover, the ratio of the sum of median expression levels between the three types of EPA and promoters in monocytes and CD4⁺ T cells was the highest in EPA shortened at forward–reverse orientation of CTCF-binding sites (Additional file 1: Figure S21). Conversely, the ratio of the sum of median expression levels between the three types of EPA and promoters in H1-hESC and iPSC was the lowest in EPA shortened at forward–reverse orientation of CTCF-binding sites.
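The red and blue classification described for Fig. 4 can be expressed as a small decision rule. In the sketch below the p-value is taken as an input, because the statistical test is not specified in this section; the significance threshold of 0.05, the labels, and the toy values are assumptions for illustration.

```python
from statistics import median


def classify_tfbs(promoter_levels, epa_levels, p_value, alpha=0.05):
    """Label one TFBS as 'red', 'blue', or 'not significant'.

    promoter_levels / epa_levels: expression levels of the target genes
    predicted from promoters and from EPA for the same TFBS.
    p_value: result of a test comparing the two distributions (computed elsewhere).
    """
    if p_value >= alpha:
        return "not significant"
    median_promoter = median(promoter_levels)
    median_epa = median(epa_levels)
    if median_epa > median_promoter:
        return "red"   # median expression higher in EPA than in promoters
    if median_epa < median_promoter:
        return "blue"  # median expression lower in EPA than in promoters
    return "not significant"


def dot_ratios(labels):
    """Fraction of red and blue dots among all TFBS."""
    return {color: labels.count(color) / len(labels) for color in ("red", "blue")}


# Hypothetical toy input: four TFBS.
toy_labels = [
    classify_tfbs([1, 2, 3], [4, 5, 6], p_value=0.001),
    classify_tfbs([5, 6, 7], [1, 2, 3], p_value=0.01),
    classify_tfbs([1, 2, 3], [2, 3, 4], p_value=0.5),
    classify_tfbs([1, 2, 3], [3, 4, 5], p_value=0.02),
]
toy_ratios = dot_ratios(toy_labels)
```

Comparing these ratios between two EPA variants corresponds to comparing the left and right graphs in Fig. 4.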

EPA shortened at forward–reverse orientation of CTCF-binding sites changed (i.e. increased or decreased) the expression levels of target genes more than the other types of EPA. This implied that gene expression tended to be activated in monocytes and CD4⁺ T cells, but repressed in H1-hESC and iPSC by enhancers. EPA shortened at forward–reverse orientation of CTCF-binding sites also showed the highest normalized number of functional enrichments of transcriptional target genes, as shown in the previous paragraphs.

Discussion

Genome-wide functional enrichments and gene expression levels of putative target genes of human transcription factors were investigated. Human putative transcriptional target genes showed significantly larger numbers of functional enrichments than gene expression information alone, independent of transcriptional target genes. Moreover, when the number of functional enrichments of human putative transcriptional target genes was normalized by the total number of transcriptional target genes, native putative transcriptional target genes showed the highest ratio of functional enrichments, compared with target genes partially including randomly selected genes. The ratio of functional enrichments decreased as the ratio of randomly selected genes in the target genes increased. These tendencies were observed in putative transcriptional target genes predicted from both open chromatin regions and ChIP-seq data of transcription factors. Prediction of transcriptional target genes from open chromatin regions includes false positives, since DNase I cleavage bias affects the computational analysis of DNase-seq experiments [44]. However, the detection of ChIP-seq peaks also varies depending on the methods used to identify them and the depth of DNA sequencing of ChIP-seq experiments [45]. Although human putative transcriptional target genes include false positives, they showed a significantly larger number of functional enrichments than target genes including 5–60% of randomly selected genes (Fig. 2).

The median expression level of human putative transcriptional target genes changed according to the criteria of enhancer-promoter assignments, and was correlated with the normalized number of functional enrichments. In H1-hESC and iPSC, the median expression level was significantly ‘decreased’ in transcriptional target genes predicted using enhancers, compared with those predicted using promoters, whereas in monocytes and CD4⁺ T cells it was significantly ‘increased’. These results implied that transcription factors bound in enhancers act as repressors in H1-hESC (ES) and iPSC, whereas they act as activators in monocytes and CD4⁺ T cells. The change in the functional roles of transcription factors depending on cell type will be analyzed and reported elsewhere.

In immune cells, the median expression level was significantly increased in target genes predicted using enhancers, compared with those predicted from promoters, when gene expression data from Blueprint (RNA-seq RPKM data; GSE58310) were used, but a smaller number of target genes showed an increase in median expression level when gene expression data from ENCODE (GSM984609) were used. The results of the analyses may therefore differ slightly depending on the gene expression data. H1-hESC (ES) and iPSC showed a strong tendency toward lower median expression levels of transcriptional target genes with enhancers than with promoters.

The gene symbols of transcription factors sometimes differed among databases, because more than one gene symbol is assigned to some transcription factors and some gene symbols are spelled in several different ways. These differences need to be resolved by manual curation. Such curation will be required to predict transcriptional cascades by associating transcription factors with transcriptional target genes that are themselves transcription factors. In the analyses of transcriptional cascades, to reduce false positive predictions of enhancer-promoter associations from open chromatin regions, the identification of DNase peaks will be modified using a new tool such as HINT [46].

In this study, I focused on three types of immune cells and on stem cells such as H1-hESC and iPSC to examine transcriptional target genes on a genome scale, since my previous study examined transcriptional cascades involved in the differentiation of immune cells, as introduced in Background [5]. Furthermore, I confirmed that the features of functional enrichments of putative transcriptional target genes are also found in four other types of normal and disease cells (HUVEC, IMR90, MCF-7 and HMEC).

It is difficult to predict enhancer-promoter associations using a single parameter, so machine learning methods that combine several parameters have been proposed [47,48,49]. These methods showed high accuracy in predicting enhancer-promoter associations (I tried to use some of the tools, but they did not work properly, and I am waiting for the authors to update them). However, the molecular mechanisms of enhancer-promoter interactions are not clearly understood. CTCF has been found to bind at chromatin interaction anchors and form chromatin interactions [27]. About 20–40% of chromatin interaction anchors included DNA binding sequences of CTCF when I examined public Hi-C experimental data [50, 51]. Among 33,939 RefSeq transcripts, the numbers of transcripts with forward–reverse orientation of CTCF-binding sites within 1 Mb of transcriptional start sites in the three immune cell types ranged from 7202 (21%), 4404 (13%), and 6921 (20%), respectively (p-value <10⁻⁵ in the search for CTCF-binding motifs using FIMO), to 9608 (28%), 5806 (17%), and 9137 (27%), respectively (p-value <10⁻⁴). These analyses implied that other factors might be involved in chromatin interactions. ZNF143 has been reported to be located at promoter regions of chromatin interaction anchors [52]. The analyses in this study would be useful for further examining the criteria for predicting enhancer-promoter associations, and thereby for predicting these other factors and molecular mechanisms. Machine learning methods require information on which parameters to use for prediction, so it would be better to choose parameters that are involved in enhancer-promoter associations. To improve the prediction and understand the molecular mechanisms of enhancer-promoter interactions, I am pursuing analyses of chromatin interaction anchors, and the results will be reported elsewhere.

Conclusion

In this study, human transcriptional target genes were predicted using open chromatin regions, ChIP-seq data, and DNA binding sequences of transcription factors in databases. Human putative transcriptional target genes showed significant functional enrichments. Published papers indicated that frequent functional enrichments were related to cellular functions. The normalized number of functional enrichments was the highest in native putative transcriptional target genes, compared with target genes partially replaced with randomly selected genes. The normalized number of functional enrichments of human putative transcriptional target genes changed according to the criteria of enhancer-promoter assignments and correlated with the median expression level of the target genes. Among the four criteria, the criterion of enhancer-promoter assignments covering the longest distance from the transcriptional start site did not show the highest normalized number of functional enrichments. This suggested that there is a criterion of enhancer-promoter assignments that shows the highest normalized number of functional enrichments. In H1-hESC and iPSC, the median expression level was significantly ‘decreased’ in transcriptional target genes predicted using enhancers, compared with those predicted using promoters, whereas in immune cells it was significantly ‘increased’. These results implied that transcription factors bound in enhancers act as repressors in H1-hESC (ES) and iPSC, whereas they act as activators in immune cells. These analyses and characteristics of human putative transcriptional target genes would be useful for examining the criteria of enhancer-promoter assignments and for predicting novel mechanisms and factors of enhancer-promoter interactions, such as DNA binding proteins and DNA sequences.

Methods

Searches for transcription factor binding sequences from open chromatin regions

To examine transcriptional regulatory target genes, bed files of hg19 narrow peaks of ENCODE DNase-DGF and DNase data from the ENCODE website (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/; http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwDnase/; http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeOpenChromDnase/) were used for the following cells: Monocytes-CD14⁺_RO01746 (GSM1024791; UCSC Accession: wgEncodeEH001196), CD4⁺_Naive_Wb11970640 (GSM1014537; UCSC Accession: wgEncodeEH003156), CD20⁺_RO01778 (GSM1014525; UCSC Accession: wgEncodeEH002442), H1-hESC (GSM816632; UCSC Accession: wgEncodeEH000556), iPSC (GSM816642; UCSC Accession: wgEncodeEH001110), HUVEC (GSM1014528; UCSC Accession: wgEncodeEH002460), IMR90 (GSM1008586; UCSC Accession: wgEncodeEH003482), MCF-7 (GSM816627; UCSC Accession: wgEncodeEH000579), and HMEC (GSM816669; UCSC Accession: wgEncodeEH001101). For comparison with transcriptional target genes predicted using ChIP-seq data, bed files of hg19 narrow peaks of ENCODE ChIP-seq data for 19 transcription factors (TF) (BACH1, BRCA1, C/EBPbeta, CHD2, c-JUN, c-MYC, GTF2I, JUND, MAFK, MAX, MXI1, NRF1, RAD21, RFX5, SIN3A, SUZ12, TBP, USF2, ZNF143) in H1-hESC from the ENCODE website (https://genome.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeAwgTfbsUniform) were utilized.

To identify transcription factor binding sites (TFBS) from the DNase-DGF data, TRANSFAC (2013.2), JASPAR (2010), UniPROBE, BEEML-PBM, high-throughput SELEX, Human Protein-DNA Interactome, and transcription factor binding sequences of ENCODE ChIP-seq data were used [53,54,55,56,57,58,59]. Position weight matrices of transcription factor binding sequences were transformed into TRANSFAC matrices and then into MEME matrices using in-house Perl scripts and transfac2meme in the MEME suite [60]. Only transcription factor binding sequences derived from vertebrates were used for further analyses. Transcription factor binding sequences were searched for in the central 50-bp region of each narrow peak using FIMO with a p-value threshold of 10⁻⁵ [61]. Because transcription factor binding sequences were not linked to transcript IDs such as UCSC, RefSeq, and Ensembl transcripts, the transcription factor corresponding to each binding sequence was identified computationally by comparing its name with the gene symbols of the HGNC (HUGO Gene Nomenclature Committee)-approved gene nomenclature and of 31,848 UCSC known canonical transcripts (http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/knownCanonical.txt.gz).
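As an illustration of the step that restricts the motif search to the central 50 bp of each narrow peak, the following is a minimal sketch. It assumes that "central" means centred on the midpoint of the peak interval (not on the summit column of the narrowPeak format) and that coordinates are 0-based and half-open as in BED files; the function names are hypothetical and are not taken from the in-house scripts.

```python
def central_window(chrom_start, chrom_end, width=50):
    """Return the (start, end) window of at most `width` bp centred on a peak.

    Assumption: the window is centred on the interval midpoint and is
    clipped to the peak itself when the peak is narrower than `width`.
    """
    mid = (chrom_start + chrom_end) // 2
    start = max(chrom_start, mid - width // 2)
    end = min(chrom_end, start + width)
    return start, end


def central_windows(narrowpeak_lines, width=50):
    """Yield (chrom, start, end) for each data line of a narrowPeak/BED file."""
    for line in narrowpeak_lines:
        # Skip blank lines and common header or comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        yield (chrom, *central_window(int(start), int(end), width))
```

The resulting windows would then be converted to sequences and scanned for motifs, for example with FIMO as described above.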

Prediction of transcriptional target genes

Target genes of a transcription factor were assigned when its TFBS was found in DNase-DGF narrow peaks in promoter or extended regions for enhancer-promoter association of genes (EPA). Promoter regions were defined as those within ±5 kb of transcriptional start sites (TSS). Promoter and extended regions were defined by the following four association rules, which are similar to or the same as those defined in a previous study [11]: (1) the basal plus extension association rule, which assigns a basal regulatory domain to each gene regardless of other nearby genes. The domain is then extended to the basal regulatory domain of the nearest upstream and downstream genes, and includes a 5 kb + 5 kb basal region and an extension up to 300 kb or the midpoint between the TSS of the gene and that of the nearest gene upstream and downstream; (2) a 5 kb + 1 kb basal region and an extension up to 1 Mb; (3) the two nearest genes association rule, which extends the regulatory domain to the TSS of the nearest upstream and downstream genes without a limit on extension length; and (4) the single nearest gene association rule, which extends the regulatory domain to the midpoint between the TSS of the gene and that of the nearest gene upstream and downstream without a limit on extension length. Association rule (1) was used in our previous study [5]. Association rules (2), (3), and (4) were the same as those in Fig. 3a of the previous study [11], except that association rules (3) and (4) had no limit on extension length in this study. The genomic positions of genes were identified using the ‘knownGene.txt.gz’ file from the UCSC bioinformatics site [62]. The file ‘knownCanonical.txt.gz’ was also utilized to choose representative transcripts among alternative forms when assigning promoter and extended regions for enhancer-promoter association of the genes.
From the list of transcription factor binding sequences and transcriptional target genes, redundant transcription factor binding sequences were removed by comparing the target genes of a transcription factor binding sequence and its corresponding transcription factor; if the target genes were identical, only one of the transcription factor binding sequences was retained. When the number of transcriptional target genes predicted from a transcription factor binding sequence was fewer than five, the transcription factor binding sequence was omitted.
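To make association rule (4) concrete, the following is a minimal sketch of the single nearest gene rule on one chromosome. It assumes one representative TSS per gene, distinct TSS positions, and that the outermost genes extend to the chromosome ends; these choices and the function names are illustrative assumptions, not details confirmed by the text.

```python
def nearest_gene_domains(tss_by_gene, chrom_length):
    """Map each gene to a (start, end) regulatory domain on one chromosome.

    Each domain runs from the midpoint with the previous TSS to the
    midpoint with the next TSS, with no limit on extension length.
    """
    ordered = sorted(tss_by_gene.items(), key=lambda item: item[1])
    domains = {}
    for i, (gene, tss) in enumerate(ordered):
        # Assumption: the first and last genes extend to the chromosome ends.
        left = 0 if i == 0 else (ordered[i - 1][1] + tss) // 2
        right = chrom_length if i == len(ordered) - 1 else (tss + ordered[i + 1][1]) // 2
        domains[gene] = (left, right)
    return domains


def genes_for_site(site_pos, domains):
    """Return the genes whose regulatory domain contains a binding-site position."""
    return sorted(g for g, (start, end) in domains.items() if start <= site_pos < end)
```

Under this rule every position belongs to exactly one gene, so a TFBS found in an open chromatin peak is assigned to a single putative target gene.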

Gene expression analyses

For gene expression data, RNA-seq reads mapped onto human hg19 genome sequences were obtained, including ENCODE long RNA-seq reads with poly-A of monocytes CD14⁺ cells, CD20⁺ B cells, H1-hESC, iPSC, HUVEC, IMR90, MCF-7, and HMEC (GSM984609, GSM981256, GSE26284, GSM958733, GSM2344099, GSM2344100, GSM958734, GSM2400222, GSM765388, and GSM758571), and UCSF-UBC human reference epigenome mapping project RNA-seq reads with poly-A of naive CD4⁺ T cells (GSM669617). Two replicates were available for monocytes CD14⁺ cells, CD20⁺ B cells, H1-hESC, iPSC, HUVEC, IMR90, MCF-7, and HMEC, and a single replicate for CD4⁺ T cells. RPKMs of the RNA-seq data were calculated using RSeQC [63]. For monocytes, Blueprint RNA-seq RPKM data (GSE58310, GSE58310_GeneExpression.csv.gz, Monocytes_Day0_RPMI) were also used [64]. Based on RPKM, UCSC transcripts with expression levels among the top 30% of all transcripts were selected in each cell type.
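The selection of highly expressed transcripts can be sketched as follows. This is a minimal illustration that assumes one RPKM value per transcript (how the two replicates were combined is not stated here) and that ties at the cutoff are broken by input order; the function name is hypothetical.

```python
def top_percent(rpkm_by_transcript, percent=30):
    """Return the set of transcript IDs ranked in the top `percent` by RPKM."""
    ranked = sorted(rpkm_by_transcript, key=rpkm_by_transcript.get, reverse=True)
    # Integer arithmetic avoids floating-point rounding at the cutoff.
    n_keep = len(ranked) * percent // 100
    return set(ranked[:n_keep])
```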

Functional enrichment analyses

The functional enrichments of target genes of a TFBS and its corresponding transcription factor were examined using GO-Elite v1.2.5 with a p-value threshold of 1. After the GO-Elite analyses, a false discovery rate (FDR) test was performed with a q-value threshold of 10⁻³ to correct for multiple comparisons of thousands of groups of transcriptional target genes in each cell type and condition [65]. For examining functional enrichments of high or low expressed genes independent of transcriptional target genes, the p-value threshold was set to 0.01 or 0.05 to confirm that the results were not significantly changed. UCSC gene IDs were transformed into RefSeq IDs prior to GO-Elite analyses. GO-Elite uses 10 databases for identifying functional enrichments: (1) Gene Ontology, (2) Disease Ontology, (3) Pathway Commons, (4) GO Slim, (5) WikiPathways, (6) KEGG, (7) Transcription factor to target genes, (8) microRNA to target genes, (9) InterPro and UniProt functional regions (Domains), and (10) Cellular biomarkers (BioMarkers). To calculate the normalized numbers of functional enrichments of target genes, the numbers of functional enrichments were divided by the total number of target genes in each cell type and condition, and multiplied by 10⁵. In tables showing the numbers of functional enrichments in the 10 databases, heat maps were plotted according to Z-scores calculated from the numbers of functional enrichments of each database using in-house Excel VBA scripts. In comparisons of the normalized numbers of functional enrichments of target genes between cell types and conditions, a functional annotation was regarded as more enriched in one cell type or condition if its number there was two times larger than that in the other cell type or condition.
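The normalization and the two-fold comparison described above amount to simple arithmetic, sketched below. The function names are hypothetical, and treating "two times larger" as "at least twice as large" (with a zero count never counting as enriched) is an assumption.

```python
def normalized_enrichments(n_enrichments, n_target_genes, scale=10**5):
    """Number of functional enrichments per `scale` target genes."""
    if n_target_genes <= 0:
        raise ValueError("n_target_genes must be positive")
    return n_enrichments * scale / n_target_genes


def is_more_enriched(count_a, count_b, fold=2):
    """True if count_a is at least `fold` times count_b (assumed reading)."""
    return count_a > 0 and count_a >= fold * count_b
```

For example, 50 functional enrichments among 200,000 target genes give a normalized number of 25.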

To investigate whether the normalized numbers of functional enrichments of transcriptional target genes correlate with the prediction of target genes, a portion of the target genes was replaced with randomly selected genes with high expression level (top 30% expression level), and functional enrichments of the resulting target genes were examined. First, 5%, 10%, 20%, 40%, and 60% of target genes were replaced with randomly selected genes with high expression level in monocytes, CD4⁺ T cells, and CD20⁺ B cells. Second, as another randomization of target genes, a number of genes equal to 5%, 10%, 20%, 40%, and 60% of the target genes was selected randomly from highly expressed genes and added to the original target genes, and functional enrichments of the resulting target genes were examined. All analyses were repeated three times to estimate standard errors (Fig. 2a and b, Additional file 1: Figure S1, S2, and S6). The same analysis was performed using DNase-DGF data and ChIP-seq data of 19 TF in H1-hESC, with transcriptional target genes predicted from promoters (Additional file 1: Figure S7).
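The two randomization schemes can be sketched as follows. This is a minimal illustration under stated assumptions: random genes are drawn only from highly expressed genes that are not already target genes, and the percentage is rounded down to a whole number of genes. Neither detail is specified in the text, and the function names are hypothetical.

```python
import random


def replace_with_random(target_genes, high_expr_genes, percent, rng):
    """Replace `percent` % of the target genes with random highly expressed genes."""
    targets = sorted(target_genes)
    n_swap = len(targets) * percent // 100
    removed = set(rng.sample(targets, n_swap))
    kept = [g for g in targets if g not in removed]
    # Assumption: replacements come from genes outside the original target set.
    pool = sorted(set(high_expr_genes) - set(targets))
    return set(kept) | set(rng.sample(pool, n_swap))


def add_random(target_genes, high_expr_genes, percent, rng):
    """Add random highly expressed genes numbering `percent` % of the target genes."""
    targets = set(target_genes)
    n_add = len(targets) * percent // 100
    pool = sorted(set(high_expr_genes) - targets)
    return targets | set(rng.sample(pool, n_add))
```

Passing a seeded `random.Random` instance as `rng` makes each of the repeated runs reproducible.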

CTCF-binding sites

CTCF ChIP-seq data for monocytes CD14⁺ cells (GSM1003508_hg19_wgEncodeBroadHistoneMonocd14ro1746CtcfPk.broadPeak.gz), CD4⁺ T cells (SRR001460.bam), CD20⁺ B cells (GSM1003474_hg19_wgEncodeBroadHistoneCd20CtcfPk.broadPeak.gz), H1-hESC (wgEncodeAwgTfbsUtaH1hescCtcfUniPk.narrowPeak.gz), iPSC (GSE96477), HUVEC (wgEncodeAwgTfbsUwHuvecCtcfUniPk.narrowPeak.gz), IMR90 (wgEncodeAwgTfbsSydhImr90CtcfbIggrabUniPktfbsf.narrowPeak.gz), MCF-7 (wgEncodeAwgTfbsUwMcf7CtcfUniPktfbsf.narrowPeak.gz), and HMEC (wgEncodeAwgTfbsUwHmecCtcfUniPktfbsf.narrowPeak.gz) were used. SRR001460.bam was sorted and indexed by SAMtools and transformed into a bed file using bamToBed of BEDTools [66, 67]. ChIP-seq peaks were predicted by SICER-rb.sh of SICER with the optional parameters ‘hg19 1 200 150 0.74 200 100’ [68]. Extended regions for enhancer-promoter association (association rule 4) were shortened at the genomic locations of the CTCF-binding sites closest to a transcriptional start site, and transcriptional target genes were predicted from the shortened enhancer regions using TFBS. Furthermore, promoter and extended regions for enhancer-promoter association (association rule 4) were shortened at the genomic locations of forward–reverse oriented CTCF-binding sites. When CTCF-binding sites of the same orientation (forward or reverse) occurred consecutively in the genome sequence, the outermost forward–reverse pair of CTCF-binding sites was selected.
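The first shortening step, which trims an extended region at the CTCF-binding sites closest to the TSS, can be sketched as below. The sketch ignores CTCF orientation, represents each CTCF-binding site by a single position, and works in genomic coordinates on one chromosome; these simplifications and the function name are assumptions made for illustration only.

```python
def shorten_at_ctcf(domain, tss, ctcf_positions):
    """Trim a (start, end) regulatory domain at the CTCF sites nearest the TSS.

    The domain is cut at the closest site on each side of the TSS; a side
    without any CTCF site inside the domain is left unchanged.
    """
    start, end = domain
    left_sites = [p for p in ctcf_positions if start <= p < tss]
    right_sites = [p for p in ctcf_positions if tss < p <= end]
    new_start = max(left_sites) if left_sites else start
    new_end = min(right_sites) if right_sites else end
    return new_start, new_end
```

Target genes would then be predicted only from TFBS that fall inside the shortened domain.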

Abbreviations

ChIP-seq:: ChIP-sequencing, chromatin immunoprecipitation followed by massively parallel DNA sequencing
ENCODE:: Encyclopedia of DNA elements
EPA:: Promoter and extended regions for enhancer-promoter association
FF:: Forward-forward
FR:: Forward-reverse
RF:: Reverse-forward
RNA-seq:: RNA-sequencing
RR:: Reverse-reverse
TF:: Transcription factors
TFBS:: Transcription factor binding sites
TSS:: Transcriptional start sites

References

Valledor AF, Borras FE, Cullell-Young M, Celada A. Transcription factors that regulate monocyte/macrophage differentiation. J Leukoc Biol. 1998;63(4):405–17.
Nagamura-Inoue T, Tamura T, Ozato K. Transcription factors that regulate growth and differentiation of myeloid cells. Int Rev Immunol. 2001;20(1):83–105.
Ho IC, Tai TS, Pai SY. GATA3 and the T-cell lineage: essential functions before and after T-helper-2-cell differentiation. Nat Rev Immunol. 2009;9(2):125–35.
Rothenberg EV. B cell specification from the genome up. Nat Immunol. 2010;11(7):572–4.
Kurotaki D, Osato N, Nishiyama A, Yamamoto M, Ban T, Sato H, Nakabayashi J, Umehara M, Miyake N, Matsumoto N, et al. Essential role of the IRF8-KLF4 transcription factor cascade in murine monocyte differentiation. Blood. 2013;121(10):1839–49.
Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, Ding H, Koh JL, Toufighi K, Mostafavi S, et al. The genetic landscape of a cell. Science (New York, NY). 2010;327(5964):425–31.
Warde-Farley D, Donaldson SL, Comes O, Zuberi K, Badrawi R, Chao P, Franz M, Grouios C, Kazi F, Lopes CT, et al. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res. 2010;38(Web Server issue):W214–20.
Costanzo M, VanderSluis B, Koch EN, Baryshnikova A, Pons C, Tan G, Wang W, Usaj M, Hanchard J, Lee SD, et al. A global genetic interaction network maps a wiring diagram of cellular function. Science (New York N Y). 2016;353:6306.
Marbach D, Costello JC, Kuffner R, Vega NM, Prill RJ, Camacho DM, Allison KR, Kellis M, Collins JJ, Stolovitzky G. Wisdom of crowds for robust gene network inference. Nat Methods. 2012;9(8):796–804.
Badis G, Berger MF, Philippakis AA, Talukder S, Gehrke AR, Jaeger SA, Chan ET, Metzler G, Vedenko A, Chen X, et al. Diversity and complexity in DNA recognition by transcription factors. Science (New York, NY). 2009;324(5935):1720–3.
McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM, Bejerano G. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol. 2010;28(5):495–501.
Kraaij MD, Vereyken EJ, Leenen PJ, van den Bosch TP, Rezaee F, Betjes MG, Baan CC, Rowshani AT. Human monocytes produce interferon-gamma upon stimulation with LPS. Cytokine. 2014;67(1):7–12.
Ebner S, Hofer S, Nguyen VA, Furhapter C, Herold M, Fritsch P, Heufler C, Romani N. A novel role for IL-3: human monocytes cultured in the presence of IL-3 and IL-4 differentiate into dendritic cells that produce less IL-12 and shift Th cell responses toward a Th2 cytokine pattern. J Immunol. 2002;168(12):6199–207.
Francisco-Cruz A, Aguilar-Santelises M, Ramos-Espinosa O, Mata-Espinosa D, Marquina-Castillo B, Barrios-Payan J, Hernandez-Pando R. Granulocyte-macrophage colony-stimulating factor: not just another haematopoietic growth factor. Med Oncol. 2014;31(1):774.
Shaw RJ, Doherty DE, Ritter AG, Benedict SH, Clark RA. Adherence-dependent increase in human monocyte PDGF(B) mRNA is associated with increases in c-fos, c-jun, and EGR2 mRNA. J Cell Biol. 1990;111(5 Pt 1):2139–48.
Heil M, Clauss M, Suzuki K, Buschmann IR, Willuweit A, Fischer S, Schaper W. Vascular endothelial growth factor (VEGF) stimulates monocyte migration through endothelial monolayers via increased integrin expression. Eur J Cell Biol. 2000;79(11):850–7.
Hogg N, Laschinger M, Giles K, McDowall A. T-cell integrins: more than just sticking points. J Cell Sci. 2003;116(Pt 23):4695–705.
Ngai P, McCormick S, Small C, Zhang X, Zganiacz A, Aoki N, Xing Z. Gamma interferon responses of CD4 and CD8 T-cell subsets are quantitatively different and independent of each other during pulmonary Mycobacterium bovis BCG infection. Infect Immun. 2007;75(5):2244–52.
MacIver NJ, Blagih J, Saucillo DC, Tonelli L, Griss T, Rathmell JC, Jones RG. The liver kinase B1 is a central regulator of T cell development, activation, and metabolism. J Immunol. 2011;187(8):4187–98.
Gallegos AM, Xiong H, Leiner IM, Susac B, Glickman MS, Pamer EG, van Heijst JW. Control of T cell antigen reactivity via programmed TCR downregulation. Nat Immunol. 2016;17(4):379–86.
Kronin V, Hochrein H, Shortman K, Kelso A. Regulation of T cell cytokine production by dendritic cells. Immunol Cell Biol. 2000;78(3):214–23.
Lu TT, Cyster JG. Integrin-mediated long-term B cell retention in the splenic marginal zone. Science (New York, NY). 2002;297(5580):409–12.
Baumann MA, Paul CC. Interleukin-5 and human B lymphocytes. Methods. 1997;11(1):88–97.
Winer DA, Winer S, Shen L, Wadia PP, Yantha J, Paltser G, Tsui H, Wu P, Davidson MG, Alonso MN, et al. B cells promote insulin resistance through modulation of T cells and production of pathogenic IgG antibodies. Nat Med. 2011;17(5):610–7.
Limon JJ, Fruman DA. Akt and mTOR in B cell activation and differentiation. Front Immunol. 2012;3:228.
Sorrentino R, Bertolino A, Terlizzi M, Iacono VM, Maiolino P, Cirino G, Roviezzo F, Pinto A. B cell depletion increases sphingosine-1-phosphate-dependent airway inflammation in mice. Am J Respir Cell Mol Biol. 2015;52(5):571–83.
Wendt KS, Yoshida K, Itoh T, Bando M, Koch B, Schirghuber E, Tsutsumi S, Nagae G, Ishihara K, Mishiro T, et al. Cohesin mediates transcriptional insulation by CCCTC-binding factor. Nature. 2008;451(7180):796–801.
Guo Y, Xu Q, Canzio D, Shou J, Li J, Gorkin DU, Jung I, Wu H, Zhai Y, Tang Y, et al. CRISPR inversion of CTCF sites alters genome topology and enhancer/promoter function. Cell. 2015;162(4):900–10.
Caulfield J, Fernandez M, Snetkov V, Lee T, Hawrylowicz C. CXCR4 expression on monocytes is up-regulated by dexamethasone and is modulated by autologous CD3+ T cells. Immunology. 2002;105(2):155–62.
Kral JB, Schrottmaier WC, Salzmann M, Assinger A. Platelet interaction with innate immune cells. Transfus Med Hemother. 2016;43(2):78–88.
Heinonen KM, Bourdeau A, Doody KM, Tremblay ML. Protein tyrosine phosphatases PTP-1B and TC-PTP play nonredundant roles in macrophage development and IFN-gamma signaling. Proc Natl Acad Sci U S A. 2009;106(23):9368–72.
Liu HS, Pan CE, Liu QG, Yang W, Liu XM. Effect of NF-kappaB and p38 MAPK in activated monocytes/macrophages on pro-inflammatory cytokines of rats with acute pancreatitis. World J Gastroenterol. 2003;9(11):2513–8.
Bonder CS, Finlay-Jones JJ, Hart PH. Interleukin-4 regulation of human monocyte and macrophage interleukin-10 and interleukin-12 production. Role of a functional interleukin-2 receptor gamma-chain. Immunology. 1999;96(4):529–36.
Murakami T, Nakajima T, Koyanagi Y, Tachibana K, Fujii N, Tamamura H, Yoshida N, Waki M, Matsumoto A, Yoshie O, et al. A small molecule CXCR4 inhibitor that blocks T cell line-tropic HIV-1 infection. J Exp Med. 1997;186(8):1389–93.
Koyasu S, D'Adamio L, Arulanandam AR, Abraham S, Clayton LK, Reinherz EL. T cell receptor complexes containing Fc epsilon RI gamma homodimers in lieu of CD3 zeta and CD3 eta components: a novel isoform expressed on large granular lymphocytes. J Exp Med. 1992;175(1):203–9.
Dong C, Yang DD, Wysk M, Whitmarsh AJ, Davis RJ, Flavell RA. Defective T cell differentiation in the absence of Jnk1. Science (New York, NY). 1998;282(5396):2092–5.
Huang Y, Clarke F, Karimi M, Roy NH, Williamson EK, Okumura M, Mochizuki K, Chen EJ, Park TJ, Debes GF, et al. CRK proteins selectively regulate T cell migration into inflamed tissues. J Clin Invest. 2015;125(3):1019–32.
Kumanogoh A, Marukawa S, Suzuki K, Takegahara N, Watanabe C, Ch'ng E, Ishida I, Fujimura H, Sakoda S, Yoshida K, et al. Class IV semaphorin Sema4A enhances T-cell activation and interacts with Tim-2. Nature. 2002;419(6907):629–33.
Endo T, Ito K, Morimoto J, Kanayama M, Ota D, Ikesue M, Kon S, Takahashi D, Onodera T, Iwasaki N, et al. Syndecan 4 regulation of the development of autoimmune arthritis in mice by modulating B cell migration and germinal center formation. Arthritis Rheumatol. 2015;67(9):2512–22.
Beider K, Ribakovsky E, Abraham M, Wald H, Weiss L, Rosenberg E, Galun E, Avigdor A, Eizenberg O, Peled A, et al. Targeting the CD20 and CXCR4 pathways in non-Hodgkin lymphoma with rituximab and high-affinity CXCR4 antagonist BKT140. Clin Cancer Res. 2013;19(13):3495–507.
Kimata H, Yoshida A, Ishioka C, Masuda S, Sasaki R, Mikawa H. Human recombinant erythropoietin directly stimulates B cell immunoglobulin production and proliferation in serum-free medium. Clin Exp Immunol. 1991;85(1):151–6.
Koizumi M, Hiasa Y, Kumagi T, Yamanishi H, Azemoto N, Kobata T, Matsuura B, Abe M, Onji M. Increased B cell-activating factor promotes tumor invasion and metastasis in human pancreatic cancer. PLoS One. 2013;8(8):e71367.
Rosser EC, Oleinika K, Tonon S, Doyle R, Bosma A, Carter NA, Harris KA, Jones SA, Klein N, Mauri C. Regulatory B cells are induced by gut microbiota-driven interleukin-1beta and interleukin-6 production. Nat Med. 2014;20(11):1334–9.
Gusmao EG, Allhoff M, Zenke M, Costa IG. Analysis of computational footprinting methods for DNase sequencing experiments. Nat Methods. 2016;13(4):303–9.
Thomas R, Thomas S, Holloway AK, Pollard KS. Features that define the best ChIP-seq peak calling algorithms. Brief Bioinform. 2017;18(3):441–50. doi:10.1093/bib/bbw035.
Gusmao EG, Dieterich C, Zenke M, Costa IG. Detection of active transcription factor binding sites with the combination of DNase hypersensitivity and histone modifications. Bioinformatics (Oxford, England). 2014;30(22):3143–51.
He B, Chen C, Teng L, Tan K. Global view of enhancer-promoter interactome in human cells. Proc Natl Acad Sci U S A. 2014;111(21):E2191–9.
Wang D, Rendon A, Wernisch L. Transcription factor and chromatin features predict genes associated with eQTLs. Nucleic Acids Res. 2013;41(3):1450–63.
Whalen S, Truty RM, Pollard KS. Enhancer-promoter interactions are encoded by complex genomic signatures on looping chromatin. Nat Genet. 2016;48(5):488–96.
Mifsud B, Tavares-Cadete F, Young AN, Sugar R, Schoenfelder S, Ferreira L, Wingett SW, Andrews S, Grey W, Ewels PA, et al. Mapping long-range promoter contacts in human cells with high-resolution Capture Hi-C. Nat Genet. 2015;47(6):598–606.
Javierre BM, Burren OS, Wilder SP, Kreuzhuber R, Hill SM, Sewitz S, Cairns J, Wingett SW, Varnai C, Thiecke MJ, et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell. 2016;167(5):1369–84. e1319
Bailey SD, Zhang X, Desai K, Aid M, Corradin O, Cowper-Sal Lari R, Akhtar-Zaidi B, Scacheri PC, Haibe-Kains B, Lupien M. ZNF143 provides sequence specificity to secure chromatin interactions at gene promoters. Nat Commun. 2015;2:6186.
Wingender E, Dietze P, Karas H, Knuppel R. TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res. 1996;24(1):238–41.
Portales-Casamar E, Thongjuea S, Kwon AT, Arenillas D, Zhao X, Valen E, Yusuf D, Lenhard B, Wasserman WW, Sandelin A. JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 2010;38(Database issue):D105–10.
Newburger DE, Bulyk ML. UniPROBE: an online database of protein binding microarray data on protein-DNA interactions. Nucleic Acids Res. 2009;37(Database issue):D77–82.
Zhao Y, Stormo GD. Quantitative analysis demonstrates most transcription factors require only simple models of specificity. Nat Biotechnol. 2011;29(6):480–3.
Jolma A, Yan J, Whitington T, Toivonen J, Nitta KR, Rastas P, Morgunova E, Enge M, Taipale M, Wei G, et al. DNA-binding specificities of human transcription factors. Cell. 2013;152(1–2):327–39.
Xie Z, Hu S, Blackshaw S, Zhu H, Qian J. hPDI: a database of experimental human protein-DNA interactions. Bioinformatics (Oxford, England). 2010;26(2):287–9.
Kheradpour P, Kellis M. Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments. Nucleic Acids Res. 2014;42(5):2976–87.
Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009;37(Web Server issue):W202–8.
Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics (Oxford, England). 2011;27(7):1017–8.
Karolchik D, Barber GP, Casper J, Clawson H, Cline MS, Diekhans M, Dreszer TR, Fujita PA, Guruvadoo L, Haeussler M, et al. The UCSC genome browser database: 2014 update. Nucleic Acids Res. 2014;42(Database issue):D764–70.
Wang L, Wang S, Li W. RSeQC: quality control of RNA-seq experiments. Bioinformatics (Oxford, England). 2012;28(16):2184–5.
Saeed S, Quintin J, Kerstens HH, Rao NA, Aghajanirefah A, Matarese F, Cheng SC, Ratter J, Berentsen K, van der Ent MA, et al. Epigenetic programming of monocyte-to-macrophage differentiation and trained innate immunity. Science (New York, NY). 2014;345(6204):1251086.
Zambon AC, Gaj S, Ho I, Hanspers K, Vranizan K, Evelo CT, Conklin BR, Pico AR, Salomonis N. GO-elite: a flexible solution for pathway and ontology over-representation. Bioinformatics (Oxford, England). 2012;28(16):2209–10.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. The sequence alignment/map format and SAMtools. Bioinformatics (Oxford, England). 2009;25(16):2078–9.
Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics (Oxford, England). 2010;26(6):841–2.
Zang C, Schones DE, Zeng C, Cui K, Zhao K, Peng W. A clustering approach for identification of enriched domains from histone modification ChIP-Seq data. Bioinformatics (Oxford, England). 2009;25(15):1952–8.
de Wit E, Vos ES, Holwerda SJ, Valdes-Quezada C, Verstegen MJ, Teunissen H, Splinter E, Wijchers PJ, Krijger PH, de Laat W. CTCF binding polarity determines chromatin looping. Mol Cell. 2015;60(4):676–84.
Cooper GM, Hausman RE. Figure 7.21. THE CELL: A Molecular Approach Sixth Edition. Sunderland: Sinauer Associates, Inc.; 2013.
Image 68 insulators. http://slideplayer.com/slide/3836520.

Acknowledgements

This research was partially supported by Development of Fundamental Technologies for Diagnosis and Therapy Based upon Epigenome Analysis from Japan Agency for Medical Research and Development (AMED). This work was partially supported by JST CREST Grant Number JPMJCR15G1, Japan. The supercomputing resource was provided by Human Genome Center of the Institute of Medical Science at the University of Tokyo. Computations were partially performed on the NIG supercomputer at ROIS National Institute of Genetics.

Funding

Publication charges for this article were funded by JSPS KAKENHI Grant Number 16K00387. This research was partially supported by the Platform Project for Supporting in Drug Discovery and Life Science Research (Platform for Dynamic Approaches to Living System) from Japan Agency for Medical Research and Development (AMED).

Availability of data and materials

This study has used data that is freely available from public databases as well as data from the TRANSFAC database, licensed to my laboratory.

About this supplement

This article has been published as part of BMC Genomics Volume 19 Supplement 1, 2017: 16th International Conference on Bioinformatics (InCoB 2017): Genomics. The full contents of the supplement are available online at https://bmcgenomics.biomedcentral.com/articles/supplements/volume-19-supplement-1.

Author information

Authors and Affiliations

Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, Osaka, 565-0871, Japan
Naoki Osato

Contributions

NO designed and performed the research.

Corresponding author

Correspondence to Naoki Osato.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Images in Figure 3 adapted from [28, 69, 70, 71] with permission.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional file

Additional file 1:

Supplementary Data. Supplementary results, figures and tables. (PDF 3383 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Osato, N. Characteristics of functional enrichment and gene expression level of human putative transcriptional target genes. BMC Genomics 19 (Suppl 1), 957 (2018). https://doi.org/10.1186/s12864-017-4339-5

Published: 19 January 2018
DOI: https://doi.org/10.1186/s12864-017-4339-5
