Methodology article | Open | Published:
Inferring condition-specific targets of human TF-TF complexes using ChIP-seq data
BMC Genomicsvolume 18, Article number: 61 (2017)
Transcription factors (TFs) often interact with one another to form TF complexes that bind DNA and regulate gene expression. Many databases are created to describe known TF complexes identified by either mammalian two-hybrid experiments or data mining. Lately, a wealth of ChIP-seq data on human TFs under different experiment conditions are available, making it possible to investigate condition-specific (cell type and/or physiologic state) TF complexes and their target genes.
Here, we developed a systematic pipeline to infer Condition-Specific Targets of human TF-TF complexes (called the CST pipeline) by integrating ChIP-seq data and TF motifs. In total, we predicted 2,392 TF complexes and 13,504 high-confidence or 127,994 low-confidence regulatory interactions amongst TF complexes and their target genes. We validated our predictions by (i) comparing predicted TF complexes to external TF complex databases, (ii) validating selected target genes of TF complexes using ChIP-qPCR and RT-PCR experiments, and (iii) analysing target genes of select TF complexes using gene ontology enrichment to demonstrate the accuracy of our work. Finally, the predicted results above were integrated and employed to construct a CST database.
We built up a methodology to construct the CST database, which contributes to the analysis of transcriptional regulation and the identification of novel TF-TF complex formation in a certain condition. This database also allows users to visualize condition-specific TF regulatory networks through a user-friendly web interface.
Transcription factors (TFs) interact with one another and with their co-factors to form TF complexes, with constituents that vary in different cell types or under different cellular conditions. These TF complexes regulate different sets of target genes to determine cellular state . Given the high variability of TF complex composition, it is critical to examine TF complexes and their target genes in a condition-specific manner to accurately reveal their regulatory activities.
TF-TF interactions can be experimentally identified using electrophoretic mobility shift assays (EMSAs), X-ray crystallography, immunoprecipitation, yeast two-hybrid systems, mammalian two-hybrid systems and luciferase assays. Because of technical limitations, most human TF-TF interactions represent potentials of physical binding rather than physiological interactions under specific conditions. For example, Ravasi et al. developed a database of physical TF-TF interactions using a mammalian two-hybrid system in hamster cells , in which approximately 1,600 TF-TF interactions were identified among human and mouse TFs. However, these data merely indicated the potential interactions amid the pertinent TF pairs in the experimental model. These data did not reflect the condition-specific target genes of TF-TF complexes, which are essential for understanding their regulatory mechanisms.
Chromatin immunoprecipitation followed by DNA microarray or high-throughput sequencing (ChIP-chip/ChIP-seq) techniques are powerful for identifying TF binding sites. These approaches discover binding “peaks”, i.e. regions of chromatin and the corresponding sequences enriched for TFs. Consequently, condition-specific TF peaks can be identified by altering cellular conditions, which further reveal motifs recognized by DNA-binding TFs or their co-regulatory counterparts. For example, the CENTDIST web server identifies co-regulatory TFs in complexes by investigating TF motifs enriched in the ChIP-seq peaks for a TF . In addition, the spacing of TF-pair binding motifs is often inflexible , allowing the SpaMo algorithm to identify TF-TF pairs by interrogating motif spacings . However, in light of many binding peaks having been shown to be non-functional [6, 7], such methods may not be informative for identifying functional binding sites.
Thanks to a comprehensive TF motif database, CST is the first pipeline that uses data from a single ChIP-seq experiment to predict both TF partners and their target genes. Chen et al. predicted TF complexes and their target genes using yeast TF ChIP-chip data , but their method required paired ChIP-chip data: one assay to determine the binding sites of a primary TF and the other the binding sites of a partner TF. Therefore, we believe CST will lower the cost of using ChIP-seq for these purposes and be valuable to the community.
CST uses ChIP-seq data after immunoprecipitation of the primary TF along with a database containing known TF binding sequence motifs to identify partner TFs. Finally, we integrated the predicted results and constructed a database called DBCST. DBCST allows users to upload their own ChIP-seq data and analyse them for TF complexes and their regulatory targets. DBCST is freely available at http://syslab3.nchu.edu.tw/DBCST.
Prediction of TF complexes and their target genes
Using high-confidence criteria (see Methods and Fig. 1); our pipeline identified 13,504 relationships between 2,392 predicted TF complexes and 3,272 predicted target genes. By contrast, when using low-confidence criteria (see Methods and Fig. 1), we identified 127,994 relationships. In addition, the correlation between gene expression and TF binding was highly significant (P = 2.2× 10−16, see Additional file 1 Supplementary Methods) and the likelihood of a TF complex near transcriptionally active genes showed that the TF complexes are most likely located -1kbp to 0.5kbp around TSS (Fig. S1). The numbers of ChIP-seq datasets for each cell line used in our database are provided in Additional file 1: Table S1. The high-confidence and low-confidence target genes of the predicted USF2-NFYA complex using the ChIP-seq data for USF2 in K562 cells are partially listed in Fig. S2. Brief instructions for users and a detailed tutorial of DBCST can be found in the Additional file 1 Supplementary Information and on the web page, respectively.
Validation of CST-predicted TF complexes by comparison to other databases
To evaluate CST, we examined the presence of predicted condition-specific TF complexes in two external databases. The first database was to demonstrate the performance of CST (Fig. 2a and b), whereas the second was to investigate the accuracy in a condition-specific circumstance (Fig. 2c and d).
For the first validation, we compared the degree of enrichment for the CST-predicted TF complexes present in an empirically determined TF complex database (see Methods) against that of TF complexes created randomly among potential TF pairs in the CST pipeline (i.e. a background). We also included TF complexes predicted by SpaMo  for a fair comparison (Fig. 2a and b). After ordering the TF complexes by p-values in an ascending manner and calculating enrichment ratios, we discovered that TF complexes identified by CST were highly enriched compared with those by SpaMo. The peak enrichment for CST was approximately 32 (at the 40% confidence decile), whereas that for SpaMo was approximately 18 (at the 60% decile). These results indicated that Target Identification from Profiles (TIP) method  together with SpaMo, equivalent to CST, significantly improved the prediction of TF complexes over the use of SpaMo alone. Similar results are suggested in Fig. 2b, in which the top N of TF complexes are selected.
For the second validation, we compared the CST-predicted TF complexes to the TF-specific transcriptional regulatory modules (TRMs) in haematopoietic stem cells (HSCs) proposed by Diez et al. (see Methods) . Briefly, Diez et al. used ChIP-seq to identify condition-specific binding sites. After scanning enriched motifs in these binding sites and integrating protein-protein interaction data, the authors discovered condition-specific TRMs for each immunoprecipitated TF. Although there were 9 TRMs in HSCs (ERG, FLI1, GATA2, GFI1B, LMO2, MEIS1, SFPI1, RUNX1 and TAL1), two TRMs were observed in CST (including GATA2 and TAL1 in K562 cells). Four significant TF complexes were observed in both TRM and CST (3 in the TAL1 and 1 in the GATA2 datasets; P = 3*10−4; Fisher’s exact test) after further comparisons for the TRM-predicted TF complexes (Fig. 2c) and for the CST-predicted TF complexes (Fig. 2d). In order to compare CST and SpaMo predictions, Additional file 1: Figure S3 shows SpaMo-predicted TF complexes from GATA2 and TAL1 ChIP-seq data in K562 cells. The result of CST is more significant than the SpaMo prediction (P = 0.02; Fisher’s exact test). Notably, the predicted motif spacings of TAL1-STAT1 and TAL1-GATA1 interactions in CST are 85 and 23 bps, respectively (Additional file 1: Table S2). According to a previous study claiming that a TF-TF interaction is likely indirect if the spacing of the interaction exceeds 30 bps , we speculated that interactions between TAL1 and STAT1 are indirect, whereas between TAL1 and GATA1 are direct. This result is consistent with the TRM database, in which TAL1 indirectly interacts with STAT1 by the WDR5 bridge protein, whereas TAL1 directly interacts with GATA1.
Validation of CST-predicted target genes using ChIP-qPCR and RT-PCR
Using USF2 ChIP-seq data in K562 cells, our pipeline predicted that USF2 and NFYA form a TF complex that possesses a significant motif spacing of 9 bps and binds to five genomic locations, regulating eight target genes with high confidence (Table 1). USF2 is a basic helix-loop-helix leucine zipper protein recognizing the E-box (CACGTG) DNA-binding motif, whereas NF-Y is a trimeric TF consisting of two histone-like subunits (NFYB and NFYC) and a CCAAT binding subunit (NFYA). Using EMSA, Zhu et al. reported that USF2 and NFYA form a TF complex at the HoxB4 promoter in K562 cells , which supports our prediction that USF2 and NFYA together form a TF complex in K562 cells.
To experimentally validate the interactions between the predicted USF2-NFYA TF complex and its targets, we used ChIP-qPCR in K562 and HeLa cells for in vitro validation. For qPCR amplification targets, we selected the promoters of high-confidence target genes (EIF4E and GLYR1), a low-confidence target gene (HoxB7) and HoxB4 (a positive control according to Zhu et al.). To ensure PCR accuracy, we designed two primer sets for HoxB7 (see Additional file 1: Table S3). Relative to IgG-IP normalization, the qPCR fold enrichment of all targets was large and highly significant for both NFYA (Fig. 3a) and USF2 (Fig. 3b) in both K562 and HeLa cell lines. In addition, the regular PCR amplification from USF2-IP and NFYA-IP DNA also demonstrated the interaction between the USF2-NFYA complex and the promoter of the target genes (Additional file 1: Fig. S4). Although these ChIPs against USF2 and NFYA were independent of one another, the results supported the conclusion that both TFs bind to the same target sequences on target genes predicted in CST.
To experimentally validate that these targets were activated by USF2-NFYA complex in HeLa cells, we interfered with the complex formation by silencing USF2 (Fig. 3c) and then investigated the expression levels of three downstream genes with real-time RT-PCR (Fig. 3d). The data clearly demonstrated that the expression levels of predicted target genes (EIF4E, GLYR1 and HoxB7) of the USF2-NFYA complex were reduced while USF2 was silenced.
Clustering target genes for each TF complex separately results indifferent GO enrichment results
Changes in the interacting partner (s) of a given TF often result in alterations in target genes to elicit different biological functions. To examine these phenomena, we performed gene ontology (GO) enrichment analysis of target genes for given TF complexes to determine how they varied with TF complex composition. Using USF2 ChIP-seq data in K562 cells, CST predicted that USF2 and IRF1 form a TF complex. Among the top 10 GO enrichment results for the USF2-IRF1 complex targets (Table 2, upper panel), the second, third and tenth GO terms are related to iron transport. Previous studies reported that USF2 and IRF1 co-regulate β2-microglobulin, which can regulate iron metabolism and transport [13, 14]. This result is completely different from the GO enrichment results of the predicted USF2-NFYA TF complex targets, which are related to DNA catabolic processes and activity (Table 2, lower panel). CST fully distinguishes the functionality of the USF2-IRF1 complex from that of USF2-NFYA, implying that the USF2-IRF1 and USF2-NFYA TF complexes recruit different downstream target genes to determine phenotypes in K562 cells.
ChIP-seq/ChIP-chip techniques are powerful methods for identifying TF binding sites. However, these approaches currently are prone to a high false positive rate in predicting target genes [6, 7]. Therefore, we employed TIP  to remove binding peaks not located in predicted target genes and obtain better results than SpaMo  (Fig. 2a and b). Due to CST predicting TF complexes based on SpaMo, CST and SpaMo have similar curve treads in Fig. 2a. Other than TIP, many other methods exist for scoring target genes, such as TFAS  and ClosestGene , which can also be used to predict rankings. These methods all require binding peaks from a peak-calling algorithm [17–19]. Notably, the number of binding peaks is sensitive to the parameters of the peak-calling algorithm and thus can affect the accuracy and consistency of target gene prediction.
A previous study showed that USF2 and NFYA form TF complexes on the HoxB4 promoter in the K562 cell line , but this observation was not detected by CST. Our further scrutiny found that, among 9,428 narrow peaks from ENCODE K562 USF2 ChIP-seq data , there are no narrow peaks on the HoxB4 putative promoter (+/-3kbp around the TSS). We postulated that this is why CST was not able to detect it. To prevent such incidents from occurring, we suggest that the criteria for calling narrow peaks should be loosened. In CST, we used SpaMo to facilitate the prediction of TF complexes. SpaMo can predict whether two TFs belong to the same complex , but cannot confirm whether the interactions of the TF pair are direct or indirect. For example, the different motif spacings of the USF2-NFYA complex on different promoters (HoxB7, 21 bps; and HoxB4, 10 bps ) in K562 cells may arise from different interactions or conditions. USF2 and NFYA may interact indirectly when binding to HoxB7 but directly when binding to HoxB4, indicating that binding to HoxB7 may require more protein components than binding to HoxB4. This rationale may explain the observation in our qPCR experiments that the USF2-NFYA complex exhibited a higher binding affinity and enrichment on the HoxB7 promoter than on the HoxB4 promoter (Fig. 3a and b).
The accuracy of CST-predicted TF complexes from HeLa S3 NFYA ChIP-seq data can be confirmed by many studies (Table 3). Of the 7 CST-predicted TF complexes for NFYA in HeLa S3 cells, 5 have been previously reported: FOS , RFX5 , SREBP2 , TBP , and SP1 . The experimental techniques involved in the above studies included immunoprecipitation, mammalian two-hybrid assays or luciferase assays for TF complex identification. Furthermore, CST-predicted TF complexes are supported by the published molecular structure data. Our results indicated that both IRF3-JUN and NFKB-IRF3 are TF complexes, consistent with a crystal structure in which ATF2/JUN, IRF3/IRF7 and NFKB form an enhanceosome on the interferon beta enhancer .
In addition, we found that complex formation is dynamically changed in various conditions, even from the same cells. For example, the TF partners of MYC in K562 cells treated with interferon gamma are different from that with interferon alpha (Additional file 1: Table S4). Furthermore, even the same drug treatment, the different time points showed different complex formation. Notably, MYC appears to interact with the AP1 family in K562 cells exposed to interferon, independent of gamma or alpha subtype (Additional file 1: Table S4). The similar results were also observed in the TF partners of JUN in K562 cells (Additional file 1: Table S5).
The attraction of CST for predicting condition-specific TF complexes arises from its rich database containing vast vertebrate TF motifs. Thus, the results of CST are not hindered by cell lines that have little ChIP-seq data. There are thousands published ChIP-seq datasets, among which the MCF-7 cell line has a maximum of 40 distinct TF ChIP-seq datasets . When Chen et al.  proposed an algorithm to identify TF complexes using paired ChIP-seq data, there were only 780 (C 2 40 ) potential TF complexes for scrutiny in MCF-7 cell lines. By contrast, there are 966 distinct vertebrate TF motifs in the TRANSFAC database (version 2013.2), resulting in 966 × 40 potential combinations for CST. Hence, CST is powerful and insightful, particularly for cell lines having few ChIP-seq datasets.
We are confident that CST will be helpful for detecting condition-specific TF complexes and their target genes because of its top-performing methods for target gene prediction (TIP ) and for partner TF prediction (MEME, FIMO and SpaMo [5, 26, 27]). Current bioinformatics approaches of TF target genes do not consider the fluidity of TF complexes . Therefore, many important nuances in TF function and transcriptional regulation are missing. For example, USF2 could regulate iron transport and DNA catabolic processes when forming TF complexes with IRF1 and NFYA, respectively (Table 2). However, the top 10 GO terms of USF2 target genes identified by processing USF2 ChIP-seq data in the K562 cell line using the TIP algorithm (see Methods) are related to chromatin structure (Additional file 1: Table S6). If we only examine the GO results for USF2 targets, we may ignore important regulatory functions of USF2, including iron transport and DNA catabolic processes from the USF2-IRF1 and USF2-NFYA complexes, respectively.
To the best of our knowledge, CST is the first pipeline that infers both condition-specific TF complexes and their target genes using human ChIP-seq datasets. Integrating the results of CST pipeline from 359 ChIP-seq ENCODE datasets, we constructed DBCST database. DBCST provides a searchable platform for TF complex and regulatory function discovery. DBCST is not only a database but also a web server and can perform CST pipeline from user’s own ChIP-seq experiment. User also can download CST package, which reports the list of primary TF targets and its binding sites when inputting wig file and narrow peak data, from DBCST download function. Using the result of CST package and then running MEME, FIMO and SpaMo, user can perform CST pipeline in their computer. We hope that DBCST will be a useful resource and provide insightful assistance for biologists studying transcriptional regulation going forward.
Wiggle files and narrow peak data of 359 SYDH (Stanford/Yale/USC/Harvard) ENCODE ChIP-seq experiments were downloaded from the UCSC Genome Browser . One hundred forty-six TF binding motifs were collected from the JASPAR CORE database . To obtain a complete set of TF binding motifs across various conditions/cell types, we added 278 de novo motifs extracted from ENCODE ChIP-seq data using MEME  to our motif database. Genomic sequences and annotation files for RefSeq genes (both in hg19 version) were downloaded from the UCSC Genome Browser . GO annotations were retrieved from the gene2go file (Dec 2012 version) on the NCBI Entrez Gene FTP site (ftp://ftp.ncbi.nlm.nih.gov/gene) .
The main steps of the CST pipeline are described below (Fig. 1).
Step 1. Identify target genes using TIP
Conventionally, TF target genes are identified by first selecting the binding peaks of the TF using a peak-calling algorithm (e.g., MACS ) and then by finding the genes with peaks in their putative promoters. However, this approach is known to produce many false positive target genes [6, 7]. In CST, the TF target genes are predicted using the Target Identification from Profiles (TIP) method  (Fig. 1a), which evaluates the confidence score of each putative target gene using a probabilistic model based on ChIP-chip or ChIP-seq data. TIP is one of the most accurate TF target gene prediction methods . For all 359 ENCODE ChIP-seq samples, the selected TIP-derived target genes had to pass a confidence threshold of Q-value < 0.1.
Step 2. Identify binding peaks within target gene promoters
To examine TF binding motifs and their relative spacing, for each primary TF, the locations of binding peaks at the promoters of TIP-predicted target genes must first be identified. To accomplish this goal, we used the ENCODE narrow peak data to search for these peaks (Fig. 1b). Putative promoters were defined as the genomic regions +/- 3 kbp starting from the TSSs of the target genes. These regions are where the highest densities of accumulative TF binding peaks and histone modification signals both occur .
Step 3. Discover primary TF binding motifs using MEME
To discover primary TF binding motifs, we retrieved 120bps DNA sequences centred at the summits of the top 500 binding peaks (ranked by p-value) and used MEME (version 4.9.0_4 in the MEME suite)  with the “–mod zoops –maxw 10” options (Fig. 1c).
Step 4. Locate the primary TF binding motif using FIMO
A given binding peak may contain different motifs other than the primary binding motif for a primary TF. We employed FIMO (version 4.9.0_4 in the MEME suite with a p-value < 1e-4)  to select peaks containing the primary binding motif for each primary TF (Fig. 1d).
Step 5. Identify TF partners using SpaMo
We selected 300 bps DNA sequences centred on the primary binding motif after referencing a thermodynamic model of TF-TF interactions  in which 150 bps were the maximum distance for TF interactivity. These sequences were then used in SpaMo (version 4.9.0_4 in MEME suite)  to search for the presence of binding motifs of potential partner TFs (Fig. 1e).
SpaMo identified TF-TF pairs with enriched spacing between the primary and secondary motif. Given intervals centred on the primary TF binding sites from the ChIP-seq data to identify the significant spacing between motif pairs, SpaMo assumes that the number of observed spacings between the primary TF and the secondary TF motifs follows a binomial distribution. Our pipeline for the secondary binding motifs included the de novo motifs identified in step 3 (278 in total; derived from the ENCODE data) and 146 additional motifs from the JASPAR CORE motif database . The primary-secondary TF pair is reported when the spacing between the primary and secondary motif was significant (SpaMo p-value < 0.05 and E-value < 10).
Step 6. Report the target genes with GO enrichment analysis for each TF complex
GO enrichment analysis was performed for predicted target genes (Fig. 1f) using Fisher’s exact test scores based on the hypergeometric distribution for each GO term. CST provides two sets of target genes: high-confidence and low-confidence. A high-confidence target gene is called if the following standards are fully met: (i) it is the target gene for a primary TF; (ii) it has a primary and a secondary motif on its promoter, and its motif spacing is ≤ 150 bases ; and (iii) the spacing of the motif pair is significant (SpaMo p-value < 0.05 and E-value < 10). By contrast, a low-confidence target gene is called if standard (iii) is not met. The latter is used to describe TFs with variable spacing because of their binding on wrapped DNA strings or nucleosomes.
To validate our results, we employed a three-step approach: (1) comparison of predicted TF complexes against an external and empirically derived TF complex database; (2) spot validation of the target genes using ChIP-qPCR and RT-PCR; and (3) GO enrichment analysis of the target genes.
Step 1. Comparison of predicted TF complexes against external TF complex databases
Two procedures were used in this step. First, we compared our list of TF complexes to a TF-TF interaction database experimentally derived and collected by Ravasi . To calculate their enrichment ratios against this database, we rank ordered our predicted TF complexes based on their p-values and then examined the TF complexes in groups (TF complexes with the lowest 10% of p-values followed by TF complexes with the lowest 20% of p-values up to 100% of TF complexes). Similar steps were conducted on TF complexes predicted using SpaMo as a reference. Next, enrichment ratios for CST and SpaMo were determined relative to a randomly generated list of TF complexes in CST or SpaMo based on the following formula: the observed number (CST or SpaMo TF complexes in the Ravasi database) divided by the expected number (randomly generated TF complexes in the Ravasi database).
Second, we used another external TF repository of transcriptional regulatory modules (TRMs)  for further comparisons. Briefly, we conducted hypergeometric distribution Fisher’s exact tests on the degrees of overlap between CST and TRMs to check for consistency.
Step 2. Validation of CST-predicted target genes using ChIP-qPCR and RT-PCR
Chromatin immunoprecipitation quantitative PCR (ChIP-qPCR). Four ChIP-qPCR experiments for the NFYA and USF2 TFs in the K562 and HeLa cell lines were performed using a selection of the NFYA-USF2-predicted TF complex target genes. HoxB4, a literature-derived external positive control from Zhu et al., was one of the qPCR amplification targets . ChIP was performed using a ChIP kit (Millipore, Billerica, MA, USA) according to the manufacturer’s instructions and described in the Additional file 1 Supplementary Methods. The selected qPCR amplification targets were HoxB7, GLYR1 and EIF4E (which were derived from the CST-predicted target gene list of the NFYA-USF2-predicted TF complex) and HoxB4 (which was derived from external work confirming NFYA-USF2 regulation and used as a positive control ). The forward and reverse primers used for the ChIP-qPCR are listed in Additional file 1: Table S3. The locations of these primers are illustrated in Additional file 1: Fig. S5.
Real-time reverse transcription-polymerase chain reaction (RT-PCR) and Western blotting. USF2-specific siRNA (SI02780785) and scrambled control were purchased from Qiagen (Massachusetts, USA) and employed to silence the USF2 expression in HeLa cells. Subsequently, total RNAs were purified by Trizol reagent (Invitrogen) and then subjected to SYBR Green RT-PCR using an ABI Prism 7300 sequence detection system (Applied Biosystems, Philadelphia, PA, USA), as described previously . The primers used for amplification are listed in Additional file 1: Table S3. The expression of the mRNA normalized to that of the internal control (TATA box-binding protein, TBP) was defined as -ΔCT = -(CTTarget - CTTBP), whereas the relative expression of the target gene was calculated using the 2–ΔCT method. The detailed procedures of immunoblotting were performed as described previously . The antibodies included anti-USF2 (Abcam, Burlingame, CA, USA) and anti-β-tubulin (Millipore, Bedford, MA, USA), primary antibody as well as horseradish peroxidase-conjugated secondary antibody (Santa Cruz Biotechnology Inc.). The β-tubulin acted as an internal control.
Step 3. Gene ontology enrichment analysis of the target genes of selected TF complexes
To validate that CST captured the phenomenon mentioned in Mullen, A. C. et al , we performed GO enrichment analysis for target genes of a primary TF with a different partner TF.
chromatin immunoprecipitation sequencing
Database of Condition-Specific Targets
Quantitative polymerase chain reaction
Transcriptional regulatory module
Transcription start site
Mullen AC, Orlando DA, Newman JJ, Loven J, Kumar RM, Bilodeau S, Reddy J, Guenther MG, DeKoter RP, Young RA. Master transcription factors determine cell-type-specific responses to TGF-beta signaling. Cell. 2011;147(3):565–76.
Ravasi T, Suzuki H, Cannistraci CV, Katayama S, Bajic VB, Tan K, Akalin A, Schmeier S, Kanamori-Katayama M, Bertin N, et al. An atlas of combinatorial transcriptional regulation in mouse and man. Cell. 2010;140(5):744–52.
Zhang Z, Chang CW, Goh WL, Sung WK, Cheung E. CENTDIST: discovery of co-associated factors by motif distribution. Nucleic Acids Res. 2011;39(Web Server issue):W391–399.
Wolberger C. Multiprotein-DNA complexes in transcriptional regulation. Annu Rev Biophys Biomol Struct. 1999;28:29–56.
Whitington T, Frith MC, Johnson J, Bailey TL. Inferring transcription factor complexes from ChIP-seq data. Nucleic Acids Res. 2011;39(15):e98.
MacArthur S, Li XY, Li J, Brown JB, Chu HC, Zeng L, Grondona BP, Hechmer A, Simirenko L, Keranen SV, et al. Developmental roles of 21 Drosophila transcription factors are determined by quantitative differences in binding to an overlapping set of thousands of genomic regions. Genome Biol. 2009;10(7):R80.
Li XY, MacArthur S, Bourgon R, Nix D, Pollard DA, Iyer VN, Hechmer A, Simirenko L, Stapleton M, Hendriks CLL, et al. Transcription factors bind thousands of active and inactive regions in the Drosophila blastoderm (vol 6, pg e27, 2008). Plos Biol. 2008;6(7):1600–0.
Chen MJ, Chou LC, Hsieh TT, Lee DD, Liu KW, Yu CY, Oyang YJ, Tsai HK, Chen CY. De novo motif discovery facilitates identification of interactions between transcription factors in Saccharomyces cerevisiae. Bioinformatics. 2012;28(5):701–8.
Cheng C, Min R, Gerstein M. TIP: a probabilistic method for identifying transcription factor target genes from ChIP-seq binding profiles. Bioinformatics. 2011;27(23):3221–7.
Diez D, Hutchins AP, Miranda-Saavedra D. Systematic identification of transcriptional regulatory modules from protein-protein interaction networks. Nucleic Acids Res. 2014;42(1):e6.
Fleming JD, Pavesi G, Benatti P, Imbriano C, Mantovani R, Struhl K. NF-Y coassociates with FOS at promoters, enhancers, repetitive elements, and inactive chromatin regions, and is stereo-positioned with growth-controlling transcription factors. Genome Res. 2013;23(8):1195–209.
Zhu J, Giannola DM, Zhang Y, Rivera AJ, Emerson SG. NF-Y cooperates with USF1/2 to induce the hematopoietic expression of HOXB4. Blood. 2003;102(7):2420–7.
Gobin SJ, Biesta P, Van den Elsen PJ. Regulation of human beta 2-microglobulin transactivation in hematopoietic cells. Blood. 2003;101(8):3058–64.
Muckenthaler MU, Rodrigues P, Macedo MG, Minana B, Brennan K, Cardoso EM, Hentze MW, de Sousa M. Molecular analysis of iron overload in beta2-microglobulin-deficient mice. Blood Cells Mol Dis. 2004;33(2):125–31.
Ouyang Z, Zhou Q, Wong WH. ChIP-Seq of transcription factors predicts absolute and differential gene expression in embryonic stem cells. Proc Natl Acad Sci U S A. 2009;106(51):21521–6.
Sikora-Wohlfeld W, Ackermann M, Christodoulou EG, Singaravelu K, Beyer A. Assessing computational methods for transcription factor target gene identification based on ChIP-seq data. PLoS Comput Biol. 2013;9(11):e1003342.
Rozowsky J, Euskirchen G, Auerbach RK, Zhang ZD, Gibson T, Bjornson R, Carriero N, Snyder M, Gerstein MB. PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat Biotechnol. 2009;27(1):66–75.
Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137.
Kharchenko PV, Tolstorukov MY, Park PJ. Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol. 2008;26(12):1351–9.
Karolchik D, Barber GP, Casper J, Clawson H, Cline MS, Diekhans M, Dreszer TR, Fujita PA, Guruvadoo L, Haeussler M, et al. The UCSC Genome Browser database: 2014 update. Nucleic Acids Res. 2014;42(Database issue):D764–770.
Jabrane-Ferrat N, Nekrep N, Tosi G, Esserman LJ, Peterlin BM. Major histocompatibility complex class II transcriptional platform: assembly of nuclear factor Y and regulatory factor X (RFX) on DNA requires RFX5 dimers. Mol Cell Biol. 2002;22(15):5616–25.
Dooley KA, Millinder S, Osborne TF. Sterol regulation of 3-hydroxy-3-methylglutaryl-coenzyme A synthase gene through a direct interaction between sterol regulatory element binding protein and the trimeric CCAAT-binding factor/nuclear factor Y. J Biol Chem. 1998;273(3):1349–56.
Lee LC, Chen CM, Wang HC, Hsieh HH, Chiu IS, Su MT, Hsieh-Li HM, Wu CH, Lee GC, Lee-Chen GJ, et al. Role of the CCAAT-binding protein NFY in SCA17 pathogenesis. PLoS One. 2012;7(4):e35302.
Panne D, Maniatis T, Harrison SC. An atomic model of the interferon-beta enhanceosome. Cell. 2007;129(6):1111–23.
Griffon A, Barbier Q, Dalino J, van Helden J, Spicuglia S, Ballester B. Integrative analysis of public ChIP-seq experiments reveals a complex multi-cell regulatory landscape. Nucleic Acids Res. 2015;43(4):e27.
Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27(7):1017–8.
Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009;37(Web Server issue):W202–208.
Liu CC, Lin CC, Chen WS, Chen HY, Chang PC, Chen JJ, Yang PC. CRSD: a comprehensive web server for composite regulatory signature discovery. Nucleic Acids Res. 2006;34(Web Server issue):W571–577.
An integrated encyclopedia of DNA elements in the human genome. Nature 2012, 489(7414):57-74.
Portales-Casamar E, Thongjuea S, Kwon AT, Arenillas D, Zhao X, Valen E, Yusuf D, Lenhard B, Wasserman WW, Sandelin A. JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 2010;38(Database issue):D105–110.
Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2014, 42(Database issue):D7-17.
Guenther MG, Levine SS, Boyer LA, Jaenisch R, Young RA. A chromatin landmark and transcription initiation at most promoters in human cells. Cell. 2007;130(1):77–88.
Cheng Q, Kazemian M, Pham H, Blatti C, Celniker SE, Wolfe SA, Brodsky MH, Sinha S. Computational identification of diverse mechanisms underlying transcription factor-DNA occupancy. PLoS Genet. 2013;9(8):e1003571.
Yuan A, Hsiao YJ, Chen HY, Chen HW, Ho CC, Chen YY, Liu YC, Hong TH, Yu SL, Chen JJ, et al. Opposite Effects of M1 and M2 Macrophage Subtypes on Lung Cancer Progression. Sci Rep. 2015;5:14273.
Su TJ, Ku WH, Chen HY, Hsu YC, Hong QS, Chang GC, Yu SL, Chen JJ. Oncogenic miR-137 contributes to cisplatin resistance via repressing CASP3 in lung adenocarcinoma. Am J Cancer Res. 2016;6(6):1317–30.
This work was funded by Taiwan Ministry of Science and Technology grants [MOST 104-2621-M-005-005-MY3, MOST 104-2320-B-005-005] (to C.C.L.) and the Ministry of Education [Taiwan, R.O.C. under the ATU plan].
Availability of data and material
DBCST database is freely available online at http://syslab3.nchu.edu.tw/DBCST. The ChIP-seq data download from ENCODE project: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeSydhTfbs. GO annotations were retrieved from the gene2go file (Dec 2012 version) on the NCBI Entrez Gene FTP site (ftp://ftp.ncbi.nlm.nih.gov/gene). TF binding motifs were collected from JASPAR database: http://jaspar.genereg.net/.
CCY, CC and CCL designed the algorithm and database. CCY, MHC, SYL and CC performed the analysis. CCY, MHC, EHA, CC, CCL and JJWC prepared the manuscript. CCY, SYL and JJWC contributed to the literature study. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Supplementary Instruction for Browsing Web Interface and Supplementary Methods. Figure S1. The likelihood of TF complexes near a transcriptionally active gene. Figure S2. High-confidence and low-confidence target genes. Figure S3. SpaMo-predicted TF complexes from GATA2 and TAL1 ChIP-seq experiments in K562 cells. Figure S4. ChIP-PCR and slab gel electrophoresis against CST NFYA-USF2-predicted target genes. Figure S5. The schema of ChIP-PCR primers. Figure S6. DBCST’s partners view. Figure S7. DBCST’s network view. Figure S8. DBCST’s target genes view. Figure S9. DBCST’s upload functionality. Table S1. Statistics for the TF and ChIP-seq datasets used in the construction of DBCST for each cell line. Table S2. The intersection of TF-TF interactions predicted by TRMs and CST. Table S3. A list of PCR primers. Table S4. Dynamic TF complexes and binding motifs from K562 MYC ChIP-seq data in various conditions. Table S5. Dynamic TF complexes and binding motifs from K562 JUN ChIP-seq data in various conditions. Table S6. A partial list of the GO analysis results of target genes predicted by USF2 ChIP-seq data in the K562 cell line. (PDF 1190 kb). (DOC 2001 kb)