Transcription factor-associated combinatorial epigenetic pattern reveals higher transcriptional activity of TCF7L2-regulated intragenic enhancers
© The Author(s). 2017
Received: 17 January 2017
Accepted: 3 May 2017
Published: 12 May 2017
Recent studies have suggested that combinations of multiple epigenetic modifications are essential for controlling gene expression. Despite numerous computational approaches have been developed to decipher the combinatorial epigenetic patterns or “epigenetic code”, none of them has explicitly addressed the relationship between a specific transcription factor (TF) and the patterns.
Here, we developed a novel computational method, T-cep, for annotating chromatin states associated with a specific TF. T-cep is composed of three key consecutive modules: (i) Data preprocessing, (ii) HMM training, and (iii) Potential TF-states calling.
We evaluated T-cep on a TCF7L2-omics data. Unexpectedly, our method has uncovered a novel set of TCF7L2-regulated intragenic enhancers missed by other software tools, where the associated genes exert the highest gene expression. We further used siRNA knockdown, Co-transfection, RT-qPCR and Luciferase Reporter Assay not only to validate the accuracy and efficiency of prediction by T-cep, but also to confirm the functionality of TCF7L2-regulated enhancers in both MCF7 and PANC1 cells respectively.
Our study for the first time at a genome-wide scale reveals the enhanced transcriptional activity of cell-type-specific TCF7L2 intragenic enhancers in regulating gene expression.
Numerous public data resources, including the ENCODE and Epigenomics Roadmap, have generated thousands of genome-wide data sets, and provided us with substantial quantities of data to study transcriptional and epigenetic patterns in different cell types at a genome-wide scale [1, 2]. Many studies have shown that epigenetic modifications play a central role in regulating gene expression, and are involved in a diversity of biological processes in the human organism [3, 4]. However, the “epigenetic code”, referring to the transcription of genetic information encoded in DNA is in part regulated by DNA methylation and histone modifications, has not yet been fully elucidated. Substantial efforts in cracking this code and uncovering its biological functions have been made, utilizing both in vitro and in silico methods [5–7]. One innovative and practical way is to segment the epigenome into various chromatin states [8–10], where each state is decoded as a specific combinatorial pattern of multiple epigenetic modifications. Such states or combinatorial patterns can reflect a variety of sets of genes’ expression levels, which are essential to establish and maintain distinct functions in chromatin [11, 12]. For instance, novel classes of enhancers have been identified through genome-wide pattern analyses [13–15]. Therefore, identification of epigenetic regulatory combinations and networks is increasingly important for understanding genome functions, chromatin components and molecular mechanisms.
Computational approaches and models have been a major force to uncover these complex epigenetic patterns. These contain probabilistic methods, such as ChromaSig, an unbiased clustering and alignment approach that finds over-represented epigenetic signatures . More advanced machine learning approaches have been applied in recent work, such as hidden Markov models (HMM)  and Bayesian network methods . The HMM has proved to be a good model in training models with numerous inputs, and that generate thousands of outputs [18–20]. For example, ChromHMM utilized a multivariate HMM to learn chromatin states and to output emission probabilities for each mark in each state and then to infer the number of combinatorial marks in each of all states . Segway  applied dynamic Bayesian networks (DBN) to segregate the genome with a higher segment resolution, and derive different chromatin states from chromatin marks . Although these computational methods have successfully been used to annotate chromatin elements, many of them suffer from significant limitations. For instance, some supervised learning methods cannot find de novo information. Some unsupervised learning methods train on small genomic regions such as Segway  and HMMSeg , or output a single mark’s probability such as ChomHMM . None of these algorithms explicitly compare targets of a specific TF with different histone modification marks to qualitatively assess the association of that TF with active, repressive and elongation regions. Most TFs have narrow binding patterns compared to most histone modification marks, but are major regulators of gene expression [24, 25]. Therefore, by ignoring TF binding in the beginning of algorithm design or training, a method may erroneously consider it as less meaningful due to the lower probability of TF than chromatin states, or not incorporate TF binding information at all. As such, it is critical to develop an algorithm that considers a specific TF together with many different epigenetic marks.
In order to address TF-dependent epigenetic regulatory events, we have developed a novel algorithm and software tool, T-cep (Transcription factor-associated combinatorial epigenetic patterns), which applies a univariate HMM to identify combinatorial epigenetic patterns associated with cell type-specific TF targets in different cell types. To evaluate T-cep, we applied it on TCF7L2-omics data, including ChIP-seq data of TCF7L2, Pol-II, active chromatin marks (H3K4me1,3 and H3K27ac), repressive chromatin marks (H3K27me3 and H3K9me3), a mark of transcriptional elongation (H3K36me3) as well as DNase-seq for open chromatin regions in five cancer cell types.
TCF7L2 (transcription factor 7-like 2), an important component of WNT pathway, has been implicated in several human diseases including carcinogenesis, type 2 diabetes and bipolar disorder [26–29]. The WNT pathway is often constitutively activated in human cancers, such as colon, liver, breast, and pancreatic cancer , with high upregulation of TCF7L2. Several studies have shown tissue-specific alternative splicing of TCF7L2, suggesting that TCF7L2 may have different functional properties in different cells [31, 32]. In our previous studies, we have mapped genome-wide binding of TCF7L2 in six cell lines [33, 34]. However, the combinatorial epigenetic profile of TCF7L2 in these cancer types has not been well studied. One hypothesis is that TCF7L2 regulates its downstream target genes in a cell type-specific manner, i.e., utilizing different combinatorial patterns with various epigenetic environment cues in each cell type. Therefore, one way to test this hypothesis is to identify TCF7L2-assoicated combinatorial epigenetic patterns in a diverse set of cell types.
In this paper, we first describe the workflow of T-cep, then present training results on TCF7L2-omics data in five cancer cell types, and compare it with ChromHMM . Finally, we perform functional validations using siRNA, RT-qPCR, co-transfections and luciferase reporter assays on selected gene loci. To the best of our knowledge, this is the first genome-wide combinational epigenetic pattern discovery study for cell type-specific TCF7L2 regulation.
The workflow of T-cep
Step 1: Data pre-processing
T-cep uses a univariate HMM, where each bin can only emit one probability value, which corresponds to a combination of different marks. An alphabet of 2n (n = the number of epigenetic marks) observation symbols is constructed by enumerating each possible combination of marks (including no marks). We combine all ChIP-seq data in multiple cells as observations and translate them to such an alphabet of 2n observations as an input for the next step. If necessary for computational efficiency, this alphabet can be simplified by removing symbols that correspond to mark combinations not present in the input data.
Step 2: HMM training
We initially train multiple HMMs for 300 iterations with T-cep, and select the best model based on the lowest BIC score. If desired, this model may be simplified by removing infrequent states. This is performed by eliminating the states which are called in very few bins (lower than 5% of the average number) and uniformly redistributing their transition probabilities to other states. We then re-train this derived HMM for another 100 iterations to produce the final model.
Step 3: Potential TF-states calling
The Viterbi decoding algorithm is used to output called states for all bins in the genome. The probabilities of each mark are futher calculated by marginalization for each of different cell types respectively. We chose states with a cutoff of probability greater than 0.1 for any individual mark for further investigation, as these are most likely to yield meaningful biological insights. Finally, the HMM states can be classified as potential TF-associated chromatin states (TF-states) and potential non-TF-associated chromatin states (non-TF-States).
Identification of TCF7L2-associated chromatin states by T-cep
To demonstrate the accuracy of T-cep and evaluate its performance, we chose TCF7L2-omics data as a study case. TCF7L2 is an important component of the WNT pathway, and has previously been studied by our laboratory. Additionally, 45 datasets of omics-seq data are available for the training (Additional file 1: Table S1-S2, all are available in the ENCODE Consortium), including ChIP-seq of TCF7L2, six histone marks (H3K4me1, H3K4me3, H3K9me3, H3K27ac, H3K27me3, H3K36me3), Pol-II and DNase-seq in five cancer cell types, HCT116, HeLa, HepG2, MCF7 and PANC1, with a total of ~2.02 billion reads. For each of these cell types, we also obtained RNA-seq data from our previous study or publicly available sources [35, 36].
We initially trained five 25-state HMMs over the data with bin sizes of 750 bp for 300 iterations, and selected the best HMM with the lowest BIC score of 4.3247 • 107 (Additional file 1: Table S3). The states of every bin in each genome were called after the first HMM training. Seven infrequent states (lower than 5% of the average) were then eliminated (Additional file 1: Table S4) to produce an 18-state HMM model with BIC 4.40718 * 107. We then performed a secondary HMM training for 100 iterations, and the log-likelihood was calculated at each iteration to verify model convergence (Additional file 1: Figure S1). This produced the final model, with a BIC score of 4.31487 *107. The transition probabilities and the emission probabilities of each final state are shown in Additional file 1: Figures S2-S3. The Pearson correlation between the HMM emission probabilities and the actual observation frequencies under each state was R 2 ≥ 0.97 for all cell types, except HCT116 with R 2 = 0.9545 (Additional file 1: Table S5 and Figure S4). We then divided emission probabilities for each mark independently by marginalization among combinations of marks probabilities and potential TF-states (Fig. 1b and Additional file 1: Figure S5). A comparison of a genomic region for the states annotated by T-cep and the actual ChIP-seq signals (Fig. 1c) clearly demonstrated the ability of T-cep to accurately capture epigenetic elements.
We also identified some non-TCF7L2-associated states (non-TF-states). States 7 and 9 were classified as non-TCF7L2-associated enhancers. State 9 with a high Pol-II was assigned as an active non-TCF7L2-associated enhancer, while state 7 was likely to be an inactive non-TCF7L2-associated enhancer. Interestingly, the average gene expression levels of both states are less than those of the TCF7L2-associated states. State 6 was classified as a non-TCF7L2-associated bivalent state or poised enhancer due to its high emission probability of both repressive mark H3K27me3 and enhancer mark H3K4me1, and its lower average gene expression. All the other non-TCF7L2-associated states were assigned for their possible functions as well (Fig. 2b).
The annotation of chromatin states is further supported by the location analysis of bins within each state relative to genes (Fig. 2c). All genes’ length is normalized by Virtual bin Creation and the 5’-3’ region represents the whole gene body region and the distal region is extended to each to 90 kb upstream of 5’TSS or downstream of 3’TSS. Clearly, it is showed more than 60% of state 3 bins were located at or near the 5’TSS of each gene, (the green line), in which we interpreted it as promoter. State 5 is represented with the red line, in which its bins were distributed mainly in gene body region, thus called as intragenic enhancer. State 4 as distal enhancer shown in the blue line was mainly located in distal regions, and state 1 was infrequently in any of gene body or intragenic regions (Fig. 2c and Additional file 1: Figure S7).
In the other aspect, we wanted to re-examine whether the distribution in proportion of numbers of three annotated functional states is actually in their expected genomic regions. Indeed, State 3 bins are only in 5’TSS regions (131,086 bins), state 5 bins in intragenic regions (206,644 bins) and state 4 bins in 5’Distal regions (70,812 bins) for further analysis. Moreover, we observed in each cell type the distribution of these three TF-States in order 3/5/4 is the following: 239/32/26 for HCT116, 1061/559/30 for HepG2, 737/1109/29 for HeLa, 802/1265/327 for MCF7 and 3093/1972/1581 for PANC1 (Fig. 2d). Our results indicate the distribution in proportion of numbers of each of three functional states may be cell type specific.
Gene enrichment and pathway analyses of TCF7L2-associated chromatin states
We further examined the biological function of the TCF7L2-regulated genes by KEGG pathway enrichment analysis [38, 39]. Our results demonstrated that the enriched pathways are different between these three categories with cell type specificity (Fig. 3d). In overall, pathway enrichments in MCF7 cells were more involved in cancer initiation and cell cycle pathways, while those in PANC1 cells were more related to the metabolic pathway and cell adhesion. Interestingly, TCF7L2-regulated promoters are more enriched with well-known specific cancer initial development such as Epstein-Barr virus infection in breast cancer and metabolic in pancreatic cancer [20, 40, 41]. However, TCF7L2-regulated intragenic enhancers included advanced stage markers in cancers such as Thyroid hormone signaling and Proteoglycan in cell adhesion [42, 43], whereas TCF7L2-regulated distal enhancers are enriched with the general signaling in cancer development such as PI3K-AKT pathway and Rap signaling . These results suggested that genes associated with TCF7L2 regulated intragenic enhancers may be more relevant to metastasis and cancer progression dependent on a specific cancer type.
Comparison with ChromHMM
To compare T-cep with other published software tools, we chose ChromHMM  since it is a widely used software tool, is powerful in genomic segmentation with a user-friendly application, and can take different types of epigenetic marks (histone modification, transcription factor, DNA methylation and et al) as combinatorial inputs. However, when coming to the question of segmentation involved in a specific TF, the method has some limitations. 1) Given that the number of binding sites for most TFs are much less than those of histone modification sites at a whole genome, thus treating TF same as histone marks may lose the specific TF characteristics. 2) In ChromHMM, the independence between marks is transformed into a combination of probabilities in a given state. This would indicate that, for example, those regions showing low probability of TF binding are combined with the regions without TF binding because both have similar probabilities of other marks. Such output is contradicted to the actual combination. 3) Since there is no specific rule for determining the final threshold of segmentation, this may result in various outputs for the same data samples. In contrast, T-cep is designed to consider all possible events without ignoring any low number of marks. In addition, TF-state is not combined with other similar combinations since T-cep focuses on TF associated combinatorial states.
We trained our TCF7L2-omics data on ChromHMM with both 25-state model and 18-state HMM. We found that ChromHMM output similar patterns for two models, indicating the 18-state model is optimized for segmentation (Additional file 1: Figure S9). We matched the output of 18 states for both tools in order to show a direct side-to-side comparison. We also adopt the following metrics for the evaluation in order to show the TF-specific segmentation: 1) whether the tool is able to identify potential TF-states; 2) how many TF-state bins are able to be recovered in the annotated functional state; and 3) what is the quantitative correlation between annotated functional TF-state and gene expression level.
Functional validations of TCF7L2-associated intragenic enhancers
In this study, we have developed a novel computational method, T-cep, to identify transcriptional factor associated combinatorial epigenetic patterns from ChIP-seq data. T-cep applied an univariate HMM training, selected best models by the BIC score and imposed second HMM re-training for determine TF-state and non-TF-state. The unique features of T-cep are the following: (1) encoding each mark as an alphabet of 2n observations as an input with ability of scaling up, (2) training HMM twice to reduce background noise and infrequent states, (3) outputting all combinatorial patterns for emission probability, and (4) marginalizing on decoding each mark probability in each state. This is a computationally intensive approach, allowing states in T-cep to directly examine combinatorial marks together, since they have emission probabilities for marks’ combinations rather than individual marks. This is in contrast to other software tools which are limited to treat all kind of data set as equal biological weight and lack a focus on the key factor during HMM training. Importantly, using the T-cep method, we were able to unveil a novel set of TF-associated intragenic enhancers that are predicted to be higher transcriptional activity than typical distal enhancers.
As a study case, TCF7L2 is chosen in that it is a key WNT downstream regulator which is implicated in cancers and diabetes, as well as its alternative splicing generates protein variants with differential promoter-binding and transcriptional activation [31, 32, 46]. After trained by T-cep with the TCF7L2-omics data, we uncovered four types of TCF7L2-associated chromatin states with distinct combinatorial characteristics and genomic distribution. Of them, TCF7L2 intragenic enhancer is particularly interesting. The Gene Enrichment and Pathway analyses further revealed that genes with TCF7L2-associated states are cancer type specific (Fig. 3a) and their intragenic enhancers are more associated with advanced stage markers in cancer such as Thyroid hormone signaling and Proteoglycan in cell adhesion (Fig. 3c), illustrating their functional relevance to specific cancer progression and metastasis. Our functional validations found that such intragenic enhancers exert higher transcriptional activities than those distal enhancers in TCF7L2 regulated gene expression. Our findings provide a new set of intragenic enhancers consisting of epigenetic signatures H3K4me1, H3K27ac and H3K36me3  that may play more important functional role in diseases progression. This is in contrast to the role of conventional distal enhancers with typical epigenetic marks of H3K4me1, H3K27ac but the absence of significant H3K4me3 [48, 49] in regulating the cell type specificity.
One notable advantage of T-cep is that it can identify more TF-states at a cell-type dependent manner in comparison to ChromHMM due to the unique underlying algorithmic design. At a broader application, our T-cep was able to identify five MYC-associated states, states 1, 2, 4, 5 and 6, the similar pattern found in TCF7L2 data sets: TF-promoter (state 2), TF-intragenic enhancer (state 6), TF-distal enhancer (state 5) and TF-non-combinatorial state (state 1), as well as TF-inactive enhancer state (state 4) (Additional file 1: Figure S10). Three of these states (state 1, 4, 6) were noticeably missed by ChromHMM (Additional file 1: Figure S11). In a broader aspect, T-cep may be capable of deeply exploiting transcription factor associated epigenetic patterns where many other tools are unable to, and may further uncover their novel functionalities associated with various diseases or cancer types, providing a rationale to explore the underlying mechanisms.
In summary, our study for the first time at a genome-wide scale reveals the enhanced transcriptional activity of cell type-specific TF-associated intragenic enhancers, allowing us more insights into underlying epigenetic regulatory codes in different cell or tissue types as well as in normal or disease conditions.
Training a hidden Markov model (HMM)
In HMMs, TF-state or non-TF-state are considered as hidden states and all of the possible combinations of chromatin marks are as observations. T-cep introduces a binary value for each segment that ‘1’ for mark presence or ‘0’ for mark absence based on a Poisson background distribution(P < 10−4). Observation symbols are made up by enumerating each possible combination of mark. For example, a segment with TCF7L2, H3K4me1 and H3K36me3 can be translated into 0b101000100, while 0 inside means no correspondent mark in this segment. The HMMs sequence is assumed as a segment-homogeneous first-order chain.
Implementation and Application
T-cep is implemented in C++, runs on Linux, and depends on the OpenMP and Boost APIs. T-cep contains a suite of scripts and program implementing the preprocessing, HMM training and TF-State calling steps. A univariate hidden Markov model (HMM) is applied to uncode observed genome-wide epigenetic data into hidden functional chromatin states. Several other scripts are provided for various utility functions.
For the concern of scaling limitation, although it is not necessary in this study (only 512 output combinations), T-cep also provided several scripts for allowing large numbers of marks in application. getOutputMap.pl can output the actually combination in all database. manyOutputMapper.pl used for mapping output combinations in precompiled datasets to an arbitrary ID system, which only for at least one bin actually found in output combinations. Due to far fewer mark combinations are expected to occur in practice, this script effectively frees T-cep from the scalability for reasonable outputs instead of considering every combination of marks as a possible output, which grows according to 2n. ManyOutputModelBackConvert.pl applied in mapping the outputs of HMMs trained with this data back to the original output combinations. This only considers mark combinations actually found in the data sets an upper bound to model complexity corresponding to the number of bins in all datasets.
The tool and scripts are available at our website, http://compbio.uthscsa.edu/T-cep/.
Raw data processing
Public ChIP-seq datasets were downloaded from ENCODE project and only the 5’-end of uniquely mapped reads are used for the further analysis. Alignment to human reference genome hg19 was performed with Bowtie. Default settings were used with an exception that the number of allowed alignments was restricted to 1 in order to obtain only unique mapped reads, and seven threads were used. RNA-seq FASTQ files were aligned to hg19 using Tophat2 and average gene expression of each state was calculated by first finding the highest expression for any splice variant of each gene in the RNA-seq results. The midpoint of each bin is calculated, and compared to the 5’ and 3’ ends of annotated RefSeq genes. The average gene expression reported as gene expression levels are weighted by the number of bins close to them. The percentage of bins within RepeatMasker regions was calculated for all states in all cell lines. Hg19 RepeatMasker data was downloaded from http://www.repeatmasker.org/genomes/hg19/RepeatMasker-rm405-db20140131/. The raw RepeatMasker regions were extracted and converted to BED format. The mergeBed program of the BEDTools package was used to merge adjacent RepeatMasker regions. The merged regions were filtered to exclude regions less than 500 bp length. Correlation between state bins and these filtered regions was performed by BEDTools’ intersectBed.
Co-transfection and RT-qPCR
TCF7L2 siRNAs were purchased from Thermo Fisher Scientific Silencer® Select siRNAs. For transfection of siRNA oligo, MCF7 and PANC1 cells were seeded by 6 cell plate with Lipofectamine® RNAiMAX Transfection Reagent for 48 h. Total RNAs from cells were extracted using Quick-RNA™ MiniPrep kit (Zymo Research). Then cDNA was prepared using RevertAid H Minus First Strand cDNA Synthesis Kit (Thermo Fisher Scientific). Expression of mRNA analysis was performed with LightCycler® 480 SYBR Green I Masteron and LightCycler® 480 System Sequence Detection System (Roche Applied Science) using GAPDH for normalization.
Luciferase reporter assay
Intragenic or distal TFEs were transient transfection into MCF7/PANC1 cells with Lipofectamine 2000 reagent with or without TCF7L2 co-transfection and β-galactosidase expression vector. Cells were harvested and luciferase activity was determined using the Luciferase Assay System (Promega) per the manufacturer’s protocol. The β-galactosidase activity was performed on lysates as a control. The experiments were performed in triplicate.
The authors gratefully acknowledge a financial support from China Scholarship Council. We are grateful to all members of the Jin laboratories for valuable discussion.
US National Institutes of Health (NIH) [R01 GM114142 to V.X.J]. Funding for open access charge: NIH.
Availability of data and materials
The data that support the findings of this study are available from the ENCODE consortium.
QL and VXJ conceived the project. QL, RB and TL developed the algorithm. QL, TL and VXJ designed the experiments and QL and TL performed experiments. QL, TL and VXJ wrote the manuscript with input from RB. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74.View ArticleGoogle Scholar
- Romanoski CE, Glass CK, Stunnenberg HG, Wilson L, Almouzni G. Epigenomics: roadmap for regulation. Nature. 2015;518:314–6.View ArticlePubMedGoogle Scholar
- Reik W, Dean W, Walter J. Epigenetic reprogramming in mammalian development. Science. 2001;293:1089–93.View ArticlePubMedGoogle Scholar
- Kouzarides T. Chromatin modifications and their function. Cell. 2007;128:693–705.View ArticlePubMedGoogle Scholar
- Wang Z, Zang C, Rosenfeld JA, Schones DE, Barski A, Cuddapah S, et al. Combinatorial patterns of histone acetylations and methylations in the human genome. Nat Genet. 2008;40:897–903.View ArticlePubMedPubMed CentralGoogle Scholar
- Wang M, Mok MW, Harper H, Lee WH, Min J, Knapp S, et al. Structural genomics of histone tail recognition. Bioinformatics. 2010;26:2629–30.View ArticlePubMedPubMed CentralGoogle Scholar
- Su J, Qi Y, Liu S, Wu X, Lv J, Liu H, et al. Revealing epigenetic patterns in gene regulation through integrative analysis of epigenetic interaction network. Mol Biol Rep. 2012;39:1701–12.View ArticlePubMedGoogle Scholar
- Hamada M, Ono Y, Fujimaki R, Asai K. Learning chromatin states with factorized information criteria. Bioinformatics. 2015;31:2426–33.View ArticlePubMedGoogle Scholar
- Zhou VW, Goren A, Bernstein BE. Charting histone modifications and the functional organization of mammalian genomes. Nat Rev Genet. 2011;12:7–18.View ArticlePubMedGoogle Scholar
- Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011;473:43–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–30.View ArticlePubMedPubMed CentralGoogle Scholar
- Mammana A, Chung HR. Chromatin segmentation based on a probabilistic model for read counts explains a large portion of the epigenome. Genome Biol. 2015;16:151.View ArticlePubMedPubMed CentralGoogle Scholar
- Heinz S, Romanoski CE, Benner C, Glass CK. The selection and function of cell type-specific enhancers. Nat Rev Mol Cell Biol. 2015;16:144–54.View ArticlePubMedPubMed CentralGoogle Scholar
- Pradeepa MM, Grimes GR, Kumar Y, Olley G, Taylor GC, Schneider R, et al. Histone H3 globular domain acetylation identifies a new class of enhancers. Nat Genet. 2016;48:681–6.View ArticlePubMedPubMed CentralGoogle Scholar
- Whitaker JW, Nguyen TT, Zhu Y, Wildberg A, Wang W. Computational schemes for the prediction and annotation of enhancers from epigenomic assays. Methods. 2015;72:86–94.View ArticlePubMedGoogle Scholar
- Hon G, Ren B, Wang W. ChromaSig: a probabilistic approach to finding common chromatin signatures in the human genome. PLoS Comput Biol. 2008;4, e1000201.View ArticlePubMedPubMed CentralGoogle Scholar
- Hoffman MM, Buske OJ, Wang J, Weng Z, Bilmes JA, Noble WS. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nat Methods. 2012;9:473–6.View ArticlePubMedPubMed CentralGoogle Scholar
- Ernst J, Kellis M. Interplay between chromatin state, regulator binding, and regulatory motifs in six human cell types. Genome Res. 2013;23:1142–54.View ArticlePubMedPubMed CentralGoogle Scholar
- Bonneville R, Jin VX. A hidden Markov model to identify combinatorial epigenetic regulation patterns for estrogen receptor alpha target genes. Bioinformatics. 2013;29:22–8.View ArticlePubMedGoogle Scholar
- Majidnezhad V. A HTK-based method for detecting vocal fold pathology. Acta Inform Med. 2014;22:246–8.View ArticlePubMedPubMed CentralGoogle Scholar
- Ernst J, Kellis M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat Biotechnol. 2010;28:817–25.View ArticlePubMedPubMed CentralGoogle Scholar
- Hoffman MM, Ernst J, Wilder SP, Kundaje A, Harris RS, Libbrecht M, et al. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res. 2013;41:827–41.View ArticlePubMedGoogle Scholar
- Day N, Hemmaplardh A, Thurman RE, Stamatoyannopoulos JA, Noble WS. Unsupervised segmentation of continuous genomic data. Bioinformatics. 2007;23:1424–6.View ArticlePubMedGoogle Scholar
- Farnham PJ. Insights from genomic profiling of transcription factors. Nat Rev Genet. 2009;10:605–16.View ArticlePubMedPubMed CentralGoogle Scholar
- Spitz F, Furlong EE. Transcription factors: from enhancer binding to developmental control. Nat Rev Genet. 2012;13:613–26.View ArticlePubMedGoogle Scholar
- Wang F, Jiang L, Li J, Yu X, Li M, Wu G, et al. Association between TCF7L2 polymorphisms and breast cancer susceptibility: a meta-analysis. Int J Clin Exp Med. 2015;8:9355–61.PubMedPubMed CentralGoogle Scholar
- Lu XP, Hu GN, Du JQ, Li HQ. TCF7L2 gene polymorphisms and susceptibility to breast cancer: a meta-analysis. Genet Mol Res. 2015;14:2860–7.View ArticlePubMedGoogle Scholar
- Grant SF, Thorleifsson G, Reynisdottir I, Benediktsson R, Manolescu A, Sainz J, et al. Variant of transcription factor 7-like 2 (TCF7L2) gene confers risk of type 2 diabetes. Nat Genet. 2006;38:320–3.View ArticlePubMedGoogle Scholar
- Winham SJ, Cuellar-Barboza AB, Oliveros A, McElroy SL, Crow S, Colby C, et al. Genome-wide association study of bipolar disorder accounting for effect of body mass index identifies a new risk allele in TCF7L2. Mol Psychiatry. 2014;19:1010–6.View ArticlePubMedGoogle Scholar
- Ravindranath A, O’Connell A, Johnston PG, El-Tanani MK. The role of LEF/TCF factors in neoplastic transformation. Curr Mol Med. 2008;8:38–50.View ArticlePubMedGoogle Scholar
- Mondal AK, Das SK, Baldini G, Chu WS, Sharma NK, Hackney OG, et al. Genotype and tissue-specific effects on alternative splicing of the transcription factor 7-like 2 gene in humans. J Clin Endocrinol Metab. 2010;95:1450–7.View ArticlePubMedPubMed CentralGoogle Scholar
- Weise A, Bruser K, Elfert S, Wallmen B, Wittel Y, Wohrle S, et al. Alternative splicing of Tcf7l2 transcripts generates protein variants with differential promoter-binding and transcriptional activation properties at Wnt/beta-catenin targets. Nucleic Acids Res. 2010;38:1964–81.View ArticlePubMedGoogle Scholar
- Frietze S, Wang R, Yao L, Tak YG, Ye Z, Gaddis M, et al. Cell type-specific binding patterns reveal that TCF7L2 can be tethered to the genome by association with GATA3. Genome Biol. 2012;13:R52.View ArticlePubMedPubMed CentralGoogle Scholar
- Liu B, Yi J, Sv A, Lan X, Ma Y, Huang TH, et al. QChIPat: a quantitative method to identify distinct binding patterns for two biological ChIP-seq samples in different experimental conditions. BMC Genomics. 2013;14 Suppl 8:S3.View ArticlePubMedPubMed CentralGoogle Scholar
- Yamaga R, Ikeda K, Horie-Inoue K, Ouchi Y, Suzuki Y, Inoue S. RNA sequencing of MCF-7 breast cancer cells identifies novel estrogen-responsive genes with functional estrogen receptor-binding sites in the vicinity of their transcription start sites. Horm Cancer. 2013;4:222–232.3.View ArticlePubMedGoogle Scholar
- Kirby MK, Ramaker RC, Gertz J, Davis NS, Johnston BE, Oliver PG, et al. RNA sequencing of pancreatic adenocarcinoma tumors yields novel expression patterns associated with long-term survival and reveals a role for ANGPTL4. Mol Oncol. 2016;10:1169–82.View ArticlePubMedGoogle Scholar
- Chabner BA. NCI-60 cell line screening: a radical departure in its time. J Natl Cancer Inst. 2016;108.Google Scholar
- Chen EY, Tan CM, Kou Y, Duan Q, Wang Z, Meirelles GV, et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics. 2013;14:128.View ArticlePubMedPubMed CentralGoogle Scholar
- Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016;44:W90–7.View ArticlePubMedPubMed CentralGoogle Scholar
- Marrao G, Habib M, Paiva A, Bicout D, Fallecker C, Franco S, et al. Epstein-Barr virus infection and clinical outcome in breast cancer patients correlate with immune cell TNF-alpha/IFN-gamma response. BMC Cancer. 2014;14:665.View ArticlePubMedPubMed CentralGoogle Scholar
- Timofte D, Livadariu R, Bintintan V, Diaconu C, Ionescu L, Sandberg AA, et al. Metabolic disorders in patients operated for pancreatic cancer. Rev Med Chir Soc Med Nat Iasi. 2014;118:392–8.PubMedGoogle Scholar
- Iozzo RV, Sanderson RD. Proteoglycans in cancer biology, tumour microenvironment and angiogenesis. J Cell Mol Med. 2011;15:1013–31.View ArticlePubMedPubMed CentralGoogle Scholar
- Brown AR, Simmen RC, Simmen FA. The role of thyroid hormone signaling in the prevention of digestive system cancers. Int J Mol Sci. 2013;14:16240–57.View ArticlePubMedPubMed CentralGoogle Scholar
- Paplomata E, O’Regan R. The PI3K/AKT/mTOR pathway in breast cancer: targets, trials and biomarkers. Ther Adv Med Oncol. 2014;6:154–66.View ArticlePubMedPubMed CentralGoogle Scholar
- Ernst J, Kellis M. ChromHMM: automating chromatin-state discovery and characterization. Nat Methods. 2012;9:215–6.View ArticlePubMedPubMed CentralGoogle Scholar
- Osmark P, Hansson O, Jonsson A, Ronn T, Groop L, Renstrom E. Unique splicing pattern of the TCF7L2 gene in human pancreatic islets. Diabetologia. 2009;52:850–4.View ArticlePubMedGoogle Scholar
- Zentner GE, Tesar PJ, Scacheri PC. Epigenetic signatures distinguish multiple classes of enhancers with distinct cellular functions. Genome Res. 2011;21:1273–83.View ArticlePubMedPubMed CentralGoogle Scholar
- Heintzman ND, Stuart RK, Hon G, Fu Y, Ching CW, Hawkins RD, et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet. 2007;39:311–8.View ArticlePubMedGoogle Scholar
- Rada-Iglesias A, Bajpai R, Swigut T, Brugmann SA, Flynn RA, Wysocka J. A unique chromatin signature uncovers early developmental enhancers in humans. Nature. 2011;470:279–83.View ArticlePubMedGoogle Scholar