Human transcriptional interactome of chromatin contribute to gene co-expression
© Dong et al. 2010
Received: 25 July 2010
Accepted: 14 December 2010
Published: 14 December 2010
The transcriptional interactome of chromatin is one of the important mechanisms of gene transcription regulation. Chromatin conformation capture and 3D FISH experiments have revealed several cases of chromatin interaction between sequence-distant genes, and even between inter-chromatin genes. At the genomic level, however, there is still little evidence to support these mechanisms. Recently, a genome-wide picture of chromatin interactions in human cells, based on the Hi-C experiment, was presented. It provides useful material for analysing whether the mechanism of the transcriptional interactome is common.
The main work here is to test whether the effect of the transcriptional interactome on gene co-expression exists at the genomic level. While controlling for the effect of transcription control similarity (TCS), we tested the correlation between Hi-C interaction and the mutual ranks of gene co-expression rates (provided by COXPRESdb) for intra-chromatin gene pairs. We used 6,084 genes with both TF annotation and co-expression information, and selected 273,458 pairs of them that have similar Hi-C interaction ranks in different cell types. The results show that co-expression is strongly associated with chromatin interaction. Further analysis using GO annotation reveals a potential correlation between gene function similarity, Hi-C interaction and co-expression.
According to the results of this research, the intra-chromatin interactome may be related to gene function and associated with co-expression. This study provides evidence for the effect of the transcriptional interactome on transcription regulation.
Gene transcription regulation is one of the important processes in biology. In eukaryotic cells, the effect of the highly compartmentalized nucleus on gene transcription regulation has attracted attention. Experimental techniques such as chromosome conformation capture (3C) and interphase fluorescent in situ hybridization (FISH) can detect spatial associations between specific genes. They have provided accumulating data for studying 'gene expression in 3D'.
Mechanisms such as transcription factories [4, 5] and nuclear speckles [6–8], known as the "transcriptional interactome", were proposed, and spatial linkage of sequence-distant but function-related genes was revealed in case studies, although the mechanisms are still debated [5, 8]. Recently, the only "genome-wide study" demonstrating transcription interactions was reported. However, it still focused on a small set of genes (mouse globin-associated genes) rather than the whole genome. As J. Lawrence et al. once pointed out, 'a more challenging question for future studies is to determine whether the level of expression is indeed influenced by nuclear and chromosomal organization'.
Recently, a genome-wide picture of chromatin interaction in human gm06990 and K562 cells, based on the Hi-C experiment, was reported. As was demonstrated there, Hi-C interactions can serve as a measure of spatial distance and of chromatin organization. We used Hi-C interactions to evaluate whether, at the genomic level, spatial distance or chromatin structure has a potential effect on transcription co-regulation. Because inter-chromatin interactions are far fewer than intra-chromatin interactions in the Hi-C data, we focused only on intra-chromatin gene pairs.
To estimate the effect of chromatin organization, we tested the correlation between Hi-C interaction and the mutual ranks of gene co-expression rates (provided by COXPRESdb). Hi-C interaction was measured both as the observed number of Hi-C interactions (OH) and as the Pearson correlation coefficient of these numbers (PC), at 1 Mb and 100 kb resolution, in human gm06990 and K562 cells (see Methods for details). We controlled for the effect of transcription factors on co-expression using transcription control similarity (TCS, see Methods for details), and used 6,084 genes with both TF annotation (from ITFP) and co-expression information for the analyses. These genes yield 960,507 intra-chromatin gene pairs. The Hi-C interaction between two genes was represented by the interaction between the chromatin units on which the genes are located. Because gene co-expression rates are calculated across multiple tissues and cell types, only the 273,458 gene pairs with similar Hi-C interaction ranks (rank difference <5%) in the two cell types were used for the final analysis. We coined the term 'normalized distance' to measure the sequence distance between genes (see Methods for details).
For most gene pairs, TCS equals zero (107,881 of the 116,834 pairs with normalized distance ≥0.2). We therefore divided the pairs into two groups, those with TCS equal to zero and those with non-zero TCS, which may provide better control for TCS. Correlation analyses were carried out in the same way as for Figure 1 and Figure 2. In both groups, co-expression is correlated with Hi-C interaction (Additional File 5 and Additional File 6).
The above results show a significant correlation between Hi-C interactions and the mutual ranks of gene co-expression. However, the rank values presented in those figures (Figure 1, Figure 2 and Additional Files 1 to 6) are much higher than those used to construct gene co-expression networks. In COXPRESdb, three levels of mutual rank are used to construct networks (less than 5, 5 to 30, and 30 to 50). We therefore focused on the 5,735 co-expressed pairs with mutual ranks ≤50, and found that they have many more interactions than other pairs (Additional File 7). Moreover, among these co-expressed genes, the mutual ranks of gene co-expression and the Hi-C interactions follow a similar trend (Additional File 8).
Why do co-localized genes tend to have similar expression profiles? It has been suggested that transcriptionally active sites exist in the nucleus, such as Pol II-enriched transcription factories and splicing-factor-enriched speckles. Genes located at such sites are more actively transcribed, and genes can move in and out of them to become transcriptionally active or quiescent [4, 16]. Such movement may be an important factor in gene transcription regulation [2, 4, 17]. In the Hi-C experiment, formaldehyde is used to cross-link cells, resulting in covalent links between spatially adjacent chromatin segments. This procedure is the same as in 3C and 4C, which have been used to support the transcription factory mechanism, and it reflects the dynamic nature of gene localization. Our result based on Hi-C could therefore be good evidence that transcription factories are common genome-wide in the human nucleus.
Besides the transcriptionally active sites in the nucleus, chromatin modification and structure also play a significant role in gene transcription regulation [12, 19]. One of the well-known models of such structure is open and closed chromatin [10, 12]. Under the assumption that sequence-neighboring genes have a similar chromatin structure, N. Batada et al. found that these genes have higher co-expression rates than separated ones. They also suggested that the transition of chromatin structure between the open and closed states, known as chromatin remodelling, is a major source of co-expression of linked genes. In the Hi-C interactions, PC is an indicator of this structure. Our results could therefore provide general evidence that open and closed chromatin structures regulate gene expression, without relying on N. Batada's assumption.
We have provided genome-wide evidence of the correlation between Hi-C interaction and co-expression of sequence-distant, TF-annotated intra-chromatin gene pairs. Our results highlight a possible general and independent effect of the transcriptional interactome on gene transcription regulation, and this effect may be related to gene function. However, it should be noted that some difficulties still prevent a definite conclusion. First, Lieberman-Aiden's study is still the only human Hi-C data available, and it covers just two cell lines. It is therefore hard to distinguish dynamic from stable chromatin interactions among different cell types. Second, the Hi-C data have a low resolution: chromatin is divided into 1 Mb and 100 kb units, and the average interactions between units are counted. One unit may include many genes, so statistics based on it are not precise. We hope that in the future, when more data are available, the chromatin level of gene transcription regulation will be demonstrated more precisely.
The Hi-C interaction data from Lieberman-Aiden's article can be accessed through the GEO database under accession no. GSE18199. Both the observed interactions (OH) and their Pearson correlation (PC) were used in our analyses. It was pointed out that OH can serve as a measure of spatial distance and that PC is an indicator of chromatin structure. In GEO, the PC of the X chromosome of the K562 cell line is missing, so we calculated it from the OH using the original method. The mutual ranks of gene co-expression scores are from COXPRESdb, which includes co-expression data for 19,777 human genes. The information about human transcription factors is from the Integrated Transcription Factor Platform (ITFP, version 1.0, Aug 2008), which in its current release includes 4,105 putative TFs and 69,496 potential TF-target pairs for human. For the GO annotation of human genes, we used the GO.db package (version 2.4.1) for R (http://www.r-project.org/).
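The recalculation of PC from OH can be illustrated with a minimal sketch. It assumes the usual observed/expected scheme: each entry of an intra-chromosomal contact matrix is divided by the mean count at the same separation between units, and PC is then the Pearson correlation between rows of the normalized matrix. The function names are hypothetical, and the expected counts are estimated from the single input matrix, which is a simplification for illustration and may differ from the original procedure.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (nan if one is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    if sd_u == 0 or sd_v == 0:
        return float("nan")
    return cov / (sd_u * sd_v)


def observed_over_expected(oh):
    """Divide each count by the mean count at the same unit separation.

    Assumption: the expectation is taken from this one matrix only.
    """
    n = len(oh)
    expected = []
    for d in range(n):
        diagonal = [oh[i][i + d] for i in range(n - d)]
        expected.append(sum(diagonal) / len(diagonal))
    return [
        [oh[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def correlation_matrix(oh):
    """PC[i][j] = Pearson correlation of rows i and j of the normalized matrix."""
    norm = observed_over_expected(oh)
    n = len(norm)
    return [[pearson(norm[i], norm[j]) for j in range(n)] for i in range(n)]


# Toy symmetric OH matrix with two blocks of strongly interacting units.
toy_oh = [
    [10, 8, 1, 1],
    [8, 10, 1, 1],
    [1, 1, 10, 8],
    [1, 1, 8, 10],
]
toy_pc = correlation_matrix(toy_oh)
```

The toy matrix is invented for the example and is not taken from GSE18199.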
Normalized distance is defined as the sequence distance (bp) between two intra-chromosome genes, divided by the total length (bp) of their chromosome. To measure transcription control similarity (TCS), we use Batada's definition. They define the transcription control similarity of a given gene pair "as one minus the number of transcription factors that bind one but not both the genes, divided by the sum of the number of regulator-target interactions."
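The two definitions above can be written out as a short sketch. The function names are hypothetical, and which gene coordinate is used as its position (start, midpoint or other) is an assumption, since the text does not specify it. TCS is read here as one minus the size of the symmetric difference of the two TF sets, divided by the sum of the two set sizes.

```python
def normalized_distance(pos_a: int, pos_b: int, chrom_length: int) -> float:
    """Sequence distance (bp) between two genes on one chromosome,
    divided by the total length (bp) of that chromosome."""
    if chrom_length <= 0:
        raise ValueError("chromosome length must be positive")
    return abs(pos_a - pos_b) / chrom_length


def transcription_control_similarity(tfs_a: set, tfs_b: set) -> float:
    """TCS = 1 - |TFs binding exactly one gene| / (|TFs of A| + |TFs of B|)."""
    total_interactions = len(tfs_a) + len(tfs_b)
    if total_interactions == 0:
        # Assumption: genes without any TF annotation are excluded upstream.
        raise ValueError("both genes need at least one annotated TF in total")
    bind_one_not_both = len(tfs_a ^ tfs_b)  # symmetric difference
    return 1.0 - bind_one_not_both / total_interactions


# Two genes 50 Mb apart on a 250 Mb chromosome.
example_distance = normalized_distance(1_000_000, 51_000_000, 250_000_000)
# One shared TF out of two per gene.
example_tcs = transcription_control_similarity({"TF1", "TF2"}, {"TF2", "TF3"})
```

Under this reading, two genes with no shared TF have TCS equal to zero, and two genes with identical TF sets have TCS equal to one.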
We took the 6,084 genes present in both COXPRESdb and ITFP for the analyses. They yield 960,507 intra-chromatin gene pairs. The Hi-C interaction of a gene pair was calculated as the weighted average of the interactions of the 1 Mb chromatin units on which the genes are located. We calculated the difference in Hi-C interaction rank of each gene pair between the two cell types, and chose the 273,458 pairs with a rank difference <5% for further analyses.
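The rank-difference filter can be sketched as follows. The sketch assumes that "rank difference <5%" means the percentile ranks of a pair in the two cell types differ by less than 0.05, and that tied values share their average rank; both are assumptions, and the function names are hypothetical. The weighted averaging over chromatin units is not shown because its weights are not specified here.

```python
def percentile_ranks(values):
    """Rank each value within the list as a fraction in (0, 1].

    Tied values receive the mean of the ranks they span (an assumption).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 1-based average rank of the tie block
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank / n
        start = end + 1
    return ranks


def consistent_pairs(hic_cell_1, hic_cell_2, max_diff=0.05):
    """Indices of gene pairs whose Hi-C percentile ranks differ by < max_diff."""
    ranks_1 = percentile_ranks(hic_cell_1)
    ranks_2 = percentile_ranks(hic_cell_2)
    return [
        i for i, (r1, r2) in enumerate(zip(ranks_1, ranks_2))
        if abs(r1 - r2) < max_diff
    ]


# Toy Hi-C interaction values for four gene pairs in two cell types.
kept = consistent_pairs([10, 20, 30, 40], [1, 2, 4, 3])
```

In the toy input, the last two pairs swap ranks between the cell types and are dropped, while the first two are kept.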
For the 273,458 pairs, we tested the pairwise correlations between Hi-C interaction, mutual rank of gene co-expression rate, TCS and normalized distance. We then chose the 116,834 pairs with normalized distance ≥0.2 and tested the same correlations again. We further divided these pairs into two groups according to whether their TCS values equal zero, and tested the correlations in each group. For all correlation analyses (all figures and correlation tests in the article), we divided the samples into 20 groups according to their horizontal-axis values, so that each group is expected to contain a similar number of samples. However, in several figures some groups are missing because too many samples share the same x-axis value. (In Figure 2B, groups 3, 5, 7, 9, 11 and 14 are missing. In Additional File 5B, groups 3, 5, 7, 9, 11 and 14 are missing. In Additional File 6B, groups 3, 5, 7, 9 and 12 are missing.) The group averages of the horizontal and vertical values were used for the analysis. Because gene pairs are far from evenly distributed along the x-axis, we did not divide them into intervals of equal width, as many studies did.
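The grouping step, and one way in which tied x-axis values can leave groups empty, can be illustrated with a minimal sketch. It assumes the groups are bounded by quantile cut points of the x values; this is an illustrative choice, not necessarily the procedure used for the figures, and the function name is hypothetical.

```python
from bisect import bisect_left


def quantile_group_means(xs, ys, n_groups=20):
    """Split samples into groups of roughly equal count along x.

    Returns {group number (1-based): (mean x, mean y, count)} for non-empty
    groups. When many samples share one x value, several cut points coincide
    and the groups between them stay empty.
    """
    sorted_x = sorted(xs)
    n = len(sorted_x)
    # Cut point g is the largest x among the first g/n_groups of the samples.
    cuts = [
        sorted_x[(g * n + n_groups - 1) // n_groups - 1]
        for g in range(1, n_groups)
    ]
    members = {}
    for x, y in zip(xs, ys):
        group = bisect_left(cuts, x) + 1  # number of cut points below x, plus 1
        members.setdefault(group, []).append((x, y))
    return {
        group: (
            sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts),
            len(pts),
        )
        for group, pts in sorted(members.items())
    }


# Six of ten toy samples share x = 0, so groups 2 and 3 of 5 are empty.
toy_groups = quantile_group_means([0] * 6 + [1, 2, 3, 4], list(range(10)), n_groups=5)
```

With the toy input, only groups 1, 4 and 5 are populated, which mirrors how heavily tied x-axis values can remove groups from a figure.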
We thank Prof. Xiangyin Kong, Dr. Guangyong Zheng, Mr. Zhen Wang, Mr. Jingxuan Zhang, and Ms. Tingyan Zhong for their helpful comments and suggestions. This research was supported by grants from National High-Tech R&D Program (863) (2006AA02Z334, 2007DFA31040), State key basic research program (973) (2006CB910705, 2010CB529206, 2011CBA00801), Research Program of CAS (KSCX2-YW-R-112, KSCX2-YW-R-190), National Natural Science Foundation of China (30900272) and Special Start-up Fund for CAS President Award Winner (to G. Ding).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.