Redistribution of H3K4me2 on neural tissue specific genes during mouse brain development
© Zhang et al.; licensee BioMed Central Ltd. 2012
Published: 17 December 2012
Skip to main content
© Zhang et al.; licensee BioMed Central Ltd. 2012
Published: 17 December 2012
Histone modification plays an important role in cell differentiation and tissue development. A recent study has shown that the dimethylation of lysine 4 residue on histone 3 (H3K4me2) marks the gene body area of tissue specific genes in the human CD4+ T cells and neural cells. However, little is known of the H3k4me2 distribution dynamics through the cell differentiation and tissue development.
We applied several clustering methods including K-means, hierarchical and principle component analysis on H3K4me2 ChIP-seq data from embryonic stem cell, neural progenitor cell and whole brain of mouse, trying to identify genes with the H3K4me2 binding on the gene body region in different cell development stage and study their redistribution in different tissue development stages.
A cluster of 356 genes with heavy H3K4me2 labeling in the gene body region was identified in the mouse whole brain tissue using K-means clustering. They are highly enriched with neural system related functions and pathways, and are involved in several central neural system diseases. The distribution of H3K4me2 on neural function related genes follows three distinctive patterns: a group of genes contain constant heavy H3K4me2 marks in the gene body from embryonic stem cell stage through neural progenitor stage to matured brain tissue stage; another group of gene have little H3K4me2 marks until cells mature into brain cells; the majority of the genes acquired H3K4me2 marks in the neural progenitor cell stage, and gain heavy labeling in the matured brain cell stage. Gene ontology enrichment analysis also revealed corresponding gene ontology terms that fit in the scenario of each cell developmental stages.
We investigated the process of the H3K4me2 mark redistribution during tissue specificity development for mouse brain tissue. Our analysis confirmed the previous report that heavy labeling of H3K4me2 in the downstream of TSS marks tissue specific genes. These genes show remarkable enrichment in central neural system related diseases. Furthermore, we have shown that H3K4me2 labeling can be acquired as early as the embryonic stem cell stage, and its distribution is dynamic and progressive throughout cell differentiation and tissue development.
Post-translational modifications of histones play important roles in regulating DNA activities and gene expressions in eukaryotic cells [1–4]. In the nucleus, chromatin forms basic units called nucleosomes which constitute an octamer with eight different histone protein molecules and a 146bp DNA wrapped around it . Each histone molecule has two tails which contain amino acid residues subject to a variety of modifications such as methylation, acetylation, phorsphorylation, and ubiquitination. Such modifications can affect (promote or repress) the accessibility of the DNA to RNA polymerase during gene transcription and thus can activate or silence certain genes [1, 3, 5]. More importantly, some of such modifications (e.g., the methylation on certain lysine and arginine residues on the N-terminal side) are stable during cell division and considered important epigenetic events. In the past few years, with the rapid progress in high throughput technologies such as microarray and massive parallel sequencing, many exciting new discoveries have been made on studying the genome-wide binding landscapes of different histone modification marks using ChIP-chip and ChIP-seq methods [1, 6–11]. In these studies, researchers discovered the combination of different histone modifications mark different chromatin states which often correspond to different genomic annotations such as promoters and enhancers [1, 11–13]. Such observations have led to discoveries and predictions of many new regulatory sites on the genome [13, 14]. In addition, the ChIP-chip and ChIP-seq technologies enable researchers to investigate the distribution of the histone marks (and other proteins) on the genome at a high resolution, which led to the discovery that the binding patterns of the proteins and histone marks over the genome are also highly informative in predicting genomic functions and annotations [13, 15–17]. An interesting discovery along this line is the relationship between H3K4me2 binding patterns and tissue specific genes .
H3K4me2 (dimethylation of the lysine residue at 4th position on the N-terminal tail of histone 3) is an important histone mark and has been shown to bind to both gene promoter regions and enhancer regions and its binding is often associated with gene activation. A recent study demonstrated that in human CD4+ T cell and neural tissue, H3K4me2 shows specific binding patterns on the tissue-specific genes in their transcribed regions . In this study, the focus is on a 10kb range of DNA covering 2kb upstream of the transcription starting site (TSS) and 8kb downstream of the TSS for the genes. The H3K4me2 binding patterns over this range for all the genes are then extracted and K-means algorithm is applied to cluster these patterns into five groups. Interestingly, a group distinct itself from others as this group not only shows a bimodal distribution around the TSS, it also uniquely shows high binding quantities (or long tail) over the 8kb downstream of the TSS. Genes in this group are then shown to be highly enriched with tissue specific genes.
However, the recent characterization of histone mark distribution on tissue specific genes was carried out in matured tissue cells, little is known about the dynamic change of histone modification during the cell differentiation and development, neither is it known that at what developmental stage the H3K4me2 marking is acquired. It is generally accepted that the histone modifications play an important role in the cell differentiation and tissue development . Here we are using the public available mouse H3K4me2 ChIP-seq data from different developmental stages of neural cells, namely embryonic stem cells, neural progenitor cells, and whole brain cells , and try to address the question that if the same tissue-specific patterns of H3K4me2 present in the neural cells, and if so, how early it emerges and how it redistributes during the developmental process.
In this paper, we applied detailed clustering analysis on the H3K4me2 distribution patterns around the TSS regions of genes in the whole genome. Our analysis confirmed the previous report that heavy presence of H3K4me2 in the downstream of TSS marks tissue specific genes. However, more importantly, we have shown that such patterns are not constant. Instead, redistribution of H3K4me2 occurs dynamically in all stages of tissue development.
In addition, our results shed light on understanding the acquisition of tissue specificity during the developmental process. For the 356 brain tissue specific genes, their H3K4me2 distribution patterns around the TSS regions over the three developmental stages can clearly be divided into three major groups. The GO enrichment analysis of the three groups of genes indicates that a small portion of neural system-related development and functions was activated or prone to be activated as early as the embryonic stem cell stage, as shown by the H3K4me2 marks on a small group of genes, whereas most of the neural system-related functions were developed or scheduled to develop through the neural progenitor cell stage, as indicated by the H3K4me2 marks on a larger set of genes in the NP stage. There is a group of genes acquired heavy H3K4me2 labeling downstream of TSS only when the matured brain cell is formed, and consistently cell adhesion activity such as synapse contact formation become the dominate process. This H3K4me2 re-distribution pattern fits very well with the developmental process of neural system.
An interesting observation about these three groups is the gene involvement in neural degenerative and neurological diseases as well as mental disorders including Alzheimer early onset, brain dementia, and schizophrenia. Most of genes associated with these diseases are in the group 1 (with heavy H3K4me2 in ES stage) or group 3 (with heavy H3K4me2 acquired in NP stage). In addition, GO function analysis also suggests that groups 1 and 3 genes are more involved in essential neural functions such as transmission of nerve impulse and synaptic transmission while the group 2 (with heavy H3K4me2 only acquired in the latest stage) are more involved in developing and maintain synapse structures as a mature brain function. These observations imply that these disease-related genes are more involved in the essential neural tissue functions than in the advanced brain activities.
Besides H3K4me2, many other histone marks as well as epigenetic events such as DNA-methylation are also involved and more insight about the differentiation process can be gained with more data available. In addition, the brain tissue data in this study is obtained from whole brain sample even though it is well known that different anatomical and functional regions of the brain have distinctive gene expression patterns and more refined analysis of region and function specific genes could be identified using similar approach as in this paper.
In summary, we investigated the process of the H3K4me2 mark redistribution during tissue specificity development for mouse brain tissue. Our analysis confirmed the previous report that heavy labeling of H3K4me2 in the downstream of TSS marks tissue specific genes. These genes are not only highly enriched with neural system functions and pathways, but also highly involved in CNS diseases. Furthermore, we have shown that such labeling can be acquired as early as the embryonic stem cell stage, and its distribution is dynamic throughout cell differentiation and tissue development. The distribution of H3K4me2 on neural function related genes follows three distinctive patterns: a group of genes contain constant heavy H3K4me2 marks in the gene body from ES through NP to matured brain tissue stage; another group of gene have little H3K4me2 marks until cells mature into brain cells; the majority of the genes acquired H3K4me2 marks in the NP stage, and gain heavy labeling in the matured brain cell stage. GO enrichment analysis also revealed corresponding GO terms that fit in the scenario of each cell developmental stages.
Three H3K4me2 ChIP-seq datasets are downloaded from NCBI Gene Expression Omnibus with accession number GSE11172 including murine embryonic stem (ES) cells, ES-derived neural progenitor (NP) cells, and whole brain (WB) tissues . The 36-bp short reads were generated using Illumina GA sequencing platform. In our analysis, we extended the short reads by 200bp, and histograms of the extended reads counts over a 20-kb TSS region (+/- 10kb of TSS) for each entry from the RefSeq mm8 database (obtained from UCSC website) were computed using bin size 20bp. Unsupervised clustering method K-means was used to cluster each individual TSS region profiles from the three origins (ES, NP and WB) using k = 7, repeats = 50, distance = square Euclidean as implemented in Matlab. We empirically selected k = 7 based on visual inspection of clustering results from different values of k.
In WB tissue, the cluster with an elevated H3K4me2 marking downstream of TSS was further divided into four clusters using K-means (k = 4, repeats = 50, distance = square Euclidean) to obtain a cluster of 356 genes with highly elevated H3K4me2 labeling downstream of TSS. This set of genes was compared with clusters obtained from ES and NP samples, and largely overlapped genes and clusters were further subject to GO enrichment analysis using the MetaCore software as well as the NIH DAVID tool.
At the same time, the combined TSS profiles of the 356 genes from all three origins were clustered using hierarchical clustering in Matlab and three clusters were selected based on visual inspection: the cluster with constant H3K4me2 labeling from ES through NP to matured brain; the cluster with minimal H3K4me2 labeling in ES and NP stage, but gets heavy H3k4me2 labeling in brain tissue; and the cluster with minimal H3K4me2 labeling in ES stage, but gradually gains H3K4me2 labeling as the cells differentiate into NP cells and mature as brain tissue. Each cluster of genes was subject to gene set enrichment and network analysis using bioinformatics software MetaCore™ by GeneGo (http://www.genego.com/metacore.php), as well as Database for Annotation, Visualization and Integrated Discovery (DAVID, http://david.abcc.ncifcrf.gov/). Principal Component Analysis (PCA) was performed on the same combined data using Matlab code.
dimethylation of lysine 4 residue on histone 3
transcription start site
chromatin immunoprecipitation sequencing
central nerve system
principal component analysis
This article has been published as part of BMC Genomics Volume 13 Supplement 8, 2012: Proceedings of The International Conference on Intelligent Biology and Medicine (ICIBM): Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/13/S8.
This work was supported by NIH 1U01 GM092655-01.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.