Spatial compartmentalization at the nuclear periphery characterized by genome-wide mapping
© Wu and Yao; licensee BioMed Central Ltd. 2013
Received: 13 November 2012
Accepted: 27 August 2013
Published: 30 August 2013
How gene positioning to the nuclear periphery regulates transcription remains largely unclear. By cell imaging, we have previously observed the differential compartmentalization of transcription factors and histone modifications at the nuclear periphery in mouse C2C12 myoblasts. Here, we aim to identify DNA sequences associated with the nuclear lamina (NL) and examine this compartmentalization at the genome-wide level.
We have integrated high throughput DNA sequencing into the DNA adenine methyltransferase identification (DamID) assay, and have identified ~15,000 sequencing-based Lamina-Associated Domains (sLADs) in mouse 3T3 fibroblasts and C2C12 myoblasts. These genomic regions range from a few kb to over 1 Mb, cover ~30% of the genome, and are spatially proximal to the NL. Active histone modifications such as H3K4me2/3, H3K9Ac and H3K36me3 are distributed predominantly outside of sLADs, consistent with observations from cell imaging that they are localized away from the nuclear periphery. Genomic regions around transcription start sites of expressed sLAD genes display reduced association with the NL; additionally, expressed sLAD genes possess lower levels of active histone modifications than expressed non-sLAD genes.
Our work has shown that genomic regions associated with the NL are characterized by the paucity of active histone modifications in mammalian cells, and has revealed novel connections between subnuclear gene positioning, histone modifications and gene expression.
The nuclear periphery is a unique subcompartment of the nucleus that consists of the inner nuclear membrane (INM) and its associated proteins, nuclear lamina (NL) and chromatin [1, 2]. A number of genes were found to preferentially localize to the nuclear periphery, and re-position upon transcription activation or cellular differentiation [3–5]. Nonetheless, how gene positioning to the nuclear periphery confers regulatory functions remains largely unclear. Tethering reporter genes to the nuclear periphery led to repression of some, but not all, transgenes as well as several proximal native genes, indicating that roles for the nuclear periphery in transcription are not obligatory but rather occur on a gene-by-gene basis [6–10]. Characterizing the features of chromatin at the nuclear periphery will help to better understand the mechanistic links between subnuclear gene positioning and transcription regulation.
Interactions of chromatin with the NL have been studied by microscopic approaches, and alterations to chromatin at the nuclear periphery are often associated with aging or with the presence of lamin mutants. Recently, genome-wide mapping has become a powerful approach for identifying chromatin domains associated with the NL. In particular, the DNA adenine methyltransferase identification (DamID) assay has successfully identified large lamina-associated domains (LADs) covering about 40% of the genome in human and mouse cells [14, 15]. DamID assays using INM proteins such as Emerin yielded binding profiles that were similar to those of Lamin B1, indicating that the identified genomic regions largely represent those regions that are preferentially positioned to the nuclear periphery. In mouse cells, LADs are 40 Kb to 15 Mb in size and have lower gene densities, lower transcription activities, and lower occupancy of active histone modifications than non-LAD regions of the genome. The boundaries of LADs are enriched with CTCF binding sites, CpG islands, and promoters transcribing away from the LADs. A recent study has dissected a LAD at the immunoglobulin heavy chain locus and identified DNA sequences that can direct transgenes to the NL in mouse fibroblasts. These sequences are enriched with a GAGA motif and associated with transcription corepressor cKrox and histone deacetylase HDAC3. While LADs are generally considered as repressive chromatin domains, it is not clear how these domains play regulatory roles in gene transcription.
Cellular imaging and genomic analyses have revealed substantial compartmentalization of the nucleus, and have suggested that this compartmentalization functions in gene regulation [17–19]. Gene transcription at the nuclear periphery is also under the influence of the compartmentalization of chromatin domains and regulatory factors [20–22]. For example, our previous work on the mouse myogenic differentiation system has revealed that the non-canonical TFIID subunit, TAF3, but not other transcription components such as RNA polymerase II (Pol II), TAF4 and TBP, is localized away from the nuclear periphery where the key myogenic gene MyoD is located. This differential localization is correlated with the differential occupancy of TAF3 versus TFIID at the MyoD promoter. Additionally, we noticed that H3K4me3, the histone modification labeling active promoters, was localized away from the nuclear periphery, and that recognition of H3K4me3 appeared to be required for the subnuclear localization of TAF3. Another recent study in the myogenic system revealed that transcription repression by Msx1 required recruiting Polycomb to the nuclear periphery and the enrichment of the repressive histone modification H3K27me3 on target genes. Thus, it would be very interesting to further characterize the features of chromatin at the nuclear periphery by examining localizations of additional histone modifications and their genome-wide distributions relative to DNA sequences associated with the NL. The recent development of high throughput DNA sequencing may reveal additional features of lamina-associated regions with improved sensitivity and resolution. We anticipate that bridging cellular imaging and molecular mapping approaches will present a unique opportunity to cross-examine the compartmentalization of histone modifications at both subcellular and genome-wide levels.
In this paper, we have generated the initial maps of genome-NL interactions in mouse 3T3 fibroblasts and C2C12 myoblasts by DamID assay coupled to high throughput DNA sequencing. We have identified sequencing-based LADs (sLADs) that cover ~30% of the genome and represent the genomic regions with significant NL association. In 3T3 fibroblasts, the sLADs largely overlap with the LADs reported previously, and range from a few kb to over 1 Mb in size. These newly constructed high-resolution DamID maps allow the examination of NL association within the structure of genes. For example, we have detected substantially lower NL association around the transcription start sites (TSSs) of expressed sLAD genes. Furthermore, by cell imaging and genome analyses, we have confirmed that several active histone modifications, such as H3K4me2, H3K9Ac and H3K36me3, are localized away from the nuclear periphery and are predominantly distributed outside of sLADs. These findings further support the notion that chromatin at the nuclear periphery is characterized by the paucity of active histone modifications, which may have implications for how peripheral gene positioning affects transcription regulation in mammalian cells.
The published genome-NL interaction maps were constructed using DamID assay followed by analysis on genome tiling arrays [14, 15] (referred to as Dam-chip hereafter). In this study, we developed the Dam-seq method that couples next-generation sequencing (NGS) to DamID assay. First, we tethered mouse Lamin B1 to E. coli DNA adenine methyltransferase (Dam) and expressed the fusion protein in mouse cells. In eukaryotic cells that lack endogenous Dam proteins, the tethered Dam is capable of methylating adenines in GATC sequences that are in close spatial proximity. This fusion protein was successfully used to identify genomic regions associated with the NL in mouse cells, and we have confirmed its localization at the nuclear periphery by immunofluorescence staining (not shown). Next, we infected 3T3 mouse embryonic fibroblast (MEF) cells and C2C12 myoblast cells with the lentivirus expressing Dam-Lamin B1 at a very low level, purified the genomic DNA from infected cells, specifically amplified the methylated DNA fragments and sequenced them directly. The same protocol was applied to cells expressing free Dam proteins, and the sequencing data served as a control to correct for local DNA accessibility and sequencing biases. We prepared two replicates for each fusion protein (or free Dam) in each cell type and sequenced the DNA libraries separately. Only sequencing reads that were uniquely mapped to chromosomes 1–19 and X were retained, and potential PCR duplicates were removed. All pairs of replicates have highly correlated read densities along the genome (Pearson correlation coefficients >0.90, p-value <2e-16) and thus were combined for the follow-up analyses. In total, we obtained 43.3 million and 49.0 million reads from 3T3 MEF cells expressing Dam-Lamin B1 (MEF LmnB1) and free Dam (MEF Dam), respectively, and obtained 66.3 million and 41.9 million reads from C2C12 myoblast cells expressing Dam-Lamin B1 (myoblast LmnB1) and free Dam (myoblast Dam), respectively.
Table: Summary and statistics of LADs/sLADs in MEFs and myoblasts. Rows: genome coverage (%) of non-LADs/non-sLADs, of undetermined regions, and of LADs/sLADs; minimum, maximum, median, and mean domain size (Kbp).
We have validated the Dam-seq method by the following analyses using the data from 3T3 MEFs. First, to estimate the depth of sequencing, we identified sLADs from randomly sampled sequencing reads and plotted the sLAD genome coverage versus the number of sampled reads (Figure 1B). The increment of sLAD genome coverage was less than 1% of the genome per five million reads when the input LmnB1 reads were over 25 million, and the genome coverage was close to saturation with ~40 million LmnB1 reads (Figure 1B). This simulation result suggests that the vast majority of the sLADs have been identified with the current sequencing depth (43.3 million 3T3 LmnB1 reads and 66.3 million C2C12 LmnB1 reads). To examine whether the sLADs identified at lower sequencing depths were concordant with those identified at the highest sequencing depth (the full dataset), we plotted percentages of concordant and missed 2 kb sLAD windows at each sampled sequencing depth (Figure 1C). Over 98% of the 2 kb sLAD windows identified at each lower sequencing depth stay as sLADs based on the full dataset, while the percentage of missed 2 kb sLAD windows decreases quickly as the sequencing depth gets higher (Figure 1C). Therefore, increasing sequencing depth retains the vast majority of the sLADs and helps to identify sLADs that are missed at a lower sequencing depth.
Second, similar to the Dam-chip method that measures NL association by log2 fluorescent intensity ratios of LmnB1 over Dam (log2 chip ratios), we computed log2 read density (RPKM, reads per kilobase per million mapped reads) ratios of LmnB1 over Dam (log2 seq ratios) for all the non-overlapping, contiguous 2 kb windows along the genome. The chromosome-wide profiles of log2 seq ratios and log2 chip ratios are fairly similar (Figure 1A, Spearman correlation coefficient 0.74 and p-value <2e-16). The log2 seq ratios range from −7.3 to 7.7 (−4.0 to 3.8 for 5 to 95 percentiles) while the log2 chip ratios range from −5.4 to 4.9 (−1.6 to 1.9 for 5 to 95 percentiles), suggesting that the sequencing method reports a higher dynamic range of NL association.
Next, we compared the subnuclear localizations of three sLAD-only regions and two LAD-only regions in 3T3 fibroblasts (Figure 3C-D and Additional file 2: Figure S2). All three sLAD-only regions were confirmed to be spatially proximal to the NL (50%–80% of loci located within 0.5 μm of the NL, Figure 3C-D), while the two LAD-only regions were found to be less proximal to the NL (20–30% of loci located within 0.5 μm of the NL, Figure 3C-D). Therefore, at least some of the sLAD-only regions (not identified as LADs by the Dam-chip method) are indeed spatially proximal to the NL and some LAD-only regions (not detected as sLADs by our Dam-seq method) are less proximal to the NL. In summary, all the above FISH analyses have supported the validity of our newly introduced Dam-seq method in identifying genomic regions that are positioned close to the NL.
In this study, we have combined the DamID assay and high throughput DNA sequencing to measure genome-NL interactions, and have developed a robust bioinformatic approach that is suitable for identifying genomic regions associated with Lamin B1 (named sLADs). In MEF cells, sLADs largely overlap with the LADs previously identified using the DNA microarray approach. Comparisons of sLADs and gene expression data in MEFs and myoblasts support a negative correlation (Additional file 3), consistent with previous reports. The method demonstrated here is applicable to identifying NL-associated regions in other mammalian cell types or differentiation systems, as well as to identifying genomic regions specifically associated with individual INM proteins and their mutants.
We have noticed that among sLAD genes, NL association is decreased in TSS proximal regions and that the decrease in NL association appears to correlate with gene expression. Highly sensitive, direct measurements of transcription are needed in order to define more precisely the relationship between gene expression and NL association. The reduced NL association in TSS proximal regions can be due to their increased spatial separations from the NL, or their increased chromatin accessibilities, and it is likely that both scenarios may contribute to this observed phenomenon. It is also intriguing to speculate that the reduced NL association in TSS proximal regions may provide a local environment amenable for recruiting transcription machineries and transcription initiation.
We have performed parallel examinations on the nuclear compartmentalization of several key histone modifications in C2C12 myoblasts by cellular imaging and a combined analysis of ChIP-seq and Dam-seq data. Histone modifications (such as H3K27me3) that are localized throughout the nucleoplasm including the nuclear periphery are distributed both within and outside of sLADs with similar densities. In contrast, active histone modifications that are largely located away from the nuclear periphery are also predominantly distributed outside of sLADs (Figure 6). These highly consistent results indicate that in C2C12 myoblasts, chromatin at the nuclear periphery is largely characterized by the paucity of active histone modifications. Chromatin features at the nuclear periphery in other cell types, such as stem cells and terminally differentiated cells, remain an open question. For instance, it was observed that the repressive H3K27me3 mark is enriched at the nuclear periphery in mouse embryonic stem cells. It would also be interesting to determine whether similar spatial compartmentalization exists for protein factors associated with these histone modifications (such as TAF3 reported in the previous study). The molecular and imaging methods demonstrated here and in the previous report may be used for future investigations.
By coupling next generation sequencing to the DamID assay, we have identified sLADs that are genomic regions with significant association to the NL, ranging from a few kb to over 1 Mb and covering about 30% of the mouse genome. Single cell imaging and genomic analyses have provided consistent evidence on the paucity of several active histone modifications at the nuclear periphery both at the microscopic and the molecular levels in C2C12 myoblasts. Comparing sLAD genes and non-sLAD genes has revealed distinct histone modification levels and NL association that correlate with gene expression states. Our work may give clues on how gene positioning to the nuclear periphery (via NL association) affects transcription regulation in mammalian cells.
NIH-3T3 mouse fibroblasts were obtained from ATCC (CRL-1658) and were grown in DMEM + 10% newborn calf serum. C2C12 mouse myoblasts were grown in DMEM + 10% fetal bovine serum. Both cell lines were grown in a 37°C incubator with 5% CO2.
DamID was performed using the lentivirus transduction protocol as previously described. Briefly, mouse Lamin B1 cDNA was cloned into the pLGW-dam-V5 vectors kindly provided by the Van Steensel lab. Lentivirus with Dam-V5 and Dam-V5-LmnB1 vectors was generated by the Virapower Lentivirus Expression system (Invitrogen) and was stored at −80°C before use. The lentivirus was diluted with the growth medium and incubated with the cells for two days before changing to fresh growth medium. On day 3 after infection, genomic DNA was isolated using the DNeasy blood and tissue kit (Qiagen). The isolated genomic DNA was subjected to DpnI digestion, ligation of DamID adaptors and DpnII digestion. Samples without addition of DpnI or adaptors were prepared in parallel as negative controls. Then a 5 μl aliquot of the DpnII digestion was used as template in a 50 μl reaction to amplify methylated genomic fragments.
To prepare for NGS, the above PCR products were purified and digested by DpnII again to remove DamID adaptors, and fragmented by NEBNext® dsDNA Fragmentase (New England Biolabs). Because the methyl PCR products are a smear of 200–2000 bp and the appropriate fragment size for sequencing is ~200 bp, we experimentally determined the appropriate time of fragmentation. For each DNA sample, we performed six reactions, each fragmenting 1 μg DNA with 1 μl Fragmentase, incubated them at 37°C for 25 to 50 minutes at 5-minute intervals and pooled all for purification. The fragmented DNA was further subjected to end repair, addition of “A” bases, ligation of NGS adaptors, gel selection of 300 bp fragments and PCR enrichment according to the protocol provided by Yale Center for Genome Analysis (YCGA, http://ycga.yale.edu/sequencing/Illumina/protocols.aspx) with reagents ordered from NEB. The sequencing was performed by YCGA using Illumina’s HiSeq 2000 sequencing system. The sequencing reads and the alignments (see the next section) have been deposited in the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo) with accession number GSE41583.
All 75 bp single-end reads passing the quality controls from YCGA were mapped onto the mouse genome assembly mm9 using Bowtie2-2.0.0-beta5 with the following command: bowtie2 --phred33 -x mm9 -U <in.fastq> -S <out.sam>. Only sequencing reads uniquely mapped to chromosomes 1–19 and X with mapping quality ≥ 40 were used for further analyses. Potential PCR duplicates were removed using SICER. Read counts in all 10 Kb non-overlapping contiguous windows along the genome were used to estimate the Pearson correlation coefficient of the two biological replicates.
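The replicate comparison described above can be sketched in a few lines. This is a minimal illustration rather than the scripts used in the study: the function names and the `(chromosome, position)` read representation are assumptions, and windows with no reads in either replicate are simply left out.

```python
import math
from collections import Counter

WINDOW = 10_000  # 10 Kb windows, as stated in the text


def window_counts(reads, window=WINDOW):
    """Count reads per (chromosome, window index); reads are (chrom, pos) pairs."""
    counts = Counter()
    for chrom, pos in reads:
        counts[(chrom, pos // window)] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def replicate_correlation(reads_a, reads_b, window=WINDOW):
    """Correlate per-window read counts of two replicates.

    Assumption of this sketch: only windows seen in at least one
    replicate are compared.
    """
    ca, cb = window_counts(reads_a, window), window_counts(reads_b, window)
    keys = sorted(set(ca) | set(cb))
    return pearson([ca[k] for k in keys], [cb[k] for k in keys])
```

For example, `replicate_correlation(rep1_reads, rep2_reads)` would return the coefficient for two hypothetical read lists `rep1_reads` and `rep2_reads`.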
We used SICER to identify genomic regions enriched with methylated DNA sequences. SICER is a bioinformatic software designed to analyze diffuse ChIP-seq signals. It partitions the genome into non-overlapping, contiguous windows of user-specified size and records the NGS read counts in each window for the sample library. A Poisson distribution (the parameter λ as the average read count per window over the genome) is then used as the background model to score each window and identify “eligible” windows that have read counts significantly higher than the random background. Then SICER combines windows separated by gaps less than a predetermined size in order to compensate for unsaturated sequencing. Next, after normalizing both sample and control libraries to the same size (one million reads), the significance of enrichment is determined for each of the “eligible” windows or domains (a cluster of contiguous windows) relative to the control library. The false discovery rate (FDR) is also calculated to correct for multiple testing.
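To make the first two steps concrete, the following sketch scores windows against a Poisson background and then joins eligible windows across short gaps. It is not SICER's code: the function names, the `p_threshold` value, and the choice to allow gaps of up to `gap` bp inclusive are assumptions made for illustration, and the comparison against the control library and the FDR step are omitted.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def eligible_windows(counts, p_threshold=0.05):
    """Flag windows whose read count is unlikely under the background.

    lam is the average read count per window; p_threshold is a
    hypothetical cutoff chosen for this sketch.
    """
    lam = sum(counts) / len(counts)
    return [poisson_sf(c, lam) <= p_threshold for c in counts]


def merge_islands(eligible, window=2000, gap=6000):
    """Join eligible windows separated by at most gap bp of ineligible windows.

    Returns half-open (start_bp, end_bp) intervals.
    """
    max_gap_windows = gap // window
    islands = []
    start = last = None
    for i, ok in enumerate(eligible):
        if not ok:
            continue
        if start is None:
            start = last = i
        elif i - last - 1 <= max_gap_windows:
            last = i
        else:
            islands.append((start * window, (last + 1) * window))
            start = last = i
    if start is not None:
        islands.append((start * window, (last + 1) * window))
    return islands
```

With a 2 kb window and a 6000 bp gap, up to three consecutive ineligible windows can be bridged inside one domain in this sketch.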
SICER_v1.1 was downloaded from http://home.gwu.edu/~wpeng/Software.htm, and the script SICER.sh was run with the following parameters: redundancy threshold = 1, effective genome fraction = 0.81, window size = 2000, gap size = 6000, FDR = 0.001, E-value = 0.1. We chose the window size of 2 kb based on the fact that methylation by a Dam protein may spread up to 5 kb from its binding site, and chose the gap size of 6000 bp to account for the potential lack of sequencing depth. The effective genome fraction is the uniquely mappable proportion of the genomic sequences of a certain size, and is used to calculate the background parameter λ. The value we chose here corresponds to the 75 bp reads. Based on the SICER-reported normalized read count profiles (the library size was normalized to one million reads) for all non-overlapping, contiguous 2 kb windows along the genome, we termed windows with neither LmnB1 nor Dam reads “undetermined regions”; these usually represent genome assembly gaps or highly repetitive DNA segments. It should be noted that these regions contain very few genes or ChIP-seq peaks of histone modifications, thus their exclusion from analysis does not affect the conclusions in the paper. The last window in each chromosome, which is smaller than 2 kb, was not processed by SICER and therefore was also considered an “undetermined region”. To calculate the log2 seq ratio for each corresponding 2 kb window (Figure 1A and Additional file 2: Figure S1), we normalized the read density in each window to reads per kilobase per million uniquely mapped reads (RPKM), set a small pseudo-count (p) as the fifth percentile of the combined RPKM values of LmnB1 and Dam, then computed the scaled log2 seq ratio as log2 ((LmnB1 RPKM + p)/(Dam RPKM + p)). The ChIP-seq peaks were assigned to the same set of 2 kb windows based on the peak center and then used to depict their distribution relative to sLADs (Figure 6).
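The log2 seq ratio can be illustrated as follows. This is a sketch under stated assumptions, not the original R/Perl code: the function names are invented, the percentile uses linear interpolation, and the pseudo-count is taken over non-zero RPKM values only so that p stays positive (the text does not say how zeros were handled). Windows with no reads in either library are returned as `None`, mirroring the "undetermined regions".

```python
import math


def rpkm(count, window_bp, total_mapped):
    """Reads per kilobase per million uniquely mapped reads."""
    return count / (window_bp / 1e3) / (total_mapped / 1e6)


def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def log2_seq_ratios(lmnb1_counts, dam_counts, lmnb1_total, dam_total,
                    window_bp=2000):
    """Per-window log2((LmnB1 RPKM + p) / (Dam RPKM + p)).

    Assumption: p is the 5th percentile of the combined non-zero RPKM values.
    """
    lm = [rpkm(c, window_bp, lmnb1_total) for c in lmnb1_counts]
    dm = [rpkm(c, window_bp, dam_total) for c in dam_counts]
    p = percentile([v for v in lm + dm if v > 0], 5)
    ratios = []
    for a, b in zip(lm, dm):
        if a == 0 and b == 0:
            ratios.append(None)  # undetermined window
        else:
            ratios.append(math.log2((a + p) / (b + p)))
    return ratios
```

The pseudo-count keeps the ratio finite when one library has no reads in a window and damps the ratio in windows with very low coverage.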
The 2 kb window containing the TSS and TES of each mouse gene was determined in order to fetch the data upstream and downstream for gene-related analyses (Figure 4A, Figure 7, Additional file 2: Figure S3 and S4). All the above analyses were implemented by R and Perl scripts.
For the Spearman correlation test between Dam-chip and Dam-seq, the log2 chip ratio of each 60 bp probe was assigned to the above set of 2 kb windows by the probe location. One probe may overlap with and thus be assigned to two adjacent windows, and values of multiple probes in the same window were averaged. Then the Spearman correlation coefficient was calculated between the log2 chip ratios and the log2 seq ratios.
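The probe-to-window assignment and the rank correlation can be sketched as below. The function names and the `(chromosome, start, value)` probe tuples are illustrative assumptions; coordinates are treated as 0-based, and tied values receive average ranks.

```python
import math
from collections import defaultdict


def probes_to_windows(probes, window=2000, probe_len=60):
    """Average probe values per 2 kb window.

    probes: iterable of (chrom, start, log2_chip_ratio). A probe that
    straddles a window boundary contributes to both adjacent windows.
    """
    sums = defaultdict(float)
    n = defaultdict(int)
    for chrom, start, value in probes:
        first = start // window
        last = (start + probe_len - 1) // window
        for w in range(first, last + 1):
            sums[(chrom, w)] += value
            n[(chrom, w)] += 1
    return {k: sums[k] / n[k] for k in sums}


def ranks(values):
    """1-based ranks, with ties given the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)
```

The two ratio profiles would then be compared over the windows that carry a value in both datasets.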
For the analyses of sequencing depths and sLADs concordance (Figure 1B-C), samples of various sizes (each with three replicates) were generated from the entire dataset of MEF LmnB1 by the “sample” function in R. The same set of SICER parameters as described earlier was used for sLAD identification.
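The subsampling and concordance bookkeeping can be sketched in Python (the study used R's "sample" function). The function names are hypothetical, sLAD calling itself is left to SICER, and the denominators are assumptions: concordance is taken relative to the windows called at the lower depth, and the missed fraction relative to the windows called from the full dataset.

```python
import random


def subsample(reads, n, seed=0):
    """Draw n reads without replacement; seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(reads, n)


def concordance(sub_windows, full_windows):
    """Return (percent concordant, percent missed) for two sets of sLAD windows.

    concordant: windows called at the lower depth that are also sLADs
                in the full dataset, as a percentage of the lower-depth calls.
    missed:     full-dataset sLAD windows absent at the lower depth,
                as a percentage of the full-dataset calls.
    """
    sub, full = set(sub_windows), set(full_windows)
    concordant = 100.0 * len(sub & full) / len(sub) if sub else 0.0
    missed = 100.0 * len(full - sub) / len(full) if full else 0.0
    return concordant, missed
```

Repeating `subsample` with three different seeds at each depth would give the three replicates per sample size mentioned above.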
First, we downloaded tables named knownGene, kgXref, knownToLocusLink from UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start) on May 21, 2012 and merged the three tables by the known gene IDs, which resulted in a list of 55105 transcripts. Second, after grouping transcripts with the same gene symbol, we removed incompatible cases in which transcripts are located on both plus and minus strands or in which some shorter transcripts have no overlaps with the longest transcript. Finally, we retained the longest transcript for each gene symbol for further analyses. The resulting gene list contains 29235 genes on chromosomes 1–19 and X.
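The filtering logic can be illustrated with a short sketch. The record layout and function name are assumptions, and the sketch drops a gene symbol entirely when its transcripts are incompatible, which is one reading of "removed incompatible cases".

```python
from collections import defaultdict


def longest_transcripts(transcripts):
    """Pick one transcript per gene symbol.

    transcripts: dicts with keys symbol, chrom, strand, start, end
    (0-based, half-open coordinates; a hypothetical layout).
    """
    by_symbol = defaultdict(list)
    for t in transcripts:
        by_symbol[t["symbol"]].append(t)

    kept = {}
    for symbol, group in by_symbol.items():
        # Incompatible: transcripts on both strands.
        if len({t["strand"] for t in group}) > 1:
            continue
        longest = max(group, key=lambda t: t["end"] - t["start"])
        # Incompatible: some transcript does not overlap the longest one.
        no_overlap = any(
            t["chrom"] != longest["chrom"]
            or t["end"] <= longest["start"]
            or t["start"] >= longest["end"]
            for t in group
        )
        if no_overlap:
            continue
        kept[symbol] = longest
    return kept
```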
For a subset (19269 genes) of the 29235 genes, expression data of myoblasts are available from the GEO database under the accession number GSE19968. We downloaded supplementary files for the three replicates of C2C12 myoblast expression profiling arrays (GSM499013-15), which contain the array data processed by Agilent Feature Extraction Version 10.1.1. Based on the online instruction (http://www.genomics.agilent.com/files/Manual/G3300-90001_FE_Plugin_A1.pdf), the data from an array probe contained a number of GeneSpring flags, each of which had three levels: Absent (A), Marginal (M) or Present (P). We identified a transcript (detected by a probe) as present only when all flags were shown as P. A gene was considered as expressed only when the transcripts were present in all three replicates of arrays. Custom Perl scripts were written to implement the task. The parsed expression states were used in Figure 4A, Figure 7, and Additional file 2: Figure S3 and Figure S4.
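The two-level rule (all flags P per probe, present in all replicates per gene) can be written compactly. This sketch is in Python rather than the Perl used in the study, the function names are hypothetical, and it assumes one probe per gene per array.

```python
def probe_present(flags):
    """A probe is present only if every GeneSpring flag is 'P'."""
    return len(flags) > 0 and all(f == "P" for f in flags)


def gene_expressed(flags_by_replicate):
    """A gene is expressed only if its probe is present in every replicate.

    flags_by_replicate: one list of flag letters ('A', 'M', 'P') per array.
    """
    return len(flags_by_replicate) > 0 and all(
        probe_present(flags) for flags in flags_by_replicate
    )
```

A single Marginal or Absent flag in any of the three arrays is therefore enough to classify the gene as not expressed.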
Immunostaining of histone modifications, DNA immuno-FISH and imaging analyses were carried out in C2C12 myoblasts as described previously. The following antibodies were used: anti-LmnB (Santa Cruz, sc-6217), anti-H3K4me1 (Abcam, ab8895), anti-H3K4me2 (Abcam, ab32356), anti-H3K4me3 (Abcam, ab8580), anti-H3K9Ac (Abcam, ab4441), anti-H3K9me2 (Millipore, 07–441), anti-H3K27me3 (Millipore, 07–449), anti-H3K36me3 (Abcam, ab9050) and anti-H3K79me2 (Abcam, ab3594).
DamID: DNA adenine methyltransferase identification
sLAD: Sequencing-based lamina-associated domain
INM: Inner nuclear membrane
Pol II: RNA polymerase II
TSS: Transcription start site
NGS: Next generation sequencing
MEF: Mouse embryonic fibroblast
Dam: DNA adenine methyltransferase
FISH: Fluorescence in situ hybridization
TES: Transcription end site
FDR: False discovery rate
RPKM: Reads per kilobase per million mapped reads.
We thank Dr. Bas van Steensel (The Netherlands Cancer Institute, The Netherlands) for providing the DamID vectors, Laura Viggiano for cloning the dam-V5-LmnB1 construct, Dr. Weiqun Peng (George Washington University, USA) for suggestions on SICER, and Drs. Topher Carroll, Tae Hoon Kim and Andrew Xiao (Yale University, USA) for critical reading of this manuscript. This work was supported by startup funding from Yale School of Medicine and a Scientist Development Grant from the American Heart Association (12SDG11630031).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.