Harnessing changes in open chromatin determined by ATAC-seq to generate insulin-responsive reporter constructs
BMC Genomics volume 23, Article number: 399 (2022)
Gene regulation is critical for proper cellular function. Next-generation sequencing technology has revealed the presence of regulatory networks that regulate gene expression and essential cellular functions. Studies investigating the epigenome have begun to uncover the complex mechanisms regulating transcription. Assay for transposase-accessible chromatin by sequencing (ATAC-seq) is quickly becoming the assay of choice for many epigenomic investigations. However, whether intervention-mediated changes in accessible chromatin determined by ATAC-seq can be harnessed to generate intervention-inducible reporter constructs has not been systematically assayed.
We used the insulin signaling pathway as a model to investigate chromatin regions and gene expression changes using ATAC- and RNA-seq in insulin-treated Drosophila S2 cells. We found correlations between ATAC- and RNA-seq data, especially when stratifying differentially-accessible chromatin regions by annotated feature type. In particular, our data demonstrated a weak but significant correlation between chromatin regions annotated to enhancers (1-2 kb from the transcription start site) and downstream gene expression. We cloned candidate enhancer regions upstream of luciferase and demonstrate insulin-inducibility of several of these reporters.
Insulin-induced chromatin accessibility determined by ATAC-seq reveals enhancer regions that drive insulin-inducible reporter gene expression.
Gene regulation is essential to the development and maintenance of life. Gene regulatory networks describe the interplay between regulatory regions, such as promoters and enhancers, and expression of their target genes . Deciphering how specific regulatory regions control gene transcription can provide insights into biological processes such as cell type differentiation [2, 3], responses to addictive substances , and other cell functions.
The advent of new sequencing techniques has led to a greater understanding of how genes are differentially expressed. RNA-seq has provided a broader and more detailed picture of complex transcriptional states and responses [5, 6]. While genome-wide RNA-seq experiments can yield information on the many genes that are differentially transcribed in different conditions, these rich datasets reveal little about the regulatory mechanisms involved in directing these expression changes.
Epigenomic assays such as chromatin immunoprecipitation (ChIP-seq), DNAse-seq, and assay for transposase-accessible chromatin by sequencing (ATAC-seq) can interrogate chromatin accessibility and identify transcription factor binding sites [7,8,9]. The relationship between chromatin accessibility and transcription is complicated. Previous studies show little overlap between corresponding differences in chromatin and transcription [10,11,12], which highlights the complex interactions between the chromatin state and downstream gene expression. Furthermore, few studies have analyzed if changes in open chromatin induced by an intervention occur in transcriptional enhancers that can be coupled to heterologous minimal promoters to engineer intervention-inducible reporter constructs.
Here, we sought to characterize the relationship between ATAC-seq and RNA-seq data in more detail, with particular focus on whether intervention-induced changes in chromatin accessibility can accurately predict gene expression. We used the insulin signaling pathway as a model because the insulin receptor activates multiple downstream signaling pathways [13,14,15], resulting in widespread changes to the chromatin state  and gene expression . Our data from Drosophila S2 cells show that ATAC-seq and RNA-seq datasets are correlated, mainly driven by the ATAC-seq peaks/reads located in gene promoter regions. We also show that DNA regions with increased accessibility after insulin treatment can be harnessed to generate insulin-inducible reporter constructs.
ATAC-seq and RNA-seq changes in insulin-exposed S2 cells
To investigate the concordance in changes in gene expression and chromatin accessibility, we exposed serum-starved Drosophila S2 cells to insulin or vehicle and harvested the cells 4 hours later for ATAC-seq and RNA-seq analysis. We determined genome-wide changes in open chromatin by ATAC-seq and identified 9726 high-confidence peaks (i.e., regions of accessible DNA mapped to the nuclear genome) in the insulin-exposed S2 cells and 9560 in the vehicle-exposed S2 cells. Merging the control and experimental peak sets resulted in 10,269 peaks. The largest variance in this dataset (6 samples; 2 treatments × 3 replicates) arose from insulin treatment, as shown by principal component analysis (PCA; Fig. 1A). In parallel, we identified 10,287 transcripts in vehicle- and insulin-exposed S2 cells using RNA-seq. PCA indicated that the largest variance between the 6 samples resulted from insulin treatment (Fig. 1B). Because ATAC-seq provides a view of chromatin accessibility along all features of genes, we evaluated the feature distribution of both the treatment and the control ATAC-seq data (Fig. 1C, D). We observed the same genome features in the control and treated data, but the relative proportion of features was significantly different (χ2 = 19.6, df = 10, p = 0.03). This difference largely resulted from a change in the proportion of peaks annotated to distal (1-2 kb from the TSS) and proximal (≤1 kb) promoters, which increased from 8 to 9% and 58 to 60%, respectively (Fig. 1C), though the proportion of each feature was not significantly different (Fig. 1D). These results suggest that insulin signaling recruits additional regulatory features by changing chromatin accessibility.
ATAC-seq and RNA-seq reads show weak correlation driven by ATAC-seq peaks in proximal promoters
We next asked whether RNA transcript levels were correlated with ATAC-seq reads and whether the feature annotation of those ATAC-seq peaks, i.e. where in a gene they were located, mattered for any levels of correlation. 8621 out of 10,269 ATAC-seq peaks were mapped to a gene, and we plotted these ATAC-seq peak reads against the RNA-seq reads for each peak (thus duplicating many RNA-seq data points, since each gene has a median number of 2 (Quartile 1-3: 2-4) ATAC-seq peaks mapped to it). Overall, RNA-seq and ATAC-seq peak reads showed only a weak correlation (Pearson correlation coefficient R = 0.1; Fig. 2A), which was still highly significant (p = 2.2e-16) due to the number of data points. To determine if the counts for specific feature types were correlated, we stratified this analysis by the 11 ATAC-seq peak gene features. Only the peak reads in the ≤1 kb promoter class correlated with RNA-seq reads (R = 0.2, p = 2.2e-16; Fig. 2B), though this correlation was also weak. This would suggest that more highly transcribed genes require a greater extent of DNA accessibility in their promoters, which might be expected for efficient transcriptional initiation. Further, these results indicate that while information can be obtained from raw count data, these data cannot be used to predict the functional relevance of a specific genome feature type.
Differential gene expression and DNA accessibility correlate for multiple ATAC-seq peak feature annotations
Next, we determined the insulin-induced changes in DNA accessibility and RNA expression. In the ATAC-seq peak set, 773 peaks were significantly differentially accessible (false discovery rate, FDR < 0.1) between the insulin-exposed and control samples. 364 peaks were more accessible upon insulin exposure, while 409 peaks were less accessible after exposure to insulin (Fig. 3A). The feature distribution of those differential peaks was very similar to the feature distribution in the whole ATAC-seq peak dataset (χ2 = 6.13, df = 10, p = 0.80; Fig. 3B), though we did not detect distal downstream elements (1-2 and 2-3 kb downstream) in the differentially accessible peaks. We also examined the significant gene expression changes from the RNA-seq dataset. In this dataset, 3616 genes were differentially expressed (FDR < 0.05) between the insulin-exposed and control samples. 2056 genes were upregulated after insulin exposure, while 1560 were downregulated (Fig. 3C).
Then, we investigated the correlation between all ATAC-seq and RNA-seq log2 fold changes after insulin treatment. The overall correlation between the two datasets was significant, but very weak (R = 0.05, p = 1.8e-06; Fig. 4A). Performing the same analysis after stratifying by feature type showed weak but significant correlations between the differential RNA-seq transcripts and the differential ATAC-seq features in addition to ≤1 kb promoters (R = 0.096, p = 5.7e-13), including weak correlations with ATAC-seq peaks in enhancers (2-3 kb from the TSS: R = 0.15, p = 5.4e-4 and 1-2 kb from the TSS: R = 0.13, p = 9.0e-05). Furthermore, there were significant anticorrelations in ATAC-seq peaks for downstream elements (1-2 kb: R = − 0.67, p = 0.035) and distal intergenic regions (R = − 0.14, p = 0.0011; Fig. 4B). When we restricted the analysis to only the significant changes in ATAC-seq (FDR < 0.1) and RNA-seq (FDR < 0.05) peaks, the overall correlation for all features increased 8-fold (R = 0.42; Fig. 5A). Log2 fold changes with the same sign in ATAC- and RNA-seq peaks (i.e. both increased or both decreased) were significantly more numerous than half of all the restricted data points (p = 5.38e-10), indicating that increased chromatin accessibility is associated with increased gene expression and vice versa. The overall feature type correlations also changed when restricted to only significant changes (Fig. 5B), particularly the correlation with enhancers 1-2 kb from the TSS (R = 0.65), which increased by approximately 5-fold. Conversely, the anticorrelations with peaks in downstream and distal intergenic regions disappeared. Together, these results suggest that predicting functionally-relevant genome regions may be more accurate when the regions have significantly different accessibility. These results also suggest that DNA accessibility in enhancers is involved in mediating changes in transcription.
Functional testing of significant differentially accessible ATAC-seq peaks
We next wanted to test whether any of the DNA regions from significantly more accessible ATAC-seq peaks could drive insulin-induced expression. We cloned a number of ATAC-seq peaks in front of a luciferase gene with a minimal promoter and transfected S2 cells with these vectors for 48 h. The cells were serum-starved for 18 h and then treated with 10 μM insulin or vehicle for 4 h. We were particularly interested in identifying putative enhancer elements, which are often distant from proximal promoter sequences  and are cell-type specific , so we focused on regions farther than 1 kb from the TSS. Accordingly, we selected three groups of four ATAC-seq peaks each: first, we chose the four peaks with the largest log2 fold change, indicating increased accessibility after insulin treatment (Additional file 1). Of the four tested plasmids, one showed significantly increased luciferase activity after insulin treatment: 3 L114 (annotated to CG6149; 1.2-fold increase; p = 0.016; Fig. 6A). Because ATAC-seq peaks in enhancer regions were the most strongly correlated with differential gene expression in our above analysis (Fig. 5B), we next chose four peaks with the highest log2 fold change from enhancer regions that were significantly more accessible after insulin. Of the tested peaks, 2 produced significantly increased luciferase activity after insulin treatment: 2 L225 (annotated to snoRNA:psi285-2996; 1.1-fold increase; p = 0.0033) and 2R111 (annotated to schnurri; 1.4-fold increase; p = 0.025; Fig. 6B). Lastly, because introns often contain regulatory regions that contain instructive DNA for expression , we chose the four intron regions with the largest log2 fold changes for luciferase assays. One ATAC-seq peak resulted in significantly increased luciferase activity: X216 (annotated to lncRNA flamenco; 1.3-fold increase; p = 0.05) (Fig. 6C). These data show that DNA regions with increased accessibility upon insulin treatment can indeed drive insulin-induced increases in expression when placed in front of a heterologous promoter.
Limited predictability of the levels of expression and inducibility
Out of the twelve ATAC-seq peaks we cloned and tested, all led to variable levels of luciferase expression, while only four caused significant insulin-inducibility. To determine whether the luciferase expression levels and inducibility by insulin was predictable from our ATAC-seq and RNA-seq datasets, we analyzed the correlation between the –omics data and luciferase activity. First, we asked if expression levels of luciferase were correlated with ATAC-seq peak reads, but found no correlation (Fig. 7A), even when we stratified the data according to enhancer- (Fig. 7B) or intron-derived ATAC-seq peaks (Fig. 7C). Similarly, RNA-seq counts did not correlate with S2 luciferase luminescence (Fig. 7D-F). Next, we asked whether the log2 fold changes in the –omics data sets correlated with the relative inducibility of luciferase by insulin (measured as insulin/vehicle ratios). Again, we failed to observe significant correlations of S2 inducibility with log2 fold changes in ATAC-seq (Fig. 8A) and RNA-seq (Fig. 8B) reads, even when we analyzed only the cloned peaks that led to significant insulin-induced changes (Fig. 8C, D).
Next-generation sequencing has enabled an unprecedented amount of genome-wide information on RNA transcript levels and DNA accessibility. ATAC-seq data provides accessibility information from distinct features/regions of a gene, thereby suggesting gene regions that act as functional enhancers (or suppressors) of gene expression. Here, we investigated the correlation between genome-wide changes in DNA accessibility and transcript levels and found weak, but significant correlations that were mostly driven by proximal promoter and distal enhancer regions. Cloning some of these DNA regions with increased accessibility upon insulin stimulation showed that some of them indeed act as transcriptional enhancers, demonstrating that genome-wide ATAC-seq can be harnessed to clone functionally-active insulin-response elements.
To investigate the functional relevance of differential DNA accessibility, we first determined genome-wide ATAC-seq reads in Drosophila S2 cells from serum-starved and insulin-exposed conditions (Fig. 1). The insulin receptor activates several downstream pathways, including the PI3K  and Ras/ERK  pathways, which have various effects on the chromatin state [23, 24] and gene expression  during several cellular processes including cell growth, protein synthesis, and gluconeogenesis . Thus, activating insulin signaling provided a way to identify genome-wide chromatin and gene expression changes, which allowed us to integrate these physiological changes and determine whether chromatin regions that become more open after insulin signaling could predict gene regulation. We found significant (though weak) overall correlations between ATAC-seq reads and transcript levels, which were driven by ATAC-seq peaks in proximal promoters (Fig. 2). In ATAC-seq, genome regions with increased accessibility result in a higher mapped read count . Because promoter regions are critical for the initiation of transcription, these genomic regions are generally accessible for actively-transcribed genes . Thus, proximal promoter regions largely drive the overall weak correlation between ATAC-seq and RNA-seq counts that we observed. These data indicate that normalized counts can identify correlations between chromatin and gene expression, but these correlations are likely limited to promoter regions for actively transcribed genes.
When we analyzed correlations between all insulin-induced log2 fold changes in ATAC-seq peak and transcript reads (Fig. 3), changes in open chromatin in distal (1-2 and 2-3 kb away from the TSS) promoter regions also correlated significantly (though weakly) with changes in transcript levels (Fig. 4). This suggests that the application of insulin recruits additional enhancers that participate in promoting transcription. Conversely, other enhancer regions become less accessible, and the linked genes are less transcribed with insulin. The weak correlations between distal enhancer and proximal promoter with transcript levels increased when we analyzed only ATAC-seq peaks that changed significantly with insulin (Fig. 5). These results suggest that perturbations that cause differential gene expression occur via recruitment of additional regulatory promoter features. The correlations between differential transcript levels and differentially accessible promoter regions were all positive, suggesting that these regions play a role in the downstream differential gene expression. However, these data do not exclude the possibility that in some genes, insulin might lead to increased accessibility at promoters which are then bound by transcriptional repressors, leading to decreased transcription. Indeed, numerous ATAC-seq peak/transcript data points are in quadrants of anticorrelation (Fig. 5), and the insulin-induced transcription factor FOXO is known to have transcriptional repressor activity [28, 29]. Future experiments focusing on such anticorrelated data pairs/genes might reveal DNA regions that lead to insulin-induced transcriptional repression.
Our main goal was to determine whether we could harness our ATAC-seq data to generate insulin-inducible reporter plasmids. We selected ATAC-seq peaks based on our correlation analysis of differentially-accessible chromatin regions and differential transcript expression. We particularly focused on enhancers (1-2 kb from the TSS) because the correlation increase was the largest for this feature. Distal regions may include regulatory regions such as enhancers or repressors that are critically involved in regulating gene expression . Our results suggested that these regions can drive differential gene expression (Fig. 6). We also selected peaks with relatively large log2 fold changes in intron peaks. In Drosophila, intronic regions often contain regulatory sequences , so altering chromatin accessibility in genome regions associated with introns is one mechanism to control gene expression . Finally, we selected peaks with the largest log2 fold changes, irrespective of feature type. In each of these three categories we found peaks that led to significant insulin-induced increases in reporter gene expression. However, none of the three categories seemed obviously more promising for predicting insulin-inducibility. Furthermore, neither read counts nor log2 fold changes in ATAC-seq or RNA-seq were predictive of insulin-inducibility (Figs. 7, 8). This suggests that while ATAC-seq data can be successfully harnessed to generate insulin-inducible reporter constructs, their efficacy is not obviously predictable and will require larger datasets to understand which ATAC-seq peaks can be utilized to generate functionally relevant transgenes. Indeed, previous studies investigating putative enhancer elements identified candidates based on overlap with known histone marks (H3K4me1, H3K27ac, etc. identified by ChIP-seq), known enhancers associated with annotated genes of interest [32,33,34], or used massively parallel reporter assays . In contrast, our goal was to determine whether ATAC-seq alone could predict downstream transcription using only feature-based or fold change-based selection. Importantly, these previous studies showed similar success rates to ours. Peaks with increased chromatin accessibility after insulin treatment that did not result in insulin-induced luciferase activity may represent regulatory elements that are involved in setting up poised transcription or may contain repressor regions that pause transcription. In contrast, peaks causing increased luciferase activity may represent sequences that are sufficient to initiate transcription or activate promoter clearance [36,37,38]. Additional studies using ChIP-seq to identify the histone marks at our tested peak sequences will be required to determine whether they are enhancers involved in poised versus active transcription.
Our investigation shows that ATAC-seq data can be harnessed to isolate regulatory DNA regions that are both expressed and inducible. However, because chromatin peaks may be one of several regulatory sequences [20, 30], these chromatin regions cannot be easily predicted by analysis of these genome-wide –omics data alone and must be functionally validated. Still, our data show the feasibility of using ATAC-seq to generate active transgenes that are inducible by an intervention or by a diseased state to drive a reporter, or even a disease-antidote gene.
Drosophila S2 cells (Drosophila Genomics Resource Center, Bloomington, IN, USA) were cultured in Schneider’s Drosophila Medium (ThermoFisher, Waltham, MA, USA) supplemented with 10% fetal bovine serum (ThermoFisher) at 25 °C. Cells were cultured in Schneider’s medium without FBS for 24 h before experiments. Then, cells were incubated with 10 μM insulin (Sigma Aldrich, St. Louis, MO, USA) or vehicle (25 mM HEPES, pH 8.2) for 4 h at 25 °C.
S2 cells were incubated with 3 μM DAPI for 10 min. 60,000 cells per sample were sorted using a BD FACS Aria flow cytometer (BD Biosciences, San Jose, CA, USA). DAPI-negative cells were collected into ice-cold PBS (pH 7.4). After sorting, the samples were washed once with ice-cold PBS and centrifuged at 500 g for 5 min at 4 °C. ATAC-seq libraries were prepared as previously described . Briefly, 50 μL lysis buffer (10 mM Tris-HCl 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% NP40) was added to each sample, and the sample was centrifuged at 500 g for 10 min at 4 °C. The supernatant was removed, and the nuclei pellet was tagmented using a Nextera DNA Library Prep kit (Illumina, Inc., San Diego, CA, USA) as previously described. Then, the tagmented DNA was purified using a Qiagen MinElute PCR Purification Kit (Qiagen, Germantown, MD, USA). The purified DNA was PCR amplified for 5 cycles using a Nextera DNA Library Index kit (Illumina) and Phusion HF Master Mix (New England BioLabs, Inc., Ipswich, MA, USA) with the following protocol: 72 °C for 5 min, 98 °C for 30 sec, and 5 cycles of 98 °C for 10 sec, 63 °C for 30 sec, and 72 °C for 1 min. A 5-μL aliquot of the pre-amplified reaction was analyzed by qPCR using SsoFast EvaGreen Supermix (Bio-Rad Life Science, Inc., Hercules, CA, USA) and an Applied Biosystems 7900HT qPCR instrument (ThermoFisher) using the following protocol: 1 cycle of 98 °C for 30 sec and 40 cycles of 98 °C for 10 sec, 63 °C for 30 sec, and 72 °C for 1 min. Then, the pre-amplified PCR mixture was amplified for another 10 cycles (corresponding to 1/3 maximum fluorescence from the qPCR assay) using the same thermocycling parameters. After amplification, the libraries were purified using AMPure XP beads (Beckman Coulter Life Sciences, Indianapolis, IN, USA). Libraries were sequenced on an Illumina HiSeq 2500 instrument using 50-bp single-end reads.
Total RNA was isolated from S2 cells using a PureLink RNA purification kit (ThermoFisher). Then, rRNA was removed from each sample with a Ribo-Zero rRNA Removal kit (Illumina). RNA libraries were constructed using a NEBNext Ultra II RNA Library Kit for Illumina and NEBNext Multiplex Oligos for Illumina, Primer Set 1 (New England Biolabs). Libraries were sequenced on an Illumina HiSeq 2500 instrument using 50-bp single-end reads.
ATAC-seq data analysis
ATAC-seq Fastq files were aligned to the dm6 genome assembly (http://ftp.flybase.net/releases/FB2018_06/dmel_r6.25/fasta/) using Novocraft Novoalign with the following settings: --NonC -o SAM -r Random. SAM files were processed to BAM format, sorted, and indexed using Samtools . BAM files were reads per million-normalized and converted to bigWig files using the Bio-ToolBox ‘bam2wig.pl’ program (https://github.com/tjparnell/biotoolbox/blob/master/scripts/bam2wig.pl). Peak calling was performed on the bigWig files by utilizing the Multi-Replicate Macs ChIPseq pipeline (https://github.com/HuntsmanCancerInstitute/MultiRepMacsChIPseq) with the following settings: --dupfrac 0.2 --size 200 --cutoff 2 --peaksize 300 --peakgap 200. Fractions of reads in peaks values were calculated in R using the Rsubread package (version 2.8.2). Transcription start site enrichment was calculated using Deeptools (version 3.3.2). Called peaks were annotated in R with the ChIPseeker package . Count data for called peaks was collected from processed BAM files using the Bio-ToolBox ‘get_datasets.pl’ program (https://metacpan.org/pod/distribution/Bio-ToolBox/scripts/get_datasets.pl). The count data was then used to identify differentially accessible regions with the R package DEseq2 . ATAC-seq quality metrics are shown in Additional file 1.
RNA-seq data analysis
RNA-seq fastq files were aligned to the BDGP6 genome assembly using the STAR aligner  with the following settings: --twopassMode Basic --outSAMtype BAM SortedByCoordinate --outWigType bedGraph --outWigStrand Unstranded --clip3pAdapterseq AGATCGGAAGAGCACACGTCTGAACTCCAGTCA. The resulting sorted BAM files were indexed using Samtools (Li et al., 2009). FeatureCounts was used to collect count data for BDGP6 genes using the following command: -T 16 -s 2 . Count data for all replicates and experimental conditions were combined into a single count matrix in R. The count matrix was subsequently used to identify differentially expressed genes with the R package DEseq2 .
Integration analysis of ATAC-Seq and RNA-Seq datasets
The ATAC-seq peak data were compared to the RNA-seq data to determine how chromatin accessibility influenced gene expression. The raw RNA-seq and ATAC-seq counts for each sample were compared using the gene annotation of the ATAC-seq peak and the assigned RNA-seq gene. The raw count value was averaged by experimental condition and genomic assay type. Then, the RNA-seq and ATAC-seq datasets were compared using the annotated genes and the log2 fold change values for each peak/gene in the respective genomic assay. ATAC-seq peaks with an FDR < 0.1 and genes detected by RNA-seq with an FDR < 0.05 were used to compare the differentially accessible peaks and differentially expressed genes. Pearson correlation analysis was performed between the log2 fold change values of the genomic assays and between the average raw count values of the genomic assays (controlling for the experimental condition).
Plasmid construction and transformation
A multiple cloning site (MCS) was cloned into the backbone pDEST VanGlow-GL vector . Then, we removed the mini-white+ cassette using the restriction enzymes AflII (3 L137, 3R131, X216, 3 L114, 2 L796, 3 L981, 2 L220, 2R177, 2 L225, and FOXO TFBS) or SmaI and PmlI (X950, 2 L846, and 2R111). The digested plasmids were incubated with T4 ligase for 15 min at room temperature and purified by 1% gel electrophoresis. The plasmids were linearized using AvrII and PacI (sites contained in MCS; all enzymes from New England BioLabs).
Genomic DNA was purified from S2 cells using a Monarch Genomic DNA Purification kit. Candidate chromatin peak sequences and 100-200 bp flanking sequences  (Additional file 2) were amplified using Phusion High-Fidelity PCR MasterMix (primer sequences are listed in Additional file 3) and a C1000 thermocycler (Bio-Rad Life Science). The peak sequences were amplified for 98 °C for 5 min, followed by 35 cycles of 98 °C for 30 sec, 52-68 °C gradient for 30 sec, 72 °C for 4 min, and a final incubation for 5 min at 72 °C. The amplified fragments were purified on 1% agarose gels and extracted using a Monarch Gel Purification kit and cloned into linearized VanGlo-GL-MCS plasmid using NEBuilder HiFi Assembly master mix. The assembled plasmids were transformed into DH5α cells and grown overnight. Clones were screened by restriction digestion using EcoRI-HF. Sequences were confirmed by Sanger sequencing at GeneWiz (South Plainfield, NJ, USA). Confirmed plasmids were transformed into S2 cells using TransIT Insect Transfection Reagent (Mirus Bio, Madison, WI, USA). Transformed cells were grown for 48 h before use in experiments.
Transformed S2 cells were serum starved for 24 h and treated with insulin or vehicle as described above (Cell culture). Then, luciferase activity was assayed using a Luciferase Reporter Substrate Assay Kit-Firefly (Abcam, Cambridge, MA, USA). Luminescence was detected with a BioTek Synergy HTX microplate reader (BioTek Instruments, Winooski, VT, USA) and Gen5 2.01.17 software (BioTek Instruments).
Statistical differences in relative luminescence data were analyzed by Student’s t-tests with at least three biological replicates using GraphPad Prism 9.0 software. Differences between genome feature proportions were analyzed using χ2 tests included in R  Correlations were analyzed using Pearson correlation tests included in R. One-proportion z-tests were performed using the binom.test function in R. Heatmaps were generated using the R package ComplexHeatmap . PCA plots were created using the R package pcaExplorer . Correlation plots were produced with the R package ggpubr (https://github.com/kassambara/ggpubr).
Availability of data and materials
All sequencing data are deposited in the Sequence Read Archive (BioProject ID: PRJNA730574; https://www.ncbi.nlm.nih.gov/bioproject/PRJNA730574). The plasmids developed in this study are available from C.M. upon reasonable request.
MacNeil LT, Walhout AJM. Gene regulatory networks and the role of robustness and stochasticity in the control of gene expression. Genome Res. 2011;21:645–57.
Goode DK, Obier N, Vijayabaskar MS, Lie-A-Ling M, Lilly AJ, Hannah R, et al. Dynamic Gene Regulatory Networks Drive Hematopoietic Specification and Differentiation. Dev Cell. 2016;36:572–87.
Tokusumi Y, Tokusumi T, Shoue DA, Schulz RA. Gene regulatory networks controlling hematopoietic progenitor niche cell production and differentiation in the Drosophila lymph gland. PLoS One. 2012;7:41604.
Morozova TV, Mackay TFC, Anholt RRH. Transcriptional networks for alcohol sensitivity in Drosophila melanogaster. Genetics. 2011;187:1193–205.
Duarte FM, Fuda NJ, Mahat DB, Core LJ, Guertin MJ, Lis JT. Transcription factors GAF and HSF act at distinct regulatory steps to modulate stress-induced gene activation. Genes Dev. 2016;30(15):1731–46. https://doi.org/10.1101/gad.284430.116.
Petruccelli E, Brown T, Waterman A, Ledru N, Kaun KR. Alcohol causes lasting differential transcription in Drosophila mushroom body neurons. Genetics. 2020;215(1):103–16. https://doi.org/10.1534/genetics.120.303101.
Jothi R, Cuddapah S, Barski A, Cui K, Zhao K. Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data. Nucleic Acids Res. 2008;36:5221–31.
Song L, Crawford GE. DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells. Cold Spring Harb Protoc. 2010;2010:pdb.prot5384.
Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013;10:1213–8.
Kagohara LT, Zamuner F, Davis-Marcisak EF, Sharma G, Considine M, Allen J, et al. Integrated single-cell and bulk gene expression and ATAC-seq reveals heterogeneity and early changes in pathways associated with resistance to cetuximab in HNSCC-sensitive cell lines. Br J Cancer. 2020;123(1):101–13.
Li X, Chen Y, Fu C, Li H, Yang K, Bi J, et al. Characterization of epigenetic and transcriptional landscape in infantile hemangiomas with ATAC-seq and RNA-seq. Epigenomics. 2020;12(11):893–905. https://doi.org/10.2217/epi-2020-0060.
Ackermann AM, Wang Z, Schug J, Naji A, Kaestner KH. Integration of ATAC-seq and RNA-seq identifies human alpha cell and beta cell signature genes. Mol Metab. 2016;5:233–44.
McNeill H, Craig GM, Bateman JM. Regulation of neurogenesis and epidermal growth factor receptor signaling by the insulin receptor/target of rapamycin pathway in drosophila. Genetics. 2008;179:843–53.
Kido Y, Nakae J, Accili D. The Insulin Receptor and Its Cellular Targets 1. J Clin Endocrinol Metab. 2001;86:972–9.
Puig O, Marr MT, Ruhf ML, Tjian R. Control of cell number by Drosophila FOXO: downstream and feedback regulation of the insulin receptor pathway. Genes Dev. 2003;17(16):2006–20. https://doi.org/10.1101/gad.1098703.
Kulkarni MM, Kulkarni MM, Sopko R, Sun X, Hu Y, Nand A, et al. An Integrative Analysis of the InR/PI3K/Akt Network Identifies the Dynamic Response to Insulin Signaling. Cell Rep. 2016;16:3062–74.
Post S, Karashchuk G, Wade JD, Sajid W, De Meyts P, Tatar M. Drosophila Insulin-Like Peptides DILP2 and DILP5 Differentially Stimulate Cell Signaling and Glycogen Phosphorylase to Regulate Longevity. Front Endocrinol (Lausanne). 2018;9:245.
Postika N, Metzler M, Affolter M, Müller M, Schedl P, Georgiev P, et al. Boundaries mediate long-distance interactions between enhancers and promoters in the Drosophila Bithorax complex. PLoS Genet. 2018;14:e1007702.
Hnisz D, Abraham BJ, Lee TI, Lau A, Saint-André V, Sigova AA, et al. Super-enhancers in the control of cell identity and disease. Cell. 2013;155:934–47.
Roy S, Ernst J, Kharchenko PV, Kheradpour P, Negre N, Eaton ML, et al. Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science. 1979;2010(330):1787–97.
Dekanty A, Lavista-Llanos S, Irisarri M, Oldham S, Wappner P. The insulin-PI3K/TOR pathway induces a HIF-dependent transcriptional response in Drosophila by promoting nuclear localization of HIF-α /Sima. J Cell Sci. 2005;118:5431–41.
Zhang W, Thompson BJ, Hietakangas V, Cohen SM. MAPK/ERK signaling regulates insulin sensitivity to control glucose metabolism in Drosophila. PLoS Genet. 2011;7:1002429.
Mouchel-Vielh E, Rougeot J, Decoville M, Peronnet F. The MAP kinase ERK and its scaffold protein MP1 interact with the chromatin regulator Corto during Drosophila wing tissue development. BMC Dev Biol. 2011;11:1–14.
Sánchez-Alegría K, Flores-León M, Avila-Muñoz E, Rodríguez-Corona N, Arias C. PI3K signaling in neurons: A central node for the control of multiple functions. Int J Mol Sci. 2018;19:3725.
Garofalo RS. Genetic analysis of insulin signaling in Drosophila. Trends Endocrinol Metab. 2002;13:156–62.
Goberdhan DCI, Wilson C. The functions of insulin signaling: size isn’t everything, even in Drosophila. Differentiation. 2003;71:375–97.
Blythe SA, Wieschaus EF. Establishment and maintenance of heritable chromatin structure during early Drosophila embryogenesis. Elife. 2016;5:e20148.
Jünger MA, Rintelen F, Stocker H, Wasserman JD, Végh M, Radimerski T, et al. The Drosophila Forkhead transcription factor FOXO mediates the reduction in cell number associated with reduced insulin signaling. J Biol. 2003;2:20.
Ramaswamy S, Nakamura N, Sansal I, Bergeron L, Sellers WR. A novel mechanism of gene regulation and tumor suppression by the transcription factor FKHR. Cancer Cell. 2002;2:81–91.
Gisselbrecht SS, Palagi A, Kurland JV, Rogers JM, Ozadam H, Zhan Y, et al. Transcriptional Silencers in Drosophila Serve a Dual Role as Transcriptional Enhancers in Alternate Cellular Contexts. Mol Cell. 2020;77:324–337.e8.
Duret L. Why do genes have introns? Recombination might add a new piece to the puzzle. Trends Genet. 2001;17:172–5.
Daugherty AC, Yeo RW, Buenrostro JD, Greenleaf WJ, Kundaje A, Brunet A. Chromatin accessibility dynamics reveal novel functional enhancers in C. elegans. Genome Res. 2017;27:2096–107.
Quillien A, Abdalla M, Yu J, Ou J, Zhu LJ, Lawson ND. Robust Identification of Developmentally Active Endothelial Enhancers in Zebrafish Using FANS-Assisted ATAC-Seq. Cell Rep. 2017;20:709–20.
Cusanovich DA, Reddington JP, Garfield DA, Daza RM, Aghamirzaie D, Marco-Ferreres R, et al. The cis-regulatory dynamics of embryonic development at single-cell resolution. Nature. 2018;555:538–42.
Hrvatin S, Tzeng CP, Nagy MA, Stroud H, Koutsioumpa C, Wilcox OF, et al. A scalable platform for the development of cell-type-specific viral drivers. Elife. 2019;8:e48089.
Klemm SL, Shipony Z, Greenleaf WJ. Chromatin accessibility and the regulatory epigenome. Nat Rev Genet. 2019;20:207–20.
Bae S, Lesch BJ. H3K4me1 Distribution Predicts Transcription State and Poising at Promoters. Front Cell Dev Biol. 2020;8:289.
Koenecke N, Johnston J, Gaertner B, Natarajan M, Zeitlinger J. Genome-wide identification of Drosophila dorso-ventral enhancers by differential histone acetylation analysis. Genome Biol. 2016;17:1–19.
Buenrostro JD, Wu B, Chang HY, Greenleaf WJ. ATAC-seq: a method for assaying chromatin accessibility genome-wide. In: Current protocols in molecular biology. Hoboken, NJ: John Wiley & Sons, Inc.; 2015. p. 21.29.1–9.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–9.
Yu G, Wang LG, He QY. ChIP seeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics. 2015;31:2382–3.
Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010. https://doi.org/10.1038/npre.2010.4282.1.
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21.
Liao Y, Smyth GK, Shi W. FeatureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30:923–30.
Janssens DH, Hamm DC, Anhezini L, Xiao Q, Siller KH, Siegrist SE, et al. An Hdac1/Rpd3-Poised Circuit Balances Continual Self-Renewal and Rapid Restriction of Developmental Potential during Asymmetric Stem Cell Division. Dev Cell. 2017;40:367–380.e7.
R Core. R.:A language and environment for statistical computing. 2020.
Gu Z, Eils R, Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 2016;32:2847–9.
Marini F, Binder H. PcaExplorer: An R/Bioconductor package for interacting with RNA-seq principal components. BMC Bioinformatics. 2019;20:1–8.
We thank members of the lab for continued discussion. This work was supported by the University of Utah Flow Cytometry Facility, the University of Utah Genomics Core Facility, the High Throughput Sequencing Core at the Huntsman Cancer Institute, and the National Cancer Institute through Award Number 5P30CA042014. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
This study was supported by grants from the National Institute of Health: National Institute on Drug Abuse (Grant R21DA049635, to AR), the National Institute on Alcohol Abuse and Alcoholism (R01AA026818 to AR), and the National Institute of Diabetes and Digestive and Kidney Disease (R01DK110358 to ARR).
Ethics approval and consent to participate
Consent for publication
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Merrill, C.B., Montgomery, A.B., Pabon, M.A. et al. Harnessing changes in open chromatin determined by ATAC-seq to generate insulin-responsive reporter constructs. BMC Genomics 23, 399 (2022). https://doi.org/10.1186/s12864-022-08637-y