- Research article
- Open Access
5-hydroxymethylcytosine is highly dynamic across human fetal brain development
BMC Genomics volume 18, Article number: 738 (2017)
Epigenetic processes play a key role in orchestrating transcriptional regulation during the development of the human central nervous system. We previously described dynamic changes in DNA methylation (5mC) occurring during human fetal brain development, but other epigenetic processes operating during this period have not been extensively explored. Of particular interest is DNA hydroxymethylation (5hmC), a modification that is enriched in the human brain and hypothesized to play an important role in neuronal function, learning and memory. In this study, we quantify 5hmC across the genome of 71 human fetal brain samples spanning 23 to 184 days post-conception.
We identify widespread changes in 5hmC occurring during human brain development, notable sex-differences in 5hmC in the fetal brain, and interactions between 5mC and 5hmC at specific sites. Finally, we identify loci where 5hmC in the fetal brain is associated with genetic variation.
This study represents the first systematic analysis of dynamic changes in 5hmC across human neurodevelopment and highlights the potential importance of this modification in the human brain. A searchable database of our fetal brain 5hmC data is available as a resource to the research community at http://www.epigenomicslab.com/online-data-resources.
Human brain development is characterized by coordinated changes in gene expression mediated by a complex interaction between transcription factors  and epigenetic processes . We recently characterized the dramatic alterations in DNA methylation (5-methylcytosine, 5mC) occurring during human neurodevelopment , but little is known about the role of other epigenetic modifications during this period. Of particular interest is DNA hydroxymethylation (5-hydroxymethylcytosine, 5hmC), a covalent modification of cytosine that represents an oxidised derivative of 5mC produced by the process of active DNA demethylation [4, 5]. Of note, 5hmC is present at relatively high levels in the mature central nervous system , and particularly enriched in the vicinity of genes with synapse-related functions . Although initially hypothesized to represent an intermediate step of the DNA demethylation pathway, 5hmC is now assumed to have specific functional roles in the brain. Several specific molecular readers of 5hmC have been identified, including transcriptional regulators, chromatin modifiers and DNA damage and repair proteins [8,9,10,11,12]. Additionally, the ten-eleven translocation (TET) family of proteins that catalyze the conversion of 5mC into 5hmC have been implicated in neuronal differentiation and function ; studies manipulating TET activity suggest that 5hmC plays a role in learning and memory, hippocampal neurogenesis, and neuronal activity-regulated gene expression [14,15,16]. Recently, 5hmC has been found to mark regulatory regions of the genome in the murine and human fetal brain [2, 17] and increase rapidly in abundance postnatally, concurrent with neuronal maturation and synaptogenesis [18,19,20]. Notably, there is growing evidence to suggest that abnormal regulation of 5hmC contributes to the etiology of several neurodevelopmental disorders including autism spectrum disorders  and schizophrenia .
Importantly, many of the standard methods used to quantify 5mC, including those based on sodium bisulfite (BS) conversion of DNA, are unable to discriminate between 5mC and 5hmC [10, 23], with important implications for the interpretation of published DNA methylation datasets from brain. Recently, a number of methods have been developed to enable the quantification of 5hmC across the genome . Oxidative bisulfite (oxBS) conversion of genomic DNA, which involves the oxidation of 5hmC to 5-formylcytosine (5fC) prior to standard BS conversion, enables the accurate quantification of 5mC, allowing 5hmC to be measured by proxy at base-pair resolution [25,26,27]. In this study, we use this approach in combination with the Illumina 450 K HumanMethylation array (“450 K array”) to undertake the first systematic study of 5hmC in the developing human brain, profiling 71 fetal samples ranging from 23 to 184 days post-conception (DPC). We identify widespread changes in 5hmC during human brain development, with differentially hydroxymethylated positions (DHPs) and regions (DHRs) becoming both hyper- and hypo-hydroxymethylated with fetal age. As a resource to the scientific community a searchable database of our fetal brain 5hmC data is available at http://www.epigenomicslab.com/online-data-resources.
Autosomal distribution of 5hmC in the developing human brain
We quantified 5hmC across the genome by subtracting oxBS-generated 450 K array profiles from those generated following BS-conversion performed in parallel (ΔβBS-oxBS). We set a stringent threshold for calling 5hmC (ΔβBS-oxBS > 0.036) based on the 95th percentile of negative ΔβBS-oxBS values across all profiled samples, with sites below this threshold characterized as having “undetectable” 5hmC (see Methods section). Consistent with previous observations in young neurons [18, 19, 28], we observed overall low levels of global 5hmC in the fetal brain (mean level of 5hmC across all 411,325 probes passing our stringent quality-control metrics = 2.97% (SD = 5.16%)). Approximately a quarter (n = 103,063 (25.64%)) of all autosomal probes included in the analysis were characterized by non-detectable 5hmC (i.e. ΔβBS-oxBS < 0.036) in all 71 fetal brain samples examined (Fig. 1a and Additional file 1: Table S1); these sites were significantly enriched in CpG islands and other promoter regulatory regions including the transcription start-site (TSS), 5’UTR and 1st Exon (Additional file 1: Table S2). In contrast, a small number of autosomal sites (n = 5061 (1.26%)) were characterized by detectable 5hmC in all 71 samples; these sites were significantly enriched in CpG island shores and shelves, regions flanking the TSS and gene bodies (Additional file 1: Table S2). Of these consistently hydroxymethylated probes a large proportion were also characterized by detectable 5hmC in our previous analysis of a small number of adult cortex (n = 2162, 43%) and cerebellum (n = 2381, 47%) samples . Notably, the majority of probes were characterized by detectable 5hmC in only a subset of samples (Fig. 1a), indicating that the presence and distribution of 5hmC is highly variable during fetal brain development.
Across the 298,972 autosomal probes (74.36%) characterized by detectable 5hmC (i.e. ΔβBS-oxBS > 0.036) in at least one fetal brain sample, the mean level of 5hmC was 4.16% (SD 5.43%) across all fetal brain samples (Additional file 1: Table S3). In agreement with previous studies [27, 29,30,31,32], we observed an inverse relationship between CpG density and mean levels of 5hmC at these sites, with significantly lower levels in CpG islands (1.56%, P-value <1.00E-200) and higher levels in sites not located within CpG islands, shores or shelves (5.20%, P-value <1.00E-200) compared to the average across all sites characterized by detectable 5hmC in at least one fetal brain sample (Additional file 2: Figure S1; Additional file 1: Table S3). 5hmC was significantly enriched at 450 K array probes in gene bodies (4.37%, P-value <1.00E-200) and 3’UTRs (4.74%, P-value <1.00E-200), and significantly depleted at 450 K array probes located within 1500 bp (3.93%, P-value <1.00E-200) and 200 bp (2.14%, P-value <1.00E-200) upstream of the transcription start site (TSS), 5’UTRs (3.85%, P-value <1.00E-200), and 1st exons (2.24%, P-value <1.00E-200) (Additional file 2: Figure S1; Additional file 1: Table S3), concurring with previous observations [19, 27, 29,30,31].
Human fetal brain development is associated with dynamic changes in 5hmC at specific loci
Our previous study quantified BS-converted DNA to identify changes in total DNA modifications (i.e. 5mC and 5hmC) across fetal brain development in an overlapping set of fetal brain samples , and was therefore not able to distinguish between 5mC and 5hmC. For the developmentally differentially modified positions (dDMPs) identified in our original study that were included in our final dataset (n = 28,330 probes), we found a highly-significant correlation of effect sizes across both BS datasets (r = 0.99; P-value <1.00E-200) (Additional file 2: Figures S2 and S3), confirming the robust nature of our previous DNA modification estimates. We explored the extent to which changes at these sites, previously attributed to changes in 5mC, were confounded by levels of 5hmC. There was a strong correlation (r = 0.97, P-value <1.00E-200) between the BS (5mC + 5hmC) and oxBS (5mC) effect sizes (Additional file 2: Figure S4) at these sites, indicating that the DNA modification dynamics previously described  are likely to primarily reflect changes in 5mC. The majority of these sites (n = 28,255, 99.74%), however, were characterized by detectable 5hmC in at least one sample and there was a significant positive correlation (r = 0.36; P-value <1.00E-200) (Additional file 2: Figure S5) between the BS (5mC + 5hmC) and BS-oxBS (5hmC) effect sizes, indicating a contribution of dynamic 5hmC across brain development at these sites.
We next used a linear model to detect sites where levels of 5hmC changed across brain development (see Methods), identifying 62 developmentally differentially hydroxymethylated positions (dDHPs) at a stringent Bonferroni-corrected significance threshold (P-value <1.67E-07) (Additional file 1: Table S4), representing 0.02% of the 298,972 autosomal probes tested. At a more relaxed “discovery” significance threshold (P-value <5E-05), we identified 2181 sites (0.73%) characterized by changes in 5hmC during brain development (Additional file 1: Table S5). The top-ranked sites characterized by increasing 5hmC across brain development (henceforth referred to as hyperhydroxymethylated dDHPs) and decreasing 5hmC across brain development (henceforth referred to as hypohydroxymethylated dDHPs) are shown in Fig. 1b and c, respectively. Among the “discovery” dDHPs we observed a significant enrichment of hypohydroxymethylated autosomal dDHPs (n = 1278 (58.60%)) compared to hyperhydroxymethylated dDHPs (n = 903 (41.40%)) (enrichment hyperhydroxymethylated OR = 0.71, P-value = 1.29E-08) (Additional file 1: Table S6). We next used comb-P  to identify spatially correlated regions of developmentally dynamic 5hmC (or developmentally differentially hydroxymethylated regions (dDHRs)). A total of 254 dDHRs (Šidák-corrected P-value <0.05) were identified, spanning an average of 5.43 probes and 551.30 base pairs (SD = 362.97) (Additional file 1: Table S7). The top-ranked dDHR, encompassing 11 CpG sites spanning the first exon of C22orf26 and LOC150381 on chromosome 22, is characterized by a significant reduction in 5hmC across brain development (Šidák-corrected P-value = 1.10E-15) (Fig. 1d). A full list of dDHRs is presented in Additional file 1: Table S8.
Sites characterized by changes in 5hmC across brain development are not distributed equally across genomic features
Although the frequency of dDHPs is relatively consistent across autosomes, chromosome 19 is notably depleted of dDHPs (relative enrichment = 0.62, P-value = 3.08E-05) (Additional file 1: Table S9). The distribution of dDHPs, however, is not equal across annotated genic features (Additional file 1: Tables S6 and S10), being significantly depleted in promoter regulatory regions, including CpG islands (percentage of significant probes = 0.32%, relative enrichment = 0.43, P-value = 1.43E-33), and enriched at sites not annotated to CG-rich features (CpG islands, shores and shelves) (percentage of significant probes = 0.92%, relative enrichment = 1.26, P-value = 2.27E-10) (Additional file 2: Figures S6 and S7). There is also a significant enrichment of dDHPs in annotated DNase I hypersensitivity sites (percentage of significant probes = 0.96%, relative enrichment 1.32, P–value = 2.62E-13) (Additional file 1: Table S6) and probes associated with transcription factor binding sites (TFBSs)  (percentage of significant probes = 0.79%, relative enrichment = 1.09, P-value = 4.41E-02), with specific TFBS motifs being specifically enriched for dDHPs (Additional file 1: Table S11). Sites associated with specific alternative transcription event domains were significantly depleted for dDHPs (Additional file 2: Figure S8; Additional file 1: Table S12) including alternative 3′ splice sites (percentage of significant probes = 0.34%, relative enrichment = 0.46, P-value = 1.51E-02) and constitutive exons (percentage of significant probes = 0.44%, relative enrichment = 0.59, P-value = 4.70E-06).
There is a complex relationship between 5mC and 5hmC across human brain development
It is not yet definitively known whether 5hmC represents a stable or transient DNA modification  and its relationship to 5mC has not been explored in a large number of samples. Our data show that levels of 5hmC in the fetal brain are highest at loci characterized by intermediate levels of 5mC (Fig. 2a), concurring with the results of previous analyses [27, 30]. To further delineate the relationship between 5mC and 5hmC during brain development we compared effect sizes (regression coefficients reflecting change in DNA modification with development) for all dDHPs passing our ‘discovery’ significance threshold (n = 2181) (Fig. 2b). dDHPs could be split into three broad groups; one characterized by changes in 5mC paralleling those seen for 5hmC (e.g. Fig. 2c-d), a second where changes occurred in the opposite directions across the two DNA modifications (e.g. Fig. 2e-f), and a third characterized by changes in 5hmC, but no changes in 5mC (e.g. Fig. 2g). Of note, nearly half (n = 1059; 48.56%) of the dDHPs tested where characterized by a significant interaction (P-value <2.29E-05) between trajectories of 5mC and 5hmC across brain development (Fig. 2b).
Sex differences in the human fetal brain hydroxymethylome
Although the majority of autosomal chromosomes are characterized by a similar mean level of 5hmC (Additional file 1: Table S13) (mean 5hmC across 298,972 autosomal probes = 4.16%), levels are significantly lower on the X-chromosome (mean 5hmC = 2.00%, P-value <1.00E-200), especially in females (mean 5hmC across X-chromosome in females = 1.40%, P-value <1.00E-200). A total of 427 probes (0.14% of the 307,810 autosomal and X-linked probes assessed) displayed significantly different levels of 5hmC between male and females (Bonferroni-significance level of P-value <1.62E-07), all located on the X-chromosome (Fig. 3a). In contrast to 5mC, however, which is elevated on the inactive female X-chromosome, the majority of these sites (n = 404; 94.61%) were characterized by a higher level of 5hmC in males (Fig. 3b and c and Additional file 2: Figure S9). Higher 5hmC in males versus females is observed across canonical gene features on the X-chromosome, however this is not seen on the autosomes (Fig. 3d). A total of 144 fetal brain DHRs were identified between males and females using comb-P (Šidák-corrected P-value <0.05, number of probes ≥ 3) (Additional file 1: Table S14). All were located on the X-chromosome and the top-ranked sex-associated DHR is shown in Additional file 2: Figure S10.
Modules of co-hydroxymethylated loci are identified in the developing human brain
We next employed weighted gene co-hydroxymethylation network analysis (WGCNA)  to undertake a systems level characterization of the 5hmC changes associated with human brain development. WGCNA identified a total of 32 discrete modules of co-hydroxymethylated sites (Additional file 1: Table S15). The first principal component of each module, the “module eigengene”, was used to assess the module relationship with fetal brain development (Fig. 4a). Seven modules were significantly correlated with brain development (P-value <1.56E-03) (Additional file 1: Table S15; Fig. 4b; Additional file 2: Figure S11). Module membership within these modules was highly correlated with probe significance (Additional file 2: Figure S12), indicating a clear relationship between 5hmC at the core members of each module and fetal brain development (Fig. 4c). To assess whether these modules are biologically meaningful, functional enrichment analyses were performed on the top 1000 probes from the seven DPC-associated modules (Additional file 1: Table S16). Highly significant gene ontology (GO) terms were identified for the DPC associated modules including “dendrite morphogenesis” (“blue” module; P-value = 3.22E-11), “negative regulation of synaptic transmission” (“black” module; P-value = 6.29E-09), “negative regulation of axon extension involved in axon guidance” (“red” module; P-value = 3.64E-09) and “embryonic organ morphogenesis” (“green” module, P-value = 1.79E-11), suggesting that the large changes in 5hmC across fetal brain development contribute to the regulation of relevant biological processes. Additionally, the sex-associated “lightcyan” module was significantly enriched for terms relevant to brain development including “dendritic spine development” (P-value = 8.86E-05), suggesting that sex differences in 5hmC may contribute to sex-specific neurodevelopmental processes.
DNA hydroxymethylation quantitative trait loci
Building on our recent work identifying genetic effects on 5mC in the fetal brain , we explored genetic influences on levels of 5hmC by performing a genome-wide scan for 5hmC quantitative trait loci (hmQTL). Although we were not highly-powered to detect hmQTLs given the relatively small sample size and relative paucity of 5hmC, we identified 23 hmQTLs at a Bonferroni significance threshold (P-value <2.2E-13) associated with 5hmC with a median effect size of 6.99% per allele (Additional file 1: Table S17, Fig. 5), and 77 hmQTLs at a more relaxed discovery threshold of P-value <1E-11 (Additional file 1: Table S18). In contrast, we identified 13,480 mQTLs at a Bonferroni significance threshold (P-value <1.6E-13) for 493 distinct sites, and 24,286 mQTLs at a discovery threshold of P-value <1E-11 at 860 distinct sites, in the same samples, reflecting the higher abundance of methylated sites in the genome. Three sites were part of 11 Bonferroni significant QTLs for both modifications, all showing the same direction of effect. Of the 13,480 mQTL identified 1383 (10.3%) were associated with probes that did not show any detectable levels of 5hmC, and 7652 (56.8%) showed no evidence of a genetic effect on 5hmC (P-value >0.05). 11,910 (98.5%) mQTLs were associated with significant (P-value <0.05/12,097) heterogeneous genetic effects across 5mC and 5hmC.
In this study, we examined neurodevelopmental changes in 5hmC in 71 human fetal brain samples spanning 23 to 184 DPC. To our knowledge, this represents the most extensive study of 5hmC in the human brain and is the first to examine changes in 5hmC across human neurodevelopment. We identify highly significant changes in 5hmC during human brain development at multiple sites, and modules of co-hydroxymethylated loci associated with fetal age that are significantly enriched in the vicinity of genes involved in neurodevelopmental processes. We highlight a complex relationship between patterns of 5hmC and 5mC across brain development; while some loci are characterized by parallel changes in both modifications, other sites are characterized by opposite patterns. Strikingly, in contrast to 5mC, levels of 5hmC on the X-chromosome are lower in females than males. Finally, although hmQTLs are considerably less widespread than mQTLs in the fetal brain, we highlight examples where 5hmC is associated with genetic variation.
Consistent with previous observations in young neurons [18, 19, 28], we observe overall low levels of global 5hmC in the human fetal brain. Examining the genomic distribution of 5hmC highlighted an inverse correlation between mean 5hmC and CpG density, with a depletion of 5hmC in promoter regions and enrichment in gene bodies, relative to global 5hmC levels, concordant with previous observations in adult human brain [2, 19, 29,30,31, 38]. The chromosomal distribution of 5hmC was relatively consistent, although a depletion of 5hmC was found on the X-chromosome relative to autosomes in both males and females, and this deficit was particularly striking in females. These data concur with previous studies reporting a deficit in X-chromosome 5hmC [2, 19, 29], suggesting a possible role in dosage compensation for X-linked genes. Interestingly, there was reduced X-chromosome 5hmC in females compared to males, in stark contrast to 5mC which is associated with silencing of the inactive allele. 5hmC is hypothesized to be associated with transcriptionally permissive chromatin states; whilst 5mC is often associated with gene repression, 5hmC facilitates transcription through contributing to open chromatin [39, 40]. During synaptogenesis, for example, while 5mC accumulates in heterochromatin, 5hmC becomes enriched in euchromatin, consistent with its co-localization with PolII which mediates RNA transcription . Interestingly, the male X-chromosome was also characterized by lower 5hmC than the autosomes, suggesting that even the active X-chromosome is characterised by relatively low levels of this modification.
It has not yet been fully established whether 5hmC is a stable epigenetic mark that contributes to the regulation of genomic function, or whether it is a transient intermediate marking sites of active demethylation. Our analysis of the relationship between 5mC and 5hmC across brain development in the same samples revealed a complex relationship at individual probe sites. At a number of loci, levels of 5hmC increased or decreased in parallel with 5mC, although other locations were characterized by changes in the opposite direction during brain development. This stable accumulation or depletion of 5hmC at specific sites suggests it may not merely represent a transient intermediate of active DNA demethylation, providing further evidence to support a functional role for this modification in brain development. Alternatively, it is possible that the patterns we observe reflect differences in the regulation of demethylation at different sites.
There are several limitations to this study that should be considered. First, although this represents the first systematic analysis of 5hmC across multiple stages of human fetal brain development, legal restrictions on later-term abortions precluded the assaying of brain samples from later stages. Second, because the brain tissue was acquired frozen it was processed as a composite of brain regions and cell types and our data should therefore be seen as reflecting predominant changes in the developing human brain as a whole. Of note, however, all samples were classified as being human prefrontal cortex using an online tissue prediction algorithm developed by Steven Horvath that uses DNA methylation data to infer tissue type (https://labs.genetics.ucla.edu/horvath/dnamage/) . Third, although the Illumina 450 K platform provides accurate quantification at single-base resolution with probes associated with 99% of genes and 96% of CGIs, the array targets <2% of the CpG sites in the human genome, and probes are not equally distributed across all genomic features. To calculate 5hmC it is necessary to run two arrays per sample (following BS- and oxBS-conversion), potentially increasing technical variability. However, we found a very strong concordance between the BS-converted DNA data and our previous study of total DNA modifications in the fetal brain. This study also highlights the potential for 5hmC to confound DNA modification data produced using BS-converted DNA, the standard method for profiling 5mC. As costs reduce, future studies can apply sequencing-based genomic profiling technologies to more thoroughly interrogate DNA modifications across the genome. To provide insight regarding the cause and functional consequences of epigenetic developmental dynamics, future studies should take a multi-omic approach and integrate genomic, transcriptomic and proteomic data sets; in particular, it will be important to integrate data about other epigenetic marks, including alternative DNA modifications (i.e. 5-fluorocytosine and 5-carboxylcytosine) and histone modifications with our 5mC and 5hmC data.
We identify dynamic changes in 5hmC occurring throughout the genome during human fetal brain development. There are discrete modules of co-hydroxymethylated loci associated with brain development that are significantly enriched for genes involved in neurodevelopmental processes. There is a complex relationship between levels of 5mC and 5hmC across brain development, with notable sex differences on the X-chromosome. Finally, we show that 5hmC at specific loci is associated with genetic variation. Together, these data further highlight the role of dynamic epigenetic processes during brain development, with potential implications for understanding the origins of neurodevelopmental health and disease.
DNA modifications (DNA methylation (5mC) and DNA hydroxymethylation (5hmC)) were assessed in 72 human fetal brain samples (36 male; 36 female), spanning 23 to 184 days post-conception (DPC). Genomic DNA was isolated from fetal brain tissue acquired frozen from the Human Developmental Biology Resource (HDBR) (http://www.hdbr.org) and MRC Brain Banks Network (https://www.mrc.ac.uk/research/facilities-and-resources-for-researchers/brain-banks/) under strict ethical regulations, as previously described . DNA modifications and DNA methylation were quantified with the Illumina Infinium HumanMethylation450 BeadChip, as previously described [27, 42]. Two pre-treatments, BS and oxBS conversion, and two Illumina 450 K arrays were run for each sample. Following quality control procedures, preprocessing and normalisation, as previously described , DNA modification and DNA methylation beta values were generated, and a DNA hydroxymethylation value calculated from these measures. An overview of the method for the quantification of DNA hydroxymethylation is provided in Additional file 2: Figure S13.
Human fetal brain samples
Human fetal brain tissue was acquired from the Human Developmental Biology Resource (HDBR) (http://www.hdbr.org) and the MRC Brain Banks network (https://www.mrc.ac.uk/research/facilities-and-resources-for-researchers/brain-banks/access-for-research/). Ethical approval for the HDBR was granted by the Royal Free Hospital research ethics committee under reference 08/H0712/34 and Human Tissue Authority (HTA) material storage license 12,220; ethical approval for the MRC Brain Bank was granted under reference 08/MRE09/38. The developmental age of each sample was determined using Carnegie Staging for embryonic (defined as ≤56 DPC) samples and foot and knee to heel length measurements for fetal (defined as ≥57 DPC) samples. No additional phenotypic or demographic information was available for any of the samples used in this study. An overview of the samples is given in Additional file 2: Figure S14 and Additional file 1: Table S19. Genomic DNA was isolated using standard phenol-chloroform procedures and quantified using the Qubit® BR dsDNA assay with the Qubit® Fluorometer (Life Technologies, Thermo Fisher Scientific). Sample sex was determined via PCR amplification of a region near the amelogenin gene (AMELY) that produces different sized PCR products for the X and Y chromosomes, (977 and 788 bp, respectively) . Following DNA modification profiling, sex was confirmed using multidimensional scaling plots of microarray probes on the X-chromosome.
Quantification of DNA methylation and DNA hydroxymethylation using the Illumina 450 K array
Samples were randomized with respect to DPC and sex to avoid batch effects throughout all experimental procedures. For each sample two 500 ng aliquots of genomic DNA were subject to parallel oxBS and BS conversion (Additional file 2: Figure S13) using the TrueMethyl® 24 kit (CEGX, Cambridge, UK). Sample recovery was assessed using the Qubit® ssDNA assay, with a final sample concentration of ≥20 ng/μl considered optimal for further analysis. A digestion control assay was used to qualitatively confirm successful oxBS and BS conversion (Additional file 2: Figure S15). Total levels of DNA modification (5mC + 5hmC, BS-converted DNA) and DNA methylation (5mC, oxBS-converted DNA) were quantified using the Illumina HumanMethylation450 BeadChip (Illumina) scanned on an Illumina HiScan (Illumina). BS- and oxBS-converted DNA samples from each individual were processed together at all stages.
Data pre-processing and quality control
GenomeStudio (Illumina) was used to extract signal intensities for each probe, generating a final report that was imported into the R statistical environment 3.0.2 (http://www.r-project.org)  using the methylumi package (http://www.bioconductor.org/packages/release/bioc/html/methylumi.html). Sample identity was confirmed using multidimensional scaling to confirm concordance between the expected and reported sample sex, and via correlation of the beta values for the 65 single nucleotide polymorphism (SNP) probes for each BS- and oxBS-converted sample pair. Data quality control and preprocessing were performed using the wateRmelon package as described previously . Stringent filtering of the prenormalized Illumina 450 K data was performed using the pfilter function; CpG sites with a beadcount of <3 in 5% of samples were removed (n = 838) and CpG sites with a detection P-value >0.05 in 1% of samples (n = 19,163), additionally one sample having 5% of sites with a detection P-value >0.05 was removed. Finally, cross-reactive probes and polymorphic CpGs, as detailed in the Illumina annotation file and identified in recent publications [45, 46] were removed (Additional file 1: Table S20), leaving 411,325 probes for initial analysis (including n = 9275 X-chromosome and n = 15 Y-chromosome probes). Information for samples included in the final dataset (n = 35 males [age range = 23 to 184 DPC]; n = 36 females [age range = 37 to 161 DPC]) is shown in Additional file 1: Table S19. DNA modification (BS-converted) and DNA methylation (oxBS-converted) data were normalized separately using the dasen method within the wateRmelon package . Plotting the normalised beta density for both the BS- and oxBS-converted samples revealed a bimodal distribution for both pre-treatment approaches (Additional file 2: Figure S16). The BS-converted samples showed a clear shift to the right of the oxBS-converted samples, corresponding to combined quantification of both 5mC and 5hmC. Normalized beta values for the oxBS-converted samples (5mC) were subtracted from the normalized beta values for the corresponding BS-converted samples (5hm + 5hmC) to calculate a ‘Δ beta’ value corresponding to the level of DNA hydroxymethylation at each probe. As expected, the beta value density distribution was positively skewed (Additional file 2: Figure S17). As previously described , a threshold for defining “detectable” 5hmC was calculated from the 95th percentile of the negative beta values (which result from technical noise), and beta values >0.0357 were considered to represent reliably detected levels of 5hmC. Probes not passing this threshold in at least one sample (n = 103,501 of the 411,325 sites passing quality control) were removed (Additional file 1: Table S1); leaving 307,824 probes (298,972 = autosomal) for further analysis.
Identification of differentially modified position as (DMPs) and regions (DMRs) associated with human brain development
A multiple linear regression model was used to identify changes in 5mC and 5hmC at specific sites associated with brain development and gender. We fitted models of the form;
Autosomal probes (n = 298,972) were considered significantly associated with brain development if they passed a Bonferroni-corrected significance threshold of P-value <1.67E-07. Autosomal and X-chromosome probes (n = 307,810) were considered to be significantly associated with sex if they passed a Bonferroni-corrected significance threshold of P-value <1.62E-07. To identify DMRs, we used the Python module comb-p  to group spatially correlated DMPs (seed P-value <1.00E-04, minimum of 3 probes, Šidák-corrected P-value <0.05) at a maximum distance of 500 bp. DMR P-values were corrected for multiple testing using Šidák correction which corrects the combined P-value for na/nr tests, where na is the total number of probes tested in the initial dataset and nr the number of probes in the given region.
Identifying different slopes for changes in 5mC and 5hmC across brain development
To test for significantly different relationships across development between 5mC and 5hmC we ran a mixed effects regression model. This model was fit for the combined dataset of 5mC and 5hmC. An indicator variable to denote which modification the data point was measuring was included to prevent differences in absolute levels being misidentified as significantly different slopes with an interaction between this indicator and age to identify significantly different slope coefficients. Because the DNA modifications were quantified in the same individuals, we expect the DNA methylation and DNA hydroxymethylation values to be correlated, therefore individual was included as a random effect and other covariates as fixed effects. The fixed effects part of this model is specified here:
Weighted gene co-modification network analysis
Weighted gene co-modification network analysis (WGCNA) was used to identify modules of highly co-hydroxymethylated probes . Modules were identified from their pairwise correlations, using a signed block-wise network construction at power 5 with a merging threshold of 0.30 and n = 1000 minimum probes. Modules were assigned a color name, and a weighted average-hydroxymethylation profile, known as a module eigengene (ME), was calculated for each. Correlations between the MEs and the phenotypic traits (DPC, sex) were used to identify modules associated with these traits. The relationship between each probe and each module was assessed through calculation of module membership (MM), the absolute correlation between a probe’s DNA hydroxymethylation status and the ME, allowing the identification of the subset of probes with the strongest membership to each module. Probe significance (PS), the absolute value of correlation between each probe and phenotypic trait, was calculated to quantify the association of probes with DPC. To assess the biological meaning of modules associated with DPC and sex, genes associated with the top 1000 probes ranked by MM were extracted and assessed using pathway and gene ontology analysis.
Gene ontology analysis
A gene list was derived from the DMPs using Illumina’s gene annotation. This annotation, which comes via UCSC, is based on overlap with RefSeq genes plus 1500 bp of upstream sequence. Where probes were not annotated to any gene (i.e. in the case of intergenic locations) they were omitted from this analysis, and where probes were annotated to multiple genes all were included. A logistic regression approach was used to test if genes in this list predicted pathway membership, while controlling for the number of probes that passed quality control (i.e. were tested) annotated to each gene. Pathways were downloaded from the GO website (http://geneontology.org/) and mapped to genes including all parent ontology terms. All genes with at least one 450 K probe annotated and mapped to at least one GO pathway were considered. Pathways were filtered to those containing between 10 and 2000 genes. After applying this method to all pathways, the list of significant pathways (P-value <0.05) was refined by grouping to control for the effect of overlapping genes. This was achieved by taking the most significant pathway, and retesting all remaining significant pathways while controlling additionally for the best term. If the test genes no longer predicted the pathway, the term was said to be explained by the more significant pathway, and hence these pathways were grouped together. This algorithm was repeated, taking the next most significant term, until all pathways were considered as the most significant or found to be explained by a more significant term.
DNA methylation and DNA hydroxymethylation quantitative trait loci
Genotype data was available for 64 of the 71 samples; these were assessed for DNA methylation quantitative trait loci (mQTL) and DNA hydroxymethylation quantitative trait loci (hmQTL). The genetic data was profiled on the Illumina HumanOmniExpress BeadChip (Illumina) chip and imputed as described previously . SNPs were then filtered with PLINK  excluding variants with >1% missing values, Hardy-Weinburg equilibrium P-value <0.0001 or a minor allele frequency of <5%. Subsequently, SNPs were also filtered so that each of the three genotype groups with 0, 1, or 2 minor alleles (or two genotype groups in the case of rarer SNPs with 0 or 1 minor allele) had a minimum of 5 observations. An additive linear model was fitted using MatrixEQTL  to test if the number of alleles (coded 0, 1, or 2) predicted 5mC or 5hmC at each site, including covariates for age, sex, and the first two principal components from the genotype data. To test for a significantly different genetic effect on 5hmC compared to 5mC, all significant mQTL and hmQTL were subsequently tested for heterogeneous effects. To this end, a multi-level model was fitted across the data for both modifications using the R packages lme4  and lmerTest . In these models genotype, age, sex, the first two principal components and an indicator variable to distinguish 5mC from 5hmC measurements were included as fixed effects, and individual was included as a random effect. Two models with and without an interaction term between genotype and the modification indicator variable were fitted for each QTL and the heterogeneity P-value was calculated from an ANOVA comparing these two models.
Vaquerizas JM, Kummerfeld SK, Teichmann SA, Luscombe NM. A census of human transcription factors: function, expression and evolution. Nat Rev Genet. 2009;10:252–63.
Lister R, Mukamel EA, Nery JR, Urich M, Puddifoot CA, Johnson ND, Lucero J, Huang Y, Dwork AJ, Schultz MD, et al. Global epigenomic reconfiguration during mammalian brain development. Science. 2013;341:1237905.
Spiers H, Hannon E, Schalkwyk LC, Smith R, Wong CC, O'Donovan MC, Bray NJ, Mill J. Methylomic trajectories across human fetal brain development. Genome Res. 2015;25:338–52.
Tahiliani M, Koh KP, Shen Y, Pastor WA, Bandukwala H, Brudno Y, Agarwal S, Iyer LM, Liu DR, Aravind L, Rao A. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science. 2009;324:930–5.
Chen T, Ueda Y, Dodge JE, Wang Z, Li E. Establishment and maintenance of genomic methylation patterns in mouse embryonic stem cells by Dnmt3a and Dnmt3b. Mol Cell Biol. 2003;23:5594–605.
Kriaucionis S, Heintz N. The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain. Science. 2009;324:929–30.
Khare T, Pai S, Koncevicius K, Pal M, Kriukiene E, Liutkeviciute Z, Irimia M, Jia P, Ptak C, Xia M, et al. 5-hmC in the brain is abundant in synaptic genes and shows differences at the exon-intron boundary. Nat Struct Mol Biol. 2012;19:1037–43.
Spruijt CG, Gnerlich F, Smits AH, Pfaffeneder T, Jansen PW, Bauer C, Munzel M, Wagner M, Muller M, Khan F, et al. Dynamic readers for 5-(hydroxy)methylcytosine and its oxidized derivatives. Cell. 2013;152:1146–59.
Iurlaro M, Ficz G, Oxley D, Raiber EA, Bachman M, Booth MJ, Andrews S, Balasubramanian S, Reik W. A screen for hydroxymethylcytosine and formylcytosine binding proteins suggests functions in transcription and chromatin regulation. Genome Biol. 2013;14:R119.
Jin SG, Kadam S, Pfeifer GP. Examination of the specificity of DNA methylation profiling techniques towards 5-methylcytosine and 5-hydroxymethylcytosine. Nucleic Acids Res. 2010;38:e125.
Yildirim O, Li R, Hung JH, Chen PB, Dong X, Ee LS, Weng Z, Rando OJ, Fazzio TG. Mbd3/NURD complex regulates expression of 5-hydroxymethylcytosine marked genes in embryonic stem cells. Cell. 2011;147:1498–510.
Mellen M, Ayata P, Dewell S, Kriaucionis S, Heintz N. MeCP2 binds to 5hmC enriched within active genes and accessible chromatin in the nervous system. Cell. 2012;151:1417–30.
Santiago M, Antunes C, Guedes M, Sousa N, Marques CJ. TET enzymes and DNA hydroxymethylation in neural development and function - how critical are they? Genomics. 2014;104:334–40.
Kaas GA, Zhong C, Eason DE, Ross DL, Vachhani RV, Ming GL, King JR, Song H, Sweatt JD. TET1 controls CNS 5-methylcytosine hydroxylation, active DNA demethylation, gene transcription, and memory formation. Neuron. 2013;79:1086–93.
Rudenko A, Dawlaty MM, Seo J, Cheng AW, Meng J, Le T, Faull KF, Jaenisch R, Tsai LH. Tet1 is critical for neuronal activity-regulated gene expression and memory extinction. Neuron. 2013;79:1109–22.
Zhang RR, Cui QY, Murai K, Lim YC, Smith ZD, Jin S, Ye P, Rosa L, Lee YK, Wu HP, et al. Tet1 regulates adult hippocampal neurogenesis and cognition. Cell Stem Cell. 2013;13:237–45.
Wen L, Li X, Yan L, Tan Y, Li R, Zhao Y, Wang Y, Xie J, Zhang Y, Song C, et al. Whole-genome analysis of 5-hydroxymethylcytosine and 5-methylcytosine at base resolution in the human brain. Genome Biol. 2014;15:R49.
Song CX, Szulwach KE, Fu Y, Dai Q, Yi C, Li X, Li Y, Chen CH, Zhang W, Jian X, et al. Selective chemical labeling reveals the genome-wide distribution of 5-hydroxymethylcytosine. Nat Biotechnol. 2011;29:68–72.
Szulwach KE, Li X, Li Y, Song CX, Wu H, Dai Q, Irier H, Upadhyay AK, Gearing M, Levey AI, et al. 5-hmC-mediated epigenetic dynamics during postnatal neurodevelopment and aging. Nat Neurosci. 2011;14:1607–16.
Chen Y, Damayanti NP, Irudayaraj J, Dunn K, Zhou FC. Diversity of two forms of DNA methylation in the brain. Front Genet. 2014;5:46.
Zhubi A, Chen Y, Dong E, Cook EH, Guidotti A, Grayson DR. Increased binding of MeCP2 to the GAD1 and RELN promoters may be mediated by an enrichment of 5-hmC in autism spectrum disorder (ASD) cerebellum. Transl Psychiatry. 2014;4:e349.
Dong E, Gavin DP, Chen Y, Davis J. Upregulation of TET1 and downregulation of APOBEC3A and APOBEC3C in the parietal cortex of psychotic patients. Transl Psychiatry. 2012;2:e159.
Huang Y, Pastor WA, Shen Y, Tahiliani M, Liu DR, Rao A. The behaviour of 5-hydroxymethylcytosine in bisulfite sequencing. PLoS One. 2010;5:e8888.
Skvortsova K, Zotenko E, Luu PL, Gould CM, Nair SS, Clark SJ, Stirzaker C. Comprehensive evaluation of genome-wide 5-hydroxymethylcytosine profiling approaches in human DNA. Epigenetics Chromatin. 2017;10:16.
Booth MJ, Branco MR, Ficz G, Oxley D, Krueger F, Reik W, Balasubramanian S. Quantitative sequencing of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution. Science. 2012;336:934–7.
Booth MJ, Ost TW, Beraldi D, Bell NM, Branco MR, Reik W, Balasubramanian S. Oxidative bisulfite sequencing of 5-methylcytosine and 5-hydroxymethylcytosine. Nat Protoc. 2013;8:1841–51.
Lunnon K, Hannon E, Smith RG, Dempster E, Wong C, Burrage J, Troakes C, Al-Sarraj S, Kepa A, Schalkwyk L, Mill J. Variation in 5-hydroxymethylcytosine across human cortex and cerebellum. Genome Biol. 2016;17:27.
Hahn MA, Qiu R, Wu X, Li AX, Zhang H, Wang J, Jui J, Jin SG, Jiang Y, Pfeifer GP, Lu Q. Dynamics of 5-hydroxymethylcytosine and chromatin marks in mammalian neurogenesis. Cell Rep. 2013;3:291–300.
Chopra P, Papale LA, White AT, Hatch A, Brown RM, Garthwaite MA, Roseboom PH, Golos TG, Warren ST, Alisch RS. Array-based assay detects genome-wide 5-mC and 5-hmC in the brains of humans, non-human primates, and mice. BMC Genomics. 2014;15:131.
Field SF, Beraldi D, Bachman M, Stewart SK, Beck S, Balasubramanian S. Accurate measurement of 5-methylcytosine and 5-hydroxymethylcytosine in human cerebellum DNA by oxidative bisulfite on an array (OxBS-array). PLoS One. 2015;10:e0118202.
Stewart SK, Morris TJ, Guilhamon P, Bulstrode H, Bachman M, Balasubramanian S, Beck S. oxBS-450K: a method for analysing hydroxymethylation using 450K BeadChips. Methods. 2015;72:9–15.
Yu M, Hon GC, Szulwach KE, Song CX, Jin P, Ren B, He C. Tet-assisted bisulfite sequencing of 5-hydroxymethylcytosine. Nat Protoc. 2012;7:2159–70.
Pedersen BS, Schwartz DA, Yang IV, Kechris KJ. Comb-p: software for combining, analyzing, grouping and correcting spatially correlated P-values. Bioinformatics. 2012;28:2986–8.
Slieker RC, Bos SD, Goeman JJ, Bovee JV, Talens RP, van der Breggen R, Suchiman HE, Lameijer EW, Putter H, van den Akker EB, et al. Identification and systematic annotation of tissue-specific differentially methylated regions using the Illumina 450k array. Epigenetics Chromatin. 2013;6:26.
Hahn MA, Szabo PE, Pfeifer GP. 5-Hydroxymethylcytosine: a stable or transient DNA modification? Genomics. 2014;104:314–23.
Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559.
Hannon E, Spiers H, Viana J, Pidsley R, Burrage J, Murphy TM, Troakes C, Turecki G, O'Donovan MC, Schalkwyk LC, et al. Methylation QTLs in the developing brain and their enrichment in schizophrenia risk loci. Nat Neurosci. 2016;19(1):48–54.
Yu M, Hon GC, Szulwach KE, Song CX, Zhang L, Kim A, Li X, Dai Q, Shen Y, Park B, et al. Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome. Cell. 2012;149:1368–80.
Lopez CM, Lloyd AJ, Leonard K, Wilkinson MJ. Differential effect of three base modifications on DNA thermostability revealed by high resolution melting. Anal Chem. 2012;84:7336–42.
Mendonca A, Chang EH, Liu W, Yuan C. Hydroxymethylation of DNA influences nucleosomal conformation and stability in vitro. Biochim Biophys Acta. 1839;2014:1323–9.
Horvath S. DNA methylation age of human tissues and cell types. Genome Biol. 2013;14:R115.
Pidsley R, Y Wong CC, Volta M, Lunnon K, Mill J, Schalkwyk LC. A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics. 2013;14:293.
Eng B, Ainsworth P, Waye JS. Anomalous migration of PCR products using nondenaturing polyacrylamide gel electrophoresis: the amelogenin sex-typing system. J Forensic Sci. 1994;39:1356–9.
R Development Core Team. R: A Language and Environment for Statistical Computing. Austria: the R Foundation for Statistical Computing; 2011. Available online at http://www.R-project.org/.
Chen YA, Lemire M, Choufani S, Butcher DT, Grafodatskaya D, Zanke BW, Gallinger S, Hudson TJ, Weksberg R. Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics. 2013;8:203–9.
Price ME, Cotton AM, Lam LL, Farre P, Emberly E, Brown CJ, Robinson WP, Kobor MS. Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array. Epigenetics Chromatin. 2013;6:4.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.
Shabalin AA. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012;28:1353–8.
Bates D, Maechler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. J Stat Softw. 2015;67:1–48.
lmerTest: Tests in Linear Mixed Effects Models v. R package version 2.0–30. 2016. https://cran.r-project.org/web/packages/lmerTest/index.html.
This project was supported by a Brain & Behavior Research Foundation Distinguished Investigator Award to J.M. H.S. was supported by an MRC PhD studentship. The human embryonic and fetal material was provided by the Joint MRC (grant #G0700089)/Wellcome Trust (grant #GR082557) Human Developmental Biology Resource (http://www.hdbr.org).
Availability of data and materials
Raw and normalized Illumina 450 K methylation data has been submitted to the NCBI Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE94014. A searchable database of our fetal brain epigenomics data is available at http://www.epigenomicslab.com/online-data-resources.
Ethics approval and consent to participate
Ethical approval for the HDBR was granted by the Royal Free Hospital research ethics committee under reference 08/H0712/34 and Human Tissue Authority (HTA) material storage license 12,220; ethical approval for the MRC Brain Bank was granted under reference 08/MRE09/38.
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.