Sex differences in DNA methylation assessed by 450 K BeadChip in newborns

Background: DNA methylation is an important epigenetic mark that can potentially link early life exposures to adverse health outcomes later in life. Host factors like sex and age strongly influence biological variation of DNA methylation, but characterization of these relationships is still limited, particularly in young children.
Methods: In a sample of 111 Mexican-American subjects (58 girls, 53 boys), we interrogated DNA methylation differences by sex at birth using the 450 K BeadChip in umbilical cord blood specimens, adjusting for cell composition.
Results: We observed that ~3 % of CpG sites were differentially methylated between girls and boys at birth (FDR P < 0.05). Of those CpGs, 3031 were located on autosomes, and 82.8 % of those were hypermethylated in girls compared to boys. Beyond individual CpGs, we found 3604 sex-associated differentially methylated regions (DMRs) where the majority (75.8 %) had higher methylation in girls. Using pathway analysis, we found that sex-associated autosomal CpGs were significantly enriched for gene ontology terms related to nervous system development and behavior. Our autosomal hits replicated 35.9 % of the sex-associated CpG sites previously reported in other published human studies. Further, for replicated hits, the direction of the association with methylation was highly concordant (98.5–100 %) with previous studies.
Conclusions: To our knowledge, this is the first reported epigenome-wide analysis by sex at birth that examined DMRs and adjusted for confounding by cell composition. We confirmed previously reported trends that methylation profiles are sex-specific even in autosomal genes, and also identified novel sex-associated CpGs in our methylome-wide analysis immediately after birth, a critical yet relatively unstudied developmental window.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2034-y) contains supplementary material, which is available to authorized users.


Background
There is a growing interest in examining the role epigenetic marks like histone modifications, non-coding RNAs, and DNA methylation may play as biological mechanisms through which environmental exposures and other physiological and lifestyle factors can lead to disease. Unlike genetics, epigenetic modifications are dynamic and can change over time or in response to exposures. Furthermore, host factors such as sex and age also contribute to interindividual differences in epigenetic markers.
Previous studies of DNA methylation using the Illumina 27 K BeadChip methylation array have reported autosomal differentially methylated positions (DMPs) or CpG sites with varying methylation between males and females, providing evidence that it will be important to adjust for sex in analysis of methylation data [1][2][3][4][5][6]. However, these studies did not account for the existence of non-specific probes for autosomal CpGs that cross react with CpGs on sex chromosomes, thereby yielding false positives [7]. Recently, McCarthy et al. published a meta-analysis of 76 studies all using the 27 K BeadChip array to identify sex-associated autosomal DMPs across specimens from multiple tissue types from adults and children [8]. After excluding the sex-biased cross-reactive probes, they identified 184 DMPs that were associated with sex.
While McCarthy et al. identified several interesting autosomal DMPs, their study focused on methylation assessed by the 27 K BeadChip. In 2011, Illumina released a new version of its methylation array, the 450 K BeadChip, which greatly expanded the number of CpGs interrogated to over 480,000 sites. Further, the approach of McCarthy et al. was restricted to identification of individual DMPs rather than differentially methylated regions (DMRs). DMR-finding approaches have several advantages over considering CpG sites individually, including a decreased likelihood of hits arising from technical artifacts and possibly greater functional relevance of the results.
As methylation is cell-type specific and immune cell profiles have been shown to vary between sexes, consideration of cell composition is of utmost importance in methylation studies [9,10]. Yet previous studies of sex-associated differences in methylation [1][2][3][4][5][6] have not taken this into account in their analyses. White blood cell composition can be estimated from 450 K BeadChip data computationally in adults [11,12], but these estimates are not appropriate for use for young children in their current implementation [13]. As an alternative, differential cell count (DCC) can be employed to effectively determine such cell type proportions (% lymphocytes, monocytes, neutrophils, eosinophils, and basophils) in cord blood samples.
Here, we use the 450 K BeadChip to assess sex differences in DNA methylation from umbilical cord blood from boys and girls participating in a large epidemiologic cohort followed by the Center for the Health Assessment of Mothers and Children of Salinas (CHAMACOS) study. We use DCCs to account for white blood cell composition. In addition to interrogating DMPs, we apply the newly released 'DMRcate' methodology [14] to identify sex-associated DMRs in newborns.

Study population
The CHAMACOS study is a longitudinal birth cohort study of the effects of exposure to pesticides and environmental chemicals on the health and development of Mexican-American children living in the agricultural region of Salinas Valley, CA. Detailed description of the CHAMACOS cohort has previously been published [15,16]. Briefly, 601 pregnant women were enrolled in 1999-2000 at community clinics and 527 liveborn singletons were born. Follow up visits occurred at regular intervals throughout childhood. For this analysis, we include the subset of subjects that had both 450 K BeadChip data and differential cell count analysis available at birth (n = 111). Mothers retained in the study subset had a mean age of 25.8 years (±5.1 SD) at time of delivery. Study protocols were approved by the University of California, Berkeley Committee for Protection of Human Subjects. Written informed consent was obtained from all mothers.

Blood collection and processing
Cord blood was collected and stored in both heparin coated BD vacutainers (Becton, Dickinson and Company, Franklin Lakes, NJ) and vacutainers without anticoagulant at the same time. Blood clots from anticoagulant-free vacutainers were stored at −80°C and used for isolation of DNA for DNA methylation analysis. Heparinized cord blood was used to prepare whole blood slides using the push-wedge blood smearing technique [17] and stored at −20°C until staining for differential white blood cell count.

DNA preparation
DNA isolation was performed using QIAamp DNA Blood Maxi Kits (Qiagen, Valencia, CA) according to manufacturer's protocol with small, previously described modifications [18]. Following isolation, all samples were checked for DNA quality and quantity by Nanodrop 2000 Spectrophotometer (Thermo Scientific, Waltham, MA). Those with good quality (260/280 ratio exceeding 1.8) were normalized to a concentration of 50 ng/µL.

450 K BeadChip DNA methylation analysis
DNA samples were bisulfite converted using Zymo Bisulfite Conversion Kits (Zymo Research, Irvine, CA), whole genome amplified, enzymatically fragmented, purified, and applied to Illumina Infinium HumanMethylation450 BeadChips (Illumina, San Diego, CA) according to manufacturer protocol. Locations of samples from boys and girls were randomly assigned across assay wells, chips and plates to prevent any batch bias. 450 K BeadChips were handled by robotics and analyzed using the Illumina Hi-Scan system. DNA methylation was measured at 485,512 CpG sites.
Probe signal intensities were extracted with the Methylation Module (version 1.9) of Illumina GenomeStudio software (version XXV2011.1) and background subtracted. Systematic QA/QC was performed, including assessment of assay repeatability and batch effects using 38 technical replicates, and data quality was established as previously described [19]. Samples were retained only if 95 % of sites assayed had detection P < 0.01. Color channel bias, batch effects and differences in Infinium chemistry were minimized by application of the All Sample Mean Normalization (ASMN) algorithm [19], followed by Beta Mixture Quantile (BMIQ) normalization [20]. Sites with annotated probe SNPs and with common SNPs (minor allele frequency >5 %) within 50 bp of the target identified in the MXL (Mexican ancestry in Los Angeles, California) HapMap population were excluded from analysis (n = 49,748). Probes where 95 % of samples had detection P > 0.01 were also dropped (n = 460). Since our analysis was focused on CpG sites associated with sex, we excluded sites on the Y chromosome (n = 95) and X-chromosome cross-reactive probes (n = 29,233) identified by Chen and colleagues [7]. Remaining CpGs included 410,072 sites for analysis of sex. Methylation values at all sites were logit transformed to the M-value scale to better comply with modeling assumptions [21].
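The logit transformation to the M-value scale mentioned above is commonly written as M = log2(β / (1 − β)), where β is the proportion methylated. The following is a minimal sketch of that transformation and its inverse; the function names and the small clamping offset `eps` are illustrative assumptions and are not taken from the study's pipeline.

```python
import math

def beta_to_m(beta, eps=1e-6):
    """Convert a beta value (proportion methylated, 0..1) to an M-value."""
    # Clamp away from exactly 0 or 1 so the logarithm stays finite.
    # The offset eps is an assumed value chosen for illustration.
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))

def m_to_beta(m):
    """Inverse transform: recover the beta value from an M-value."""
    return 2.0 ** m / (2.0 ** m + 1.0)

# A half-methylated site maps to M = 0; higher methylation gives positive M.
print(beta_to_m(0.5), beta_to_m(0.8), m_to_beta(2.0))
```

On this scale, regression coefficients are differences in log2 odds of methylation rather than differences in percent methylation.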

Differential cell counts
Whole blood smear slides were stained using a Diff-Quik® staining kit, a modern commercial variant of the Romanowsky stain, a histological stain used to differentiate cells on a variety of smears and aspirates. This staining highlights cytoplasmic details and neurosecretory granules, which are utilized to characterize the differential white blood count. The staining kit is composed of a fixative (3:1 methanol:acetic acid solution), eosinophilic dye (xanthene dye), basophilic dye (dimethylene blue dye) and wash (deionized water). For consistency and to ensure the best results, all slides were fixed for 15 min at 23°C (room temperature), stained in both the basophilic dye and eosinophilic dye for 5 s each, and washed after each staining period to prevent corruption of the dye.
Slides were scored for white blood cell type composition by Zeiss Axioplan light microscope with 100× oil immersion lens. Scoring was conducted at the perceived highest density of white blood cells using the standard battlement track scan method, which covers the entire width of a slide examination area. Counts for each of the five identifiable cell types (lymphocytes, monocytes, neutrophils, eosinophils, and basophils) were recorded by a dedicated mechanical counter. At least 100 cells were scored for each slide following validation of reproducibility by the repeated scoring of 5 sets of 100 cells from the same slide (CV ≤ 5 %).
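The reproducibility criterion above (CV ≤ 5 %) refers to the coefficient of variation, the standard deviation of repeated counts divided by their mean. A minimal sketch follows; the five lymphocyte counts are hypothetical values invented for illustration and are not data from the study.

```python
import statistics

def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation / mean * 100."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical lymphocyte counts from 5 repeated scorings of 100 cells each.
repeat_counts = [58, 60, 62, 59, 61]
cv = coefficient_of_variation(repeat_counts)
print(f"CV = {cv:.2f} %", "(acceptable)" if cv <= 5.0 else "(re-score)")
```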

DMP analysis
Associations between sex at birth and 450 K DNA methylation at individual CpGs were assessed by linear regression, adjusting for DCC variables and analysis batch. This analysis was performed using R statistical computing software (v3.1.0) [22]. Although DCC estimates were not significantly associated with sex, we chose to include them in the model because likelihood ratio tests showed that including them improved model fit for more than 2000 of the CpG sites assessed by 450 K BeadChip. We also examined gestational age and subject birthweight as possible covariates since both have been shown to be associated with DNA methylation [23], and performed sensitivity analysis to assess their potential impact. However, neither was associated with child sex or contributed to improved model fit.
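The per-CpG model described above can be illustrated with ordinary least squares on a single site. The study itself used R; the sketch below is a plain Python stand-in, and the toy data, the single cell-type covariate, and the helper names `invert` and `ols` are all assumptions made for illustration. In practice the same fit would be repeated for each of the 410,072 CpGs with the full set of DCC and batch covariates.

```python
import math

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(X, y):
    """Fit y = X b by least squares; return coefficients and their t-statistics."""
    n, p = len(X), len(X[0])
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    inv = invert(xtx)
    coefs = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [y[k] - sum(X[k][j] * coefs[j] for j in range(p)) for k in range(n)]
    sigma2 = sum(r * r for r in resid) / (n - p)  # residual variance
    t_stats = [coefs[i] / math.sqrt(sigma2 * inv[i][i]) for i in range(p)]
    return coefs, t_stats

# Hypothetical toy data for one CpG.
# Columns: intercept, sex (1 = girl, 0 = boy), % lymphocytes.
X = [[1.0, 0.0, 30.0], [1.0, 0.0, 40.0], [1.0, 0.0, 50.0],
     [1.0, 1.0, 30.0], [1.0, 1.0, 40.0], [1.0, 1.0, 50.0]]
y = [2.51, 2.98, 3.51, 4.49, 5.02, 5.49]  # methylation on the M-value scale

coefs, t_stats = ols(X, y)
print("sex coefficient:", coefs[1], "t-statistic:", t_stats[1])
```

The coefficient on the sex column plays the role of the girl-versus-boy difference in M-value after adjustment for the other columns.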
P-values were corrected for multiple testing using a Benjamini-Hochberg (BH) FDR threshold of 0.05 [24].
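The Benjamini-Hochberg adjustment can be sketched as follows: each p-value is multiplied by the number of tests divided by its rank, and the results are made monotone from the largest p-value downward. This is a generic illustration of the procedure, not the study's code, and the example p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Made-up p-values for four CpGs; sites with q < 0.05 would be called hits.
qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
hits = [i for i, q in enumerate(qvals) if q < 0.05]
print(qvals, hits)
```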

Enrichment of annotated genomic features
Comparison of sex-DMP results to annotated function categories, including relation to genes (TSS1500, TSS200, 5′UTR, 1stExon, Body, 3′UTR, Intergenic) and CpG islands (Island, Shore, Shelf, Open Sea), was performed using UCSC Genome Browser annotations supplied by Illumina. A χ² test of independence with 1 degree of freedom was used to determine whether there was evidence of enrichment among DMP results (P value < 0.05).
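A χ² test of independence with 1 degree of freedom corresponds to a 2×2 table. The sketch below assumes one plausible layout (hits versus non-hit CpGs, inside versus outside a given annotation category); the text does not specify how the table was built, so the layout and the counts are hypothetical. No continuity correction is applied.

```python
import math

def chi2_2x2(hits_in, hits_out, other_in, other_out):
    """Pearson chi-square statistic and p-value for a 2x2 table (1 df)."""
    table = [[hits_in, hits_out], [other_in, other_out]]
    total = hits_in + hits_out + other_in + other_out
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 df, the upper-tail chi-square probability is erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical counts: 10 of 30 hits and 20 of 30 non-hits fall in a category.
stat, p = chi2_2x2(10, 20, 20, 10)
print(f"chi-square = {stat:.2f}, P = {p:.4f}")
```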

DMR analysis
Identification of sex-associated DMRs was performed using the method described by Peters et al. [14] and implemented in the DMRcate Bioconductor R-package [25]. The approach begins by fitting a standard limma linear model to all CpG sites in parallel [26]. This model was parameterized identically to the DMP analysis with sex as the binary predictor of interest, adjusting for DCC variables and analysis batch. The CpG site test statistics were then smoothed by chromosome according to the DMRcate defaults, which employ a Gaussian kernel smoother with bandwidth λ = 1000 base pairs (bp) and scaling factor C = 2. The resulting kernel-weighted local model fit statistics were compared to modeled values using the method of Satterthwaite [27] to produce p-values that were adjusted for multiple testing using a BH FDR threshold of 0.05 [24]. Regions, or DMRs, were assigned by grouping FDR-significant sites that were at most λ bp from one another, requiring at least two CpGs per region. Under this method, CpGs are collapsed into DMRs without considering the direction of the association with the predictor (i.e. sex). The minimum BH-adjusted p-value within a given DMR is taken as representative of the statistical inference for that region, and the maximum fold change in methylation values (here on the M-value scale) summarizes the effect size.
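The final grouping step can be sketched in a few lines: significant sites on one chromosome are sorted, a new region starts whenever the gap to the previous site exceeds λ, and regions with fewer than two CpGs are discarded. This is a simplified illustration of the grouping rule as described in the text, not DMRcate's implementation; the function name and the example positions are hypothetical, and the kernel smoothing and significance testing steps are omitted.

```python
def group_into_regions(positions, lam=1000, min_cpgs=2):
    """Group significant CpG positions (one chromosome) into candidate regions.

    Returns a list of (start, end, n_cpgs) tuples.
    """
    regions, current = [], []
    for pos in sorted(positions):
        # A gap larger than lam closes the region being built.
        if current and pos - current[-1] > lam:
            if len(current) >= min_cpgs:
                regions.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_cpgs:
        regions.append((current[0], current[-1], len(current)))
    return regions

# Hypothetical positions (bp) of FDR-significant CpGs on one chromosome.
sig_positions = [100, 600, 1500, 5000, 9000, 9800]
regions = group_into_regions(sig_positions)
print(regions)
```

In this toy input the isolated site at 5000 bp forms no region because it has no significant neighbor within λ.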

Gene ontology analysis
Gene ontology term enrichment analysis was performed by DAVID [28,29], WebGestalt (WEB-based Gene SeT AnaLysis) [30], and ConsensusPathDB [31], using hypergeometric distribution to assess enrichment significance. Visualization of results and GO term categorization by semantic similarity dimension reduction was performed by REVIGO [32].
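Enrichment significance under the hypergeometric distribution is the probability of seeing at least the observed number of annotated genes in a gene list drawn at random from the background. The sketch below shows that upper-tail calculation in general terms; it does not reproduce the specific settings of DAVID, WebGestalt, or ConsensusPathDB, and the function name and example counts are illustrative assumptions.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail probability P(X >= k).

    N: genes in the background, K: background genes with the GO term,
    n: genes in the hit list, k: hit-list genes with the GO term.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Toy example: 10 background genes, 4 annotated, 3 hits of which 2 annotated.
p_enrich = hypergeom_enrichment_p(k=2, n=3, K=4, N=10)
print(p_enrich)
```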

Sex-associated differentially methylated positions in newborns
Analysis of DNA methylation differences between newborn boys and girls was performed by linear regression for 450 K BeadChip CpGs among subjects with DCC measurements (n = 111; 58 girls and 53 boys), adjusting for cell composition and batch (Table 1). After data cleaning, n = 410,072 CpGs were analyzed, which excluded sites previously reported to exhibit sex-chromosome specific cross-reactivity [7]. Resulting p-values were plotted by chromosome, with sites having higher methylation levels in girls compared to boys plotted above the x-axis and those with lower levels plotted below (Fig. 1). After adjustment for multiple testing (FDR p < 0.05), we identified 11,776 CpGs that differed significantly by sex in newborns (Table 2). Of those hits, the majority of sites had higher methylation in girls compared to boys (69.0 %). This trend was consistent on both the X chromosome (64.3 % of sites higher in girls) and in autosomes (82.8 %). While the majority of hits were found on the X chromosome (74.3 %), a substantial number were also identified on autosomes (3031 or 25.7 %; Table 2).
As differential hypermethylation is to be expected for girls due to X-inactivation [33][34][35], we focused characterization of results on autosomal sites showing sex differences (Table 3 and Additional file 1). Most of these were located in CpG shores, islands and open sea (40.4, 40.1, and 15.4 %, respectively) (Fig. 2 and Table 4). In comparison, shelf regions had the lowest percentage of hits (4.1 %). To assess whether the overrepresentation of hits in CpG islands and shores was due to the design of the 450 K BeadChip, we compared the number of hits in each functional category with the number of CpG sites included in the assay. Both shores and CpG islands were significantly overrepresented among all autosomal hits compared to the 450 K background (χ² = 486.1, P < 0.01 and χ² = 95.5, P < 0.01), while shelves and the open sea hits were underrepresented (each with P < 0.01). For CpG sites that were hypermethylated in girls compared to boys, we also observed overrepresentation in CpG islands and shores, and underrepresentation in shelf and open-sea locations (all P < 0.01). Sites that were hypomethylated in girls compared to boys were underrepresented in the open sea (30.3 %, P < 0.01) and shelves (5.6 %, P < 0.01). Hypomethylated sites were enriched at islands (χ² = 6.53, P = 0.01), but did not deviate significantly from the 450 K representation of shores (χ² = 3.42, P = 0.06).
The 11,776 CpG hits differentially methylated between newborn boys and girls were found in 2250 unique genes, and 1430 (63.6 %) of these genes were located on autosomes. Many genes contained multiple significant sites, with an average of 4.7 CpGs per gene and a maximum of 114 CpGs. However, the largest portion of sex-associated autosomal hits (30.4 %) was located in intergenic regions, and hits were seen at lower than expected frequency in gene bodies (P < 0.01) (Fig. 2). Near transcription start sites (TSS200, 5′UTR, and first exons), all categories were either lower than 450 K CpG design frequencies or did not deviate from them significantly. Further upstream (TSS1500), hits that were hypermethylated in girls were significantly enriched (χ² = 108.5, P < 0.01) while those showing decreased methylation were underrepresented (χ² = 13.3, P < 0.01). At the end of genes (3′UTR), hits that had higher methylation for girls were underrepresented (2.4 %, P < 0.01), while hits having higher methylation for boys did not deviate from expected 450 K frequencies (3.6 %, P = 0.97).
Examining the autosomal genes containing sex-associated DMPs for enrichment of particular gene ontology (GO) terms identified 278 pathways that were significantly enriched (FDR P < 0.05 and at least 5 genes per GO term) (Table 5). These enriched GO terms fell into several broad categories including: 1) nervous system development, 2) behavior, 3) cellular

Sex-associated differentially methylated regions in newborns
Additionally, identification of groups of CpGs with 450 K BeadChip methylation differences between newborn boys and girls was performed using the DMR-finding algorithm DMRcate [14,25]. This approach identifies and ranks DMRs by Gaussian kernel smoothing of results from linear models for individual CpGs that were adjusted for cell composition and array batch (see Methods for details). A total of 3604 DMRs were significantly associated with sex in newborns after correcting for multiple testing (FDR p < 0.05; Table 6 and Additional files 3 and 4). These spanned 2608 genes and contained a total of 22,402 unique CpGs. The number of sites within the DMRs ranged from 2 to 99 CpGs, with 50 % of DMRs containing 5 or more CpGs and 25 % having 8 or more. Further, DMR length averaged 863.8 bp, and ranged from 3 bp to 16.5 kb. Figure 3 shows the DNA methylation levels for boys and girls at two example top DMRs. Figure 3a shows 7 CpG sites in a DMR that had higher methylation for girls in a region spanning the PPP1R3G transcription factor on chromosome 6, while Fig. 3b shows 8 CpGs from a DMR with lower methylation among girls in the promoter of PIWIL1, which is an important gene for stem cell proliferation and inhibition of transposon migration [36,37]. As with DMPs, the majority of sex-associated DMRs had higher methylation in girls compared to boys (75.8 %; Additional file 3: Table S1). This was true for both autosomes and sex chromosomes when considered individually, with 83.8 and 58.5 % of DMRs having higher methylation in girls, respectively. However, a greater total number of DMRs identified were located on autosomes (2471 or 68.6 %) compared to the X chromosome. Similarly, 70.3 % of the genes covered by sex-associated DMRs were located on autosomes. Further, while the DMRcate method does not constrain all CpGs within a DMR to have the same direction of association with the predictor of interest, we found that the majority of DMRs had 100 % concordance across CpGs in the direction of effect with sex (Additional file 5).
Comparison of the individual site results (DMPs) with the DMR findings revealed that of the 11,776 CpG sites associated with sex in the DMP analysis, 9,941 (84.4 %) were also included in a DMR. On autosomes, DMRs included 83.2 % of sites found as sex-associated DMPs. Conversely, the DMRs added 12,461 total sites (11,719 on autosomes) that had not been found by DMP analysis alone.
Fig. 1 Manhattan plot for association between child sex and DNA methylation at all 450 K CpGs, adjusting for batch and cell composition by differential cell count (DCC). Associations where methylation was higher for girls relative to boys are plotted above the x-axis, while those with decreased methylation are plotted below. CpGs meeting the FDR multiple testing threshold (P < 0.05) are shown in red.
Table caption: Number of CpGs significantly hyper- and hypo-methylated in newborn girls compared to boys at FDR multiple testing threshold (q < 0.05), for all CpGs, and then stratified by autosomes and the X chromosome.
Table footnotes: Regression coefficients, β_girl, are reported in M-value scale for the change in methylation of girls relative to boys. Girl and boy mean methylation levels are shown on the β value or % methylation scale. a Positions shown for hg19 (Genome Reference Consortium GRCh37) genome assembly.

Discussion
Here, we assessed methylation sex differences in newborns as determined by 450 K BeadChip. Using reliable DCC estimates, our results are the first reported EWAS analysis by sex at birth that adjusted for confounding by cell composition. To our knowledge, we are also the first study to assess regions of differential methylation associated with sex in addition to considering all CpG sites individually. We identified a large number of X-chromosome CpG sites with higher methylation in girls, which is most likely attributable to X-inactivation [33,38]. Interestingly, we further demonstrated that a substantial number of autosomal sites and regions also appear hypermethylated in females (Fig. 1 and Table 2).
To assess the consistency of our findings with those of prior analyses, autosomal CpG sites identified as differentially methylated by sex in the current analysis were compared to hits from the three most similar published studies to date (Table 7) [8,39,40]. These studies differed from ours either in DNA methylation analysis platform (27 K in McCarthy et al. [8]) or in tissue type used (Xu et al. [39] in human prefrontal cortex and Hall et al. [40] in pancreatic isolates). Although the meta-analysis performed by McCarthy et al. included some studies in umbilical cord blood, most of the studies were performed in adult tissues. Each study found between 184 and 614 autosomal CpG sites that were differentially methylated in association with sex (total of n = 1192 unique sites across all three studies). Our results replicated 428 (35.9 %) of all hits, and 29.4-42.4 % of the hits from individual studies. Further, among replicated sites we observed 98.5-100 % concordance in the direction of methylation differences. While there was substantial overlap between our autosomal sex-associated hits and these previously published results, 2603 or 85.9 % of our results are novel findings, some of which may be specific to the time point and tissue assessed (umbilical cord blood). Our larger number of hits is likely due to the increased coverage of the 450 K BeadChip. In fact, when considered as a percentage of the number of sites analyzed, we observed a comparable portion of autosomal hits to that found by McCarthy and colleagues using the 27 K platform (0.74 and 0.68 % respectively; P = 0.25).
Importantly, the autosomal methylation increases we observed were most concentrated in CpG islands and shores (Fig. 2a). As this trend was not evaluated in past studies, it should be explored and confirmed in additional datasets. Further, our finding that neurodevelopmental ontology terms were strongly enriched among our autosomal hits suggests that DNA methylation may contribute to differences in cognitive processes early in life. This is consistent with sex differences in brain development and rates of maturation that have previously been observed by magnetic resonance imaging in slightly older children (6-17 years of age) [41] and represents a possible regulatory mechanism requiring additional investigation.
Our autosomal hits included several genes already known to exhibit sex-specific functions. These included the male fertility and spermatogenesis related genes identified by McCarthy and colleagues (DDX43, NUPL1, CRISP2, FIGNL1, SPESP1 and SLC9A2). One of our top hits showing increased methylation for girls (Table 3) included SLC6A4, Solute Carrier Family 6, that is involved in presynaptic reuptake of norepinephrine and has been implicated in several neurological disorders with sex differences in prevalence [42][43][44]. Similarly, we observed novel sex differences in the SHANK2 and SHANK3 scaffolding protein genes that have been associated with autism spectrum disorders (Tables 3 and 6, Additional file 1) [45,46]. Further, our hits included the homeobox-containing transcription factor EMX2, Empty Spiracles Homeobox 2, which is required for sexual differentiation and gonadal development [47] and which we found to be hypermethylated among girls (Additional file 1).
The DMR analysis confirmed several trends observed by analyzing CpGs individually. In particular, DMR results again showed that girls tend to exhibit hypermethylation compared to boys. Also, many CpGs found to be autosomal DMPs were separately identified as being located within sex-associated DMRs. Besides confirming many of the findings in the DMP analysis, the application of DMR-finding substantially expanded the number of CpG sites considered significant. These results suggest that considering methylation over regions rather than single CpG sites may be a more effective way to identify differentially methylated sites and genes of interest.

Conclusions
We confirmed and expanded previously identified trends in autosomal and X-chromosome methylation sex differences during a relatively unstudied window in child development, immediately after birth, that is likely critical in establishing long-term health. This strategy to assess epigenetic perturbation as near as possible to the prenatal period remains a high priority in light of the fetal origins of human disease hypothesis [48][49][50][51].