Epigenetic analysis of high and low motile sperm populations reveals methylation variation in satellite regions within the pericentromeric position and in genes functionally related to sperm DNA organization and maintenance in Bos taurus

Background Sperm epigenetics is an emerging area of study supported by observations reporting that abnormal sperm DNA methylation patterns are associated with infertility. Here, we explore cytosine-guanine dinucleotides (CpGs) methylation in high (HM) and low motile (LM) Bos taurus sperm populations separated by Percoll gradient. HM and LM methylation patterns were investigated by bisulfite sequencing. Results Comparison between HM and LM sperm populations revealed that methylation variation affects genes involved in chromatin organization. CpG Islands (CGIs), were highly remodelled. A high proportion of CGIs was found to be methylated at low/intermediate level (20–60%) and associated to the repetitive element BTSAT4 satellite. The low/intermediate level of methylation in BTSAT4 was stably maintained in pericentric regions of chromosomes. BTSAT4 was hypomethylated in HM sperm populations. Conclusions The characterization of the epigenome in HM and LM Bos taurus sperm populations provides a first step towards the understanding of the effect of methylation on sperm fertility. Methylation variation observed in HM and LM populations in genes associated to DNA structure remodelling as well as in a repetitive element in pericentric regions suggests that maintenance of chromosome structure through epigenetic regulation is probably crucial for correct sperm functionality.


Background
Male infertility is a complex disorder affecting humans as well as other animals. Infertility is partially explained by physiological and biochemical factors, such as low sperm counts and poor sperm quality. The genetic basis of male infertility accounts for about 15% of infertile cases [1,2]. The etiology of this disorder remains unclear both in human and other species. For example, bulls considered of high-merit based on different sperm traits such as spermatozoa motility and morphology, are sometimes unable to produce successful full-term pregnancies [3,4]. Different molecular parameters related to sperm nuclear and mitochondrial DNA, plasma membrane and lipid composition affect the ability of spermatozoa to fertilize oocytes and contribute to normal embryo development [5][6][7]. Therefore, much remains to be understood and novel molecular approaches may help to unravel the molecular basis of infertility.
Among the known epigenetic processes in mammalian cells, DNA methylation has been identified as an important regulatory mechanism of genome function in normal embryonic development, X-chromosome inactivation and genomic imprinting [8,9]. DNA methylation of the 5-carbon position in cytosine residues was reported to be predominantly present in cytosine-guanine dinucleotides (CpG) and especially in GC rich regions called CpG islands (CGIs) [10]. CGIs methylation in different genomic features impacts gene expression, i.e., promoter demethylation is associated with gene expression, while methylation in gene bodies influences splicing [11]. Methylation is also observed in Repetitive Elements (RE) of adult cells playing a role in the maintenance of chromosome structure and genome integrity [12].
Sperm epigenetic marks are unique, thus the factors that determine the patterns of DNA methylation differ between male germ cells and somatic cells. Although RE are highly methylated in both germ and somatic cells, elements from several subfamilies show different levels of methylation in the two cell types [13]. Centromeric regions in spermatogonia are known to be less methylated compared to somatic tissues [14]. This methylation pattern is supposed to play a role in germ-cell chromatin organization, rather than in the control of gene expression [15]. Most of the epigenetic signatures in germ cells are erased after conception from the morula stage to the blastocyst stage in the inner cell mass (ICM); successively, a sharp increase in the level of methylation in the embryo is observed following implantation [16,17]. However, a proper regulation of epigenetic processes during spermatogenesis is necessary to ensure embryonic development in addition to sperm function. The level of DNA methylation of round spermatid was reported to be different from that of mature spermatozoa. Round spermatid rather than mature spermatozoa microinsemination was also observed to profoundly influence epigenetic marks in the embryo, thus affecting embryonic development and male fertility [18].
Aberrant locus specific or global methylation has been associated to abnormal semen parameters, as well as to male infertility. Oligospermic patients were reported to present a hypomethylation or unmethylation pattern at the H19 gene encoding insulin-like growth factor 2 (IGF2) imprinting control region 1 (ICR1); furthermore, hypermethylation at the Mesoderm-specific transcript (MEST) imprinted locus as well as a reduced sperm quality, as compared with normozoospermic men [19]. Broad DNA hypermethylation across many loci, also including the Satellite 2 repetitive element, was associated to poor sperm concentration and motility and to morphology alterations in abnormal human sperm [20]. The level of DNA methylation in human sperm, determined through an ELISA-like method, was correlated to conventional sperm parameters, e.g. concentration and motility, as well as sperm chromatin and DNA integrity, but not to sperm viability and morphology [21]. DNA methylation in human spermatozoa was higher in low quality spermatozoa [22]. Pyrosequencing analysis of long interspersed elements (LINE) in human sperm, after bisulfite conversion, estimated an overall global methylation of about 75%, increasing with age. At the same time, targeted bisulfite sequencing of different selected genes showed a lower methylation level with a strong trend toward age-associated hypomethylation in some genomic regions [23]. Targeted bisulfite sequencing also revealed different levels of methylation in the promoter regions between high and low motile human sperm [24].
In farm animals, several studies showed altered sperm methylation to be associated with male infertility. DNA methylation pattern were found to be different between spermatozoa from high-fertile and sub-fertile buffalo bulls [25]. Recently, assessment of the epigenetic signature of bull spermatozoa using a human DNA methylation microarray [26] and Methyl-Binding Domain (MBD) Sequencing [27] revealed differentially methylated CpG sites and regions associated to bull fertility rate.
In the present study, the 5-methyl cytosine variations in CpGs were evaluated in high and low motility bull sperm populations following methyl enrichment and bisulfite sequencing approach. The objective is to produce a genome-wide methylation profile in the two populations, and to identify differential epigenetic signatures between high (HM) and low motile (LM) sperm.

Sequencing statistic and CpG methylation distribution
Since methylation levels in sperm are expected to be generally high [13], the Methyl-binding domain (MDB) approach was used to select hypermethylated regions [28]. Bisulfite sequencing was then applied to the methylationenriched genomic fraction to investigate CpG methylation level at single base resolution in the highly methylated regions. To exclude the presence of technical biases caused by unbalanced sequencing between groups of samples due to MBD enrichment, we evaluated cytosine coverage consistence between the HM and LM groups; no technical bias related to the combined approach (MDB and bisulfite sequencing) was observed (Additional file 2).
The average number of reads per sample was 28.1 M (range: 13.2 M-37.5 M). Mapping efficiency was high for all samples (83.1-90.6%). After calculating cytosine methylation conversion, a high percentage (93.7%) of the cytosines in the CpG enriched regions was methylated in both sperm populations (see Additional file 3 for statistics). After applying a threshold of at least 5X coverage per cytosine, a total of 26.6 M methylated regions (MR) (100 bp tiles with sliding window size of 100 bp) were identified spanning across the whole bovine genome. Among these, 1,086,748 methylated regions (MRs), shared between at least 3 out of 4 for both HM and LM sperm populations, were selected for DNA cytosine methylation profile comparison.

Differentially methylated regions between HM and LM sperm populations
A genome-wide analysis of genes and regulatory elements revealed that a small percentage of CpGs shows a significant variation in the methylation level (differentially methylated regions (DMRs)/MRs percentage) between HM and LM sperm populations in gene bodies (1.45%), 5′ untranslated regions (5'UTRs) (3.12%) and 3'UTRs (2.72%). Considering CGIs, a higher proportion of the methylome (9.77%) was remodelled in HM vs LM sperm populations. Hierarchical analysis of the 20 most hyper and hypo methylated DMRs found in CGIs, in gene bodies, 5'UTR and 3'UTR well discriminated HM from LM samples (Fig. 2). Annotation of 6131 DMRs that overlapped gene bodies resulted in 3278 differentially methylated genes (DMGs) (Additional file 8). In addition, 398, 538 and 918 DMRs located near 5'UTR, 3'UTRs and CGIs, were close (± 2Kb) to 355, 484 and 297 DMGs, respectively (Additional files 9, 10 and 11). Gene ontology (GO) analysis was performed on genes found to be differentially methylated in 5'UTR, 3'UTRs  and CGIs, and on a selection of 423 genes differentially methylated in gene bodies (468 DMRs with false discovery rate (FDR) < 10exp-10). Variation in CpG methylation in different gene features and CGIs affect GO terms related to DNA replication, repair, and DNA and telomere organization and maintenance. In addition, GO terms related to hindbrain function, epithelia and endothelia migration metabolic processes were also observed to differ between HM and LM sperm populations. Unexpectedly, a large part of significant gene ontology terms are related to 3'UTRs, whereas only few terms are affected by CpG variation in 5'UTR (Table 1).

Methylation distribution in CpG islands
To further explore bovine sperm CpG methylation in CGIs, the global level of cytosine methylation was calculated in each CGI. Out of 23,431 CGIs annotated in the bovine genome, 3869 were detected (in at least 3 out of 4 samples for HM and LM) in our dataset. Based on CpG methylation level in CGIs (Fig. 1), profiles were grouped in two classes (20-60% and 80-100%), and distribution of CGIs length was calculated in each class in HM and LM sperm populations (Fig. 3). Differences in length distribution can be observed between the two classes, both in short (4-10 Kb, Fig. 3a) and long (10-240 Kb,  Table 2). Out of 2434 BTSAT4 elements annotated in the bovine genome 720 were detected (in at least 3 out of 4 samples for HM and LM) in our dataset.
Analysis of CpG methylation outlined an overall low level of BTSAT4 methylation in the HM sperm population. Considering 159 DMRs in the BTSAT4 regions, 122 were more methylated in LM sperm populations (Additional file 12) (Fig. 4).
To further support the hypothesis of a targeted methylation pattern of repetitive DNA elements, the genomic distribution of CGIs and BTSAT4 elements was observed. The overall CGIs covered by MBD sequencing represent about 1% of the total cow genome, while sequenced BTSAT4 regions represent 0.2%. The intersection of genomic absolute positions of CGIs and BTSAT4 repeats shows that 70,6% of the BTSAT4 base pairs fall within CGIs, 80% of the BTSAT4 repeats being completely or partially located within CGIs (Additional file 13).

Discussion
In this work, the pattern of methylation in high and low motile bull sperm populations was determined using an enrichment step of methyl-CpG sequences combined with bisulfite sequencing.
The comparison of different genomic features in HM and LM sperm populations revealed several differentially methylated regions flanking genes with a role in chromatin organization and maintenance. Undoubtedly, a correlation between sperm telomere length and abnormal sperm parameters exists [31]. Interestingly, HM and LM sperm populations showed variation in methylation of telomerase reverse transcriptase (TERT) and telomeraseassociated protein 1 (TEP1). Genetic variants in both genes were previously reported to be associated with susceptibility to male infertility [32]. Differential methylation in 3'UTRs was also found in genes (histone lysine methyltransferases 2A (KMT2A), histone lysine demethylases 2A (KDM2A) and nuclear receptor-binding SET Domain 2 (NSD2)/ multiple myeloma SET domain (MMSET)/ Wolf-Hirschhorn syndrome candidate-1(WHSC1)) influencing chromatin structure by epigenetic mechanisms, such as the regulation of histone H3-K4 methylation. Previous studies reported a strict association between sperm DNA methylation levels and both sperm chromatin condensation and DNA integrity, suggesting that the formation of a compact chromatin and proper DNA methylation are closely related events during spermatogenesis [21].
Histone Lysine methylation is tightly regulated by distinct families of conserved enzymes, KMTs and KDMs, which add and remove methyl groups at histone lysines  [37]. They play a role in orchestrating methylation of H3K9 and H3K27 in sperm. The methylation increases during meiosis, but the removal of H3K9me at the end of meiosis is essential for the onset of spermiogenesis [38]. In mice, the reduction of MLL2 activity results in a dramatic decrease of the number of spermatocytes by an apoptotic process and prevents spermatogenic differentiation [39]. Lysine-specific histone demethylase 1 (LSD1)/ KDM1 is required for spermatogonial differentiation, as well as germ cell survival, in the developing testis [40]. An evolutionarily conserved pathway between histone H3-K9 methylation and DNA methylation exists in mammals, that is likely to be important to reinforce heterochromatic subdomains stability and to protect genome integrity. Suppressor of variegation 3-9 homolog (Suv39h) HMTases (also called KMT1A/B) are required to direct H3-K9 trimethylation and DNA (cytosine-5-)-methyltransferase 3 beta (Dnmt3b)-dependent DNA methylation to major satellite repeats at pericentric region [41]. A concurring variation in methylation of satellite repeats at pericentric region was observed in our dataset. A group of CGIs methylated at intermediate level (20-60%), located within genomic satellite repeats and in particular BTSAT4 Bovine satellite I [29] was observed to be less methylated in HM sperm populations. In the bovine genome BTSAT4 is likely to be the counterpart of human alpha satellites, because both present in highcopy tandem repeat at centromeric position. Comparative analysis in hundreds of species shows a high variability in size for alpha satellites centromere repeats, e.g. approximately 171-bp in human and 1400-bp in Bovidae [30], which is in agreement with the size of repetitive element that we found to be methylated at intermediate level in bovine sperm. The location of bovine satellite I in all pericentric regions of Bos taurus autosomes was also confirmed by fluorescence in situ hybridization [42].
In accordance with our study, a low level of methylation of satellite DNA within pericentric regions was previously observed in primate sperm profiling [13]. Bovine alpha satellite I was observed to have low/intermediate methylation levels in sperm. Embryos obtained by somatic cell nuclear transfer (SCNT) presented a hypermethylation in the bovine alpha satellite I, expected to cause higher chromatin condensation compared to embryos generated by in vitro sperm fertilization (IVF). This may in turn contribute, either immediately or later in development, to the inefficiency of producing live offspring by SCNT [43,44]. Low methylation levels have been also correlated with the ability to bind cohesin complexes that regulate the separation of chromatids at mitosis [45], suggesting a model in which selective hypomethylation of centromeric satellites might be critical Recently, methylation at satellite repeats throughout the genome has been observed to be increased in obese rat offspring [46]. Although obesity in human is associated with infertility by numerous studies [47], a direct link between satellite repeats methylation and sperm infertility is not yet described.

Conclusion
Methylation profiling in bovine semen revealed differential methylation of the BTSAT4 repetitive element in pericentric regions between HM and LM sperm populations. In addition, many DMRs were enriched in genes often functionally related to sperm DNA organization and maintenance. Together, alteration of methylation in satellite regions within the pericentric regions and in genes associated to lysine histone methylation, highlight that the complex mechanism that regulates DNA condensation during chromosomal packaging in sperm may affect sperm motility.

Isolation of spermatozoa and evaluation of sperm motility
Frozen semen straws from four mature progeny tested Holstein bulls with satisfactory semen quality were purchased from an Artificial Insemination AI center (INSEME S.P.A., Modena, Italy).

DNA extraction, library preparation and sequencing
Four HM and four LM sperm samples obtained in previous step were used for DNA extraction. DNA was isolated by NucleoSpin® Tissue (Macherey-Nagel) following manufacturer instruction. One μg of genomic DNA was sonicated to produce DNA fragments of about 350 bp lengths. Methyl-binding domain (MBD) enrichment was performed using the MethylMiner™ Methylated DNA Enrichment Kit (Thermo Fisher Scientific), following manufacture instruction. Libraries were generated using the TruSeq® DNA PCR-Free Library Preparation Kit (Illumina) including a step of bisulfite treatment. After adapters ligation, samples were converted with EpiTect Bisulfite Kits (Qiagen) and finally PCR amplified with KAPA HiFi Uracil+ (Kapa Biosystems) to obtain methyl enriched bisulfite libraries. The eight libraries were used for cluster generation and subsequently sequenced on a single lane of Illumina Hiseq2000.

Statistical analysis and bionformatics
Data obtained from CASA were analyzed using the SAS™ package v 9.4 (SAS Institute Inc.). The General Linear Model procedure (PROC GLM) was used to evaluate the efficiency of the sperm separation comparing semen quality parameters at thawing and in the HM population. The model included the fixed effect of the sperm population, and bull as random. Results are given as adjusted least squares means ± standard error means (LSM ± SEM).
Preliminary quality control of raw reads was carried out with FastQC (http://www.bioinformatics.babraham. ac.uk/projects/fastqc/). Illumina raw sequences were then filtered with Trimmomatic [50] to remove adapters and low quality bases at the ends of sequence, using a sliding window approach. Data are available in the Sequence Reads Archive (SRA), (Accession Number SRP119411). Bismark software v.0.17.0 (https://www.bioinformatics. babraham.ac.uk/projects/bismark/) was used to align each read to a bisulfite-converted Bos taurus genome UMD311 with option -N 1, and methylation calls were extracted using the Bismark methylation_extractor function. Seqmonk software (version 0.34.1) was used for visualization and analysis of the Bismark output (http://www.bioinformatics.babraham.ac.uk/projects/seqmonk/). Only position with at least 5 cytosine were recorded in all samples, others were discarded from the data set. Cytosines count for each position (n = 1,742,816, count>5X, present at least in 3 out of 4 in HM and LM samples) was determined and the Edge-R package was used to evaluate if over or under-representation occurs in our dataset between HM and LM groups.
Methylated regions (MRs) were detected genome wide by dividing the genome in 100 bp tiles and analyzing average methylation in a sliding window of 100 bp. MRs were considered if present at least in 3 out of 4 samples in both HM and LM sperm populations. Methylation was calculated independently for different features: 5′ UTR (−2Kb), 3′ UTR (+2Kb), gene bodies and CpG islands (CGIs). MRs were also determined per CGI length classes and overlapping BTSAT4 REs. Reads count in BTSAT4 REs was determined and the Edge-R package was used to evaluate if over or under-representation occurs in our dataset between HM and LM groups. Differentially methylated regions (DMRs) between HM and LM populations were calculated using the logistic regression filter in R to assess differential methylation (FDR < 0.05, absolute cutoff of 5%). Hierarchical clustering was produced for DMRs present in CGIs, gene bodies, 5′ UTRs and 3′ UTRs. The level of methylation was normalized across samples and methylation percentage from a selection of DMRs showing the highest differences in methylation was used for clustering using the Genesis software [51].
Genes included in DMRs at CGIs and different genomic features were submitted to GO analysis. GO classification of the DMRs was performed according to canonical GO categories, using the Cytoscape plug-in ClueGO which integrates GO [52] and enhances biological interpretation of large lists of genes. Evaluation of REs in CGIs was performed by intersecting genomic positions of both features by Bedtools intersect (http://bedtools.readthedocs.io), thus frequencies for each RE category were calculated for low/intermediate methylation CGIs (20-60% methylation) and high methylation CGIs (80-100% methylation), in both HM and LM sperm populations and in Bos taurus genome.