Optimized bisulfite sequencing analysis reveals the lack of 5-methylcytosine in mammalian mitochondrial DNA

Background DNA methylation is one of the best characterized epigenetic modifications in the mammalian nuclear genome and is known to play a significant role in various biological processes. Nonetheless, the presence of 5-methylcytosine (5mC) in mitochondrial DNA remains controversial, as data ranging from the lack of 5mC to very extensive 5mC have been reported. Results By conducting comprehensive bioinformatic analyses of both published and our own data, we reveal that previous observations of extensive and strand-biased mtDNA-5mC are likely artifacts due to a combination of factors including inefficient bisulfite conversion, extremely low sequencing reads in the L strand, and interference from nuclear mitochondrial DNA sequences (NUMTs). To reduce false positive mtDNA-5mC signals, we establish an optimized procedure for library preparation and data analysis of bisulfite sequencing. Leveraging our modified workflow, we demonstrate an even distribution of 5mC signals across the mtDNA and an average methylation level ranging from 0.19% to 0.67% in both cell lines and primary cells, which is indistinguishable from the background noise. Conclusions We have developed a framework for analyzing mtDNA-5mC through bisulfite sequencing, which enables us to present multiple lines of evidence for the lack of extensive 5mC in mammalian mtDNA. We assert that the data available to date do not support the reported presence of mtDNA-5mC. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-023-09541-9.

The human mtDNA is a ~16.5 kb double-stranded, circular molecule, presenting in multiple copies per mitochondrion [6].The mitochondrial genome encodes 13 proteins involved in respiratory chain complexes as well as 2 ribosomal RNAs and 22 transfer RNAs specific to this organelle [7].All other 1000-3000 mitochondrial proteins, including those required for mtDNA replication and transcription, are encoded by the nuclear genome and imported into the mitochondria using specific mitochondrial targeting sequence (MTS) [8].In contrast to the nuclear DNA that associates with histones, mtDNA is packaged into individual nucleoids consisting of DNAbinding proteins and peripheral factors, which facilitate mtDNA replication and gene expression [9][10][11].However, while the existence of 5mC in mtDNA remains uncertain, the dysregulation of 5mC has been widely associated with the development of cancer, neurodegenerative diseases, and various physiological or pathological conditions [12][13][14][15][16].
A range of experimental approaches have been applied to detect and quantify 5mC in mammalian mtDNA.Using radiolabeling, it was first reported in the 1970s that about 0.2 ~ 0.6% of cytosines are methylated in mtDNA from various mammalian cells [17].Since then, methods based on radiolabeling, 5mC-sensitive restriction enzymes, 5mC-specific antibodies, and mass spectrometry have been employed to evaluate the level of 5mC in mammalian mtDNA.Two studies using 5mC-sensitive restriction enzymes, MspI and HpaII, revealed a highly restricted distribution of 5mC exclusively within the CpG dinucleotide context, occurring at a frequency of 2 to 5% (equivalent to less than 1% of 5mC/C) in mtDNA from mouse and human cells [18,19].Leveraging liquid chromatography-tandem mass spectrometry (LC-MS/MS), it was shown that the level of 5mC in mtDNA increased markedly, from 13 to 25%, in lymphoblastoid cells obtained from individuals with Down syndrome compared to control cells [20].More recently, sequencing-based methods enabled the detection of 5mC at genome-scale.With methylated DNA immunoprecipitation-sequencing (MeDIP-Seq), an antibody-based method, a conserved pattern of 5mC was observed near gene start sites across various cell and tissue types in mtDNA [21].
Another commonly used approach to quantify 5mC at single-base resolution is bisulfite sequencing [16,[22][23][24][25][26][27][28][29].By DNA pyrosequencing, a study reported 2 to 18% 5mC in the D-loop and 16S rRNA gene in mtDNA isolated from mouse brain, liver, and testis [23].Lately, 5mC in mtDNA was mapped at single-base resolution and in a strand-specific manner using whole genome bisulfite sequencing (WGBS) [16,24].Comprehensive bioinformatic analyses revealed extensive methylation of human mitochondrial genomes and a distinctive pattern of 5mC in normal and cancer cells [16].In another study, 5mC was shown to be strongly biased toward the light (L)-strand and the non-CpG motif, with conserved peak loci [24].The quantification of mtDNA-5mC was generally found to be more than 20% and was shown to be modulated by DNA methyltransferases (DNMTs), the canonical nuclear DNA methyltransferase, and was inversely associated with mitochondrial transcription [16,24].Collectively, these studies have reported the presence of a wide range of 5mC levels (ranging from 0.2 to 25% 5mC/C) in mammalian mtDNA [30].
Although a number of studies have reported the presence of 5mC in mammals mtDNA, several other research groups were unable to detect 5mC or confirm its existence in mammalian mtDNA with various methods, thus casting doubt on the existence and levels of mtDNA-5mC.Using radiolabeling and two-dimensional thin-layer chromatograph, 5mC in HeLa mtDNA was estimated to be less than 0.05 mol percent of total mitochondrial nucleotides, while that in nuclear DNA was 0.7 [31].In another study, the levels of mtDNA-5mC from mouse liver were estimated to be between 0.3% and 0.5% by mass spectrometry, but this estimation was considered to be confounded by contaminations with nuclear DNA, whose 5mC content is about 6% of cytosines [32].Using both region-specific and genomewide approaches in HEK293 and HCT116 cell lines, a group concluded that CpG methylation is absent in mtDNA [33].
The conflicting views on mtDNA methylation can be partly attributed to technical artifacts and limitations associated with studying this modification [34].It has been shown that mtDNA fragments that are less engaged in the mtDNA supercoiled structure are preferably freed during sonication and ligated to sequencing adapters, resulting in an overrepresentation of these fragments.Additionally, the secondary structure of mtDNA may hinder the accessibility of sodium bisulfites, leading to poor conversion and an overestimation of methylation levels [35].Cytosine content is another leading cause of sequencing bias.Bisulfite reactions have been known to induce template degradation by triggering depyrimidination of bisulfite adducts formed through sulfonation and deamination of cytosine [36].This leads to a depletion of cytosine-rich DNA and unmethylated fragments from the total library, resulting in a skewed representation of genomic sequences and an overestimation of 5mC levels [37].Indeed, the majority of bisulfite library preparation methods have been found to yield significantly biased sequence outputs, leading to an overestimation of DNA methylation [37].In addition to bisulfite-related false positives, mapping to highly similar nuclear mitochondrial DNA sequences (NUMTs) also introduces misalignment artifacts [38,39], which can lead to artifactual signals of 5mC.
Recently, mtDNA-5mC was reinvestigated with bisulfite-independent single-molecule Oxford Nanopore Sequencing (ONS) [40][41][42][43].However, these studies have not yet reached a consensus.While the presence of mtDNA methylation was reported in cultured cancer cell lines [40], human blood cells [41], and human primary normal and tumor tissues [42], another study suggested that the mtDNA-CpG methylation signals detected by ONS could be attributed either to sequence-specific false positives introduced by the technique or were below the error threshold modelled using negative controls [43].
In this study, we aimed to address this controversial question by performing comprehensive bioinformatic analyses on both published and in-house WGBS datasets.Our analysis showed that strand-biased sequencing reads gave rise to false positive 5mC signals in mtDNA, while reads of NUMTs interfered with the quantification of mtDNA-5mC.We therefore proposed an optimized procedure for library preparation and data analysis of mtDNA methylation with bisulfite, in which mtDNA should be linearized to ensure efficient bisulfite conversion for Sanger bisulfite sequencing and sequencing reads should be examined for depth and NUMTs for WGBS.With this modified procedure, we identified the sources of false positive mtDNA-5mC signals.Combining our findings with TET-assisted pyridine borane sequencing (TAPS) and Enzymatic Methyl-seq (EM-seq) data, predictions of MTS, and analysis of the mitochondrial proteome, we provide new evidence for the lack of 5mC in mammalian mtDNA.

Severe strand-specific sequencing biases introduce artifactual signals of mtDNA 5mC
A previous study showed that mtDNA-5mC signals were strongly biased to the L strand and non-CpG context across different cell types and developmental stages using published WGBS datasets [24].To validate their findings, we re-analyzed mtDNA methylation levels using Bismark mapping tools for the same datasets, which included mouse primordial germ cells (PGCs), oocyte, sperm, early embryos at different developmental stages (GSE56697) [44], and prefrontal cortex (PFC) (GSM830249) [45].5mC signals were observed in the mtDNA of oocyte, E6.5 embryo, and sperm (Fig. S1A-B).We examined the sequencing reads aligned to the mitochondrial genome in a strand-specific manner and observed a predilection of 5mC signals at CpH sites on the L strand, localized specifically at gene boundaries (Fig. 1A for oocyte, Fig. S1C for E6.5).These findings are in agreement with a previous report [24].However, our further analysis revealed that the sequencing depth of cytosines in Region 1 and 2 (< 10) was far lower compared to both the average depth (10,221) and the depth observed in adjacent regions (Fig. 1A, B).In addition, a severe inverse correlation was observed between the sequencing depth and the mtDNA methylation level, especially when the depth fell below 100 (Fig. 1C).A lower depth is expected for the cytosinerich L strand (3976 cytosines) than for the H strand (2013 cytosines) since bisulfite-induced fragmentation usually occur at unmethylated cytidines [36].A previous study that compared multiple WGBS libraries revealed an L/H ratio ranging from 10 to 40% [37].Consistent with this, we observed that the mean depth of the L strand was considerably lower than that of the H strand in oocyte in this dataset (2752 versus 24141, Table S1).
To exclude the possibility that 5mC signals and biased sequencing depth were due to bisulfite alignment tools, we analyzed these data with BSseeker2 independently.Comparing the mapping and calling results of BSseeker2 to those of Bismark revealed a highly similar methylation profile, with strong 5mC signals presenting in oocytes and E6.5 embryos (Fig. S2A-D) particularly in regions with low sequencing depth (Fig. S2C).Again, inverse correlations were observed between depth and methylation level using BSseeker2 (Fig. S2E).
These biased sequence representations, both between strands and within strands, are likely due to two factors 1) bisulfite-induced DNA degradation of cytosine-rich genomic regions [37], and/or 2) overrepresentation of mtDNA fragments that do not undergo supercoiling [35,46].There is a substantial difference in cytosine content between the L and H strands of mtDNA.Bisulfite treatment induces degradation of DNA that is enriched for unmethylated cytosines, resulting in regions with fewer cytosines or those exposed less to bisulfite being over-represented in the sequencing library [36,37].In Region 1, while cytosine is not highly enriched in the L strand, it is nearly depleted from the H strand.This has led to an underrepresentation of reads from the L strand (123) compared to the H strand (439497).In addition, regions involved in secondary structures tend to be underrepresented in the library because they are less freed during sonication and are also less accessible to bisulfite.In these low coverage regions, even a small number of poorly converted reads can have a significant impact on the measurement of 5mC.Consequently, the reported 5mC signals are not reliable evidence to support the existence of 5mC in mouse mitochondrial DNA.
Because methylation measurements at specific genomic regions can vary substantially for the same cell type due to experimental procedures, we wondered whether the same problem occurs with mtDNA methylation.We noticed a published WGBS data series in which the methylation levels of mtDNA from four HEK293T biological replicates [47] were very dissimilar (ranging from 1.14 to 10.16%) even they underwent the same processing procedures (Fig. S3A-C).While three replicates showed very low levels of mtDNA-5mC, only the replicate GSM2425586 displayed high mtDNA-5mC signals throughout the mtDNA.This replicate also exhibited a severe strand depth bias (H/L = 26.59)(Table S1).
Together, our analyses indicate that the reported mtDNA-5mC is a classic case of artefact resulting from a severe strand-specific sequencing bias.

Linearization of mtDNA eliminates false positive 5mC signals
The mtDNA genome is organized in coiled and supercoiled secondary structures [46].Previous studies have demonstrated that removing supercoiled structures from mtDNA using the restriction enzyme BamHI can significantly increase the bisulfite conversion rate and, subsequently reduce the presence of artifactual 5mC signals [35,48].To independently determine the impact of circular structure on methylation analysis of mtDNA, intracellular DNA was extracted using phenol-chloroform from a mouse neuroblastoma cell line Neuro-2a (N2a) and then digested with two restriction endonucleases, SacI and NcoI.These endonucleases target adjacent sites in the mouse mtDNA (Fig. 2A).Primers flanking the targeting sites were used to assess the linearization efficiency.Agarose gel analysis (Fig. 2B) and qPCR (Fig. 2C) both confirmed that the linearization of mtDNA was highly efficient (no less than 97.3%).Next, we examined methylation levels at Region 2 in a strand-specific manner using linearized mtDNA from N2a and mouse embryonic stem cells (mESC).For both the L and H strands, Complete bisulfite conversion of spike-in λDNA with or without linearization suggests that the conversion efficiency of linear spike-ins does not reflect that of the circular mtDNA the 5mC signals were no longer detectable after linearization (Fig. 2D).We further tested the methylation levels of mtDNA derived from HepG2 hepatocarcinoma cells, which were reported to have more than 40% methylation in the mitochondria genome [16].However, we observed much fewer 5mC signals in the circular mtDNA and no methylation signals after linearization (Fig. 2E left).A similar trend was observed for mtDNA of human embryonic kidney cells HEK293T (Fig. 2E right).The presence of false positive 5mC signals in the circular but not linearized mtDNA of Dnmt triple knockout (TKO) mESC (Fig. 2F), in which both mitochondrial and nuclear genome should be devoid of 5mC, further supports that linearization prior to bisulfite conversion is essential to detect true mtDNA-5mC signals.Importantly, the bisulfite conversion rate of the spike-in λDNA was 100% for both circular and linearized forms, suggesting that the conversion efficiency of linear spike-ins does not reflect that of circular mtDNA (Fig. 2F, bottom).In sum, our results suggest that linearization of mtDNA improves the efficiency of bisulfite conversion and completely eliminates false positive 5mC signals.

MtDNA-5mC is below detection limit or even absent in high throughput sequencing data
Because cytosines in the CpH context may also be methylated in mtDNA, quantification of 5mC with classic Sanger bisulfite sequencing could be misleading at times due to the amplification bias introduced by non-degenerate primers.To avoid such confusion, we performed WGBS to map mtDNA-5mC at singlebase resolution in several human and mouse cell lines.Methylation of mtDNA has been detected in HepG2 and MCF7 cells [16], but not in HEK293T cells [33].We therefore independently prepared WGBS libraries and examined mtDNA-5mC levels in these cell lines, as well as in several mouse cell lines including N2a, B16, a lung adenocarcinoma cell line (LUAD), mESC, Tet TKO mESC, and mouse brain tissue.The mtDNA was isolated using the phenol-chloroform method, and for N2a and HepG2 cells, mtDNA was also enriched as plasmids [49] (Fig. 3A).Following extraction, the mtDNA was fully fragmented by sonication.
The sequencing reads of these in-house WGBS data were evenly distributed throughout the mtDNA (Fig. 3B-D), and the depth ratios (H/L) were less than 2 (Table S1).We did not observe any strong 5mC signals across the entire mitochondrial genome in these cells (Fig. 3C, D), which disagrees with the previous report of mtDNA methylation in HepG2 and MCF7 cell lines [16].We showed that both the CpG and CpH contexts contained no more than 1.1% 5mC, with the average methylation levels ranging from 0.37 and 0.67% in our own WGBS data (Fig. 3E).Since the bisulfite conversion rates usually do not exceed 99%, the 5mC signals below 0.67% are likely the result of incomplete bisulfite conversion.Moreover, we analyzed human platelets isolated from cord blood using WGBS.Platelets are cytoplasmic pieces formed from megakaryocytes and therefore, do not contain nuclear materials [50].Our analysis showed that the average methylation level was no more than 0.59% (Fig. S4A), and the reads were evenly distributed (Fig. S4B-C, Table S1).
Having failed to detect reliable mtDNA-5mC in cell lines and primary cells, we sought to verify our WGBS results by TAPS [51], a bisulfite-free methylation mapping approach, using Tet TKO LUAD cells, mESC, and mouse heart tissues.Consistent with our previous results, the average methylation ratio ranged from 0.37% to 0.75% in the mtDNA from these cells (Fig. 3E, F), and there was no observed bias in strand depth (data not shown).Because the false positive rate of TAPS (0.23 ~ 1.63%) is comparable to that of WGBS (~1%) [51], we cannot distinguish between real 5mC signals and background noise.Furthermore, since 5mC does not accumulate in mtDNA from Tet TKO mESC and Tet TKO LUAD cells (Fig. 3E), it seems that TETs are not involved in the epigenetic regulation of mtDNA.
We also analyzed mtDNA methylation using published EM-seq data with NA12878 DNA [52].EM-seq purely uses enzymes to detect 5mC and its oxidized forms.Like TAPS, EM-seq bypasses bisulfite treatmentinduced DNA loss and sequencing bias introduced by bisulfite treatment [52].Our EM-seq analysis showed that there were no signals for 5mC in mtDNA, which is consistent with the bisulfite sequencing results of NA12878 DNA (Fig. S5A).Compared to bisulfite libraries, EM-seq libraries showed a more even distribution within the strands and between the H and L strands of mtDNA (Fig. S5A-B).Collectively, our data indicate that the methylation levels of mtDNA in mammalian cells, if ever present, are far lower than previously reported [16,24], and are below the detection limit (< 0.75%).

Interfering signals from NUMTs lead to erroneous detection of mtDNA-5mC
The phenol-chloroform method usually yields a low amount of mtDNA and is often contaminated with a large amount of nuclear DNA.To analyze mitochondrial DNA methylation, it is reasonable to employ an enrichment step to obtain mitochondrial-focused sequencing data while also avoiding interferences from nuclear sequences.Genomic DNA-qPCR analysis showed that the enrichment of mtDNA over nuclear DNA by the plasmid extraction method was 15 to 70 times higher than by the phenol-chloroform method in N2a and HepG2 cells (Fig. 4A).Notably, when analyzing the WGBS data for N2a and HepG2, we found that enriching mtDNA in HepG2 and N2a cells using the plasmid extraction method (hereafter HepG2_P and N2a_P) produced a less biased distribution of reads (Fig. 4B, Table S1), higher mapping rates, and lower 5mC levels (Fig. 4C, from 0.56% to 0.19% in N2a and 1.27% to 0.20% in HepG2).
Next, we examined the 5mC signals in the IGV genome browser.We found that these signals were When mtDNA was enriched, these signals disappeared (Fig. 4D-E).NUMTs are known to be positive for 5mC signals [53], and the misalignment of nuclear sequences to mtDNA has been suggested to influence the quantification mtDNA methylation [35,43,54].In agreement with this notion, our results suggest that contamination from NUMTs may contribute to the false 5mC signals observed in mtDNA.Enrichment of mtDNA prior to high-throughput sequencing effectively reduces these false positive signals and provides a high number of usable reads, thereby increasing the coverage of the mitochondrial genome.
Mitochondrial and NUMTs sequences are homologues due to their common origin.For high-throughput sequencing analysis of mtDNA, including only reads that uniquely mapped to mtDNA [24] or setting the Phred-scaled minimal mapping quality at 30 (equivalent to a 99.9% alignment accuracy) [55] were used to prevent the interfering signals from NUMTs.However, these methods are more effective for human than for mouse, where certain mitochondrial sequences are identical to NUMTs.For example, the sequence of mouse chrM:6394-9458 (Cox1 to Cox3) is exactly the same as that of chr1:24613120-24616184, making the reads indistinguishable in next generation sequencing.
Normally, there are between 1000 and 10 000 copies of mtDNA in each mammalian cell [56], hence the probability of amplifying similar NUMTs is extremely low due to the vast excess of mtDNA.However, once methylated, these nuclear sequences can be over-represented after bisulfite-induced degradation of unmethylated mtDNA sequences and during PCR of converted DNA fragments [37].This becomes even more problematic in cells containing fewer copies of mtDNA, such as sperm (10-100) [57,58].In the published WGBS dataset (GSE56697), 5mC signals were observed in the CpG context of mouse sperm mtDNA at Cox1 and Cox2 gene loci (Figs.S1A-B  and S2A-B).Since the sequence of this region is identical to that of NUMTs, we directed our attention to the 5' upstream sequence that contains single-base-differences (SBDs) between mtDNA and NUMTs.Upon examining each single read mapped to this region, we found that unconverted cytosines, which gave rise to 5mC signals, always appeared in the reads with SBDs.This observation suggests that the methylation calls were erroneous due to misalignment (Fig. 5A).Through sequence alignment (BLAT function in IGV), we found that the sequences of these misaligned reads were identical to those of NUMTs from chr2: 22588652-22588672 and chr11: 90539030-90539050 (Fig. 5A, left panel), or chr11: 90538942-90538977 and chr2: 22588564-22588599, respectively (Fig. 5A, right panel).These observations suggest that the 5mC signals most likely reflect the methylation status of NUMTs rather than that of mtDNA.In high-throughput sequencing analysis, reads mapping usually allow a certain level of mismatch, which would inevitably introduce erroneous 5mC signals due to NUMTs.This could potentially explain the methylation signals observed across the Cox1 and Cox2 gene loci, as well as other mtDNA regions.
To verify the methylation levels at these regions in mouse sperm mtDNA, we applied Sanger bisulfite sequencing towards Cox1.The bisulfite PCR products share an identical sequence to the NUMTs found on mouse chromosome 2 (Fig. 5B), suggesting that sperm mtDNA, in comparison to nuclear DNA, was at a disadvantage during PCR amplification.Similarly, the amplicons of regular PCR were mainly derived from NUMTs (Fig. S6).Thus, the presence of NUMTs can lead to false quantification of methylation levels, especially in cells with a limited number of mtDNA copies.
We noticed subtle 5mC signals (chrM: 6100-6800, ~5 to 10%) across the Cox1 gene locus in Tet TKO mESC (Fig. 5C).To distinguish between Cox1 and NUMTs, we designed primers for Sanger bisulfite sequencing.We found that the 5mC signals, which cannot be eliminated by linearization of mtDNA, actually originated from NUMTs on mouse chromosome 1 (Fig. 5D).This indicates that NUMTs can contribute to false mtDNA-5mC

A C C C T A C A C G G A G G T A A T A T A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A
A A    signals, independent of bisulfite conversion efficiency and sequencing bias.Removing the nucleus is expected to eliminate contamination from NUMTs during methylation analysis of mitochondrial DNA.We performed a micromanipulation to remove nuclear and polar body DNA from mouse oocytes (Fig. S7A) and then examined Nd6 gene locus using Sanger bisulfite sequencing.As expected, we found that the 5mC signals were no longer detectable (Fig. S7B), confirming that the 5mC signals observed in this region were false positives (Fig. S7C).Collectively, our results suggest that the observed 5mC signals in mouse sperm and Tet TKO mESC are erroneous detection caused by interfering signals from NUMTs.While enrichment of mtDNA before sequencing is expected to reduce such false signals, total DNA isolated with phenol-chloroform can also provide reliable detection of mitochondrial DNA methylation by WGBS, as long as the sequencing depth is not severely biased and the interference from NUMTs signals is carefully examined.

5mC-associated proteins and co-factors are absent in mitochondria
Recent studies have reported the presence of DNA modification proteins in mammalian mitochondria [16,23,24,29,[59][60][61]. For example, the mitochondrial targeting sequence (MTS) of mouse and human DNMT1 was predicted by MitoProt II, and was detected by immunoblots and immunofluorescent assays in 2011 [59].However, several other groups showed that DNMT3A, rather than DNMT1, is the protein responsible for regulating mtDNA methylation within the mitochondria [23,24,60].Because we did not observe any reliable 5mC signals in the mtDNA, we sought to re-inspect human 5mCassociated proteins using a variety of MTS prediction tools, namely iPSORT [62], target [63], MitoFates [64], and MitoProt II [65].Known nuclear proteins, mtDNAencoded proteins, and nuclear DNA-encoded mitochondrial proteins were set respectively as negative and positive controls.However, none of the prediction tools used were able to identify functional MTS in the 5mCassociated proteins or cofactors, including DNMTs, TETs, UHRF1 and TDG (Fig. 6A).We then searched for 5mC-associated proteins in three mass-spectrometrybased mitochondrial proteomic datasets [66][67][68][69].TFAM, CYC1, and FH, which are three canonical mitochondrial proteins, were used as positive controls.Consistent with the results of MTS prediction, mass spectrometry analysis did not identify DNMTs, TETs, or other co-factors in mitochondria (Fig. 6B).This suggests that either 5mCassociated proteins are not localized to the mitochondria or their abundance is too low to be detected, which is in tune with the absence of mtDNA 5mC.

Discussion
The report of 5mC in mammalian mitochondrial DNA in the 1970s has led to extensive studies aimed at quantifying and understanding the function of mtDNA-5mC.However, the existence and levels of mtDNA-5mC have been strongly questioned, and contradictory results have reported since then [31, 33-35, 43, 70].Recently, it was reported that WGBS analysis revealed up to 40% methylation of cytosines in the mammalian mtDNA [16,24].Here, we show that a combination of factors, including insufficient bisulfite conversion rate, sequencing bias, and interference of NUMTs signals, contributed to the misidentification of 5mC in mammalian mitochondrial DNA.By combining our WGBS, TAPS, and EMseq data, we reveal that the level of mtDNA-5mC is far lower (< 0.19 ~ 0.75%) than previously reported [16,24], and falls below the detection limit of WGBS.In line with our findings, a recent critical reanalysis of the study by Patil et al. [16] put forward that the reported mtDNA-5mC signals were stemmed from a combination of methodological and technical pitfalls [70].To identify reliable mtDNA-5mC signals using bisulfite sequencing, we propose a modified workflow (Fig. 7).This workflow involves enriching mtDNA to minimize interference from nuclear DNA, fragmenting mtDNA to avoid bias introduced by secondary structures during library preparation, and examining sequencing depths and SBDs of aligned reads at sites with high 5mC signals.
The interference from mtDNA secondary structures was not fully examined previously [22,23,[25][26][27][28][29].Several studies performed linearization of mtDNA prior to methylation analyses, however, the linearization efficiency was not evaluated [16,48].We show that linearization of mtDNA not only decreases but completely eliminates false positive 5mC signals in bisulfite PCR.This highlights the importance of evaluating the linearization efficiency before quantifying the 5mC content in mtDNA with bisulfite.
Fig. 6 5mC-associated protein and co-factors are absent in mitochondria.A A heatmap showing the MTS scores using different prediction tools.Nuclear proteins are used as negative controls while nuclear DNA-encoded mitochondrial proteins as positive controls.On a scale of 0 to 1, a higher score indicates that the protein is more likely to locate in the mitochondrial (IMPI) or contain at least one functional mitochondrial targeting sequence (MTS) (iPSORT, TargetP, MitoFates and MitoProt).B A Venn diagram showing that 5mC-associated proteins are not among the mitochondrial proteins identified by three independent mitochondrial proteome databases.Known mitochondrial proteins, such as TFAM, CYC1 and FH, are consistently identified in all three databases Since cytosines in the CpH context were reported to be methylated in mtDNA [16,24], measurements of 5mC with classic Sanger bisulfite sequencing could be misleading due to amplification bias associated with non-degenerate primers.WGBS is an alternative approach to circumvent this problem.However, sequencing depth biases may still arise due to bisulfiteinduced degradation [37] and the supercoiled structure of mtDNA [35].Unmethylated C-rich fragments, which are not constrained by secondary structures, are susceptible to degradation after bisulfite reactions and therefore are absent from the library.Consequently, such libraries could be characterized with a severe inverse correlation between sequencing depth and mtDNA methylation level, as well as severe strand bias (high H/L ratio).Hence, we propose that for mtDNA methylation analysis using WGBS, it is crucial to carefully examine library quality in terms of sequencing depth and strand bias.
Another contributing factor to false positive signals of mtDNA-5mC in high-throughput sequencing is the tolerance of mismatches and the interference of NUMTs.In our study, we scrutinized the alignment file of the mouse sperm WGBS data and designed primers to distinguish between Cox1 and NUMTs in Tet TKO mESC.Taking advantage of SBDs between mtDNA and NUMTs, we demonstrated that the detected 5mC signals in mouse sperm and Tet TKO mESC were erroneous detections caused by misaligned NUMTs reads.Our work indicates that enrichment of mtDNA is critical in reducing the false positive 5mC signals from NUMTs and therefore is highly recommended for mtDNA methylation analysis.For the convenience of future work, we provided representative mtDNA regions in which NUMTs contamination may affect methylation calling (Table S2 for mouse and Table S3 for human).It is essential to examine reads mapped to these regions to prevent overestimation of the methylation level of the mitochondrial genome.7 A framework to analyze mammalian mtDNA-5mC using WGBS.MtDNA should be enriched and fragmented by sonication, both of which help to bring down the false 5mC signals from NUMTs and provide more usable reads for mtDNA, before subjected to high-throughput sequencing.Sanger bisulfite sequencing is performed as quality control (forward primers without cytosines and reverse primers without guanines, or degenerate primers are recommended).For WGBS analysis, when strong 5mC signal (> 5%) appears, the sequencing depth should be compared with adjacent regions as well as the average depth of the entire strand in a strand-specific manner.Since the vast majority of cytosines in mtDNA was sequenced much deeper than those in nuclear DNA (depth cutoff was usually set as 5x or 10x), it is reasonable to examine the relative depth rather than the absolute depth when analyzing mtDNA-5mC.Meanwhile, the original sequences of the reads before C to T conversion should be subject to BLAT or BLAST to examine whether the 5mC signals are resulted from misaligned NUMTs reads.Because sequencing depth bias (both between strands and within strands) and reads from NUMTs are two main sources of mammalian mtDNA-5mC signals, quality control of reads distribution and examination of alignment are expected to filter false positive 5mC signals.Enzymatic conversion-based techniques, such as TAPS and EM-seq, largely eliminate the biases and artefacts observed in previous WGBS studies, making them promising options for future investigations DNA methylation usually presents together with histone modifications in eukaryotic cells, and the crosstalk between DNMTs and histone modifications is vital for mammalian cell activities.Common model organisms such as Drosophila melanogaster, Caenorhabditis elegans, fission yeast, and baker's yeast possess histones but exhibit almost no 5mC [71].The opposite case, with abundant DNA methylation but lacking histone modifications, is rarely reported in eukaryotic cells.Recent studies have reported the presence of some, but not the complete set of, DNA methylation machinery in mammalian mitochondria [16,23,24,29,[59][60][61]. Apart from the discordances among these reports, the possibility of contamination from nuclear proteins cannot be entirely excluded.By analysing published mitochondrial databases and mass-spectrometry-based mitochondrial proteomic data, we found that 5mC-associated proteins and co-factors were absent in mitochondria across a variety of cells and tissues.Although DNMT1 has been reported to possess de novo methylation activity [72][73][74] and has been detected in mitochondria [59], it is worth noting that mtDNA is free of histones, which are essential for DNMT1 to maintain DNA methylation in vivo [71].Therefore, it is unlikely for DNMT1 alone to produce extensive and functionally stable 5mC in mitochondria.

Conclusions
In the present study, we provide evidence to suggest that the previous reported mtDNA-5mC are most likely resulted from inefficient bisulfite conversion, strand sequencing bias, and misaligned nuclear mitochondrial sequences.For the efficient development of this field, we propose an optimized bisulfite sequencing procedure, which mainly entails linearization of mtDNA for Sanger bisulfite sequencing, enrichment of mtDNA before WGBS, as well as carefully inspection of strand depth bias and NUMTs during the analysis workflow.With this framework in place, we show that 5mC in mtDNA is more likely a technical artifact rather than a relevant epigenetic mark.The maximum levels of 5mC observed in the analyzed cells and tissues are far lower than previously reported.Although it is not possible to completely exclude the existence of a few 5mC bases at levels below our detection limit (< 0.75%), and the potential for these bases to have biological functions, it is impractical to examine all types of cells under all conditions.Nevertheless, our results strongly imply that the published data are not sufficient to support the presence of 5mC in mammalian mtDNA, and the levels of 5mC are extremely low, provided the modification ever occurs.

Human subjects and platelets isolation
Human platelets were isolated from cord blood of healthy donors.Briefly, whole cord blood was added to the top of Ficoll reagent (GE Healthcare) and centrifuged at 500 × g for 30 min.The plasma layer containing platelets were transferred to a new tube and centrifuged at 3731 × g for 4 min to precipitate platelets [75].

Oocyte micromanipulation
Metaphase II-arrested oocytes were obtained from adult female B6D2F1 mice by superovulation.After Hoechst 33342 staining, oocyte nuclei and polar bodies were removed by micromanipulations in a droplet of M2 medium (Millipore, MR-015-D) containing 5 μg/ml cytochalasin B (Sigma, C6762) using a blunt Piezo-driven pipette.

DNA extraction
Total DNA was extracted from cells or tissues using Phenol-chloroform DNA extraction protocol.

Linearization of mtDNA
Linearization of mouse and human mtDNA was achieved by FastDigest restriction endonuclease digestion.500 ng mouse total DNA was digested with 2.5 μl SacI (Thermo Scientific, FD1133) and 2.5 μl NcoI (Thermo Scientific, FD0573).500 ng human total DNA was digested with 2.5 μl BbsI (Thermo Scientific, FD1014) and 2.5 μl PvuII (Thermo Scientific, FD0634).Linearization efficiency was analyzed on agarose gel and quantified by real-time qPCR with NovoStart SYBR qPCR SuperMix Plus (Novoprotein).Linearization efficiency was calculated based on the 2 −ΔΔCt method normalized by selected internal controls.Primers are listed in Table S4.

Sanger bisulfite sequencing
Bisulfite conversion was performed using EZ DNA Methylation-Direct ™ Kit (ZYMO Research) according to the manufacturer's instructions.PCR products were cloned into T-vector using pClone007 Versatile Simple Vector Kit (TSINGKE) and individual clones were sequenced.Bisulfite sequencing results were analyzed using the online tool QUMA (www.quma.cdb.riken.jp/) [76].Primers are listed in Table S5.Primers for regular PCR amplifications are listed in Table S6.

MtDNA enrichment and efficiency analysis
Enrichment of mtDNA was performed using TIANprep Mini Plasmid Kit (TIANGEN) according to the manufacturer's instructions.Relative enrichment of mtDNA over nuclear DNA was assessed based on the 2 −ΔΔCt method and normalized by selected internal controls.Primers are listed in Table S7.

WGBS library preparation and data analysis of WGBS and EM-seq
Briefly, 200-1000 ng genomic DNA was used for WGBS library construction.DNA was sonicated to ~250 bp fragments with the Diagenode Bioruptor Pico.Then DNA was subjected to end-repair, A-tailing, and ligation using VAHTS Universal DNA Library Prep Kit for Illumina V3 (Vazyme, ND607) and methylated adaptor (Roche) according to the manufacturer's instructions.DNA was purified with VAHTS DNA Clean Beads.Subsequently, library was bisulfite modified using QIAGEN EpiTect Fast DNA Bisulfite Kit (59824) and amplified with KAPA2G Robust HotStart PCR Kits with dNTPs (Sigma-Aldrich).Library was finally subjected to 150 bp paired-end sequencing using the Illumina Novaseq6000 instrument.

MTS prediction
MTS was predicted in MitoMiner (v4.0) [83] database which integrates iPSORT, targetP, MitoFates and Mito-Prot II.The probability of human proteins was examined using the SEARCH function according to the website guidance.

Mitochondrial proteomic analysis
High-confidence mitochondrial proteins were identified from three mass-spectrometry-based mitochondrial proteomic data, MitoCop, MitoCarta3.0 and APEX.

MitoCop
Nine hundred sixty-five human genes encoding proteins that satisfy the following filtering conditions according to spatial proteomics data from MitoCop were identified: 1) belong to cluster 1; 2) the sum of Mean Norm.MS Intensity cM and Mean Norm.MS Intensity pM is greater than 1.

MitoCarta3.0
All 1136 human genes encoding proteins with strong support of mitochondrial localization were identified by MitoCarta3.0.
APEX The union of 495 human mitochondrial matrix proteins and 127 human mitochondrial intermembrane space proteins were identified by the ascorbate peroxidase (APEX) tagging method, with 33 overlapping proteins.

Public datasets
Published datasets GSE92310, GSE127301, GSE87757, GSE56697 and GSE33722 were downloaded from Gene Expression Omnibus (GEO) and were summarized in Table S8.Data were processed as described above.

Fig. 1 Fig. 2
Fig. 1 Strand-specific sequencing bias causes artifactual mtDNA-5mC signals.A Re-analysis of a published WGBS dataset (GSE56697) demonstrates L strand-specific 5mC signals in the mtDNA of mouse oocyte.Green tracks show methylation level (top, scale from 0 to 100) and sequencing depth for each cytosine (bottom, scale from 0 to 500) in the double-stranded mtDNA.Methylation levels and sequencing depths of CpG and CpH were analyzed separately for L (red tracks) and H (blue tracks) strand.Cytosine sites in each strand were marked on the bottom.Region 1 and Region 2 harbor high 5mC signals.A close-up view of Region 1 is shown to the bottom, in which the 5mC signals reflect methylation of CpH sites with extremely low sequencing depth in the L strand.Coverage ≥ 3 for 5mC, and ≥ 1 for mCpG and mCpH on each strand.B An IGV snapshot of the sequencing reads showing that Region 1 is devoid of reads aligned to the L strand.Reads aligned to the L strand are colored in red and to the H strand in blue.C to T and G to A signals are marked by red and green vertical lines respectively.C Sequencing depth and methylation ratio (5mC/C %) in the oocyte are inversely correlated (P < 0.001).Each dot refers to one cytosine site in the mtDNA

Fig. 4
Fig.4 Enrichment of mitochondrial DNA improves the mapping rate and reliability of WGBS data.A qPCR analysis of genomic DNA was used to examine the enrichment efficiency of mtDNA over the nuclear DNA from N2a and HepG2 cell lines using mitochondrial and nuclear specific primers.Nuclear genes Tdg and Gfap are independently used as internal control.B Metaplots for the sequencing depths across the mtDNA of N2a and HepG2 cell lines.C Bar graph showing CpG and CpH methylation levels (left y-axis) and mapping rates (green line, right y-axis) for N2a and HepG2 cell lines.Average methylation levels are shown in the parentheses.D, E Methylation levels and sequencing depths of mtDNA in HepG2 (D) and N2a (E) cell lines measured by WGBS

(
See figure on next page.)Fig. 5 5mC signals at Cox1 gene locus in the mtDNA of mouse sperm and Tet TKO mESC come from NUMTs.A Two IGV snapshots showing the WGBS reads (GSE56697, mouse sperm) aligned to Cox1.Reads aligned to the L strand are colored in red, to the H strand in blue.In the aligned reads, bases unmatched with mitochondrial genome are shown in colored letters and are divided into two categories as follows, one comes from the bisulfite conversion of unmethylated Cs, the other is single base differences (SBDs) naturally exist between mtDNA and NUMTs (arrow).The unconverted Cs (arrowhead) and SBDs usually lie within the same read (marked with asterisks), suggesting the 5mC signals in mtDNA (red line box) are false positive calls due to misalignment.Potential alternative genomic sequences for NUMTs are shown underneath.For the left panel, misaligned reads are identical to NUMTs on chr2 and chr11.For the right panel, reads misaligned to the L strand are identical to NUMTs on chr11, to the H strand identical to NUMTs on chr2.B Sanger bisulfite sequencing towards Cox1 in mouse sperm.L and H strands are shown separately.Non-SBD-Cs in the non-CpG context of the reference sequences are masked as Ts.SBDs between mtDNA and NUMTs are shown in red font.In the case of C/T SBDs which were both read as Ts in bisulfite sequencing, the sequence of adjacent regions or the opposite strand helps to distinguish mtDNA vs. NUMTs origin of reads.The BS-PCR products align strictly to one of the NUMTs in chr2 instead of to Cox1 in mtDNA.C An IGV snapshot showing 5mC signals at Cox1 gene locus in the Tet TKO mESC.D Methylation analysis of Cox1 from Tet TKO mESC by Sanger bisulfite sequencing with or without linearization of the mtDNA.Primers were designed to distinguish Cox1 and NUMTs

Fig.
Fig.7A framework to analyze mammalian mtDNA-5mC using WGBS.MtDNA should be enriched and fragmented by sonication, both of which help to bring down the false 5mC signals from NUMTs and provide more usable reads for mtDNA, before subjected to high-throughput sequencing.Sanger bisulfite sequencing is performed as quality control (forward primers without cytosines and reverse primers without guanines, or degenerate primers are recommended).For WGBS analysis, when strong 5mC signal (> 5%) appears, the sequencing depth should be compared with adjacent regions as well as the average depth of the entire strand in a strand-specific manner.Since the vast majority of cytosines in mtDNA was sequenced much deeper than those in nuclear DNA (depth cutoff was usually set as 5x or 10x), it is reasonable to examine the relative depth rather than the absolute depth when analyzing mtDNA-5mC.Meanwhile, the original sequences of the reads before C to T conversion should be subject to BLAT or BLAST to examine whether the 5mC signals are resulted from misaligned NUMTs reads.Because sequencing depth bias (both between strands and within strands) and reads from NUMTs are two main sources of mammalian mtDNA-5mC signals, quality control of reads distribution and examination of alignment are expected to filter false positive 5mC signals.Enzymatic conversion-based techniques, such as TAPS and EM-seq, largely eliminate the biases and artefacts observed in previous WGBS studies, making them promising options for future investigations