The pseudo-mitochondrial genome influences mistakes in heteroplasmy interpretation

Background Nuclear mitochondrial pseudogenes (numts) are a potential source of contamination during mitochondrial DNA PCR amplification. This possibility warrants careful experimental design and cautious interpretation of heteroplasmic results. Results Here we report the cloning and sequencing of numt loci, amplified from human tissue and rho-zero (ρ0) cells (control) with primers known to amplify the mitochondrial genome. This paper is the first to fully sequence 46 paralogous nuclear DNA fragments that represent the entire mitochondrial genome. This is a surprisingly small number, due primarily to the primer sets used in this study, because BLAST searches have previously suggested that nuclear DNA harbors between 400 and 1,500 paralogous mitochondrial DNA fragments. Our results indicate that multiple numts were amplified simultaneously with the mitochondrial genome and increased the load of pseudogene signal in PCR reactions. Further, the entire mitochondrial genome was represented by multiple copies of paralogous nuclear sequences. Conclusion These findings suggest that mitochondrial genome disease-associated biomarkers must be rigorously authenticated to preclude any affiliation with paralogous nuclear pseudogenes. Importantly, the common perception that mitochondrial template "swamps" numt loci, precluding detectable amplification, depends on the region of the mitochondrial genome targeted by the PCR reaction and the number of pseudogene loci that may co-amplify. Cloning and relevant sequencing data will facilitate the correct interpretation. This is the first complete, wet-lab characterization of numts that represent the entire mitochondrial genome.


Background
The unique maternal inheritance pattern of mitochondrial DNA (mtDNA), its small genome size, accelerated mutation rate, lack of recombination, and multiple copy number per cell, in comparison to nuclear DNA, are ideal biological traits for investigating evolution, population genetics and for forensic and medical applications. Thus, the mitochondrial genome has been used as a biosensor for the timing and movement of human populations in antiquity [1,2]. MtDNA analysis is routinely used in forensic biology to type biological material when degradation prevents nuclear STR amplification [3]. In addition, the entire mitochondrial molecule has potential medical utility because it can serve as a repository of cancer mutations and as a biosensor indicative of genetic alterations [4][5][6][7][8][9][10][11][12][13].
Frequently, identifying legitimate mtDNA mutations is confounded by heteroplasmy, a condition in which wildtype and mutant mitochondrial genomes co-exist in a cell. The interpretation of heteroplasmy can further be confounded by the widespread integration of portions of the mitochondrial genome into the nuclear genome [14,15]. These homologous, yet divergent nuclear and mtDNA sequences can be co-amplified in PCR reactions intended to replicate targeted mtDNA sequences only. Although this problem has previously been considered to be muted because of the high copy number of mtDNA over corresponding nuclear loci, caution is warranted [16]. For example, there are specific regions of the mitochondrial genome that have corresponding nuclear mitochondrial pseudogenes (numts) distributed across multiple chromosomes. Hence, there are regions of the mitochondrial genome that have a high nuclear copy number, which are not completely "swamped" during amplification. We report that some heteroplasmies detected in prostate cancer samples are a result of co-amplification of these multiple loci.
A large number of manuscripts addressing errors related to the interpretation of mtDNA and mtDNA heteroplasmy has been published [17][18][19][20][21][22][23][24][25]. Notably, not all these errors are due to pseudogene co-amplification; however, mistakes from pseudogenes may increase with improved sequencing methods and highly sensitive re-sequencing microarray technologies that have a lower detection limit than traditional sequencing and which readily detect low-level heteroplasmy [11,26]. In some cases, if the heteroplasmy is inherited, it substantially increases the power of mutation detection, which becomes an important aspect since heteroplasmy has been reported as an early indication of disease [27][28][29][30][31]. In addition, if the disease process invokes mitochondrial depletion, this could increase nuclear pseudogene signal in reactions as a result of reduced mitochondrial genome copy number [32]. Loss of mitochondria has been described in several human cancers [33][34][35][36]. As well, the number of mitochondria and mtDNA copy number vary for different cell types [37][38][39]. These important matters relating to sequence interpretation have been generally neglected, in part, due to the lack of numt reference material, which would help investigators determine the relevance of detected mtDNA sequence variations. Hence, the need to validate somatic mitochondrial mutations is a pressing one.
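The copy-number argument above can be made concrete with a deliberately simple template-ratio model. The sketch below is an illustration under stated assumptions, not a method used in this study: it assumes that mtDNA and numt templates amplify with equal efficiency, that each numt locus is present in two copies per diploid cell, and it uses hypothetical mtDNA copy numbers. The function name `numt_template_fraction` is introduced here for illustration only.

```python
def numt_template_fraction(mtdna_copies_per_cell, coamplifying_loci, copies_per_locus=2):
    """Fraction of starting template that is nuclear (numt) rather than mitochondrial.

    Idealized model: every template is assumed to amplify with equal efficiency.
    """
    if mtdna_copies_per_cell < 0 or coamplifying_loci < 0:
        raise ValueError("copy numbers must be non-negative")
    numt_templates = coamplifying_loci * copies_per_locus
    total = numt_templates + mtdna_copies_per_cell
    return numt_templates / total if total else 0.0


# Hypothetical copy numbers, chosen only to show the trend.
for mtdna in (1000, 100, 10):
    for loci in (1, 3, 5):
        frac = numt_template_fraction(mtdna, loci)
        print(f"mtDNA copies={mtdna:5d}  numt loci={loci}  numt fraction={frac:.3f}")
```

Under these assumptions the numt share of template rises both as mtDNA copy number falls (depletion) and as the number of co-amplifying loci grows, which is the qualitative point made in the text.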
Heteroplasmic issues have already complicated data obtained from other species. For example, in elephant hair, low mtDNA content is the reason why numts were co-amplified and misinterpreted as authentic mtDNA. In contrast, numts were not detected in DNA derived from elephant blood due to the presence of mitochondria-rich platelets [40]. Moreover, the hominid, Gorilla, is well known for significant numt interference with mitochondrial sequences, highlighting the need for diligence when interpreting human mtDNA heteroplasmy [41]. Not surprisingly, the effort for using mitochondrial cytochrome c oxidase as a primate "barcode" is plagued by numt amplification as well [42].
Further, laser capture microscopy has improved the ability to separate and analyze cancer cells, but because of the decreased amounts of sample DNA, many primer pairs are required to obtain a robust amplification of the entire mitochondrial genome [43]. Moreover, a sufficient number of cells must be captured to avoid incorporation errors associated with low template quantity [44]. This will also be relevant to studies that use formalin-fixed paraffin embedded samples [45]. The use of many primers means that smaller amplicons will be synthesized translating into a higher risk of co-amplification of numts and the potential misinterpretation of heteroplasmic calls.
There is limited in silico and wet-lab evidence indicating that fragments of the human mitochondrial genome are embedded in the nuclear DNA archive [46-50]. These findings emphasize the critical need to minimize erroneous interpretation of heteroplasmy, a vital necessity for precise forensic discrimination, evolutionary studies, and potential diagnostics. We provide evidence of numts for the entire mitochondrial genome by the amplification, cloning, and identification of numts from rho-zero (ρ0) cells and clinical cancer specimens. Here we present results from overlapping primers, which co-amplified nuclear embedded, paralogous mitochondrial sequence. Surprisingly, our data show a relatively small number (when compared to hypothetical sequence information obtained from BLAST searches) of multiple nuclear loci that co-amplify with the mitochondrial genome. These findings demonstrate that accurate interpretation of heteroplasmy requires careful primer design and testing, and they also indicate that a compendium of the sequence information from multiple-copy number numts is an important reference tool that will facilitate correct mtDNA interpretation and support reliable mitochondrial genome sequencing data.

Results
ρ0 cells lack mtDNA
Rho-zero cells were evaluated for the presence of mtDNA.
To ensure that total DNA extracted from ρ0 cells was indeed devoid of mtDNA, we first performed Southern blot analyses on DNA extracts from ρ0 cells. Blood was used as a positive control. No full length mitochondrial genome signal was observed in ρ0 lanes when the blots were probed with mtDNA-specific probes (Figure 1a). We next performed PCR on the DNA extracts with primers specific to the mitochondrial coding regions. Again, no amplification was observed from the ρ0 templates, while DNA isolated from blood was amplified, as expected (data not shown). We used RT-PCR to provide further evidence that ρ0 cells are indeed devoid of mtDNA [51]. RT-PCR analysis was performed on RNA samples from ρ0 cells and normal human skin tissue (epithelial cells) samples with primers to OXPHOS genes and a nuclear gene (positive control), 5-aminolevulinate synthase (hALAS) (Table 1). Whereas the hALAS primers amplified nuclear targets in template from ρ0 and epithelial cells, there was no observable product with mtDNA primers for the ρ0 cDNA template (Figure 1b), independently confirming the absence of mtDNA in these cells.

Co-amplification of numts and mtDNA
Amplification of the complete mitochondrial genome was performed on human formalin fixed and paraffin embedded (FFPE) prostate cancer samples using a set of 34 primers (Table 2). Due to the amount and quality of DNA recovered, the average amplicon size was limited to 625 bp. Surprisingly, 24 (71%) of the primer sets co-amplified mitochondrial pseudogenes (Figure 2, and data not shown). A similar ratio was previously reported by an independent group using 38 primers (26/38 or 68%) [16]. In an effort to fully characterize numts that represent the entire mitochondrial genome, we redesigned the remaining 10 primers to co-amplify nuclear loci. Thus, we amplified template from ρ0 cells and subsequently identified (via sequencing) the cloned fragments from the nucleus. A region of the D-loop (base pairs 16211-420 and 15-711) was recalcitrant to co-amplification using our mitochondrial primers. Therefore, two chromosome 17 specific primers were designed to capture this D-loop fragment (Table 2). Hence, a total of 36 primer sets were used to recover the entire mitochondrial genome from the nucleus. The sequences representative of the entire mitochondrial genome are provided as additional material (additional file 1). Figure 3 is an example of an alignment to rCRS of three numt clones recovered using primer set 1488F/2084R (Table 2). These three clones were recovered from three separate chromosomes (Chr11 - NT_009237, Chr5 - NT_006713, and Chr3 - NT_005612). Similar alignment of our consensus cloned sequences enabled the assembly of a pseudo-mitochondrial genome (Figure 4).
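Because the mitochondrial genome is circular, an interval such as 16211-420 runs through the origin of the numbering rather than backwards. The following minimal sketch shows how the span of such an interval can be computed. It assumes 1-based inclusive coordinates on the 16,569 bp rCRS and treats the primer names 1488F/2084R as approximate amplicon boundaries; the function name `circular_span` is hypothetical.

```python
RCRS_LENGTH = 16569  # length of the revised Cambridge Reference Sequence, in bp


def circular_span(start, end, genome_length=RCRS_LENGTH):
    """Length in bp of a 1-based inclusive interval on a circular genome.

    If end < start, the interval is taken to wrap through the origin.
    """
    if not (1 <= start <= genome_length and 1 <= end <= genome_length):
        raise ValueError("coordinates must lie within the genome")
    return (end - start) % genome_length + 1


print(circular_span(16211, 420))  # D-loop interval that wraps through the origin
print(circular_span(15, 711))     # second D-loop interval quoted in the text
print(circular_span(1488, 2084))  # approximate span of primer set 1488F/2084R
```

The modulo arithmetic handles both ordinary and wrapping intervals with one expression, which avoids a separate branch for amplicons that cross position 16569/1.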
The following chromosomes were represented in the data: 1, 2, 3, 4, 5, 7, 8, 9, 11, 16, 17, 20 and X. The number of paralogous sequences, in some instances, was lower than the number predicted from BLAST searches (Figures 3 and 5). We demonstrate that there are only a limited number of multiple copy numts that potentially contribute to a heteroplasmic signal. Subsequently, we systematically inspected heteroplasmic sites observed in sequences from the prostate cancer samples for numt contribution using our cloned ρ0 data as a reference. We discovered false het- [...] 6). All three chromosomes have an A, as opposed to a G, at the sites corresponding to mitochondrial positions 1709 and 1719. Using automated DNA sequencing, these multi-copy numts were detected as heteroplasmies at positions 1709 and 1719 (Figure 6). At position 1711, chromosomes 5 and 11 have a C as does the tissue; however, chromosome 3 has a T. A weak heteroplasmic signal is evident by a minute T peak, but because of the poor detection limit of fluorescent sequencing, this peak is virtually equivalent to background (Figure 6). Heteroplasmic signals were detected for other sites as well. For instance, the primer pair for the amplicon (3230-3893) co-amplified homologous numts on 5 different chromosomes (2, 4, 16, 17 and X). This region is evident in the pseudo-mitochondrial genome assembly from our clones (Figure 4).
Figure 1. ρ0 cells do not contain mtDNA. a. Southern blot analysis of total DNA extracted from blood (bld) and ρ0 cells and probed with a full length mtDNA probe. Note the absence of hybridization to ρ0 extracts. Lad is a DIG-labeled DNA molecular weight marker III (Roche). b. PCR amplification of cDNA from ρ0 and epithelial cells (EC). Note the amplification of ρ0 cDNA with primers to the nuclear gene hALAS, whereas primers to ND1, ATPase6, and CYTB failed to amplify ρ0 cDNA, although they all amplified cDNA from EC. Lad is a 100 bp DNA size standard (Fermentas Life Sciences).
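The site-by-site reasoning for positions 1709, 1711 and 1719 can be sketched as a small tally of how many co-amplified numt copies share the same non-mitochondrial base. This is an illustrative sketch only: the bases are those described in the text (A versus G at 1709 and 1719; C in tissue and T on chromosome 3 at 1711), while the data layout and the function name `shared_divergent_sites` are assumptions made for the example.

```python
from collections import Counter

# Bases as described in the text for the tissue mtDNA and the three numt clones.
mtdna_base = {1709: "G", 1711: "C", 1719: "G"}
numt_bases = {
    "Chr3": {1709: "A", 1711: "T", 1719: "A"},
    "Chr5": {1709: "A", 1711: "C", 1719: "A"},
    "Chr11": {1709: "A", 1711: "C", 1719: "A"},
}


def shared_divergent_sites(mtdna_base, numt_bases):
    """For each site, count the numt copies carrying each base that differs from mtDNA."""
    report = {}
    for pos, mt in sorted(mtdna_base.items()):
        divergent = Counter(
            bases[pos] for bases in numt_bases.values() if bases[pos] != mt
        )
        report[pos] = dict(divergent)
    return report


report = shared_divergent_sites(mtdna_base, numt_bases)
for pos, counts in report.items():
    print(pos, counts)
```

Sites where all three copies carry the same divergent base (1709 and 1719) are the ones reported as apparent heteroplasmies, whereas the single divergent copy at 1711 gave a peak close to background.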

Multiple numt copies exist in the genome
To cross-validate our cloned data, we analyzed genomic DNA from ρ0 cells, blood, and human placenta using mitochondrial primers that co-amplified nuclear loci in the prostate cancer specimens. In blood and human placenta samples, a single mtDNA amplicon was consistently observed (Figure 7, and data not shown). Although sequence analysis of the prostate specimens detected numts, their signals were below the detection limit of agarose gel electrophoresis. In contrast, several primers consistently amplified numts from ρ0 cells, generating high molecular weight amplifications in addition to the expected mtDNA fragments (Figure 7). These findings confirm the presence of multiple numt loci in the genome and translate into real concern that numts are present in amplifications that produce more than one band or different size amplicons.

Survey of mitochondrial genome mutations associated with disease suggests caution
Based on our findings that false heteroplasmic sites occurred when there was co-amplification of multiple numt loci with the same nucleotide at that particular site, we compared our cloning data to possible disease associated mutations listed on MITOMAP [52] and common sites were noted. In addition, a BLAST search was performed for these sites and hits held in common between the marker and cloning information were scored as well.
Numerous commonalities were noted, which is cause for concern (Table 3).
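The comparison described above amounts to asking, for each candidate site, whether any cloned numt spans it. A minimal sketch of that lookup follows. It is not the procedure used in this study: clone coverage is approximated by the amplicon coordinates quoted elsewhere in the text (1488-2084, 3230-3893 and 8816-15051), position 7000 is a hypothetical query with no listed clone, and the function name `numt_chromosomes_covering` is introduced for illustration. Coverage alone does not show that the numt carries the same base as the reported mutation; that requires checking the clone sequence.

```python
# Clone coverage approximated by amplicon coordinates quoted in the text
# (start, end, chromosomes from which numt clones were recovered).
clone_regions = [
    (1488, 2084, ("3", "5", "11")),
    (3230, 3893, ("2", "4", "16", "17", "X")),
    (8816, 15051, ("5",)),
]


def numt_chromosomes_covering(position, regions=clone_regions):
    """Return the chromosomes whose cloned numts span a given rCRS position."""
    hits = []
    for start, end, chromosomes in regions:
        if start <= position <= end:
            hits.extend(chromosomes)
    return hits


# 7000 is a hypothetical position outside the regions listed above.
for site in (1719, 3697, 3796, 7000):
    chroms = numt_chromosomes_covering(site)
    status = "compare with numt sequence" if chroms else "no listed clone"
    print(site, chroms, status)
```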

Discussion
In this study, we recovered and assembled the entire mitochondrial genome from nuclear loci. Moreover, this "pseudo-mitochondrial genome" involves numts from over half of the human complement of chromosomes, including the X chromosome. This suggests a widespread allocation of numts in the human nuclear genome. Surprisingly, this distribution was achieved with primers routinely used to amplify mtDNA, yet designed without consideration for numts. Seventy-one percent (24/34) of the primers co-amplified numts in prostate cancer tissue samples. This validates prior suggestions that numts are a potential source of misinformation and serves to illustrate the ease of co-amplification of both mtDNA and nuclear embedded paralogous mitochondrial DNA sequences [16]. Our data demonstrate that contrary to a consensus of opinions that the copy number of the mitochondrial genome "swamps" the signal from numts loci, there are circumstances which favor PCR recovery of numts, such as multiple pseudogene copy number [18]. For instance, heteroplasmic mutations had been associated with late-onset Alzheimer's disease [17,53]; however, these false heteroplasmies resulted from co-amplification of numts [19,54,55]. Indeed, human numts perplexed ancient DNA studies as well when it was reported that DNA had been recovered and amplified from a Cretaceous dinosaur bone [56]. This sequence corresponded to a human numt containing cytochrome b sequence [57], probably from reagent or sample contamination.
Direct pseudogene contribution is not always obvious and can confound suggested mtDNA biomarkers. For example, one set of primers in our data set amplifies tRNA leu and ND1 (3230-3893). Subsequent cloning data [...] [11]. In addition, the heteroplasmic patterns seen at bps 3697 and 3796 are mirrored by the nuclear pseudogene patterns (Table 3).
Numts co-amplify from clinical samples.
Figure 3. Example of an alignment of three clones. An alignment of three clones (clones G C11. A1, G C3. A1 and G C5.C1) recovered from three chromosomes (Chr11 - NT_009237, Chr5 - NT_006713, and Chr3 - NT_005612) to the rCRS is shown.
Re-examination of the raw data from the above studies could address whether the disease mutations are actually due to co-amplified numts. Potential markers must be thoroughly investigated to preclude the inclusion of false mutations in the interpretation of mtDNA mutations. BLAST searches of nuclear pseudogenes belie the possibility of widespread integration and/or replication of these sequences, since primers may amplify homologous numts embedded elsewhere in the genome. Thus, high copy numbers for these nuclear segments can produce potentially misleading heteroplasmic signals. [...] (Table 3). Clearly there are numerous paralogous loci for mitochondrial tRNA leu, the amplification of which depends on location and homology of primer sets.
Because of this association in our data, we compared our cloning results to the suggested mitochondrial genome disease associated sites listed on MITOMAP [52]. Results suggest that many mutations require meticulous scrutiny because of paralogous nuclear commonalities. Although many of these mutations may well be actual disease markers, the possibility of numt association may confound detection. For example, proposed prostate cancer mutations (G5913A, G5973A and G6081A) are identified, by cloning data, as resident on chromosomes 1 (2 homologous copies) and 17 even though the authors exercised precautionary measures by scanning a database of known nuclear pseudogenes [49,62]. A locus on chromosome 6 was identified as a potential co-amplification product, yet chromosomes 1 and 17 were not detected. Co-amplification of numts is primer dependent, which may explain the differences seen here; however, database limitations and the absence of extensive wet-lab numt data obscure the meaning of the marker.
If the use of mitochondrial DNA and in particular somatic mitochondrial genome mutations has important utility and medical merit, much of the data requires critical follow-up from a pseudogene perspective. Amplification of ρ0 DNA template with primers to identify and eliminate those which co-amplify nuclear pseudogenes is a vital and necessary procedure [16]. For example, mitochondrial PCR protocols were simultaneously run on clinical samples and nucleic acid recovered from ρ0 cells to identify and exclude co-amplification of numts in work by Coskun et al. [64]. Alternatively, data may be screened by amplification and sequencing of ρ0 derived DNA and conflicting sites then backed out of actual data generated with identical primers; however, this approach is labor-intensive [43]. Phylogenetic analysis of the data would also help distinguish polymorphisms from authentic mutations [22]. Unfortunately, the advice by Parfait et al. has, in general, been largely ignored [16].
Figure 4. The distribution of numt clones. The pseudo-mitochondrial genome assembled from consensus numt sequences. The distribution of numt clones (based on our primers) across the rCRS reveals regions of the rCRS with multiple numt copies, sites that could be problematic when primers are designed to targets in these regions. Clone name, chromosomal location and rCRS positions are indicated above each clone.
Our surprising results are not limited to short amplicons, but are also detected in much larger amplicons. For example, the overlapping amplification of chromosome 5 from bp 8816 to 15051 cautions against assuming that long amplicons are pseudogene free. These possibilities and characteristics of the nuclear genome must be considered when using mitochondrial sequence data for population, forensic or disease studies. Although designing and testing primers to avoid co-amplification of numts is a good laboratory practice, compilation of numts representative of the entire mitochondrial genome is valuable to catalog and characterize the overall nuclear burden of these sequences.

Conclusion
Amplification of overlapping numts paralogous to the mitochondrial genome indicates that co-amplification of nuclear mitochondrial pseudogenes is a real problem for accurate sequence interpretation. Not only is co-amplification dependent on the particular amplicon used, but the copy number of these loci is also important. Only certain positions across the mitochondrial genome are associated with multiple copies of numts. Mitochondrial DNA heteroplasmy should be interpreted with caution since it can be the result of nuclear/cytoplasmic co-amplification. Herein, we have demonstrated the robust amplification of numts. This paper is the first to fully sequence the 46 paralogous DNA fragments that represent the entire mitochondrial genome using 36 primer pairs. This is a surprisingly low number, but it reveals that only a limited number of paralogous numts are relevant when considering whether heteroplasmic calls are authentic mutations. Compilation of a complete data set of numt sequences will help others distinguish paralogous nuclear based heteroplasmy in forensic, population and medical applications.
Figure 5. Our primers recovered a lower number of paralogous sequences compared to BLAST searches. A BLAST search using the rCRS region covered by the three clones (Figure 3a) returns more numts representative of this region (25) than the three obtained by our cloning data.
Figure 6. A "piggyback" effect resulting from chromosomal copy number and shared divergent sites is demonstrated in a patient sample. The chromatogram is from a patient for whom heteroplasmy at positions 1709 and 1719 was later noted to be homologous to three chromosomes (3, 5, and 11), suggesting a possible co-amplification of numts in this instance.

Southern Blotting
Mitochondrial genomes were cut with PvuII from 2 µg of total DNA extracted from normal blood and ρ0 cells. Digested product was electrophoresed through a 0.4% agarose gel and blotted onto a membrane (Hybond-N+, Roche Applied Sciences). Probes were generated from full length mtDNA (16.5 kb) by random primer labeling using the DIG System (Roche Diagnostics). Blots were incubated with probe, washed, blocked, incubated with anti-digoxigenin-AP fragments (Roche Applied Science), reacted with a chemiluminescent substrate (CDP-Star®) and exposed to X-ray film (Kodak) as recommended by the DIG Application Manual for Filter Hybridization (Roche Diagnostics, 2000).
Figure 7. Multiple numt copies are present in the nucleus. PCR amplification of total DNA extracted from ρ0 and blood (bld) cells with primers targeting ND1, ATPase6 and CYTB genes. In contrast to the single amplicons obtained from blood, template from ρ0 contains additional high molecular weight amplicons. Lad is a 100 bp DNA size standard (Fermentas Life Sciences).

PCR
For reverse transcriptase PCR analysis, total RNA was extracted from ρ0 cells and a snap frozen skin sample using standard protocols outlined in the RNeasy Micro Kit manual (Qiagen). A DNase I treatment step was included in the RNA extraction process to ensure the complete removal of all genomic DNA. We assessed RNA quantity and quality with the ND-1000 spectrophotometer (NanoDrop® Technologies) and by gel electrophoresis. First strand DNA was synthesized with the Omniscript® RT (Qiagen) kit. 2 µl of the cDNA was amplified with primer sets to coding mitochondrial genes and a nuclear gene, 5-aminolaevulinate synthase (hALAS) (Table 3), using the PCR conditions described above except that the annealing temperature for these primers was 54°C.
To examine for multiple copy numts, 50 ng of genomic DNA from ρ0 cells, blood and human placenta were amplified as described above, using primer sets to OXPHOS genes.

Authors' contributions
RLP conceived of and supervised the study, and drafted the manuscript. JM, BR, AA, and KR conducted experiments and participated in sequence alignment and data analysis. RW co-coordinated sample collection. GDD conducted experiments and helped draft the manuscript. JPJ provided intellectual insight and helped draft the manuscript. RET designed experiments and helped draft the manuscript. All authors read and approved the final manuscript.