Mt genomes in nematodes
Our discovery and description of the mt genome of G. ellingtonae, composed of two large circles, provide new insight into a phenomenon rare in the metazoa: the division of mt genomes into multiple circles. The context of this discovery is especially intriguing as G. ellingtonae is phylogenetically intermediate between G. pallida and G. rostochiensis, both of which have at least five mini-circles, ranging in size from ~6 to 9 kb (Fig. 5) [19, 20, 25]. At the structural level, the finding of species with such different circle numbers and sizes is a pattern as yet unseen within any other metazoan genus. Of interest is the as yet unknown karyotype of the unsequenced mt genome of the tobacco cyst nematode, G. tabacum, sister to G. rostochiensis. There are very few complete mt genomes available for nematodes closely related to Globodera, i.e. in the monophyletic clade of the infraorder Tylenchomorpha that includes Globodera. There are complete mt genomes available for Radopholus similis and Pratylenchus vulnus [27, 28], showing single circles in these species. Recent comparative studies have yielded single circle mt genomes for five species of Meloidogyne [29, 30]. There is one nearly complete Heterodera mt genome available (H. glycines), missing a section of presumed ncs, but PCR evidence indicates it too is a single circle mt genome [31]. There is PCR evidence of a mini-circle containing only a subset of mt genes in a close Globodera relative, Punctodera chalcoensis, indicating it likely has a multipartite genome; however, the partial mt sequence available for it has 100 % synteny with the mt genome of H. glycines [31]. Further investigation of the mt genome karyotype in P. chalcoensis and other basal relatives within the genus Globodera will be key in determining when the multipartite condition in this lineage first formed.
The single mt circles in these tylenchid relatives are among the largest mt genomes reported in nematodes, which most commonly are under 16 kb, defying the usual conservation of mt genome compactness in the metazoa (Table 3; Fig. 5). As mt genome size increases in these tylenchids so does the length of their ncs, with P. vulnus having the largest circle, 21.7 kb, and longest ncs, 6.8 kb. Yet the two circles of G. ellingtonae, individually smaller than the mt genome of P. vulnus, have even longer single stretches of ncs, 7.2 and 8.1 kb. Despite the exceptionally long stretches of ncs in tylenchid nematodes, no high identity sequence regions (i.e. longer than 20 bp at >90 % identity or longer than 40 bp at >80 % identity) were found between the ncs of G. ellingtonae and that of P. vulnus, R. similis, or any of the five sequenced species of Meloidogyne [27–30]. A highly unusual feature of the ncs in G. ellingtonae is its reduced percent AT composition compared to its own coding sequence and to the ncs of its tylenchid relatives: ~62 % AT in the G. ellingtonae ncs as compared to 73 to 87 % in the other genera. Nowhere in the ncs was there a long stretch of extremely high AT content as seen in the model nematode C. elegans. This is of particular note as “AT-rich region” is often used as a synonymous term for the non-coding region and/or control region in mammalian and other mt genomes.
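The two quantitative criteria used in this comparison, percent AT composition and the thresholds defining a high identity region, can be stated compactly. The following is a minimal sketch only; the function names are hypothetical, the handling of ambiguous bases is an assumption, and it is not the pipeline used to generate the values reported here.

```python
def at_percent(seq: str) -> float:
    """Percent A/T (U counted as T) among unambiguous bases of a sequence.

    Assumption: ambiguous symbols (e.g. N) and gaps are ignored.
    """
    seq = seq.upper()
    at = sum(seq.count(base) for base in "ATU")
    gc = sum(seq.count(base) for base in "GC")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * at / total


def is_high_identity(aligned_length_bp: int, percent_identity: float) -> bool:
    """Apply the thresholds stated in the text to one aligned region:
    longer than 20 bp at >90 % identity, or longer than 40 bp at >80 % identity.
    """
    return (aligned_length_bp > 20 and percent_identity > 90.0) or (
        aligned_length_bp > 40 and percent_identity > 80.0
    )
```

Under these definitions, a 45 bp alignment at 85 % identity would qualify as a high identity region, whereas a 25 bp alignment at the same identity would not.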
Functional convergence in multipartite mt genomes
Given that a multipartite genome organization has appeared independently in such disparate metazoan taxa as rotifers, nematodes, and insects, the question arises of whether this state is mildly deleterious, neutral, or of benefit. A possible functional advantage of having the mt genome in multiple circles is the localization of genes to different transcriptional units. Ojala et al. [32] proposed that human mtDNA is transcribed as a single polycistronic molecule, which is then further processed to produce separate tRNAs, rRNAs, and mRNAs for individual genes. Although others have shown that transcription may have multiple sites of origin and termination, the transcripts are generally polycistronic [26]. It therefore seems advantageous for genes coding for proteins for the same ETC enzyme complex to be on the same transcriptional unit, yielding more efficient co-regulation of their expression [33]. By dividing the genome into separate circles, separate transcriptional units are created. The most extreme example of this is observed in several species of blood-sucking lice for which each protein-coding gene is on a separate circle [14].
Consistent with this model of segregation of transcriptional units, gene types were highly partitioned between the two G. ellingtonae mt circles, with 10 of 12 protein-coding genes on mtDNA-I and both rRNA genes, atp6 and nad1, and 20 of 22 tRNA genes on mtDNA-II (Fig. 1c). In two other metazoan genera reported to have a mt genome split into two similarly sized circles, Liposcelis and Brachionus, gene types (protein-coding, rRNA, or tRNA) also are distributed unevenly between the two circles, although not to the same extreme. The Brachionus mt genomes, which are structurally similar in the two species with complete sequences, have a moderately segregated distribution of gene types, with mtDNA-I having four protein-coding (atp6, cob, nad1, and nad2), both rRNA, and 14 tRNA genes, and mtDNA-II having eight protein-coding and nine tRNA genes [11, 34]. The mt genes of L. paeta and L. entomophila exhibit greater segregation, although the full complement of 37 mt genes was not identified in either species [35]. Chromosome I of L. paeta has ten protein-coding, one rRNA, and three tRNA genes, while chromosome II has three protein-coding, one rRNA, and 11 tRNA genes. Chromosome I of L. entomophila has 11 protein-coding but no rRNA or tRNA genes, while chromosome II has one protein-coding, both rRNA, and 14 tRNA genes. The gene types are more evenly distributed in L. bostrychophila, with seven protein-coding, one rRNA, and 14 tRNA genes on chromosome I, and six protein-coding, one rRNA, and nine tRNA genes on chromosome II [12]. Nonetheless, potential bias in the genes for the different ETC complexes is evident, as chromosome II contains both atp genes and four nad genes, whereas chromosome I has all cox genes, cob and the other three nad genes. The convergent characteristic of gene type segregation between the two circles in these three very disparate genera is consistent with the hypothesis of a functional benefit to such an organization.
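The degree of partitioning described above can be compared across taxa by tabulating, for each gene type, the share of genes carried on the first circle. The sketch below uses only the counts given in the text; the variable and function names are hypothetical, and the two tRNA genes assigned to mtDNA-I of G. ellingtonae are inferred from the "20 of 22" on mtDNA-II rather than stated directly.

```python
GENE_TYPES = ("protein_coding", "rRNA", "tRNA")

# (circle I counts, circle II counts), each ordered as GENE_TYPES.
# Counts are taken from the text; identified genes only.
COUNTS = {
    "G. ellingtonae": ((10, 0, 2), (2, 2, 20)),  # 2 tRNA on mtDNA-I is inferred
    "Brachionus": ((4, 2, 14), (8, 0, 9)),
    "L. paeta": ((10, 1, 3), (3, 1, 11)),
    "L. entomophila": ((11, 0, 0), (1, 2, 14)),
    "L. bostrychophila": ((7, 1, 14), (6, 1, 9)),
}


def share_on_first_circle(first, second):
    """Fraction of each gene type located on circle I (0.5 = evenly split)."""
    shares = {}
    for gene_type, n_first, n_second in zip(GENE_TYPES, first, second):
        total = n_first + n_second
        shares[gene_type] = n_first / total if total else float("nan")
    return shares


SHARES = {
    taxon: share_on_first_circle(*circles) for taxon, circles in COUNTS.items()
}

for taxon, shares in SHARES.items():
    summary = ", ".join(f"{k}: {v:.2f}" for k, v in shares.items())
    print(f"{taxon}: {summary}")
```

Values near 0 or 1 indicate strong segregation of a gene type to one circle, and values near 0.5 indicate an even split. This is a descriptive tabulation only and does not test whether the observed partitioning departs from a random assortment of genes.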
Although the mt genome of the nematode Rhabditophanes sp. KR3021 is divided into two circles, the gene order is highly conserved with the conventional order found in several genera of related nematodes including Bursaphelenchus, Caenorhabditis, and Ascaris, suggesting either recent genesis of the multipartite structure or strong selection pressure for conservation in that lineage [13].
Differential copy numbers of each circle found in two-circle genomes further support a possible functional role for a multipartite structure. That mtDNA-II of G. ellingtonae is in higher relative copy number may relate to a greater requirement for the “building blocks” (tRNAs and rRNAs) of the protein-coding genes than for the mRNA templates of those proteins. Differential copy numbers were found in Liposcelis and Brachionus as well; in B. plicatilis there was a 4:1 ratio of mtDNA-I to mtDNA-II and in L. bostrychophila mt chromosome I was twice as numerous as mt chromosome II. Although the differentiation in gene composition of those mt circles was not as great as in G. ellingtonae, the commonality among all three taxa is that the higher copy number circle contained more tRNA genes. Additionally, for both G. ellingtonae and Brachionus the higher copy circle contained atp6, nad1, and both rRNA genes. It is uncertain whether the increasing mt copy differential corresponding to developmental stage in G. ellingtonae serves a direct function or is a byproduct of some other force. For example, one functional explanation is that increased replication is associated with increased transcription of mtDNA-II. However, a possible non-functional explanation is that smaller mt circles may have a replicative advantage, resulting in differential copy numbers [36]; in other words, mtDNA-II may be at higher copy number as a byproduct of a “selfish” DNA process. The circle size and copy ratio relationship in Brachionus fits such a model, but in L. bostrychophila it is the slightly larger circle that has the higher copy number. Experiments with the nematode C. elegans indicate that the majority of its mt genome replication occurs in the gonads, with much of the mtDNA of somatic cells simply derived from mtDNA dispersed during embryonic development [37]. If this is also the case in G. ellingtonae, just a small increase in replication or slightly lower degradation rate of mtDNA-II in somatic tissue could generate the mtDNA-II:mtDNA-I ratio increase with age.
Formation of multipartite mt genomes
The mechanism by which the multipartite genome in Globodera and other lineages arose is uncertain. Movement of stretches of mtDNA can occur by various mechanisms including slipped-strand mispairing and recombination, resulting in rearrangements such as duplications, deletions, and inversions [38]. If rearrangements exist in a subset of the mtDNA, creating a heteroplasmic population of mtDNA, further differentiation of the two (or more) types of mtDNA could eventually lead to circles with distinct gene structure and/or segregated gene content. Some have proposed multipartite mt genome formation via tandem duplication followed by random loss [35]. A particularly rapid path to such differentiation would be an initial duplication of the entire genome followed by degradation of distinct gene sets on, or recombination between, initially heteroplasmic circles. Evidence was recently reported supporting mtDNA replication by a rolling circle mechanism in C. elegans [6]. If rolling circle replication of mtDNA is conserved among nematodes, an ancestral duplication of the mt genome in the Globodera lineage is easily envisioned. Whole genome duplication followed by gene loss has resulted in higher rates of evolution in yeast [39]. Following such duplication, selection pressure could favor maintaining operational copies of gene functional groups on the same transcriptional units, thus causing segregation between circles. Given the unusual pattern of mt karyotypes in the three sequenced Globodera species, it is unclear whether the formation of the ancestral multipartite structure involved a single split into two circles or a “shattering” of the genome into multiple circles that were subsequently re-joined to form one or both large circles of G. ellingtonae. Future investigation of potentially multipartite mt genomes of nematode species basal to this group is needed.
As remnants of reorganization, the structure of the G. ellingtonae pseudogenes may provide insight into the etiology of the divided genome. Curiously, the pseudogenes of each circle are more similar to each other than they are to the respective functional gene. Three different hypotheses could explain this pattern. 1) All the pseudogenes were created when there was still a single (perhaps duplicated) circle, they differentiated from the functional genes, and then they were copied into both circles when the circles formed. 2) Two circles were formed and only one had the pseudogenes or different pseudogene segments were on different circles. Following pseudogene differentiation from functional sequence, recombination resulted in duplication of the pseudogene(s) into the other circle. 3) The process by which the two circles formed resulted in duplicate pseudogenes on both circles that were then subject to strong homogenization pressure. Differences in the divergence of the pseudogenes provide other clues. The longest evident pseudogene, p-cox1-b, is also the most differentiated from the functional gene, with both high deletion levels and high sequence divergence in the ungapped sequence. Preceding it in the ncs is p-cox2, which is the only pseudogene with a deletion between the copies on the different circles. This suggests the possibility that the ancestral creation of this p-cox2 to p-cox1-b region yielded a recombination hot spot, creating the peripheral more identical pseudogenes during recombination events while facilitating rearrangement of the genome. Such analysis of pseudogene structure may be hindered by the incomplete detection of highly degraded pseudogenes. A high rate of pseudogenization is also found in Liposcelis, with four, eight, and 15 pseudogenes in L. bostrychophila, L. paeta and L. entomophila, respectively, while no pseudogenes were reported in Brachionus.
The 2.2 kb of sequential pseudogenes in the mtDNA of G. ellingtonae is a subset of the 5.1 kb stretch of sequence with 98 % identity between the two circles. We did not identify a functional control region on the circles, although it is most likely located either in the 5.1 kb region with 98 % pairwise identity or in the upstream 1 kb sequence with 87 % pairwise identity. The control region presumably would have been necessary in both/all progenitors of the current two circles for their continued replication. However, if the control region is not in the 98 % identity region, then the latter may have been recently derived from a ncs region that diverged in just one of the circles but then was duplicated into the second circle. If that sequence region was always on both circles, its high conservation could be explained either by concerted evolution creating homogenization of the ncs on both circles, as seen in the two ncs regions in some snakes [40], or by the two-circle state being very recently derived and rates of evolution much higher in the protein-coding region than in the large ncs.
There are intriguing similarities and differences in the distribution of ncs in the other genera with two-circle multipartite genomes. Both G. ellingtonae and Brachionus spp. have long stretches of highly similar ncs shared between circles, ~6.5 and 4.9 kb, respectively [11]. However derived, long stretches of homologous sequence between otherwise different circles provide a region of increased recombinational potential. Although the mt genomes of both L. paeta and L. entomophila have a higher than usual proportion of ncs, it is highly dispersed between coding genes, particularly between tRNA genes; it is very unequally distributed between circles, with much more on chromosome II; and little ncs is shared between circles, with just three stretches of shared sequence, each less than 400 bp [35].
Maintenance and inheritance of multipartite mt genomes
Once differentiated mt circles have formed, regardless of the cause, organisms must have a mechanism to maintain function of all circles to preserve mitochondrial function. To date, there has been little discussion in the literature of how multipartite genomes are partitioned in nucleoids. Generally, mtDNA is packaged with proteins into nucleoids, each containing from one to ten copies of the mt genome, that are evenly distributed on the mitochondrial inner membrane [41, 42]. An interesting line of inquiry is whether all circles of a multipartite genome are contained within individual nucleoids, and if so whether the copy ratio of the different circles is consistent between nucleoids. In single circle genomes with heteroplasmic variants, there is evidence both for random distribution of heteroplasmic types and for within-nucleoid purifying selection [41, 43]. Additionally, the mechanism by which the full complement of circles of a multipartite genome is faithfully transmitted to the next generation is unknown, but certainly involves nucleoid organization. It is of note that in the case of G. ellingtonae, where the circle copy number ratio changes with development, there must exist a mechanism by which the ratio is maintained within the germ line or reset during gametogenesis or embryogenesis. The multipartite mt genome in G. ellingtonae could provide a model system for deconstructing mechanisms of nucleoid based regulation of mtDNA both in somatic cells and during germline development.