Evolutionarily recent, insertional fission of mitochondrial cox2 into complementary genes in bilaterian Metazoa

Background Mitochondrial genomes (mtDNA) of multicellular animals (Metazoa) with bilateral symmetry (Bilateria) are compact and usually carry 13 protein-coding genes for subunits of three respiratory complexes and ATP synthase. However, occasionally reported exceptions to this typical mtDNA organization prompted speculation that, as in protists and plants, some bilaterian mitogenomes may continue to lose their canonical genes, or may even acquire new genes. To shed more light on this phenomenon, a PCR-based screen was conducted to assess fast-evolving mtDNAs of apocritan Hymenoptera (Arthropoda, Insecta) for genomic rearrangements that might be associated with the modification of mitochondrial gene content.
Results Sequencing of segmental inversions, identified in the screen, revealed that the cytochrome oxidase subunit II gene (cox2) of Campsomeris (Dielis) (Scoliidae) was split into two genes coding for COXIIA and COXIIB. The COXII-derived complementary polypeptides apparently form a heterodimer, have reduced hydrophobicity compared with the majority of mitogenome-encoded COX subunits, and one of them, COXIIB, features increased content of Cys residues. Analogous cox2 fragmentation is known only in two clades of protists (chlorophycean algae and alveolates), where it has been associated with piecewise relocation of this gene into the nucleus. In Campsomeris mtDNA, cox2a and cox2b loci are separated by a 3-kb cluster of several antiparallel overlapping ORFs, one of which, qnu, seems to encode a nuclease that may have played a role in cox2 fission.
Conclusions Although discontinuous mitochondrial protein genes encoding fragmented, complementary polypeptides are known in protists and some plants, split cox2 of Campsomeris is the first case of such a gene arrangement found in animals. The reported data also indicate that bilaterian animal mitogenomes may be carrying lineage-specific genes more often than previously thought, and suggest a homing endonuclease-based mechanism for insertional mitochondrial gene fission.
Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3626-5) contains supplementary material, which is available to authorized users.


Background
Mitochondria contain residual genomes (mtDNA) with the majority of their original α-proteobacterial gene set transferred to the host nucleus or lost by other means [1,2]. Among the most compact mitogenomes are those of multicellular animals (Metazoa) with bilateral symmetry (Bilateria) [3][4][5][6][7]. They usually carry 37 annotated intronless genes, of which only 13 are protein-coding, and they have dramatically reduced or entirely absent intergenic regions. Deviations from this conserved gene set in the Bilateria are rare and comprise mostly tRNA genes. Although cases of a protein-coding gene missing from mtDNA are known in Vertebrata (atp8, nad5, nad6), Chaetognatha (atp6, atp8), Nematoda (atp8), and Platyzoa (atp8) [6], some of them may actually represent the presence of highly derived gene variants rather than true gene loss [6,8,9]. The only lineage-specific translated genes identified in bilaterian mitogenomes are the f- and m-ORFs found in bivalves (Mollusca) with doubly uniparental inheritance of mitochondria [10,11]. Moreover, a conserved non-overlapping ORF was identified in the control region (CR) of mammalian mtDNA [12], unassigned ORF sequences have been found in Lingula (Brachiopoda) [13], and an ORF that likely originated through the duplication of a canonical gene was found in the mtDNA of oysters (Mollusca) [14].
The association of cases of presumptive mitochondrial gene loss or acquisition of new ORFs with an increased rate of nucleotide substitution and mtDNA rearrangement prompted speculation that modifications of the mitochondrial gene content in Bilateria might be more common than is currently assumed due, in part, to the relative underrepresentation of faster-evolving mitogenomes among sequenced mtDNAs. Indirectly supporting this hypothesis, additional protein-coding genes have been identified in the mtDNA of basal metazoans [6], and the transfer of genetic material from the mitochondria to the nucleus is a well-known phenomenon that still occurs in almost all eukaryotes, although it usually generates nuclear pseudogene copies of mitochondrial genes (NUMTs) [15][16][17]. Functional relocation of mitochondrial genes to the nucleus, where they would resume their expression, thus allowing for the loss of their mitochondrial copy, has been shown to continue in protists and plants, and to involve both intact and fragmented genes [18]. Interestingly, half of the split, originally mitochondrial genes have at least one of the derived genes transferred to the nucleus and lost from the mtDNA. They include (i) cox1 in the majority of eukaryotic supergroups (excluding, among others, plants and Opisthokonta), where it split at the 3' end and the 3' terminal fragment was transferred to the nucleus [19]; (ii) cox2 in Alveolata and chlorophycean algae (Chlorophyta), where both or only the 3' terminal half of the gene was transferred to the nucleus [20][21][22][23][24]; (iii) rpl2 in eudicots (Angiospermae), where both or only the 5' or 3' section was transferred to the nucleus [25]; and (iv) sdhB in Euglenozoa, where both derived genes were transferred to the nuclear genome [26].
The fission of mitochondrial genes for proteins with transmembrane topology, which might be difficult to transfer to mitochondria if they were encoded in the nucleus, may allow for their partial relocation limited to a region coding for a less hydrophobic part of a protein [19-21, 24, 27-29]. Split protein-coding genes with both derived genes residing in mtDNA include nad1, nad2 and rps3 of ciliates (Alveolata) [30][31][32] and ccmF N and ccmF C (ccb) orthologs of bacterial ccmF (ccl1) in plants such as Marchantia and several groups of angiosperms [25,33].
Here, the correlation between mitochondrial gene loss, gene fragmentation or the addition of new genes and an increased rate of mtDNA evolution was explored using the mitogenomes of the apocritan Hymenoptera (Arthropoda, Insecta). Hymenoptera in the suborder Apocrita, which includes Aculeata and families grouped in the paraphyletic "Parasitica", were selected for the present studies because their mtDNA evolves more rapidly than the majority of sequenced mitogenomes of other insects and of metazoans in general [34][35][36]. The screen applied in these studies retrieved a fission, unique among animals, of a canonical mitochondrial gene, cox2, in representatives of Campsomeris (Dielis) (Scoliidae). Scoliids are a family of solitary wasps that develop as idiobiont ectoparasitoids of the larval stages of Scarabaeoidea and, less often, other Coleoptera. Cox2 encodes subunit II of cytochrome c (CytC) oxidase (COX), which mediates the transfer of electrons from CytC to COX subunit I (COXI) during oxidative phosphorylation (OXPHOS). The split of cox2 into two genes for complementary COXIIA and COXIIB polypeptides likely occurred through intragenic insertion of a cluster of several ORFs, one of which encodes a putative endonuclease that might have been directly involved in the process of cox2 fission.

Mitochondrial cox2 gene in Campsomeris is split in half
The exploration of hymenopteran mitogenomes for potential changes in gene content was guided by a PCR-based screen that primarily targeted mtDNA segmental inversions, as well as deletions and duplications/insertions. Large, multigenic inversions represent an uncommon type of mitochondrial genome rearrangement and can be mechanistically linked to gene translocation, fragmentation, loss and duplication, or the acquisition of new genes. For instance, both inversions and modifications of gene structure and content may arise during the repair of DNA double-stranded breaks by nonhomologous end-joining [37]. The screen was designed to identify inversions of cox1- versus rrnL-bearing segments of mtDNA (Additional file 1: Figure S1). In the typical circular mitogenome of insects and other pancrustaceans [38], cox1 and rrnL genes are separated from one another by two and three rearrangement hotspots located within clusters of tRNA genes (trn[I-Q-M]; [34]). The primers used in the screen were designed to map within conserved regions of cox1 and rrnL, and their orientation permitted the amplification of mtDNA between the two genes if one of the genes was inverted. This strategy led to the identification of cox1 versus rrnL segmental inversions in representatives of Scoliidae and Chrysididae of Aculeata and in Cynipidae and Chalcidoidea of Proctotrupomorpha (Table 1, Fig. 1). Subsequent sequencing of these rearranged mitogenomes revealed the presence of a 3.0-kb insertion within the cox2 gene of Campsomeris (Dielis) plumipes (Drury) (ssp. fossulana (Fab.)) of Scoliidae (Figs. 1 and 2). Thereafter, a similar 2.6-kb insertion was found in cox2 of another, as yet undescribed Campsomeris (Dielis) sp. HA10513 (Fig. 2).
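The logic of the inversion screen described above can be illustrated with a minimal sketch. It assumes a simplified model in which each primer is reduced to a genome coordinate and an extension direction on a circular map; the function name, the coordinates and the genome length are hypothetical and are not taken from the study. Two primers that extend in the same direction cannot yield an exponential product, whereas inversion of one primer-bearing segment makes them converge.

```python
def pcr_product_length(genome_length, pos_a, dir_a, pos_b, dir_b):
    """Return the length of the arc amplified between two primers on a
    circular genome, or None if the primers do not converge.

    pos_*: coordinate of each primer (hypothetical values in the examples).
    dir_*: +1 if the primer extends toward increasing coordinates,
           -1 if it extends toward decreasing coordinates.
    """
    if dir_a not in (1, -1) or dir_b not in (1, -1):
        raise ValueError("directions must be +1 or -1")
    if dir_a == dir_b:
        # Both primers extend the same way around the circle: no product.
        return None
    if dir_a == 1:
        # Primer a extends forward until it meets primer b extending backward.
        return (pos_b - pos_a) % genome_length
    # Primer b extends forward until it meets primer a extending backward.
    return (pos_a - pos_b) % genome_length


# Hypothetical layout: cox1 primer at 2000, rrnL primer at 13000, 16-kb circle.
typical = pcr_product_length(16000, 2000, 1, 13000, 1)       # same direction
rrnl_inverted = pcr_product_length(16000, 2000, 1, 13000, -1)
cox1_inverted = pcr_product_length(16000, 2000, -1, 13000, 1)
```

In this toy layout the typical arrangement gives no product, while inversion of either the cox1-bearing or the rrnL-bearing segment gives a product spanning one of the two arcs between the genes.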
To determine the incidence of the cox2 split within Scoliidae and verify its confinement to this aculeate family, the mtDNA of two Scolia species, S. bicincta and S. dubia (Scoliidae), and cox2 of randomly selected representatives of hymenopteran families of Tiphiidae, Mutillidae, Pompilidae, Formicidae, and Apoidea (Table 1), which are phylogenetically more closely related to Scoliidae, were sequenced. The integrity of the cox2 gene was preserved in all the additionally analyzed Hymenoptera, suggesting that cox2 fission may be confined to Campsomeris or Campsomerinae. Of note, the mitogenomes of Scolia also featured a segmental inversion corresponding to the inversion of trn[Q-M-L2-M-H]-nad2-trn[W-C-Y]-cox1-cox2a-insert-cox2b-trnK found in Campsomeris mtDNA (Fig. 2).
The cox2-splitting insertion occurred within a relatively less conserved region of the gene (Additional file 1: Figure S2). It divided cox2 into cox2a, encoding two transmembrane helices, the N-terminal intermembrane space domain and the "heme-patch" region (containing Trp105, which functions as the point of electron entry from CytC; KT740996) of the canonical COXII, and cox2b, encoding the intermembrane space C-terminal half of COXII containing the binuclear CuA center. In C. p. fossulana, the insertion is located in-frame with cox2a and cox2b, meaning that cox2 might still be expressed as a single polypeptide that is larger than the original one.
In-frame insertions in the corresponding region of cox2 in ciliates, brown algae, microflagellata, and bacteria resulted in enlargement, not fission, of cox2 genes [39,40].
To determine whether C. p. fossulana COXII is encoded by an enlarged, single cox2 gene or separate cox2a and cox2b genes, the 5' and 3' ends of the cox2 transcripts were mapped by RACE. This analysis showed that cox2a and cox2b transcripts are discrete, non-overlapping, and polyadenylated. It also showed that the cox2a termination codon, UAA, was completed by polyadenylation (Fig. 2). Moreover, RACE analysis of the cox2 transcripts did not provide evidence for cox2 splicing in Campsomeris. However, since a group II intron is present within the cox1 gene in the mitogenomes of Annelida (the only known case of mitochondrial RNA splicing in Bilateria) [41], the absence of residual cox2 pre-mRNA splicing was additionally verified by PCR using cox2a- and cox2b-specific primers corresponding to sequences flanking the inserted DNA. The PCR did amplify a 3-kb DNA product from Campsomeris mtDNA but did not amplify any product from the cDNA, again arguing against even residual cox2 RNA splicing or cox2a and cox2b RNA trans-splicing into a single mRNA.
The relative levels of cox2a and cox2b transcripts were determined by RT-qPCR and appeared to differ from one another. In comparison with the RNA level of cox1, the cox2a transcript was slightly less abundant whereas the cox2b transcript was present at a level approximately 3 times higher. This finding further supports the results obtained by RACE that mature cox2a and cox2b mRNAs represent separate entities.
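As an illustration of how relative transcript levels of this kind are commonly derived from RT-qPCR data, the sketch below uses the comparative threshold-cycle (ΔCt) model. The quantification model, the assumed amplification efficiency of 2 (perfect doubling per cycle), the function names and the Ct values are all assumptions made for the example; the text above does not specify how the levels were calculated.

```python
import math


def relative_level(ct_target, ct_reference, efficiency=2.0):
    """Level of a target transcript relative to a reference transcript
    under the comparative Ct model (assumed here, not taken from the
    study). A lower Ct means a more abundant template."""
    return efficiency ** (ct_reference - ct_target)


def delta_ct_for_fold(fold, efficiency=2.0):
    """Ct difference (reference minus target) corresponding to a given
    fold difference in abundance."""
    return math.log(fold) / math.log(efficiency)


# Hypothetical Ct values: a target detected two cycles earlier than the
# reference is four times more abundant if amplification is 100% efficient.
example_fold = relative_level(ct_target=20.0, ct_reference=22.0)
# Ct shift that would correspond to a roughly 3-fold higher level.
shift_for_threefold = delta_ct_for_fold(3.0)
```

Under these assumptions, a transcript about 3 times more abundant than cox1 would cross the detection threshold roughly 1.6 cycles earlier.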

Cox2a and cox2b genes are translated
To determine whether the C. p. fossulana cox2a and cox2b genes are translated rather than representing transcribed pseudogenes, the C. p. fossulana mitochondrial proteome was analyzed by western blotting using polyclonal antibodies (Abs) generated against deduced COXIIA and COXIIB synthetic epitopes. Western blot analysis revealed that Campsomeris cox2 was translated as two separate polypeptides, COXIIA and COXIIB, with sizes comparable to those predicted from the cDNA sequences (115 and 100 amino acids, respectively) (Fig. 3a, Additional file 1: Figures S3 and S4). Moreover, none of the Abs detected a larger polypeptide that might otherwise indicate the occurrence of posttranslational, intein-mediated trans-splicing of COXIIA and COXIIB. To date, split cox2 genes have only been found in two groups of protists, i.e. Chlorophyta [20][21][22]24] and Alveolata [23,42,43] (Fig. 1). Alignment of the predicted sequences of the COXII split sites indicated that, in protists, cox2 splitting occurred in the position corresponding to the COXII splitting site in Campsomeris (Additional file 1: Figure S2). In-frame insertions into cox2 in ciliates (Alveolata), which generated enlarged COXII polypeptides, also occurred in the same position as COXII-splitting insertions in other protists and Campsomeris.
Fig. 1 Cox2 fission across phylogeny. Lineages harboring taxa carrying the cox2 gene split into derived genes are marked in red. Asterisks denote the presence of segmental inversions of cox1 versus rrnL in the mtDNA of Hymenoptera (Additional file 1: Figure S1). Simplified tree topologies are based on recent revisions by He et al. [70] (Eukaryota), Mao et al. [71] (Hymenoptera) and Johnson et al. [72] (Aculeata)
The three-dimensional structures of C. p. fossulana COXIIA and COXIIB polypeptides were modelled using a template-based method with the I-TASSER algorithm (NovaFold) (Fig. 3b). When superimposed on a similarly determined structure of the intact COXII of S. bicincta, both COXIIA and COXIIB showed a good fit, supporting their functionality (Fig. 3b). COXIIA and COXIIB of Chlorophyceae (Chlorophyta) and Alveolata have been proposed to reassemble into functional heterodimeric COXII by taking advantage of the interactions between their unique C- and N-terminal extensions, respectively [22,23]. Sequencing of the ends of C. p. fossulana cox2a and cox2b cDNAs indicated that C. p. fossulana COXIIA and COXIIB do not have extended terminal regions (Additional file 1: Figures S3 and S4). Instead, they might reassemble by taking advantage mostly of shape and internal electrostatic complementarity. Reconstitution of active proteins even from multiple fragments, including those with breakpoints mapping within well-defined functional domains, has been demonstrated for numerous proteins [44][45][46]. Moreover, by analogy to intramolecular interactions found in Paracoccus denitrificans COXII [47], the N-terminal loop of COXIIA might contribute to COXII heterodimer assembly by interacting in the mitochondrial intermembrane space with COXIIB. Interestingly, the N-terminal intermembrane space domain of Campsomeris sp. HA10513 COXIIA is shortened, but in this case, the N-terminus of COXIIB is significantly enriched in positively charged Lys residues (Additional file 1: Figure S2). Since the C-terminus of COXIIA contains negatively charged Glu residues (Additional file 1: Figure S2), the COXII heterodimer might be additionally stabilized in this case by a salt bridge between the C- and N-termini of COXIIA and COXIIB, respectively (Fig. 3c). Finally, the involvement of interacting proteins usually dramatically improves the kinetics of split protein reassembly [48]. COXII, together with COXI and COXIII, forms the catalytic core of respiratory complex IV, surrounded by several COX subunits that are imported from the cytosol. Some of these proteins likely interact with COXIIA and COXIIB, contributing to the assembly of the functional COXII heterodimer. Of note, COXII splitting occurred within the CytC binding interface, the amino acid residues of which are scattered through the entire COXIIB and C-terminal intermembrane space region of COXIIA (Fig. 3c) [49]. Thus, COXII local folding around its binuclear center might be further adjusted during interactions with CytC.
Fig. 2 (partial caption) The inversion might have occurred due to recombination between similar and oppositely oriented trnK and trnQ genes. Other modifications of the C. p. fossulana mitogenome include single-gene inversions of trnQ, trnC and trnS1, translocations of trnF and trnL2, shuffling of trnS1, duplication of trnM, the presence of trnH CAT in addition to trnH CAC, the presence of trnK AAA (within the CR) in addition to trnK AAG, and the loss of trnI (or its replacement by a putative trnI gene located within rrnL). TrnS1, trnR and putative trnI encode tRNAs that lack the TψC (T) arm. The chromatogram of the cDNA sequence corresponding to the cox2a mRNA 3' end (RACE product) shows that the cox2a stop codon, UAA, is generated by the polyadenylation of U (corresponding to T6195; KT740996). The positions of genes/ORFs in Campsomeris sp. HA10513 were deduced from comparison with the corresponding regions of C. p. fossulana mtDNA (the 5' part of the reading frame of Campsomeris sp. HA10513 orf3, marked dark gray, is shifted in comparison to the reading frame of C. p. fossulana orf3). C. p. fossulana orf3-9 correspond to polyadenylated mRNAs that have been mapped by RACE. H2, N2, and K denote trnH CAT, putative (low covariance score) trnN AAT and trnK AAA (Campsomeris sp. HA10513) or trnK AAG (C. p. fossulana), respectively

Hydrophobicity and Cys content of COXIIA and COXIIB
Comparison of the amino acid content of Campsomeris COXIIA and COXIIB with that of intact COXII of S. bicincta and Apis mellifera revealed that the fragmented COXII contains fewer Ile residues (the most abundant amino acid residue in COXII) and more Cys residues (Additional file 1: Figure S5).
The impact of the reduced presence of hydrophobic Ile as well as Leu on the overall character of Campsomeris COXIIA and COXIIB was estimated by calculating the average hydropathy (GRAVY) for COXIIA, for the first and second transmembrane helices of COXIIA, for COXIIB, and for the corresponding regions of intact COXII of other Hymenoptera and representatives of other taxonomic groups. A comparison of the GRAVY values showed that Campsomeris COXIIA and, to a lesser degree, COXIIB exhibited reduced hydrophobicity compared with the corresponding regions of COXII in Scolia and in the majority of other Hymenoptera (Fig. 4a). The hydrophobicity of the first transmembrane helix of Campsomeris COXIIA was also among the lowest in Hymenoptera (Fig. 4a). Interestingly, the hydrophobicity of Campsomeris COXIIA and COXIIB polypeptides was similar to that of Chlamydomonas COXIIA and COXIIB or Scenedesmus COXIIB, all of which are encoded in the nuclear genome and transported to mitochondria.
Cys residues are the only reactive amino acid side chains with substantially changed representation in Campsomeris COXIIB compared with intact COXIIs (Additional file 1: Figure S5). A phylogeny-wide survey of the Cys content in the COXII intermembrane domain, corresponding to COXIIB, revealed that this domain was specifically enriched in Cys not only in Campsomeris COXIIB but also in other split or enlarged COXII polypeptides (Fig. 5), all of which might benefit from redox-based assistance to maintain their proper folding or intermolecular interactions.
Fig. 3 (partial caption) c Schematic alignment of COXII polypeptides and some of the regions proposed to be involved in COXII heterodimer reassembly. Terminal domains that are likely engaged in electrostatic interactions are shown in blue and marked "-" and "+", respectively. The COXII/CytC interface is defined as in Schmidt et al. [49]
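The hydropathy and Cys-content comparisons described above can be reproduced in outline with a short script. The sketch below assumes that GRAVY is computed in the usual way, as the mean Kyte-Doolittle hydropathy index over all residues of a sequence; the function names and the test peptides are illustrative and are not sequences from the study.

```python
# Kyte-Doolittle hydropathy indices for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle index per residue.
    Positive values indicate a more hydrophobic polypeptide."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    unknown = set(sequence) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def residue_fraction(sequence, residue):
    """Fraction of positions occupied by one residue type (e.g. 'C' or 'I')."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return sequence.count(residue.upper()) / len(sequence)
```

To score a single transmembrane helix rather than the whole polypeptide, `gravy` can be applied to the corresponding slice of the sequence, for example `gravy(coxiia[start:end])` with helix boundaries taken from an alignment or a topology prediction.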

Cox2a and cox2b loci are separated by a cluster of antiparallel overlapping transcribed ORFs
Sequencing of the C. p. fossulana 3-kb insert and its conceptual translation revealed, in addition to the aforementioned continuous ORF bridging cox2a and cox2b, the presence of five ORFs on the complementary mtDNA strand, ranging in size from 0.2 to 1.1 kb (Fig. 2). RACE analysis of C. p. fossulana mitochondrial cDNA indicated that all ORFs were transcribed and their RNAs were polyadenylated, with cleavage/polyadenylation sites being much more scattered along the transcripts than in the case of canonical mitochondrial genes. This analysis also revealed that the continuous ORF, including cox2a and cox2b, was transcribed as RNA that was processed into cox2a and cox2b mRNAs and other mRNAs, four of which (qnu and orfs3-5) had in-frame TAA termination codons generated by polyadenylation (Fig. 2). In Campsomeris sp. HA10513, continuity of the ORF corresponding to the largest C. p. fossulana ORF (including cox2a and cox2b) was interrupted in the middle of the insert, and there were only two ORFs, wfw and orf10, on the strand opposite to cox2 (orf10 did not share amino acid sequence similarity with polypeptides deduced from any of the C. p. fossulana ORFs) (Fig. 2). Pairwise alignments of deduced amino acid sequences of the inserted ORFs from the two Campsomeris species identified four groups of ORFs, qnu, wfw, orf3 and orf4, with orthologous ORFs sharing extensive similarity and hence likely being of functional significance.
Nucleotide and protein database searches using BLAST revealed that none of the ORFs encoded by the inserted DNA fragment had significant sequence similarity at the DNA or protein level to previously described genes, thus obscuring the origin of the insertion (qnu exhibits limited stretches of sequence similarity that are discussed in the next section). The A + T content of the inserted DNA fragment was ~13% lower than that of the remaining part of the C. p. fossulana mitogenome (Additional file 1: Figure S6), and was reflected by the decreased frequency of almost half of the A- and T-containing synonymous codons of the inserted ORFs (Additional file 1: Table S2).
Fig. 4 (partial caption) ... Campsomeris species and S. bicincta, respectively; in green for COXII of the non-hymenopteran species, Pediculus ("Phthiraptera") (29) and Drosophila (Diptera) (30); and in yellow for COXII of chlorophycean algae Scenedesmus (31) [73]. a The split COXII of Campsomeris is half as hydrophobic as its intact counterpart in the next most closely related Scolia and is among the least hydrophobic eukaryotic COXII polypeptides. b The hydrophobicity of the first transmembrane helix of Campsomeris COXIIA is among the lowest in eukaryotes
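The base-composition comparison described above amounts to computing the A + T fraction of the insert and of the rest of the mitogenome. A minimal sketch follows; the function names, the window parameters and the toy sequences are assumptions made for illustration, and ambiguous bases such as N are simply ignored.

```python
def at_content(dna):
    """Fraction of A and T among the unambiguous bases of a DNA sequence."""
    dna = dna.upper()
    at = dna.count("A") + dna.count("T")
    gc = dna.count("G") + dna.count("C")
    if at + gc == 0:
        raise ValueError("no unambiguous bases in sequence")
    return at / (at + gc)


def at_profile(dna, window, step):
    """A + T fraction in consecutive windows along a sequence, which can
    reveal a compositionally distinct segment such as an insertion."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        at_content(dna[start:start + window])
        for start in range(0, len(dna) - window + 1, step)
    ]


# Toy sequences only: an AT-rich "genome" segment followed by a GC-rich one.
toy_profile = at_profile("AAAACCCC", window=4, step=4)
```

With real data, `at_content(insert) - at_content(rest_of_genome)` gives the difference as a fraction; whether the ~13% quoted above is meant in percentage points or in relative terms is not stated in the text.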
A very distinctive feature of the insert was the antiparallel overlap of its ORFs (Fig. 2). Cis-natural sense-antisense transcripts (cis-NATs) are found relatively frequently, even in the genomes of higher eukaryotes [50,51]. However, extensive bidirectional overlapping is rare, especially among protein-coding genes, because sequence variants in one gene can often have deleterious effects on the sequence of the complementary gene. In mitochondria, such a gene arrangement has been proposed for cox1 and the putative gene gau [52]. It seems interesting in this context that the open reading frames of overlapping qnu and wfw, as well as orf3 (to a lesser extent) and orf4, have been preserved despite experiencing numerous indels, as was visualized by a pairwise comparison of their sequences from the two Campsomeris species (Additional file 1: Figure S7). RT-qPCR-determined relative transcript levels of the inserted genes were in most cases 2-3 times higher than those of canonical mtDNA-encoded genes (Fig. 6). For each inserted pair of antiparallel overlapping genes, with the exception of qnu-wfw, both transcripts were present at relatively higher levels. In contrast, transcripts that were antisense to the canonical mitochondrial genes were usually present at low levels, resembling mRNA profiles of Drosophila (Fig. 6) and human mitochondria [53]. Higher levels of cis-NATs versus non-cis-NATs have also been found in mammalian [50] and Arabidopsis [54] transcriptomes. In Campsomeris, the increase in RNA levels of some transcripts might indicate their mixed origin from the mitochondria and nucleus. No evidence of heteroplasmy was detected by sequencing RACE products corresponding to the inserted ORFs, cox2a and cox2b, or in sequences of cox2a amplified from total genomic DNA. Nevertheless, it is still possible that fragments of mtDNA containing the 3-kb inserted region or cox2 genes have been copied into the nuclear genome and become transcribed. An increase in the stability of double-stranded RNAs or the presence of transcription promoter(s) within the insert might also contribute to higher levels of some transcripts. Sequences resembling the 15-bp promoter motif of human mtDNA were found similarly oriented upstream (GCTCCAGAAAAAGGAA) and downstream (TTCAACCAAATTA) of qnu and might account, in part, for the increased levels of qnu and orf3-5 transcripts. Higher levels of orf6-9 and wfw transcripts might result from the proximity of their corresponding genomic loci to the promoter(s) located within the CR, which, following inversion, were no longer separated from protein-coding genes by a cluster of several tRNA genes that likely slow down the elongation phase of transcription.
Fig. 5 Cys residue enrichment of derived COXII polypeptides. a Correlation between split or enlargement of COXII and Cys content of the COXII domain exposed to the mitochondrial intermembrane space, equivalent to COXIIB. Data for Campsomeris species (p. fossulana (3) and sp. HA10513 (5)), and S. bicincta (26) are plotted in red and black, respectively. Split or enlarged polypeptides encoded by mtDNA contained, on average, at least twice as many Cys residues as unmodified COXII polypeptides (Mann-Whitney U-test: P = 0.001). A complete list of genera and the taxonomy of analyzed organisms are shown in Additional file 1: Table S1. b The distribution of Cys residues (green marks) along the COXII intermembrane space domain (blue or black (Scolia)) homologous to Campsomeris COXIIB (red)
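The Mann-Whitney U-test cited for the Cys-content comparison in Fig. 5 ranks the values of two groups without assuming a normal distribution. The sketch below is a minimal pure-Python version that computes the U statistic and an exact two-sided P value by enumerating all possible group assignments. It is an assumption that an exact test of this form is comparable to the procedure used in the study; the enumeration is only practical for small samples, and the function names are illustrative.

```python
from itertools import combinations


def u_statistic(x, y):
    """U for sample x: number of (x_i, y_j) pairs with x_i > y_j,
    counting ties as one half."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )


def exact_two_sided_p(x, y):
    """Exact two-sided P value: the share of all ways of splitting the
    pooled values into groups of the observed sizes that give a U at
    least as far from its null expectation as the observed U."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    center = n1 * n2 / 2.0
    observed = abs(u_statistic(x, y) - center)
    indices = range(len(pooled))
    extreme = total = 0
    for chosen in combinations(indices, n1):
        chosen_set = set(chosen)
        group_x = [pooled[i] for i in chosen]
        group_y = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(u_statistic(group_x, group_y) - center) >= observed - 1e-12:
            extreme += 1
    return extreme / total
```

For example, with made-up Cys counts such as `exact_two_sided_p([5, 6, 7, 8], [1, 2, 2, 3])` the function returns the exact permutation P value for those hypothetical values; for the larger samples of a phylogeny-wide survey, a library routine with a normal approximation would be the more practical choice.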

Qnu encodes a putative nuclease that might have been actively involved in cox2 fission
The possibility of translation of the inserted ORFs was experimentally addressed for the two largest and best conserved inserted ORFs, qnu (Gln-Asn [QN] repeat-containing nuclease gene) and wfw (Trp-Phe-Trp [WFW] repeat-encoding putative gene), by western blot (Fig. 7, Additional file 1: Table S3). By this criterion, both ORFs were likely expressed as polypeptides with sizes similar to those predicted from the mapping of their mRNA ends by RACE.
The predicted QNU polypeptide (364- and 387-aa-long isoforms) is hydrophilic (hydropathy value = -0.99) and rich in negatively charged amino acid residues (Fig. 7a). Bioinformatics analysis of its sequence using the BindN server (http://www.web.archive.org/web/20060907042245/bioinformatics.ksu.edu/bindn/) indicated that the Gln-Asn (QN) signature motif-bearing domain and other regions have the potential to interact with DNA and RNA (Additional file 1: Figure S8). In agreement with this prediction, the N-terminal two-thirds of this polypeptide showed sequence similarity to proteins interacting with nucleic acids (Additional file 1: Table S3). A 30-amino acid sequence located within the C-terminal half of the QNU (His212-3aa-His-10aa-Asn-9aa-His-3aa-His241 in C. p. fossulana; KT740996) exhibits features of a nucleolytic domain of homing endonucleases of the HNH family [55]. This domain could potentially form a finger-like structure with a central Asn residue stabilized by a bivalent metal cation coordinating two of its His and/or Cys residues located closer to the C-terminus. Thus, QNU might have been directly involved in cox2 splitting, functioning as an endonuclease. Pairwise alignment of the sequences around the inserted DNA ends in C. p. fossulana cox2 revealed the presence of putative remnants of direct repeats (Additional file 1: Figure S9), suggesting that the insertion followed staggered cleavage of the mtDNA, resembling cleavage at a target DNA site generated by homing nucleases.
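The spacing of the HNH-like signature given above (His-3aa-His-10aa-Asn-9aa-His-3aa-His, 30 residues in total) translates directly into a pattern that can be searched for in a protein sequence. The sketch below is an illustration of that idea only; the function name and the synthetic test sequence are hypothetical, and a match to this spacing alone would not by itself demonstrate a functional HNH domain.

```python
import re

# H-x(3)-H-x(10)-N-x(9)-H-x(3)-H: 5 fixed residues plus 25 spacer positions.
# The lookahead lets overlapping candidate matches be reported as well.
HNH_LIKE = re.compile(r"(?=(H.{3}H.{10}N.{9}H.{3}H))")


def find_hnh_like(protein):
    """Return (start, end, segment) for every stretch matching the
    HNH-like spacing, using 1-based inclusive residue coordinates."""
    protein = protein.upper()
    hits = []
    for match in HNH_LIKE.finditer(protein):
        segment = match.group(1)
        start = match.start() + 1
        hits.append((start, start + len(segment) - 1, segment))
    return hits


# Synthetic sequence (not QNU): the motif is embedded after five residues.
synthetic = (
    "MMMMM"
    + "H" + "A" * 3 + "H" + "A" * 10 + "N" + "A" * 9 + "H" + "A" * 3 + "H"
    + "GGGG"
)
synthetic_hits = find_hnh_like(synthetic)
```

Applied to a QNU sequence translated with the appropriate mitochondrial genetic code, a hit spanning residues 212 to 241 would correspond to the segment described in the text.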
To further test the possible involvement of QNU in cox2 fission, its gene was subcloned in an expression vector in E. coli, and the purified recombinant QNU polypeptide (rQNU) was assayed for nuclease activity. Two plasmid constructs were prepared, one expressing intact rQNU and the other rΔQNU, without DNA-binding Gln-Asn repeats. In the double-stranded plasmid DNA degradation assay, rQNU, but not rΔQNU, exhibited weak endonuclease activity (Fig. 7a). This result supports, in particular, the role of the QN repeats in interaction of QNU with DNA, although the two recombinant QNU proteins were expressed in E. coli and thus differed from the native protein due to differences between genetic codes of invertebrate mitochondria and bacteria.
The other putative polypeptide, WFW (360 aa) (Fig. 7b), has been predicted to be hydrophobic (hydropathy value = 0.15). Interestingly, its deduced amino acid sequence not only exhibits a relatively high number of Cys residues, but these residues are also interspersed with an unusually high number of Trp residues (Fig. 7b). Because of this unusual amino acid composition and lack of sequence similarity to known proteins, the three-dimensional structure and function of WFW cannot currently be predicted reliably, necessitating expression and empirical structural analyses.

Discussion
Screening of the fast-evolving mitogenomes of apocritan Hymenoptera for segmental inversions was instrumental in identifying a fission, unique among animals, of a canonical protein-coding gene, cox2, in the genus Campsomeris (Dielis) of Scoliidae. Cox2 was split by an equally unique insertion of a 3-kb cluster of multiple ORFs of unknown origin. This evolutionarily recent gene fission, found in the mtDNA of two studied Campsomeris species but not in Scolia of the same family or in related hymenopteran families, divided Campsomeris cox2 into two translated genes, cox2a and cox2b. Such a genomic arrangement has not been found for this gene in the mtDNA of any other organism (Fig. 8). COXIIA and COXIIB polypeptides apparently assemble into a functional COXII heterodimer in a process that may involve interactions in the mitochondrial intermembrane space of COXIIA termini with COXIIB and is likely assisted by other proteins of respiratory complex IV. Although the folding of Campsomeris COXIIA and COXIIB has been predicted to be similar to that of S. bicincta COXII, COXIIA and, to a lesser degree, COXIIB polypeptides exhibit reduced hydrophobicity compared with the corresponding domains of the majority of intact COXII polypeptides. The reduction in hydrophobicity, especially of the first transmembrane helix of COXII, has been shown to be essential for functional import into the mitochondria of COXII encoded in the nucleus [29,56], but it might also promote intramitochondrial transport of fragmented COXII expressed in the mitochondrial matrix. In particular, Oxa1 is required for the export of the first transmembrane helix of COXII, synthesized in the mitochondrial matrix, to the inner membrane [57]. Similarly, the export of nuclear genome-encoded COXII from the mitochondrial intermembrane space has been proposed to require anchoring of the polypeptide in the inner membrane through its second transmembrane helix and reinsertion of the first helix, which temporarily entered the mitochondrial matrix, depending on Oxa1 [58]. Alternatively, a general decrease in hydrophobicity, especially of COXIIA compared with the N-terminal half of intact COXII, might have evolved to compensate for the original increase in COXIIA hydrophobicity caused by its split from the more hydrophilic C-terminal half of COXII.
Fig. 7 Putative polypeptides encoded by the cox2-splitting DNA insert in the C. p. fossulana mitogenome. a Western blot and ribbon diagram of the I-TASSER-modeled three-dimensional structure of the QNU (the larger of its two isoforms) polypeptide. The tertiary structure was predicted by combining de novo and locally applied template-based modeling (PDB templates for local structure predictions were: 1wOrA, 3iymA, 2ocwA, 1pclA, 3cm9S). Signature motif and regions with similarity to nucleic acid-interacting proteins (Nai) and the active site of HNH homing endonucleases (HNH) are indicated on the polypeptide linear model. The inset shows the nuclease activity assay of the recombinant QNU using plasmid DNA as substrate, analyzed by agarose gel electrophoresis. No plasmid degradation was observed in the absence of recombinant proteins (P mock). The addition of rQNU caused a decrease in both SC and C forms of the plasmid and smearing of the L form, indicating at least endonuclease activity of the recombinant QNU (+rQNU). Addition of rΔQNU had no effect on the level of any form of the plasmid, indicating the absence of nuclease activity (+rΔQNU) over a 2-h incubation at 37°C. Plasmid topology: SC, supercoil; L, linear; C, coil. Deletion of Gln-Asn (QN) repeats suppressed the nuclease activity of the rQNU polypeptide. b Western blot of the putative WFW polypeptide and deduced sequence of the repetitive signature motif of WFW that was predicted to adopt helical structure stabilized by Trp residues
The other characteristic of split COXII, namely the increase in Cys content in COXIIB, might facilitate the export of COXIIB to the intermembrane space by inner membrane translocases and chaperones [57] or its interactions with other components of respiratory complex IV. Moreover, Cys residues might become reversibly oxidized to intra- and interpeptide disulfides by, for instance, the intermembrane space MIA pathway [57] to regulate COXII complex assembly and activity in a redox-dependent manner [59,60].
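The two sequence properties discussed above, hydrophobicity and Cys content, can be illustrated with a minimal sketch. It assumes that hydrophobicity is summarized as the grand average of hydropathy (GRAVY) on the Kyte-Doolittle scale, which is the usual basis for GRAVY; the exact tool and settings used in this study are not stated in this section. The function names and the two peptide strings are hypothetical and are not Campsomeris sequences.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    # A KeyError here means the sequence contains a non-standard residue.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def cys_fraction(sequence: str) -> float:
    """Fraction of residues that are cysteine."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return seq.count("C") / len(seq)


if __name__ == "__main__":
    # Hypothetical peptides, made up purely for illustration.
    peptides = {
        "hydrophobic_example": "MLLIVAFLLGIV",
        "cys_rich_example": "MKCDECSKNCHT",
    }
    for name, seq in peptides.items():
        print(f"{name}: GRAVY={gravy(seq):.2f}, Cys={cys_fraction(seq):.1%}")
```

A more positive GRAVY value indicates a more hydrophobic polypeptide, so a split subunit with reduced hydrophobicity would score lower than the corresponding domain of an intact COXII.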
The 3-kb DNA fragment dividing Campsomeris cox2 includes several ORFs that are expressed as polyadenylated mRNAs. Four of the ORFs have orthologs in both Campsomeris species used in these studies. One of the ORFs, qnu, was shown herein to potentially encode a nuclease. The putative polypeptide QNU contains a nucleic acid-binding domain and an HNH-like domain that is present in HNH-class homing endonucleases and may have been directly involved in mediating the split of cox2, as the recombinant rQNU exhibited endonucleolytic activity. The presence of remnants of direct repeats flanking the inserted DNA segment further suggested involvement of a homing nuclease in cox2 fragmentation. Similarly, a homing nuclease encoded by a group I intron located within the cox1 gene of a basal metazoan, Metridium (Cnidaria), was reported to be responsible for genic insertion of the intron [61]. In addition, in vivo experiments in yeast showed that endonuclease-encoding introns ensured their own propagation [62]. Examples of non-mitochondrial gene fission caused by insertion of a gene for a free-standing homing nuclease that mediates the fission include the split gene of the B-type DNApol of Methanobacterium [63] and the fragmented nrdA gene of Aeromonas phage Aeh1 [64]. Alternatively, Campsomeris cox2 fission might have been primarily caused by insertion of another DNA element that provided an integration site for the insertion of the 3-kb gene cluster. However, this scenario seems less likely due to the lack of known cases in animals of cox2 splitting by intervening sequences other than the Campsomeris cases reported herein. The implications of the continuing expression of QNU nuclease in the mitochondrial matrix are unknown. The activity of native QNU remains to be determined and might be residual or conditionally induced in vivo.
Fig. 8 Augmented compilation of the split cox2 arrangement and its subcellular localization through phylogeny. In the vast majority of eukaryotes, cox2 is intact and resides in the mtDNA. In Campsomeris wasps, cox2 is split into complementary cox2a and cox2b genes that reside in the mtDNA. In the chlorophycean algae Scenedesmus, Podohedriella, and Neochloris, cox2 is also split, but cox2b has been transferred to the nucleus and lost from the mtDNA. In the chlorophycean algae Chlamydomonas, Polytomella, Volvox, and Haematococcus, and in apicomplexan parasites, dinoflagellates, and Perkinsus, cox2 is split and both cox2a and cox2b have been relocated independently of one another to the nuclear genome and lost from the mtDNA.
It is currently unclear whether copies of any portion of the Campsomeris cox2 genes or their 3-kb insert have been transferred to the nuclear genome. To date, no heteroplasmy has been detected for Campsomeris cox2a, cox2b and new ORFs. However, based on the high levels of some of the transcripts, it cannot be ruled out that the expressed copies, especially of cox2b and some inserted ORFs, also reside in the nuclear genome. In some legumes (Angiospermae, Fabaceae), not only do mitochondrial and nuclear copies of cox2 exist, but in Dumasia and a few other genera (mostly Phaseoleae), they are transcribed simultaneously from both genomes [65].

Conclusion
The discovery of functional fission of cox2 in the mtDNA of Campsomeris highlights the dynamics of mitogenome evolution in Hymenoptera. As a very distinctive character, cox2 fission can be used to clarify phylogenetic relationships within and among subfamilies of Scoliidae. Importantly, it also raises more general questions concerning the evolution of metazoan mitogenomes and their REDOX systems. Split COXII and the increased number of Cys in COXIIB likely established an additional regulatory mechanism to control OXPHOS by linking COXII assembly and activity to varying levels of reactive oxygen species. Interestingly, the fission of cox2 occurred through the genic insertion of a relatively large DNA fragment, hence contrary to the general trend of metazoan mitogenome evolution towards a decrease in mtDNA size. The current function, if any, of the ORFs encoded by the cox2-splitting insert remains unknown, although four of them have been largely preserved between the two compared Campsomeris species. It seems possible that at least QNU, which is encoded by one of these ORFs, might have been involved in cox2 fission and insert integration into mtDNA, similarly to the role played by mobile element-encoded homing nucleases. Further structural and functional studies of the inserted ORFs might contribute to a better understanding of the mechanisms of insertional mitogenome modifications.

Specimens, isolation of mitochondria, and nucleic acid extraction
The hymenopteran species analyzed herein are listed in Table 1. Voucher specimens were deposited at Texas A&M University (College Station, TX). Intact mitochondria were isolated from thoracic muscles of C. p. fossulana using the Qproteome Mitochondria Isolation Kit (Qiagen, Frederick, MD). For DNA preparation, mitochondria or thoracic muscle tissue were lysed in SDS-containing buffer and digested with proteinase K. The lysates were treated with phenol/chloroform, and DNA was precipitated with isopropanol. RNA was extracted using the miRNeasy Mini Kit (Qiagen) and treated with DNaseI (Invitrogen, Carlsbad, CA).

Transcript analyses by RACE and RT-qPCR
The mitochondrial RNA was reverse-transcribed with the SuperScript III First Strand Synthesis System (Invitrogen). The cDNA ends were amplified using the SMART-RACE cDNA Amplification Kit (Clontech, Mountain View, CA) and cloned into the pGEM-T vector (Promega, Madison, WI), and on average 10 clones for each end were sequenced. Primers for qPCR (Additional file 1: Table S4) were designed with PrimerQuest (http://www.idtdna.com/Primerquest/). The cob gene was chosen as an internal control. Readings were normalized to C. p. fossulana cox1 for Campsomeris genes or D. melanogaster cox1 for Drosophila genes. Aside from cob and cox1, only transcripts of similarly oriented genes were converted to cDNAs together using transcript-specific qPCR primers. The qPCR was performed in triplicate using Power SYBR Green PCR Master Mix (Applied Biosystems, Warrington, UK) under the following conditions: incubation at 95°C for 10 min, followed by 40 cycles of 95°C for 15 s and 60°C for 1 min. For relative quantification, the comparative CT method was used.
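As a rough illustration of the comparative CT calculation, the sketch below computes a relative expression value as 2 raised to the negative ΔΔCT from triplicate readings. It assumes roughly 100% amplification efficiency for all primer pairs. The function names, the choice of calibrator, and all CT values are hypothetical and are not data from this study.

```python
from statistics import mean


def delta_ct(target_ct, reference_ct):
    """Mean CT of the target minus mean CT of the reference gene."""
    return mean(target_ct) - mean(reference_ct)


def relative_expression(sample_target, sample_reference,
                        calibrator_target, calibrator_reference):
    """Comparative CT (2^-ddCT) fold change of the sample versus the calibrator.

    Assumes about 100% amplification efficiency for both primer pairs.
    """
    ddct = (delta_ct(sample_target, sample_reference)
            - delta_ct(calibrator_target, calibrator_reference))
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical triplicate CT readings, for illustration only.
    fold = relative_expression(
        sample_target=[22.0, 22.1, 21.9],         # transcript of interest
        sample_reference=[20.0, 20.0, 20.0],      # reference gene, e.g. cox1
        calibrator_target=[24.0, 24.0, 24.0],     # calibrator transcript
        calibrator_reference=[20.0, 20.0, 20.0],  # reference gene in calibrator
    )
    print(f"Relative expression (fold change): {fold:.2f}")
```

Averaging the triplicates before taking the difference keeps the calculation simple. Propagating the standard deviation of the replicates would be a natural extension.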
The recombinant plasmids were introduced into E. coli NiCo21(DE3) (New England Biolabs, Ipswich, MA). Bacteria were grown to the exponential phase, at which point the expression of recombinant proteins was induced with 1 mM IPTG at 30°C for 6 h. Upon harvesting, the cells were disrupted using xTractor Buffer (Clontech). Recombinant proteins were purified using a Capturem His-Tagged Purification Kit (Clontech). For the nuclease activity assay, 20 ng of protein was incubated with 400 ng of pGEM-derived plasmid in a 20 μl reaction mixture containing 50 mM Tris-HCl (pH 8.0), 150 mM NaCl, and 2 mM MgCl2 at 37°C for 2 h. The samples were then electrophoresed in a 1% agarose gel with ethidium bromide and analyzed under UV light.

Three-dimensional polypeptide structure prediction
Polypeptide tertiary structures were predicted using the I-TASSER algorithm [69] included in the NovaFold software (DNAStar, Madison, WI). The I-TASSER procedure involves multiple threading attempts to match the query and template sequences, followed by ab initio folding that uses the physical characteristics of the query sequence and simulations. Visualization of the polypeptide structures was performed using Lasergene Protean 3D (DNAStar).

Additional file
Additional file 1: Figure S1. Schematic representation of a PCR-based screen for segmental inversions in mtDNA. Figure S2. Alignment of COXII sequences in the region corresponding to the C- and N-termini of Campsomeris COXIIA and COXIIB, respectively. Figure S3. C. p. fossulana cox2a gene. Figure S4. C. p. fossulana cox2b gene. Figure S5. Amino acid residue content in COXII of C. p. fossulana, S. bicincta, and A. mellifera. Figure S6. A + T content along the C. p. fossulana mtDNA. Figure S7. Comparison of QNU and WFW orthologous polypeptides from two Campsomeris species. Figure S8. In silico-determined nucleic acid-binding potential of the C. p. fossulana QNU polypeptide. Figure S9. Alignment of the Campsomeris mtDNA sequences around the cox2 split site. Table S1. Cys residue content of the COXII intermembrane space domain in canonical and modified COXII polypeptides. Table S2. Relative synonymous codon usage (RSCU) by mitochondrial genes/ORFs of C. p. fossulana. Table S3. Amino acid sequence similarities between C. p. fossulana polypeptide QNU and nucleic acid-interacting proteins. Table S4. Primers used for RT-qPCR. (PDF 1205 kb)

Abbreviations
aa: amino acid; COX: Cytochrome oxidase; GRAVY: Grand average of hydropathy; mtDNA: mitochondrial genome; NATs: Natural sense antisense transcripts; NUMTS: Nuclear genome-encoded mitochondrial DNA sequences; ORF: Open reading frame; OXPHOS: Oxidative phosphorylation; PDB: Protein data bank; QNU: Gln-Asn repeat-containing nuclease; RACE: Rapid amplification of cDNA ends; WFW: Trp-Phe-Trp repeat-containing putative polypeptide