New view on the organization and evolution of Palaeognathae mitogenomes poses the question on the ancestral gene rearrangement in Aves

Background Bird mitogenomes differ from other vertebrates in gene rearrangement. The most common avian gene order, identified first in Gallus gallus, is considered ancestral for all Aves. However, other rearrangements including a duplicated control region and neighboring genes have been reported in many representatives of avian orders. The repeated regions can be easily overlooked due to inappropriate DNA amplification or genome sequencing. This raises a question about the actual prevalence of mitogenomic duplications and the validity of the current view on the avian mitogenome evolution. In this context, Palaeognathae is especially interesting because is sister to all other living birds, i.e. Neognathae. So far, a unique duplicated region has been found in one palaeognath mitogenome, that of Eudromia elegans. Results Therefore, we applied an appropriate PCR strategy to look for omitted duplications in other palaeognaths. The analyses revealed the duplicated control regions with adjacent genes in Crypturellus, Rhea and Struthio as well as ND6 pseudogene in three moas. The copies are very similar and were subjected to concerted evolution. Mapping the presence and absence of duplication onto the Palaeognathae phylogeny indicates that the duplication was an ancestral state for this avian group. This feature was inherited by early diverged lineages and lost two times in others. Comparison of incongruent phylogenetic trees based on mitochondrial and nuclear sequences showed that two variants of mitogenomes could exist in the evolution of palaeognaths. Data collected for other avian mitogenomes revealed that the last common ancestor of all birds and early diverging lineages of Neoaves could also possess the mitogenomic duplication. Conclusions The duplicated control regions with adjacent genes are more common in avian mitochondrial genomes than it was previously thought. These two regions could increase effectiveness of replication and transcription as well as the number of replicating mitogenomes per organelle. In consequence, energy production by mitochondria may be also more efficient. However, further physiological and molecular analyses are necessary to assess the potential selective advantages of the mitogenome duplications. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-020-07284-5.


Background
Animal mitochondrial genomes are characterized by compact organization and almost invariable gene content, so any changes in them are especially interesting because they can be associated with major transitions in animal evolution [1,2]. The first fully sequenced avian mitogenome from chicken Gallus gallus [3] turned out to contain single versions of 37 genes and one control region (CR) as in most other vertebrates, but organized in a different order (Fig. 1). This rearrangement is believed to have derived from the typical vertebrate gene order by a single tandem duplication of the fragment located between ND5 and tRNA-Phe genes followed by random losses of one copy of duplicated items. Due to the prevalence of the Gallus gallus gene order in other birds, this rearrangement is generally believed to be an ancestral state for all Aves. In consequence, it is called common, standard or typical.
It has been commonly assumed that the mitogenomic duplications are derived states and occurred independently in many Psittaciformes and Passeriformes lineages [13,22,[29][30][31]. However, an independent origin of identical gene orders in different avian lineages seems unlikely because of the great number of possible arrangements [32][33][34][35]. More probable seems that the last common ancestor of many avian groups had a duplicated region. This feature was shown for Psittaciformes [23] and could be true for Accipitriformes [4,5,[36][37][38], Falconiformes [4,39], Gruidae [12] and Pelecaniformes [4,15,16], because all or almost all members of these groups contain mitogenomes with the duplicated regions. What is more, Mackiewicz, et al. [14] showed that even the last common ancestor of a larger monophyletic group of Aves including Psittaciformes, Passeriformes and Falconiformes could have had a duplication of the control region with adjacent genes in the mitochondrial genome.
The lack of duplication in some fully sequenced mitogenomes may be false and result from omission of identical repeats due to an inappropriate PCR strategy, insufficient sequencing methods or incorrect genome assembly. This problem was already addressed by Gibb, et al. [4], who found the fully duplicated gene order in Thalassarche melanophris mitogenome, which had been previously annotated without the duplication [40]. Similarly, two other mitogenomes of Notiomystis cincta and Turdus philomelos showed a novel duplicated gene order after a re-analysis [41], although previously the single version had been reported [42]. All re-amplified and resequenced crane mitogenomes also revealed the existence of duplication [12], which had not been found earlier [43]. Omitted duplications were also found within the mitochondrial genomes of Strigopoidea and Cacatuoidea, demonstrating that the ancestral parrot contained duplication in its mitogenome [23].
The growing number of formerly unidentified duplications implies that many avian mitogenomes published so far without duplication may, in fact, have it. Therefore, a diligent search for potential duplications is crucial in understanding the evolution of the avian mitogenome. Palaeognathae are particularly important to this subject because all comprehensive avian phylogenies have placed them as the sister group to the rest of birds, called Neognathae [44][45][46][47][48]. Palaeognaths comprise 25 genera and 82 species [49,50], which are currently grouped into three extinct and five extant orders: the flighted Lithornithiformes known from Paleocene and Eocene of North America and Europe, and possibly from the Late Cretaceous; the flighted tinamous (Tinamiformes) from South and Central America; the flightless ratites containing the recently extinct New Zealand moas (Dinornithiformes) and Madagascan elephant birds (Aepyornithiformes) as well as the extant African ostrich (Struthioniformes), South American rheas (Rheiformes), Australian emu and Australasian cassowaries (Casuariiformes), and New Zealand kiwi (Apterygiformes). Phylogenetic relationships between these groups have been controversial. Molecular analyses have revealed that the ratites are paraphyletic and suggested that flightlessness evolved several times among ratites independently [51][52][53][54][55][56][57][58][59][60].
So far, a duplicated region (cytb/tRNA-Thr/tRNA-Pro/ CR1/ND6/tRNA-Glu/CR2) has been found only in one representative of palaeognaths, namely Eudromia elegans [26]. This rearrangement has not been identified in any other avian species. Other Palaeognathae mitogenomes have a typical single avian gene order or were published as incomplete, especially in the part adjacent to the control region [26]. However, it cannot be ruled out that an inadequate PCR strategy was unable to amplify identical repeats or even prevented the completion of the mitogenome sequencing and assembly due to the presence of repeats [61]. Therefore, we applied another PCR strategy that allows the amplification of the fragment between two control regions including a potentially omitted duplication in representatives of Struthio, Rhea, Casuarius, Dromaius and Crypturellus. The new data help to elucidate the evolution of the Palaeognathae mitogenome in terms of duplication events, and also have implications for mitogenomic evolution in Aves as a whole.

Results and discussion
Duplicated gene order identified in mitogenomes of analyzed Palaeognathae taxa Using an appropriate PCR strategy (Fig. 2), the diagnostic fragments ranges from the first (CR1) and the second control regions (CR2) were obtained for Struthio camelus ( Fig. S1a in Additional file 1), Rhea pennata (Fig. S1b in Additional file 1), Rhea americana ( Fig. S1c in Additional file 1) and Crypturellus tataupa (Fig. S1d in Additional file 1). Only two out of 16 or 48 reactions failed in the taxa for which species-specific primers were designed based on the previously published sequences of complete mitogenomes (Struthio camelus and Rhea species) ( Table S1 in Additional file 2). In the case of Crypturellus tataupa, amplicons were obtained only for six out of 12 tested reactions. This was caused by the fact that primers dedicated for this species were designed on the sequence of more distant mitogenome from Eudromia elegans [26]. Similar to the published Crypturellus tataupa genomic sequence [62], the control region and adjacent genes were missing. Sequencing and annotation of the produced amplicons revealed the presence of tRNA-Pro/ND6/tRNA-Glu fragments between two control regions for Struthio camelus, Rhea pennata, Rhea americana and Crypturellus tataupa (Fig. 1). The duplicated fragment obtained for Struthio camelus differed only in one nucleotide from the homologous region in the previously published mitogenome (Fig. S2a in Additional file 1). These fragments in rheas showed 100% identity with corresponding homologous regions ( Fig. S2b and Fig. S2c in Additional file 1).
Although the high identity strongly indicates a mitochondrial origin of the amplified CR1/CR2 fragments, additional diagnostic reactions were designed to exclude a possibility of nuclear mitochondrial DNA inserts (NUMTs) amplification. Based on the obtained sequences of ND6 genes, appropriate primers were created to amplify ND6-1/ND6-2 regions. Sequencing of the amplified PCR products revealed the ND6/tRNA-Glu/ CR/tRNA-Pro/ND6 gene order for all analyzed species. The corresponding CR/tRNA-Pro/ND6 regions overlapped the appropriate CR1/CR2 diagnostic fragments and showed 100% identity. Additional PCR reactions (see Methods and Fig. 2) were run to complete the missing parts of CRs and to reveal the order of genes preceding the first control region. Finally, the complete mitogenomic fragments containing the duplicated regions were obtained by assembling four overlapping fragments (Fig. 2 Fig. S3 in Additional file 1), which was previously annotated only for two Passeriformes species, Notiomystis cincta and Turdus philomelos [41]. This gene rearrangement differs from the most complete known avian duplication (GO-FD; Fig. 1d) in the lack of the second copies of cytb and tRNA-Thr genes, expected between CR1 and tRNA-Pro2 gene. The presence of identical copies of tRNA-Glu gene ( Fig. S2a-d in Additional file 1) enabled us to position precisely the 5′ ends of both control regions. The 3′ ends of CR2s precede tRNA-Phe genes as in all other gene orders including two potentially functional control regions. The number of nucleotides between the tRNA-Glu copies and appropriate poly-C sequences located at the 5′ ends of CRs vary from 2 bp (Rhea americana, Rhea pennata and Crypturellus tataupa) to 26 bp for Struthio camelus (Table S2 in Additional file 2). The CR2 in Rhea pennata and Crypturellus tataupa is longer than CR1, which obey the rule observed in 13 crane species [12]. The tandem duplications found in the Crypturellus tataupa (c) mitogenomes. Lgene for tRNA-Leu, ND5gene for NADH dehydrogenase subunit 5, cytbgene for cytochrome b, Tgene for tRNA-Thr, Pgene for tRNA-Pro, ND6gene for NADH dehydrogenase subunit 6, Egene for tRNA-Glu, CRcontrol region, Fgene for tRNA-Phe, 12Sgene for 12S rRNA, Vgene for tRNA-Val, 16Sgene for 16S rRNA. L-F, ND5-F, CR-R, ND6-F, ND6-R, D-F, D-R, CR-F, 12S-R, 16S-R: primers that were used for amplification of four overlapping mitogenomic fragments mitogenomes of Struthio camelus, Rhea americana, Rhea pennata and Crypturellus tataupa make them longer compared with their previous genomic versions assuming the typical avian gene order.
Probable presence of mitochondrial CR1/CR2 fragments in Casuarius casuarius and Dromaius novaehollandiae nuclear genomes In the case of two other Palaeognathae species, Casuarius casuarius and Dromaius novaehollandiae, an attempt to amplify the CR1/CR2 fragment was also made. Similar to other taxa, species-specific D-F and D-R primers ( Fig.  2; Table S1 in Additional file 2) were designed using the sequences of previously published complete mitogenomes (AF338713.2 and AF338711.1). In contrast to the results obtained for the other Palaeognathae species, most PCR reactions failed to amplify the expected fragments. In Dromaius novaehollandiae, amplicons were obtained only for 3 out of 25 tested reactions (Fig. S4a in Additional file 1, Table S1 in Additional file 2). Analogously, PCR products were obtained only for 4 out of 56 reactions for Casuarius casuarius (Fig. S4b in Additional Table 1 Avian species analyzed in this study in terms of duplicated regions as well as gene orders found within their mitogenomic fragments, which were amplified and sequenced. The sequences are presented in Fig. S3 and S10 L gene for tRNA-Leu, ND5 Gene for NADH dehydrogenase subunit 5, cytb Gene for cytochrome b, T Gene for tRNA-Thr, P Gene for tRNA-Pro, ND6 Gene for NADH dehydrogenase subunit 6, E Gene for tRNA-Glu, CR Control region, rCR Remnant control region, F Gene for tRNA-Phe, 12S Gene for 12S rRNA, V Gene for tRNA-Val, 16S Gene for 16S rRNA file 1, Table S1 in Additional file 2). Moreover, single DNA fragments were not produced for any of these seven reactions, although different annealing temperatures were applied ( Fig. S4 in Additional file 1). Taking into account the heterogeneity of the obtained DNA fragments as well as the fact that most of the tested reactions failed, we can conclude that the PCR products presented in Fig. S4 in Additional file 1 were not amplified on the mitochondrial genome template. The D-F and D-R primers as well as the applied PCRs are highly specific and diagnostic for the presence of CR duplication in parrots [23], cranes [12] as well as black-browed albatross, ivory-billed aracari and osprey [4]. Therefore, the seven positive amplicons most likely represent mitochondrial DNA fragments located in the nuclear genomes, i.e. NUMTs. It means that Casuarius casuarius and Dromaius novaehollandiae or their ancestors had mitogenomes comprising two control regions, which were transferred into the nucleus during evolution.

Reannotation of Eudromia elegans mitochondrial gene order
The GO-I gene order ( Fig. 1) found in this study for four Palaeognathae taxa differs from that in the published mitogenomic sequence of Eudromia elegans [26]. This rearrangement appears to be a degenerated form of GO-I as it lacks the first copy of ND6 and tRNA-Glu genes as well as the second copy of tRNA-Pro gene. This fact prompted us to search for a potential tRNA-Pro pseudogene hidden within the last 122 nucleotides of the first control region of Eudromia elegans mitogenome. In fact, the comparison of CR1 sequence with the potentially functional tRNA-Pro sequence of this species revealed a significant similarity (E-value = 1.2·10 − 6 ; 81% identity without gaps and 64% including gaps) between these sequences along the 84-bp alignment ( Fig. S5a in Additional file 1), which suggests the presence of the tRNA-Pro pseudogene in the Eudromia mitogenome in the position between 16,272 bp and 16,349 bp. After reannotation of this pseudogene, the length of CR1 reduced to 1352 bp. The newly annotated Eudromia gene order was defined as GO-P1 in Fig. 1e.

Reannotation of mitochondrial gene order in the mitogenomes of Anomalopteryx didiformis, Emeus crassus and Dinornis giganteus
Our analysis of 5′ spacers, i.e. fragments of control regions located between the tRNA-Glu gene and poly-C motif, revealed that they are much longer in annotated Anomalopteryx didiformis, Emeus crassus and Dinornis giganteus mitogenomes than in other Palaeognathae species. These spacers of the most Palaeognathae taxa are from 2 bp to 33 bp in length ( Assuming the presence of ND6 pseudogenes in Anomalopteryx didiformis, Dinornis giganteus and Emeus crassus mitogenomes, the length of their CR is reduced to 1347 bp, 1360 bp and 1346 bp, respectively. The CR sequences show 71-81% identity at 5′ spacers on the length 165 bp (Fig. S5b in Additional file 1). The new avian gene order present in these reannotated mitogenomes is indicated as GO-P2 in Fig. 1e.

Comparison of the duplicated regions of Palaeognathae mitogenomes
The GO-I gene order found in four Palaeognathae species ( Fig. 1, Table 2) is characterized generally by a high similarity between paralogous sequences, i.e. copies found within the same mitogenome. The second copies of tRNA-Pro, ND6 and tRNA-Glu are identical with the first ones in the case of Struthio camelus, Rhea americana and Rhea pennata ( Table 3). The second copy of tRNA-Glu is also identical with the first one in Crypturellus tataupa mitogenome. However, the first copies of tRNA-Pro and ND6 genes of this species differ from their paralogous sequences in three nucleotides (Table  3). Two control regions of analyzed species show a slightly greater variation in identity, from 94.4% (Rhea pennata) to 97.8% (Crypturellus tataupa). The difference is mainly located at their 3′ ends, except for Rhea taxa, whose control regions differ also at their 5′ ends (Fig. S2 in Additional file 1). The high similarity of duplicated regions indicates that they evolved in concert, which homogenized their sequences as found in many other avian groups [4,6,14,23,25,28,30,[63][64][65][66][67][68][69][70].
In contrast to GO-I gene order, the newly defined rearrangement GO-P1 in Eudromia elegans is characterized by single versions of ND6 and tRNA-Glu gene ( Fig.   1). Moreover, the second copy of tRNA-Pro is a pseudogene, which has substantially diverged from its full version ( Fig. S5a in Additional file 1). Therefore, it seems that the GO-P1 rearrangement is a degenerated form of GO-I, in which two genes were removed and one gene was pseudogenized. Surprisingly, despite the high degree of degeneration in comparison with other analyzed  Palaeognathae species, two control regions of Eudromia elegans maintain the highest sequence identity (Table 3), although the alignment of these regions clearly shows the presence of several deletions/insertions ( Fig. S6 in Additional file 1). The comparison of paralogous control regions in Palaeognathae revealed that CR2s are much longer only in two species, i.e. Rhea pennata and Crypturellus tataupa (Table 3). Such a difference in the length of CRs seems to be a rule in most avian mitogenomes with a duplicated region [23]. Interestingly, CRs in Rhea americana are identical in length, while those in Struthio camelus and Eudromia elegans differ only in one and two nucleotides, respectively (Table 3).

Phylogenetic relationships within Palaeognathae based on mitogenomes
Three phylogenetic methods applied for the mitogenomic sequences resulted in a consistent topology (Fig. 3 In order to eliminate a potential artefact related with the compositional heterogeneity in the third codon positions of protein-coding genes, we created phylogenetic trees based on the RY-coding alignment (Fig. 4). The tree topology produced by the three methods was the same as that for the uncoded alignment. The posterior Moreover, we performed phylogenetic analyses based on ten alignments, from which we sequentially excluded partitions characterized by the highest substitution rate (Table S4 in Additional file 2). The calculations produced in total 16 topologies, out of which five are worthy of mention because they were obtained by many independent approaches (Fig. 5). The topology t1 was identical with that based on the alignments including all sites and demonstrated rheas as sister to all other nonostrich palaeognaths. Such a tree was produced by MrBayes, PhyloBayes and IQ-TREE using the alignment without sites characterized by the highest substitution rate, as well as by MrBayes and IQ-TREE using the alignment after removing sites with two highest rate categories. The posterior probabilities for the clade including palaeognaths other than ostrich and rheas were very high in MrBayes, i.e. 1 and 0.98, respectively, or moderate, i.e. 0.87 in PhyloBayes. In the topology t2, the Rhea clade was grouped with Casuariiformes + Apterygiformes. However, the support of this grouping was very weak and occurred only in MrBayes tree and IQ-TREE consensus bootstrap tree based on the alignments without seven and eight highest rate categories, respectively. A greater Bayesian support (0.95-0.97) was obtained by the node encompassing rheas with Casuariiformes in the topology t3 based on the alignments after removing three, four and five highest rate categories. This topology was also produced in MrBayes using the alignment without eight highest rate categories and in IQ-TREE for the alignments without four, five and six highest rate categories. However, the node support was generally weak. The topology t4 was produced only by PhyloBayes for the alignments without two, three, four, five, seven and eight highest rate categories. As in the topology t1, the Rhea clade was also sister to all other palaeognaths excluding Struthio, but Casuariiformes were clustered with the rest non-ostrich palaeognaths, not directly with Aepyornithiformes and Apterygiformes. The posterior probability values of the clade including palaeognaths sister to rheas did not exceed 0.8. The topology t5 differed from the others because Struthio camelus was placed within other Palaeognathae and the external position was occupied by Dinornithiformes + Tinamiformes, whereas Rhea was grouped with Casuariiformes. This topology was obtained for the alignments without three (in IQ-TREE) and six highest rate categories (in MrBayes and IQ-TREE). Nevertheless, the controversial nodes were poorly supported.
Removing the sites with the highest substitution rate eliminated the alignment positions that were saturated with substitutions, but the number of parsimony informative sites decreased, too (Fig. S7a). Therefore, the stochastic error could increase for the short alignments and the inferred phylogenetic relationships could be unreliable. After elimination of sites with two highest rate categories, the mean phylogenetic distance in the MrBayes tree decreased abruptly from 0.94 to 0.33 substitutions per site and the maximum distance in the tree dropped from 1.99 to 0.69 substitutions per site (Fig. S7a). The sharp decrease was also visible in the number of informative sites, which constituted 56% of those in the original alignment. However, the sisterhood of rheas to other non-ostrich palaeognaths was still present in the trees based on the purged alignments and the latter group was relatively highly supported (Fig. S7b). After removing sites with at least three highest rate categories, the alignment was deprived of more than half of informative sites and alternative topologies were favored, though with smaller support values (Fig. S7b).
Among the applied topology tests, the BIC approximation produced all Bayesian posterior probabilities for the alternative topologies much smaller than 0.05 indicating a strong rejection of the tested alternatives in favor of topology t1 (Table S5 in Additional file 2). Moreover, the topology t4 performed significantly worse than t1 in two bootstrap tests, whereas the bootstrap probabilities for the topology t2 were 0.063, i.e. very close to the 0.05 threshold. Other tests did not reject the alternative topologies. However, Bayes factor was greater than 9 indicating an overwhelming support for the topology t1 because the commonly assumed threshold for such interpretation is 5 [71].

Comparison of Palaeognathae tree topologies
All the phylogenetic analyses imply that the relationships presented in the topology t1 describe the most probable evolutionary history between the mitochondrial genomes of palaeognaths. Such relationships, but not always on the full taxa set, were also obtained in other studies based on mitochondrial genes [55,56], selected nuclear genes [48,54,57], the joined set of nuclear and mitochondrial genes [46,52,58] as well as the concatenated alignments of many nuclear markers [45,59,60]. However, the application of a coalescent species tree approach on these markers and the analysis of retroelement distribution indicated a closer relationship between rheas and the clade of Casuariiformes + Apterygiformes [45,59,60]. This phylogeny was also generated for selected nuclear genes [53] and in a supertree approach [47]. These relationships are presented in the topology t2 but are, however, insignificant for the The most frequent tree topologies obtained in the phylogenetic analyses of mitochondrial gene alignments. Partitions characterized by the highest substitution rate were sequentially excluded from the alignment. The values at nodes indicate support values received for various partitions in different approaches. The approaches' names were marked with the letter: MrBayes with M, PhyloBayes with P, SH-aLRT in IQ-TREE with T and non-parametric bootstrap in IQ-TREE with B. The digits after these letters indicate the number of the highest rate partitions removed from the analysis mitochondrial gene set. An alternative, poorly supported topology, in which Rhea clustered with Tinamiformes or Dinornithiformes + Tinamiformes, was also received for some nuclear genes [54,57].
Thus, two topologies, t1 and t2, aspire to be the real species tree but it is not easy to evaluate which one is true. Although t1 was found in many studies based on concatenated alignments of many markers, it has been criticized as a true species tree in favor of t2, which was obtained in coalescent-based approaches also using huge data sets [59,60]. The selection of one out these two topologies is complicated by the fact that the topology t2 is supported by 229 loci and is only the 7th in the ranking of the most common gene tree topologies, whereas the topology t1 is the 2nd, supported by 280 loci [59]. The largest number of markers, i.e. 357, indicated another topology, in which Rheiformes was grouped with the clade of Dinornithiformes + Tinamiformes. This discrepancy was explained by the existence of an empirical anomaly zone resulting from incomplete lineage sorting (ILS) across short internal branches leading to the last common ancestor of Rheiformes, Apterygiformes and Casuariiformes [59,60].
Although coalescent species trees account for ILS, simulations showed that species tree methods based on the gene tree summation may not provide significantly better resilts than concatenation alignment methods, which can perform even better [72]. Prum, et al. [45], who applied extensive avian taxon sampling and loci with slow substitution rates, found no single locus that would be able to fully resolve the tree topology. They concluded that this lack of phylogenetic information can challenge the accuracy of a coalescent-based summary approach relative to concatenation. The multispecies coalescent models work under some conditions [73]. For example, they assume that incongruent gene trees are independently generated from a coalescence process occurring along the species tree and there is no selection on the studied genetic markers. Because Cloutier, et al. [60] and Sackton, et al. [59] analyzed conserved and ultraconserved noncoding nuclear elements, we cannot exclude that they are involved in essential regulatory functions and then are subjected to selection [74][75][76], which could influence the model assumptions.
Generally, nuclear markers are more prone to ILS and hidden gene paralogy [77][78][79][80][81][82][83][84] than mitochondrial genes, which are present in a haploid genome and maternally inherited [85]. Thus, the time needed to completely sort the ancestral polymorphism of mitochondrial DNA is on average four times smaller than for nuclear genes [86]. Introgression of mtDNA is another reason for the discrepancy between the gene and species trees. However, this process concerning maternally inherited mtDNA is restricted between heterogametic avian species because female hybrids are characterized by reduced viability [87][88][89][90][91][92][93]. In agreement with that, a survey of causes of mtDNA gene tree paraphyly in birds found that 8% of studied species had paraphyly attributable to incorrect taxonomy, and only 3% because of ILS or hybridization, 2% on account of ILS and 1% due to introgressive hybridization [94]. Moreover, the mitochondrial genes are located on one molecule and are inherited together [82,95], so they should bear a consistent phylogenetic signal. Accordingly, analyses based on complete mitogenomes have provided well-resolved phylogenies of various avian groups [9,14,[40][41][42][96][97][98][99]. Nevertheless, it is not inconceivable that the ILS effect can influence mtDNA in rapidly radiating taxa, in which on-going speciation occurs before genetic sorting [100].

Distribution of mitogenomic rearrangements in phylogenetic trees of Palaeognathae
We considered both t1 and t2 topologies to analyze the presence and absence of the mitogenomic duplication in the phylogenetic context. Using these relationships, we mapped the mitogenomic features onto these topologies and inferred ancestral states for the individual lineages using maximum parsimony (MP) and maximum likelihood (ML) methods (Fig. 6).
Two methods applied to the topology t1 clearly indicated that the last common ancestor of palaeognaths contained a duplicated region in its mitogenome (Fig. 6  a and b). This state was inherited by the ostrich and rhea lineages. The ML method provided the probability P > 0.982 of this state for the last common ancestors of all palaeognaths and non-ostrich palaeognaths. The last common ancestor of the remaining groups, i.e. Dinornithiformes, Tinamiformes, Casuariiformes, Aepyornithiformes and Apterygiformes, could also contain a duplication with P = 0.899. However, the last common ancestor of Casuariiformes, Aepyornithiformes and Apterygiformes lost this duplication (P = 0.914). In turn, the last common ancestor of Dinornithiformes and Tinamiformes still had the duplicated region (P = 0.983), which was probably lost in Tinamus, whereas Anomalopteryx, Dinornis and Emeus maintained only a pseudogenized ND6.
According to the topology t2, the last common ancestors of all palaeognaths and non-ostrich palaeognaths also had the duplication in their mitogenomes with the probability of at least 0.982 (Fig. 6 c and d). The duplication was preserved in the last common ancestor of Dinornithiformes and Tinamiformes (P = 0.996) as well as the last common ancestor of Rheiformes, Casuariiformes, Aepyornithiformes and Apterygiformes (P = 0.906). Among four latter orders, only Rheiformes maintained the duplication, whereas the last common Fig. 6 Reconstruction of ancestral states and mapping of mitogenomic duplications onto the Palaeognathae trees. Two methods, maximum parsimony (a, c) and maximum likelihood (b, d) were applied for two tree topologies: t1 based on sequences of mitochondrial genes (a, b) and t2 supported by nuclear markers in coalescent-based approaches (c, d). The area of colors at nodes in b and d corresponds to the probability of the given state, single or duplicated region. The probability value for a more likely state was also given at these nodes. Mk1 model was applied for ML approach ancestor of the other orders had a mitogenome without the duplicated region (P = 0.914).
One could argue that the traces of ND6 pseudogene in Anomalopteryx, Dinornis and Emeus are equivocal. Therefore, we conducted analyses in which we assumed that mitogenomes of these genera had already lost the duplication (Fig. S8 in Additional file 1). However, the general conclusion that the last common ancestor of all Palaeognathae had a mitogenome with a duplicated region did not change. The probability of this state was 0.916 and 0.829 for topology t1 and t2, respectively. The last common ancestor of non-ostrich palaeognaths also had this feature with P = 0.889 and 0.697 for these topologies, respectively.
The above-mentioned scenarios of palaeognath mitogenome evolution assume that the mitochondrial tree represents the true species tree. However, as discussed in the previous section, the topology t1 is highly supported by mitochondrial data, whereas t2 is backed up by nuclear markers in coalescent-based approaches and is regarded as the true species tree by some authors [59,60]. Assuming that the t1 presents the real mitogenomic history, we superimposed the mitogenome phylogeny onto the potential species tree (Fig. 7). In order to reconcile the alternative positions of rheas in these topologies, we should assume that there existed heteroplasmy, i.e. at least two types of mitochondrial genomes, in the last common ancestor of non-ostrich palaeognaths. Both mitogenomes probably initially contained a duplication, as indicated by the inferred ancestral states. The lineages of these mitogenomes are marked in Fig. 7 as 1 and 2. Dinornithiformes and Tinamiformes inherited only mitogenome 2, whose duplicated regions had begun to fade and likely disappeared in Tinamus. However, the common ancestor of Rheiformes, Casuariiformes, Aepyornithiformes and Apterygiformes preserved two mitogenomes, but genome 2 lost the duplicated region during the course of evolution. Then, the mitochondrial lineages were segregated: mitogenome 1 with the duplication was left in rheas, whereas mitogenome 2 without Fig. 7 Superposition of the mitogenome phylogeny (thin colored lines) onto the potential species tree of Palaeognathae (grey thick branches). Two mitochondrial lineages were labelled as 1 and 2. Lineages with and without duplication were indicated in different colors. The tonal transition from orange to green indicates gradual disappearing of the duplication the duplication was passed to Casuariiformes, Aepyornithiformes and Apterygiformes.
The assumption of heteroplasmy is one of possible explanations. Alternatively, we can assume an introgression of mitochondrial DNA with a duplication to rheas from an extinct lineage that diverged after Struthioniformes lineage and before the clade of Dinornithiformes + Tinamiformes. However, this seems unlikely because the birds involved in this introgression must have had a long evolutionary history after the separation of their lineages, at least 9 up to 3.6 million years according to Kimball, et al. [47]. Thus, they would probably represent a separate species, which would protect them from hybridization. We would also have to assume that both the donor and the acceptor of mtDNA, i.e. rheas were still evolving in the same region, e.g. South America.

Role of duplication in the evolution of palaeognaths
It was reported that two duplicated control regions can lead to a more effective initiation of replication or transcription and a greater number of replicating mitogenomes per organelle, which may increase energy production by mitochondria [65,66,[110][111][112]. In support of this, it was found that parrots keeping two copies of control region in their mitogenomes show morphological features related to more energy-consuming active flight [23]. Following this finding, we can infer that the last common ancestor of palaeognaths was volant because mapping of mitogenomic features onto the phylogenetic trees showed that the ancestor most likely contained the duplicated regions (Fig. 6). This finding corresponds well to recently proposed scenarios for the evolutionary history of Palaeognathae, which may have originated in the Late Cretaceous in the Northern Hemisphere [58]. Around the Cretaceous-Paleogene boundary, they may have migrated to the Southern Hemisphere, where they dispersed widely and diversified. They extended their distribution by long-distance overseas dispersal to the Gondwana-derived landmasses, such as New Zealand and Madagascar [58]. These longdistance dispersals must have been accompanied by very good adaptations to active flight. However, modern palaeognath species are flightless or at most poor flyers, i.e. tinamous. Therefore, the maintenance of two control regions in some palaeognaths may be the legacy of their ancestors. Alternatively, these regions can still provide benefits even for flightless species during long-distance walking and running, which require a lot of energy.

Implications for mitogenome evolution in all Aves
The finding that a palaeognath ancestor contained a mitogenomic duplication challenges the common assumption that this feature evolved independently in individual lineages of Neognathae, i.e. the sister group of Palaeognathae [13,22,[29][30][31]. Increasing amount of data indicates that mitogenomes with a duplicated region are present in all or a vast majority of representatives of diverse Neognathae lineages, i.e. Accipitriformes [4,5,[36][37][38], Falconiformes [4,39,68], Gruiformes [12], Pelecaniformes [4,15,16], Psittaciformes [23] and Suliformes [25]. It suggests that ancestors of these groups could have also possessed a mitogenomic duplication. The presence of this state was recently inferred for the last common ancestor of three closely related orders, Falconiformes, Passeriformes and Psittaciformes [14]. Therefore, it is interesting to consider if this feature was present much earlier in the evolution of birds or even in the last common ancestor of all known Aves.
In order to obtain data for this analysis, we surveyed as well as annotated and reannotated all avian mitogenomic sequences available in GenBank to identify cases of duplication (Table S6 in Additional file 2; Fig. S9 in Additional file 1). Moreover, we conducted appropriate PCR reactions checking the presence of duplicated fragments in representatives of selected bird orders, which were poorly represented in the database ( Table 1). Most of PCR products were sequenced to determine the arrangement of genes located between two control regions ( Table S9 in Additional file 2). The thorough analysis revealed many duplicated regions that were previously omitted.

Searching for duplicated regions in avian mitogenomes
So far, only two representatives of the order Chardriiformes have been shown to have duplication in mitochondrial genomes: Calidris pugnax (GQ255993.1) and Turnix velox (MK453380.1). However, our annotation and reannotation of another mitogenomes from Alca torda (CM018102.1), Sterna hirundo (CM020500.1) and Uria aalge (MN356418.1) revealed the presence of the most fully duplicated avian region (GO-FD) or its variant (GO-FDr), in which the second control region is remnant ( Fig. 8 and Fig. S9 in Additional file 1). Therefore, we analyzed the mitogenomes of two of these species as well as 18 additional Charadriiformes annotated without duplication to identify a potentially unrecognized duplication in their mitogenomes. Using diagnostic PCR reactions amplifying a fragment between two control regions, we found GO-FD gene order in the mitogenomes of Alca torda and Uria aalge (Table 1 and Fig. S10 in Additional file 1) as well as additional 15 other species (paper in preparation). We also received previously omitted sequences of the GO-FD rearrangement in the mitogenomes of five additional avian orders: Cathartiformes (Cathartes aura), Ciconiiformes (Ciconia nigra), Gaviiformes (Gavia arctica, Gavia stellata), Podicipediformes (Podiceps cristatus) and Sphenisciformes (Spheniscus demersus) ( Table 1 and Fig. S10 in Additional file 1). Moreover, our PCR experiments demonstrated the GO-FD gene order in the mitogenomes of Apus apus (Apodiformes), Corythaixoides personatus (Musophagiformes) and Podiceps grisegena (Podicipediformes) ( Table 1 and Fig. S10 in Additional file 1). Furthermore, ab initio annotation of Calypte anna (Apodiformes) and Puffinus lherminieri (Procellariiformes) mitogenomes deposited in GenBank database revealed the presence of GO-FD (Fig. S9 in Additional We also obtained interesting results for sole two species from Eurypygiformes, Rhynochetos jubatus and Eurypyga helias. For the former, two types of amplifying reactions showed no duplication, because the length of obtained amplicons corresponded to those predicted based on the reference sequence (MN356362.1), representing the typical avian gene order ( Fig. S11a and b in Additional file 1). However, three out of nine reactions of the CR1/CR2 fragment gave an amplicon with 4000 bp accompanied by shorter ones (Fig. S11c in Additional file 1). These results suggest that these products represent NUMTs and there is no duplication in the Rhynochetos jubatus mitogenome, but it could be present in the past as indicated by the duplicated fragment transferred to the nuclear genome. For Eurypyga helias, we firstly amplified cytb/12S rRNA region, which produced a fragment 4000-bp longer than expected (Fig. S11d in Additional file 1). Thus, we checked fragments cytb/CR and CR/12S rRNA (Fig. S11e in Additional file 1). The latter occurred ca. 3600 bp longer than that in the reference genome, which strongly suggests the presence of duplication. Moreover, based on the length of amplicons cytb/CR, CR/12S rRNA, cytb/12S rRNA and obtained sequences, we found that the length of the region located between tRNA-Glu and tRNA-Phe genes is about 5100 bp. It includes a full CR terminated with repeated motifs, which are followed by an undetermined part and ended with another microsatellite region (Fig. S10 in Additional file 1). The great length of this fragment indicates that it most likely does not contain only one control region, because this fragment is 1704 bp longer than the longest single CR found in Asio flammeus (KP889214.1), so there is enough space for duplicates. Moreover, the length of this fragment corresponds well to those proved to contain two CRs in 46 representatives of avian orders closely related to Eurypygiformes, i.e. Charadriiformes, Gaviformes, Gaviiformes, Pelecaniformes, Phoenicopteriformes, Procellariiformes, Suliformes and Suliuformes [45,47,113]. The portions with two CRs are in the range of 2563-5575 bp with the average 3749 bp, which is even smaller than the length of the analyzed fragment in Eurypyga helias. This fragment should contain an additional but remnant CR and likely some pseudogenes, because no amplicon was obtained in diagnostic CR1/ CR2 reactions (Table S9 in Additional file 2) and no read was get for tRNA-Pro, ND6 and tRNA-Glu genes in sequencing of CR/12S rRNA amplicon.
A potential duplication was also noticed after the amplification of fragment cytb/12S rRNA for Trogon collaris, a representative of Trogoniformes (Fig. S11f in Additional file 1). The amplified product was ca. 3500 bp longer than expected. According to this, six out nine PCRs designed for CR1/CR2 showed several fragments with the length from ca. 3300 to 4300 bp indicating the presence of two CRs and their heteroplasmy (Fig. S11g in Additional file 1).
The data available in GenBank (in the day of 9.10.2020) and obtained in this study indicate that mitogenomes with sequenced duplications are distributed in 32 out of 43 avian orders (Table S6 in Additional file 2). Among 1261 species with known mitogenomes, 324 have a duplication and 713 do not show this feature, whereas in 46, both versions are reported. Among the species without duplication, 45 have partial mitogenomes too. Similarly, the mitogenomes of 187 species are too incomplete to classify them into one of two categories, with or without duplication. The number of mitogenomes with duplication is likely underestimated because of difficulties with the amplification and sequencing of repeated regions [4]. Accordingly, reanalysis of 13 crane mitogenomes, previously annotated without the duplication, showed that all of them contain the duplicated region [12]. Similarly, 15 mitogenomes of parrots from Cacatuidae and Nestoridae also revealed this character after using appropriate PCR and sequencing methods [23]. The obtained results indicate that underestimation of the mitogenomes with the duplication ranges from 85 to 100% in the case of Charadriiformes, Gruidae and Psittaciformes. It should be emphasized that the omission of GO-FD gene order can be common in the amplification and sequencing of avian mitogenomes using standard procedures due to the presence of two nearly identical copies.

Gene rearrangements in studied avian mitogenomes
The performed analyses revealed new gene orders in avian mitogenomes (Fig. 8). Fully duplicated gene rearrangement (GO-FD) was found in representatives of 12 avian orders Fig. S9 and S10). Other gene rearrangements are characterized by various degeneration levels of duplicated elements and were found in single or at most two avian groups, but some general tendencies can be noticed. The second copy of cytochrome b is the element that disappeared as the first in the evolution and shows pseudogene features already in the fully duplicated versions, GO-FD and GO-FDr. The second gene for tRNA-Thr fades also quite fast and is present intact only in the fully duplicated gene orders, whereas in GO-NB, GO-NC and GO-ND it is a pseudogene. CR2 also tends to degenerate because its remnant version is present in seven gene rearrangements. However, in the case of tRNA-Pro, ND6 and tRNA-Glu, the first copies were subjected to decay, whereas the second ones remained. Both copies of these genes are present only in the fully duplicated versions and GO-I. The same tendencies in the deterioration of duplicated elements were observed in other avian mitogenomes of Accipitriformes [4,114], Charadriiformes [115], Coraciiformes [116], Cuculiformes [10,117], Falconiformes [4,14,29], Passeriformes [13,14,29,41,102,118,119], Piciformes [4,97], Psittaciformes [23] and Strigiformes [24].
An exception is the gene order GO-NA in Caprimulgiformes, in which the first versions of tRNA-Pro, ND6 and tRNA-Glu have been preserved, whereas the second ones were lost (Fig. 8). The same rearrangement was only reported in one representative of Procellariiformes [20], Passeriformes [120] and Psittaciformes [23]. The bias in the loss and preservation of duplicated elements can result from a specific selection for structural organization or a special mechanism of duplication around the control region.
Especially interesting is a gene order found in Chrysolophus pictus from Galliformes because it does not resemble any rearrangement reported in Aves. Between two control regions, there are: the tRNA-Phe gene, the 770-nt beginning of 12S rRNA gene, the 66-nt end of ND6 gene and the tRNA-Glu gene ( Fig. 8 and S10). All these sequences are identical with those annotated in the GenBank record (NC_014576.1). Only the sequenced CR portions show 0.3% and 0.4% difference with the corresponding fragments. This gene order could have arisen by an insertion of duplicated genes tRNA-Phe and 12S rRNA between CR1 and ND6-2.

Reconstruction of ancestral states in terms of mitogenomic duplications in Aves phylogeny
The gathered sequence data (Table S6 in Additional file 2) and results of PCR experiments were used to reconstruct the evolution of mitogenomic duplications in all birds. Using maximum parsimony (MP) and maximum likelihood (ML) methods, we mapped the data onto the phylogenetic trees of Aves obtained by Prum, et al. [45], Kimball, et al. [47] and Kuhl, et al. [113], so six approaches were applied in total. The reconstruction presented in Fig. 9 assumes the duplication state for a given avian order if at least 33% of its species show this feature or this state was already inferred as an ancestral state, i.e. for Tinamiformes (in this study, see Fig. 6), Passeriformes [14] and Psittaciformes [23]. This assumption is reasonable because it corresponds to only 41% underestimation of mitogenomes with the duplication, which is much smaller than that above-mentioned, 85-100%. This assumption implicates that the total number of species with such mitogenomes in the avian orders already containing at least one mitogenome of this type exceeds the number of species without the mitogenomic duplication. It would strongly suggest that the last common ancestor of these orders also contained the mitogenomic duplication in the past. As a result, there are such 27 avian orders out of 41 ( Fig. 9 a and c) or 28 out of 43 (Fig. 9b).
For the tree topology obtained by Kimball, et al. [47], ML method produced 0.67 probability that the last common ancestor of all Aves contained a mitogenome with the duplication (Fig. 9b). The probability of this feature was still above 0.5 for the last common ancestors of Palaeognathae (0.83), Neoaves (0.61) and later diverged clades (≥0.62). In the case of Neognathae ancestor, the situation is ambiguous because the probability of its states is close to 0.5. Nevertheless, this method gave 0.71 probability of the duplication for all nodes in the tree topologies by Prum, et al. [45] and Kuhl, et al. [113] (Fig.  9 a and c). MP methods applied for the trees by Kimball, et al. [47] and Kuhl, et al. [113] also indicated the that the last common ancestor of all Aves had a mitogenome with duplication, which was also inherited by ancestors of deeply diverged lineages, Palaeognathae, Neognathae, Neoaves and its descendants (Fig. 9 b and c). However, for the tree by Prum, et al. [45], this method unresolved the states for the ancestors of these lineages and all Aves (Fig. 9a). This difference in the reconstruction is mainly caused by the placement of Caprimulgiformes, for which only five species with duplication and 20 without it were reported. Then, we assumed that the ancestor of this order did not contain a duplicated region in its mitogenome. This lineage diverged as the first among Neoaves in Prum, et al. [45] but later in the other trees [47,113].
Concluding, more approaches indicate that the last common ancestor of all Aves and main bird groups contained a mitogenome with a duplication and no method favored the state without the duplication (Fig. 9). This mitogenomic feature was passed to the ancestors of many modern orders. Depending on the tree topology, the duplication was lost at least 8 or 9 times independently. Among the groups with the lost duplication is Galloanserae (including Galliformes and Anseriformes), for which the duplication was reported for one species in this study. If the duplicated regions are found in more representatives of avian orders, the presented conclusions would be only reinforced by the reconstruction of the ancestral states. If we assume the duplication state for all orders for which a duplication was reported in at least one species, the results of the six approaches will consistently show the ancestry of mitogenomic duplication (Fig. S12).
This view about the ancestry of the mitogenomic duplication can be further supported by the distribution of the most fully duplicated avian region (GO-FD). It includes the repetition of cytb/tRNA-Thr/tRNA-Pro/ND6/ tRNA-Glu/CR in which only the cytochrome b gene is pseudogenized (Fig. 1). This rearrangement type occurs in 68 mitogenomes distributed among 17 bird orders (Fig. 9, Table S7 in Additional file 2). The length and complexity of this duplication suggests that it is unlikely that it occurred independently many times (see Fig. 9) because it would require the same recombination pattern and replication errors [12,16,25,67]. Thus, it seems more probable that this state was inherited from shared ancestors by the avian lineages that contain this rearrangement type. In other bird groups, the duplicated regions were subjected to degenerations and loss of selected elements.
Whether the duplicated region is only a neutral trait or can provide a real selective advantage is an interesting question. Previous studies have suggested that bird species having mitogenomes with the duplicated control region can be characterized by longer life-span [23,121] as well as a greater metabolic rate and energy production [14,23]. However, further understanding the potential selective advantages of mitogenome duplications will require direct study at physiological and molecular levels.

Conclusions
The obtained results indicate that duplicated control regions with adjacent genes are more common in Palaeognathae mitochondrial genomes than it was previously thought. What is more, this feature was most likely present in the last common ancestor of this avian group. Once the duplication occurred, it was preserved during the evolution of Struthioniformes, Rheiformes and some Tinamiformes. The duplicated regions were subjected to concerted evolution, which resulted in homogenization of some parts of duplicated regions and degeneration of others. Reconciliation of the mitogenome-based phylogenetic tree with a probable species tree based on nuclear markers suggests that the evolution of mitogenomes in Palaeognaths could have involved different mitogenomic variants in one cell, i.e. heteroplasmy, in a short period of time.
The distribution of mitogenomes with duplications across the avian phylogenetic tree implies that the last common ancestors of not only Palaeognathae but also Fig. 9 Reconstruction of ancestral states and mapping of mitogenomic duplications in Aves onto the three tree topologies. The colored thick lines of the trees correspond to maximum parsimony reconstruction, whereas values at nodes mean the probability (proportional likelihood) of a more likely ancestral state provided by maximum likelihood reconstruction. The tree obtained by Prum, et al. [45] is in a, by Kimball, et al. [47] in b, whereas by Kuhl, et al. [113] in c. The duplication state was assumed for a given avian order if duplication was found in at least 33% of its species or was inferred as an ancestral state in this study for Tinamiformes as well as in other studies for Passeriformes and Psittaciformes [14,23]. FD indicates the presence of the most fully duplicated avian region. AsymmMk model was applied for ML approach major Neognathae groups and even all known Aves could have had a mitogenomic duplication, which was then inherited by many modern lineages. The presence of duplicated regions in mitogenomes can be a neutral feature associated with the mechanism of replication and recombination or can give a selective profits, e.g. more effective energy production by mitochondria. However, it needs further studies in many avian representatives.

Samples and DNA extraction
Blood, muscle and feather samples from avian species were obtained thanks to courtesy of European zoological gardens, universities, foundations, aviculture parks, animal rehabilitation centers and private breeding facilities (Table 1). They were taken as dry blood spots on a fiber filter dedicated to laboratory analyses and were preserved at − 20°C in Eppendorf tubes sealed with parafilm to avoid damping. Total DNA was extracted with Sherlock AX Kit (A&A Biotechnology) according to the manufacturer's protocol.

PCR strategy for tandem duplication survey
To verify the presence or absence of tandem duplication within the mitochondrial genomes of Palaeognathae and other avian representatives, we used the strategy proposed by Gibb, et al. [4] and successfully applied to parrot mitogenomes [23]. Because two control regions (CRs) are common in the majority of avian duplicated rearrangements [22] and the paralogous CRs are usually nearly identical [12,63], we designed appropriate primers D-F and D-R that were to anneal to the central parts of CRs (Table S1 in Additional file 2, Fig. 2). This property makes the PCR strategy diagnostic because the expected amplicons occur only when two control regions are present in the genome. Due to the high variability of the control region sequences from the analyzed taxa, it was impossible to use universal primers, which forced us to design primers specific to each genus or even species (Table S1 and S9 in Additional file 2). Based on the selected primers, we ran 12-48 different reactions for Palaeognathae representatives, 35 for Chrysolophus pictus (Galloanserae) and 4-25 for Neoaves representatives (Table S1 and S9 in Additional file 2).

PCR strategy for amplification of the mitogenomic fragments containing the whole duplicated regions
The diagnostic fragment designed for tandem duplication survey comprises incomplete control regions, i.e. the second part of CR1 and the first part of CR2, as well as genes located between the two CRs. Therefore, in the case of taxa for which such a fragment was obtained, appropriate PCR reactions were performed to complete the missing parts of CRs and to reveal the order of genes preceding the CR1. The obtained partial CR1 sequences were used to design species specific CR-R primers for amplification of tRNA-Leu/CR1 fragment (for Struthio camelus) or ND5/CR1 fragments (for Rhea americana, Rhea pennata, Crypturellus tataupa) (Fig. 2). Similarly, the partial CR2 sequences were used to design species specific CR-F primers for amplification of CR2/16S fragments (for Struthio camelus, Rhea americana, Rhea pennata) or CR2/12S fragment (for Crypturellus tataupa) (Fig. 2). Appropriate L-F, ND5-F, 12S-R and 16S-R primers were designed based on reference mitogenomic sequences of the analyzed or related taxa deposited in GenBank (AF338715.1, AF090339.1, AF338709.2, AF338710.2). Suitable elongation times were applied to avoid amplification of tRNA-Leu/CR2, ND5/CR2, CR1/ 12S and CR1/16S fragments, which would contain two copies of some genes and/or control regions. Additional diagnostic ND6-1/ND6-2 fragments were amplified to confirm that the CR1/CR2 sequences were not errors of the PCR reactions or copies present in the nuclear genome, i.e. nuclear mitochondrial DNA inserts (NUMTs). Finally, the whole duplicated mitogenomic regions of Struthio camelus, Rhea americana, Rhea pennata and Crypturellus tataupa were amplified in four overlapping fragments (Fig. 2). Both amplicons containing only one control region fragment (CR1 or CR2) were 3 kb to 5 kb in length, which excludes a possibility of NUMTs amplification, whose average size is usually below 1 kb [122]. Despite the length of two diagnostic fragments depends on elements (genes and/or control region) located between CRs or ND6 genes, their length was longer than 1 kb, i.e. 2-3 kb on average. Uncropped and unprocessed agarose gels were presented in Fig. S13 in Additional file 1.

DNA amplification and sequencing
The PCR amplifications were performed in 25 μl reaction mixture containing 50 ng of the DNA template, 1 U DreamTaq Green DNA Polymerase (Thermo Fisher Scientific), 2.5 μl of 10 x buffer, 0.6 μl of 10 mM dNTPs, and 0.6 μl of each primer (10 μM). In the case of diagnostic fragments (CR1/CR2, ND6-1/ND6-2, tRNA-Pro1/tRNA-Pro2, ND6-1/tRNA-Pro2), following program was used: 94°C for 5 min; 94°C for 30 s, 56-62°C for 30 s, 72°C for 120 s repeated 35 times; and 72°C for 5 min. In the case of all other fragments the reaction conditions were as follows: 94°C for 5 min; 94°C for 30 s, 56-62°C for 30 s, 72°C for 180 s repeated 35 times; and 72°C for 5 min. For each amplified fragment, the appropriate amount of the PCR reaction mixtures was cleaned with the use of Clean-up Kit (A&A Biotechnology) to obtain the final volume of 100 μl with the concentration of at least 50 ng/μl. The two DNA strands of the cleaned PCR products were sequenced using the Primer Walking method (Wyzer Biosciences Inc., Cambridge, MA). Overlaps between three or four fragments amplified for each species were sufficient to assemble the whole mitogenomic regions including duplicated elements with the use of appropriate avian reference mitogenomes containing GO-FD (Table S7 in Additional file 2) or GO-I (Notiomystis cincta and Turdus philomelos) gene orders.

Sequence analyses
The annotation of genes was performed with the help of MITOS [123]. MEGA software [124] with MUSCLE aligner [125] were used to inspect and align sequences. Default parameters were assumed in the sequence alignments. JalView [126] was applied to visualize the alignments. Control regions were searched for potential pseudogenes using the optimal global:local algorithm (glsearch) form FASTA package version 36.3.8 g [127]. We assumed one million of shuffles, match/mismatch scores 5/− 4 and gap cost opening/extension − 10/− 1.

Phylogenetic analyses
Phylogenetic relationships between Palaeognathae were inferred based on all available 19 complete or almost complete mitochondrial genomes ( Table 2). Five representatives of Neognathae were used as an outgroup: Anas platyrhynchos (NC_009684.1), Anseranas semipalmata (NC_005933.1), Crax daubentoni (NC_024617.1), Gallus gallus (NC_001323.1) and Numida meleagris (NC_034374.1). The sequence records were downloaded from the GenBank database. The analyses were based on nucleotide sequences of 13 protein coding genes, genes for 12S and 16S rRNAs, as well as 22 tRNAs. Sequences of the control region (CR) were excluded due to their high variation. The sequences were aligned in MAFFT using a slow and accurate algorithm L-INS-i with 1000 cycles of iterative refinement [128]. The resulted alignments were edited manually in JalView [126] and sites suitable for phylogenetic study were selected in GBlocks [129]. The concatenated alignment of mitochondrial genes consisted of 15,351 bp.
We applied three phylogenetic approaches in the phylogenetic analyses: the maximum likelihood method in IQ-TREE [130], as well as Bayesian analyses in MrBayes [131] and PhyloBayes [132]. We considered 63 potential partitions, i.e. three codon positions for each individual protein coding gene and separate partitions for each of RNA genes to test the necessity of using separate substitution models.
The ModelFinder program associated with IQ-TREE [133,134], proposed 12 partitions with individual substitution models (Table S8 in Additional file 2). In the tree search, we used the more thorough and slower NNI search. In IQ-TREE, we applied Shimodara-Hasegawalike approximate likelihood ratio test (SH-aLRT) assuming 10,000 replicates and non-parametric bootstrap with 1000 replicates.
In MrBayes, we assumed 11 substitution models for the appropriate partitions as proposed by PartitionFinder [135]. However, we implemented mixed models rather than fixed ones to specify appropriate substitution models across the large parameter space [136], but the assumptions on the heterogeneity rate across sites were adopted from PartitionFinder results ( Table S4 in Additional file 2). Two independent runs each using 8 Markov chains were applied and trees were sampled every 100 generations for 20,000,000 generations. In the final analysis, we selected trees from the last 12,866,000 generations that reached the stationary phase and convergence, i.e. when the standard deviation of split frequencies (SD) stabilized and was below 0.00004.
In PhyloBayes, we applied the CAT-GTR + Γ5 model with the number of components, weights and profiles of the model inferred from the data. Two independent Markov chains were run for 100,000 generations with one tree sampled for each generation. The last 50,000 trees from each chain were collected to compute posterior consensus trees after reaching convergence, when the largest discrepancy observed across all bipartitions (maxdiff) was 0.0055.
To reduce a potential compositional heterogeneity in sequences related with AT or GC bias, we recoded respective nucleotides for purines (R) and pyrimidines (Y) in the third codon positions of protein-coding genes. In MrBayes and PhyloBayes, we adopted the assumptions as described above, whereas in IQ-TREE, we applied 11 partitions with individual substitution models proposed by ModelFinder (Table S10 in Additional file 2). The posterior consensus was calculated for trees from the last 5,845,000 generations in MrBayes for SD < 0.00013 and 90,000 generations in PhyloBayes for maxdiff = 0.007.
Besides the study based on the full alignment, we also performed phylogenetic analyses using ten alignments with sequentially removed partitions characterized by the highest substitution rate. The partition-specific rate was taken from MrBayes estimations. The settings for this study were the same as above. In the case of IQ-TREE analyses, the best substitution models were calculated for each set every time.
The best tree produced in IQ-Tree was compared in Consel [137] with alternative topologies using available tests. We applied 1000,000 replicates and site-wise log-likelihoods for the compared trees calculated in IQ-Tree under the best fitted substitution models. Moreover, we analyzed competitive topologies using Bayes Factor in MrBayes based on the stepping-stone method estimating the mean marginal likelihood using 50 steps of the sampling algorithm and other parameters as described above.
The data about the presence and absence of duplication in the palaeognath mitogenomes were mapped on the phylogenetic tree using Mesquite [138]. The lack of data about the duplication was coded as missing data. We applied maximum parsimony and maximum likelihood reconstruction methods. Depending on the AIC criterion, we used the better fit model for the data: either the Mk1 model (Markov k-state 1 parameter model) or the AsymmMk model (asymmetrical Markov k-state 2 parameter model).
Additional file 1: Fig. S1. Representative cropped agarose gels of CR1/ CR2 amplicons obtained for Struthio camelus (a), Rhea pennata (b), Rhea americana (c) and Crypturellus tataupa (d). The numbering of amplicons separated in the agarose gel corresponds to the reaction numbered in Table S1 in Additional file 2. The numbers of PCR fragments, which were finally sequenced, are underlined. Lane M -GeneRuler 1 kb DNA Ladder (Thermo Scientific). Original uncropped and unprocessed gels are presented in Fig. S13a-d. Fig. S2. Sequence comparison of two copies of tRNA-Pro, ND6from, tRNA-Glu genes and control region, found in this study as duplicated, with appropriate sequences of mitogenomes previously deposited in GenBank and showing a typical avian gene order. The alignments are shown for Struthio camelus (a), Rhea americana (b), Rhea pennata (c) and Crypturellus tataupa (d). Dots indicate residues identical in the obtained copies with those in the single sequence previously published. Genes for tRNA-Pro are marked in yellow coloring, NADH dehydrogenase subunit 6 in cyan coloring, tRNA-Glu in grey shading, and control regions in red fonts. In the case of Crypturellus tataupa species only two obtained copies are compared to each other because the previously published mitogenome of these taxa is incomplete and it does not contain any of the analyzed genes or the control region. The sequences were aligned with MUSCLE [125] in MEGA [124]. Fig. S3. Sequences of mitogenomic fragments amplified and sequenced in this study for Struthio camelus (MH264503), Rhea americana (MK696563), Rhea pennata (MK696564) and Crypturellus tataupa (MK696562). Genes for tRNA-Leu are marked in bold, NADH dehydrogenase subunit 5 in magenta coloring, cytochrome b in red coloring, tRNA-Thr in green coloring, tRNA-Pro in yellow coloring, NADH dehydrogenase subunit 6 in cyan coloring, tRNA-Glu in grey shading, 12S rRNA in blue fonts, tRNA-Val in brown fonts, 16S rRNA in green fonts, tRNA-Phe are in bold and underlined, and control regions in red fonts. Fig. S4. Cropped agarose gels of potential NUMTs amplified for Dromaius novaehollandiae (a) and Casuarius casuarius (b) with the use of primers designed for detection of CR1/ CR2 fragments. The numbering of lanes corresponds to the reaction numbered in Table S1 in Additional file 2. Lane M -GeneRuler 1 kb DNA Ladder (Thermo Scientific). Original uncropped and unprocessed gels are presented in Fig. S13e and f. Fig. S5. Comparison of selected sequence associated with reannotation of Palaeognathae mitogenomes. Eudromia elegans gene for tRNA-Pro with the last 200 nucleotides of the control region (a); Emeus crassus, Dinornis giganteus and Anomalopteryx didiformis control region 5′ spacers (b); Probosciger aterrinus (MH133970.1), Eolophus roseicapilla (MH133971.1) and Cacatua moluccensis (MH133972.1) ND6 pseudogenes (c); Anomalopteryx didiformis ND6 gene with control region 5′ spacer (d); Emeus crassus ND6 gene with control region 5′ spacer (e); Dinornis giganteus ND6 gene with control region 5′ spacer (f). Sequences rich in G and A residues are marked in a red box. The sequences in b and c were aligned with MUSCLE [125] in MEGA [124], whereas those in a, d, e and f using the optimal global:local algorithm (glsearch) form FASTA package version 36.3.8 g [127]. Fig. S6. Sequence comparison of two Eudromia elegans control regions. Dots indicate residues identical in the compared copies. Dots indicate residues identical in the compared sequences. Sequence repeats present in both control regions are marked in red and blue boxes. The sequences were aligned with MUSCLE [125] in MEGA [124]. Fig. S7. Influence of removing partitions with the highest substitution rate on alignment and tree parameters: the number of parsimony informative sites and mean distance (a) as well as mean support values (b). The mean phylogenetic distance was obtained from MrBayes tree. The mean support values were calculated from posterior probabilities received in MrBayes and PhyloBayes as well as SH-aLRT and nonparametric bootstrap percentages obtained in IQ-TREE. The posterior probabilities were scaled to 100%. Fig. S8. Maximum parsimony (a, c) and maximum likelihood (b, d) reconstruction of ancestral states and mapping of mitogenomic duplications onto the Palaeognathae tree topology t1 based on sequences of mitochondrial genes (a, b) and the Palaeognathae tree topology t2 supported by nuclear markers in coalescent-based approaches (c, d). In contrast to Fig. 6, this approach assumes that Anomalopteryx didiformis, Emeus crassus and Dinornis giganteus already lost mitogenomic duplication. The area of colors at nodes in b and d corresponds to the probability of the given state, single or duplicated region. Two-colored branches correspond to the equal probability of two states, single or duplicated region. The probability value for a more likely state was also given at these nodes. Mk1 model was applied for ML approach. . Genes for NADH dehydrogenase subunit 5 are marked in magenta coloring, cytochrome b in red coloring, tRNA-Thr in green coloring, tRNA-Pro in yellow coloring, NADH dehydrogenase subunit 6 in cyan coloring, tRNA-Glu in grey shading, 12S rRNA in blue fonts, tRNA-Phe in bold fonts, and control regions in red fonts. Pseudogenes are in italics and underlined. Microsatellites present within Pterocles gutturalis mitogenomes between tRNA-Thr gene and tRNA-Pro pseudogene or tRNA-Thr gene and ND6 pseudogene are marked in brown. . Genes for NADH dehydrogenase subunit 5 are marked in magenta coloring, cytochrome b in red coloring, tRNA-Thr in green coloring, tRNA-Pro in yellow coloring, NADH dehydrogenase subunit 6 in cyan coloring, tRNA-Glu in grey shading, 12S rRNA in blue fonts, tRNA-Phe in bold fonts, and control regions in red fonts. Pseudogenes are in italics and underlined. Fig. S11. Representative cropped agarose gels of the following amplicons: cytb/12S rRNA obtained for Rhynochetos jubatus (a), Eurypyga helias (d) and Trogon collaris (f); CR1/CR2 obtained for Rhynochetos jubatus (c) and Trogon collaris (g); CR/12S rRNA obtained for Rhynochetos jubatus (b) and Eurypyga helias (e); cytb/CR obtained for Eurypyga helias (e). The numbering of amplicons separated in the agarose gel corresponds to the reaction numbered in Table S9. Lane M -GeneRuler 1 kb DNA Ladder (Thermo Scientific). Fig. S12. Reconstruction of ancestral states and mapping of mitogenomic duplications onto the Aves three tree topologies under a more liberal assumption for assignment of duplication states. The colored thick lines of the trees correspond to maximum parsimony reconstruction, whereas values at nodes mean the probability of a more likely ancestral state (proportional likelihood) provided by maximum likelihood reconstruction. The trees obtained by Prum, et al. [45] are in a, by Kimball, et al. [47] in b, whereas by Kuhl, et al. [113] in c. The duplication state was assumed for a given order if a duplication was reported in at least one species from this order. FD indicates the presence of the most fully duplicated avian region. In ML approach, AsymmMk model was applied for a and, c whereas Mk1 for b. Fig. S13. Original uncropped and unprocessed agarose gels of: CR1/CR2 amplicons obtained for Struthio camelus (a), which were also shown in the Fig. S1a; CR1/CR2 amplicons obtained for Rhea pennata (b) which were also shown in the Fig. S1b; CR1/CR2 amplicons obtained for Rhea americana (7 lanes to the left of DNA Ladder) (c), which were also shown in the Fig. S1c; CR1/CR2 amplicons obtained for Crypturellus tataupa (6 lanes to the right of the DNA Ladder) (d), which were also shown in the Fig. S1d; potential NUMTs amplified for Dromaius novaehollandiae (e), which were also shown in the Fig. S4a; potential NUMTs amplified for Casuarius casuarius (f), which were also shown in the Fig. S4b; cytb/12S rRNA amplicons obtained for Rhynochetos jubatus (g), which were shown in the Fig. S11a; CR/12S rRNA amplicons obtained for Rhynochetos jubatus (h), which were shown in the Fig. S11b; CR1/CR2 amplicons (potential NUMTs) obtained for Rhynochetos jubatus (i), which were shown in the Fig. S11c; cytb/12S rRNA amplicon obtained for Eurypyga helias (j), which was shown in the Fig. S11d; cytb/CR and CR/12S rRNA amplicons obtained for Eurypyga helias (k), which were shown in the Fig. S11e; cytb/ 12S rRNA amplicon obtained for Trogon collaris (the first lane prior the first DNA Ladder) (l), which was shown in the Fig. S11f; CR1/CR2 amplicons obtained for Trogon collaris (m), which were shown in the Fig. S11g.
Additional file 2: Table S1. Characteristics of primers and PCR reactions for amplification of CR1/CR2 fragments. Sequences of primers, their internal laboratory numbering and naming as well as positions according to reference mitogenomes are included. Reactions that failed are marked with an asterisk. Reactions whose products were finally sequenced are marked in grey shading. Table S2. Characteristics of control regions in Palaeognathae species based on mitochondrial fragments obtained in this study and previously published mitogenomes. Table S3. Characteristics of alignments between ND6 copies for various avian taxa. Table S4. Substitution models, partitions and their relative rate in MrBayes analysis. Table S5. Results of tests comparing various palaeognath tree topologies. The topology t1 corresponds to the best tree found for the full alignment. The topologies are shown in Fig. 5. The table includes: p-value from an approximately unbiased test (AU), the bootstrap probability calculated from the multiscale bootstrap (NP), the bootstrap probability calculated in the usual manner (BP), Bayesian posterior probability calculated by the BIC approximation (PP), p-value of the Shimodaira-Hasegawa (SH) and the weighted Shimodaira-Hasegawa tests (WSH), Bayes factor (BF) expressed in natural logarithm units as differences between marginal likelihoods of the given and the topology (t1). Table S6. Number of known mitogenomic sequences and species in terms of duplication for individual avian orders. Table S7. Avian species in which GO-FD gene order was identified in their mitogenomes. Previous annotations assuming single, i.e. without duplication, version were also included. Incomplete mitogenomes are marked with an asterisk. Table S8. Substitution models and partitions used in IQ-TREE analysis. Table S9. Characteristics of primers and PCR reactions for amplification of mitochondrial fragments from Neoaves and Galloanserae representatives. Sequences of primers, their internal laboratory numbering and naming as well as positions according to reference mitogenomes are included. Reactions that failed are marked with an asterisk. Reactions whose products were finally sequenced are marked in grey shading. Table S10. Substitution models and partitions used in IQ-TREE analysis for RY-recoded sequences.