Tunicate mitogenomics and phylogenetics: peculiarities of the Herdmania momus mitochondrial genome and support for the new chordate phylogeny

Background Tunicates represent a key metazoan group as the sister-group of vertebrates within chordates. The six complete mitochondrial genomes available so far for tunicates have revealed distinctive features. Extensive gene rearrangements and particularly high evolutionary rates have been evidenced with regard to other chordates. This peculiar evolutionary dynamics has hampered the reconstruction of tunicate phylogenetic relationships within chordates based on mitogenomic data. Results In order to further understand the atypical evolutionary dynamics of the mitochondrial genome of tunicates, we determined the complete sequence of the solitary ascidian Herdmania momus. This genome from a stolidobranch ascidian presents the typical tunicate gene content with 13 protein-coding genes, 2 rRNAs and 24 tRNAs which are all encoded on the same strand. However, it also presents a novel gene arrangement, highlighting the extreme plasticity of gene order observed in tunicate mitochondrial genomes. Probabilistic phylogenetic inferences were conducted on the concatenation of the 13 mitochondrial protein-coding genes from representatives of major metazoan phyla. We show that whereas standard homogeneous amino acid models support an artefactual sister position of tunicates relative to all other bilaterians, the CAT and CAT+BP site- and time-heterogeneous mixture models place tunicates as the sister-group of vertebrates within monophyletic chordates. Moreover, the reference phylogeny indicates that tunicate mitochondrial genomes have experienced a drastic acceleration in their evolutionary rate that equally affects protein-coding and ribosomal-RNA genes. Conclusion This is the first mitogenomic study supporting the new chordate phylogeny revealed by recent phylogenomic analyses. It illustrates the beneficial effects of an increased taxon sampling coupled with the use of more realistic amino acid substitution models for the reconstruction of animal phylogeny.


Background
Mitochondrial genomes (mtDNAs) of bilaterian animals are short, circular DNA molecules of 14-16 kb in length, typically characterized by the absence of introns and the presence of only short intergenic regions, with the exception of the control region (CR), a non-coding region assumed to contain the elements for the regulation of replication and transcription of the genome [1,2]. To date, about 1,700 complete metazoan mitochondrial genomes have been sequenced and used in comparative mitogenomics and phylogenetic studies on different taxonomic scales [3][4][5][6][7][8][9].
To date, complete mitochondrial genomes of tunicates are mainly available for a single representative of Thaliacea (Doliolum nationalis) and five ascidians [13,[18][19][20][21][22], including four phlebobranchians (Ciona intestinalis type A and B, C. savignyi, Phallusia fumigata, P. mammillata) and one stolidobranchian (Halocynthia roretzi). The available mitochondrial data suggest that several unique features characterize mitogenomic evolution in tunicates relative to other chordate phyla. Two main peculiarities can be distinguished. The first refers to the highly variable gene order observed within the group, which implies that extensive gene rearrangements have occurred even at low taxonomic levels [13,18,19]. However, since most available tunicate complete mtDNA sequences belong to phylogenetically-related species (except H. roretzi) according to the 18S rRNA reference [32][33][34], it is not possible to evaluate whether mitochondrial gene rearrangements characterize the whole order or only the Aplousobranchia + Phlebobranchia + Thaliacea clade. The second specificity is that of an accelerated evolutionary rate of tunicates, as revealed by the long branches of the group in mitogenomic topologies [35][36][37] and the associated composition bias [37]. However, whether this accelerated substitution rate is restricted to protein coding genes as in snakes [38] or is a more general feature of the whole mtDNA of Tunicates, has yet to be investigated.
These two peculiar evolutionary features of tunicate mitochondrial genome evolution have hampered their reliable phylogenetic placement within metazoans. Analyses of mitochondrial protein-coding genes have almost always systematically placed tunicates outside Bilateria [20,35,36,39]. This is in sharp contrast with recent nuclear-based phylogenomic studies that identified tunicates as the closest living relatives of vertebrates within chordates [40][41][42][43]. Only two recent mitogenomic study have found marginal support for chordate monophyly. Bourlat et al. [37] grouped cephalochordates with vertebrates according to the traditional Euchordata hypothesis using a concatenation of the 13 protein coding genes under a site-and time-heterogeneous mixture model in Bayesian phylogenetic reconstructions. Alternatively, Zhong et al. [39] recovered the new chordate phylogeny under the maximum likelihood framework when removing the fastest evolving vertebrates species and when considering only the four most conserved mitochondrial proteins.
Here, we sequenced the complete mitochondrial genome of the solitary ascidian Herdmania momus (Ascidiacea: Stolidobranchia: Pyuridae), an Indo-Pacific species that was introduced into the Mediterranean Sea via the Suez Canal [44]. We describe the structural and compositional features of H. momus mtDNA, discuss its evolutionary dynamics with respect to other tunicate and chordate mitochondrial genomes, and provide an updated metazoan phylogeny based on probabilistic analyses of the 13 mitochondrial proteins using site-and time-heterogeneous mixture models of amino acid substitutions.

Results and Discussion
General features of H. momus mtDNA The mitochondrial genome of H. momus ( Figure 1) accounts for 15,816 base pairs (bp) in length which falls within the typical range of other tunicate genomes; the smallest genome being 14,579 bp long in P. fumigata and the largest one 16,351 bp in D. nationalis. It presents the typical tunicate gene content with all 13 protein-coding genes of the mitochondrial respiratory apparatus, including atp8, which appears so divergent in Tunicates that it had been difficult to annotate in the initial assemblies of C. savignyi and H. roretzi [18,21,22]. It also encodes for the two mitochondrial ribosomal genes, srRNA and lrRNA, and a total of 24 tRNAs (Figure 2), among which are two distinct tRNAs for the Gly (AGR and GGN), Leu (UUR and CUN), and Ser (AGY and UCN) codons (Figure 2), which is consistent with the modified mitochondrial genetic code of tunicates. Finally, H. momus mtDNA contains an additional tRNA-Met (AUA), similar to all other tunicate genomes sequenced so far [13,[18][19][20][21][22].
In contrast to other deuterostome mtDNA genomes, all genes are encoded on the same DNA strand and thus mitochondrial gene transcription shows only one orientation in H. momus, as in all other tunicate genomes sequenced to date. In chordates, although the majority of genes is co-orientated on one major DNA strand (Hstrand), a minimum set of tRNAs and the nd6 gene are encoded from the minor one (L-strand) [2,17,45]. The genome architecture of H. momus thus provides additional evidence in support of the hypothesis that gene arrangement in only one strand is a shared derived feature of tunicate mtDNAs [2].
Concerning base composition, the mitochondrial genome of H. momus is AT rich with a value of 63.5%. The overall base frequencies as estimated in the coding strand follow the decreasing order T, G, A and C. T is by far the most abundant nucleotide with 41.3% in the genome. Most tunicate mtDNAs present similar compositional ATrich profiles varying from 61.1% in D. nationalis to 78.6% in C. intestinalis. The only exception concerns the mitochondrial genomes of species from the genus Phallusia, which present more balanced AT and GC compositions with 52.8% and 53.2% AT in P. mammillata and P. fumigata, respectively [13].

Protein-coding genes
The predicted protein-coding genes in the mtDNA of H. momus present lengths that are overall similar to their orthologues in the other tunicate genomes. Nevertheless, H. momus was not predicted to carry any deletion at the Cterminal ends of nd4 and nd5 genes similar to those reported in H. roretzi [21].
The base composition of protein-coding genes in H. momus was measured as being A+T rich, as is the profile of its entire genome. Consequently, the codon usage follows the same pattern, with C-rich codons showing lower frequencies than those that are T-or G-rich. Extreme cases are constituted by the CGC codon for Arginine, which is never used, and by the UUU codon for Phenylalanine and UUG codon for Leucine, which are the most frequently used. The CGC codon is also never used in the two Ciona genomes, whereas it is present in other tunicate genomes [18,22].
The preferred start and stop codons in H. momus mitochondrial genes are GTG (6 genes) and TAG (7 genes), respectively. In the other Pyuridae (H. roretzi), GTG is the preferred start codon too, but TAA is the most frequent stop codon (8 genes). Like in other tunicate genomes, incomplete T-starting stop codons are predicted at the end of the three genes coding for nd1, nd4 and nd4L. The com-plete TAA stop codon is probably created by polyadenylation.

rRNA genes
Because of high sequence divergence of tunicate rRNA sequences, the boundaries of all tunicate mitochondrial rRNA sequenced so far have been inferred from the flank-ing genes. Likewise, the srRNA gene in H. momus is estimated at most as 673 bp long, thus being slightly shorter than its acidian orthologues which range from 687 to 738 bp, and about 30 bp longer than the srRNA of D. nationalis mtDNA. On the genome map, it is found located between the tRNA-Ala and tRNA-Glu. The lrRNA, on the other hand, appears to be at most 1,159 bp long, a length simi-Herdmania momus mitochondrial genome map Figure 1 Herdmania momus mitochondrial genome map. Protein coding genes, rRNAs, and tRNAs are shown in red, green, and black, respectively. Gray numbers indicate the length of non coding regions longer than 40 bp, the location of the repeated sequences is indicated in blue.
Putative cloverleaf structures for the 24 tRNA genes of the Herdmania momus mitochondrial genome lar to that of other tunicates, and is located downstream of tRNA-Tyr and upstream of cytb ( Figure 1).
Both lengths and locations of the two rRNA genes in the H. momus mitochondrial genome provide additional evidence for two unique features of mtDNA evolution in tunicates. The first is that tunicates present the shortest mitochondrial rRNA genes among chordates, and more generally among deuterostomes, with lengths varying at most from 641 to 738 bp for the small ribosomal subunit and from 1,059 to 1,279 bp for the large ribosomal subunit [13]. The corresponding lengths for the srRNA and lrRNA genes in the other chordate lineages are estimated as higher than 844 bp and 1,367 bp, respectively. The second feature shared by all tunicates concerns the location of the two genes in the mitochondrial genome. In contrast to all other chordates, where the two rRNA coding genes are usually adjacent, srRNA and lrRNA are found rearranged and separated in all available tunicate mtDNAs [2]. The distance between the two genes in H. momus genome is about 5 kb. Distances of the same order of magnitude (5-7 kb) were also identified in the mtDNAs of Cnidaria, some Protostomia, and some Echinodermata [2].

tRNA genes
The predicted cloverleaf structures for the 24 tRNA genes of the H. momus mitochondrial genome are presented in Figure 2. As in H. roretzi, only two tRNAs are characterized by an unusual structure: tRNA-Ser (AGY) and tRNA-Asn. tRNA-Ser (AGY) appears to possess an 8 bp-long anticodon stem structure. Similar long stems (7-9 bp) have been reported in Ciona ssp., D. nationalis, and H. roretzi tRNA-Ser (AGY) [18,[20][21][22]. However, this structure is absent in Phallusia tRNA-Ser (AGY), which appears to have retained the classical cloverleaf conformation [13]. Concerning the tRNA-Asn gene, two alternative cloverleaf structures have been inferred, as also previously reported for H. roretzi [21]. The structure predicted by tRNA-scan ( Figure 2) is characterized by a two-nucleotide spacer between the DHU and the anti-codon stem as well as a mismatch at the start of the T stem. The alternative conventional cloverleaf structure ( Figure 2) appears less stable, in forming a mismatch at the start of the D arm. Similar two-nucleotide spacers have also been found in the tRNA-Asn structures of Phallusia ssp., H. roretzi, and D. nationalis, but not in Ciona ssp [13,18,[20][21][22]. Table 1 summarizes some attributes of the non-coding (NC) regions in H. momus and the other available tunicate mitochondrial genomes. In total, 28 NC regions have been identified in H. momus mtDNA, with a total size of 1,457 bp (9.21% of the total genome). When compared to other ascidians, the proportion of NC regions in H. momus appears higher, with previous estimates varying from 1.94 to 5.89%, in the mtDNAs of P. mammillata and P. fumigata, respectively. The proportion of NC sequences in the closely-related H. roretzi is three-fold lower (3.09%) than that of H. momus, yet the total number of NC fragments is almost identical in both species, and the two genomes differ by ca. 1 kb in length. Among tunicates, only Doliolium nationalis has more non-coding regions with 13.1% of the total genome length ( Table 1).

Non-coding regions
Considering length distributions, 19 of the 28 NC regions present lengths over 20 bp, with the longest accounting for 197 bp and being located between the nd6 and tRNA-Tyr genes (Figure 1). Three additional NC regions measuring more than 100 bp have been identified in H. momus mtDNA, involving the disjunction of the following gene pairs: cytb -nd2, nd4 -tRNA-His, and tRNA-Lys -nd3. The remaining nine NC sequences are shorter intergenic spacers, mostly 3-10 bp long, distributed homogenously through the genome.
The presence of palindrome sequences was checked by sequence similarity searches and has been detected within 12 NC regions, and in particular within the three longest NC regions of the H. momus mitochondrial genome. However, no stem-loop structure similar to that of the control region of vertebrate mt-genomes has been identified in H. momus. Further studies are necessary to establish whether these regions are involved in the control of replication and transcription.
Sequence similarities searches have finally revealed the occurrence of a duplication event in the NC regions of the H. momus genome, similar to those previously determined in Ciona intestinalis and Phallusia fumigata mtDNAs [13].
Here, a fragment of 89 bp was found duplicated in two different locations of the genome: the first, situated in the non-coding region downstream of the nd6 gene and upstream of the gene encoding for the tRNA-Tyr (NC = 197 bp); while the second involved the non-coding region downstream of the tRNA-Asn (NC = 88 bp) and 13 bp of the 5' region of the tRNA-Lys. The estimated distance between the repeated motifs is about 5 kb (Figure 1).

Gene order
The mtDNA of H. momus shows a novel gene arrangement with respect to other tunicates, and is radically different from that of its close relative Halocynthia roretzi ( Figure 3). Surprisingly, only one block containing two consecutive genes is conserved between the two species: the pair tRNA-Trp -tRNA-Ile. When considering only the relative arrangement of protein-coding genes, an additional block appears to be shared by the two members of Stolidobranchia, consisting of the three genes cytb -nd2 -nd5. In H. roretzi, cytb and nd2 are separated by four tRNA-coding genes, whereas in H. momus the two genes are separated by a long non-coding region of 157 bp. Likewise, the number of tRNA genes separating nd2 and nd5 as well as their identities differ in each genome (two and five tRNAs separate these two genes in H. momus and H. roretzi respectively). Conversely, the pair cox2-cytb, previously considered as unique and shared by all other tunicates is absent in H. momus, as well as the pair tRNA-Arg -tRNA-Gln which is conserved in most tunicate genomes [13,[18][19][20][21][22].
Thus, the mitochondrial genome organization in H. momus when compared to that of H. roretzi suggests that extensive gene rearrangements occur within Stolidobranchia, similar to those observed in Phlebobranchia [13]. Future work should determine whether co-generic species in Stolidobranchia also present high variations in gene order, as reported for the genera Ciona and Phallusia within Phlebobranchia [2].

Phylogenetic analyses
The mitochondrial genome has been repeatedly used in molecular phylogenetics of animals, principally due to several convenient features, such as its relatively small size, its cellular abundance, and its mostly uniparental mode of inheritance facilitating orthology assignment [5]. However, the phylogenetic resolving power of mtDNAs is often reduced by pervasive lineage-specific variations of base composition and substitution rate [46,47]. Bilaterian evolutionary relationships obtained from previous phylogenetic reconstructions based on mitochondrial genes [35,36] are in relatively good agreement with the currently accepted view of Protostomia phylogeny [48]. However, the situation is more complex concerning Deuterostomia, mainly because of the recurrent placement of tunicates as sister-group of either the remaining Bilateria [35,36] or the other chordates [37]. Such placement outside chordates is in direct contradiction to results from recent largescale phylogenomic studies, which strongly support the clade Olfactores, i.e., the grouping of tunicates with craniates (jawed vertebrates + cyclostomes) to the exclusion of cephalochordates [40][41][42][43]. The sister position of tunicates relative to the remaining Bilateria is generally interpreted as a long-branch attraction artifact caused by the peculiarities of tunicate mitogenomic evolution in terms of both lineage-specific evolutionary rate and amino-acid composition [36,37,39].
In our mitogenomic dataset, the potential occurrence of compositional biases was explored through a Principal Component Analysis (PCA) of amino acid composition of the 54 taxa ( Figure 4). This statistical analysis shows that tunicates have a markedly heterogeneous amino acid composition that is clearly distinct from most other sampled taxa. Moreover, tunicates appear extremely divergent from the other chordate representatives. Tunicates and jawed-vertebrates are located at the extreme opposite sides of the graph, with cephalochordates in an intermediate position. There is thus a strong compositional heterogeneity in our amino-acid dataset that might cause phylogenetic artefacts if it is not specifically accounted for in models of sequence evolution.
The maximum likelihood tree obtained on the 13 proteincoding genes concatenation using the site-and timehomogeneous mtREV+Γ 4 model of amino-acid sequence evolution places tunicates as the sister group of the remaining Bilateria, as previously reported [35,36,39]. However, the bootstrap support for such a position is low (BP = 61). This result is presumably the outcome of two types of systematic biases: compositional heterogeneity of the tunicate lineage compared to other chordate and animal phyla (see Figure 4) combined with accelerated evolutionary rates [37,39]. ML phylogenetic analyses were also conducted under an empirical site-heterogeneous CAT mixture model [49] which has been shown to alleviate long-branch attraction artefacts in the context of animal phylogenomics [50]. The ML tree inferred under this model shows a dramatic topological shift in placing tunicates as the sister-group of craniates within monophyletic chordates and deuterostomes, in agreement with phylogenomic studies [35,41]. However, this phylogenetic placement is not statistically supported (BP < 50).
In order to further explore the effect of model misspecification on mitogenomics tree inference, Bayesian analyses have been conducted using a range of amino-acid sequence evolution models ( Table 2). There is a clear distinction between the results obtained using site-homogeneous models (mtREV+Γ 4 and GTR+Γ 4 ) on the one side, and variants of the CAT+Γ 4 site-heterogeneous model [51] and the site-and time-heterogeneous CAT+BP+Γ 4 model [52], on the other side. Site-homogeneous models moderately support a sister-group relationship between tunicates and Xenoturbella (PP ≥ 0.83), whereas the different CAT mixture models support the grouping of tunicates with craniates (i.e., monophyly of Olfactores) with the highest Bayesian posterior probabilities obtained using the most complex and better-fitting models (PP ≥ 0.94). These results exemplify the impact of the model of sequence evolution on mitogenomic inference of phylogeny by showing that the new chordate phylogeny [35,40,41] can be corroborated by using a model that accounts for both compositional and evolutionary rate heterogeneities.
The Bayesian consensus tree obtained under the site-and time-heterogeneous CAT+BP+Γ 4 model is detailed in Figure 5. The overall phylogenetic picture is consistent with the common tripartite structure of Bilateria phylogeny [53], with Protostomia and Deuterostomia, and Protostomia being further divided into Lophotrochozoa and Ecdysozoa. Strong statistical support is obtained for the monophyly of Protostomia (PP = 1), and for both of its two major lineages Lophotrochozoa and Ecdysozoa (PP ≥ 0.99). The monophyly of Deuterostomia is less strongly supported however (PP = 0.89), in agreement with recent mitogenomic [37] and phylogenomic results [41,54]. Within Deuterostomia, Echinodermata and Hemichordata were both retrieved as firmly monophyletic (PP = 1) and are strongly grouped into Ambulacraria, as suggested by early mitogenomic studies [55] and later confirmed by phylogenomics [35,41,43]. However, the position of Xenoturbella within Deuterostomia remains unresolved in our analysis (PP < 0.5) as also found in the latest mitogenomic analysis [37], whereas phylogenomics support its sister-group relationship with Ambulacraria into a clade named Xenambulacraria [35,41]. Strong statistical support is obtained for the respective monophyly of the three chordate groups: Cephalochordata (PP = 1), Tunicata (PP = 1), and Craniata (PP = 0.99). The monophyly of Chordata is strongly supported (PP = 0.96) for the first time in a mitogenomic study, as is the monophyly of Olfactores (PP = 0.94). These results are fully congruent with the growing body of evidence coming from phylogenomic studies that revealed the unexpected sister-group relationship between tunicates and craniates [35,39,40,42,43].
Although removal of fast-evolving gene and taxa has been found to recover the new chordate phylogeny, such an approach is not successful to solve relationships within Tunicata [39]. Our mitogenomic analysis provided strong evidence for intra-tunicate relationships, with the overall phylogenetic scheme being well resolved despite the high lineage-specific evolutionary rate ( Figure 5). The inferred relationships for the group were notably in concordance with nuclear-based phylogenies and morphological data. More precisely, Herdmania was unambiguously retrieved in a sister-group relationship with Halocynthia (PP = 1), in agreement with the systematic classification that places these two ascidian genera within the stolidobranch family Pyuridae [33]. Firm evidence was similarly obtained for the respective grouping of the congeneric species of Ciona and Phallusia (PP = 1), and high posterior probability is retrieved for a sister-group relationship of the two genera into Phlebobranchia (PP = 1). Finally, the thaliacean Doliolum is found to branch within ascidians with solid statistical support (PP = 0.99), favoring the hypothesis of Ascidiacea paraphyly [20] as also supported by 18S rRNA phylogenies [32,34,56].

Compared evolutionary rates of rRNA-and protein-coding genes
A detailed analysis of lineage-specific evolutionary rate was performed by inferring branch lengths on the Bayesian consensus reference phylogeny (see Figure 5). The Gnathostomata comparison of branch lengths obtained on the proteinand rRNA-coding mitochondrial partitions, first demonstrates that tunicates are clearly the fastest evolving species of the metazoan dataset on both partitions ( Figure 6). It also shows that the rRNA-coding partition evolves on average about three times more slowly than the proteincoding gene partition (TBL ratio = 34.79/11.76 = 2.96). However, in contrast to what has been observed in snake mitochondrial genomes [38,57], there is very good correlation between root-to-tip distances inferred from the two partitions (R 2 = 0.88). These results reveal that tunicate mitochondrial genomes have experienced a drastic acceleration of evolutionary rate that affects both protein-coding genes and ribosomal-RNA genes. This latter observation seems therefore to exclude the occurrence of adaptive evolution episodes in protein-coding genes, recently reported for snake and agamid lizard mitochondrial genomes [58], as the ultimate cause behind the elevated substitution rates in tunicate mitochondrial genomes. Nevertheless, the lineage-specific rate acceleration of tunicates probably explains the difficulties previously encountered in reconstructing their phylogenetic position based on mitogenomic data.

Conclusion
The complete mitochondrial genome of the ascidian H. momus shares several features with the other tunicate mtD-NAs so far sequenced: (i) all genes are encoded on the same strand; (ii) two additional tRNAs, when compared with vertebrates, are present (one for Gly, one for Met); (iii) the gene order is highly rearranged, and differs from all other known tunicates, with the noticeable synteny dis-ruption of the cox2--cytb gene block; and (iv) the evolutionary rate is elevated and involves both protein-and rRNA-coding genes. From the phylogenetic viewpoint, the analysis of the concatenated mitochondrial proteins recovers the tunicates + craniates clade within monophyletic chordates, provided that site-and time-heterogeneous mixture models of amino acid replacements are used.

DNA sequencing and gene annotation
Primers were designed to match conserved regions of chordate genomes in the cox1, cox2, cox3, and cytb genes ( Table S1 in Additional File 1). PCR amplifications were performed using the ExTaq polymerase (TaKaRa) in 25 μL reaction mixture according to manufacturer's instructions and were conducted in two steps. A first amplification was performed with external primers followed by a re-amplification of the initial PCR product using different sets of internal primers. Based on the partial sequences obtained for the aforementioned genes, longer specific primers were subsequently designed. This allowed us to determine the cox3-cox1 region and to redesign (long) primers in the nd3 gene located in the same region.
The complete mitochondrial genome was finally amplified in two overlapping fragments using the Long and Accurate (LA) Taq polymerase (TaKaRa) in 25 μL reaction mixture according to manufacturer's instructions. The first fragment covered the cox3-cytb region (~7,700 bp) and was obtained by PCR amplification using the external primers HMO_Cox3_Long_F1 5'-ACTGTTGTTTTACTTAG TTCGGGAGTTACTGTGAC-3' and HMO_Cob_Long_R1 5'-ACATATAAGCGACCCCCAACAAAAAGAAAC-3' followed by a re-amplification with the primers HMO_Cox3_Long_F2 5'-AGTTTTGGTGGAGGGCTTAT-GGGATTAGTTTGGAC-3' and HMO_Cob_Long_R2 5'- The second fragment encompassed the cytb-nd3 region (~8,900 bp) and was similarly acquired by a first PCR amplification using the external primers HMO_Cob_Long_F1 5'-AGGGGTGCTACTTTAACGCG-GTTTTACAC-3' and HMO_Cox1_Long_R2 5'-ACATTATA-CAACTGCCCATCTCCAATTACCTG-3' and a subsequent re-amplification using the primers HMO_Cob_Long_F2 5'-AGTGGTTTTGTCTTTGGTGCATTTGGTTTTC-3' and NADH3_R 5'-CTGAATAGAATAACCCTCGCTATCACT-3'. The two fragments overlapped over ~150 bp at 5' end of the cytb and over ~780 bp at the cox3-nd3 region. The sequencing of the long PCR products was performed by the company MACROGEN (Seoul, Korea) using shotgun sequencing (i.e., shearing of the PCR fragment, bacterial library construction, and sequencing of the clones). Con-Phylogeny of Metazoa inferred from the concatenation of the 13 mitochondrial proteins

Deuterostomia
Ambulacraria Craniata tigs were assembled using both ARACHNE [60] and Chromaspro (Technilysium Pty. Ltd.) programs. Regions of low coverage in the assembly (i.e., fewer than three clones) were confirmed by primer-walking sequencing (~4 kb of the genome was re-sequenced) with the use of specific primers and genomic DNA as template. The complete mtDNA sequence of H. momus was deposited under the EMBL accession number FN296153.
Protein coding genes were identified using the ORF-Finder tool of NCBI [61] and the DOGMA server [62]. Following Iannelli et al. [13], boundaries of protein coding gene were inferred in such a way that overlap between genes was minimized and similarity between tunicate proteins was maximized. ATG and non-standard initiation codons [63] were considered as reliable start codons.
Comparison of lineage-specific evolutionary rates Figure 6 Comparison of lineage-specific evolutionary rates. The graph is a plot of the 54 root-to-tip distances calculated from branch lengths estimated under the GTR+Γ 8 model on the reference topology (see Figure 5) for the concatenations of rRNAcoding (y-axis) and protein-coding (x-axis) mitochondrial genes. Symbols corresponding to non-bilaterian outgroups (green squares), tunicates (red triangles), and all other bilaterians (blue diamonds) are indicated. The phylograms showing the branch lengths inferred from each dataset are presented with the highly evolving tunicates figured in red. The linear regression dotted line is figured with its associated equation and correlation coefficient. The y = x line is also indicated to underline the difference in average evolutionary rate between the rRNA and protein mitochondrial partitions.

Other bilaterians
Tunicates Both the tRNAscan-SE 1.23 program and the DOGMA server were used to identify and infer the secondary structures of tRNA genes [64]. tRNAs that were not identified by the above tools were sought by folding all putative non-coding regions using the Mfold server [65]. All predicted cloverleaf secondary structures were manually checked and compared with other known ascidian tRNAs. The positions of the small (srRNA) and large (lrRNA) ribosomal RNA genes were identified by the DOGMA server and confirmed based on sequence similarity searches to orthologous genes in other ascidians. Finally, the boundaries of the rRNA genes were inferred from the flanking genes.

Dataset assembly
The phylogenetic dataset was built upon the taxon sampling used by Bourlat et al. [35]. We expanded the tunicate sampling to seven by including the newly-sequenced H. momus and two recently published genomes from the genus Phallusia [13]. The snake representative (Boa constrictor) was excluded because snake mitochondrial genomes have been shown to be the subject of adaptive evolution that can severely bias phylogenetic inference [58]. The final dataset therefore comprises 54 taxa including 31 deuterostomes, 19 protostomes, and four non-bilaterian outgroups (cf. Additional file 1). The nucleotide sequences of all mitochondrial genes were retrieved from the Organellar Genome Retrieval (OGRe) database [66]. For the 13 protein coding genes, sequences were translated, and aligned at the amino-acid level using MAFFT with default parameters [67]. Ambiguously aligned sites were identified and removed from all individual genes separately, using the program Gblocks [68] with the following parameters: minimum number of sequences for a conserved position = 18; minimum number of sequences for a flanking position = 29; maximum number of contiguous nonconserved positions = 8; minimum length of a block = 2; allowed gap positions = with half. The concatenation of the 13 proteins yielded a phylogenetic dataset including 54 taxa and 2136 unambiguously aligned amino-acid sites, of which 1697 were variable (Additional file 2).

Phylogenetic analyses
Phylogenetic analyses were conducted using Maximum Likelihood (ML) and Bayesian Inference (BI) reconstruction approaches on the concatenated amino acid dataset. Maximum Likelihood (ML) analyses were performed using the program PHYML 3 [69] under the models mtREV+Γ 4 and CAT+Γ 4 , with a 4-category Gamma (Γ 4 ) distribution of the among-site amino acid replacement rate heterogeneity [70], and with the number of CAT categories set to 20 (C20) as recommended by Le et al. [49]. The heuristic ML searches were conducted by performing Subtree Pruning and Regrafting (SPR) moves on a Neigh-bor-Joining (NJ) starting tree. Statistical support was estimated by Bootstrap resampling with 100 pseudoreplicates generated by the program SeqBoot of the PHYLIP package [71]. In all replicates, ML analyses were performed using PHYML through the same heuristic search strategy. Nodal bootstrap supports (BS) were obtained from the 50% majority rule consensus of the 100 reconstructed trees using the program TREEFINDER [72].
Bayesian inference was conducted using the program Phy-loBayes 3.1 [73] under both homogeneous amino acid models (mtREV+Γ 4 and GTR+Γ 4 ) and variants of the siteheterogeneous CAT+Γ 4 mixture model (CAT+Poisson+Γ 4 , CAT+mtREV+Γ 4 , and CAT+GTR+Γ 4 ) [51]. The program nhPhyloBayes [74] was used to performed Bayesian analysis under a site-and time-heterogeneous model which combines the break-point approach (BP) in order to model variations of amino acid replacement rates along branches and the CAT mixture model in order to account for site-wise variations of these rates. In this CAT+BP+Γ 4 analysis, the number of categories of the mixture component was fixed at 60 (C60), and the biochemical profiles were those inferred by Le et al. [49], rather than being estimated.
In Bayesian analyses under mtREV+Γ 4 , GTR+Γ 4 , and under the different variants of CAT+Γ 4 (CAT+Poisson+Γ 4 , CAT+mtREV+Γ 4 , and CAT+GTR+Γ 4 ), three independent Markov Chain Monte Carlo (MCMC) were run in parallel, whereas five MCMC were run for analyses under CAT+BP+Γ 4 . Each MCMC was launched from a random initialization and for a large number of cycles with parameters and trees saved every cycle. Priors were set to values as described in Blanquart and Lartillot [52] concerning the CAT+BP model, and in Lartillot et al. [73] concerning all other applied models. Convergence of MCMC was checked by monitoring the marginal likelihood through cycles. Bayesian Posterior Probabilities (PP) were obtained from the trees sampled during the stationary phase of the different MCMC.

Branch length analysis
Two concatenated nucleotide datasets were constructed to compare evolutionary rates of protein-coding and rRNAcoding mitochondrial genes. In both cases, the different genes were individually aligned using MAFFT, and ambiguously aligned sites were removed using Gblocks before building each concatenation. This led to a 13 protein-coding gene dataset containing 8,130 nucleotide sites and a rRNA dataset including 806 sites (Additional files 3 and 4). Using the rooted topology obtained from the analysis of the amino-acid sequences under the CAT+BP+Γ 4 mixture model as a reference, ML branch lengths were optimized under the GTR+Γ 8 model using PAUP* 4.0b10 [75] on both the protein-coding and rRNA-coding nucleotide concatenations. For each dataset, the root-to-tip distance was calculated for each of the 54 taxa by summing the branch lengths on the path going from the root of the tree to each of its terminals. The two sets of root-to-tip distances were then compared graphically to evaluate the lineage-specific rate correlation between the two mitochondrial partitions.