Comparative mitogenome analyses uncover mitogenome features and phylogenetic implications of the subfamily Cobitinae

Background Loaches of Cobitinae, widely distributed in Eurasian continent, have high economic, ornamental and scientific value. However, the phylogeny of Cobitinae fishes within genera or family level remains complex and controversial. Up to now, about 60 Cobitinae mitogenomes had been deposited in GenBank, but their integrated characteristics were not elaborated. Results In this study, we sequenced and analyzed the complete mitogenomes of a female Cobits macrostigma. Then we conducted a comparative mitogenome analysis and revealed the conserved and unique characteristics of 58 Cobitinae mitogenomes, including C. macrostigma. Cobitinae mitogenomes display highly conserved tRNA secondary structure, overlaps and non-coding intergenic spacers. In addition, distinct base compositions were observed among different genus and significantly negative linear correlation between AT% and AT-skew were found among Cobitinae, genus Cobitis and Pangio mitogenomes, respectively. A specific 3 bp insertion (GCA) in the atp8-atp6 overlap was identified as a unique feature of loaches, compared to other Cypriniformes fish. Additionally, all protein coding genes underwent a strong purifying selection. Phylogenetic analysis strongly supported the paraphyly of Cobitis and polyphyly of Misgurnus. The strict molecular clock predicted that Cobitinae might have split into northern and southern lineages in the late Eocene (42.11 Ma), furthermore, mtDNA introgression might occur (14.40 Ma) between ancestral species of Cobitis and ancestral species of Misgurnus. Conclusions The current study represents the first comparative mitogenomic and phylogenetic analyses within Cobitinae and provides new insights into the mitogenome features and evolution of fishes belonging to the cobitinae family. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-020-07360-w.


Background
Vertebrate mitogenome is a small (16-17 kb) and circular double-stranded molecule [1]. It contains 37 genes including 22 tRNA genes, 13 PCGs and two rRNA genes [1]. It also has two noncoding regions, O L and CR, and the latter contains regulatory elements for controlling the transcription and replication of mtDNA molecule [2,3]. Due to its unique features, such as high copy numbers in tissues, simple genomic organization, maternal inheritance, almost unambiguous orthology, haploid inheritance and high nucleotide substitution rate [4][5][6], mitogenome has been widely applied in species identification, i.e., DNA barcoding, as well as population genetics, conservation biology, molecular phylogenetics and evolutionary processes [7][8][9][10][11][12][13]. Gene arrangements of fish mitogenomes are generally conserved, only with a few exceptions [1]. However, the genome sequence length, the bias of base composition and start/stop codon, the overlap and IGSs are diverse among different species [14].
Cobitinae is a subfamily of Cobitidae that was first identified by Hora (1932). To date, it contains 214 species recorded in FishBase, covering 21 genera, such as Cobits, Misgurnus and Paramisgurnus [15]. Loaches of subfamily Cobitinae are bottom-dwelling fishes and widely distributed in Eurasian continent. They usually possess high economic, ornamental and scientific research value. Loach commercial farming, including cobitid loach (M. anguillicaudatus) and large-scale loach (P. dabryanus), occupies a significant position in freshwater aquaculture of Asia, due to their enjoyable taste, high nutritional value, rapid growth and strong adaptation [16][17][18]. In China, loach is used as a diet therapy or folk remedy for patient's recovery or treatment of many diseases, such as hepatitis, osteomyeitis, carbuncles, and cancers. Many Cobitis populations are mixed diploid-polyploid, even bisexual and unisexual forms co-existing in the same niche [19][20][21]. They are suitable as models to reveal the relationship among hybridization, polyploidization, reproduction, speciation and evolution [21][22][23]. Due to their great diversity, they are also used to trace the biogeographic history of freshwater systems and to reflect geologic events [24]. Cobitinae fishes usually inhabit various benthic habitats in rivers, lakes, streams and ponds [25]. However, dilapidation of the ecological environment has led to a decrease of benthic organisms [26,27]. Cobitinae fishes are seriously threatened and their wild populations are gradually decreasing [28]. On this account, the diversity of these benthic fishes have been used as a bioindicator to assess the quality of the ecological environment [29,30]. In addition, many Cobitinae species, such as the "kuhli loaches", are well-known in Southeast Asia and Europe as ornamental fish for their varied morphological patterns and the ability to ingest bottom organic residues.
Cobitinae fishes are difficult to be classified because of their morphological similarity and high plasticity in morphology [31]. Although the secondary sexual dimorphism is used to define genera, it is not always congruent with the current genera definitions. The molecular phylogeny of Cobitinae fishes has been studied at the genera or family level via one or two mitochondrial and/or nuclear genes [24,[31][32][33][34][35][36], and remains complex and controversial. For example, based on mitochondrial gene cytb and nuclear gene rag-1, Perdices et al. (2016) [37] reconstructed the phylogenetic relationship of Northern Clade of family Cobitidae that inhabit in Europe, and North and Northwest parts of Asia. The subfamily Cobitinae was divided into Cobitis sensu lato group (Cobitis, Iksookimia, Niwaella and Kichulchoia), Misgurnus sensu lato group (Misgurnus, Paramisgurnus and Koreocobitis), Microcobitis, and Sabanejewia. Although the monophyly of the groups were resolved, the relationships within the groups are discordant with current taxonomic status.
Highly conserved tRNAs secondary structure, overlaps and non-coding intergenic spacers among Cobitinae mitogenomes Cobitinae mitogenomes range from 16,337 bp (L. annandalei) to 16,647 bp (M. anguillicaudatus and C. takatsuensis) in length (Table 1). Their gene composition, gene arrangement and strand bias are highly conserved ( Fig.1 and Table 2). Among the 22 tRNAs, due to the absence of DHU arm, tRNA ser(AGN) (S1) is the only one that is not folded into the typical clover-leaf secondary structure (Fig. 2a). In the Cobitinae mitogenomes, unmatched base pairs are widespread among tRNAs. Taking C. macrostigma as an example, there are 446 base pairs among the 22 tRNAs, and only one gene (tRNA Leu(-
We also compared the gene overlaps and IGSs among 58 Cobitinae mitogenomes. Two long overlaps (atp8-atp6 and nd4l-nd4) and two long IGSs (O L and tRNA Asp -cox2) were found in Cobitinae mitogenomes. Highly conserved motifs "ATGCTAA" and "ATGGCAA-TAA" were found in the overlapped junctions between nd4l and nd4, and between atp8 and atp6, respectively (Fig. 3a). There are also several small overlaps between adjacent tRNA genes, such as tRNA Ile -tRNA Gln and tRNA Thr -tRNA Pro . O L is located within the five gene cluster (WANCY) ( Table 2, Fig.1) and its secondary structure shows a stable stem-loop hairpin, which is strengthened by six C-G base pairs (Fig. 2b). Among the 31 bp of O L , the C-G base pairs on stems are highly conserved while the loops in the middle are variable (Fig.  3b). Another long IGS, between tRNA Asp and cox2, is also conserved in the 5′ and 3′ end, and highly variable in the middle.

Usage bias of start and stop codon, codon distributions and relative synonymous codons in Cobitinae mitogenomes
The typical start codon ATG is conservative and is used in 12 PCGs, while GTG is only used in cox1 in 98% (57/ 58) analyzed Cobitinae mitogenomes except one individual of M. anguillicaudatus (No. 11) (Fig. 4, Supplementary Table 3). Five types of stop codons were found, containing three canonical (TAA, TAG and AGA) and two truncated stop codons (TA-and T--) (Fig. 4). The two truncated termination codons are used in nd2, cox2, atp6, cox3, nd3, nd4 and cytb, the 3′ -ends of which are followed by a tRNA gene encoded with the same strand. The codon distribution and relative synonymous codon usage (RSCU) of 58 Cobitinae mitogenomes were analyzed. Our results show that codon distribution is largely coincident among these Cobitinae mitogenomes (Supplementary Figure S1). As shown by six representative species of Cobitinae, the codons encoding Leu (CUN) , Ala and Thr are the three most frequently present, while those encoding Cys are rare (Fig. 5a). Compared to the other five Cobitinae species, P. anguillaris uses more codons of Leu (CUN) and less codons of Leu (UUR) . The patterns of RSCU are also consistent among the analyzed species (Fig. 5b). Degenerated codons are biased to use more A/T than G/C in the 3rd position of PCGs, which results in the content of A + T is higher than G + C in the 3rd position of Cobitinae PCGs. For example, the codons for Arginine CCA and the codes for Tryptophan UGU are prevalent, while their other synonymous codons are relatively less used.

A + T %, AT-skew and their linear correlations of Cobitinae mitogenomes
The A + T content and AT-skew of whole mitogenomes, PCGs, tRNAs, rRNAs and CR were calculated ( Fig. 6ab). The 58 Cobitinae mitogenomes all exhibit AT bias, and the A + T content is the lowest (54.8 ± 0.6%) in tRNAs and the highest (66.3 ± 0.9%) in CR (Fig. 6a, Supplementary Table 2). The AT-skew values are the largest and positive in rRNAs, while they are the smallest in  PCGs and most are negative except Canthophrys gongota, Acantopsis choirorhynchos, P. cuneovirgata, P. kuhlii, P. oblonga, and Kottelatlimia pristes (Fig. 6, Supplementary Table 2). These results indicate that PCGs are biased towards using T not A in most Cobitinae mitogenomes. To examine whether the A + T content and AT-skew are different in three codon position of PCGs, we also selected the six Cobitinae species for a more detailed analysis. The A + T content shows 1st < 2nd <3rd in the three position of PCGs in all analyzed fishes. Meanwhile, the AT-skew of 1st and 3rd are positive while 2nd is negative (Table 3). This is due to the bias usage of relative synonymous codons (Fig. 5b). In all analyzed Cobitinae mitogenomes, CRs possess more A and C with all AT-skew values positive (0.002-0.112) and GC-skew negative (− 0.341−− 0.101) (Supplementary Table 2). The correlations of Cobitinae mitogenomes (y A1 = − 0.0166x -0.9047, R 2 = 0.5991) genus Cobits (y A2 = − 0.012x + 0.5786, R 2 = 0.5197) and Pangio (y A3 = = − 0.0466x + 2.5813, R 2 = 0.5486) were calculated between A + T % versus AT-skew. All of them showed negative linear correlations, implying that AT-skew becomes more positive with the increasing of A + T content (Fig. 6c). The similar negative linear correlations were also found in G + C % versus GC-skew (Fig. 6d).

Non-synonymous and synonymous substitutions
To better understand the role of selective pressure and evolutionary relations of Cobitinae fishes, the ω or dN/ dS value of each PCG was calculated (Fig. 7). All the PCGs evolved under a purifying selection (ω < 0.5). The atp8 gene showed the highest ω value (ω = 0.12) and the cox family genes were lowest (ω = 0.02 ± 0.01). This phenomenon is also found in most Metazoa [56], but the fold change (> 10 fold) is particularly high in Cobitinae. The lower ω value represents less variations in amino acids. Thus, cox1, cox3 and cytb are potential barcoding markers for Cobitinae species identification.

Phylogenetic analysis of Cobitinae fishes
Molecular phylogenetic analyses were performed using 13 PCGs from 58 Cobitinae mitogenomes, belonging to 41 species from 14 genera. The ML and BI analyses generated similar topology with high bootstrap support / posterior probability values. Each tree was similarly divided into two main clades: Cobitis-Misgurnus-other genera (clade I) and Pangio-Lepidocephalichthys-other genera (clade II) ( Fig. 8 and Supplementary Figure S2). Clade I included all analyzed species of Cobitis, Paramisgurnus and Misgurnus, and five species from other genus (I. longicorpa, K. multifasciata, N. delicata, K. naktongensis, and Microcobitis sp.). Four Pangio species, five Lepidocephalichthys species and other five species (K. pristes, A. choirorhynchos, A. gracilentus, L. macrochir, and C. gongota) were clustered into Clade II, among which the analyzed species of genus Pangio and Lepidocephalichthys formed two well-supported (pp = 1.00) monophyletic groups respectively. In addition, Pangio is the sister genus to Lepidocephalichthys.
The BI phylogenetic tree confirmed that Cobitis was a paraphyletic group, since Misgurnus clade A, N. delicate, I. longicorpa, and K. multifasciata

Divergence time estimation of Cobitinae fishes
The combination of strict clock model and Yule process tree prior provided the best fit to the data sets (Supplementary Table 4). The chronogram with divergence time of Cobitinae lineages was estimated based on the cytB mutation rate (0.68% per million years) (Fig. 9)   southern lineages, Pangio was estimated to have occurred about 20.14-29.88 Ma, and the divergence times of the four species analyzed in this study are congruent with the previous described dating [24].

Discussion
In this study, we conducted a comparative mitogenome analysis and revealed the conserved and unique characteristics of 58 Cobitinae mitogenomes. Cobitinae mitogenomes display highly conserved tRNA secondary structure, overlaps and non-coding intergenic spacers. Among the 22 tRNAs, tRNA ser(AGN) (S1) is the only one that is not folded into the typical clover-leaf secondary structure (Fig. 2a). Loss of stem in S1 is common character among Cobitinae and other metazoan mitogenomes [57,58]. Similarly, the widespread unmatched base pairs among Cobitinae tRNAs is also a conserved feature in the eukaryote mitogenome [59][60][61]. Although their functions are not clear in fish, the unmatched base pairs are considered as the current state of evolutionary and irreversible process, which might be caused by tRNA editing [62]. Like other cyprinid fishes [14,63], two long overlaps and two long IGSs were found in Cobitinae mitogenomes. The motif "ATGCTAA" in nd4l-nd4 was conserved in vertebrates, including fish, turtle and human IGSs are important for transcription and associated with gene rearrangement in insects [70][71][72]. It is commonly assumed that IGS had a rapid nucleotide substitution rate under relaxed selection [73]. Moreover, Cobitinae mitogenomes share highly conserved sequences in IGSs that are immediately adjacent to tRNAs, such as "CTTTCCCGCC", "AAGGCGGGA" and "AGC". Whether these conserved sequences have a function or not and how they act awaits further investigation. As the longest IGSs, CR plays an important role in controlling the transcription and replication of mtDNA molecule by several domains and motifs [74,75]. Although significant length variation were found in CR of vertebrate [76], the three domains can also be recognized in Cobitinae mitogenomes. Furthermore, the AT-skew and GC-skew of CR might reflect the strand asymmetry [77][78][79]. In teleost, the skew inversion of CR was only found in the mitogenomes of Albula glossodonta and Bathygadus antrode, showing a reversed strand asymmetry [75]. The normal Cobitinae mitogenomes CR skewness indicates that the strand asymmetry is not reversed. The phylogenetic analyses show the monophyly of the genus Pangio and Lepidocephalichthys, consistent with the previous study [35]. However, Cobitis, the biggest genus of Cobitinae [15], is a complex and controversial  paraphyletic group. Similar to the trees constructed by cyt b [25,80,81], Iksookimia, Kichulchoia and Niwaella species were nested within Cobitis, implying a close relationship among them. Perdices [37] proposed that these species of Iksookimia, Kichulchoia, and Niwaella might belong to genus Cobitis, as morphologically specialized species derived from a local Cobitis species. However, this assumption awaits more morphological, karyological and molecular investigation. In addition, our phylogenetic analysis confirmed the assumption that M. mizolepis and P. dabryanus are conspecific [33,80] and the different lineages under the species name C. striata and C. takatsuensis might actually represent different species. The species of Misgurnus were separated into two independent clade and clustered into Cobitis species and P. dabryanus-K. naktongensis, respectively. The same results were observed in the trees based on the cyt b [80] and 13 PCGs from 28 cobitidae species [47]. However, all Misgurnus and Koreocobitis species were grouped into a monophyletic clade when their phylogenetic relationships were constructed by nuclear gene rag-1 [80]. This incongruity between mitochondrial and nuclear gene trees was explained by the different evolutionary rate of markers, hybridization or introgression [82]. It is commonly believed that hybridization and subsequent mtDNA introgression might occur between ancestral species of Cobitis and ancestral species of Misgurnus [35,80]. In this study, we collected 14 mitogenomes from M. anguillicaudatus, which were divided into two genetically divergent clades. The similar phenomenon has been reported by several previous studies, which is explained by hybridization and mtDNA introgression [34,35,47,83,84]. Considering that M. anguillicaudatus clustered into the clade of Misgurnus and Koreocobitis by nuclear analyses [80], we supposed that the 12 mitogenomes (No. 1-12) of M. anguillicaudatus in Misgurnus clade A could be considered as the introgressed mtDNA type because of their close relationship with Cobitis species, whereas the other two individuals in Misgurnus clade B retained the original M. anguillicaudatus mitogenomes. M. anguillicaudatus with introgressed mtDNA type spread over most of East Asia, including China, Japan and Korea. M. anguillicaudatus shows extensive ploidy variability in nature. Besides most common diploid individuals (2n = 50), triploid (3n = 75) and tetraploid (4n =100) have been frequently recorded in some localities of China and Japan [21,47,85,86]. Rare pentaploid (5n = 125) and even hexaploid (6n = 150) individuals were found in the Yangtze River basin [87]. All of M. anguillicaudatus polyploids analyzed in this study belonged to the introgressed mtDNA type. Since mtDNA is inherited maternally, these polyploids might have originated from the diploid M. anguillicaudatus with introgressed mtDNA. Further analyses are needed to confirm this hypothesis of inter-genus mtDNA introgression based on a large-scale sampling with quantitative morphological features, definite ploidy, and more genes from both mitochondria and nuclear genomes.
The first split of Cobitinae lineages was estimated to have occurred in the late Eocene (42.11 Ma, 95% HPD: 36.35-47.86 Ma), separating northern clade and southern lineages, consistent with reconstruction dates of the paleo-drainages of East Asia [35,88]. Cobitinae fishes in Clade I and Clade II, nominated as "northern clade" and "southern lineages" respectively, show a distinct disjunctive distribution with a small area of sympatry in Vietnam [35]. Consistent with their locations, the northern clade

Phylogenetic analyses
The phylogenetic analysis was performed based on 13 PCGs of 58 Cobitinae mitogenomes. Sinorhodeus microlepis and Rhodeus shitaiensis were chosen as the outgroups ( Table 1). Each of the 13 gene sequences was separately aligned using Muscle v3.8.31 [96] and concatenated into a sequence matrix by PhyloSuite v1.2.2 [97]. Then PartitionFinder2 [98] was used to find the best partitioning strategy and to calculate the best-fit evolutionary models for each subset. For the alignment, a scheme with eight partitions was selected and GTR + G + I was chosen as the best-fit evolutionary model for each partition. Phylogenetic trees were constructed by the maximum likelihood (ML) method and bayesian inference (BI). The ML method was implemented in RAxML v8.2.12 [99]. Each partition scheme was run with the GTRGAMMAI model, and 1000 rapid bootstrapping replications were set to evaluate the bootstrap support values and search for the best-scoring ML tree. The BI phylogeny was performed in MrBayes v3.1.2 [100] with the "unlink" and "prest ratepr = variable" model parameters. 10,000,000 generations were run in two independent runs of four independent Markov Chain Monte Carlo (MCMC) chains, and were sampled every 1000 generations. The convergence of the BI analyses was investigated using Tracer v1.7.1 software. The first 2500 trees were discarded as conservative burn-in, and the rests were used to generate a majority rule consensus tree.
In cobitid fishes, 0.680% (divergence per pairwise comparison per Ma) was calculated and suggested for the mutation rates of cytb gene [32]. In this study, BEAST v1.10.4 [101] was used to estimate the divergence time with the rate (0.68%). GTR + G + I was chosen as the best fit model by PartitionFinder2 [98]. The best-fit clock type and tree prior were selected from two clock models (strict clock and uncorrelated relaxed clock) and four tree priors (Yule process, Exponential growth, Constant size and Bayesian skyline) by comparing the marginal likelihood values estimated by path sampling [102]. The analyses were simultaneously run for 20,000,000 generations, with parameters sampled every 1000, then the first 25% of the trees were discarded as burn-in. Tracer v1.5 [103] and Figtree were used to assess the convergence and view trees, respectively.