Mitochondrial genomes of the early land plant lineage liverworts (Marchantiophyta): conserved genome structure, and ongoing low frequency recombination
BMC Genomics volume 20, Article number: 953 (2019)
Abstract
Background
In contrast to the highly labile mitochondrial (mt) genomes of vascular plants, the architecture and composition of mt genomes within the main lineages of bryophytes appear stable and invariant. The available mt genomes of 18 liverwort accessions representing nine genera and five orders are syntenous except for Gymnomitrion concinnatum whose genome is characterized by two rearrangements. Here, we expanded the number of assembled liverwort mt genomes to 47, broadening the sampling to 31 genera and 10 orders spanning much of the phylogenetic breadth of liverworts to further test whether the evolution of the liverwort mitogenome is overall static.
Results
Liverwort mt genomes range in size from 147 Kb in Jungermanniales (clade B) to 185 Kb in Marchantiopsida, mainly due to variation in the size of intergenic spacers and the number of introns. All newly assembled liverwort mt genomes hold a conserved set of genes, but vary considerably in their intron content. The loss of introns in liverwort mt genomes might be explained by localized retroprocessing events. Liverwort mt genomes are strictly syntenous, with no structural variant detected in our newly assembled mt genomes. However, by screening the paired-end reads, we do find rare cases of recombination, which means multiple concurrent genome structures may exist in the vegetative tissues of liverworts. Our phylogenetic analyses of the nuclear-encoded double-strand break repair protein families revealed liverwort-specific subfamily expansions.
Conclusions
A low level of repeat-mediated recombination, selection, and intensified nuclear surveillance might together shape the structural evolution of liverwort mt genomes.
Background
Mitochondrial (mt) genomes of vascular plants are highly variable in size (i.e., from ca. 66 Kb in Viscum scurruloideum [1] to 11.3 Mb in Silene conica [2]), and in gene content (i.e., 13 to 64 [3], excluding duplicated genes and open reading frames (ORFs)). By contrast, the mitogenomes within the three bryophyte lineages are conserved in terms of gene content, and exhibit rather narrow size variation [4]. The 40 mt genomes from 29 moss genera (e.g., [5, 6]) are typically smaller (101–141 Kb, median ~ 107 Kb) than those of other land plants, and constantly encode 40 protein-coding genes (PCGs), 24 tRNAs, and 3 rRNAs. The mt genomes of hornworts sampled from four genera [7,8,9] are larger in size (185–242 Kb) but smaller in gene content (21–23 PCGs, 18–23 tRNAs, 3 rRNAs). The 18 mt genomes from seven liverwort genera [10,11,12,13,14] are intermediate in size between those of mosses and hornworts (142–187 Kb, median ~ 164 Kb) and may comprise more genes (i.e., 39–42 PCGs, 25–27 tRNAs, and 3 rRNAs).
Vascular plant mt genomes contain a variable set of introns, ranging in number from three (Viscum album [3]) to 37 (Selaginella moellendorffii [15]), with an average of 21. Bryophyte mt genomes tend to hold more introns than those of vascular plants, i.e., 28–38 (average 33) in hornworts [7], 26 or 27 in mosses [5], and 23–30 (average 28) in liverworts [13]. Each major land plant lineage contains a number of unique mt introns, and not a single intron is shared across all mt genomes of land plants [16]. As a result, the intron content is conserved within but not among land plant lineages [13, 16], suggesting that vast intron gains and losses happened during the early evolution of land plants, and conservative evolution maintained their stable content in the descendant lineages. Although each bryophyte group appears to hold a stable set of introns, paralleling the conserved overall structure of their mt genomes, recent small-scale mitogenomic studies provided evidence for distinct intron losses in leafy liverworts, and suggested retroprocessing as the likely cause [13]. Comprehensive studies with expanded taxon sampling are still needed to provide the framework for reconstructing the evolution of introns in liverwort mt genomes, and to assess the underlying causes for their variation in number.
Mt genomes of vascular plants hold abundant repeated sequences, including some large repeats (> 1000 bp), more medium sized repeats (100–1000 bp), and numerous small repeats (50–100 bp). Increased repeat length and identity facilitate intragenomic recombinations [16], and may hence account for the fluid genome structure of vascular mt genomes. In contrast, the mt genomes of the three bryophyte lineages usually have fewer repeated sequences and are structurally more stable. The mt genome of mosses contains only few and moreover small repeated sequences and is hence structurally nearly static [5]. Hornworts contain relatively more repeated sequences, which may explain the few rearrangements distinguishing the four assembled mitogenomes [7]. Finally, the mt genome of liverworts contains repeated sequences of intermediate abundance and size between those of mosses and hornworts [10]. The structure of their mt genome is highly conserved except for that of Gymnomitrion concinnatum [11], for which two inversions are needed to restore collinearity with the other liverwort mt genomes.
The nuclear-encoded double-strand break repair (DSBR) proteins are involved in the suppression of recombination between repeated DNA sequences [17]. Mutation of DSBR proteins in model organisms, such as Arabidopsis and Physcomitrella, often results in increased rearrangements between repeated sequences in organellar genomes that can impact the function of plastids and/or mitochondria [18,19,20,21]. The evolution of plant mt genome structure is therefore shaped by intrinsic parameters, such as repeat-mediated recombination, and extrinsic ones, such as nuclear DSBR proteins [17]. The structure of the mt genome of bryophytes is considered highly conserved and stable [5], but the possible underlying causes, such as the level of repeat recombination and the evolution of the nuclear-encoded DSBR genes, remain unexplored. At present, mitogenomes have been assembled for representatives of only five of the 15 currently recognized orders of liverworts [22]. Here we broadened the sampling to exemplars of 10 orders to test whether 1) the mt gene content and 2) the structure of the mt genome are conserved across a broader phylogenetic breadth of liverworts; to 3) investigate the repeat recombination rates in liverwort mitochondria; and to 4) explore the possible intrinsic and extrinsic factors responsible for the pattern of structural evolution of liverwort mt genomes.
Results and Discussion
Liverwort mitogenomes are small in size but abundant in genes
Assemblies of high throughput sequences from total DNA extracts yielded complete circular mt genomes for 29 liverworts, representing 27 genera, 23 families, and 10 orders, of which five orders were newly sampled for their mt genomes. The sequencing depth ranges from 98 to 756 × (Additional file 1: Table S1). Including the published mt genomes from 13 species, complete liverwort mt genomes are thus available from 37 species, 31 genera, and 10 orders (Additional file 2: Table S2).
On average, liverwort mt genomes are smaller (~ 167 Kb) than those of vascular plants (average, ~ 476 Kb) or hornworts (average, ~ 197 Kb), but larger than those of mosses (average, ~ 107 Kb). Among liverworts, complex thalloids hold the largest (average, ~ 185 Kb) mt genomes, about 1.3 times the size of those of the Jungermanniopsida clade B that have the smallest mt genomes (average, ~ 147 Kb). Thus, variation in liverwort mitogenome size is much narrower compared to that in angiosperms (i.e., ~ 200-fold range). The trend in the evolution of mitogenome size during the diversification of liverworts is ambiguous (Fig. 1a), compared to a distinct increase during the evolution of land plants (Fig. 1b). The rather stable exome is consistently the smallest component (~ 23%) of liverwort mt genomes (~ 37 Kb). Consequently, changes in total mitogenome size in liverworts are shaped by the variation in intron content and, as shown in other land plants, intergenic spacer size (Fig. 1b).
The mitochondrial genome of liverworts spanning a broad phylogenetic breadth (Additional file 3: Figure S1) holds a nearly constant and large set of PCGs (39–43, Additional file 4: Figure S2). Treubia possesses the smallest set of mt PCGs (i.e., 39) among liverworts, and lacks all cytochrome c maturation genes (ccmB, ccmC, ccmFN and ccmFC), a characteristic first reported by Liu et al. [23] and confirmed here based on a new accession. A complete set of cytochrome c maturation genes is also lacking in the mitogenomes of hornworts [7], lycophytes [24], and algae [16]. The nad7 gene is widespread among algae, mosses, and vascular plants, but is pseudogenized in liverworts, except in the Haplomitriopsida (Haplomitrium and Treubia), which hold an intact and potentially functional nad7, lending support to the hypothesis that this lineage diverged earliest in the diversification of liverworts [25]. Ribosomal protein genes have been transferred from the mitochondria to the nucleus many times during the evolution of angiosperms [26]. By contrast, all liverwort mt genomes share an identical set of ribosomal protein genes, including rps8 and rps10, which rarely occur in other land plant mt genomes (Additional file 5: Table S3). Liverwort mt genomes also encode a rich set of RNA genes among land plants (Additional file 6: Figure S3), including three rRNA genes (rrn5, rrn18 and rrn26) and 25–27 tRNA genes (i.e., ± trnRucg and trnTggu). Among land plants (Additional file 5: Table S3), trnRucg is restricted to the complex thalloid liverworts, and trnTggu occurs in only a few liverwort lineages (i.e., Blasia, Marchantia and Haplomitrium), as well as in mosses and hornworts. Overall, the gene content of mitogenomes seems to have been highly conserved during the evolution of liverworts, which spans at least 400 million years [27].
Intron content of liverwort mitogenomes is much more variable than that of mosses
Liverwort mt genomes hold a rich intron set, ranging from 23 introns in some leafy species to 32 in all the complex thalloid species, with an average of 28 introns (Fig. 2). Species of Haplomitriopsida and Jungermanniopsida possess an average of 29 and 25 introns, respectively. Simple thalloid Pelliidae contain an average of 28 introns, whereas Metzgeriidae hold an average of 31 introns. Leafy Jungermanniidae show the most cases of intron loss, especially in the mt genomes of Frullania, Radula, and clades B and C of Jungermanniales (Fig. 2). Among the 17 genes disrupted by at least one intron, six (i.e., atp1, cob, cox1, nad4L, rrn18, and rrn26) vary in their intron content (Fig. 2). The three introns of the cob gene are commonly present in liverworts, except for the 3′ end intron cobi824g2, which is absent from Radula and clade A of Jungermanniales. The nine introns of the cox1 gene are shared by all thalloid liverworts, and the first three at the 5′ end (cox1i44g2, cox1i178g2, and cox1i375g1) by almost all liverworts, but the remaining six are variably lost among leafy liverworts. The respiratory chain complex I (or nad) genes possess a stable intron content, except for nad4L, for which one intron (nad4Li100g2) is lacking in a few Jungermanniales (clade B, and two species from clade A: Plagiochila, Heteroscyphus), and another one (nad4Li283g2) in Treubia. The liverwort-unique rrn18 group II intron (rrn18i1065g2) is present only in Haplomitrium and complex thalloids. Two introns were found in the ribosomal gene rrn26: rrn26i827g2 is present in all liverworts except clade B of the Jungermanniales, whereas rrn26i2352g1, a group I intron, is newly reported here and observed only in Haplomitrium.
Based on our phylogeny reconstructed from concatenated nucleotide sequences of 41 mitochondrial protein-coding genes (Fig. 2), the early diverging lineages Haplomitriopsida and Marchantiopsida tend to hold a more complete set of introns compared to the derived clades (especially clade B of Jungermanniales), which seem to have undergone parallel reductions in intron content. The alignments (Additional file 7: Figure S4; Additional file 8: Figure S5) of the four protein-coding genes with intron number variation, together with the distribution of RNA editing sites in liverworts, support localized retroprocessing events [13] as the most likely cause of the intron losses observed in liverworts. First, all exons are intact and the lost introns appear to have been precisely excised at the splice sites in liverwort mt PCGs. Moreover, liverwort intron losses are more frequent toward the 3′ end of genes (i.e., cob, cox1), which is a strong characteristic of retroprocessing-induced intron losses [28]. Furthermore, in some cases, the intron losses in liverworts seem to be accompanied by losses of RNA editing sites near the exon-intron splicing boundaries.
Intergenic spacer variations mirror the genome size variations in liverwort mitogenomes
Intergenic spacers compose the bulk of the mt genome of land plants, accounting, for example, for about 80% in vascular plant mt genome size [16]. Intergenic spacers comprise repeated sequences [29], sequences transferred from plastid [30] or nuclear genomes [31], and DNA fragments horizontally transferred from foreign donors [32, 33]. In liverworts, the intergenic spacers constitute the largest portion of mt genomes (average, ~ 81 Kb, ~ 49%), matching the combined exonic (~ 37 Kb, ~ 24%) and intronic (~ 44 Kb, ~ 27%) components (Table 1). The pattern in spacer length variation parallels that of genome size variation in liverworts (Fig. 1a). On average the complex thalloid mt genomes hold the largest spacer component (~ 91 Kb or ~ 49%), followed by that of Haplomitrium (~ 89 Kb or ~ 50%). The average spacer size decreases to ~ 83 Kb (~ 48%) in Pelliidae, 79 Kb (~ 50%) in Porellales, and ~ 74 Kb (~ 45%) in Metzgeriidae. In the Jungermanniales, the average spacer size is relatively small but varies between 84 Kb (~ 51%), 69 Kb (~ 46%), and 82 Kb (~ 51%) in clade A, B, C, respectively (Additional file 2: Table S2).
The spacer region of liverwort mt genomes is composed of ORFs, pseudogene fragments, nuclear homologous sequences, dispersed repeated sequences, SSRs, and other non-coding sequences of unknown origin. Plastid-derived or horizontally transferred DNA sequences are seemingly lacking in liverwort mt genomes (Table 1), as they were in those of mosses [5]. Nuclear derived sequences generally make up the largest component with an average of 20%, ranging from 13% in Haplomitrium to over 30% in the Marchantiopsida. Considering that we used the nuclear genome of Marchantia polymorpha, the only available nuclear genome for liverworts, as the reference for the blast search, it is not surprising that Marchantia polymorpha mt spacers returned the highest percentage hit (31%) on homologous sequences. The fast evolving/long branch lineage Haplomitrium (13%) and some simple thalloid taxa (such as Aneura and Riccardia) show the lowest percentage (i.e., 15%) of homologous sequence hits. Generally, the total size of the nuclear homologous sequence content in liverworts (average, ~ 15.7 Kb) is comparable to that in mosses (average, ~ 14.7 Kb), but the percentage relative to the total spacer size is only half that of mosses (average, ~ 42% [5]), which reflects the smaller size of moss mt spacers. ORFs usually make up the second largest part of the liverwort mt spacers, with an average ratio of 16%, ranging from 12% in some Jungermanniales species to 26% in Haplomitrium. The ratio of ORF to spacer size is smallest in the Marchantiopsida (average, ~ 13%), followed by the Jungermanniopsida (average, ~ 15%), then simple thalloids (average, ~ 17%). The ORF content in liverworts is distinctively larger than in mosses (i.e., 5%). As in mosses, SSR sequences compose generally less than 1% of the combined spacer region in liverwort mt genomes. 
Whereas moss mitogenomes lack repeated sequences in their intergenic spacers, liverworts hold on average 4% of repeated sequences in their intergenic spacer regions, with only ~ 2% in the Haplomitriopsida and ~ 7% in the Marchantiopsida. The relatively higher repeated sequence content might allow for higher potential for mt genome recombination, as a positive correlation between the number of repeated sequences and the gene rearrangements has been suggested [5].
Liverwort mitogenomes are conserved in gene order despite repeat mediated recombinations
Repeated sequences play an important role in the structural stability of plant mt genomes since they can mediate intragenomic homologous recombination leading to inversions and translocations of genomic regions [1, 16, 34]. Except for the mitogenome of Gymnomitrion concinnatum of the Jungermanniales [11], all remaining available liverwort mt genomes (i.e., 46) share exactly the same gene order (Fig. 3). The mt genome of G. concinnatum needs two inversions to make it collinear with the other liverwort mt genomes, which is in stark contrast to the situation in vascular plants, wherein any two mt genomes require on average 31 rearrangements to gain collinearity [5]. The stable mt genome structure of liverworts might suggest a very low level of recombination in liverwort mitochondria and/or intensified nuclear surveillance over repeat recombination.
Compared to the compact moss mt genomes with only a few small repeats shorter than 100 bp, liverwort mt genomes are much inflated, containing on average 14 pairs of small repeated sequences of 50–100 bp, and 14 pairs of medium-sized repeats of 100–900 bp (Additional file 9: Table S4). We examined the recombination rates of all repeats (535 pairs, Additional file 10: Table S5) within the 50–250 bp range for all 29 samples and found evidence of recombination for 26 repeats from 16 species (Table 2); 40% of the liverwort species show no evidence of recombination. The mitogenomes of nine leafy liverworts appear to recombine more frequently than those of complex thalloids (three species) and simple thalloids (two species). Recombinants were also detected in the two representatives of the early diverging Haplomitriopsida. The repeats actively involved in recombination have an average size of 103 bp, with ten repeats (38%) exceeding 100 bp in length. Recombination rates range from 0.67 to 60% with a median value of 4.45%. About two thirds of the recombination events were mediated by small repeats (i.e., shorter than 100 bp) with a median recombination rate of 11.46%. Repeat length and recombination rate are not positively correlated. About half (12 out of 26) of the repeat-mediated recombinations cause gene order changes, and direct repeat recombinations affect genes more often than their inverted counterparts (62.5% vs 30%). Most (eight out of 10) inverted and four out of 16 direct repeat recombinations cause gene order changes. These gene order changes could give rise to alternative genome conformations which, if they occurred in germ cells, would be passed on to the offspring [35].
Although empirical studies suggest that repeats longer than 50 bp and with an identity above 85% may mediate recombination [17, 34], the recombination activity of repeats is actually positively correlated with the length of the repeated sequences, with small repeats (< 100 bp) rarely inducing recombination [36, 37]. The detection of repeat recombination in two thirds of liverwort species, with an average of two active repeats per species, might indicate that repeat recombination occurs, but at low frequency. Considering that an average of five repeats per species longer than 250 bp were not investigated, it is very likely that some of these larger repeats (250–900 bp) also allow for recombination. As recombination and structural fluidity are supposed to be positively correlated [1, 4, 5], the stable mt genome structure across a liverwort diversity that spans more than 400 myr [27] is surprising, given that liverwort mt genomes do recombine and alternative genome conformations coexist. The apparent paradox of structural stasis of the mitogenome during the evolution of liverworts despite evidence of ongoing recombination may be addressed in the following three contexts. First, some authors have reported an adaptive value of repeats and recombination in plant mitochondria [37], and recombination happens more frequently when cells are under stress and/or exposed to environmental stimuli [38]. It is therefore likely that the low level of recombination observed in liverworts primarily occurs in old vegetative cells rather than in young differentiating cells and germ cells, so that the progeny inherit the master circle conformation. Second, if repeat recombination occurred in the reproductive cells, recombinants with affected or missing genes might be selected against.
Recent studies in Drosophila also suggested a generalizable mechanism for selection against deleterious alternative mt configurations: ATP selection after mitochondrion fragmentation can retain those mtDNA fragments that contain the ‘correct’ mt genome with the complete set of genes [39]. Nevertheless, alternative conformations with altered gene order but no genes affected may survive this ATP selection and might lead to offspring with a rearranged gene order. Finally, it is also possible that maintaining the organization of genes is essential to the transcription of polycistronic operons in liverworts and mosses [5], hence an identical gene order is selected across all liverwort lineages despite the existence of alternative genome conformations.
Liverwort mt genomes might be under intensified nuclear surveillance
The structural stability of liverwort mt genomes, in accordance with the remarkable structural conservatism of liverwort plastomes [40], might also be shaped by nuclear-encoded DSBR proteins that suppress error-prone ectopic recombination across small direct repeats during DSBR in mitochondria and plastids [41]. We here characterized six frequently reported DSBR gene families in the transcriptome assemblies of 125 land plant representatives (Additional file 11: Table S6). Functional studies have confirmed the mtDNA repair function of the moss orthologs in Physcomitrella patens for four of these gene families: RecA, RecG, RecX, and MSH1. In our phylogenetic analyses (Additional file 12: Figure S6), four DSBR gene families (RecA, RecG, RecX, and OSB) show notable liverwort-specific subfamily expansions, suggesting a peculiar evolutionary pattern of DNA repair mechanisms in liverworts. Subcellular localization predictions (by TargetP) were ambiguous for most of these DSBR proteins, including the expanded ones, suggesting either truncated protein sequences or an incomplete query database. Although the subcellular localization and the in vivo function of these expanded liverwort gene family members remain elusive, we cannot rule out the possibility that these liverwort-specific expansions also contribute to the mtDNA sequence stability of liverworts. Specifically, selective pressure might have driven the functional divergence of DSBR gene families under different evolutionary constraints, and the diversified DSBR proteins may exhibit a wider range of characteristics and specificities, which could help intensify the DNA repair mechanisms in liverworts.
Conclusions
Based on the assembly of mitochondrial genomes for a widely expanded sample of liverworts spanning over 400 million years of evolution [27], we a) confirm that the mitogenome of liverworts is highly conserved in both gene content and gene order, while having a large and variable set of introns (23–32 introns) subject to a series of localized retroprocessing events, b) reveal ongoing intragenomic recombination mediated by small repeats and c) show liverwort specific DSBR subfamily expansions in four gene families. Although the mt genomes are relatively conserved in each of the three major bryophyte lineages (i.e., liverworts, mosses and hornworts), they differ in the content of repeat sequences and frequency of rearrangements, suggesting that the mechanisms maintaining mitogenome structure differ among these lineages. The conserved evolution of liverwort mt genomes might be explained by low recombination level, selection, and intensified nuclear surveillance.
Methods
Materials and DNA extractions
Wild samples of 29 liverwort accessions were collected during field trips in China (Xizang, Sichuan, Yunnan, Fujian, and Guangzhou Provinces), Madagascar, the United States (Connecticut), Vietnam, and New Zealand (Additional file 1: Table S1). No specific permissions were needed for the current study of these samples. The samples were studied and identified by Qiang He, and the voucher specimens were deposited in SZG (Herbarium of Shenzhen Fairy lake Botanical Garden, Shenzhen, China). Our sampling spans the liverwort phylogeny, representing 10 of the 15 orders of extant liverworts. Each sample was cleaned with distilled water and dried using lab paper. The clean shoots of individual samples were isolated under a dissecting microscope and used for DNA extraction following a modified CTAB method [42]. The DNA quality and quantity of each sample were examined using 1% agarose gel electrophoresis and a Nanodrop 2000 spectrophotometer.
Mitochondrial genome sequencing and assembly
For genomic DNA sequencing, 1 μg of high-quality genomic DNA was sheared using the Covaris M220 (Woburn, MA, USA), and DNA fragments of 300–500 bp were used to generate sequencing libraries with the Illumina TruSeq™ DNA PCR-free library preparation kit (Illumina, CA, USA) following the manufacturer’s instructions. The libraries were paired-end (2 × 150 bp) sequenced on an Illumina HiSeq 2000 sequencing platform at WuXiNextCode (Shanghai, China). Approximately 10 Gb of sequencing data were generated for each sample. The raw NGS data were trimmed and filtered for adaptors, low quality reads, undersized inserts, and duplicate reads using Trimmomatic [43]. The filtered reads for each species were de novo assembled using CLC Genomics Workbench v5.5 (CLC Bio, Aarhus, Denmark). All assembled contigs were blasted against the Marchantia polymorpha mt genome (GenBank accession: NC_001660) to identify mt contigs. As the genomes of the three cellular compartments differ significantly in copy number and hence in read coverage [44], the read depth of mt contigs is distinctly lower than that of plastid contigs, but significantly higher than that of nuclear contigs [45]. For each species, the resulting mt contigs (usually one to five per species) were first checked for read depth to ensure they fell in a similar range, and then each mt contig was elongated at both ends in Geneious v10.0.2 (Biomatters, New Zealand) using Bowtie2 [46] until the ends overlapped with one another by at least 5000 bp. Altogether, 29 circular mt genomes were assembled for liverworts. Finally, the corresponding genomic reads were mapped back to the complete mitochondrial genomes to double check the sequencing depth (Additional file 1: Table S1) of the assembled mt genomes. All 29 liverwort mt genomes newly assembled in this study showed constant depth along their length.
Genome annotation and comparative analysis
The mt genomes were annotated following the steps described by Li et al. [9] and Xue et al. [8]. In summary, the protein-coding and rRNA genes were annotated by Blastn searches of the non-redundant database of the National Center for Biotechnology Information (NCBI). The exact gene and exon/intron boundaries were further verified in Geneious v10.0.2 (Biomatters, New Zealand) by aligning orthologous genes from the published annotated liverwort mitochondrial genomes. The tRNA genes were annotated using tRNAscan-SE v2.0 [47]. Mitochondrial RNA editing sites were obtained from our previous study on liverwort organellar RNA editing [48]. We summarized and compared the structural evolution of all the formally published (as of June 2019) liverwort mt genomes (Additional file 2: Table S2) from the gff3 annotation files. As accessions from the same species or genus show very similar mt genome characters in all aspects, directly averaging the values across all accessions would bias the results towards better-represented lineages. To avoid such biases, we calculated average values hierarchically: (1) where multiple accessions of a species exist, we first averaged them and then calculated the average among species (e.g., genome/exon/intron length, and gene/intron number); (2) for averages among genera, we averaged the species values within each genus; and (3) for averages among clades, we averaged the values of each genus.
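The tiered averaging described above can be sketched in Python as follows. This is a minimal illustration rather than the script used in the study: the record layout, the function name `tiered_mean`, and the example values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def tiered_mean(records):
    """Average accession values to species, then genera, then clades.

    records: list of dicts with keys 'clade', 'genus', 'species', 'value'
    (hypothetical layout; one dict per accession).
    Returns three dicts: species means, genus means, clade means.
    """
    # Step 1: collapse multiple accessions of the same species.
    by_species = defaultdict(list)
    for r in records:
        by_species[(r["clade"], r["genus"], r["species"])].append(r["value"])
    species_mean = {k: mean(v) for k, v in by_species.items()}

    # Step 2: average the species means within each genus.
    by_genus = defaultdict(list)
    for (clade, genus, _species), value in species_mean.items():
        by_genus[(clade, genus)].append(value)
    genus_mean = {k: mean(v) for k, v in by_genus.items()}

    # Step 3: average the genus means within each clade.
    by_clade = defaultdict(list)
    for (clade, _genus), value in genus_mean.items():
        by_clade[clade].append(value)
    clade_mean = {k: mean(v) for k, v in by_clade.items()}

    return species_mean, genus_mean, clade_mean


# Made-up genome sizes (Kb) for one clade with an over-sampled species.
records = [
    {"clade": "X", "genus": "A", "species": "A a", "value": 180.0},
    {"clade": "X", "genus": "A", "species": "A a", "value": 190.0},
    {"clade": "X", "genus": "A", "species": "A b", "value": 175.0},
    {"clade": "X", "genus": "B", "species": "B c", "value": 160.0},
]
species_mean, genus_mean, clade_mean = tiered_mean(records)
naive_mean = mean(r["value"] for r in records)
```

With these made-up values the clade mean is 170.0, whereas the naive mean over all four accessions is 176.25, which shows how the doubly sampled species would otherwise pull the average upward.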
Repeats and recombination analysis
Repeat sequences were annotated and analyzed (Additional file 9: Table S4) using the Python tool developed by Wynn & Christensen [37]. Because we required matches of at least 50 bp at both ends of the insert fragment, and the average insert size of our sequencing libraries was 350 bp, we could estimate the recombination rate of all non-overlapping repeats between 50 bp and 250 bp. Altogether, we analyzed 534 repeats for the 29 liverwort mt genomes (Additional file 10: Table S5) following Dong et al. [36]. Specifically, for each repeat pair, we built four or eight reference sequences, each including 200 bp up- and downstream of the repeat: the two template sequences (original sequences), and the two (for repeat pairs with identity = 100%) or six (for repeat pairs with identity < 100%) recombined sequences (alternative configurations) constructed from the putative recombination products. All the recombined templates were blasted against the corresponding mt genome sequences to identify those present in the genome. Then, we searched the reference sequences against the corresponding paired-end read database, and counted the number of matching read pairs with a blast identity above 98% and a hit coverage of at least 50 bp in both flanking regions of each repeat sequence. After that, the best matched read pairs for all the recombinants were extracted and mapped to the corresponding mt genomes using Bowtie2 [46], and the resulting bam file was visualized in Geneious v10.0.2 (Biomatters, New Zealand) to confirm the existence of the alternative configurations in the reads.
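The construction of reference sequences for a repeat pair can be illustrated with a short Python sketch. It covers only the simplest case described above, a pair of identical repeats yielding two original and two recombined templates, and it treats the genome as linear. The function names are hypothetical, and the rate formula (recombinant read pairs as a percentage of all read pairs supporting either configuration) is an assumption, since the text does not spell out the calculation.

```python
def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def extract(genome, start, length, flank):
    """Return flank + repeat + flank around one repeat copy (0-based start).

    Assumption: linear coordinates; copies closer than `flank` bp to either
    end of the sequence (wrap-around on a circular genome) are not handled.
    """
    if start - flank < 0 or start + length + flank > len(genome):
        raise ValueError("repeat copy too close to the sequence end")
    return genome[start - flank:start + length + flank]


def build_templates(genome, start1, start2, length, flank=200, inverted=False):
    """Two original and two recombined templates for an identical repeat pair."""
    t1 = extract(genome, start1, length, flank)
    t2 = extract(genome, start2, length, flank)
    if inverted:
        # Orient the second copy so that both repeats read in the same direction.
        t2 = revcomp(t2)
    cut = flank + length  # exchange the downstream flanks after the repeat
    return t1, t2, t1[:cut] + t2[cut:], t2[:cut] + t1[cut:]


def recombination_rate(original_pairs, recombinant_pairs):
    """Percentage of read pairs supporting the recombined configurations."""
    total = original_pairs + recombinant_pairs
    return 100.0 * recombinant_pairs / total if total else 0.0


# Toy example: a 7-bp direct repeat (GATTACA) with 3-bp flanks.
toy = "AAA" + "GATTACA" + "CCC" + "TTT" + "GATTACA" + "GGG"
orig1, orig2, rec1, rec2 = build_templates(toy, 3, 16, 7, flank=3)
```

In the toy example the originals are AAA-GATTACA-CCC and TTT-GATTACA-GGG, and the recombined templates swap the downstream flanks. Repeat pairs with identity below 100% need additional templates, which this sketch does not attempt.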
Gene alignment and phylogenetic reconstruction
For the phylogenetic reconstruction, sequences of each of the 41 mitochondrial genes were aligned using TranslatorX [49], which first translates the nucleotide (nt) sequences into amino acids (aa) using the standard, universal genetic code, and then uses MAFFT [50] to create an amino acid alignment. The alignment was further trimmed for ambiguous portions by GBLOCKS [51] with the least stringent settings. The cleaned aa alignment was then used as a guide to generate the nt sequence alignment. The resulting nt alignments of individual genes were then combined into one dataset in Geneious v10.0.2 (Biomatters, New Zealand). The concatenated nt dataset was analyzed by the maximum likelihood (ML) method as implemented in the RAxML software [52] with a codon-partitioned GTRGAMMA model. Branch support for each internode was evaluated with 300 bootstrap replicates (Additional file 3: Figure S1). The mitochondrial phylogeny of liverworts is mostly congruent with the plastid phylogeny [53].
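The step in which the amino acid alignment guides the nucleotide alignment can be pictured with a few lines of Python. This is only a sketch of the general back-translation idea, not TranslatorX's own code; the function name `thread_codons` is hypothetical, and it assumes that the coding sequence has had its stop codon removed and matches the protein residue for residue.

```python
def thread_codons(aligned_protein, cds):
    """Place the codons of an unaligned CDS under an aligned protein sequence.

    aligned_protein: protein sequence with '-' marking alignment gaps.
    cds: the corresponding coding sequence (no stop codon, length a
         multiple of three), one codon per non-gap residue.
    Returns the nucleotide sequence with '---' inserted at each protein gap.
    """
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues != len(codons):
        raise ValueError("protein and CDS lengths do not match")

    out, k = [], 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")      # a protein gap becomes a three-base gap
        else:
            out.append(codons[k])  # copy the codon encoding this residue
            k += 1
    return "".join(out)


# Two toy sequences aligned at the protein level as 'M-K' and 'MAK'.
nt_alignment = [
    thread_codons("M-K", "ATGAAA"),
    thread_codons("MAK", "ATGGCTAAA"),
]
```

Because gaps are always inserted in multiples of three, the reading frame is preserved, which is what makes a codon-partitioned model applicable to the concatenated dataset.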
Identification of DSBR proteins and phylogenetic analysis
We used HMMER [54] to conduct profile hidden Markov model (HMM) searches using the Pfam RecA (PF00154), SSB (PF00436), Whirly (PF08536), RecX (PF02631), MutS_V (PF00488), DEAD and Helicase_C (PF00270, PF00271) domains as queries to search the annotated proteins from 125 plant species (Additional file 11: Table S6) for characterization of the gene family members of RecA, OSB, Why, RecX, MSH, and RecG, respectively. For gene loci with multiple predicted isoforms, the primary isoform was used if a primary isoform annotation was available; otherwise the longest protein was used. We considered sequences in which HMMER identified a Pfam domain with an E-value threshold of 1e-6 and a hit alignment length of at least 50% of the domain length to be candidate proteins. The candidate proteins from the HMMER search were further manually confirmed using the SMART [55] and Pfam [56] databases. The protein sequences were filtered for redundancy using CD-hit [57] with a similarity threshold of 0.95, aligned using MAFFT [50], and trimmed for ambiguous portions by GBLOCKS with the least stringent settings [51]. The final protein alignment was analyzed by the maximum likelihood (ML) method as implemented in the IQ-TREE software [58] with 1000 ultrafast bootstrap replicates. The subcellular localizations of these DSBR proteins were predicted using the online TargetP 1.1 server [59].
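The candidate filter described above (E-value and domain coverage) could be applied to HMMER's per-domain tabular output along the following lines. This is a hedged sketch and not the authors' pipeline: it assumes the whitespace-delimited `--domtblout` layout of `hmmsearch`, uses the independent domain E-value, and measures coverage in HMM coordinates; the text does not say which E-value or which coordinates were used. The function name and the example rows are hypothetical.

```python
def filter_domtblout(lines, max_evalue=1e-6, min_coverage=0.5):
    """Keep domain hits that pass the E-value and coverage thresholds.

    lines: iterable of rows in hmmsearch --domtblout format, where
    field 0 is the target protein, field 3 the query profile, field 5 the
    profile length, field 12 the independent E-value, and fields 15-16 the
    start and end of the hit on the profile (0-based field numbers).
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split()
        protein, domain, model_len = f[0], f[3], int(f[5])
        i_evalue = float(f[12])
        hmm_from, hmm_to = int(f[15]), int(f[16])
        coverage = (hmm_to - hmm_from + 1) / model_len
        if i_evalue <= max_evalue and coverage >= min_coverage:
            hits.append((protein, domain, i_evalue, round(coverage, 3)))
    return hits


# Made-up rows: protA passes, protB is too short, protC has a weak E-value.
example = [
    "# target name accession tlen query name accession qlen ...",
    "protA - 400 RecA PF00154 263 1e-80 270.0 0.1 1 1 2e-84 1e-80 269.5 0.1 3 260 50 310 48 312 0.98 hypothetical",
    "protB - 150 RecA PF00154 263 2e-08 30.0 0.0 1 1 3e-12 2e-08 29.0 0.0 1 80 10 90 8 95 0.90 fragment",
    "protC - 380 RecA PF00154 263 0.01 10.0 0.0 1 1 1e-05 0.01 9.0 0.0 5 250 40 290 38 292 0.85 weak",
]
candidates = filter_domtblout(example)
```

Hits retained by such a filter would still need the manual confirmation against SMART and Pfam that the text describes.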
Availability of data and materials
Newly generated mitochondrial genomes of 29 liverworts have been deposited in GenBank under the accession numbers MK230925–MK230962. The raw genomic NGS read data have been deposited in the Sequence Read Archive (SRA) database of NCBI under the study number SRP170808. Other supporting results are included within the article and its additional files.
Abbreviations
- aa: Amino acid
- DSBR: Double strand break repair
- ML: Maximum likelihood
- Mt genome: Mitochondrial genome
- mtDNA: Mitochondrial DNA
- nt: Nucleotide
- ORFs: Open reading frames
- PCGs: Protein-coding genes
- rRNAs: Ribosomal RNAs
- SZG: Herbarium of Shenzhen Fairy Lake Botanical Garden, Shenzhen, China
- tRNAs: Transfer RNAs
References
Skippington E, Barkman TJ, Rice DW, Palmer JD. Miniaturized mitogenome of the parasitic plant Viscum scurruloideum is extremely divergent and dynamic and has lost all nad genes. Proc Natl Acad Sci U S A. 2015;112(27):E3515–24.
Sloan DB, Muller K, McCauley DE, Taylor DR, Storchova H. Intraspecific variation in mitochondrial genome sequence, structure, and gene content in Silene vulgaris, an angiosperm with pervasive cytoplasmic male sterility. New Phytol. 2012;196(4):1228–39.
Petersen G, Cuenca A, Møller IM, Seberg O. Massive gene loss in mistletoe (Viscum, viscaceae) mitochondria. Sci Rep. 2015;5(1):17588.
Liu Y, Wang B, Li L, Qiu YL, Xue J. Conservative and dynamic evolution of mitochondrial genomes in early land plants. Genomics Chloroplasts Mitochondria Springer Neth. 2012;35:159–74.
Liu Y, Medina R, Goffinet B. 350 my of mitochondrial genome stasis in mosses, an early land plant lineage. Mol Biol Evol. 2014;31(10):2586–91.
Vigalondo B, Yang L, Draper I, Lara F, Garilleti R, Mazimpaka V, Goffinet B. Comparing three complete mitochondrial genomes of the moss genus Orthotrichum Hedw. Mitochondrial DNA Part B Resour. 2016;1(1):179–81.
Dong S, Xue JY, Zhang S, Zhang L, Wu H, Chen Z, Liu Y. Complete mitochondrial genome sequence of Anthoceros angustus: conservative evolution of the mitogenomes in hornworts. Bryologist. 2018;121(1):14–23.
Xue JY, Liu Y, Li L, Wang B, Qiu YL. The complete mitochondrial genome sequence of the hornwort Phaeoceros laevis: retention of many ancient pseudogenes and conservative evolution of mitochondrial genomes in hornworts. Curr Genet. 2009;56(1):53–61.
Li L, Wang B, Liu Y, Qiu YL. The complete mitochondrial genome sequence of the hornwort Megaceros aenigmaticus shows a mixed mode of conservative yet dynamic evolution in early land plant mitochondrial genomes. J Mol Evol. 2009;68(6):665–78.
Dong S, He Q, Zhang S, Wu H, Goffinet B, Liu Y. The mitochondrial genomes of Bazzania tridens and Riccardia planiflora further confirms conservative evolution of mitogenomes in liverworts. Bryologist. 2019;122:130–9.
Myszczyński K, Górski P, Ślipiko M, Sawicki J. Sequencing of organellar genomes of Gymnomitrion concinnatum (Jungermanniales) revealed the first exception in the structure and gene order of evolutionary stable liverworts mitogenomes. BMC Plant Biol. 2018;18:321.
Oda K, Yamato K, Ohta E, Nakamura Y, Takemura M, Nozato N, Akashi K, Kanegae T, Ogura Y, Kohchi T, et al. Gene organization deduced from the complete sequence of liverwort Marchantia polymorpha mitochondrial DNA: a primitive form of plant mitochondrial genome. J Mol Biol. 1992;223(1):1–7.
Ślipiko M, Myszczyński K, Buczkowska-Chmielewska K, Bączkiewicz A, Szczecińska M, Sawicki J. Comparative analysis of four Calypogeia species revealed unexpected change in evolutionarily-stable liverwort mitogenomes. Genes-Basel. 2017;8(12):395.
Wang B, Xue J, Li L, Yang L, Qiu YL. The complete mitochondrial genome sequence of the liverwort Pleurozia purpurea, reveals extremely conservative mitochondrial genome evolution in liverworts. Curr Genet. 2009;55(6):601–9.
Hecht J, Grewe F, Knoop V. Extreme RNA editing in coding islands and abundant microsatellites in repeat sequences of Selaginella moellendorffii mitochondria: the root of frequent plant mtDNA recombination in early tracheophytes. Genome Biol Evol. 2011;3:344–58.
Mower JP, Sloan DB, Alverson AJ. Plant mitochondrial genome diversity: The genomics revolution. Heidelberg: Springer Vienna; 2012.
Marechal A, Brisson N. Recombination and the maintenance of plant organelle genome stability. New Phytol. 2010;186(2):299–317.
Abdelnoor RV, Christensen AC, Mohammed S, Munoz-Castillo B, Moriyama H, Mackenzie SA. Mitochondrial genome dynamics in plants and animals: Convergent gene fusions of a MutS homologue. J Mol Evol. 2006;63:165–73.
Zaegel V, Guermann B, Le Ret M, Andres C, Meyer D, Erhardt M, Canaday J, Gualberto JM, Imbault P. The plant-specific ssDNA binding protein OSB1 is involved in the stoichiometric transmission of mitochondrial DNA in Arabidopsis. Plant Cell. 2006;18:3548–63.
Shedge V, Arrieta-Montiel M, Christensen AC, Mackenzie SA. Plant mitochondrial recombination surveillance requires unusual RecA and MutS homologs. Plant Cell. 2007;19:1251–64.
Marechal A, Parent JS, Veronneau-Lafortune F, Joyeux A, Lang BF, Brisson N. Whirly proteins maintain plastid genome stability in Arabidopsis. Proc Natl Acad Sci U S A. 2009;106:14693–8.
Söderström L, Hagborg A, Konrat MV, Bartholomew-Began S, Bell D, Briscoe L, Brown E, Cargill DC, Costa DP, Crandall-Stotler B, et al. World checklist of hornworts and liverworts. Phytokeys. 2016;59(59):1–828.
Liu Y, Xue JY, Wang B, Li L, Qiu YL. The mitochondrial genomes of the early land plants Treubia lacunosa and Anomodon rugelii: Dynamic and conservative evolution. PLoS One. 2011;6:e25836.
Grewe F, Herres S, Viehover P, Polsakiewicz M, Weisshaar B, Knoop V. A unique transcriptome: 1782 positions of RNA editing alter 1406 codon identities in mitochondrial mRNAs of the lycophyte Isoetes engelmannii. Nucleic Acids Res. 2011;39(7):2890–902.
Groth-Malonek M, Knoop V. Bryophytes and other basal land plants: the mitochondrial perspective. Taxon. 2005;54(2):293–7.
Daley DO, Adams KL, Clifton R, Qualmann S, Millar AH, Palmer JD, Pratje E, Whelan J. Gene transfer from mitochondrion to nucleus: novel mechanisms for gene activation from cox2. Plant J. 2002;30(1):11–21.
Morris JL, Puttick MN, Clark JW, Edwards D, Kenrick P, Pressel S, Wellman CH, Yang Z, Schneider H, Donoghue PCJ. The timescale of early land plant evolution. Proc Natl Acad Sci U S A. 2018;115(10):E2274–83.
Cuenca A, Ross TG, Graham SW, Barrett CF, Davis JI, Seberg O, Petersen G. Localized retroprocessing as a model of intron loss in the plant mitochondrial genome. Genome Biol Evol. 2016;8(7):2176–89.
Lilly JW, Havey MJ. Small, repetitive DNAs contribute significantly to the expanded mitochondrial genome of cucumber. Genetics. 2001;159(1):317–28.
Alverson AJ, Wei XX, Rice DW, Stern DB, Barry K, Palmer JD. Insights into the evolution of mitochondrial genome size from complete sequences of Citrullus lanatus and Cucurbita pepo (Cucurbitaceae). Mol Biol Evol. 2010;27(6):1436.
Alverson AJ, Rice DW, Dickinson S, Barry K, Palmer JD. Origins and recombination of the bacterial-sized multichromosomal mitochondrial genome of cucumber. Plant Cell. 2011;23(7):2499–513.
Bock R. Witnessing genome evolution: experimental reconstruction of endosymbiotic and horizontal gene transfer. Annu Rev Genet. 2017;51:1–22.
Rice DW, Alverson AJ, Richardson AO, Young GJ, Sanchezpuerta MV, Munzinger J, Barry K, Boore JL, Zhang L, DePamphilis CW, et al. Horizontal transfer of entire genomes via mitochondrial fusion in the angiosperm Amborella. Science. 2013;342(6165):1468–73.
André C, Levy A, Walbot V. Small repeated sequences and the structure of plant mitochondrial genomes. Trends Genet. 1992;8:128–32.
Woloszynska M. Heteroplasmy and stoichiometric complexity of plant mitochondrial genomes–though this be madness, yet there's method in't. J Exp Bot. 2010;61(3):657–71.
Dong S, Zhao C, Chen F, Liu Y, Zhang S, Wu H, Zhang L, Liu Y. The complete mitochondrial genome of the early flowering plant Nymphaea colorata is highly repetitive with low recombination. BMC Genomics. 2018;19(1):614.
Wynn EL, Christensen AC. Repeats of unusual size in plant mitochondrial genomes: Identification, incidence and evolution. G3 (Bethesda). 2019;9(2):549–59.
Wallet C, Le Ret M, Bergdoll M, Bichara M, Dietrich A, Gualberto JM. The RECG1 DNA translocase is a key factor in recombination surveillance, repair, and segregation of the mitochondrial DNA in Arabidopsis. Plant Cell. 2015;27:2907–25.
Lieber T, Jeedigunta SP, Palozzi JM, Lehmann R, Hurd TR. Mitochondrial fragmentation drives selective removal of deleterious mtDNA in the germline. Nature. 2019;570:380.
Yu Y, Liu H, Yang J, Ma W, Pressel S, Wu Y, Schneider H. Exploring the plastid genome disparity of liverworts. J Syst Evol. 2019;57(4):382–94.
Marechal A, Brisson N. Recombination and the maintenance of plant organelle genome stability. New Phytol. 2010;186(2):299–317.
Porebski S, Bailey LG, Bernard RB. Modification of a CTAB DNA extraction protocol for plants containing high polysaccharide and polyphenol components. Plant Mol Biol Report. 1997;15(1):8–15.
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.
Dierckxsens N, Mardulyn P, Smits G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Res. 2017;4:gkw955.
Zhang T, Zhang X, Hu S, Yu J. An efficient procedure for plant organellar genome assembly, based on whole genome data from the 454 GS FLX sequencing platform. Plant Methods. 2011;7(1):38.
Langmead B. Fast gapped-read alignment with Bowtie2. Nat Methods. 2012;9(4):357–9.
Lowe TM, Chan PP. tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes. Nucleic Acids Res. 2016:gkw413.
Dong S, Zhao C, Zhang S, Wu H, Mu W, Wei T, Li N, Wan T, Liu H, Cui J, et al. The amount of RNA editing sites in liverwort organellar genomes is correlated with the GC content and PPR protein diversity. Genome Biol Evol. 2019. https://doi.org/10.1093/gbe/evz232.
Abascal F, Zardoya R, Telford MJ. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Res. 2010;38(suppl_2):W7–W13.
Katoh K, Kuma K, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005;33(2):511–8.
Talavera G, Castresana J. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol. 2007;56(4):564–77.
Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22(21):2688–90.
Yu Y, Yang J, Ma W, Pressel S, Liu H, Wu Y, Schneider H. Chloroplast phylogenomics of liverworts: a reappraisal of the backbone phylogeny of liverworts with emphasis on Ptilidiales. Cladistics. 2019;0:1–10.
Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011;39:29–37.
Letunic I, Bork P. 20 years of the SMART protein domain annotation resource. Nucleic Acids Res. 2017;46:D493–6.
Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, et al. Pfam: the protein families database. Nucleic Acids Res. 2014;42:222–30.
Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658.
Nguyen LT, Schmidt HA, Von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32(1):268–74.
Emanuelsson O, Nielsen H, Brunak S, von Heijne G. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol. 2000;300:1005–16.
Acknowledgments
We are highly grateful to Dr. Qin Zuo and Hui Dong at the Shenzhen Fairy Lake Botanical Garden for assistance in acquiring liverwort materials. We wish to thank Dr. Qiang He and Qinghua Wang from the Institute of Botany (CAS) for help with fieldwork and taxonomic consultation. We also thank David Glenny for providing fresh Treubia materials. We gratefully acknowledge the laboratory assistance of Yang Peng and Na Li at the Shenzhen Fairy Lake Botanical Garden.
Funding
This project was funded by the National Science Foundation of China (31470314, 31600171). The funding bodies did not participate in the design of the study; the collection, analysis, and interpretation of data; or the writing of the manuscript.
Author information
Contributions
YL conceived and designed the study. YL, SD, CZ, RZ, and BG collected the liverwort materials. SD performed the experiments. SD and CZ prepared the figures. SD and YL carried out the analyses and drafted the manuscript. YL, HL, SZ, LZ and BG revised the manuscript. All authors have read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Fieldwork was conducted in accordance with local legislation.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1: Table S1.
Overview of the 29 newly generated liverwort mitochondrial genomes.
Additional file 2: Table S2.
Comparison of the genome content of the 47 liverwort mitochondrial genomes.
Additional file 3: Figure S1.
ML phylogram of selected land plants with an emphasis on liverworts based on a concatenated nucleotide data set.
Additional file 4: Figure S2.
Distribution of mitochondrial protein genes in 31 liverwort genera.
Additional file 5: Table S3.
Comparison of gene content and gene order from 88 selected land plant mitochondrial genomes.
Additional file 6: Figure S3.
Distribution of mitochondrial RNA genes in 31 liverwort genera.
Additional file 7: Figure S4.
RNA editing site variations and intron losses in the cox1 gene as an exemplar.
Additional file 8: Figure S5.
RNA editing site distributions on three gene alignments in a phylogenetic context.
Additional file 9: Table S4.
Distribution of repeat sizes among liverwort mitochondrial genomes.
Additional file 10: Table S5.
Repeat recombinations for 534 repeats within 50 to 300 bp range in liverwort mitochondria.
Additional file 11: Table S6.
Taxa list for DSBR protein identification.
Additional file 12: Figure S6.
Phylogenetic trees of DSBR protein sequences from 125 Viridiplantae taxa inferred with IQ-TREE.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Dong, S., Zhao, C., Zhang, S. et al. Mitochondrial genomes of the early land plant lineage liverworts (Marchantiophyta): conserved genome structure, and ongoing low frequency recombination. BMC Genomics 20, 953 (2019). https://doi.org/10.1186/s12864-019-6365-y
DOI: https://doi.org/10.1186/s12864-019-6365-y