Rapid evolutionary divergence of Gossypium barbadense and G. hirsutum mitochondrial genomes
BMC Genomics volume 16, Article number: 770 (2015)
The mitochondrial genome from upland cotton, G. hirsutum, was previously sequenced. To elucidate the evolution of mitochondrial genomic diversity within a single genus, we sequenced the mitochondrial genome from Sea Island cotton (Gossypium barbadense L.).
Mitochondrial DNA was extracted from organelles isolated from week-old etiolated seedlings using a discontinuous sucrose density gradient method. The mitochondrial genome was sequenced with Solexa using paired-end, 90 bp reads. The clean reads were assembled into contigs using ABySS and finished via additional fosmid and BAC sequencing. Finally, the genome was annotated and analyzed using several software tools.
The G. barbadense (Sea Island cotton) mitochondrial genome was fully sequenced (677,434-bp) and compared to the mitogenome of upland cotton. The G. barbadense mitochondrial DNA contains seven more genes than that of upland cotton, with a total of 40 protein coding genes (excluding possible pseudogenes), 6 rRNA genes, and 29 tRNA genes. Of these 75 genes, atp1, mttB, nad4, nad9, rrn5, rrn18, and trnD(GTC)-cp were each represented by two identical copies. A single 64 kb repeat was largely responsible for the 9 % difference in genome size between the two mtDNAs. Comparison of genome structures between the two mitochondrial genomes revealed 8 rearranged syntenic regions and several large repeats. The largest repeat was missing from the master chromosome in G. hirsutum. Both mitochondrial genomes contain a duplicated copy of rps3 (rps3-2) in conjunction with a duplication of repeated sequences. Phylogenetic and divergence considerations suggest that a 544-bp fragment of rps3 was transferred to the nuclear genome shortly after divergence of the A- and D- genome diploid cottons.
These results provide insights into the evolution of structural variation between Sea Island and upland cotton mitochondrial genomes.
Plant mitochondrial genomes are remarkable from both evolutionary and comparative genomics standpoints. Like their animal counterparts, plant mitochondrial genomes generally are characterized as circular chromosomes (barring notable exceptions, e.g., [2, 3]) that contain a variable number of genes interspersed within non-coding DNA; however, this simplistic generalization belies the dynamic and complex nature of plant mitochondrial genomes. Not only is the overall structure attributed to mitochondrial genomes an oversimplification of their possible morphologies [5–7], but recent comparative analyses among flowering plants have demonstrated extensive fluidity in plant mitochondrial genomes [2, 8, 9]. The structure and evolution of angiosperm mitochondrial genomes are driven by extremely high rates of recombination and rearrangement, with major rearrangements detected even in hybrid plants. Paradoxically, mitochondrial genes are among the slowest evolving, and this rate paradox can be partially explained by DNA repair mechanisms. DNA repair in the coding regions of the mitochondria is biased toward gene conversion, reducing the mutation rates within genes, whereas the more inaccurate break-induced replication (BIR) is common in the noncoding regions, leading to the expansions and rearrangements observed outside of genes [12–15]. Consequently, plant mitochondrial genomes vary remarkably both in size and composition within plant families and genera [7, 9, 16, 17], with genome sizes ranging from 30 kilobases in some algae to several megabases in certain angiosperms [2, 3, 18]. Intraspecies comparisons suggest that plant mitochondrial genomes can be highly divergent even among different varieties of the same species [19, 20]; together with the observed genomic diversity within a single order of angiosperms, this further indicates the remarkable diversity in mitochondrial genomes among green plants.
Perhaps the two most surprising recent realizations regarding plant mitochondrial genome evolution are the extensive variability in mitochondrial genome size and the compositional changes that have led to this variability. Plant mitochondrial genomes vary by an amazing 870-fold, from the ultra-compact, 13 kb (12,998 bp) genome (Accession Number: NC_010357) of the alga Polytomella capuana to the spectacularly bloated 11,319 kb genome (11,318,806 bp) of Silene conica. The evolutionary dynamics that underlie this remarkable variation are not fully understood; however, it is clear from several analyses that plant mitochondrial genomes are repositories for DNA from myriad sources. These not only include the nucleus and chloroplast genomes of the host species itself, but may also include sequences derived from the chloroplast and mitochondrial genomes of other species. Much of this sequence is large (>1 kb) and repetitive in nature, providing sufficient tracts of homology to promote the highly dynamic recombination evident in plant mitochondrial genomes [25–27]. Indeed, it is the high rates of sequence acquisition/loss and recombination that give plant mitochondrial genomes their reputation for rapid intergenic evolution, leading to low levels of non-genic homology among even closely related species [2, 8, 28]. Furthermore, this propensity for recombination can have additional intriguing consequences, such as the generation of substoichiometric recombinant molecules [29, 30], variable chromosomal structures [7, 31, 32], and novel cytoplasmic male sterility (CMS)-inducing open reading frames (ORFs) [19, 20, 33, 34].
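The fold range quoted above follows directly from the two genome lengths given in the text. A minimal Python check, using only those two figures (the variable names are ours):

```python
# Genome lengths in base pairs, as quoted in the text.
polytomella_capuana_bp = 12_998
silene_conica_bp = 11_318_806

# Ratio of the largest to the smallest genome.
fold_range = silene_conica_bp / polytomella_capuana_bp
print(f"Size range: {fold_range:.1f}-fold")
```

The ratio comes out at roughly 870.8, in line with the approximately 870-fold range stated.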
Despite the extensive variation in sizes and structures of plant mitochondrial genomes, their coding sequences rank among the most slowly evolving genes known [35, 36]. Although considerable gene- and lineage-specific variation in rates of gene retention/loss exist for both protein and tRNA genes , most sequenced angiosperm mitochondrial genomes have ~50–60 genes, including subunits of respiratory complexes, ribosomal RNAs (rRNAs), and transfer RNAs (tRNAs) , and a variable number of pseudogenized forms and/or copies of mitochondrial genes [38–42].
Sea Island cotton (Gossypium barbadense L.) is a New World allotetraploid (2n = 52) grown in many countries because of its superior quality fiber. Upland cotton (G. hirsutum), however, is more commonly grown because it is earlier maturing and has a higher yield potential, and accordingly it now accounts for about 90 % of world fiber production. Sea Island cotton (G. barbadense) accounts for only approximately 5 % of present global commerce. In addition to its superior spinning performance and unique high quality fiber characteristics, Sea Island cotton is a potential source of genes for resistance to Verticillium wilt [45, 46]. The objective of the present study was to complement earlier efforts [47, 48] to generate a high-quality sequence of the mitochondrial genome of G. barbadense. We provide this sequence and compare it to the mitogenome of G. hirsutum, yielding insights into the evolution of structural variation and into duplicated gene copies in mtDNA.
Plant materials and mitochondrial DNA extraction
Mitochondria were isolated from week-old etiolated seedlings of “Pima 90–53”, a variety of Sea Island cotton (G. barbadense L.) whose seeds were obtained from Hebei Agricultural University [40, 49]. Mitochondrial DNA was extracted from isolated organelles as reported [40, 41]. Briefly, the extraction protocol for the mtDNA of Sea Island cotton was as follows:
The seeds were planted in sand and the seedlings were kept in darkness to obtain etiolated seedlings. From these, 7 d-old etiolated seedlings were ground and used to isolate mitochondria.
The ground seedling material was collected and the mitochondria were further purified by centrifugation in a discontinuous sucrose-density gradient (60 %, 52 %, 36 % and 20 % M/V) in purification buffer (10 mM Tris–HCl pH 7.4 and 20 mM EDTA) (Additional file 1: Figure S1).
The mitochondria band from the interface between 52 % and 36 % was carefully collected and washed with 0.3 mol · L−1 sucrose buffer to obtain the intact mitochondrial fractions.
The mitochondrial fraction was lysed in cetyltrimethyl ammonium bromide (CTAB) to release the mtDNA, which was further purified by proteinase K digestion, phenol-chloroform extraction, and ethanol precipitation.
The plastid band was located at the interface between 36 % and 20 % sucrose, while the nuclei sedimented to the bottom. PCR validation failed to detect nuclear contamination, but did detect partial contamination from plastid DNA (Additional file 2: Figure S2). To avoid chloroplast contamination, we filtered the reads against the Gossypium barbadense chloroplast genome sequence before assembly.
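The text does not state which tool was used to filter reads against the chloroplast genome. As an illustration only, the following Python sketch substitutes a simple k-mer matching approach for what was most likely an alignment-based filter: a read is discarded when a large fraction of its k-mers occur in the chloroplast reference (either strand). The function names, the value of k, the threshold, and the toy sequences are all assumptions made for this example.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def kmer_set(seq: str, k: int) -> set:
    """Collect all k-mers from both strands of a reference sequence."""
    seq = seq.upper()
    kmers = set()
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - k + 1):
            kmers.add(strand[i:i + k])
    return kmers


def is_chloroplast_like(read: str, cp_kmers: set, k: int, min_fraction: float) -> bool:
    """True if at least min_fraction of the read's k-mers occur in the reference."""
    read = read.upper()
    n_kmers = len(read) - k + 1
    if n_kmers <= 0:
        return False
    hits = sum(read[i:i + k] in cp_kmers for i in range(n_kmers))
    return hits / n_kmers >= min_fraction


def filter_reads(reads, cp_reference, k=21, min_fraction=0.5):
    """Keep only reads that do not look chloroplast-derived (hypothetical thresholds)."""
    cp_kmers = kmer_set(cp_reference, k)
    return [r for r in reads if not is_chloroplast_like(r, cp_kmers, k, min_fraction)]


# Toy data (not real cotton sequence); k is reduced to suit the short strings.
cp_ref = "ATGGCGTACGTTAGCCTAGGATCCGATTACGGCTA"
cp_read_forward = cp_ref[5:25]
cp_read_reverse = reverse_complement(cp_ref[8:30])
mito_read = "GAGAGAGAGACTCTCTCTCTGAGAGAGA"

kept = filter_reads([cp_read_forward, cp_read_reverse, mito_read], cp_ref, k=8)
print(kept)
```

In practice, a threshold of this kind would need to be tuned, because plant mitochondrial genomes themselves carry chloroplast-derived insertions that an aggressive filter would also remove.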
Mitochondrial genome sequencing and assembly
Isolated Sea Island cotton mitochondrial DNA was cloned into whole-genome shotgun libraries using CopyControl Fosmid Library Production Kit (Epicentre, Cat. No. CCFOS110) and sequenced to about 700 × coverage with Solexa using paired-end, 90 bp reads at Beijing Genomics Institute (BGI). Adaptor and contaminant sequences were removed from the raw reads and the clean reads were assembled using ABySS. Since nuclear and chloroplast contamination is possible in the extraction procedure, BLASTn against the nt/nr database was used to identify and remove contaminant contigs. In addition, known mitochondrial genome sequences of G. hirsutum and G. harknessii (unpublished) were also used to identify mitochondrial-type contigs. Contigs were ordered/oriented and gaps were closed via additional fosmid and BAC sequencing. Primers representing both conserved mitochondrial genes and scaffold terminals were used to screen both a fosmid library and a BAC library [48, 49]. Twenty fosmid clones (also previously associated with G. barbadense mitochondria; see Fig. 5 in ) and two BAC clones were selected by this PCR screen and independently sequenced by Solexa and 454 sequencing methods at BGI and Shanghai Majorbio Bio-pharm Biotechnology, respectively. The resulting clone sequences were assembled with SOAPdenovo and Newbler (Version 2.53), respectively; these were then used to anchor and orient the previously assembled mitochondrial contigs into supercontigs. To close the remaining gaps, the known relationships of the fosmids were used to predict the order and orientation of contigs, and the remaining gaps were filled by LA-PCR (Long and Accurate Polymerase Chain Reaction) using the primers listed in Additional file 3: Table S1. These primers were also used to verify each contig join.
Genome annotation and sequence analysis
Mitochondrial genes were annotated as reported, using the genes annotated in the G. hirsutum mtDNA as references. Functional genes (other than tRNA genes) were identified by local BLAST searches against the database, whereas tRNA genes were predicted de novo using tRNAscan-SE. A genome map (Fig. 1) was generated using OGDRAW and the repeat map was drawn with Circos.
The newly generated G. barbadense sequence was aligned to the published G. hirsutum mitochondrial sequence, and the values of dS and dN/dS were evaluated with PAML4. PipMaker was used to identify repeated sequences within G. barbadense, and repetitive DNA from nuclear sources was identified using RepeatMasker (http://www.repeatmasker.org) and a custom, Gossypium-enriched repeat database. Dot matrix comparisons were generated between the mitochondrial genome of G. barbadense and those of Arabidopsis thaliana, Carica papaya, and G. hirsutum using the nucmer program of MUMmer with the parameters: 100-bp minimal size for exact match and 500-bp minimal interval between every two matches. We used Circos plots to show the collinear relationships between G. barbadense and G. hirsutum mitochondrial genome sequences. Possible pseudogenes and non-functional tRNAs were predicted using previously published mitochondrial genomes, and the distribution of pseudogenes was drawn with the pheatmap package in R. A phylogenetic tree was constructed based on 17 conserved mitochondrial genes (nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad9, cob, cox1, cox2, cox3, atp1, atp4, atp6, atp8, atp9) using the maximum likelihood (ML) method with the GTR + G + I model in MEGA5.05.
Results and discussion
Assembly of the complete G. barbadense mitochondrial genome
A total of 607 Mbp (sequence coverage: 867×) of clean reads was generated for the G. barbadense mitochondrial genome. These reads were initially assembled into 14 contigs (average length = 43,530 bp; putative contaminant contigs removed), ranging in size from 10,246 bp to 105,651 bp. Because repeated sequences hinder the assembly of these contigs into a single circular chromosome, 20 fosmid clones and two BAC clones were sequenced and used to inform the order and orientation of these contigs. In addition, the previously published physical map of the G. barbadense mitochondrial genome was also used. The order and orientation of contigs were confirmed and remaining gaps were filled using PCR (Additional file 3: Table S1). The Sea Island mitochondrial genome was assembled as a 677,434 bp circular molecule with four large repeats (Fig. 1) (GenBank Accession Number KP898249), similar to an earlier prediction of mitogenome size (690–700 kb).
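To give a sense of how much sequence the finishing steps had to supply, the short Python sketch below compares the approximate total length of the 14 initial contigs with the finished genome. It uses only figures quoted above; because the average contig length is a rounded value, the contig total is approximate.

```python
# Figures quoted in the text.
n_contigs = 14
mean_contig_bp = 43_530      # reported average, so the total below is approximate
final_genome_bp = 677_434

# Approximate total length of the initial short-read contigs.
initial_assembly_bp = n_contigs * mean_contig_bp

# Sequence not represented in the initial contigs.
shortfall_bp = final_genome_bp - initial_assembly_bp
shortfall_pct = 100 * shortfall_bp / final_genome_bp

print(f"Initial contigs: ~{initial_assembly_bp:,} bp")
print(f"Shortfall vs finished genome: ~{shortfall_bp:,} bp ({shortfall_pct:.1f} %)")
```

The difference, roughly a tenth of the finished genome, is the portion recovered through fosmid, BAC and PCR finishing. Short-read assemblers commonly collapse repeated sequences into a single copy, which is one plausible contributor to a shortfall of this kind.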
Comparative analysis of G. barbadense and G. hirsutum mitochondrial genomes
The mitochondrial genomes of G. barbadense and G. hirsutum are largely similar; however, as observed in other genera, many differences exist even between these closely related species (Table 1). The size difference between the mitochondrial genomes of G. barbadense and G. hirsutum is about 9 %, representing almost 56 kb of additional sequence in G. barbadense. In terms of nucleotide composition, the two mitochondrial genomes are almost identical, with the GC content of G. barbadense and G. hirsutum being 44.98 % and 44.95 %, respectively. Likewise, a similar number of genes were predicted for both, with G. barbadense having seven more functional genes annotated than G. hirsutum (75 versus 68 genes, respectively; Table 2), including four additional protein coding genes, two additional rRNA genes, and one additional tRNA gene, giving a slightly greater total gene length in the G. barbadense than in the G. hirsutum mitochondrial genome (36.4 kb versus 31.7 kb). In total, 40 protein coding genes, 6 rRNA genes, and 29 tRNA genes were predicted for G. barbadense. Most of these genes were intact, even in the duplicate copies; however, both the sole nad1 and the rps3 copy displayed deviations from expectations for intact genes: compared to G. hirsutum, the G. barbadense mitochondrial genome contains extra nad1b and nad1c exons, and the truncated rps3 is 544 bp shorter than the intact copy in G. hirsutum.
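The gene counts in this paragraph can be cross-checked with a few lines of Python. The per-class counts for G. hirsutum are not stated in the text; here they are derived from the G. barbadense counts and the stated differences, which is an assumption of this sketch.

```python
# G. barbadense gene counts by class, as stated in the text.
gb_counts = {"protein_coding": 40, "rRNA": 6, "tRNA": 29}

# Additional genes in G. barbadense relative to G. hirsutum, as stated.
gb_extra = {"protein_coding": 4, "rRNA": 2, "tRNA": 1}

# Derived (not stated) per-class counts for G. hirsutum.
gh_counts = {kind: gb_counts[kind] - gb_extra[kind] for kind in gb_counts}

gb_total = sum(gb_counts.values())
gh_total = sum(gh_counts.values())

# Share of the G. barbadense genome occupied by annotated genes (36.4 kb of 677,434 bp).
gene_fraction = 36_400 / 677_434

print(f"G. barbadense: {gb_total} genes; G. hirsutum: {gh_total} genes")
print(f"Genic share of the G. barbadense genome: {100 * gene_fraction:.1f} %")
```

The totals reproduce the 75 versus 68 genes reported, and the genic share (a little over 5 %) underlines how little of the genome is accounted for by annotated genes.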
As with the annotated genes, the amount of chloroplast-derived sequence was similar between the two mitochondrial genomes, with G. barbadense having 1.42 kb less identifiable chloroplast-derived sequence (Table 1). In G. barbadense, 19 fragments ranging from 35 bp to 2,203 bp in size contribute 5,383 bp of sequence to the genome (<1 %; Table 1 and Additional file 4: Table S2) versus 6,833 bp in G. hirsutum. Most of the inserted sequences in both cases were either non-coding or were tRNAs. With respect to tRNAs, both have nearly the same set of tRNAs; however, G. barbadense has an additional copy of trnD(GTC)-cp, but lacks one of the five conserved cp-derived tRNAs (chloroplast-derived trnP).
Together, the differences between the two cotton mitochondrial genomes attributable to gene or chloroplast-derived sequence represent a small fraction of the difference in genome size (~5 % of the total size difference). As expected from the nature of plant mitochondrial genomes, the greatest difference was in the proportion of repeated sequences, with approximately 1.8 times more sequence in G. barbadense derived from repetitive sequences than in G. hirsutum (21.27 %). Interestingly, the amount of sequence attributable to identifiable transposable elements comprised only 17.3 % and 26.6 % of the repetitive sequences detected in the G. barbadense and G. hirsutum mitochondrial genomes, respectively. The remainder of the sequence was unclassified repetitive sequences contained within the mitochondrial genomes themselves. As with nuclear genomes, gypsy elements comprised the largest fraction of the identifiable repetitive sequences, followed by unclassified LTR-retrotransposons and transposable elements.
The presence and distribution of short repeats also distinguished the two mitochondrial genomes, with 207 and 343 repeats larger than 19 bp in G. barbadense and G. hirsutum, respectively (Table 3). As in G. hirsutum, G. barbadense short repeats were typically small (20 bp to 39 bp). Therefore, while the short repeats were more numerous, their small length had relatively little effect compared to the large repeats (>10 kb; average size in G. barbadense = ~28 kb) (Fig. 1 and Additional file 5: Table S3). In fact, most of the genome expansion in G. barbadense is attributable to the largest repeat (R1 = 63,904 bp), contributing a full 18.9 % of the genome, as well as several duplicated rRNA genes (rrn5 and rrn18). Such large repeats have precedent in plant mitochondrial genomes, including, for example, a 120-kb repeat in maize and an 87-kb repeat in Beta. In total, the proportion of repeats in G. barbadense was nearly 1.5 times that of G. hirsutum (Table 3).
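The 18.9 % figure is consistent with counting both copies of R1. A minimal Python check under that assumption (the copy number of two is our reading of the statement that R1 is duplicated in G. barbadense):

```python
genome_bp = 677_434   # finished G. barbadense mitochondrial genome
r1_bp = 63_904        # length of the largest repeat, R1
r1_copies = 2         # assumption: R1 is present as two copies in G. barbadense

# Fraction of the genome occupied by all copies of R1.
r1_fraction = r1_copies * r1_bp / genome_bp
print(f"R1 share of genome: {100 * r1_fraction:.1f} %")
```

A single copy of R1 (about 64 kb) also exceeds the roughly 56 kb size difference between the two genomes, which fits the statement that this repeat is largely responsible for the expansion.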
Syntenic regions and rearrangement
Syntenic regions were identified between G. barbadense and A. thaliana, C. papaya, and G. hirsutum, respectively. Plant mitochondrial genomes are known to experience myriad synteny-disrupting rearrangements over short evolutionary timescales, and, reflecting this, appreciable synteny was limited to the G. barbadense - G. hirsutum comparison (Fig. 2). A set of 8 sequence blocks larger than 10 kb with high homology (>99.8 %) were detected between the G. barbadense and G. hirsutum mitochondrial genomes, here named block 1 to block 8 (Additional file 6: Table S4). The sizes of these eight syntenic blocks ranged from 33.0 kb (block 4; Fig. 3) to 131.5 kb (block 8; Fig. 3). Interestingly, after the four large repeats (R1-R4) were identified on the G. barbadense mitochondrial genome (Fig. 1 and Additional file 7: Figure S3), we also found a short direct repeat “R08” (Additional file 5: Table S3) at the ends of large repeat R1 (Fig. 1 and Additional file 7: Figure S3). R1 is duplicated in G. barbadense whereas it exists as a single copy in G. hirsutum, suggesting either a gain in G. barbadense or a loss in G. hirsutum. Compared to the bordering syntenic block 2 and block 8 (Additional file 7: Figure S3), the small repeats at the ends of R1 might account for the large duplication event and supply some information on the origin of R1 since the divergence from a common ancestor. It bears noting, however, that the assembled circular map likely represents only one of several possible actual configurations of the genome. Mitochondrial repeats frequently recombine, resulting in an equilibrium composed of multiple configurations (Additional file 7: Figure S3), and both species of cotton probably include several isoforms that differ by repeat-based configurations. The placement of these repeats relative to other syntenic blocks suggests that interspecies reorganization occurred during the evolution of G. barbadense and G. hirsutum.
Notably, however, the rearrangements detected between these two mitochondrial genomes did not disrupt gene clusters, which mostly were in syntenic regions. Further sequencing of additional cotton mitochondrial genomes will be necessary to elucidate the extent and fluidity of genomic rearrangements in cotton mitochondrial genomes.
Nucleotide-level changes in cotton mitochondrial genomes
Synonymous substitution rates (Ks values) of orthologous gene pairs serve as a useful measure of evolutionary distance. The average Ks values for 35 collinear mitochondrial gene pairs were 0.051 for either G. barbadense or G. hirsutum versus C. papaya (Fig. 4), about 1/10th the value for nuclear genes. These data reflect the commonly observed low mutation rates for mitochondrial genes, likely because of efficient DNA repair mechanisms [12, 13]. These data, as well as paired t-tests (P = 0.957 > 0.05), indicate that mutation rates do not differ significantly between the two Gossypium mitochondrial genomes. dN/dS ratios for six genes (nad6, ccmB, ccmFN, sdh3, sdh4, matR) in both mitochondrial genomes were greater than 1 (Table 4), suggesting that these genes may have experienced positive selection during divergence from the common ancestor of Gossypium and C. papaya.
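The paired comparison of Ks values can be sketched in Python as follows. The per-gene Ks values are not given in the text, so the five pairs below are invented placeholders, and the function computes only the paired t statistic and its degrees of freedom (a P value would additionally require the t distribution).

```python
import math
import statistics


def paired_t_statistic(x, y):
    """Return (t, degrees of freedom) for a paired t-test on two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd_diff == 0:
        raise ValueError("all paired differences are identical")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical Ks values (each species versus C. papaya) for five gene pairs.
ks_gb = [0.048, 0.055, 0.050, 0.047, 0.056]
ks_gh = [0.049, 0.054, 0.051, 0.046, 0.056]

t_stat, dof = paired_t_statistic(ks_gb, ks_gh)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

With these placeholder values the differences cancel out and t is close to zero, far below the two-sided 5 % critical value of 2.776 for 4 degrees of freedom, which is the pattern expected when the two genomes evolve at indistinguishable rates.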
Pseudogenes in mitochondrial genomes of land plants
As mentioned above, the suite and synteny of genes were largely conserved between G. barbadense and G. hirsutum. Likewise, both cotton genomes shared the same relatively few potential pseudogenes. This is interesting because while complex I, III, IV, and V genes (nad, cob, cox, and atp genes, respectively) are generally universally conserved in land plant mitochondrial genomes, pseudogenes also are ubiquitous [64, 65]. To explore further the patterns of pseudogenization in mitochondrial genes, we analyzed all 41 currently sequenced mitochondrial genomes deposited in NCBI (Table 5). This comparison revealed that: (1) pseudogenes may arise from any category of mitochondrial genes and from the chloroplast genome; (2) the frequency of pseudogenization (Fig. 5) is highest for ribosomal protein genes, and lower for genes encoding subunits of the respiratory chain proteins. This is consistent with a prior analysis of pseudogene distribution of 41 protein-coding genes among 20 land plant mitochondrial genomes, which also reported that pseudogenes mainly occurred in complex II subunit of the respiratory chain (sdh genes) and ribosomal protein genes (rps genes and rpl genes); (3) some pseudogenes are lineage-specific (e.g., in Oryza sativa subsp. japonica and Oryza sativa subsp. indica, Table 5); and (4) the presence of multi-copy pseudogenes in some mitochondrial genomes (e.g., rpl16, atp9, rps3, etc.), as observed here and previously (Vitis vinifera), may indicate further duplication during pseudogene formation. Recent research has shown that some genes were pseudogenized following endogenous functional gene transfer to the nucleus, leading to the gradual mutational degradation of the corresponding mitochondrial copies. In addition, the tendency for ribosomal genes to be pseudogenized more frequently may be associated with three sets of translation systems in a single cell compartment, which leads to more “gene replacement”.
Analysis of additional mitochondrial genomes will help illuminate these trends.
rps3 gene transfer in pieces into the Gossypium mitochondrial genome
As in Vitis, rps3 has a partial duplicated copy in the mtDNA of Gossypium. In the mitochondrial genomes of both G. barbadense and G. hirsutum, there was a duplicated copy of rps3 (rps3-2) which was nearly identical to its corresponding ortholog rps3. Horizontal gene transfer (HGT) into mitochondrial genomes is a propensity noted previously [16, 37–39, 66]; however, the primary source of the divergent copy of rps3 is not HGT but rather the mtDNA of Gossypium itself. The full-length rps3 gene in Gossypium is 3,401 bp and contains two exons and one intron (Fig. 6). In both Gossypium mitochondrial genomes, however, rps3-2 is truncated at the end of the second exon (Fig. 6a). The missing part of this exon in rps3-2 was not found elsewhere in either cotton mitochondrial genome, even when using relaxed BLAST E-value cutoffs of 1e−10 to 1e−6. To explore the possibility that the latter half of this exon was copied within the cotton mitochondrial genome and then subsequently migrated to the nucleus, we used the published genomes of G. raimondii (D5) [67, 68] and G. arboreum (A2) as BLAST databases (with a cutoff of 1e−10). Interestingly, the latter half of exon 2 (exon 2–2; Fig. 6) was recovered from G. arboreum chromosome 5 (only), along with 755 bp of additional mitochondrial sequence derived from the flanking region of rps3 (Figs. 6 and 7). The percent identity between the intact mitochondrial sequence and the nuclear copy is ~97 %, which is similar to the average difference in non-coding regions for nuclear genes in the A- and D- genome cottons. These observations are interesting for two reasons. First, the recovery of this mitochondrial sequence from G. arboreum (A-genome) only, which is also the model maternal progenitor for both G. barbadense and G. hirsutum, suggests that this mitochondria-to-nucleus transfer occurred subsequent to the divergence of the A- and D- genomes of cotton, which is estimated to have been 5–10 mya; the level of sequence divergence suggests that the transfer occurred shortly after the divergence of the A- and D- lineages. Second, the formation of rps3-2 was complex, involving both sequence duplication and intracellular transfer. As shown in Fig. 6, the sequence R2′ (28,235 bp) was duplicated (sequences in red rectangular box in Fig. 6b), including the part of rps3 that was transferred to the nuclear genome, and the remnant sequences of R2′ remained in the mitochondrial genome. These remnant sequences became rps3-2 and R2, respectively.
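The E-value cutoff used in these searches can be applied to BLAST tabular output (the standard 12-column `-outfmt 6` layout) with a short Python filter. The sketch below is illustrative only: the two hit rows, the query name and the subject names are invented, and the text does not say how the authors post-processed their BLAST results.

```python
# Column order of BLAST tabular output (-outfmt 6, default fields).
BLAST_TAB_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
FLOAT_FIELDS = ("pident", "evalue", "bitscore")
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")


def parse_hits(text):
    """Parse tab-separated BLAST hits into a list of dictionaries."""
    hits = []
    for line in text.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        hit = dict(zip(BLAST_TAB_COLUMNS, line.split("\t")))
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        hits.append(hit)
    return hits


def filter_by_evalue(hits, max_evalue=1e-10):
    """Keep hits whose E-value is at or below the cutoff."""
    return [h for h in hits if h["evalue"] <= max_evalue]


# Hypothetical hits: one strong match and one weak, short match.
rows = [
    ("rps3_exon2_2", "Chr05", "97.0", "544", "16", "0", "1", "544",
     "100001", "100544", "1e-150", "900"),
    ("rps3_exon2_2", "Chr09", "85.0", "40", "6", "0", "10", "49",
     "5000", "5039", "1e-3", "40"),
]
blast_text = "\n".join("\t".join(row) for row in rows)

kept = filter_by_evalue(parse_hits(blast_text), max_evalue=1e-10)
for hit in kept:
    print(hit["sseqid"], hit["pident"], hit["evalue"])
```

Relaxing `max_evalue` toward 1e-6 admits weaker matches, which is the sense in which the mitochondrial search described above was made more permissive.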
Patterns of tRNA presence in plant mitochondrial genomes
While plant mitochondrial genomes possess native tRNAs, nuclear-encoded tRNAs need to be imported from the cytosol to compensate for those that are missing [71–73]. In both Gossypium genomes, four (trnA, trnL, trnR and trnT) of the 20 tRNAs are absent from the mitochondrial genome, and therefore must be imported from the cytosol. To evaluate the patterns of loss of tRNAs during the evolution of plant mitochondrial genomes, we analyzed tRNAs in 37 land plant mitochondrial genomes (Fig. 8). Of the genomes analyzed, only the non-seed plants Marchantia polymorpha, Pleurozia purpurea and Treubia lacunosa have a complete set of tRNAs. Patterns of presence/absence suggest that trnA was lost early in the evolution of seed plants, while trnL, trnR, trnT, and trnV were lost during the evolution of the eudicots. Interestingly, trnV exists in both Gossypium and B. vulgaris; however, these may both represent subsequent gains, as BLAST comparison of the trnV copy in Gossypium shows more than 99 % identity to the corresponding copy in the Gossypium chloroplast (Table 1). Similar to the observation for the eudicots, trnG was lost early during monocot evolution. Finally, S. latifolia and P. dactylifera experienced rapid loss of large numbers of tRNAs. Overall, only trnC, trnE, trnM, trnP and trnY are present in all species evaluated, indicating that these tRNAs may be the most conserved in plant mitochondrial genomes.
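The presence/absence comparison described above amounts to set operations on per-genome tRNA inventories. A minimal Python sketch follows; only the Gossypium losses and the complete Marchantia set come from the text, while the third genome is a hypothetical placeholder added to show how a shared core is computed.

```python
# One tRNA per amino acid, named by one-letter code (trnA, trnC, ...).
ALL_TRNAS = {f"trn{aa}" for aa in "ACDEFGHIKLMNPQRSTVWY"}

presence = {
    # Complete set, as reported for this liverwort.
    "Marchantia polymorpha": set(ALL_TRNAS),
    # Four tRNAs are absent from both Gossypium mitochondrial genomes.
    "Gossypium": ALL_TRNAS - {"trnA", "trnL", "trnR", "trnT"},
    # Hypothetical, heavily reduced genome (illustration only).
    "hypothetical reduced genome": {"trnC", "trnD", "trnE", "trnM", "trnP", "trnY"},
}

# tRNAs missing from each genome, which would need to be imported from the cytosol.
missing = {species: sorted(ALL_TRNAS - found) for species, found in presence.items()}

# tRNAs present in every genome in the comparison.
core = set.intersection(*presence.values())

for species, absent in missing.items():
    print(f"{species}: missing {len(absent)} -> {absent}")
print("Shared by all:", sorted(core))
```

Applied to the 37 genomes analyzed in the study, the same intersection would yield the set of universally retained tRNAs, and the per-genome differences would give the loss pattern summarized in Fig. 8.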
Mitochondrial genomes of plants are evolutionarily intriguing because of their highly conserved genic content and slow rates of genic evolution [11–13], features which contrast sharply with their highly labile genomic structure, genome size, DNA repair mechanisms and recombination induced by different types and origins of repeated sequences. Common evolutionary modifications of mitochondrial genomes include gene loss [75, 76]; intracellular, intergenomic transfers [37, 75, 77, 78]; sequence acquisitions, including horizontal transfers from other, sometimes distantly related species; multiple sequence rearrangements; and DNA repair mechanisms [11–13]. Here we compare the mitochondrial genomes of two closely related allopolyploid cotton species, which diverged only 1–2 mya and share the same organellar ancestry [70, 79]. Despite the short divergence time separating G. barbadense and G. hirsutum, many of the hallmark features of mitochondrial genome evolution are evident, including differential genic content, gains/losses of multiple small and large repeats, genome rearrangements, horizontal transfer, and the evolution of duplicated genes. We illustrate how phylogenetic analysis combined with divergence data can illuminate the timing of duplicated gene formation and of differences in mitochondrial tRNA and protein coding gene content. Increasing insight into the mechanisms and functional consequences of mitochondrial gene and genome variation is expected as additional plant mitochondrial genome sequences become available.
Availability of supporting data
The data sets supporting the results of this article are included within the article and its additional files.
- Gossypium barbadense :
- Gossypium hirsutum :
Abbreviations
CMS: Cytoplasmic male sterility
ORFs: Open reading frames
CTAB: Cetyltrimethyl ammonium bromide
LA-PCR: Long and accurate polymerase chain reaction
HGT: Horizontal gene transfer
Unseld M, Marienfeld JR, Brandt P, Brennicke A. The mitochondrial genome of Arabidopsis thaliana contains 57 genes in 366,924 nucleotides. Nat Genet. 1997;15(1):57–61.
Sloan DB, Alverson AJ, Chuckalovcak JP, Wu M, McCauley DE, Palmer JD, et al. Rapid evolution of enormous, multichromosomal genomes in flowering plant mitochondria with exceptionally high mutation rates. PLoS Biol. 2012;10(1):e1001241.
Rice DW, Alverson AJ, Richardson AO, Young GJ, Sanchez-Puerta MV, Munzinger J, et al. Horizontal transfer of entire genomes via mitochondrial fusion in the angiosperm Amborella. Science. 2013;342(6165):1468–73.
Backert S, Lynn Nielsen B, Börner T. The mystery of the rings: structure and replication of mitochondrial genomes from higher plants. Trends Plant Sci. 1997;2(12):477–83.
Allen JO, Fauron CM, Minx P, Roark L, Oddiraju S, Lin GN, et al. Comparisons among two fertile and three male-sterile mitochondrial genomes of maize. Genetics. 2007;177(2):1173–92.
Handa H. Linear plasmids in plant mitochondria: peaceful coexistences or malicious invasions? Mitochondrion. 2008;8(1):15–25.
Alverson AJ, Rice DW, Dickinson S, Barry K, Palmer JD. Origins and recombination of the bacterial-sized multichromosomal mitochondrial genome of cucumber. Plant Cell. 2011;23(7):2499–513.
Kubo T, Newton KJ. Angiosperm mitochondrial genomes and mutations. Mitochondrion. 2008;8(1):5–14.
We are indebted to Prof. Yingguo Zhu and Dr. Shaoqing Li (College of Life Sciences, Wuhan University, China) for providing the experimental platform and for helpful discussions. We also thank Professor Shu-Miaw Chaw (BRCAS, Taiwan, China) for suggestions and helpful discussions, and Longfei Xue, Aiguo Su and Binbin Lei for helping to assemble the mitochondrial genomes. This work was supported by a grant from the National Natural Science Foundation of China (31171591) to J. Hua.
The authors declare that they have no competing interests.
MT and ZC assembled and annotated the mitochondrial genome, performed the data analysis and prepared the manuscript. CG performed the transposon analysis and classification. YW participated in the bench work. SL and GL constructed the Fosmid libraries and selected positive clones. ZM provided cotton materials and technical support for the BAC library. JFW assisted with experimental design and revised the manuscript. JH conceived the experiments, provided the experimental platform and revised the manuscript. All authors approved the final manuscript.
Mingyong Tang and Zhiwen Chen contributed equally to this work.
Distribution of plastids, mitochondria and nuclei in the sucrose density gradient. (JPEG 325 kb)
PCR validation of Pima90-53 mtDNA and total DNA using two mitochondrial, two nuclear and two chloroplast markers. Note: M1 (nad4L) and M2 (ccmB) represent mitochondrial markers; N1 (actin) and N2 (RT165) represent nuclear markers; C1 (GCS20) and C2 (GCS60) represent chloroplast markers. D: D2000 plus DNA ladder. (JPEG 252 kb)
A partial list of the PCR primers used in genome assembly. (DOC 91 kb)
Chloroplast-like sequences in the mitochondrial genome of G. barbadense. (DOC 37 kb)
Repeats (>100 bp) in the Gossypium barbadense mitochondrial genome. (DOC 59 kb)
Eight syntenic blocks (>10 kb) between the Gossypium barbadense and G. hirsutum mitochondrial genomes. (DOCX 14 kb)
Schematic illustration of the eight syntenic regions in the mitochondrial genomes of G. barbadense and G. hirsutum and of the five repeats located in G. barbadense; the R1 sequence is present only once in the G. hirsutum mitochondrial genome. Compared with Fig. 1, the map has been inverted and then rotated 90° counterclockwise. (JPEG 285 kb)
Tang, M., Chen, Z., Grover, C.E. et al. Rapid evolutionary divergence of Gossypium barbadense and G. hirsutum mitochondrial genomes. BMC Genomics 16, 770 (2015). https://doi.org/10.1186/s12864-015-1988-0
Keywords
- Mitochondrial genome
- Comparative genomics
- DNA rearrangement
- Duplicated copy
- Sequence transfer