The chondriome is known to carry different types of mitochondrial genomes and to show polymorphism within each cell and tissue type. Most reports on heteroplasmy in plants are based on sequencing of a few genes or ORFs [18, 22, 37]. The next-generation sequencing method used in this study allowed deep sequencing, with coverage ranging from 61x to 102x across the mitochondrial genomes of all three lines. This provided a clearer picture of heteroplasmy within each mtDNA (mtDNA hereafter refers to the different types of mitochondrial genomes, or mitotypes, present in a heteroplasmic condition in a plant). Heteroplasmy was observed in intergenic as well as genic regions of each genome. Most mtDNA sequencing in wheat to date has been based on BAC or large cosmid library construction [7, 36]. This may make genome assembly easier because of the lower complexity of the data, but it can also limit the understanding of heteroplasmy and of substoichiometric changes of the mitotypes in the chondriome.
This study provides, for the first time, a detailed analysis of an alloplasmic line of wheat in comparison with its two euplasmic nuclear and cytoplasmic donor lines. The analysis of the mitochondrial genome of an alloplasmic wheat with the Ae. kotschyi cytoplasm reported by Liu and colleagues lacked sequence information from the euplasmic Ae. kotschyi maternal line, which would have allowed a better comparison. Reference assembly identified several gaps in the mitochondrial genomes of the T. turgidum, the Ae. longissima and the (lo) durum as compared with that of the T. aestivum (Figure 1). Genome rearrangements and structural changes were evident in the de novo assemblies of the mitochondrial genomes of the three lines (Figure 2). The mitochondrial genome of the alloplasmic line was not only distinct from that of the T. turgidum but also different from that of its maternal parent, the Ae. longissima. Differences in gene order have been reported not only between different species of the same family, because of gene shuffling, but also among different ecotypes of a single species such as Arabidopsis. Different genome structures have been observed when comparing euplasmic lines with CMS lines in maize, rice and wheat. Structural changes between the three genomes were expected, but the degree of these changes, which occurred in less than 50 years for the (lo) durum line, is substantial. The rearrangements observed between the (lo) durum and the Ae. longissima mitochondrial genomes were greater than those observed between the T. turgidum and the T. aestivum, which are separated by 10,000 years of evolution. The Ae. longissima mitochondrial genome has undergone a drastic structural change, possibly as a result of the introduction of a new nuclear genome and the production of a viable alloplasmic line.
In alloplasmic conditions, incompatibility between the gene products encoded by the nucleus and the mitochondria could lead to altered mitochondrial function. The enzymes of the inner mitochondrial membrane that contain subunits encoded by both the mitochondrial and nuclear genomes may be incompatible and less functional in the newly formed alloplasmic line. This pressure, in addition to possible changes in recombination due to improper function of nuclear gene(s) such as MSH1 and RECA3 that control recombination in the mitochondrial genome, can lead to a high frequency of heteroplasmy. Substoichiometric changes in favor of mitotypes that survive better in the new condition will accelerate evolution. This accelerated evolution of newly formed mitotypes can lead to multiple types that differ from the maternal mitochondrial genome in organization and genetic information, as found here and in other species [9, 13]. Therefore, in the alloplasmic CMS lines, not only does a larger number of newly formed mitotypes exist, but the stoichiometric shift also changes the predominant mitotype from the maternal type to the one best suited to the new condition.
Multiple synteny blocks were observed between different lines, possibly indicating sites of recombination (Figure 2). Studies on wheat mitochondrial gene expression indicated that nad6 and nad1 (exon d) are co-transcribed. Both genes were found in Block II, despite the large structural differences observed. Therefore, the co-existence of genes in the same block may be related to their functional association. At least 10 large repeated sequences exist in the wheat mitochondrial genome, making it possible to have multiple sub-genomic circles in a cell. Multiple genome rearrangements, likely caused by recombination in mitochondrial repeated sequences, would result in conserved coding regions that are often flanked by different sequences. The co-existence of different mitotypes in a cell after these rearrangements may have hampered the production of a contiguous assembly of the whole wheat mitochondrial genome. Although reference assembly with the T. aestivum mitochondrial genome was possible for the T. turgidum, it was problematic for the other two lines. De novo assembly in these situations provided a better and more complete picture of the genome and its organization. The de novo assembly of the mitochondrial genome was challenging in our study because of its multipartite nature and rearrangements through recombination. The same difficulty was faced by other groups using next-generation sequencing of mitochondrial genomes. The de novo assembly using the MIRA program followed by manual editing resulted in one contig of ~432 kb for the (lo) durum and two contigs with a combined length of ~399 kb for the Ae. longissima. This emphasizes the need to develop more bioinformatics tools that are specific to mitochondrial genome analysis.
All previously characterized genes in the T. aestivum mitochondrial genome were present in the alloplasmic line and its parents. Despite conservation in gene content, multiple genes showed considerable sequence differences. The alloplasmic durum and the Ae. longissima line shared the same sequence variation in a number of genes, such as atp6-1, nad6, rps19-p, cob and cox2 exon 2, when compared with the T. turgidum. The atp6 subunit of F0F1-ATP synthase is considered to be mitochondrially encoded. Early studies recognized the chimeric structure of atp6 in Triticum species. Later, it was found that the presence of such chimeric versions was associated with cytoplasmic male sterility in rice, where proper RNA editing of the altered atp6 could restore male fertility. The same differences in atp6-1, nad6, rps19-p, cob and cox2 exon 2 identified in this study have been reported by Liu et al. in an alloplasmic line of T. aestivum with the cytoplasm of Ae. kotschyi. These appear to be common differences between the Triticum and Aegilops genera rather than a consequence of the CMS condition, as reported by Kawaura for atp6.
To confirm the allelic differences of atp6 between the (lo) durum and its parental lines, a complementary PCR was performed with primers specific to each allele after genome sequencing (Figure 3). It seems that the alloplasmic line and its euplasmic maintainer both carry the atp6-T and atp6-L versions of the gene. However, the atp6-T version appears to be present only in the nucleus of the (lo) durum rather than in its mitochondria. The same appears to hold for atp6-L in the T. turgidum nucleus. Recently it has been shown that two versions of atp6 are present in the Ae. crassa, T. aestivum cv. Chinese Spring and the alloplasmic line of Chinese Spring with the Ae. crassa cytoplasm. The atp6-CR (crassa) version in Chinese Spring and the atp6-AE (aestivum) version in the Ae. crassa were present in less than 10% of the mitochondrial pool in the cell. However, the frequency of atp6-AE was 30% in the alloplasmic line, possibly due to paternal leakage. There was no obvious amplification of atp6-T in the (lo) durum or the Ae. longissima mtDNA, or of atp6-L in the T. turgidum mtDNA (Figure 3). Therefore, the presence of a nuclear copy of both genes is likely.
Three different alleles of nad9 were identified in this study. The allele found in the alloplasmic line is more similar to that of the paternal T. turgidum line than to that of the Ae. longissima. This provides further evidence of paternal leakage during the creation of the alloplasmic line. Paternal leakage and its contribution to heteroplasmy have been indicated by numerous studies [19, 22, 45–47]. In wheat, paternal leakage was investigated in detail in the alloplasmic hexaploid wheat carrying the Ae. crassa cytoplasm. It seems that the proportion of paternal genes in the alloplasmic line increases with each backcross to the paternal line and then remains at a constant level. The sequence of nad9 in the (lo) durum has similarities to both parents. Therefore, this copy of nad9 not only shows paternal leakage but also suggests that recombination between the maternal and paternal mitochondrial genomes may have occurred. Since the Ae. longissima copy of the gene is absent in the (lo) durum, either paternal leakage was high for this gene, or the nucleus of the T. turgidum selected in favor of the recombinant version that was similar to its own, or both. It is known that nuclear genes determine the stoichiometry of alternative mitotypes. Therefore, the second explanation is the more likely one.
Two genes, rpl5 and ccmB, were missing from the final assembly of the (lo) durum. The rpl5 gene encodes a ribosomal protein responsible for rRNA maturation and formation of the 60S ribosomal subunits. Its function could be critical to the survival of the alloplasmic line. Both rpl5 and ccmB were present in the raw assembly data but were not included in the final assembly. In a recent study on sequencing the mitochondrial genome of a CMS line of T. aestivum, the lack of the rpl5 gene was associated with the CMS phenotype. The lack of this gene in the final assemblies of the (lo) durum in our study and of the (Ae. kotschyi) alloplasmic line may be related to the proportion of mitochondrial DNA carrying that gene, as a result of substoichiometric shifting. It is possible that the rpl5 gene exists in a mitotype that is present in a much smaller proportion than the major mitotype in the cell.
Besides the major changes discussed above, a nucleotide polymorphism search was performed within known mitochondrial genes in the sequenced species. Of the six SNPs observed between the (lo) durum and the Ae. longissima, three were in the ribosomal protein coding genes rps2, rps4 and rps13. Nucleotide variation within ribosomal genes was also observed in the alloplasmic line of T. aestivum, where, among others, differences in rps2 and rps4 were recognized. This indicates that ribosomal genes are possible "hot spots" for mutation in alloplasmic lines. Since ribosomal proteins are essential for protein synthesis, these differences may be important to our understanding of the CMS phenotype in plants. These findings suggest the possibility of creating SNP-based markers to investigate other cytoplasms in alloplasmic lines of wheat. The functional effect of nucleotide variation in the cox3, rps13 and mttB genes was evaluated in comparison with A. thaliana, Zea perennis, T. aestivum and Oryza sativa subsp. indica (Figure 4). The amino acid variations found in these genes are common among various plant species, except for the change in rps13, which leads to an amino acid change unique to the alloplasmic line. These data support the hypothesis of accelerated evolutionary changes in alloplasmic lines observed here and in another study.
Nucleotide polymorphism could be detected within each chondriome and categorized as single HSNPs and HSNP blocks. The HSNP blocks within each chondriome are an indicator of heteroplasmy in each genome and could possibly be used to investigate the proportion of a particular mitotype in the genome. An overall comparison of mitochondrial genome polymorphism between the chondriomes of the lines showed, as expected, that the alloplasmic line is closer to the Ae. longissima than to the T. turgidum. In the alloplasmic durum line, one particular region of DNA, designated orf359, showed the highest level of polymorphism compared with other regions of the genome. Gene content and order showed that orf359 does not exist in the cytoplasm donor line but is completely conserved (in sequence and position) between the T. aestivum and the T. turgidum genomes. The existence of orf359 in the alloplasmic durum line, although highly mutated, is additional evidence of paternal leakage. Since the (lo) durum line was developed through a complicated backcrossing scheme including the (lo) T. aestivum cv. Selkirk and the T. timopheevii, these steps may have influenced the new ORF composition. Sequencing of the T. timopheevii mitochondrial genome could provide better insight into these observations.
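As an illustration of how single HSNPs and HSNP blocks might be separated, and how a minor-allele fraction could serve as a rough proxy for the proportion of a mitotype, the following minimal Python sketch works from per-position allele read counts. It is not the procedure used in this study: the function names, the input layout, and the thresholds (`min_fraction`, `min_depth`, `max_gap`) are all hypothetical and chosen only for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: position -> {allele: read count}.
Pileup = Dict[int, Dict[str, int]]


def minor_allele_fraction(counts: Dict[str, int]) -> float:
    """Fraction of reads supporting the second most common allele."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    ordered = sorted(counts.values(), reverse=True)
    second = ordered[1] if len(ordered) > 1 else 0
    return second / total


def heteroplasmic_sites(pileup: Pileup,
                        min_fraction: float = 0.05,
                        min_depth: int = 20) -> List[int]:
    """Positions with enough depth and a minor allele above the cutoff.

    Both thresholds are assumed values, not taken from the study.
    """
    sites = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth >= min_depth and minor_allele_fraction(counts) >= min_fraction:
            sites.append(pos)
    return sites


def group_sites(sites: List[int],
                max_gap: int = 10) -> Tuple[List[int], List[List[int]]]:
    """Split sites into isolated singles and blocks of nearby sites.

    Consecutive sites no more than max_gap bases apart (an assumed
    distance) are placed in the same block.
    """
    singles: List[int] = []
    blocks: List[List[int]] = []
    run: List[int] = []

    def close_run() -> None:
        # A run of one site is a single HSNP; longer runs are blocks.
        if len(run) == 1:
            singles.append(run[0])
        elif len(run) > 1:
            blocks.append(list(run))

    for pos in sorted(sites):
        if run and pos - run[-1] > max_gap:
            close_run()
            run = []
        run.append(pos)
    close_run()
    return singles, blocks
```

Calling `group_sites(heteroplasmic_sites(pileup))` on such a table returns the isolated sites and the blocks separately. Averaging `minor_allele_fraction` over the sites of one block would give a crude estimate of the share of reads from the minor mitotype in that region, under the assumption that all sites in the block belong to the same mitotype.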
Several open reading frames specific to the alloplasmic line were detected, implying that the alloplasmic condition can lead to the creation of new ORFs. These new ORFs may contribute to CMS and other characteristic phenotypes in the (lo) durum line. Interestingly, ORFs 63, 65 and 112 from the alloplasmic durum line were also observed in the alloplasmic hexaploid wheat containing the Ae. kotschyi cytoplasm. The occurrence of these ORFs might be related to the alloplasmic condition, because they were not present in the maternal lines. As both alloplasmic lines carrying these ORFs are CMS, it is possible that their existence is associated with the CMS condition. The most characteristic ORF recognized in this study was the chimeric orf113 (Additional file 9: Figure S6), composed entirely of four fragments of other mitochondrial genes: rps2, cox1, nad4-2 and rps19-p. Expression of a new chimeric ORF, "orf72", in wild cabbage has been found to be related to the CMS phenotype. This ORF contained parts of atp9 and was expressed in the CMS line but not in the euplasmic line. It was also found that expression of the orf224/atp6 chimeric gene is correlated with the CMS trait in Brassica napus.