Synchronous dissection of chloroplast and mitochondrial genomes remodels the intra- and inter-genus phylogeny for the agriculturally important genus Brassica

Background : The genus Brassica mainly comprises three diploid and three recently derived allotetraploid species, most of which are highly important vegetable, oil or ornamental crops cultivated worldwide. Despite being extensively studied, the origination of the allotetraploid crops and the overall phylogeny of Brassica genus are still far from completely resolved, which has greatly hindered the development of novel Brassica crops. Here, we target and integrate the chloroplast DNA and mitochondrial DNA to investigate the genetic diversity and relationships in large plant populations centering on Brassica genus. Results : The phylogenetic analyses based on a data set including 72 de novo assembled whole chloroplast genomes, delineated a comprehensive evolutional atlas inside and around Brassica genus. The maternal origin of both B. juncea and B. carinata are monophyletic from cam-type B. rapa and B. nigra , respectively. Nonetheless, the current B. napus contains three major cytoplasmic haplotypes: the cam -type which directly inherited from B. rapa , polima -type which is close to cam -type as a sister, and the predominant nap -type. Intriguingly, nap -type seems phylogenetically integrated with certain sparse C-genome wild species, thus implying that which may have primarily contributed the cytoplasm and the corresponding C subgenome to B. napus . Human breeding creation of the B. napus cytoplasmic male sterile lines (e.g., mori and nsa ) dramatically disturbed the concurrent inheritance between mtDNA and cpDNA. Strong parallel evolution among genera Raphanus , Sinapis, Eruca , Moricandia with Brassica indicates their uncomplete divergence from each other. Conclusions : The overall variation data and elaborated phylogenetic relationships obtained herein can substantially facilitate the development of novel Brassica crops, e.g. the allotetraploid rapeseed with new cytonuclear integrations and the allohexaploid rapeseed.

carinata are of seed oil use. To date, B. napus (rapeseed) has become the second largest vegetable oil crop worldwide [2]. Recently, the release of several reference genome sequences have brought Brassica as a new ideal model genus for studying polyploidy genomes [3][4][5][6][7].
B. napus is supposed to originate from certain kind of hybridization between B. rapa and B. oleracea, which co-existed in European Mediterranean coastwise regions, at approximately 10,000 years ago [4]. Then it has diffused worldwide (mainly to Asia, America and Australia), and eventually formed several ecological and morphological types, which mainly include winter, spring and semi-winter ecotypes or oil-use, root-tuberous and leafy morphotypes. Recently, extensive resequencing studies concerning the mechanisms for the progenitors, evolution and improvement of this versatile crop have been performed. Phylogenomic analyses combining diverse B. napus and its potential progenitors revealed that winter type rapeseed might be the original form of B. napus, European turnip ancestor might donate the A subgemone, the C subgenome may evolve from the common ancestor of kohlrabi, cauliflower, broccoli, and Chinese kale [8]. The A and C subgenomes evolved asymmetrically and higher genetic diversity was identified in A subgenome [9]. Nonetheless, frequent post-formation introgression events occurred during human breeding and generally confused the recovery of the origination trajectory of B. napus. To date, how B. napus originated and then stabilized at genomic level remain largely unresolved, the A-C subgenome interacting mechanisms are fascinating and need to be finely elucidated.
Cytoplasmic DNA in plant cell, especial for chloroplast DNA (cpDNA), are structurally simple with a small genome size (100 -300 kb) and stably inherited mostly in a uniparental pattern with much less recombinant variation [10]. Thus, it has been extensively employed in phylogenetic studies, especially for exploring the maternal origin [11][12][13][14]. Genotyping by using six chloroplast SSR primer pairs or TILLING analysis, one most prevalent cpDNA haplotype was identified in B. napus [15,16].
While, the B. napus of this same cpDNA haplotype generally formed an ambiguous clade, which did not group with the investigated B. rapa or B. oleracea accessions [17]. A few B. napus accessions were grouped with the majority of B. rapa accessions suggested another independent cytoplasmic origin from B. rapa [9,18]. Both the cpDNA and mtDNA phylogenetic studies revealed that the cytoplasm of B. juncea (AABB genome) and B. carinata (BBCC genome) originated from B. rapa (AA genome) and B. nigra (BB genome), respectively [17][18][19][20]. The mitochondrial DNA (mtDNA) of B. napus has drawn much more attention for the extensive application of its cytoplasmic male sterility (CMS) lines in hybrid breeding, mainly containing polima (pol), cam and nap mitotypes in the natural resources. Pol mitotype was inferred to be derived from B. rapa cam mitotype with certain evolutionary modifications. Nap mitotype is predominant, while it remains unsolved and were supposed also from an unidentified or lost mitotype of B. rapa [19]. However, the nap mitotype was further judged to be derived from B. oleracea, since it was phylogenetically grouped with botrytis-type and capitata-type B. oleracea [20].
Apparently, the current above conclusions regarding the maternal origin of nap-type B. napus are still ambiguous and controversial. Previous cpDNA and mtDNA-based studies were separated and need to be corresponded and integrated to finely dissect the multiply origin of B. napus cytoplasm.
Cytoplasmic DNA and its corresponding cytonuclear interactions, are highly valuable for crop breeding not only due to its cause of cytoplasmic male sterility [21], but also in regulating certain agricultural traits, e.g., seed oil content in rapeseed [22] and plant resistance to adverse living conditions. Here in this study, a well-chosen set of both the chloroplast and mitochondrial DNA from the materials centering on Brassica genus were sequenced to dissect their genetic diversity, a detailed phylogenetic pedigree at a lower taxonomic level has been constructed to try to recover the originating and evolving trajectories of Brassica crops.

Results
Genotyping, sequencing and assembly of diverse cytoplasmic DNA haplotypes centering on Brassica genus To distinguish the cytoplasmic DNA (cpDNA and mtDNA) haplotypes within Brassica genus, genotyping analysis through High Resolution Melting (HRM) method were performed within our germplasm collection of Brassica crops ( Figure S1). Primers were designed being targeted on a set of intra/inter-specific cpDNA polymorphic sites that are identified previously [16] (Table S2) [23,24], nsa [25] and mori [26,27]. Certain relative materials, i.e. Raphanus sativus, Sinapis arvensis and Moricandia arvensis, were also included to enrich this study.
Cytoplasmic DNA was synchronously isolated from those accessions that represent for the major cytoplasmic haplotypes and morphological varieties (Table 1), using an optimized organelle isolation procedure (Materials and Methods), which can substantially remove nuclei and balance the proportion of cpDNA and mtDNA content. Reads mapping analysis demonstrated that the isolated total DNA contains an average ratio of 37.2% chloroplast DNA and 3.4% mitochondrial DNA, respectively, approximately 5-10 times higher than the ratio of cytoplasmic DNA in the total leaf DNA [28]. The cytoplasmic DNA mixture was then subjected to high-throughput sequencing with high depth (> 500x, Table 1). The obtained paired-end reads (150 bp) were directly mapped to a tandem sequence gather, which consists of 10 published chloroplast genome sequences all over Brassicaceae family. The mapped reads were extracted and de novo assembled by SOAPdenovo software package [29].
Generally, two or three large contigs were eventually generated for chloroplast genomes. Gaps were directly filled through manual jointing of the overlapping ends between each two contiguous contigs, and then verified by Sanger sequencing of the gap-spanning PCR fragments. All the obtained chloroplast genome sequences are provided in in Additional file 3.

General information of the genome-wide cpDNA and mtDNA variants in Brassica
The chloroplast and mitochondrial genome sequences of a B. napus strain 51218 [22] (GenBank: KP161617.1), which is an intermediate breeding material of nap mitotype, were respectively used as references to call the overall cpDNA and mtDNA basic variants for all the sequenced materials. The calling was conducted by standard BWA/Genome Analysis Toolkit (GATK) pipeline with manual inspection [30], and further randomly verified by Kompetitive Allele Specific PCR (KASP) analysis. A total of approximately 4700 reliable basic polymorphic sites, including 3880 SNP and 820 InDels, were identified for all the sequenced chloroplast haplotypes in Brassica genus. While, approximately 3400 polymorphic sites (2700 SNP and 700 InDels) were identified for the mitochondrial haplotypes (Table   S3). The average SNP density in the whole chloroplast and mitochondrial genomes was 25 and 12  (Table S3).
Consistent difference patterns were also found for cpDNA variants as for the three cytoplasmic types.
KASP analysis using the primers targeted to the mitochondrial and chloroplast haplotype-specific polymorphic sites showed that nap, cam and polima cytoplasms accounted for 87.1%, 7.2% and 5.7% in a germplasm collection containing 480 worldwide B. napus accessions ( Figure S2), basically corresponding to the above mentioned cpDNA genotyping results by HRM method. Undoubtedly, naptype is the predominant cytoplasmic DNA haplotype, which ought to be the predominant haplotype identified in previous studies [15,16]. Most of the B. rapa materials are of the same cam-type in B.
napus, KASP analysis showed that another major haplotype accounted for a frequency of approximately 5.8% in the investigated B. rapa population. This haplotype was named as sarson-type hereinafter, since here we found that it mainly exists in B. rapa var. sarson accessions.
Intra-genus phylogeny within Brassica genus based on whole chloroplast genomes Analyses based on whole chloroplast genomes or genome-wide variations instead of partial cpDNA fragments can infer a phylogeny that is much closer to the natural truth and with high resolution and reliability, even at lower taxonomic levels [14]. To forecast the evolutionary trajectories for Brassica crops, all the above-obtained whole chloroplast genomes were subjected to phylogenetic analysis.
The phylogenetic trees tentatively conducted using the Maximum Likelihood method, neighbor-joining napus accessions and 13 C-genome species each clustered well and were separately integrated into a species-specific group. The B. rapa separated a little branch containing only two accessions, which were classified as sarson-type cytoplasm ( Figure S3). The B. juncea accessions did not diverge any secondary branches, indicating a lack of cytoplasmic genetic diversity ( Figure S4). The B. napus cluster were split into two large branches, one branch containing the nap-type lines (e.g., the nucleargenome sequenced cultivars Darmor and ZS11), another branch further split into two little branches, containing cam-type (e.g., Shengli Rape/AC32) and polima-type (e.g., Jianyang Rape/AC399) lines, respectively ( Figure S5). All the investigated cultivated B. oleracea (e.g., Cauliflower, Broccoli, A part of the above intra-specific materials were selected capable of maximumly representing each their intraspecific genetic diversities, and then together with Brassica nigra, B. carinata and B. maurorum, were combined to construct a larger tree comprising of materials all over Brassica genus. The cpDNA sequence data for materials Root mustard-1 (B. juncea),, Sarsons-1 (B. rapa),, Broccoletto-3 (B. rapa),, Black mustard (B. juncea) and Ethiopian mustard (B. carinata) were added from Li et al., [18] to enrich the whole phylogenetic tree. The results indicated that Brassica genus was mainly divided into three clades, from which the maternal origin of the three natural allotetraploid species can be clearly inferred ( carinata and B. napus) speciated during the period 0.17-0.01 Mya or much later, which are consistent with the estimated originating time of ~7,500 years ago for B. napus [4] and the cultivation beginning time of ~7,000 years ago for B. juncea [7]. Brassica tetraploid species are much younger than other polyploidy crops, e.g., emerge of cotton (Gossypium hirsutum) at 1-2 Mya [39,40] and emerge of soybean (Glycine max) at 0.8 Mya [41].

Discussion
The cytoplasmic phylogeny clarifies the multiple maternal origin of B. napus  (Table S3), indicating a direct maternal origin of these B. napus accessions from the cam-type B. rapa. Two previously known polima-type lines (Xiang5A and 20A) also clustered in Clade I but adjacent to sarson-type B. rapa (Sarson-1) with minor variation, suggesting that the polima haplotype may inherit from certain sarson-type like B. rapa. Nap-type cytoplasm, which occupies numerous elite cultivars worldwide (e.g., Darmor and ZS11), is predominant with a population frequency of 87.1% in our B. napus collection. The cluster of nap-type B. napus seems to be inserted in the middle of C-genome Clade II, appears like a separate haplotype parallel to B. cretica, B. incana, B. insularis, B. villosa and B. oleracea. This was also supported by both our mtDNA-based phylogeny ( Figure 5) and the recent cpDNA-based phylogenetic studies [9], which included other three more C-genome species, B. rupestris, B. montana and B. macrocarpa.
Since nap-type is highly divergent from the existing C-genome cytoplasm types, there is a great possibility that one certain C-genome wild species rather than B. oleracea may have donated the naptype cytoplasm to B. napus, and also the corresponding nuclear C subgenome. Judging from the cytoplasmic inheritance, the current natural B. napus may have three maternal parents, two of A genome B. rapa and one of C genome species, possess higher cytonuclear diversity than B. juncea and B.carinata. A refined model of U's Triangle illustrating the diffusion of cytoplasmic haplotypes in Brassica genus is proposed in Figure 6. Surprisingly, the B. rapa variety broccoletto had been identified possessing identical cpDNA haplotype as the nap-type B. napus [15,18]. Whether broccoletto is the original female parent of nap-type B. napus yet need further investigation. The investigated broccoletto accession collected from Italy were generally cultivated alongside multifarious wild Brassica species [18]. Whereupon, the presence of nap-type haplotype in these B. Comparative analysis of genomic framework using 22 genomic blocks (GB) demonstrated that most GB associations in Brassica species could be detected in Raphanus sativus [42], suggesting that Raphanus and Brassica species potentially shared a common hexaploid ancestor after whole genome triplication (WGT). Common translocation Proto-Calepineae Karyotype (tPCK)-like ancestors were deduced to be the likely common ancestor of all current Brassiceae species that had undergone WGT and repetitive chromosomal rearrangements. Phylogenetic analysis based on 32 mitochondrial protein-coding genes suggested that Eruca sativa is closer to the Brassica species and Raphanus sativus than to Arabidopsis thaliana [43]. The U's Triangle theory has been accordingly revisited and extended into a multi-vertex model [42], which should include not only Raphanus, but also Eruca, Moricandia and Sinapis species as basic diploid species as suggested herein by our studies (Figure 6).  [26,44], and then the CMS phenotype was transferred into B. napus through several rounds of sexual hybridization. Nsa sterile line (AC500) was developed primarily also from protoplast fusion between B. napus and Sinapis arvensis [25]. Sequencing analysis of Ogura sterile line, which was also developed through somatic hybridization of Raphanus sativus and B. napus [45,46], revealed that rearrangement happened extensively in its mitochondrial genome [47]. Nine regions were identified to be unique to the all the published Brassica mitochondrial genome sequences belonging to U's Triangle. Therefore, Both the mori and nsa lines should contain plenty of mitochondrial genome regions from their incipient donor parents, thus clustered close to Moricandia arvensis and Sinapis arvensis, respectively, in the mtDNA based phylogenetic tree. It seems that somatic hybridization through protoplast fusion is an effective means to induce the recombination of mitochondrial genomes. Intergenomic recombinations and DNA rearrangements had been frequently identified within mitochondrial genomes [48,49], suggesting that there may be a stronger variation dynamics in mtNDA than in cpDNA.
While, it is notable that none recombination happened with the chloroplast genomes, since both the nsa and mori lines possess the identical nap-and cam-type chloroplast genomes from each of their recipient B. napus and B. juncea lines ( Figure S5), respectively. This may result from lower interspecific recombination frequencies for cpDNA or strong artificial selection during the breeding process. Similarly, recombination of parental mitochondrial genomes rather than chloroplast DNA has been identified in a cybrid (cytoplasmic hybrids) obtained by protoplast fusion of Nicotiana tabacum and Hyoscyamus niger [50]. Thus, this phenomenon also would be potentially existent in other B. napus cybrid materials, e.g., the recent inap [51] CMS lines containing mtDNA components from Isatis indigotica. Collectively, these results indicated that recent human interference have drastically disturbed the evolutionary accordance between cpDNA and mtDNA in a mass of cybrid lines.

Potential application of the Brassica cytoplasmic variant and genetic information
Sustainable development of Brassica crops requires further improvement on resistance to phytopathogens (e.g., biotrophic Plasmodiophora brassicae and necrotrophic Sclerotinia sclerotiorum),, seed yields, oil quality and mechanization level for field planting pattern. The diversified Brassica relatives stated above have been identified possessing desirable elite traits. For example, Eruca sativa (2n = 22) is a diploid edible plant and its medicinal properties have various health promoting effects [52]. Moricandia arvensis (2n = 28) is reported to be a C3-C4 intermediate species, transferring this feather of higher photosynthetic efficiency into Brassica crops have been tried by means of hybridization with the purpose of increasing yields and drought resistance [53,54].
Sinapis arvensisis is a wild weedy plant of the genus Sinapis, both Sinapis arvensisis and Sinapis alba (2n = 24) possess high resistances to drought, leanness, multiple diseases, herbicides and pod shattering [55][56][57]. Certain Raphanus species were identified to be immune to clubroot and their resistance genes need to be rapidly diverted into Brassica crops [58]. The inter-specific evolutionary relationships ( Figure 6) of Brassica present a potential guidance for creating extensive novel allotetraploid species by intercrossing the corresponding diploid species.
Cytonuclear interactions perform a major role in the evolutionary dynamics and contribute to the early stages of speciation [59]. Roux et al., [60] investigated the effects of intraspecific cytonuclear interactions on the fitness-related traits in Arabidopsis thaliana, by using a unique series of 56 cytolines derived from cytoplasmic substitutions among eight cytoplasmically diversified natural accessions. These results demonstrated that the natural cytoplasmic variations could interact with nuclear genomes to shape a large proportion of phenotypic traits that contributed to adaptation. The cytoplasm in most of the current Brassica populations (e.g., B. juncea, B. carinata and cultivated B. oleracea) are rather lack of genetic diversity, which may be a key limiting factor for improving Chloroplast genome engineering is a powerful means to help to improve crop resistance to adverse abiotic and biotic stresses [61]. Moreover, the large tissue-specific biomass of Brassica crops also makes them as potential plant factories for producing exogenous valuable commercial proteins and metabolites. As guided by Daniell et al., [14], these genome-wide inter-species variant information would benefit for the design of universal chloroplast genome engineering strategies, which are applicable for most of Brassica crops.

Conclusions
Compared with the huge nuclear genomes, chloroplast DNA is a primary and easy means to evaluate the evolutionary relationships in phylogenetic studies. Meanwhile, it is also highly effective to clarify the maternal origins and the related hybridization events at the beginning. Herein, the intra-and inter-genus phylogeny for Brassica were dissected by synchronous analysis of both the cpDNA and mtDNA, the disputable origin of the predominant nap-type B. napus are further clarified, the whole Brassica phylogeny were refined and enriched. Human interference has remodeled the cytoplasmic inheritance in B. napus. The overall variation data and elaborated phylogenetic relationships obtained herein can substantially facilitate the development of novel Brassica crops, e.g. the allohexaploid rapeseed with improved biomass and yield. Further studies regarding the chromosome-level and genomic mechanisms underlying Brassica phylogeny ought to be elucidated.

Plant materials
A set of 480 worldwide B. napus accessions were collected from the National Mid-term Genebank for Oil Crops of China, it has been repeatedly used as a core rapeseed collection in our previous studies [62]. The B. rapa and B. juncea populations contain primarily landraces, which were collected across China. The cultivated B. oleracea inbred lines were obtained commercially from market, the wild B.
oleracea and other C-genomewild species which are native to coastal southern and western Europe were collected from rocky Atlantic coasts of Spain (Bay of Biscay) and the Centre for Genetic Resources, The Netherlands (CGN). Detailed information in regard to all the above materials and other materials in Brassica genus and its relative genera are given in Table S1. Plant materials were

Genotyping analysis
Leaf total DNA of the corresponding accessions were directly extracted using the cetyltrimethylammonium bromide (CTAB) method described by Lutz et al., [63] and then subjected to BSA (w/v) and 0.1% PVP-40 (w/v), pH 7.8]. One centrifugation step (300 g, 5min) was performed to remove the unwanted whole plant cells and cell debris that mainly contain nuclear DNA contaminant.
Another following centrifugation step (1500 g, 10 min) was added to remove a large proportion of chloroplasts to keep a proportionable ratio between cpDNA and mtDNA content. Then, the mixture of chloroplasts and mitochondria were collected by a further centrifugation step (20,000g, 20 min) and then subjected to DNA isolation, using CTAB method.

Sequencing, genome assembly, variant calling and validation
High-throughput sequencing of the cytoplasmic DNA was performed according to our previous study [16]. The DNA was randomly ultrasonically sheared and prepared into paired-end (PE) libraries with insert sizes ranging from 300 to 400 bp, and then subjected to an Illumina Hiseq2500 (Illumina, San Diego, CA, USA) sequencing platform for sequencing at both single ends. Clean reads were directly mapped to a tandem sequence gather consisting of representative public cruciferous chloroplast genomes using Burrows-Wheeler Aligner (BWA) MEM program [65] under default parameters. The mapped paired-end reads were extracted and de novo assembled using the SOAPdenovo software package [29]. The obtained contigs were located on the yet published Brassica chloroplast genomes through BLAST alignments, and then sutured by manually jointing the overlapping ends between each two contiguous contigs. Basic variants (SNP and short InDels) were called using standard       Additional file 2:    improved memory-efficient short-read de novo assembler.