Reconstruction of ancestral diploid karyotype and evolutionary trajectories leading to the formation of Camelina sativa chromosomes

Background: Camelina sativa, a member of lineage Ⅰ of Brassicaceae, was formed by two hybridizations involving three species, which contributed three sub-genomes. The three sub-genomes diverged from a common ancestor that was likely derived from the lineage Ⅰ ancestor (Ancestral Crucifer Karyotype, ACK). The karyotype evolutionary trajectories of the C. sativa chromosomes are currently unknown. Here, we adopted a previously proposed telomere-centric theory to explain karyotype evolution in C. sativa.
Results: By characterizing the homology between A. lyrata and C. sativa chromosomes, we inferred the ancestral diploid karyotype of C. sativa (ADK), which comprises 7 ancestral chromosomes, and reconstructed the karyotype evolutionary trajectories leading to the formation of the C. sativa genome. The process involved 2 chromosome fusions. We found that sub-genomes Cs-G1 and Cs-G2 may share a more recent common ancestor with each other than with Cs-G3. Together with other lines of evidence from Arabidopsis, we propose that Brassicaceae plants, and possibly eudicots in general, follow a chromosome fusion mechanism favoring end-end joining of different chromosomes, rather than the mechanism favoring the formation of circular chromosomes and nested chromosome fusion that is preferred by monocots.
Conclusions: The present work will contribute to understanding the structural and functional innovation of C. sativa chromosomes, providing insight into Brassicaceae karyotype evolution.


Background
Brassicaceae (the mustard family) is one of the largest plant families, comprising approximately 3700 species classified into 338 genera [1]. It includes several species of prominent scientific and economic importance, such as model plants (e.g., Arabidopsis thaliana), vegetable-producing crops (Brassica rapa, AA genome), and oil-producing crops (Brassica napus, AACC genome). According to phylogenetic relationships, Camelineae species (A. thaliana, Arabidopsis lyrata, and Capsella rubella) and Brassiceae species (B. rapa; Brassica nigra, BB genome; and Brassica oleracea, CC genome) represent lineage Ⅰ and lineage Ⅱ, respectively, two of the three well-supported lineages within Brassicaceae [2,3].
With the rapid increase in Brassicaceae genome assemblies, reconstructing ancestral genomes can help in understanding the evolutionary history of extant Brassicaceae species. Parkin et al. identified 21 common genomic blocks (GBs) between A. thaliana and B. napus using a restriction fragment length polymorphism (RFLP) mapping approach [4]. With genetic maps of A. lyrata and C. rubella, Schranz et al. defined 24 conserved GBs (labelled A-X) related to the ancestral karyotype (AK, n = 8) by referring to the extant A. lyrata and C. rubella karyotypes [5]. The ancestral crucifer karyotype (ACK, n = 8), improved from AK, is recognized as the ancestral state of lineage Ⅰ (Figs. 1 and 4a), based on the fact that the most common base chromosome number in the family is eight and on comparative chromosome painting (CCP) analysis of A. thaliana, A. lyrata, and C. rubella [6].
In addition, they reconstructed the Proto-Calepineae karyotype (PCK, n = 7) as the ancestral karyotype of 6 Brassicaceae tribes (Calepineae, Conringieae, Noccaeeae, Eutremeae, Isatideae, and Sisymbrieae) (Fig. 1), which involved three translocations and two inversions compared with AK. While PCK is inherited in three of the six tribes (Calepineae, Conringieae, and Noccaeeae), which belong to lineage Ⅱ, the remaining three tribes (Eutremeae, Isatideae, and Sisymbrieae) are characterized by an additional translocation compared with PCK. The derivative of PCK within these three tribes is referred to as the translocation Proto-Calepineae Karyotype (tPCK, n = 7). By comparing the three ancestral sub-genomes of Chinese cabbage (B. rapa) with PCK and tPCK, Cheng et al. provided conclusive evidence that tPCK was the ancestral karyotype of the mesohexaploid B. rapa, the genus Brassica, and the tribe Brassiceae [7].
Throughout the evolutionary history of the plant kingdom, polyploidization has repeatedly led to genome doubling or tripling, genome repatterning, and gene loss, which characterize genome instability and fractionation [8][9][10]. Interestingly, chromosome numbers can be greatly reduced, returning to a typical range, after rounds of polyploidization. For example, it was inferred that the eudicot proto-chromosome number before the major eudicot-common whole-genome triplication was seven [11].
After two extra Brassicaceae-common duplications (BCD) [12], A. thaliana has only five base chromosomes. It was proposed that chromosome number reduction (CNR) is often the result of reciprocal translocations, which convert two chromosomes into a larger one and a smaller one, with the smaller chromosome subsequently lost during meiosis [13]. For example, ACK and PCK share five chromosomes of the same structure (AK1-4 and AK7), whereas AK6/8 and AK5/6/8 in PCK were formed by reciprocal translocations among AK5, AK6 and AK8, reducing the chromosome number from eight to seven [6]. However, an alternative explanation, the telomere-centric theory, holds that the removal of telomeres causes chromosome fusion and chromosome number reduction during karyotype evolution, and it accounts for the molecular dynamics of karyotype evolution [14]. Under this theory, a chromosome may adopt a circular form in which a cross-over occurs near its two telomeres; resolution of the cross-over may produce a telomere-free chromosome and a satellite chromosome consisting of two telomeres and little DNA. The telomere-free chromosome may then invade another chromosome and eventually merge into it, a process referred to as nested chromosome fusion (NCF). Alternatively, two chromosomes may cross over near one telomere of each chromosome, resulting in chromosome end-end joining (EEJ) and the formation of a satellite chromosome.
In addition, reciprocal translocation of arms (RTA) may also occur. The loss of the satellite chromosome explains chromosome number reduction. Based on the telomere-centric theory, ancestral karyotypes and evolutionary trajectories of chromosomes have been reconstructed for Arabidopsis, grasses and legumes [14][15][16].
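The three operations of the telomere-centric theory (EEJ, NCF and RTA) can be made concrete with a toy model. The sketch below is purely illustrative and is not taken from the cited work: chromosomes are represented as Python lists whose ends carry a telomere marker, the block names and the helper functions `make_chrom`, `end_end_join`, `nested_fusion` and `reciprocal_arm_translocation` are hypothetical, and cross-over positions are idealized.

```python
TEL = "T"  # telomere marker placed at both chromosome ends (illustrative)

def make_chrom(blocks):
    """Wrap a list of block names with a telomere at each end."""
    return [TEL] + list(blocks) + [TEL]

def end_end_join(a, b):
    """EEJ: cross-over near one telomere of each chromosome.

    Returns the fused chromosome and the satellite chromosome
    (two telomeres, essentially no DNA), which is assumed to be lost later.
    """
    fused = a[:-1] + b[1:]      # a gives up its right telomere, b its left one
    satellite = [a[-1], b[0]]   # the two removed telomeres
    return fused, satellite

def nested_fusion(invader, host, position):
    """NCF: the telomere-free invader is inserted into the host.

    `position` is the number of host blocks to the left of the insertion point.
    """
    core = invader[1:-1]                    # invader without its telomeres
    satellite = [invader[0], invader[-1]]   # telomeres released as a satellite
    fused = host[:position + 1] + core + host[position + 1:]
    return fused, satellite

def reciprocal_arm_translocation(a, b, cut_a, cut_b):
    """RTA: exchange the right-hand arms; the chromosome number is unchanged.

    `cut_a` and `cut_b` are list indices (telomere included) of the breakpoints.
    """
    return a[:cut_a] + b[cut_b:], b[:cut_b] + a[cut_a:]

# Usage with two hypothetical chromosomes
a = make_chrom(["a1", "a2"])
b = make_chrom(["b1", "b2"])
fused, satellite = end_end_join(a, b)
print(fused)      # ['T', 'a1', 'a2', 'b1', 'b2', 'T']
print(satellite)  # ['T', 'T']
```

In this representation, EEJ and NCF each turn two chromosomes into one chromosome plus a satellite, so losing the satellite reduces the chromosome number by one, whereas RTA leaves the number unchanged.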
Belonging to lineage Ⅰ of Brassicaceae [3] (Fig. 1), C. sativa (false flax) is a high-quality oilseed crop whose high yield and resistance to drought and diseases make it suitable for the industrial production of biodiesel [17]. It was proposed that C. sativa represents a whole-genome triplication event relative to A. thaliana, and three sub-genomes were defined (Cs-G1, Cs-G2 and Cs-G3) [18,19] (Fig. 2). Based on phylogenetic relationships, sub-genomes Cs-G1 and Cs-G2 are more closely related to each other than to any of the diploids assayed; Cs-G3 shows a clear expression-level advantage over the other two sub-genomes; and the three sub-genomes have almost identical Ks distributions of syntenic genes with A. thaliana.
The three sub-genomes likely diverged from a common ancestor, and the extant hexaploid C. sativa genome resulted from a two-stage allopolyploid pathway.
The ancestral diploid karyotype of C. sativa (derivative of ACK, dACK) and the three sub-genomes have been reconstructed based on synteny and collinearity between C. sativa and Arabidopsis species [19]. Here, using the theory of telomere-centric genome repatterning, we inferred an ancestral diploid karyotype of C. sativa (ADK) that differs from the one proposed in the previous study, and we reconstructed the karyotype evolutionary trajectories of the extant C. sativa genome. The present work will contribute to understanding the structural and functional innovation of chromosomes in C. sativa and Brassicaceae.

Results
Inference of ancestral diploid karyotype of C. sativa
Parsimony-based phylogenomic analysis can help in understanding karyotype evolution. To understand the evolutionary trajectories of ADK before the divergence of the three C. sativa sub-genomes, we analyzed the syntenic conservation and chromosome repatterning between the genome of the ancestor of lineage Ⅰ and that of C. sativa. Here, we take the A. lyrata genome as the reference for the ancestral genome of lineage Ⅰ because of the high similarity between their karyotypes. By searching for homologous genes between them, we drew homologous gene dot-plots (Figs. 2 and 3), which show the orthologous correspondence between the ancestral genome of lineage Ⅰ and the C. sativa genome.
In the homologous gene dot-plots of the two genomes, produced directly from BLASTP hits and further highlighted using inferred collinear genes, every chromosome in the ancestral genome has three homoeologous chromosomes, or groups of homoeologous chromosome regions, in the C. sativa genome. We found that 5 ACK chromosomes had nearly perfect orthologous correspondence with at least one complete chromosome in C. sativa (Fig. 3a-e), showing that each of these 5 chromosomes in ADK (correspondingly defined as ADK chromosomes 1, 2, 5, 6, 7) inherited the chromosome structure of ACK (AK chromosomes 1, 3, 6, 7, 8) without prominent DNA rearrangements. In other words, during the formation of ADK, 5 ADK chromosomes retained the structures of the corresponding ACK chromosomes almost perfectly.
Inferring evolutionary trajectories from ADK to extant C. sativa karyotype
Chromosome structure can help in understanding phylogenomic relationships. In the homologous gene dot-plots, the orthologous correspondence between AK7 (ADK6) and Cs10, Cs11, and Cs12 (Fig. 3d) suggests that one paracentric inversion is common to Cs10 and Cs11, from Cs-G1 and Cs-G2, respectively, but is absent from Cs12, from Cs-G3. This suggests that Cs-G1 and Cs-G2 did not diverge directly from ADK, but share a common ancestor carrying one paracentric inversion relative to ADK6.
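The reasoning that an inversion shared by Cs10 and Cs11 but absent from Cs12 points to a common ancestor of Cs-G1 and Cs-G2 can be illustrated with a small sketch. It assumes that each chromosome is summarized as the order in which the reference genes (numbered 1, 2, 3, ... along ADK6) appear on it; the gene orders below are invented toy data, not coordinates from the study, and the function names are hypothetical.

```python
def inverted_segments(order, min_len=3):
    """Return reference-rank intervals (low, high) that appear reversed.

    A reversed interval is a maximal run in which each reference rank is
    exactly one less than the preceding one.
    """
    segments = []
    i, n = 0, len(order)
    while i < n - 1:
        j = i
        while j + 1 < n and order[j + 1] == order[j] - 1:
            j += 1
        if j - i + 1 >= min_len:
            segments.append((order[j], order[i]))
        i = j + 1
    return segments

def shared_inversions(orders):
    """Map each inverted interval to the sorted list of chromosomes carrying it."""
    carriers = {}
    for name, order in orders.items():
        for seg in inverted_segments(order):
            carriers.setdefault(seg, []).append(name)
    return {seg: sorted(names) for seg, names in carriers.items()}

# Toy gene orders relative to the reference chromosome (hypothetical values)
toy_orders = {
    "Cs10": [1, 2, 3, 7, 6, 5, 4, 8, 9],
    "Cs11": [1, 2, 3, 7, 6, 5, 4, 8, 9],
    "Cs12": [1, 2, 3, 4, 5, 6, 7, 8, 9],
}
print(shared_inversions(toy_orders))  # {(4, 7): ['Cs10', 'Cs11']}
```

Under parsimony, an identical inverted interval carried by two chromosomes is more simply explained by a single inversion in their common ancestor than by two independent inversions with the same breakpoints.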
The formation of the three sub-genomes and the C. sativa genome may have proceeded as follows. The ancestral diploid of C. sativa first differentiated into species A and B; species A then differentiated into species C and D after one paracentric inversion occurred in ADK6 (Fig. 5). In species C, a cross-over between ADK6 and ADK7 near one telomere of each chromosome resulted in chromosome end-end joining (EEJ), producing ADK6/7 and a satellite chromosome consisting of two telomeres and little DNA. In species D, ADK5 independently experienced one paracentric inversion (Figs. 3c and 5). In species B, which experienced one translocation, a cross-over between ADK3 and ADK4 resulted in reciprocal translocation of arms (RTA), producing ADK3/4 and ADK4/3, which experienced one pericentric inversion (Figs. 3j, k and 5).
During the formation of the karyotype of C. sativa, 14 chromosomes of C. sativa

Discussion
The ancestral diploid karyotype of C. sativa inferred in the previous study involves chromosome repatterning only between AK2 and AK4, and ignores the participation of AK5 in the formation of the ancestral diploid karyotype of C. sativa, which is strongly suggested by the homologous gene dot-plots between the A. lyrata and C. sativa genomes. In addition, we found clear evidence that Cs-G1 and Cs-G2 share a more recent common ancestor, which evolved from an ADK chromosome, with each other than with Cs-G3. This explains why, in the previous study, Cs-G1 and Cs-G2 were shown to be more closely related to each other than to Cs-G3 based on phylogenetic relationships.
Going beyond the previous study [19], we further inferred the evolutionary trajectories from ADK to the extant C. sativa karyotype, which involved one EEJ and two RTAs.
While homologous gene dot-plots are commonly used to observe the occurrence and scale of polyploidization [12,20], it is often overlooked that they can also intuitively show chromosome structure and traces of genome repatterning. This reminds us that the homologous gene dot-plot is a powerful tool for understanding karyotype evolution and even phylogenetic relationships.
In the Brassicaceae plants examined, chromosome number reduction (CNR) consistently took place through the end-end joining (EEJ) mechanism rather than the nested chromosome fusion (NCF) mechanism. Both NCF and EEJ can generate satellite chromosome(s), the loss of which results in CNR. Interestingly, the occurrence of the two mechanisms of CNR shows an obvious plant family preference. The numbers of occurrences of the two mechanisms in the grass family are summarized as follows: from the 12 common ancestral chromosomes, 7 NCFs and 0 EEJs occurred to produce the 5 extant Brachypodium chromosomes, 5 NCFs and 1 EEJ to form the wheat chromosomes, 1 NCF and 0 EEJs to form the foxtail millet chromosomes, and 13 NCFs and 4 EEJs to form the maize chromosomes [14]. In summary, 23 NCFs and 5 EEJs occurred independently to form the extant grass chromosomes, showing that NCF was significantly preferred over EEJ (chi-square test, P-value ≈ 0.02395). In contrast, the CNR during the formation of the A. thaliana chromosomes from eight ancestral chromosomes involved three EEJs and no NCF [14]. As in A. thaliana, EEJ is the only mechanism causing CNR in the formation of ADK and of the extant hexaploid genome of C. sativa. This shows an exclusive preference for EEJ in Brassicaceae. A significant preference for EEJ over NCF was also observed in legumes. Although the sampled families are still too limited, it seems that CNR in eudicots preferentially occurs through EEJ, and in monocots through NCF.
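For readers who want to see how a comparison of NCF and EEJ counts can be tested, the following is a minimal sketch of a two-category chi-square goodness-of-fit test. The text does not state the null hypothesis, the expected frequencies, or whether a continuity correction was applied, so the equal-frequency null used here is an assumption and the resulting value should not be expected to reproduce the P-value reported above. The function name is hypothetical; the P-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x / 2)).

```python
import math

def chisq_gof_two_categories(obs_a, obs_b, p_a=0.5):
    """Chi-square goodness-of-fit test for two categories (1 degree of freedom).

    p_a is the expected proportion of the first category under the null
    hypothesis; 0.5 (both mechanisms equally likely) is an assumption.
    """
    n = obs_a + obs_b
    exp_a, exp_b = n * p_a, n * (1 - p_a)
    chi2 = (obs_a - exp_a) ** 2 / exp_a + (obs_b - exp_b) ** 2 / exp_b
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, df = 1
    return chi2, p_value

# Counts of NCFs and EEJs in grasses as summarized in the text
chi2, p = chisq_gof_two_categories(23, 5)
print(f"chi2 = {chi2:.3f}, P = {p:.3g}")
```

A different null hypothesis (for example, a contingency test against the Arabidopsis counts) or a continuity correction would give a different P-value.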

Conclusions
Using the telomere-centric model, we inferred the ancestral diploid karyotype of C. sativa (ADK), which comprises 7 ancestral chromosomes, and reconstructed the karyotype evolutionary trajectories leading to the formation of the C. sativa genome. The process involved 2 chromosome fusions. Notably, we found that sub-genomes Cs-G1 and Cs-G2 may share a more recent common ancestor with each other than with Cs-G3. The present work will contribute to understanding the structural and functional innovation of C. sativa chromosomes, providing insight into Brassicaceae karyotype evolution.

Dot-plot generation
We used BLASTP [22] to search for homologous gene pairs (E-value < 1e-5) between every possible pair of chromosomes in the two genomes. The best, second-best, and other matches passing this E-value threshold were displayed in different colors to help distinguish orthology from paralogy, or layers of paralogy resulting from recursive WGD events. Dot-plots were produced using in-house Python scripts.
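The hit-ranking step described above can be sketched as follows. This is not the authors' script, which is not part of the text; it assumes that BLASTP was run with tabular output (`-outfmt 6`, whose standard 12 columns end with E-value and bit score), that hits are ranked by bit score, and that self-hits are skipped. The function name and color names are hypothetical.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-5
RANK_COLORS = {0: "red", 1: "blue"}  # best and second-best hit (arbitrary colors)
OTHER_COLOR = "grey"                 # all remaining hits

def rank_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Assign a color to each (query, subject) pair by hit rank.

    `lines` are rows of BLAST tabular output (outfmt 6): query id, subject id,
    ..., E-value (column 11), bit score (column 12).
    """
    best = defaultdict(dict)  # query -> {subject: highest bit score}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if query == subject or evalue >= cutoff:
            continue  # skip self-hits and hits failing the threshold
        if bitscore > best[query].get(subject, float("-inf")):
            best[query][subject] = bitscore
    colored = []
    for query, subjects in best.items():
        ranked = sorted(subjects.items(), key=lambda kv: -kv[1])
        for rank, (subject, _) in enumerate(ranked):
            colored.append((query, subject, RANK_COLORS.get(rank, OTHER_COLOR)))
    return colored
```

Each returned triple can then be converted to plot coordinates by looking up the positions of the query and subject genes on their chromosomes, with the color distinguishing likely orthologs (best hits) from more distant paralogs.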

Inferring positions of breakpoints
Homologous pairs detected by BLASTP were used as input for ColinearScan 1.01 [23], which searched for syntenic regions between the A. lyrata and C. sativa genomes to infer the positions of breakpoints caused by RTA. The maximum gap length (mg) was set to 50 intervening genes between neighboring genes in collinearity on both chromosomes.
Figure: Homologous dot-plots between selected A. lyrata or C. sativa and C. sativa chromosomes.
Figure: Evolutionary trajectories from ACK to ADK. a Ancestral Crucifer Karyotype (ACK).
Figure 5 Evolutionary trajectories from ADK to extant C. sativa karyotype.
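The role of the maximum gap parameter in the collinearity search described under "Inferring positions of breakpoints" can be illustrated with a simplified stand-in. The sketch below is not ColinearScan's algorithm: it only chains homologous gene pairs in the same orientation, uses plain dynamic programming without any significance test, and the function names and the `min_size` parameter are hypothetical. Each anchor is a pair of gene ranks, one on each chromosome.

```python
def collinear_blocks(anchors, max_gap=50, min_size=4):
    """Greedily extract forward collinear chains of homologous gene pairs.

    Two consecutive anchors in a chain may be separated by at most
    `max_gap` intervening genes on both chromosomes.
    """
    remaining = sorted(set(anchors))
    blocks = []
    while remaining:
        n = len(remaining)
        score = [1] * n   # length of the best chain ending at each anchor
        prev = [-1] * n   # predecessor index for traceback
        for k in range(n):
            ik, jk = remaining[k]
            for m in range(k):
                im, jm = remaining[m]
                if 0 < ik - im <= max_gap + 1 and 0 < jk - jm <= max_gap + 1:
                    if score[m] + 1 > score[k]:
                        score[k] = score[m] + 1
                        prev[k] = m
        end = max(range(n), key=score.__getitem__)
        if score[end] < min_size:
            break
        chain = []
        while end != -1:
            chain.append(remaining[end])
            end = prev[end]
        chain.reverse()
        blocks.append(chain)
        used = set(chain)
        remaining = [a for a in remaining if a not in used]
    return blocks

def block_boundaries(blocks):
    """Return ((start_a, end_a), (start_b, end_b)) for each block."""
    return [((b[0][0], b[-1][0]), (b[0][1], b[-1][1])) for b in blocks]

# Toy anchors (hypothetical gene ranks): two blocks separated by a jump
toy = [(1, 1), (2, 2), (3, 3), (4, 4), (10, 60), (11, 61), (12, 62), (13, 63), (5, 200)]
print(block_boundaries(collinear_blocks(toy)))
# [((1, 4), (1, 4)), ((10, 13), (60, 63))]
```

In such a simplified picture, a candidate breakpoint lies between the end of one block and the start of the next when adjacent blocks on the reference chromosome map to distant regions or different chromosomes in the other genome.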