Draft Genome Sequence and intraspecific diversification of the wild crop relative Brassica cretica Lam. using demographic model selection

Crop wild relatives contain high levels of genetic diversity, representing an invaluable resource for crop improvement. Many of their traits have the potential to help crops become more resistant and resilient, and to adapt to the new conditions that they will experience due to climate change. A substantial global effort is under way to conserve various crop wild relatives and to facilitate their use in crop breeding for food security. The genus Brassica is listed in Annex I of the International Treaty on Plant Genetic Resources for Food and Agriculture. Brassica oleracea (or wild cabbage) is a species native to coastal southern and western Europe that has become established as an important human food crop plant because of its large reserves stored over the winter in its leaves. Brassica cretica Lam. is a crop wild relative in the brassica group, and B. cretica subsp. nivea has been suggested as a separate subspecies. The species B. cretica has been proposed as a potential gene donor to a number of crops in the brassica group, including broccoli, Brussels sprout, cabbage, cauliflower, kale, swede, turnip and oilseed rape. Here, we present the draft de novo genome assemblies of four B. cretica individuals, including two B. cretica subsp. nivea and two B. cretica. De novo assembly of Illumina MiSeq genomic shotgun sequencing data yielded 243,461 contigs totalling 412.5 Mb in length, corresponding to 122% of the estimated genome size of B. cretica (339 Mb). Based on synteny mapping and phylogenetic analysis of conserved genes, our sequence data indicate that the B. cretica genome encodes approximately 30,360 proteins. Furthermore, our demographic analysis based on whole-genome data suggests that distinct populations of B. cretica are not isolated. Our findings suggest that the classification of B. cretica into distinct subspecies is not supported by the genome sequence data we analyzed.


Introduction
Crop Wild Relatives
Many plant species are used in food and agriculture; however, 30 crops account for 95% of food production worldwide (Brozynska et al. 2016). Domesticated crops used in food production show reduced genetic diversity compared with their respective crop wild relatives (CWRs). This genetic "bottleneck" of domestication (Tanksley and McCouch 1997) resulted in the loss of valuable alleles. On the other hand, during the domestication process, introgression from wild species into cultivated varieties may generate additional genetic diversity (Hufford et al. 2013, Sawler et al. 2013).

As wild 'progenitors' of crops continue to evolve under abiotic and biotic stresses, it is very important to conserve the resulting genetic biodiversity, which can be useful for agriculture (in situ conservation). Seed banks or germplasm collections are also important to preserve as another resource for agriculture (ex situ conservation). Whole-genome sequencing of CWRs may be used first to characterize wild populations and inform strategies for their conservation. In addition, analysis of the sequence can reveal genetic variation and important genetic characters that have been lost during domestication and that could be transferred into crop species to support food security, climate adaptation and nutritional improvement (Brozynska et al. 2016). The ready availability of low-cost, high-throughput re-sequencing technologies enables the survey of CWR genomes for genetic variation and novel genes and alleles.

Recent decades have seen some remarkable examples of introducing favored traits from CWRs into their respective domesticated crop plants. In most cases, these traits concern resistance to biotic stresses, such as resistance to late blight (Phytophthora infestans) from the wild potato Solanum demissum Lindl. (Prescott-Allen 1986, Witek et al. 2016).
Besides biotic tolerance, many quantitative trait loci have been identified and/or introduced, regarding grain quality for increased yield, such as from Oryza rufipogon, a wild species of rice, to Oryza sativa [...]

Brassica oleracea L. is a very important domesticated plant species, comprising many vegetable crops as different cultivars, such as cauliflower, broccoli, cabbages, kale, Brussels sprouts, savoy, kohlrabi and gai lan. Brassica oleracea, or wild cabbage, belongs to the family Brassicaceae and is found in coastal southern and western Europe. The species has become very popular because of its high content of nutrients, such as vitamin C, its anticancer [...] were predicted, with a mean transcript length of 1,761 bp, and 3,756 non-coding RNAs (miRNA, tRNA, rRNA and snRNA). A greater number of transposable elements (TEs) is observed in B. oleracea than in B. rapa as a consequence of continuous amplification over the last 4 million years (MY), the time since the two species diverged from a common ancestor, whereas in B. rapa the amplification occurred mostly in the last 0.2 MY (Liu et al. 2014). Moreover, massive gene loss and frequent reshuffling of triplicated genomic blocks have been observed, which favored over-retention of genes for metabolic pathways.

Brassica cretica
Among the Aegean islands, Crete is the largest and the most floristically diverse. It has experienced a much longer history of isolation than the smaller Aegean islands. Over two-thirds of all Greek plant species are found in Crete, and it has the greatest proportion of endemic species in the Aegean area (Edh et al. 2007, Greuter 1971, Webb 1978 [...] habitat is restricted at present to areas that are surrounded by a 'sea' of low-lying areas acting as dispersal barriers (Davis 1951); this especially includes various chasmophytic plant species.
Brassica cretica Lam. is a typical example of a Cretan chasmophyte species. It is a wild plant species preferentially inhabiting limestone cliffs and gorges, mainly in Crete but also in the surrounding coastal areas of other Mediterranean countries (Snogerup et al. 1990). Brassica cretica Lam. is a wild relative of the cultivated cabbage, B. oleracea L. (Lázaro and Aguinagalde 1998). The species is hermaphroditic (has both male and female organs) and is pollinated by insects. Brassica cretica is diploid (2n = 18) and partially self-incompatible, with a native [...]

Brassica cretica Lam. is considered a wild relative of a large number of crops of the genus Brassica, and has been proposed as the ancestor of broccoli, Brussels sprouts, cabbage, cauliflower, kale, swede, turnip and oilseed rape. Since this species is thought to be the gene donor of many crops of the brassica group, it might contain genes that are not present in the domesticated crops, as well as a different set of NLRs. Analysis of the NLRsome of wild species would help identify which genes or loci are responsible for the recognition of effectors from important phytopathogens, and thus help create resistant plants in the field via transfer of these favored genes or loci (Chen et al. 2013).

Aim of this work
Here, we present the first draft de novo genome assemblies of four individuals of B. cretica and, using the derived genomic data, we investigate mechanisms of diversification of four isolated B. cretica populations, taking into consideration their genomic and subspecies variation. [...] one from the third mainland population (C) and the other from Crete, the island population (D), have been used for the genome assembly (Figure 1).

Total DNA extraction and library preparation for NGS
Genomic DNA was extracted from the young emerging leaves using two previously published protocols. For total DNA isolation, up to 1 g of plant leaf tissue was used.
For the DNA isolation we used several protocols, including the DNeasy Plant Mini Kit from Qiagen, following the manufacturer's instructions. Likewise, we used a modified triple cetyltrimethylammonium bromide (CTAB) extraction protocol for total plant DNA isolation, as described previously (Abbasi and Afsharzadeh 2016).
The yield and quality of DNA were assessed by agarose gel electrophoresis and by a NanoDrop [...]

Genome assembly and annotation
Prior to assembly, Illumina MiSeq sequence reads were filtered on quality scores and trimmed [...] identified single-nucleotide polymorphisms by alignment against the reference genome sequence, according to the following procedure. After trimming and filtering with TrimGalore, sequence reads were aligned against the reference sequence using Burrows-Wheeler Aligner [...]

Brassica oleracea has been examined thoroughly in the past, and a list of its genes, organized by chromosome, is available. We used this list to exclude SNPs located less than 10 kb from those coding regions. Removing these SNPs is necessary when SNPs are used to infer the demographic model. Due to linkage disequilibrium, SNPs within or in the proximity of genic regions are affected by selection, especially negative selection. Negative selection effectively increases the proportion of low-frequency derived variants and therefore introduces biases in the demographic inference. For this reason, we excluded SNPs located within or in the proximity of genic regions.

Demographic inference
Using dadi to infer the demographic model
Inferring a demographic model consistent with a particular data set requires random walks into [...]

GenBank genome assembly deposition details
The assembly statistics for each of the assembled genomes can be found in Table 1 and Table S4. The assembly accession numbers, as they appear in GenBank, are: 1) Brassica cretica PFS-1207/04, [...]

[...] distinct subspecies is not supported from the data.
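
The SNP exclusion step described in the methods (dropping SNPs that lie less than 10 kb from an annotated gene) can be illustrated with a minimal sketch. The function names, the tuple layout and the coordinates are hypothetical and chosen only for illustration; this is not the pipeline used in the study.

```python
from bisect import bisect_left

def build_exclusion_zones(genes, min_dist):
    """Return, per chromosome, sorted and merged open intervals (start, end)
    covering every position closer than min_dist bp to a gene.
    genes: iterable of (chrom, gene_start, gene_end). Layout is an assumption."""
    raw = {}
    for chrom, start, end in genes:
        raw.setdefault(chrom, []).append((start - min_dist, end + min_dist))
    zones = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for s, e in intervals[1:]:
            if s < merged[-1][1]:          # open intervals overlap: merge them
                merged[-1][1] = max(merged[-1][1], e)
            else:
                merged.append([s, e])
        zones[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return zones

def filter_snps(snps, genes, min_dist=10_000):
    """Keep SNPs whose distance to every gene is at least min_dist bp.
    snps: iterable of (chrom, position)."""
    zones = build_exclusion_zones(genes, min_dist)
    kept = []
    for chrom, pos in snps:
        starts, ends = zones.get(chrom, ([], []))
        i = bisect_left(starts, pos) - 1   # last zone starting strictly before pos
        if i >= 0 and pos < ends[i]:
            continue                        # inside a zone: too close to a gene
        kept.append((chrom, pos))
    return kept

# Hypothetical example: one gene on chromosome "C1" spanning 50,000-60,000.
genes = [("C1", 50_000, 60_000)]
snps = [("C1", 40_000), ("C1", 40_001), ("C1", 55_000), ("C1", 70_000), ("C2", 100)]
print(filter_snps(snps, genes))
```

Merging the padded gene intervals first keeps each lookup to a single binary search, which matters when millions of SNPs are checked against a whole-genome gene list.
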
Using only the non-coding part of the data (thus, the part of the genome that has evolved nearly neutrally), we find that gene flow between different B. cretica populations is rather recent and that genomic diversity is high.
We followed two approaches to infer the neutral demographic model for the B. cretica data. The two approaches relate to the separation of the individual plants into distinct groups (i.e., populations or subspecies). According to the first approach, the subspecies approach, we [...]

Demographic Model Inference based on the subspecies definition
Following the subspecies definition of the two groups of plants, the model "Vicariance with late discrete admixture" is the most likely among the 30 different models with two populations. Such a model suggests that the two subspecies were discrete for a long period of time. However, recently, introgression took place from group 1 (plants A and B) to group 2. Such massive gene flow suggests that the two groups of plants may not define distinct subspecies; they can therefore be considered different populations of the same species (Figure 3A).

Demographic Model Inference based on the PCA plot
Based on the logPCA results we identified two populations, the first comprising three individuals (B, C, D) and the second containing one (A). It is important to note that, although plants A, B and C were sampled from Central Greece and D from Crete, logPCA shows that the Cretan individual is genetically closer to B and C. Because the distances of A and D to the B-C cluster differ only slightly, we generated an additional population scheme grouping together A, B, C and D as another subpopulation.
For the first grouping, the "Founder event and discrete admixture, two epoch" model was selected as the most likely demographic model (Figure 3B). The second grouping resulted in "Divergence with continuous symmetric migration and instantaneous size change" as the best model to explain the data (Figure 3C). The first model specifies that the original population split into two subgroups with symmetric migration between them, after which the population size of each subgroup changed, whereas the second model allows the subpopulations to exchange migrants as time progresses, with the second subpopulation experiencing a population size change. The joint two-population AFS for the real and the simulated data, as well as their difference (residuals), are shown in Figure 4.
In all grouping definitions, it is apparent that populations are not isolated. There is considerable gene flow between all possible groupings of the populations. In particular, in the subspecies-based grouping, the inferred model proposes introgression between the two groups, i.e., massive, directional gene flow. Thus, the genetic data suggest that the subspecies separation of Brassica cretica plants may, in fact, not be supported. The parameter values for all inferred demographic models, as well as the AIC scores of the competing models, are presented in supplementary Tables S1, S2 and S3.

Understanding the mechanisms generating parallel genomic divergence patterns among populations is a modern challenge in population ecology, which can contribute widely to understanding the intraspecific diversification of crop wild relatives. Here we investigated the genomic divergence between three population schemes of Brassica cretica using demographic model selection. According to the above results, we can support that strict isolation is not recorded between populations. A discrete unidirectional admixture event or continuous symmetric migration was recorded, indicating an absence of insuperable barriers to gene flow between populations. Even in the case of taxonomic segregation, where strengthened barriers would be expected, a late discrete unidirectional admixture event is corroborated.

The above finding poses the need for further studies concerning the potential gene flow [...] Crescent reflects genetic processes such as drift, founder effect and infrequent out-crossing with related individuals, rather than environmental selection pressure.
Unidirectional gene flow has also been reported in other organisms, such as in the case of two lizard subspecies, where gene flow from one subspecies (Podarcis gaigeae subsp. [...] B. cretica subsp. nivea has not been suggested as a standing subspecies.
In the case of non-taxonomic segregations, that is, the genomic-variation-based population schemes, both divergence and founder event were recorded as split mechanisms of the original population, while continuous symmetric migration and a discrete unidirectional admixture event in the late epoch, respectively, were specified. In the population genetics literature, migration and gene flow are often used interchangeably (Tigano & Friesen 2016). Nevertheless, migration refers to the movement and dispersal of individuals or gametes, and gene flow to the movement of alleles, and eventually their establishment, into a gene pool different from their gene pool of origin (Endler 1977, Tigano & Friesen 2016). In our case a more appropriate term for migration would be dispersal, as migration is mainly used for animals, also incorporating seasonal movements.
[...] would like to acknowledge Dr Karen Moore and the Exeter Sequencing Service at the University of Exeter for technical assistance with DNA genome sequencing.

0.0519
The scaled time between the admixture event and present.

4.1887
Fraction of ancient population that goes to second population. (Pop 1 has size nuA*(1-s).)

0.9789
Fraction of updated population 2 to be derived from population 1.
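
The ranking of competing demographic models by AIC, as reported in supplementary Tables S1 to S3, can be illustrated with a minimal sketch. It assumes the standard definition AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood. The model names, log-likelihoods and parameter counts below are made-up placeholders, not values from this study, and the helper names are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def rank_models(fits):
    """Rank fitted models by AIC.
    fits: dict mapping model name -> (max log-likelihood, number of free parameters).
    Returns a list of (name, AIC, delta AIC relative to the best model)."""
    scored = sorted(
        ((name, aic(ll, k)) for name, (ll, k) in fits.items()),
        key=lambda item: item[1],
    )
    best = scored[0][1]
    return [(name, score, score - best) for name, score in scored]

# Placeholder fits, for illustration only (not results from the paper).
example_fits = {
    "strict_isolation": (-1200.0, 3),
    "late_discrete_admixture": (-1190.0, 6),
    "continuous_symmetric_migration": (-1195.0, 4),
}
for name, score, delta in rank_models(example_fits):
    print(f"{name}: AIC={score:.1f}, dAIC={delta:.1f}")
```

The delta column makes it easy to see how much support separates the best model from the alternatives, since only differences in AIC are informative, not the absolute values.
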