Genome diversification, deleterious mutation accumulation, and evidence of negative selection among clonally propagated grapevines

Abstract Vegetatively propagated clones accumulate somatic mutations. The purpose of this study was to better understand the consequences of clonal propagation and involved defining the nature of somatic mutations throughout the genome. Sixteen Zinfandel winegrape clone genomes were sequenced and compared to one another using a highly contiguous genome reference produced from one of the clones, Zinfandel 03. Though most heterozygous variants were shared, somatic mutations accumulated in individual and subsets of clones. Overall, heterozygous mutations were most frequent in intergenic space and more frequent in introns than exons. A significantly larger percentage of CpG, CHG, and CHH sites in repetitive intergenic space experienced transition mutations than genic and non-repetitive intergenic spaces, likely because of higher levels of methylation in the region and the disposition of methylated cytosines to spontaneously deaminate. Of the minority of mutations that occurred in exons, larger proportions of these were putatively deleterious when they occurred in relatively few clones. Repetitive intergenic space is a major driver of clone genome diversification. Clonal propagation is associated with the accumulation of putatively deleterious mutations. The data suggest selection against deleterious variants in coding regions such that mutations are less frequent in coding than noncoding regions of the genome.

[...] can be unknowingly propagated that render fruit highly acidic and inedible (Soost et al., 1961; Whitham & Slobodchikoff, 1981). Interestingly, somatic mutations in plum are associated with a switch from climacteric to non-climacteric ripening behavior (Farcuh et al., 2017).

There is limited understanding and evidence of the extent, nature, and implications of the somatic mutations that accumulate in clonally propagated crops (McKey et al., 2010). Genotyping approaches based on whole genome sequencing make it possible to identify genetic [...]

Mutations occur in somatic cells that proliferate by mitosis. These can occur by a variety of means, including single base-pair mutations (Ossowski et al., 2010; Hershberg & Petrov, 2010) that are more prevalent in repetitive regions because methylated cytosines are passively deaminated to thymines (Selker, 1990; Mautino & Rosa, 1998; Meunier et al., 2005), polymerase slippage that drives variable microsatellite insertions and deletions (Schlötterer & Tautz, 1992) [...]

This is the case for green-yellow bud sports of the grey-fruited Pinot Gris, wherein sub-epidermal white cells invaded and displaced epidermal pigmented cells (Pelsy et al., 2015). In contrast to replacement (L1 cells invade L2), displacement is likely more common because of the relative disorganization of the inner cell layers (Thompson & Olmo, 1963 [...] (Muller, 1932; Pineda-Krch & Fagerström, 1999).
A recent study of the long-lived pedunculate oak, Quercus robur, described substantial intra-organismal genetic variation, but did not draw [...] (Fig. S1, Methods S1), and its long history of cultivation make it a useful model for studying clonal variation in grapevine, specifically, and the nature of the accumulation of somatic mutations in clonally propagated crops, generally.

The purpose of this study was to better understand the nature of the somatic variations that occur during clonal propagation. Representatives from at least a portion of Zinfandel's history (Mirošević & Meredith, 2000; Maletic et al., 2003; 2004; Fanizza et al., 2005) [...] (Muller, 1932), that asexually propagated organisms accumulate deleterious mutations.

Third, somatic mutations were relatively scarce in the coding regions of genes relative to introns and intergenic space, suggesting some degree of negative selection against deleterious mutations.

High quality genomic DNA was isolated from grape leaves using the method described in [...] were also mapped to Zin03's primary assembly to assess differences in gene content between Zin03's haplotypes. SMRT reads from Zin03 and CS08 were mapped to Zin03 using NGMLR (v. 0.2.7) and structural differences were called with Sniffles (v. 1.0.8; Sedlazeck et al., 2018). Zinfandel clones were compared to one another using Illumina short reads and Delly (v. 0.7.8) with default parameters (Rausch et al., 2012). The structural variations identified by Sniffles and Delly in Zin03 were intersected. Several filters were applied to the results of SV analyses. Transversions, non-reference Zin03 genotype calls, SVs annotated at the ends of contigs, and SVs that intersected the repeat annotation were filtered from Delly output.

bioRxiv preprint first posted online Mar. 22, 2019; doi: http://dx.doi.org/10.1101/585869. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

[...] (Table 2). A total of 53,560 complete protein-coding genes were annotated on the primary (33,523 genes) and haplotig (20,037 genes) assemblies (Table 2).

Of the 20,037 genes annotated on the haplotig assembly, 18,878 aligned to the primary assembly, leaving 1,159 genes that may exist hemizygously in the genome due to structural variation between homologous chromosomes or because of substantial divergence in sequence between haplotypes. These genes were annotated with a broad variety of putative functions, including biosynthetic processes, secondary metabolism, and stress responses. Long reads were mapped to both the primary and haplotig assemblies to evaluate the circumstances that explain [...] Table 3). SVs intersected 4,559 genes in the primary assembly (13.6% of primary assembly genes) and 390 SVs spanned more than one gene. Manual inspection of the long reads aligned to the primary assembly supports that large, heterozygous deletions and inversions occurred in the Zin03 genome that were either inherited from different structurally distinct parents or arose during clonal propagation (Fig. 1b,c,d). Importantly, there was substantial hemizygosity in the genome, with long reads supporting deletions affecting 2,521 genes and 4.56% of the primary assembly's length (Table 3).

Next, we considered whether specific structural variation could account for the 1,159 genes uniquely found in the haplotig assembly. Three hundred eighty-two of the previously mentioned 1,159 genes that exist uniquely within the haplotig assembly intersected structural variations. Two hundred ninety of these intersected deletions, accounting for the failure to identify them on the primary assembly.
Some of the haplotig genes that failed to map to the primary assembly intersected additional types of SVs, including duplications (80 genes), insertions (89 genes), and inversions (16 genes).

These results reveal structural differences between Zinfandel's haplotypes. These differences could have been inherited and/or could have occurred during clonal propagation. Overall, these structural variations affected 4,559 primary assembly genes. Importantly, these data show that a notable portion of the primary assembly's length (4.56%) is hemizygous.

[...] Fig. 2a). Three hundred nine protein coding genes were found uniquely in Zin03 relative to PN40024 and CS08; 223 were annotated on the primary assembly and 86 were annotated on the haplotigs (Fig. 2a, Table S1). These genes had a panoply of functions that included but were not limited to nucleotide binding (60 genes), protein binding (58 genes), stress response (34 genes), and kinases (28 genes), and were associated with membranes (48 genes), signal transduction (23 genes), carbohydrate metabolism (12 genes), and lipid metabolism (8 genes; Table S1).

Structural differences between Zin03 and CS08 were explored in more detail by mapping the long SMRT reads of CS08 onto Zin03's primary and haplotig assemblies with NGMLR and calling SVs with Sniffles (Fig. 2b, Table 3). Overall, these SVs corresponded to 17.74% (159/897 Mbp) of the Zin03 assembly's total length, 12.5% of its total protein-coding regions (28/223 Mbp), and 25.6% of all Zin03 genes. SVs affected 9,885 genes in the primary assembly and 3,804 genes in the haplotigs.
Manual inspection of the alignment of long CS08 reads to Zin03's primary assembly supports that large SVs exist between the two genotypes (Fig. 2c,d). Next, we considered whether specific structural variation called by Sniffles could account for the 576 Zin03 genes absent from CS08 according to the reciprocal mapping analysis (Fig. 2a [...]

[...] their recorded origins prior to acquisition by FPS (Fig. 3a,b). The ambiguity surrounding the travels and histories of these clones means that it should not be taken for granted that the Californian selections, for example, ought to be more closely related to one another than to the Italian or Croatian selections. Notably, Crljenak kaštelanski 3 stands apart from the other Zinfandel clones. In addition, Pribidrags 5 and 15, which have a known and close relationship, do not co-localize in the PCA (Fig. 3a,b, Table 1).

A kinship analysis (Manichaikul et al., 2010) was then used to quantitatively assess the relationships between the Zinfandel selections. These values range from zero (unrelated) to 0.5 (self). Additional cultivars with known relationships were included in the analysis to help contextualize the differences between clones and the integrity of the analysis (Fig. 3c). Cabernet [...] (Fig. 3c). These data suggest that Crljenak kaštelanski 3 is either not a clone of Zinfandel, contradicting marker analyses, or that it is a highly divergent clone.

Across the Zinfandel clones, the median numbers of homozygous and heterozygous variants called relative to Zin03 were 38,092 and 717,925, respectively.
Between 10-fold and ~27-fold more heterozygous variants were called than homozygous variants in each clone except for Crljenak kaštelanski 3, for which only ~2.5-fold more heterozygous sites were called (Table S2). Crljenak kaštelanski 3 had 4.3-fold more homozygous variants and 1.8-fold fewer heterozygous variants than the other clones (Table S2). Furthermore, unlike other clones, for which less than 10% of sites did not share the Zin03 reference allele, ~29% of variant sites were called where Crljenak kaštelanski 3 did not share the Zin03 reference allele (Table S2). Together, [...] (Table S2) [...] Zinfandel clones (Fig. 5a). Because all clones are identically heterozygous at these loci, these variants are those inherited from Zinfandel's parents.

Individual and subsets of Zinfandel clones accumulated heterozygous mutations as clonal propagation occurred (Fig. 5a). Thirteen percent and 16% of heterozygous INDELs and SNPs, respectively, and 1% of large (>50 bp) structural variants occurred in only one or two clones (Fig. 5a). The distribution of SVs called by Delly is markedly different from those of SNPs and INDELs (Fig. 5a). For SNPs and INDELs, there were 3- and 3.5-fold as many heterozygous variants shared by all 15 clones as there were uniquely occurring variants; there were 71.5-fold more structural variants shared by all clones than there were unique variants in individual clones (Fig. 5a). This might imply that the mechanisms that give rise to small mutations are more common among clones than the large-scale changes associated with SVs.

The distributions of unique and shared heterozygous INDELs in exons, introns, repetitive, and non-repetitive intergenic spaces were not equal (Fig. 5b) [...] (exons, introns) and intergenic (repetitive, non-repetitive) regions were not equal (Fig. 5b). Shared heterozygous SNPs were most common in intergenic non-repetitive regions and introns and least common in exons and repetitive intergenic regions (Fig. 5b). Interestingly, unique heterozygous SNPs occurred at high rates in repetitive intergenic regions (Fig. 5b).
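The tally of shared versus unique heterozygous variants described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the genotype encoding (`"0/1"` for a heterozygous call), the toy call table, and the function name `sharing_spectrum` are all introduced here for the example.

```python
from collections import Counter

def sharing_spectrum(calls):
    """Tally variant sites by the number of clones that are heterozygous.

    `calls` maps a site identifier to a list of genotype strings, one per
    clone. The "0/1" encoding for heterozygous calls is an assumption.
    """
    spectrum = Counter()
    for genotypes in calls.values():
        n_het = sum(1 for g in genotypes if g == "0/1")
        if n_het > 0:  # skip sites with no heterozygous call in any clone
            spectrum[n_het] += 1
    return spectrum

# Toy data for three clones (illustrative only, not from the study).
toy_calls = {
    "contig1:100": ["0/1", "0/1", "0/1"],  # heterozygous in all clones
    "contig1:250": ["0/1", "0/0", "0/0"],  # unique to one clone
    "contig2:40":  ["0/0", "0/1", "0/1"],  # shared by a subset
    "contig2:90":  ["0/1", "0/1", "0/1"],  # heterozygous in all clones
}

n_clones = 3
spectrum = sharing_spectrum(toy_calls)
shared_by_all = spectrum[n_clones]
unique = spectrum[1]
fold = shared_by_all / unique  # fold excess of fully shared over unique sites
```

In this framing, sites heterozygous in every clone correspond to variants presumably inherited from the parents, while sites seen in one or a few clones are candidate somatic mutations.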

[...] This is also supported by the significantly higher ratio of transitions to transversions in repetitive intergenic regions than in exons, introns, and non-repetitive intergenic space (Fig. 5c). Furthermore, the mean percentage of CpG, CHG, and CHH sites affected by transition mutations was significantly higher in repetitive intergenic space than genic and non-repetitive intergenic spaces (Fig. 5d; Tukey HSD, p < 0.01). The mean percentage of CpG sites affected by transition mutations was also significantly higher in introns than exons (Tukey HSD, p < 0.01). Compatible with this hypothesis, INDELs, which should not increase in frequency due to methylation, did not occur preferentially in repeats (Fig. 5b).

The impact of specific variants also varied with their prevalence among the clones ([...]

[...] (Fig. 6a) and included 325 retrotransposons, mostly Copia and Gypsy LTRs, and 69 DNA-transposons (Fig. 6b). Because uniform loci are excluded, in-common TEI were not captured when clones were compared to Zin03. Comparing the clones relative to PN40024, however, revealed that the majority (64.8%) of TEI were shared among the 15 Zinfandel clones. Five hundred thirty TEI occurred in only one, two or three clones (Fig. 6a). This result supports the derivation of these selections from a common ancestral plant and the accumulation of somatic variations over time.

In addition to being suggestive of their shared heritage, the positions of these insertions and their proximity to coding genes were notable.
Three hundred forty-seven TEI occurred within 314 coding genes. The remaining 938 TEI were in intergenic regions (Fig. 6c). The median upstream and downstream distances of intergenic TEs from the closest feature were 11,811 and 11,279 base pairs, respectively, and 25% of TEI were less than 4,345 bases downstream of the closest feature and/or less than 3,826 bases upstream of the closest feature (Fig. 6c).
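One way to obtain upstream and downstream distances like those reported above is to locate, for each intergenic insertion, the closest flanking gene and use that gene's strand to decide the direction. The sketch below is a hypothetical illustration and not the method used in the study: the tuple layout `(start, end, strand)`, the assumption that genes on a contig do not overlap, and the function name `classify_insertions` are all assumptions made for the example.

```python
import bisect
from statistics import median

def classify_insertions(positions, genes):
    """Return (upstream, downstream) distance lists for intergenic insertions.

    `genes` is a list of (start, end, strand) tuples on one contig, assumed
    non-overlapping. Insertions that fall inside a gene are skipped.
    """
    genes = sorted(genes)
    starts = [g[0] for g in genes]
    upstream, downstream = [], []
    for pos in positions:
        i = bisect.bisect_right(starts, pos)  # genes[i-1] starts at or before pos
        candidates = []
        if i > 0:
            _, end, strand = genes[i - 1]
            if pos <= end:
                continue  # genic insertion, not intergenic
            # Insertion lies to the right of this gene.
            side = "downstream" if strand == "+" else "upstream"
            candidates.append((pos - end, side))
        if i < len(genes):
            start, _, strand = genes[i]
            # Insertion lies to the left of this gene.
            side = "upstream" if strand == "+" else "downstream"
            candidates.append((start - pos, side))
        if not candidates:
            continue
        dist, side = min(candidates)  # keep only the closest flanking gene
        (upstream if side == "upstream" else downstream).append(dist)
    return upstream, downstream

# Toy annotation and insertion sites (illustrative only).
toy_genes = [(1000, 2000, "+"), (5000, 6000, "-")]
toy_positions = [500, 2300, 4000, 1500, 7000]

up, down = classify_insertions(toy_positions, toy_genes)
median_up = median(up)
median_down = median(down)
```

Quartiles of the same lists would give figures comparable to the 25% thresholds quoted in the text.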
[...] common ancestral Zinfandel mother plant and show the accumulation of somatic mutations over time (Figs. 6 and 7). The structure of the Zinfandel genome, location of mutations among clones, their frequency and prevalence, and the relationship between these factors provide some insight into the nature of mutations in clonally propagated plants. Mutations among clones were predominantly heterozygous (Fig. 4) and uncommon heterozygous mutations shared by a subset of or individual clones were increasingly deleterious when they occurred in exons (Fig. 5e).

There are costs and benefits associated with clonal propagation (McKey et al., 2010). Among the benefits are that the plants need not breed true-to-type; clonal propagation generally fixes heterozygous loci and valuable phenotypes. However, the increase in the proportion of deleterious alleles supports Muller's ratchet, which posits that sex is advantageous and that clonal propagation increases mutational load (Muller, 1932). Though these and previous data do [...] (Selker, 1990; Cantu et al., 2010). Together, the expectations that intergenic regions are rich in transposable elements and that these regions are typically highly methylated, and as a result will experience greater transition rates, account for the high rates of SNPs in repetitive intergenic spaces among Zinfandel clones. Also notable, these data show that some transposable elements are not entirely silenced, with a substantial number inserting in genes or in close proximity to genes (Fig. 6b). These insertions [...]
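The link drawn above between cytosine methylation context and transition rates rests on two simple classifications: whether a SNP is a transition or a transversion, and whether a cytosine sits in a CpG, CHG, or CHH context (H = A, C, or T). The following is a minimal sketch of both, assuming forward-strand sequence only; the function names and toy inputs are hypothetical and are not taken from the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)

def ts_tv_ratio(snps):
    """Ratio of transitions to transversions for (ref, alt) base pairs."""
    ts = sum(1 for r, a in snps if is_transition(r, a))
    tv = sum(1 for r, a in snps
             if r.upper() != a.upper() and not is_transition(r, a))
    return ts / tv if tv else float("inf")

def cytosine_context(seq, i):
    """Classify the forward-strand cytosine at index i as CpG, CHG or CHH.

    Returns None if seq[i] is not C, if the flanking bases needed for the
    call are missing, or if they are ambiguous (e.g. N).
    """
    seq = seq.upper()
    if seq[i] != "C" or i + 1 >= len(seq):
        return None
    if seq[i + 1] == "G":
        return "CpG"
    if seq[i + 1] not in "ACT" or i + 2 >= len(seq):
        return None
    if seq[i + 2] == "G":
        return "CHG"
    return "CHH" if seq[i + 2] in "ACT" else None

# Toy inputs (illustrative only).
toy_snps = [("C", "T"), ("G", "A"), ("A", "C"), ("C", "T")]
ratio = ts_tv_ratio(toy_snps)

toy_seq = "ACGTCAGCTTC"
contexts = [cytosine_context(toy_seq, i) for i in (1, 4, 7, 10)]
```

A full analysis would also need to treat cytosines on the reverse strand (guanines on the forward strand), which this sketch omits.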