
Chimeras in Merlot grapevine revealed by phased assembly

Abstract

Chimerism is the phenomenon in which several genotypes coexist in a single individual. Chimeras have been used to understand plant ontogenesis and have also been valorised through the breeding of new cultivars. Viticulture has taken economic advantage of chimeras when the variant induced an important modification of wine type, such as berry skin colour. Crucial agronomic characters may also be affected by chimeras that have not yet been identified. Periclinal chimeras, in which the variant has entirely colonised a cell layer, are the most stable and can be propagated through cuttings. In grapevine, leaves are derived from both meristem layers, L1 and L2, whereas lateral roots are formed from the L2 cell layer only. Thus, comparing DNA sequences of roots and leaves allows chimera detection. In this study we used new-generation HiFi long-read sequencing, recent bioinformatics tools and trio-binning with parental sequences to detect periclinal chimeras in the ‘Merlot’ grapevine cultivar. Sequencing of cv. ‘Magdeleine Noire des Charentes’ and ‘Cabernet Franc’, the parents of cv. ‘Merlot’, allowed haplotype-resolved assembly. Pseudomolecules were built with a total of 33 to 47 contigs and, on a few occasions, a single contig for one chromosome. This high resolution allowed haplotype comparison. Annotation was transferred from PN40024 VCost.v3 to all pseudomolecules. After stringent selection of variants, 51 and 53 ‘Merlot’-specific periclinal chimeras were found on Merlot-haplotype-CF and Merlot-haplotype-MG respectively, 9 and 7 of them being located in a coding region. A subset of positions was analysed using Molecular Inversion Probes (MIPseq): 69% were unambiguously validated, 25% were doubtful because of technological noise or weak depth, and 6% were invalidated. These results open new perspectives on chimera detection as an important resource to improve cultivars through clonal selection or breeding.

Background

Chimera phenomenon

Individuals including cells with different genotypes are called chimeras or genetic mosaics. They are formed when a somatic genetic variation appears in a single cell in the meristem and is propagated through cell divisions. Occasionally the variation modifies a character of the plant and makes the chimera visible. This phenomenon, called sporting, was observed many centuries ago and has fascinated scientists ever since [1,2,3]. Chimeras can also induce variegation, which is a very noticeable type of sporting because it appears as a mosaic of colours in leaves, flowers or fruits [4]. In 1907, Winkler was the first to use the term chimera while observing grafted plants [5]. Then, the investigation of variegation led Erwin Baur onto the path of non-Mendelian inheritance [6]. Later, colchicine treatment of Datura seeds revealed periclinal chimeras [7], which allowed the understanding of cell lineages and of the ontogenesis of plant organs [8,9,10]. Two main types exist: sectorial and periclinal chimeras. The difference between them is the complete (periclinal) or incomplete (sectorial) colonisation of a cell layer by the somatic variation [11,12,13]. Periclinal chimeras are the most stable and can be propagated by vegetative multiplication from cuttings [14]. Chimeras have already contributed greatly to the understanding of plant ontogenesis [10, 15]. In most cases these mutations are silent, although some may modify the plant’s phenotype for important agronomic traits [16]. They can be used to study biosynthetic pathways [17] but should also be considered an important resource to improve current cultivars or breed new ones [18, 19].

Viticulture has already been taking economic advantage of grapevine chimeras by propagating the new phenotype as a new cultivar. For instance, genetic mosaics explain the origin and the evolution between cv. ‘Pinot Blanc’ and ‘Pinot Gris’ with a modification of berry skin colour [20, 21] and between several teinturier cultivars [22]. A somatic mutation in cv. ‘Meunier’, derived from ‘Pinot Noir’, was used to produce a microvine which strongly accelerates physiology, biology and genetics studies [23].

For these reasons, grapevine is a good example for studying chimeras in woody plant species. While plant organs generally originate from three meristematic cell layers, grapevine organs come from only two functional cell layers (L1 and L2) in the apical meristem [12]. Leaves are derived from the L1 and L2 cell layers, while other organs such as gametophytic tissues and lateral roots originate from the L2 cell layer only [12, 24]. Indeed, lateral roots are formed by differentiation of the meristematic L2 cell layer (Fig. 1). Comparing the whole DNA sequence obtained from leaf samples (L1 + L2) with that from lateral root samples (L2 only) could allow us to identify chimeras that may not be visible but play an important role in intra-varietal genetic diversity. To do this, a high quality whole genome sequence is necessary to accurately distinguish a chimera from sequencing errors. The genome assembly also needs to be resolved per haplotype to distinguish chimeras from grapevine heterozygosity.

Fig. 1

Cellular layers in grapevine roots and leaves. Schematic representation of a grapevine plant. Leaf and lateral root cross sections are enlarged in order to present the different cell layers present in both organs. Leaves are derived from both L1 and L2 meristem cell layers, while lateral roots are only formed out of the L2 layer

Whole genome new generation sequencing

The first grapevine whole genome sequence was published in 2007 by the French-Italian consortium [25]. Since then, the ‘PN40024’ sequence, obtained from a nearly homozygous inbred ‘Pinot Noir’ plant, has been the reference genome for Vitis sp. The first version had 8X coverage and has been gradually updated through the 12x.v0 and the 12x.v2 [26]; the most recent one is PN40024.v4, which has 40X coverage (European Nucleotide Archive - project PRJEB45423). Having a good quality whole genome reference has greatly improved the understanding of the Vitis vinifera genome, but the nearly homozygous plant (PN40024 is homozygous on 93% of the genome) can hardly be considered representative of the cultivars used in grape production.

Short read technology produces accurate sequences, but because of the highly repetitive sequences of the grapevine genome these reads are difficult to assemble, so producing a whole reference genome with short reads can be very challenging [27]. Long read technology has allowed massive improvement in genome assembly. The FALCON-Unzip phasing algorithm was very successful on Arabidopsis but had more difficulty with the grape cv. ‘Cabernet-Sauvignon’ sample because of the high rate of heterozygous positions and the amount of repetitive sequences [28]. Purging of haplotigs increased the assembly quality of the ‘Chardonnay’ sequence [29]. The ‘Carmenère’ phased assembly was also improved by optimising coverage, error correction, repeat masking methods and the assembly parameters of FALCON-Unzip [30]. Since then, resolving haplotype-phased assemblies has become more accessible and numerous cultivars of the Vitis genus have been sequenced (Table 1).

Table 1 Whole grapevine genome sequences published until today:

Third generation long read sequencing with high accuracy opens a new perspective on whole genome sequencing. Bioinformatics engineering is adapting to these new sequencing technologies: long read haplotype assembly is now possible for diploids through tools like HiCanu and Hifiasm [34]. To increase accuracy and have a better chance of resolving haplotype phasing, trio-binning with parental sequences can be used to sort the child’s reads into two groups [35]. Up to now, different techniques have led to chimera detection: (i) random amplified polymorphic DNA [36]; (ii) comparing phenotypes of plants regenerated from different cell layers [14]; (iii) comparing microsatellite markers in wood or root tissues (L2) against leaves (L1 + L2) coming from the same plant [37]; (iv) flow cytometry measurements on pericarp and flesh fruit tissues in order to compare ploidy level between L1 and L2 [38]; (v) real-time PCR on regenerated transgenic plants in order to evaluate the amount of chimeras and the uniformity of the transformation [39]; (vi) microsatellite (SSR) amplification by PCR when three alleles are found at one locus and confirmed by comparing different regenerated plants [40]; (vii) comparing DNA sequences obtained by dissection of different tissues, leaf or berry skin (L1 + L2) against flesh or roots (L2 only) [20, 22, 41]. This last method has also been used in other species such as banana, by comparing DNA from leaf, stem, rhizome and roots [42]. Although all these experiments demonstrate the existence of chimeras in plants and sometimes their crucial impact on agronomical traits, genome wide chimera detection has not yet been possible. Validation of chimeras is also a challenge because we expect a low alternative allele frequency, since the variant will appear on only one haplotype and in one cell layer representing a small proportion of leaf tissue. Sanger sequencing has been found to be limited when the alternative allele frequency is under 15–20% [43], but other technologies such as Molecular Inversion Probes (MIP) [44] have proven efficient in this particular condition [45]. Widely used in medical programs to detect rare diseases [46,47,48], MIPs have also been used in plants to detect pathogens [49] or assist in genomic selection [50]. Because of repetitive sequences, it is also difficult to design specific target sequences that only capture the identified SNVs. MIPs, along with the MIPGEN design software [51], are efficient on this specific criterion. They also have the advantage of performing well with a small amount of DNA (200 ng), which also makes them useful in forensic applications [52, 53]. Taken together, this information suggests that MIPs should be an interesting technology for chimera validation.
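The expectation of a low alternative allele frequency can be made concrete with a small calculation. The sketch below assumes a simple model in which a heterozygous variant confined to one cell layer is seen in half of the reads coming from that layer. The 20% share of leaf DNA attributed to the mutated layer is a purely hypothetical value chosen for illustration, not a measurement from this study.

```python
def expected_alt_allele_fraction(layer_cell_fraction: float, ploidy: int = 2) -> float:
    """Expected share of reads carrying a variant that sits on one haplotype
    of one cell layer, in unphased reads from a bulk tissue sample.

    layer_cell_fraction: share of the sampled DNA that comes from the
    mutated cell layer (hypothetical input, between 0 and 1).
    """
    if not 0.0 <= layer_cell_fraction <= 1.0:
        raise ValueError("layer_cell_fraction must be between 0 and 1")
    return layer_cell_fraction / ploidy


# Hypothetical example: the mutated layer contributes 20% of the leaf DNA.
fraction = expected_alt_allele_fraction(0.20)
print(f"Expected alternative allele frequency: {fraction:.0%}")
```

Under this assumed 20% contribution the variant would be carried by about 10% of unphased leaf reads, below the 15–20% limit cited for Sanger sequencing.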

Importance of ‘Merlot’ grapevine cultivar

‘Merlot’, which is a cross between ‘Cabernet Franc’ and ‘Magdeleine Noire des Charentes’ [54], is the grape cultivar used in this study. It was first mentioned in south-western France in the late 18th century and has expanded in the Bordeaux area since the middle of the 19th century; the impressive spread of this cultivar to other French regions and worldwide only dates from the 1970s [54]. Today it is the fourth most planted cultivar in the world for table and wine grapes and the second cultivar for wine, cultivated in at least thirty-seven countries on 266 000 ha [55]. It is also the most planted variety in France with 114 578 ha in 2018 [56]. The international success of cv. ‘Merlot’ is mainly explained by the high quality red wines produced in renowned Bordeaux vineyards [57]. This cultivar is also one of the earliest black varieties to be harvested and thus one of the most affected by climate change. In some areas, ‘Merlot’ could become unsuitable for producing high quality red wines because of cooked aromas and excessive alcohol content [58]. Therefore, exploring the ‘Merlot’ genome could open new perspectives to better understand its genetic and physiological functioning as well as the intravarietal diversity required for clonal preservation and selection. New knowledge on the ‘Merlot’ genome and its chimeras could also help future grape breeding in order to create improved varieties with a similar fruit phenotype.

Throughout this study we take advantage of the latest sequencing and bioinformatics technologies not only to obtain a whole phased assembly of the ‘Merlot’ genome but also to contribute to a better understanding of a complex biological phenomenon. We used parental sequences of cv. ‘Cabernet Franc’ and ‘Magdeleine Noire des Charentes’ to bin ‘Merlot’ reads into two groups, assemble the reads per haplotype and build pseudo-molecules. We compared root and leaf sequences to detect periclinal chimeras on each haplotype. We transferred gene annotation from ‘PN40024’ Vcost.v3 [26] to our pseudo-molecules in order to have a functional interpretation of the chimeras’ locations. Finally, a subset of the chimeras was analysed by MIP in order to validate them with an independent technology.

Results

Building pseudo-molecules

DNA samples from ‘Merlot’ lateral roots and leaves, and from leaves only for ‘Magdeleine Noire des Charentes’ (maternal) and ‘Cabernet Franc’ (paternal), were sequenced using Pacific Biosciences Sequel II technology. For each sample, between 1.7 and 2.3 million HiFi reads were obtained, with an average length of 13 kb and a Phred-score accuracy of 99.9%. Taking 500 Mb as the Vitis genome size, we estimate a mean coverage between 47x and 58x depending on the sample (Table 2).
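The coverage estimate follows from read count, mean read length and genome size. The sketch below applies that formula to the rounded figures quoted above. The function name is illustrative, and because the inputs are rounded the results (about 44x and 60x) only bracket the reported 47x to 58x range, which was presumably computed from the exact per-sample yields in Table 2.

```python
def mean_coverage(n_reads: float, mean_read_length_bp: float, genome_size_bp: float) -> float:
    """Mean sequencing depth = total sequenced bases / genome size."""
    return n_reads * mean_read_length_bp / genome_size_bp


GENOME_SIZE_BP = 500e6      # genome size assumed in the text (500 Mb)
MEAN_READ_LENGTH_BP = 13e3  # rounded average HiFi read length (13 kb)

# Rounded lower and upper read counts quoted in the text.
for n_reads in (1.7e6, 2.3e6):
    depth = mean_coverage(n_reads, MEAN_READ_LENGTH_BP, GENOME_SIZE_BP)
    print(f"{n_reads / 1e6:.1f} million reads -> about {depth:.0f}x")
```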

Table 2 Sequencing quality information for the 4 samples

Hifiasm trio-binning resolved the raw assembly of each haplotype with high confidence, as shown by the statistics presented below (Table 3). Using 500 Mb as the expected genome size of Vitis vinifera, the mean N90 of 7.39 Mb with an L90 of 27 allows us to consider these assemblies to be of very good quality.
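For readers less familiar with these metrics, Nx is the length of the shortest contig in the smallest set of longest contigs that together cover x% of the assembly, and Lx is the number of contigs in that set. The following is a minimal sketch of that definition on toy contig lengths. It is not the tool used in this study, and the optional `genome_size` argument (to measure against an expected genome size such as 500 Mb rather than the assembly total) is an illustrative assumption.

```python
from typing import Iterable, Optional, Tuple


def nx_lx(contig_lengths: Iterable[int], x: float,
          genome_size: Optional[int] = None) -> Tuple[int, int]:
    """Return (Nx, Lx) for a set of contig lengths.

    If genome_size is given, the x% threshold is taken on that size
    instead of on the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = genome_size if genome_size is not None else sum(lengths)
    threshold = total * x / 100
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("contigs do not cover the requested fraction")


# Toy assembly of four contigs (arbitrary units), total length 100.
toy_contigs = [40, 30, 20, 10]
print("N50, L50:", nx_lx(toy_contigs, 50))
print("N90, L90:", nx_lx(toy_contigs, 90))
```

A high N90 together with a low L90, as reported here, means that 90% of the sequence is held in a small number of long contigs.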

Table 3 Assembly quality after trio-binning

After two successive alignments, first on the PN40024.v4 reference [43] and then on the second haplotype (see Material and Methods section), each contig was assigned to its chromosome and its order and orientation were determined. In the end, between one and five contigs were needed to build each chromosome and between 33 and 47 were used for the whole genome (Table 4). Thus, “Merlot haplotype Cabernet-Franc” (Merlot-hap-CF) is set to 486–490 Mb and “Merlot haplotype Magdeleine Noire des Charentes” (Merlot-hap-MG) to 491 Mb. Because of the sequencing technology and the high performance of long read assembly, we obtain longer chromosomes than in PN40024.v4. Chromosome lengths are very similar between leaves and roots but, because trio-binning correctly phases the assembly, a slight difference is visible between Merlot-hap-CF and Merlot-hap-MG (Fig. 2).

Table 4 Pseudomolecules characteristics for the haplotypes from both roots and leaves genomes
Fig. 2

Chromosome length per haplotype compared to PN40024_12X.v4 genome. Chromosome length per haplotype in Mbp for each pseudomolecule built (Merlot-Root-Hap-CF; Merlot-Leaf-Hap-CF; Merlot-Root-Hap-MG; Merlot-Leaf-Hap-MG) against PN40024.v4.

From 1.6% to 2.8% of the assembly could not be accurately placed in the pseudo-molecules. This fraction mainly consists of small, highly repetitive sequences that are found in different places throughout the genome. It was not attributed to a specific location and could not be associated with a contig.

The results of Benchmarking Universal Single-Copy Orthologs (BUSCO) using the embryophyta lineage-specific database [59] are summarised in Table 5. They show that up to 98.7% of the genes searched for are found in the pseudo-molecules and that nothing is lost compared to the raw assembly. Duplicated genes are reduced to 1.2–2.2% while missing genes remain around 0.4–0.6%. Therefore, the unplaced contigs were confidently ignored in further analysis.

Table 5 Search for genome completion using BUSCO embryophyta odb 10

The ‘PN40024’ Vcost.v3 annotation was transferred to each pseudo-molecule using the Liftoff tool [60]. On average, 95% of the 42 413 genes were positioned throughout the 19 chromosomes, with differences in gene numbers between chromosomes, as expected (Fig. 3). BUSCO analysis was also performed on protein sequences: the ‘PN40024’ Vcost.v3 files obtained a score of 97.3% complete genes against 95.2% for Merlot-leaf-hap-CF and 95.1% for Merlot-leaf-hap-MG.

Fig. 3

Number of genes per chromosome, for each Merlot sample and each haplotype. Number of genes per chromosome for each pseudomolecule, detected by annotation transferred from PN40024 Vcost.v3 using Liftoff

Haplotype comparison

The ‘Magdeleine Noire des Charentes’ haplotype (Hap-MG) is slightly longer than the ‘Cabernet Franc’ haplotype (Hap-CF). Merlot-leaf-hap-MG has 111 more genes than Merlot-leaf-hap-CF (Table 4). All six groups of reads (Cabernet Franc, Magdeleine Noire des Charentes, Merlot-root-hap-CF, Merlot-root-hap-MG, Merlot-leaf-hap-CF, Merlot-leaf-hap-MG) were aligned on the Merlot-root-hap-CF pseudo-molecule and DeepVariant was used to perform variant calling (see Material and Methods) [61]. Mapping the reads from both Cabernet Franc haplotype sets (root and leaf) back on the Merlot-root-hap-CF consensus pseudomolecule allows us to estimate the amount of potential sequencing or assembly errors. Merlot-root-hap-CF and Merlot-leaf-hap-CF have respectively 3.2k and 3.7k variant sites against the pseudo-molecule, mostly in repeated sequences (Table 6). These are thus potential sequencing errors, but they could also mean that mosaic mutations appear more frequently in repetitive regions. Mapping the reads from the Merlot-MG haplotype and from both ‘Cabernet Franc’ and ‘Magdeleine Noire des Charentes’ lets us accurately compare haplotypes. We identified about 3.5 million variants between the Merlot-leaf-hap-MG or Merlot-root-hap-MG reads and the Merlot-root-hap-CF pseudo-molecule. Of these variants, 89% are Single Nucleotide Variants (SNV), mainly located in repeated sequences; around 30% are included in a gene region and 5% in a coding region (Table 6).

Table 6 Variant calling statistics when reads were aligned on Merlot-root-hap-CF pseudomolecule

Chimera detection

Periclinal chimeras were detected by variant calling from the alignment of Merlot-leaf-hap-CF reads on the Merlot-root-hap-CF pseudo-molecule. As presented in Fig. 1, chimeras can be located on either the L1 or the L2 cell layer. The comparison of both haplotypes and parental sequences makes it possible to distinguish one case from the other. When Merlot-root-hap-CF reads (L2) and some Merlot-leaf-hap-CF reads (L1 + L2) carry the same allele but all other sequences carry an alternative one, which is also present in Merlot-leaf-hap-CF reads, we consider that the L2 cell layer has mutated (Fig. 4). Conversely, when all sequences have the same allele but a variant is confidently detected in Merlot-leaf-hap-CF reads, we consider the variant to be located on the L1 cell layer. When these mutations are confirmed by all root reads and are present in only a subset of the leaf reads, the entire L2 meristem cell layer carries the mutation and the variants can be called periclinal chimeras. To increase confidence in detection, we focused on haplotype-specific chimeras, for which no variant is found in reads of the opposite haplotype. We also chose to select only ‘Merlot’-specific chimeras and therefore excluded variants if parental reads were heterozygous. Grapevine DNA also has a lot of repeated sequences known to evolve more rapidly (e.g. microsatellites, transposable elements). In this study we focused on periclinal chimeras located in non-repeated sequences because they are more stable and less prone to mapping errors. Only SNVs were kept. The work was executed on each haplotype separately. In total, 51 positions match the requirements on Merlot-hap-CF, and 53 on Merlot-hap-MG (Table 7).
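The decision rule described above can be sketched as a small function. This is a simplified reading of the text, not the pipeline used in this study: it works on sets of observed alleles rather than read counts, and it pools parental and opposite-haplotype reads into one hypothetical `others` argument. The function and argument names are illustrative.

```python
from typing import Optional, Set


def classify_periclinal_chimera(root: Set[str], leaf: Set[str],
                                others: Set[str]) -> Optional[str]:
    """Assign a candidate SNV to the L1 or L2 layer, or reject it (None).

    root:   alleles seen in root reads of the haplotype (L2 only)
    leaf:   alleles seen in leaf reads of the same haplotype (L1 + L2)
    others: alleles seen in parental and opposite-haplotype reads
    """
    # Reject heterozygous parents, mixed root reads, and leaves that
    # do not show exactly two alleles.
    if len(others) != 1 or len(root) != 1 or len(leaf) != 2:
        return None
    (ancestral,) = others
    (root_allele,) = root
    if ancestral not in leaf:
        return None
    if root_allele == ancestral:
        return "L1"  # variant seen only in a subset of leaf reads
    if root_allele in leaf:
        return "L2"  # all root reads and some leaf reads carry the variant
    return None


print(classify_periclinal_chimera(root={"A"}, leaf={"A", "G"}, others={"A"}))
print(classify_periclinal_chimera(root={"G"}, leaf={"A", "G"}, others={"A"}))
```

The first call corresponds to a variant present only in leaf reads (L1), the second to a variant carried by all root reads and by part of the leaf reads (L2). A real implementation would also need read-depth and allele-count thresholds, which are not modelled here.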

Fig. 4

Schematic genetic interpretation of L2 periclinal chimeras in grape cv. ‘Merlot’: Different allele configurations expected for L2 periclinal SNV ‘Merlot’-specific chimeras; an L2 periclinal chimera should be identified in Merlot leaf and root of the same haplotype but found neither in the opposite haplotype nor in parental reads. Each cell layer is represented here as a “stick”, two for leaves and only one for roots. The SNV is represented as either A or G. Cabernet Franc and Magdeleine Noire des Charentes leaves are also represented

Table 7 Number of chimeras per haplotype and per cell layer

Of these, 37 and 36 chimeras are located on the L1 cell layer for Merlot-Hap-CF and Merlot-Hap-MG respectively, and 14 and 17 on the L2 cell layer. A total of 19 and 16 chimeras on each haplotype are located in a gene region, and 9 and 7 chimeras are in a coding region. The exact position of the chimeras in the genome, the nucleotide and the number of reads for each allele, the type of chimera and the location in a coding region are presented in Tables 8 and 9.

Table 8 SNVs Chimeras found on Merlot-hap-CF
Table 9 SNVs Chimeras found on Merlot-hap-MG

Validation of chimeras by MIPs

Out of the 104 positions identified as chimeras, the MIPGEN software was able to design a ligation and extension probe for 95 target regions. MIPseq was performed on Merlot_leaf and Merlot_root samples. Amplification was obtained for 86 positions but only 32 had enough depth to compare both samples (Table 10 and Additional files 2 and 3). Of these, 22 positions have the expected alleles in each sample and validate the PacBio results with enough depth. Eight positions have the expected alleles but also have a single unexpected read, which makes them ambiguous. One position is invalid because an allele is missing in the Merlot leaf sample (chr3-3419274 on MG haplotype). Finally, one position is classified as ambiguous (chr10-22333673 on MG haplotype) because it has both alleles in leaves and roots. This does not necessarily invalidate the existence of the chimera but makes it unclear on which cell layer it is located.
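The validation rates can be recomputed from the counts given above. The sketch below only restates those counts; the category labels are illustrative. It suggests that the 6% figure quoted in the abstract corresponds to the last two positions taken together, which is an interpretation rather than something stated in the text.

```python
from typing import Dict


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-category counts into percentages of the total."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


# Counts for the 32 positions with enough depth in both samples.
mip_outcomes = {
    "valid": 22,
    "ambiguous, single unexpected read": 8,
    "invalid, allele missing in leaf": 1,
    "ambiguous, both alleles in leaf and root": 1,
}

for label, pct in shares(mip_outcomes).items():
    print(f"{label}: {pct:.1f}%")
```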

Table 10 Chimera validation with molecular inversion probes sequencing

Results of molecular inversion probe sequencing are shown per haplotype; the location of the previously identified chimera is detailed. For both samples, the first column contains the alleles found by PacBio sequencing, the second the alleles revealed by MIP and the third the number of reads that support each allele, separated by the “/” symbol. Valid positions have the expected alleles according to PacBio results, and ambiguous or invalid positions have unexpected alleles.

Discussion

The combination of high quality long read sequencing, trio binning using parental sequences, and a long read assembler makes it possible to resolve an accurately phased assembly. This new ‘Merlot’ genome encompasses a total length of about 500 Mbp and was constructed with only 33 to 47 contigs. Some chromosomes were resolved with a single contig, while others needed up to 5. The number of contigs per chromosome differed, however, between the 4 haplotype assemblies and is therefore not linked to a specific chromosome but most probably to sequencing depth. Chromosome lengths and gene numbers were slightly different between both haplotypes. The gene numbers may however be underestimated since no de novo annotation was done. With the objective of defining a grape pangenome, a precise de novo annotation of the ‘Merlot’ genome should be performed. According to Shumate and Salzberg [60], Liftoff can accurately transfer 99.9% of the genes when working intra-species. Here, the PN40024 reference genome for Vitis vinifera has been used to transfer genes to a Vitis vinifera cultivar. Because of this intra-specific design, Liftoff is expected to have accurately transferred most of the genes. This is supported by the close results of the BUSCO analysis performed on protein sequences: 97% of complete genes for PN40024 against 95% for the Merlot pseudomolecules. Moreover, the program works by aligning the reference genes on the target sequence, so although it cannot identify new genes, the mapping correspondence can be considered accurate.

The pseudomolecules obtained in this study were compared to existing whole genome sequencing data available in the literature; the information is compiled in “Additional file 1”. The closest genome found in terms of assembly quality is the ‘Cabernet-Sauvignon’ assembly [32]. However, taking advantage of the newest technologies, the pseudo-molecules obtained in this work have longer assemblies (~ 490 Mb), longer average sequence lengths (~ 26 Mb), longer maximum lengths (~ 37 Mb), longer N50 (~ 25 Mb) and higher BUSCO scores (~ 98%) with fewer missing genes (~ 0.5%). Moreover, these technologies save time, as the complete assembly and pseudomolecule building can be done in a couple of days. Liftoff offers a fast way to transfer annotation: 96% of the genes from ‘PN40024’ Vcost.v3 were successfully placed on all four pseudo-molecules, allowing a functional interpretation of the results.

Advantage of having parental reads

The originality of this study was to sequence not only the cultivar of interest but also its parents. Parental sequences made it possible to separate ‘Merlot’ reads by haplotype. Each haplotype was then assembled independently, as if we had two homozygous individuals. This step increases confidence in the haplotype comparison statistics. Haplotype differences in a single individual (~ 3.5 million variant sites) are similar to what has already been reported [41]. The pseudomolecule used as reference for variant calling is Merlot-root-hap-CF, which carries half of the ‘Cabernet Franc’ DNA. This explains why 2.7 million variants are detected for ‘Cabernet Franc’ (difference for only one haplotype) whereas ‘Magdeleine Noire des Charentes’ displays 4.9 million variants, both of its haplotypes being different. In addition, being able to retrace parental origin could make it possible to know which agronomic character comes from each parent, which increases possibilities in breeding and cultivar improvement. Considering the difference between both ‘Merlot’ haplotypes, ~ 60% of variants are located in repeated sequences, ~ 30% in gene regions and ~ 6% in coding regions. These numbers align with the proportion of the genome that each of these categories represents [25]. This suggests that the variants are not preferentially located in coding or repeated sequences. However, this does not fit with previous publications on ‘Pinot Noir’ [62] or ‘Nebbiolo’ [63] that found more variation in coding regions between both haplotypes. This could be a specificity of the ‘Merlot’ haplotypes, since differences in variation rates have already been noticed between ‘Nebbiolo’ and ‘Zinfandel’, or it could be explained by trio binning, which rebuilds each haplotype more accurately and therefore gives a better assessment of the comparison.

Chimera identification and their impact on the phenotype

Until now, chimera detection was only possible with PCR sequencing when three alleles were found at the same locus, or by dissecting tissues derived from different meristem cell layers, as cited above. However, genome wide screening of chimeras was not yet possible with these methods. Throughout this study we show that deep, high quality long read sequencing, trio binning, organ comparison and stringent selection open new doors in chimera detection. Around 3 000 variants were found when mapping Merlot-leaf-hap-CF and Merlot-root-hap-CF reads to the Merlot-root-hap-CF pseudomolecule; these are mainly SNVs (67–71%). The variants identified by mapping Merlot-root-hap-CF reads on the consensus pseudomolecule most probably correspond to sequencing, trio binning or assembly errors, but it is also possible that some of them are sectorial chimeras located in only a few cells. To detect chimeras, we removed these 3 000 positions. In addition, we focused on variants outside repeated sequences, which are easier to map, more likely to be stable during evolution and less prone to errors. We also focused on variants that meet the definition of a periclinal chimera because they are the most stable. They meet very specific conditions and are therefore also the least numerous. Nevertheless, the very selective criteria applied allow us to confidently identify these variants as chimeras. Other types of chimeras may exist that were not selected in this work. Indeed, a mutation can be present in a few cells of one or both cell layers and appear as a variant site, but extra experiments would be needed to truly validate it.

Similar numbers of SNV periclinal chimeras were found on each haplotype (51 and 53). These results suggest that they appear randomly and at the same frequency on both haplotypes. Among them, 70% correspond to mutations on the L1 cell layer and 30% on the L2 cell layer. Some sequencing errors detected in leaf samples and not in roots could explain this difference between L1 and L2, although such a difference in frequency could also make sense because the L1 cell layer is located on the surface of leaves and is more exposed to UV radiation. Moreover, the L2 cell layer produces gametes and is probably more protected [64]. Validation on independent reads is needed to support this last theory.

The consequences of a chimera depend on its position in the genome. In our study, 33% are located in a gene body region and 15% are located in a coding region, where they could modify the protein and thus be perceived in the phenotype (Tables 7, 8, 9, 11 and 12). Although our data confirm this possibility, the phenomenon appears to be a rare event.

Table 11 Description of the periclinal chimeras on Merlot-Hap-CF located in a gene
Table 12 Description of the periclinal chimeras on Merlot-Hap-MG located in a gene region

MIP sequencing validated a subset of positions with confidence, which supports the reliability of chimera detection through PacBio HiFi long reads and trio-binning. However, MIP sequencing results overall did not reach the depth expected from the literature, leading to the loss of more than half of the positions tested. This means that either the MIP target region design or the laboratory protocol should be optimised. Unexpected alleles supported by a single read make the conclusion ambiguous; they could be due to sequencing errors, to mutations induced by PCR, or to the higher sensitivity of the MIP technology in detecting rare mutations. Each technology has its own pros and cons, and only a cross-check between two sequencing technologies can bring high confidence in the detection of chimeras. Nevertheless, PacBio technology seems trustworthy for detecting SNVs and also makes it possible to determine on which haplotype and which cell layer chimeras are located.

Throughout this study we focused specifically on single nucleotide variants because they are more stable. Yet some essential functions or characteristics of grapevine, such as berry colour, can be altered by structural variants [20], so studying these types of variants would also be of interest.

Chimeras are rare but they can have a strong impact on phenotype. If they are identified and selected, they can lead to a new cultivar, as has already been reported for ‘Pinot Gris’. Less visibly, in perennial plants propagated over centuries only through cuttings, chimeras are likely to accumulate over time and could slowly induce genetic diversity within the cultivar. By continuously selecting the best plant to fit specific characteristics, breeders increase their chance of selecting and propagating useful chimeras. When chimeras are stable and conserved through several generations of cuttings, they could also be used to trace and identify clonal lineages. Since we have developed a tool for revealing chimeras, it would be interesting to analyse the presence of a subset of the chimeric mutations in different ‘Merlot’ clone 343 plants in order to check how stable these chimeras might be. For grapevine, clonal identification is an important issue because no low cost and rapid test can guarantee clonal origin, although the clone is the economic unit used today. Clonal lineage is only established through human traceability, which can contain errors, especially after a long period of time.

Conclusion

Through this study, a whole genome DNA sequence was obtained using the latest genomic technologies and bioinformatics tools. HiFi long read sequencing, trio-binning and a long read assembler together made it possible to obtain high quality, haplotype-resolved pseudo-molecules. In addition, repeat masking tools, mapping and variant calling with DeepVariant opened new possibilities in chimera detection. By comparing root and leaf samples and through stringent selection, it has been possible to identify about one hundred chimeras (104) based on SNVs across both haplotypes. MIP validation confirmed the presence of a subset of these chimeras. Other types of chimeras could be present, but we were not able to identify them. A functional interpretation was made through transferred annotation. Current genomic tools open new doors in chimera detection, representing opportunities for perennial plant breeding. In addition, this high quality ‘Merlot’ genome could open new perspectives such as the identification of structural variants, and could also serve as a basis for a study of intra-varietal variability in this cultivar.

Materials and methods

DNA sequencing

‘Magdeleine Noire des Charentes’ leaves were harvested from the INRAE Vassal-Montpellier grape collection (Marseillan, France), while ‘Merlot’ clone 343 leaves and roots as well as ‘Cabernet Franc’ leaves were harvested from the IFV collection, Domaine de l’Espiguette (Grau du Roi, France). Two young leaves about 10 cm wide were collected, carefully rolled up and placed in a 13 ml tube. Secondary lateral roots from the same ‘Merlot’ clone 343 plant were collected on the same day. This plant was not grafted and was due to be pulled out, which made it possible to collect its roots. All samples were stored at -80 °C until DNA extraction. DNA was extracted following the Qiagen Genomic-tip 100 kit protocol with slight modifications. Lysis was performed for 3 h at 50 °C on 0.5 g of ground plant material with 9.5 ml of G2 buffer supplemented with 1% PVP-40, 19 µl of RNase A and 500 µl of proteinase K. After filtration on the tips, DNA was precipitated with isopropanol, centrifuged for 15 min at 5000 g, washed with 70% ethanol and re-suspended in 50 µl of TE buffer. DNA quality and high molecular weight were checked: an OD 260/280 ratio between 1.8 and 2.0 and an OD 260/230 ratio between 2.0 and 2.2 were confirmed, and an Agilent Genomic DNA ScreenTape assay was run. Fifteen µg of high-quality DNA were then used for sequencing. Samples were sequenced as PacBio Sequel II Single Molecule Real Time HiFi long reads at the INRAE Clermont-Ferrand GENTYANE platform (France).

Assembly and building pseudo-molecules

Consensus sequences (HiFi reads) delivered in BAM format were converted to FASTQ using the bam2fastq tool from the PacBio SMRT Link v9.0.0 suite. The quality of the HiFi reads was checked with FastQC version 0.11.7.

Figure 5 illustrates the whole bioinformatics workflow used to build the pseudomolecules and transfer the annotation. Paternal and maternal k-mers were identified from the parental reads with yak-0.1. The outputs were then used in hifiasm-0.13 with default parameters to bin the ‘Merlot’ long reads and assemble both haplotypes. This was done for both organs (leaf and root).
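The binning principle behind steps 1 and 2 can be illustrated with a toy example. The sketch below is not the yak or hifiasm implementation: it is a minimal illustration, assuming short in-memory sequences, a small k, and no handling of reverse complements or sequencing errors. The function names (`parent_specific_kmers`, `bin_read`) are hypothetical.

```python
from typing import Iterable, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(
    paternal_reads: Iterable[str], maternal_reads: Iterable[str], k: int
) -> Tuple[Set[str], Set[str]]:
    """Step 1 (toy version): k-mers seen in one parent and never in the other."""
    pat: Set[str] = set()
    for read in paternal_reads:
        pat |= kmers(read, k)
    mat: Set[str] = set()
    for read in maternal_reads:
        mat |= kmers(read, k)
    return pat - mat, mat - pat


def bin_read(read: str, pat_only: Set[str], mat_only: Set[str], k: int) -> str:
    """Step 2 (toy version): assign a child read to the parent whose
    specific k-mers are the most numerous in the read."""
    read_kmers = kmers(read, k)
    n_pat = len(read_kmers & pat_only)
    n_mat = len(read_kmers & mat_only)
    if n_pat > n_mat:
        return "paternal"
    if n_mat > n_pat:
        return "maternal"
    # Reads that cannot be attributed to one parent go to both haplotypes.
    return "both"
```

With real data the k-mer sets would be far too large for Python sets, which is one reason dedicated tools are used; the sketch only shows the logic of the decision.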

Fig. 5

Bioinformatic workflow applied in this study, described step by step. Step 1 is the selection of parent-specific k-mers, sequences of a chosen length k that make it possible to specifically recognise reads originating from one parent. In step 2, these k-mers are used to sort the child reads into two haplotypes, each specific to one parent. Reads that cannot be attributed to one parent are considered to belong to both. Step 3 is the assembly of reads into contigs for each haplotype. Step 4 is the building of pseudomolecules using multiple alignments. Step 5 is the transfer of annotation from the reference genome to the pseudomolecules

For each haplotype, contigs were aligned to PN40024.v4 using minimap2 version 2.17 [65]. The best alignment of each contig was used to build an AGP file, from which each pseudomolecule was reconstructed. To refine the pseudomolecules, we then repeated the same process, starting with an alignment of each haplotype against the previously reconstructed pseudomolecules of the other haplotype.
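The step from best alignments to an AGP file can be sketched as follows. This is a minimal illustration and not the script used in the study: it assumes each alignment has already been reduced to a small record (the `Hit` class is hypothetical), that "best" means the largest number of matching bases, that contigs are ordered by the start of their alignment on the reference, and that consecutive contigs are separated by a gap of unknown size written with the conventional length of 100 bp.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    contig: str
    contig_len: int
    strand: str      # "+" or "-"
    chrom: str       # reference chromosome the contig aligns to
    ref_start: int   # start of the alignment on the reference
    matches: int     # number of matching bases in the alignment


def best_hits(hits: List[Hit]) -> List[Hit]:
    """Keep, for each contig, the alignment with the most matching bases."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.contig not in best or h.matches > best[h.contig].matches:
            best[h.contig] = h
    return list(best.values())


def build_agp(hits: List[Hit], gap_len: int = 100) -> List[str]:
    """Return AGP lines placing contigs along each reference chromosome."""
    by_chrom: Dict[str, List[Hit]] = {}
    for h in best_hits(hits):
        by_chrom.setdefault(h.chrom, []).append(h)

    lines: List[str] = []
    for chrom in sorted(by_chrom):
        pos, part = 1, 1
        placed = sorted(by_chrom[chrom], key=lambda h: h.ref_start)
        for i, h in enumerate(placed):
            if i > 0:
                # Gap line: type U (unknown size), assumed linkage evidence.
                end = pos + gap_len - 1
                row = [chrom, pos, end, part, "U", gap_len,
                       "scaffold", "yes", "align_genus"]
                lines.append("\t".join(map(str, row)))
                pos, part = end + 1, part + 1
            # Component line: the whole contig, in the aligned orientation.
            end = pos + h.contig_len - 1
            row = [chrom, pos, end, part, "W", h.contig, 1, h.contig_len, h.strand]
            lines.append("\t".join(map(str, row)))
            pos, part = end + 1, part + 1
    return lines
```

A pseudomolecule is then obtained by concatenating the components listed for each object, reverse-complementing those with a "-" orientation and inserting runs of N for the gap lines.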

The completeness of all assemblies was estimated with BUSCO 5.3.1 in genome mode using the embryophyta_odb10 lineage dataset [59]. BUSCO was also run on protein sequences using the “prot” option; protein sequences were obtained from the pseudomolecules using gffread version 0.12.6 with default parameters.

Liftoff 1.6.1 with default parameters was used to transfer the annotation of the PN40024 VCost.v3 reference genome to the pseudomolecules [60].

Chimera detection

Reads were mapped to the Merlot-root-hap-CF and Merlot-root-hap-MG pseudomolecules with Minimap2 version 2.17 [65] using the option -x map-hifi, and variant calling was performed with DeepVariant version 1.1.0 [66] using the PacBio model and default parameters. Finally, variants were filtered with vcftools version 0.1.16 [67].

Chimeras were detected by filtering the VCF obtained from Merlot-leaf reads mapped on the Merlot-root pseudomolecules. We only kept variants with a depth of coverage above 10, a “PASS” quality flag and a genotype quality (GQ) over 20. Positions that were not homozygous in all other sequences were excluded. Repeated sequences were identified by building a ‘Merlot’-specific library with RepeatModeler 2.0.2a [68] and then running RepeatMasker 4.1.1 [69]; all candidate chimeras located in repeated sequences were excluded. Both RepeatModeler and RepeatMasker were used with default parameters. Only single nucleotide variants were kept. Finally, the sites in Tables 7, 8 and 9 were manually checked one by one by visualisation in the Integrative Genomics Viewer (IGV 2.12.3), which gives a broader overview of the region across several samples [70]. These sites were intersected with the annotation file using the intersectBed function of BEDTools 2.30.0 [71] to complete Tables 10 and 11.
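The per-site filters described above can be expressed compactly. The sketch below is an illustration only, not the vcftools commands used in the study: it assumes a single-sample VCF whose FORMAT column contains DP and GQ, repeat intervals given in BED-style coordinates (0-based, end excluded), and it does not implement the check for homozygosity in the other sequences. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Intervals = Dict[str, List[Tuple[int, int]]]


def passes_filters(vcf_line: str, min_depth: int = 10, min_gq: int = 20) -> bool:
    """True if the record is a PASS single nucleotide variant
    with depth above min_depth and genotype quality above min_gq."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return False
    ref, alt, flt, fmt, sample = fields[3], fields[4], fields[6], fields[8], fields[9]
    if flt != "PASS":
        return False
    # Single nucleotide variants only: one base in REF, one base in ALT.
    if len(ref) != 1 or len(alt) != 1 or alt == ".":
        return False
    values = dict(zip(fmt.split(":"), sample.split(":")))
    try:
        depth, gq = int(values["DP"]), int(values["GQ"])
    except (KeyError, ValueError):
        return False
    return depth > min_depth and gq > min_gq


def in_repeat(chrom: str, pos: int, repeats: Intervals) -> bool:
    """True if the 1-based VCF position falls inside a masked interval."""
    return any(start <= pos - 1 < end for start, end in repeats.get(chrom, []))


def candidate_sites(vcf_lines: Iterable[str], repeats: Intervals) -> List[Tuple[str, int]]:
    """Apply the quality filters, then drop sites located in repeats."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#") or not passes_filters(line):
            continue
        chrom, pos = line.split("\t")[:2]
        if not in_repeat(chrom, int(pos), repeats):
            kept.append((chrom, int(pos)))
    return kept
```

Sites retained by such a filter are still only candidates; in the study they were additionally inspected one by one in IGV.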

Chimera validation

MIPgen software [51] was used to design MIPs targeting the regions previously identified as chimeras, with the following parameters: tag sizes 0.8, to introduce UMIs (Unique Molecular Identifiers) used to filter out duplicate reads and PCR errors; minimum ligation arm length 20; minimum extension arm length 16; arm length sums 36, 37, 38, 39 and 40; minimum capture size 120; maximum capture size 150; and the trf option activated. DNA from the previous extraction was adjusted in quantity and used in a MIP library protocol adapted from a previously described one [72], with some modifications. 100 ng of DNA template was added to a hybridization mix together with the oligo MIP pool (final concentration of 0.025 pM per probe) in 0.85× Ampligase buffer (Epicentre). The mix was incubated in a thermal cycler at 95 °C for 10 min, followed by incubation at 60 °C overnight. Products were mixed with dNTPs (Jena Bioscience, 15 pM), betaine (Sigma-Aldrich, 375 mM), NAD+ (New England Biolabs, 1 mM), additional Ampligase buffer (0.75×), Ampligase (Epicentre, 1.25 U) and Klentaq (New England Biolabs, 0.16 U). The mixture was incubated at 56 °C for 60 min, followed by 72 °C for 20 min. Linear probes were digested by adding Exonuclease I (New England Biolabs, 8 U) and Exonuclease III (New England Biolabs, 50 U) and incubating at 37 °C for 2 h, followed by 80 °C for 20 min. The final product was amplified using Q5 Hot Start High-Fidelity DNA Polymerase (New England Biolabs, 8 U) with different index combinations. PCR cycling conditions were an initial denaturation step of 2 min at 98 °C, followed by 20 cycles of 30 s at 98 °C, 20 s at 60 °C and 20 s at 72 °C. PCR samples were pooled and cleaned up using AMPure XP beads (Beckman Coulter) at a 0.8× ratio. Samples were sequenced in 2 × 150 bp paired-end mode on a MiSeq (Illumina) platform with custom sequencing primers. UMIs were extracted from the reads using umi_tools extract version 1.1.4 [73] with --extract-method=string and --bc-pattern=NNNNNNNNNNNN. Adapters were trimmed using cutadapt version 3.5 [74] with the following parameters: -q 30 -m 100 -e 0.10 -a ACACTACCGTCGGATCGTGCGTGT -A CTTCAGCTTCCCGATTACGGATCTCGTATG. SNP calling was done using process_reseq from VCFhunter version 2.2.0 with the -s acefg option [75]. Finally, positions in the variant calling file were filtered out when depth was below 10 for at least one sample.
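The effect of the UMI extraction step can be illustrated on a single read. The sketch below is not umi_tools itself: it is a minimal illustration assuming that a barcode pattern of twelve N means "take the first 12 bases of the read as the UMI", and that the UMI is appended to the read identifier after an underscore, which is our understanding of what umi_tools extract does with the string method. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

UMI_PATTERN = "N" * 12  # same length as the --bc-pattern given above


def extract_umi(name: str, seq: str, qual: str,
                pattern: str = UMI_PATTERN) -> Tuple[str, str, str]:
    """Move the leading UMI bases from the read sequence to the read name."""
    n = len(pattern)
    if len(seq) < n or len(seq) != len(qual):
        raise ValueError("read too short or sequence/quality length mismatch")
    # The UMI is removed from both the sequence and the quality string.
    return f"{name}_{seq[:n]}", seq[n:], qual[n:]


def umi_counts(names: Iterable[str]) -> Counter:
    """Count how many reads carry each UMI (the suffix of the read name)."""
    return Counter(name.rsplit("_", 1)[1] for name in names)
```

Reads sharing both a UMI and a target are expected to derive from the same captured molecule, which is what allows duplicates and PCR errors to be filtered out downstream.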

Data Availability

Raw reads of PacBio sequencing are available on the European Nucleotide Archive repository, under the project named PRJEB59893: https://www.ebi.ac.uk/ena/browser/view/PRJEB59893.

Contigs, AGP file, chromosome scale assembly and annotations transferred from PN40024 VCost.v3 with liftoff are available on Recherche Data Gouv: https://doi.org/10.57745/OJ07SN.

PN40024 sequence and annotation used in this study are available on INTEGRAPE platform, https://integrape.eu/resources/genes-genomes/genome-accessions/.

Merlot clone 343 is available at INRAE, domaine de Vassal, under the code 0Mtp2399.

Abbreviations

RT PCR: Real-Time Polymerase Chain Reaction

SSR: Simple Sequence Repeat

SNV: Single Nucleotide Variant

References

  1. Bauhin C. Illustrated exposition of plants (the Pinax theatri botanici): sumptibus [. et] typis Ludovici Regis; 1598.

  2. Nati P. Petri Nati… Florentina phytologica obseruatio de malo limonaia citrata-aurantia florentiae vulgo la bizzarria. typis Hippolyti de Naue; 1644.

  3. Darwin C, Wallace A. On the variation of organic beings in a state of nature. J Proc Linn Soc Lond (Zoology). 1858;3:45–52.

  4. Marcotrigiano M. Chimeras and variegation: patterns of Deceit. HortScience. 1997;32(5):773–84.

  5. Winkler H. About grafted hybrids and plant chimeras. Ber dtsch bot Ges. 1907;25:568–76.

  6. Baur E. Untersuchungen über die Erblichkeitsverhältnisse einer nur in Bastardform lebensfähigen Sippe von Antirrhinum majus: Borntræger; 1907.

  7. Blakeslee AF, Avery AG, Bergner AD, Satina SA, Sinnott EW. Induction of periclinal chimeras in Datura stramonium by colchicine treatment. Science. 1939;89(2314):402.

  8. Satina S, Blakeslee AF. Periclinal chimeras in datura stramonium in relation to development of leaf and flower. Am J Bot. 1941;28(10):862–71.

  9. Satina S. Periclinal chimeras in Datura in relation to the development and structure of the ovule. Am J Bot 1945:72–81.

  10. Szymkowiak EJ, Sussex IM. What chimeras can tell us about plant development. Annu Rev Plant Biol. 1996;47(1):351–76.

  11. Satina S, Blakeslee AF, Avery AG. Demonstration of the three germ layers in the shoot apex of datura by means of induced polyploidy in periclinal chimeras. Am J Bot. 1940;27(10):895–905.

  12. Thompson MM, Olmo H. Cytohistological studies of cytochimeric and tetraploid grapes. Am J Bot. 1963;50(9):901–6.

  13. Frank MH, Chitwood DH. Plant chimeras: the good, the bad, and the ‘Bizzaria’. Dev Biol. 2016;419(1):41–53.

  14. Franks T, Botta R, Thomas MR, Franks J. Chimerism in grapevines: implications for cultivar identity, ancestry and genetic improvement. Theor Appl Genet. 2002;104(2–3):192–9.

  15. Marcotrigiano M. Genetic mosaics and the analysis of Leaf Development. Int J Plant Sci. 2001;162:513–25.

  16. Torregrosa L, Fernandez L, Bouquet A, Boursiquot J-M, Pelsy F, Martínez-Zapater JM. Origins and consequences of somatic variation in grapevine. Genet genomics Breed grapes. 2011;68:92.

  17. Kazemian M, Mohajel Kazemi E, Kolahi M, Omran V. Floral ontogeny and molecular evaluation of anthocyanin biosynthesis pathway in pinwheel phenotype of Saintpaulia ionantha Wendl. Periclinal chimera. Sci Hort. 2020;263:109142.

  18. D’Amato F. Role of somatic mutations in the evolution of higher plants. Caryologia. 1997;50(1):1–15.

  19. Carbonell-Bejerano P, Royo C, Mauri N, Ibáñez J, Martínez Zapater JM. Somatic Variation and Cultivar Innovation in Grapevine. IntechOpen; 2019.

  20. Vezzulli S, Leonardelli L, Malossini U, Stefanini M, Velasco R, Moser C. Pinot blanc and Pinot gris arose as independent somatic mutations of Pinot noir. J Exp Bot. 2012;63(18):6359–69.

  21. Pelsy F, Dumas V, Bévilacqua L, Hocquigny S, Merdinoglu D. Chromosome replacement and deletion lead to Clonal Polymorphism of Berry Color in Grapevine. PLoS Genet. 2015;11(4):e1005081.

  22. Röckel F, Moock C, Braun U, Schwander F, Cousins P, Maul E, Töpfer R, Hausmann L. Color Intensity of the Red-Fleshed Berry phenotype of Vitis vinifera Teinturier grapes varies due to a 408 bp duplication in the promoter of VvmybA1. Genes. 2020;11(8):891.

  23. Boss PK, Thomas MR. Association of dwarfism and floral induction with a grape ‘green revolution’mutation. Nature. 2002;416(6883):847–50.

  24. Pratt C, Einset J, Zahur M. Radiation damage in Apple shoot Apices. Am J Bot. 1959;46(7):537–44.

  25. Jaillon O, Aury J-M, Noel B, et al. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 2007;449(7161):463–7.

  26. Canaguier A, Grimplet J, Di Gaspero G, Scalabrin S, Duchêne E, Choisne N, Mohellibi N, Guichard C, Rombauts S, Le Clainche I, et al. A new version of the grapevine reference genome assembly (12X.v2) and of its annotation (VCost.v3). Genomics Data. 2017;14:56–62.

  27. Di Genova A, Almeida AM, Muñoz-Espinoza C, Vizoso P, Travisany D, Moraga C, Pinto M, Hinrichsen P, Orellana A, Maass A. Whole genome comparison between table and wine grapes reveals a comprehensive catalog of structural variants. BMC Plant Biol. 2014;14(1):7.

  28. Chin C-S, Peluso P, Sedlazeck FJ, Nattestad M, Concepcion GT, Clum A, Dunn C, O’Malley R, Figueroa-Balderas R, Morales-Cruz A, et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat Methods. 2016;13(12):1050–4.

  29. Roach MJ, Johnson DL, Bohlmann J, van Vuuren HJJ, Jones SJM, Pretorius IS, Schmidt SA, Borneman AR. Population sequencing reveals clonal diversity and ancestral inbreeding in the grapevine cultivar chardonnay. PLoS Genet. 2018;14(11):e1007807.

  30. Minio A, Massonnet M, Figueroa-Balderas R, Castro A, Cantu D. Diploid genome Assembly of the wine grape Carménère. G3 Genes|Genomes|Genetics. 2019;9(5):1331–7.

  31. Girollet N, Rubio B, Lopez-Roques C, Valière S, Ollat N, Bert P-F. Author correction: De novo phased assembly of the Vitis riparia grape genome. Sci Data 2019, 6(1).

  32. Massonnet M, Cochetel N, Minio A, Vondras AM, Lin J, Muyle A, Garcia JF, Zhou Y, Delledonne M, Riaz S et al. The genetic basis of sex determination in grapes. Nat Commun 2020, 11(1).

  33. Zou C, Massonnet M, Minio A, Patel S, Llaca V, Karn A, Gouker F, Cadle-Davidson L, Reisch B, Fennell A et al. Multiple independent recombinations led to hermaphroditism in grapevine. Proceedings of the National Academy of Sciences 2021, 118(15):e2023548118.

  34. Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021;18(2):170–5.

  35. Marx V. Long road to long-read assembly. Nat Methods. 2021;18(2):125–9.

  36. Sugawara K, Oowada A, Moriguchi T, Omura M. Identification of Citrus chimeras by RAPD markers. HortScience. 1995;30(6):1276–8.

  37. Riaz S, Garrison KE, Dangl GS, Boursiquot J-M, Meredith CP. Genetic divergence and chimerism within ancient asexually propagated Winegrape Cultivars. J Am Soc Hortic Sci. 2002;127(4):508–14.

  38. Noh J-H, Park K-S, Yun H, Do G-R, Hur Y, Seung Hui K, Lee H-C, Ryou M-S, Park S-J, Jung SM. Determination of Chimera types and Ploidy Level of Sports from ‘Campbell Early’ grape (Vitis labruscana). Korean J Hortic Sci Technol 2010, 28.

  39. Faize M, Faize L, Burgos L. Using quantitative real-time PCR to detect chimeras in transgenic tobacco and apricot and to monitor their dissociation. BMC Biotechnol. 2010;10(1):53.

  40. Hocquigny S, Pelsy F, Dumas V, Kindt S, Heloir MC, Merdinoglu D. Diversification within grapevine cultivars goes through chimeric states. Genome. 2004;47(3):579–89.

  41. Gambino G, Dal Molin A, Boccacci P, Minio A, Chitarra W, Avanzato CG, Tononi P, Perrone I, Raimondi S, Schneider A et al. Whole-genome sequencing and SNV genotyping of ‘Nebbiolo’ (Vitis vinifera L.) clones. Sci Rep 2017, 7(1).

  42. Hou B-H, Tsai Y-H, Chiang M-H, Tsao S-M, Huang S-H, Chao C-P, Chen H-M. Cultivar-specific markers, mutations, and chimerisim of Cavendish banana somaclonal variants resistant to Fusarium oxysporum f. sp. cubense tropical race 4. BMC Genomics 2022, 23(1).

  43. Rohlin A, Wernersson J, Engwall Y, Wiklund L, Björk J, Nordling M. Parallel sequencing used in detection of mosaic mutations: comparison with four diagnostic DNA screening techniques. Hum Mutat. 2009;30(6):1012–20.

  44. Hardenbol P, Banér J, Jain M, Nilsson M, Namsaraev EA, Karlin-Neumann GA, Fakhrai-Rad H, Ronaghi M, Willis TD, Landegren U, et al. Multiplexed genotyping with sequence-tagged molecular inversion probes. Nat Biotechnol. 2003;21(6):673–8.

  45. Biezuner T, Brilon Y, Arye AB, Oron B, Kadam A, Danin A, Furer N, Minden Mark D, Hwan Kim DD, Shapira S et al. An improved molecular inversion probe based targeted sequencing approach for low variant allele frequency. NAR Genomics and Bioinformatics 2022, 4(1).

  46. Wang Y, Moorhead M, Karlin-Neumann G, Falkowski M, Chen C, Siddiqui F, Davis RW, Willis TD, Faham M. Allele quantification using molecular inversion probes (MIP). Nucleic Acids Res. 2005;33(21):e183–3.

  47. Waalkes A, Smith N, Penewit K, Hempelmann J, Konnick EQ, Hause RJ, Pritchard CC, Salipante SJ. Accurate Pan-Cancer Molecular diagnosis of microsatellite instability by single-molecule Molecular Inversion Probe capture and high-throughput sequencing. Clin Chem. 2018;64(6):950–8.

  48. Andersen EF, Paxton CN, O’Malley DP, Louissaint A Jr, Hornick JL, Griffin GK, Fedoriw Y, Kim YS, Weiss LM, Perkins SL, et al. Genomic analysis of follicular dendritic cell sarcoma by molecular inversion probe array reveals tumor suppressor-driven biology. Mod Pathol. 2017;30(9):1321–34.

  49. Lau HY, Palanisamy R, Trau M, Botella JR. Molecular Inversion probe: a New Tool for highly specific detection of Plant Pathogens. PLoS ONE. 2014;9(10):e111182.

  50. Wang H, Campbell B, Happ M, McConaughy S, Lorenz A, Amundsen K, Song Q, Pantalone V, Hyten D. Development of molecular inversion probes for soybean progeny genomic selection genotyping. The Plant Genome 2023, 16(1).

  51. Boyle EA, O’Roak BJ, Martin BK, Kumar A, Shendure J. MIPgen: optimized modeling and design of molecular inversion probes for targeted resequencing. Bioinformatics. 2014;30(18):2670–2.

  52. Almomani R, Marchi M, Sopacua M, Lindsey P, Salvi E, Koning BD, Santoro S, Magri S, Smeets HJM, Martinelli Boneschi F, et al. Evaluation of molecular inversion probe versus TruSeq® custom methods for targeted next-generation sequencing. PLoS ONE. 2020;15(9):e0238467.

  53. Wu L, Chu X, Zheng J, Xiao C, Zhang Z, Huang G, Li D, Zhan J, Huang D, Hu P, et al. Targeted capture and sequencing of 1245 SNPs for forensic applications. Forensic Sci International: Genet. 2019;42:227–34.

  54. Boursiquot J-M, Lacombe T, Laucou V, Julliard S, Perrin F-X, Lanier N, Legrand D, Meredith C, This P. Parentage of Merlot and related winegrape cultivars of southwestern France: discovery of the missing link. Aust J Grape Wine Res. 2009;15(2):144–55.

  55. OIV. Press release. http://www.oiv.int/public/medias/5681/en-communiqu-depresse-octobre-2017.pdf (accessed 19 September 2018). 2017.

  56. IFV INRAE, Montpellier IA. Pl@ntGrape, Catalogue of Vines Cultivated in France, IFV – INRAE – Institut Agro Montpellier, 2009–2022. 2022.

  57. Robinson J, Harding J, Vouillamoz J. Wine grapes: a complete guide to 1,368 vine varieties, including their origins and flavours. Penguin UK; 2013.

  58. van Leeuwen C, Destrac-Irvine A, Dubernet M, Duchêne E, Gowdy M, Marguerit E, Pieri P, Parker A, de Rességuier L, Ollat N. An update on the impact of climate change in viticulture and potential adaptations. Agronomy. 2019;9(9):514.

  59. Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 2021;38(10):4647–54.

  60. Shumate A, Salzberg SL. Liftoff: accurate mapping of gene annotations. Bioinformatics. 2021;37(12):1639–43.

  61. Poplin R, Chang P-C, Alexander D, Schwartz S, Colthurst T, Ku A, Newburger D, Dijamco J, Nguyen N, Afshar PT, Gross SS, Dorfman L, McLean CY, DePristo MA. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol. 2018;36:983–7.

  62. Velasco R, Zharkikh A, Troggio M, Cartwright DA, Cestaro A, Pruss D, Pindo M, Fitzgerald LM, Vezzulli S, Reid J, et al. A high Quality Draft Consensus sequence of the genome of a heterozygous Grapevine Variety. PLoS ONE. 2007;2(12):e1326.

  63. Maestri S, Gambino G, Lopatriello G, Minio A, Perrone I, Cosentino E, Giovannone B, Marcolungo L, Alfano M, Rombauts S et al. ‘Nebbiolo’ genome assembly allows surveying the occurrence and functional implications of genomic structural variations in grapevines (Vitis vinifera L.). BMC Genomics 2022, 23(1).

  64. Burian A. Does shoot apical Meristem function as the germline in safeguarding against excess of mutations? Front Plant Sci 2021, 12.

  65. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–100.

  66. Poplin R, Chang P-C, Alexander D, Schwartz S, Colthurst T, Ku A, Newburger D, Dijamco J, Nguyen N, Afshar PT, et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol. 2018;36(10):983–7.

  67. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, Depristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8.

  68. Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, Smit AF. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences 2020, 117(17):9451–9457.

  69. RepeatMasker. http://repeatmasker.org.

  70. Robinson JT, Thorvaldsdóttir H, Turner D, Mesirov JP. igv.js: an embeddable JavaScript implementation of the Integrative Genomics Viewer (IGV). bioRxiv 2020.

  71. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2.

  72. Hiatt JB, Pritchard CC, Salipante SJ, O’Roak BJ, Shendure J. Single molecule molecular inversion probes for targeted, high-accuracy detection of low-frequency variation. Genome Res. 2013;23(5):843–54.

  73. Smith T, Heger A, Sudbery I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res. 2017;27(3):491–9.

  74. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17(1):10.

  75. Garsmeur O, Droc G, Antonise R, Grimwood J, Potier B, Aitken K, Jenkins J, Martin G, Charron C, Hervouet C et al. A mosaic monoploid reference sequence for the highly complex genome of sugarcane. Nat Commun 2018, 9(1).

Acknowledgements

We are grateful to the members of the AGAP-DAAV team, and especially Charles Romieu, for their help and their contribution to discussions. We acknowledge Laurent Torregrosa for his help in understanding grapevine cell layer physiology. We sincerely thank the Petrus team: Jean-Claude Berrouet, Olivier Berrouet, Fabienne Caillon, Lionel Caillon, Gilles Rabeyroux, Michael Paiva, Pierre-Jean Dalesme, Johann Ventre, Catherine Gaillard and Emilie Verral for their support during this work. We would like to thank Cécile Marchal and Sandrine Dedet for supplying samples from the INRAE Vassal-Montpellier grapevine collection. We gratefully acknowledge the work of Carine Satgé from Toulouse CNRGV in adapting the DNA extraction protocol to grapevine.

Funding

This work is a collaboration between the wine producer SC Petrus in Pomerol (France), the National Research Institute for Agriculture, Food and Environment (INRAE) and the French Institute for Grapevine and Wine (IFV). Fees and the PhD student’s salary were covered by Petrus, while INRAE, CIRAD and IFV contributed through researchers’ salaries.

Author information

Contributions

L.T., T.P. and L.L. conceptualised the project. L.L. collected some samples. L.V. and R.C. performed DNA extraction and sample preparation. S.V., S.G., G.N. and B.PF. built and executed the bioinformatic work. M.P. and R.M. performed the MIP validation, from sample preparation to MiSeq sequencing. All authors contributed to writing the manuscript. The final manuscript was read and approved by the authors.

Corresponding author

Correspondence to P. This.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interest.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Additional file 1: Additional_file_1.xls. Comparative data between whole genome sequences already published and the pseudomolecules built in this study.

Additional file 2: Additional_file_2.xls. PacBio and MIP data for each chimera found on Merlot-hap-MG.

Additional file 3: Additional_file_3.xls. PacBio and MIP data for each chimera found on Merlot-hap-CF.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Sichel, V., Sarah, G., Girollet, N. et al. Chimeras in Merlot grapevine revealed by phased assembly. BMC Genomics 24, 396 (2023). https://doi.org/10.1186/s12864-023-09453-8

Keywords

  • Chimera
  • Hifi sequencing
  • Phased assembly
  • Whole genome
  • Vitis vinifera