Association and Linkage Mapping to Unravel Genetic Architecture of Phenology-Related Traits and Lateral Bearing in Persian Walnut ( Juglans regia L.)

Background: Unravelling the genetic architecture of agronomic traits in walnut such as budbreak date and bearing habit, is crucial for climate change adaptation and yield improvement. A Genome-Wide Association Study (GWAS) using multi-locus models was conducted in a panel of 170 walnut accessions genotyped using the Axiom™ J. regia 700K SNP array, with phenological data from several years. These accessions come from the INRAE walnut germplasm collection which is the result of important prospecting work performed in many countries around the world. In parallel, an F1 progeny of 78 individuals segregating for phenology-related traits, was genotyped with the same array and phenotyped for the same traits, to construct linkage maps and perform Quantitative Trait Loci (QTLs) detection. Results: Using GWAS, we found strong associations of SNPs located at the beginning of chromosome 1 with both budbreak and female flowering dates. These findings were supported by QTLs detected in the same genomic region. Highly significant associated SNPs were also detected using GWAS for heterodichogamy and lateral bearing habit, both on chromosome 11. We developed a Kompetitive Allele Specific PCR (KASP) marker for budbreak date in walnut, and validated it using plant material from the Walnut Improvement Program of the University of California, Davis, demonstrating its effectiveness for marker-assisted selection in Persian walnut. We found several candidate genes involved in flowering events in walnut, including a gene related to heterodichogamy encoding a sugar catabolism enzyme and a cell division related gene linked to female flowering date. Conclusions: This study enhances knowledge of the genetic architecture of important agronomic traits related to male and female flowering processes and


5
advancing effect of warm springs on phenological events has been observed for walnut in California, particularly for leafing date [12]. Similar findings have been reported in Slovenia [13] and Romania [14]. Using phenological data recorded by the Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement (INRAE) of Bordeaux from 1989 to 2016, we also observed an average advance in budbreak in France of 5 days over the last 3 decades [15]. In Iran, researchers assessed land suitability for walnut cultivation under present and future climatic conditions, and predict that the currently suitable area will be significantly reduced [16].
Genetic control of phenology-related traits is fundamental for the development of new, resilient cultivars, able to adapt to changing climatic conditions. Many studies have focused on genetic dissection of phenological traits (e.g., chilling requirements and flowering time) in diverse fruit crops, such as peach, apricot and sweet cherry [17,18]. In walnut, a significant genotype effect has been identified for heat requirements [19]. Moreover, high heritability has been shown for leafing date (71-96%), type of heterodichogamy (90%), and female/male blooming (80%) [20,21].
Persian walnut has two main types of bearing habit. Fruiting can occur only at the terminal position of new branches or at both terminal and lateral positions [22]. A genetic locus for lateral bearing has been identified based in an F 2 progeny in the United-States [23], but has not been sufficiently robust for wider use in markerassisted selection.
Release of the first walnut genome sequence [24] facilitated advanced genetic and genomic studies, including development of the first high-density Axiom™ J. regia 700K SNP genotyping array [25]. Application of this powerful genotyping tool allowed genetic dissection of crucial traits in walnut, such as nut-related traits [26] 6 and water use efficiency [27,28]. A recent study, combining genome-wide association study (GWAS) and classical linkage mapping, found major loci for leafing and harvest dates on chromosome 1 (Chr1), and lateral fruitfulness on Chr11 [29].
Here, we studied for the first time in walnut, the genetic control of budbreak date and female/male flowering dates, using the Axiom™ J. regia 700K SNP array to genotype both a panel of 170 walnut accessions of diverse geographical provenience and an F 1 progeny segregating for these traits. This study sought to We did not phenotype the F 1 progeny for bearing habit, since this trait did not segregate in that population, but we observed great variability for fruit bearing within the GWAS panel. Lateral bearing was found in recently released cultivars from France ('Ferbel', 'Fertignac') and the USA ('Pedro', 'Chico', 'Cisco', 'Chandler'), and in accessions coming from the Central Botanical Garden of Kiev, Ukraine (Table   S1), supposedly originating from the walnut center of domestication, according to the introduction book of the INRAE. Terminal bearing was observed mainly in French landraces such as 'Franquette', 'Grosvert' and 'Verdelet'.
High broad-sense heritability values were observed for budbreak date, with H² of 0.95 when using legacy data and 0.93 using only two-year data (Table 1). Overall, H² values were lower within the F 1 progeny (H² = 0.67 for budbreak date). However, we found low values for male flowering duration (H² = 0.22) and female flowering duration (H² = 0.26) within GWAS panel (using the recent two phenotyping years), 8 while no genetic effect was found for the F 1 progeny. Therefore, we did not consider both male and female flowering durations in the GWAS and QTL mapping analyses.
Population structure of the GWAS panel A total of 364,275 SNPs were retained after filtering for high resolution SNPs categories (Poly High Resolution and No Minor Homozygotes), for genotyping rate > 90%, and minor allele frequency > 5% (Table 2). We investigated the population structure of our association panel using the Bayesian clustering approach implemented in fastSTRUCTURE, and Principal Component Analysis (PCA). The fastSTRUCTURE analysis infers accession ancestry from genotypic information and permitted us to determine the best number of clusters (K). The most likely K subpopulations were K = 2 and K = 3 ( Figure S3). At K = 2, admixture proportions clustered the accessions according to their geographical origin. In particular, the cluster in purple named "Western Europe and America" includes 86 accessions from Austria, Chile, England, France, Germany, Netherlands, Portugal, Serbia, Slovenia, Spain, Switzerland and USA. The cluster in green named "Eastern Europe and Asia" includes 50 accessions from Afghanistan, Bulgaria, China, Greece, Hungary, India, Iran, Israel, Japan, Poland, Romania, Russia and Central Asia (Fig. 2). At K = 3, a new cluster includes all the hybrids and admixed accessions from France and USA  Table S1).
PCA shows similar clustering of our germplasm collection as fastSTRUCTURE ( Figure   S4). PC1, which explains 7.37% of total variance, separated the "Western Europe and America" (WEAm) accessions from the "Eastern Europe and Asia" (EEAs) accessions. PCA2 accounted for 5.80% of variance explained and separated the hybrids and admixed accessions from France and USA, observed with K = 3 in fastSTRUCTURE.

Relatedness of the GWAS panel
In addition to population structure, we investigated the familial relatedness within our association panel by estimating kinship coefficient (k) with the KING method. To identify first-degree relationships and differentiate "parent-offspring" from "full sibling" pairs, we used the estimates of k and the proportion of zero identical-bystate (IBS0) observed in the F 1 progeny ( Figure S5). In particular, we defined all pairwise relationships in the GWAS panel with k > 0.17 and 0 < IBS0 < 0.019 to be parent-offspring relationships. Results confirmed known pedigrees, particularly for the hybrids accessions and the modern cultivars from France and the USA. We also identified new relationships, such as that between 'Grosvert n°1' and 'Verdelet', French landraces from the departments of Dordogne and Corrèze, which may be fullsibs ( Figure S6). Moreover, 'Ashley' and 'Payne', said to be identical, show the highest kinship coefficient.

Genome-Wide Analysis for Bearing H abit
For bearing habit, we found no influence of population structure (PC = 0 according to the 'model selection' function implemented in GAPIT; Table S2). We used multilocus mixed model (MLMM), and Fixed and random model Circulating Probability Unification method (FarmCPU). GWAS results using both models showed a significant association on Chr11 with bearing habit, using only the 2019 data ( The boxplots show the bearing habit phenotypes of 2019 for the different alleles of the three associated SNPs (Fig. 3). For the most significantly associated SNP 'AX-171191765', the allele G is linked to a terminal bearing habit, whereas the allele C is linked to a lateral bearing habit (R² = 34.3%, allelic estimated effect = 2.59), leading to an increased yield.

Association and Linkage Mapping for Budbreak Date and Female Flowering Dates
Using 1,937 SNPs (Table 2), the 'Franquette' and 'UK 6 − 2' parental genetic maps constructed have a length of 1,015 and 1,346 cM, and a number of markers of 849 and 1,088 SNPs, respectively ( Table S3). The marker names of the genetic maps were changed with the corresponding chromosome number and its physical position for a better visualization ( Figure S7). For all the phenology-related traits, we also found that population structure did not influence phenology in our GWAS panel.
Both GWAS and classical QTL mapping identified marker-trait associations for budbreak date in the same region on Chr1 (Fig. 4). The most significant associated SNP 'AX-171179714' on the Chr1 (physical position: 6,514,832 bp) was found using the Best Linear Unbiased Predictions (BLUPs) of two-year data and co-localizes with the major QTLs identified for both parents in 2019 using the F 1 progeny data. The allele T of this SNP is linked to a late budbreak date (R² = 30.6%, allelic estimated effect = 5.9) ( Table 3). We also ran a Kruskal-Wallis test to find if the phenotypic differences were significant among the three genotypes using the different phenotypic datasets, and this allelic effect remains consistent (p-values = 1.84E-13 for two-year data, 6.63E-12 for 2018, 9.76E-13 for 2019, and 2.61E-09 for legacy data; Fig. 5). The co-localizing major QTLs found in 2019 in LG 1 in 'Franquette' and 'UK 6 − 2' explain 23.9% and 34.8% of the budbreak date variance, respectively. In addition, GWAS with two-year data found four additional associations on chromosomes 2, 4, 8 and 15, while the classical linkage mapping analysis identified minor QTLs on linkage groups 6, 11, 12 and 14.

11
The high power of our gene tagging approach based on both GWAS and QTL mapping, was also confirmed for beginning, peak, and end, of female flowering dates. The SNP 'AX-170990138' (physical position: 9,298,520 bp) on Chr1 was systematically found associated with all stages of female flowering (Table 3). For beginning female flowering date, we found this SNP associated using two-year data, and using each year separately. We also observed this marker-trait association for peak female flowering date using legacy data, and for end female flowering date with two-year data and with 2019 data. The SNP 'AX-170990138' is 2.8 Mbp apart from the most significant marker-trait association found for the budbreak date on Chr1, and the allele G of this SNP is linked to a delayed female flowering, with a R² ranging from 34.8-39.6%, and an allelic estimated effect ranging from 3.4 to 4.5, depending on the stage and the dataset. We identified additional QTLs for all three stages of female flowering but the most significant ones, segregating in both parental maps, co-localize with those previously found associated with the budbreak date on Chr1 (Table 3).
Besides the major QTL on Chr1 identified with both GWAS and QTL mapping, we found three significant associations also on Chr7 for all three stages. These three SNPs are located in a region of about 23 to 25 Mb (Table 3). In addition, we found QTLs on LGs 2 and 14 in 'Franquette' map, and on LGs 3, 9, 11 and 12 in 'UK 6 − 2' map.

Association and Linkage Mapping for Heterodichogamy and Male Flowering Dates
Results for male flowering are similar to the female flowering results in that a few SNPs in a very close region are associated with all three stages (Table 4). On Chr11, we found four associated SNPs depending on the stage and the dataset, in a region of about 31.8 Mbp and a window of 52 kb. The most significant QTLs for all three stages of male flowering for both parental maps co-localize with those previously identified as associated with budbreak date and the three stages of female flowering. Two additional QTLs on 'UK 6 − 2' map were found on LG 11 for peak male flowering date, using two-year data (26, For heterodichogamy trait (computed by subtracting peak female date from peak male date; Table S5), the significant associations found with GWAS using two-year data and legacy data co-localized with the associations identified for male flowering dates on Chr11 in the region of about 31.8 Mbp.
Candidate Genes for Bearing Habit and Phenology-Related Traits Using the Walnut Genome By combining GWAS and QTL results and considering their consistency over phenotypic datasets, we decided to focus on a robust subset of eight loci to find candidate genes for bearing habit and phenology-related traits ( Table 5) Since we designed the KASP primers using the complementary strand, we expect to find the allele A (versus C) as involved in late leafing genotypes. The three different genotypes were clearly separated for 95 accessions, whereas one of them could not be determined (Table S4). In order to determine if leafing dates were different among the three genotypes, we ran a Kruskal-Wallis test which showed significance (p-value = 6.88E-13). Then, we ran a pairwise comparison using Dunn's test with Bonferroni p-value adjustment. Results of pairwise comparisons showed that A/A vs. C/A, and A/A vs. C/C have a significant effect on leafing date (p-values = 1.4E-06 and 9.4E-12, respectively), contrary to C/A vs. C/C genotypes (p-value = 1). The homozygous accessions A/A have a median leafing date of almost 110 Julian days, while genotypes C/A and C/C have a median leafing date of only 76 Julian days ( Fig. 7).

Related Traits
We found that the most likely K subpopulations were K = 2, with a clustering in two major groups according to their geographical origin, and K = 3 that highlights admixed accessions, including hybrids or modern cultivars from France and USA.
The EEAs cluster in PCA resulted more scattered, indicating a higher level of SNPbased genetic diversity, and the three most diverse accessions of the EAAs cluster, located at the bottom-right corner of the PCA plot, are 'Shinrei' from Japan, 'JinLong1' from China, and 'EAA6' from Greece. The two last were previously found to be highly diverse by SSRs and included in an SSR-based core collection [31].
For the entire set of studied traits, we observed no influence of population structure on the phenotypes. Most likely, the familial relatedness among accessions accounted for most of the genetic variation affecting the traits studied, leading to redundant PCA information [32]. Even though our collection included accessions from different geographical areas, it consisted of 30% modern cultivars likely brought to the INRAE of Bordeaux as potential genitors for breeding purposes. Therefore, it is not surprising to observe accessions of the same geographical origin that are both early and late in phenology.

Strong Correlations Between Traits Lead to Close Associations
Results found using the F 1 progeny show that strong QTLs for all the stages of female flowering date segregating in both parents on LG 1, co-localize with strong QTLs for budbreak date. In walnut trees, budbreak phenotyping relies on buds that will produce both leaves and female flowers. Those traits are, therefore, correlated and likely controlled by common genetic and metabolic pathways, as suggested by the identification of QTLs within the same genomic region on Chr1. Similar results were observed in walnut for leafing and harvest dates [29]. On the other hand, GWAS enabled further dissection of the genetic control of these traits by identifying a location for the most significant marker-trait associations for budbreak date at about 2.8 Mbp apart from the SNPs associated to all stages of the female flowering date. This is one of the major advantages of GWAS, whose higher resolution defines more precisely the underlying regions of traits of interest compared to classical QTL mapping.
While the major loci controlling the traits studied were identified with both GWAS and linkage mapping, this did not happen for the minor associations. For instance, for budbreak date, GWAS found a second trait-locus association on Chr2, while one minor QTL was detected on LG 6 of the 'UK 6 − 2' map. This finding can be explained in two ways: (i) the loci identified in the GWAS likely are not segregating in the F 1 progeny, and (ii) the QTLs found in the linkage mapping analysis have rare alleles in the association panel. For both GWAS results and QTLs mapping, the minor trait-loci associations were not stable across years, suggesting they are more affected by environmental conditions. For this reason, we decided to focus mostly on the GWAS results based on the two-year BLUPs. Nevertheless, results based on the legacy data allowed confirmation of the major SNP associated with budbreak date.
We observed a different scenario for all the stages of male flowering date: while the major QTL found on LG 1 co-localizes with those for female flowering dates and budbreak date, these results were not confirmed in the GWAS panel. Here, instead, we identified the most significant associations on chromosomes 11 and 4. Then, since heterodichogamy is the measurement of how the male and female flowering dates overlap, it is not surprising to find a marker-trait association on Chr11 with more consistency across years. Since the F 1 progeny used for QTL mapping included only 78 individuals, likely leading to an overestimation of the QTL effects (Beavis effect) [33], we decided to analyze the GWAS results in more depth. In addition, a few regions were not segregating in our F 1 progeny, due to monomorphism in the parents, likely preventing accurate QTL detection. This is the case, for example, of the QTLs detected on LG 2 for both parental maps. By checking the physical position of the associated markers, we observed that the beginning of the Chr2 lacks regions from 0 to 5 Mbp for the 'Franquette' map, and from 0 to 7.7 Mbp for the 'UK 6 − 2' map.
Combining multi-locus models for GWAS has become widespread [34], and the associations found by multiple methods are usually reliable [35]. Similar work on walnut, also using MLMM and FarmCPU models [29], found three associations within a physical genomic region spanning from 3,187,214 to 4,805,396 bp for the leafing date. This is very close to our association for budbreak date, detected at 6,514,832 bp, and our QTL for the same trait using the 'Franquette' parental map, located at 87,347 to 3,051,320 bp. These results indicate that most likely on Chr1 there is a genomic region controlling budbreak, leafing, and female flowering dates in walnut. Since strong positive correlations exist among these traits, with high allelic effects of the marker-trait associations, this would have an impact for selection. For instance, breeding for budbreak and female flowering (and more broadly for all phenological-related traits) would be a difficult job, and would complicate the creation of shorter-cycle cultivars.
In this work, classical QTL mapping mostly confirmed the major marker-trait associations identified. We found a major locus on Chr1 for budbreak date and female flowering dates, and a major locus on Chr11 for male flowering dates.
Separating the flowering stages (beginning, peak and end) allowed "repeat" phenotyping, generating robust data for identification of the major loci required for marker development purposes. Also, the high complexity of phenology-related traits indicates that many loci may be involved in their expression [17], making the detection of minor loci difficult.

Genetic Bases for the Expression of Lateral Bearing
Lateral bearing is a crucial trait in walnut breeding since it contributes to increased yield [36]. Fruiting at lateral positions on shoots was originally introduced into California from the cultivar 'Payne', which is in the background of all current varieties released by the Walnut Improvement Program of the University of California, Davis [3]. Using a large F 2 progeny from 'Chandler', a lateral bearing cultivar heterozygous for this locus, the genetic basis of lateral fruitfulness was found to be located in the centromeric region of Chr11 [23]. The same location was obtained using F 1 progeny from 'Chandler' by 'Idaho' cross [37]. Additional associations for lateral bearing were found using GWAS in a region covering 25 Mb on Chr11, and a major QTL was detected in 'Chandler' in proximity to the centromeric region (peak at 17,341,634 bp) of Chr11 [29].
In the present study, we reconfirmed that walnut bearing habit is controlled by a major QTL in the centromeric region of Chr11. In line with previous studies, we observed that an allelic substitution at the most associated locus, 'AX-171191765', causes lateral bud fruiting. We confirm previous results regarding genetic control of lateral bearing by performing GWAS in genetically different plant materials, further supporting the power and high resolution of this method in walnut.

Gene Annotations May Give Clues to Dissect Genetic Control of Flowering Process
Availability of the new chromosome-scale reference genome Chandler v2.0 [30] allowed us to explore the gene space surrounding the most significant marker-trait associations identified for bearing habit and phenology. For instance, we found the chromosome transmission fidelity (ctf) protein 8 homolog gene as candidate gene for all female flowering dates. Previous studies in Arabidopsis thaliana suggest the involvement of the ctf genes family in the cell division processes [38]. In particular, the inactivation of the ctf7-1 and ctf7-2 genes resulted in developmental defects, including aberrations in flower morphology and male and female gametophytes.
Interestingly, a TPX2 encoding gene required for spindle assembly during cell division process was also found likely to be involved in harvest date determination in walnut [29]. These findings suggest that cell division is a crucial process for correct flower and fruit development in walnut, and suggest the ctf8 protein homolog locus is an excellent candidate gene for female flowering.
Trichomes consist of only one cell type, appearing as a long, slender appendage [39]. In J. regia, during staminate flower development, the tepals differentiate into anthers and filaments shortly before anthesis, enveloping the stamens [40]. Since these tepals become hirsute, it is not surprising to find a trichome birefringence-like 19 13 protein encoding gene within the LD block surrounding the SNP on Chr4 which is associated with male flowering dates. Also, the SNP on Chr11 associated with heterodichogamy and male flowering dates is in LD with a candidate gene encoding a probable trehalose-phosphate phosphatase D. This dephosphorylating enzyme and other members of the gene family are highly expressed in male flowers of Jatropha, a perennial woody plant, and in transgenic plants of A. thaliana with delayed flowering [41], suggesting the putative involvement of this class of genes in the flower development process. Other authors reviewed the roles of sugars in the control of flowering time, and apart from its involvement in carbohydrate metabolism, trehalose-6-phosphate seems to serve as a signal for inducing flowering transition in plants [42,43]. In this regard, we found a probable trehalosephosphate phosphatase D encoding gene for heterodichogamy.

A KASP Marker Related to Phenology Is Released for the First Time in Walnut
For the first time in walnut, a KASP marker related to phenology is released. By designing specific primers for the SNP AX-171179714, we found that the homozygous accessions A/A are significantly later leafing. Since this marker is dominant, it will greatly help breeders to accurately select individuals with delayed budbreak. However, due to the complex genetic basis of phenology-related traits in perennials [17], additional markers, especially for minor loci, will be needed to improve the selection. Finally, this SNP is located in a gene encoding an uncharacterized protein, and it would be interesting to know more about the functional role of this gene. The marker can predict a significant portion of the phenotype but we still do not know the regulatory networks involved in this complex trait. 20

Conclusions
Due to the significant influence of environment on phenology-related traits in walnut, unravelling their genetic architecture is of upmost importance for the development of markers that could assist the selection of superior genotypes and, therefore, the release of new walnut cultivars adapted to different climatic conditions. Using GWAS with two different multi-locus models, we identified significant associations for budbreak date, and male and female flowering dates, confirmed by classical QTL mapping. In addition, we provide a list of candidate genes for these traits, that will be fundamental in future studies of functional genomics and understanding the metabolic pathways underlying phenology in walnut. We also developed and validated the first KASP marker for budbreak date in walnut, which will allow accurate selection of individuals with a delayed budbreak previously genotyped using the same SSR markers as those of the GWAS panel [30] to confirm that they were true progeny. This plant material also complies with our INRAE institutional guidelines.

Phenotypic Evaluation and Data Analysis
Phenotypic evaluation for the following 10 traits was conducted in two years (2018 and 2019) for both the GWAS panel and the F 1 progeny: budbreak date, beginning, peak, end, and duration of female flowering, heterodichogamy, and beginning, peak, end, and duration of male flowering (Table S5). Heterodichogamy, degree of male and female flowering overlap, was computed by subtracting peak female flowering date from peak male flowering date. A negative value (categories '1' and '3', Table   S5) means the accession is protandrous since peak male flowering in Julian days occurs before the peak female date. Categories '7' and '9' indicate protogyny.
For the GWAS panel, we also used phenotypical data based on the same scoring scales used for many years previously (mainly between 1989 and 2011). These legacy data are available from the INRAE walnut germplasm collection database [15]. Moreover, as bearing habit is not affected by environment conditions, this trait was recorded only in 2019. This trait did not segregate within the F 1 progeny, so only the GWAS panel was evaluated. Data management and visualisation, and correlation matrixes were performed using "R" software [44], with the packages "tidyverse" [45] and "corrplot" [46], respectively.
For the GWAS panel, the means of genotypic effects were obtained for each accession by adjusting for known environmental factors using the BLUPs. When 23 using two-year data, the means were predicted using a mixed linear model considering the year effect (a). When using phenological data from many years and three sites [15], the means were predicted using a mixed linear model considering the effects of year, site and combined of year and site (b): where P i(j)k refers to the observed phenotype of the k th accession in the i th year in the j th site; µ is the mean value of the trait; Y i is the fixed effect of the i th year, S j is the fixed effect of the j th site, g k is the random effect of the k genotype; and e i(j)k is the residuals of the model. The BLUPs were performed using "R" package "lme4" [47].
Based on the previous linear models, broad-sense heritability of each trait was estimated as follows: where σ 2 G is the genotypic effect variance; σ 2 ε is the variance of residuals; and n obs/g is the number of observations by genotype (n obs/g = 2 using two-year data and n obs/g = 5 using data from many years and 3 sites).

SNP Genotyping and Quality Control
The genomic DNAs of both the GWAS panel (J. regia accessions) and F 1 progeny (78 individuals) were extracted from young leaves as described in Bernard et al. [31].
The accessions were genotyped using the Axiom™ J. regia 700K SNP array containing 609,658 SNPs uniformly distributed over the 16 J. regia chromosomes [25]. These SNPs were then filtered through several criteria (Table 2). First, the filtering metrics were performed by ThermoFisher considering default thresholds: dish quality control greater than or equal to 0.82, and quality control call rate of 97%. The quality control steps were performed using "PLINK 1.9" software [48].

SNP Linkage Map Construction and Quantitative Trait Locus Analyses
Based on the pseudo-testcross strategy [49], 849 SNPs were retained for the 'Franquette' female parent, and 1,088 for the 'UK 6-2' male parent. The two parental genetic linkage maps were constructed using JoinMap® 4.0 software [50] 25 and distorted markers were omitted. A minimum LOD value of 16.0 was chosen for mapping. Kosambi's mapping function was used to convert recombination frequencies into map distances [51], and graphical display of linkage maps was performed using MapChart 2.32 software [52].
QTLs were determined using MultiQTL 2.6 software (http://www.multiqtl.com/; MultiQTL Ltd, Institute of Evolution, Haifa University, Haifa, Israel). Single trait analysis was performed using a Multiple Interval Mapping (MIM) method [53], in which the highest effect QTL is taken as a cofactor to control the genetic background, whereas another QTL is searched in a different position. All LGs for each parent were scanned using the one QTL model, with 1,000 runs of permutation tests, to compare H1 hypothesis (one QTL is present in the LG) and H0 hypothesis (no QTL in the LG). The threshold for MIM was 0.05 and computation of Type I error rate for each QTL (α chr ) was performed as follows: where M is the total number of markers used for the QTL detection on each parental map, and m is the number of markers in the LG carrying the QTL [54].

Population Structure and Kinship Analyses
For the GWAS panel, the R packages "SNPRelate" and "gdsfmt" [55] were used to perform PCA and relatedness estimations based on a LD pruned set of 29,825 SNPs to avoid capturing too much variance of high LD regions. For the PCA, the ten 26 largest eigenvalues were obtained to check for structure. The KING method of moment was used to estimate identity-by-descent (IBD) proportions between all pairwise comparisons [56]. Population structure was also investigated using the "fastSTRUCTURE" software [57], and the most likely K was determined using the ΔK method [58]. Then, "CLUMPP" [59] and "distruct" [60] softwares were used for clustering accessions and graphical presentation respectively.

Genome-Wide Association Analysis
GWAS was carried out using the R package "GAPIT" [61], accounting for the familial relationships in the form of a kinship matrix. The 'model selection' function implemented in GAPIT was used to select the best number of PCs by running a mixed linear model (MLM) with a maximum of ten PCs tested. Then, the best number of PCs to include in the GWAS analysis was selected according to the Bayesian Information Criterion (BIC). Using BLUPs previously calculated as phenotypic data and each year separately, two multi-locus models were applied: multi-locus mixed model (MLMM) [62] and Fixed and random model Circulating Probability Unification method (FarmCPU) [63]. Thresholding with FDR [64] classically implemented in "GAPIT" was applied to define the significant associations.

Search of Annotations Within LD Blocks of Associated Loci
LD levels around the most associated loci were estimated using HaploView v4.2 software [65]. Each physical position of these trait-SNPs associations was investigated to explore the extension of the surrounding LD blocks and determine the genomic regions where to search for candidate genes. The LD blocks were investigated using the method of confidence interval [66], and "solid spine of LD" Availability of data and materials The phenotypic raw datasets generated and analyzed during the current study for the GWAS panel and for the F 1 progeny are available respectively in Table S6, and   Table S3. 'Franquette' and 'UK 6-2' parental maps for QTL detection. Table S4. Genotyping results of SNP AX-171179714 by KASP. Table S5. Walnut trait ontology. Table S6. Phenotypic raw dataset of the GWAS panel. Table S7. Phenotypic raw dataset of the F 1 progeny.     Allelic effect of the KASP marker associated to the leafing date. 96 unreleased breeding line