Genomic Analysis of Spanish Wheat Landraces Reveals Their Huge Potential for Breeding

One of the main goals for the XXI century breeding is the development of crop cultivars that can maintain current yields under unfavorable environments. Landraces that have been grown under varied local conditions include genetic diversity that will be essential to achieve this objective. The Center of Plant Genetic Resources of the Spanish Institute for Agriculture Research (CRF-INIA) holds a wide collection of wheat landraces. These accessions, locally adapted to a really wide diversity of eco-climatic conditions, represent a highly valuable material for breeding. However, their efficient use requires an exhaustive genetic characterization. The overall aim of this study was to assess the diversity and population structure of a selected set of 380 Spanish landraces and 52 reference varieties of bread and durum wheat by high-throughput genotyping. approach generated 10K SNPs and 40K DArT high-quality markers that were mapped against the currently available bread wheat reference genome. The markers with known location were distributed in all the having a relatively well-balanced genome-wide coverage. The genetic analysis showed that Spanish wheat clustered different allelic The subspecies had major on the population structure durum wheat landraces, identifying three different clusters that corresponded to subsps. durum, turgidum and dicoccon. The population structure of bread wheat landraces was more biased by geographic origin. and

the reference set in durum wheat than in bread wheat. Some genomic regions with patterns of variation that differ between landraces and reference varieties could be detected, pointing out loci under selection during crop improvement that could help to target breeding efforts. The results obtained from this work will be highly valuable for future GWAS analysis.

Background
Wheat is a cereal that belongs to the Poaceae family. Because of its high agronomic adaptability, it is the most widely grown crop in the world. Wheat also occupies a central place in human nutrition providing 20% of the daily protein and food calories. Current cultivated wheat has originated from natural hybridization events between different species (1). Roughly 90 to 95% of the wheat produced in the world is common or bread wheat (Triticum aestivum L.; 2n= 6x= 42, 17Gb, AABBDD genomes). The rest of the wheat world's production includes about 35-40 million tonnes of durum wheat (T. turgidum var. durum; 2n= 4x= 28, 13Gb, AABB genomes), which is cultivated mainly in the Mediterranean region ( http://www.fao.org/faostat/en/). Advances in molecular biology and high-throughput genotyping technologies have significantly improved the field of molecular plant breeding, moving from a phenotypebased to a genotype-based selection (2). The integrated used of genomic and molecular tools in conventional phenotype selection programs has allowed the development of new breeding strategies such as genome-wide association studies (GWAs) and genomic selection. However, in wheat, its large complex genome with over 85% repetitive DNA content has hampered these molecular breeding approaches compared to other crops, i.e. the presence of two or three separate but closely related subgenomes hinders the analysis of homoeologous gene sequences (3). The recently published bread wheat reference genome (4) provides at last high quality information that will help to physically locate molecular markers, and will increase the potential of GWAs analysis by facilitating the identification of key genes underling the associations eventually detected. This is also enabling 'genomic selection' to be possible in wheat breeding programs to predict superior traits based on DNA markers (5).
The successful genomics assisted breeding of any crop requires a thorough understanding of the species genetic diversity. As in other crops, genetic diversity of wheat has declined as a consequence of bottlenecks encountered during polyploidization and domestication (6). The modern plant breeding practices, in which only a small number of elite cultivars are included on breeding programs, has further narrowed the genetic base of wheat throughout the world. The consequent erosion of genetic diversity limits the pool of alleles in which to search for new traits of agronomic interest and adaptability to address novel threats such as climate change. Fortunately, an enormous number of genetically different, locally well-adapted wheat landraces were created by natural and farmer selection in the previous century. These collections represent an invaluable material for genetic studies and crop improvement. A deep knowledge of their genetic/genomic diversity is thus essential to deal with future plant breeding challenges (7).
A large number of studies have been performed to estimate genetic diversity employing different methodologies in diverse plant species (8,9) including wheat (10) . It is accepted that molecular markers are the best option to address genetic variation studies, and among them, Single Nucleotide Polymorphisms (SNPs) distributed across the entire genome are the most frequently employed to assess genome wide diversity. It is nowadays possible to detect thousands of SNPs in a large number of accessions in a reasonable time with high-throughput technologies like SNP arrays (11) or genotyping-bysequencing (GBS) (12).
Assessing genome-wide diversity by GBS provides a robust estimate of the diversity and it has been increasingly adopted as a fast, high-throughput and cost effective tool for wholegenome genetic diversity analysis in large germplasm sets (13). The DArTseq markers, based on GBS (14), efficiently target low copy sequences by a complexity reduction method and have been successful applied for genetic diversity studies in different species (15)(16)(17) (18). Compared to SNP-based genomic profiling, the DArTseq method is much more comprehensive in terms of the molecular variation underlying the polymorphisms but also provides data at a more affordable cost, especially in complex polyploid species as wheat ( https://www.diversityarrays.com).
High throughput genotyping also provides essential information for the design of highpower genome-wide association studies (GWAs), which enable the identification of agriculturally important genes and ease their transfer from wild or local germplasm into modern cultivars through marker-assisted selection and marker-assisted breeding and/or genomic selection. For GWAs analysis, the optimum diverse panel must be genotyped with a set of molecular markers, covering as much as possible the genome of the species (19) the population structure needing to be investigated to avoid false associations between phenotypes and markers (20).
The Spanish wheat landraces conserved at National Plant Genetic Resources Center (CRF-INIA) and maintained in the national collection, were collected in the first half of the 20th century. Several studies have shown the great variability of these materials when compared to other germplasm collection (21)(22)(23)(24). However, their high throughput genomic characterization has not been yet fully addressed.
The aim of the present study was to characterize two collections of durum and bread wheat landraces from CRF-INIA by using the DArTseq-GBS approach. Specific objectives of the present investigation were: (1) to assess the genomic diversity of a set of durum wheat accessions comprising 191 Spanish landraces and 23 reference varieties, (2) to assess the genomic diversity of a set of bread wheat accessions comprising 189 Spanish landraces and 29 reference varieties, and (3) to compare the genetic diversity of landraces and modern cultivars in both wheat species.

Wheat genotyping
We have characterized at genomic level a set of 380 landraces and 52 reference varieties (Additional file 1). The DArTseq approach allowed us to detected around 100K DArTs (presence/absence markers) and 50K SNPs in each analyzed species.
In the tetraploid durum wheat, a total of 98,983 DArTseq markers and 51,751 SNP markers were obtained for 214 accessions (Additional files 2 and 3). Markers, mapped against the genome of the close hexaploid species T. aestivum, were detected along the whole genome (Table 1). In raw data, around 6% and 37% DArTs, and 7 % and 20% SNPs were located in the T. aestivum D genome (not present in the tetraploid) or not mapped, respectively. In summary, 57% DArTs and 73% SNPs could be located. After filtering to obtain highly informative markers 38,700 DArTs and 9,324 SNPs were selected for further analysis. In this set of markers, the percentage of markers mapped was similar to the raw data (83% in SNPs and 66% in DArTs). As shown in Table 1, the filters applied did not affect the markers distribution along the genome. The A and B genomes presented a comparable number of markers, either before or after filtering. The chromosome 4 of the homoeologous group B had the lower density of both types of markers.
In the hexaploid bread wheat, a slightly higher number of markers, 130,899 DArTseq markers and 58,660 SNPs, were generated (Additional files 4 and 5). As in durum wheat, markers were also detected throughout the whole genome (Table 1). The stricter mapping parameters used for bread wheat reduced the chance of homoeologous cross mapping, but also the percentage of located markers when compared to durum wheat. So, around 64% of raw DArTs and 41% of raw SNPs were not mapped in the bread wheat reference genome. After filtering, 44,241 DArTs and 8,238 SNPs were selected for further analysis.
In this set of markers, the percentage of mapped markers was similar to the obtained in raw data (36% in DArTs and 58% in SNPs). The D genome presented a reduced amount of markers compared to A and B genomes. Again, the chromosome 4 from homoeologuos group B had the lowest density of both types of markers.
The marker distribution along the chromosomes was comparable in both species for all the chromosomes from genomes A and B. In general the distributions showed a higher density of markers at both chromosome ends as it is illustrated for chromosome 2A in Figure 1.
The same pattern was observed for bread wheat genome D.
In both species, the distribution of PIC values for DArT and SNP data was asymmetrical and skewed towards the lower values ( Figure 2). In durum wheat, the 82% of the DArTs and the 75% of the SNP markers showed a PIC value >0.2. In bread wheat the corresponding values were 76% and 70% for DArTs and SNPs, respectively. In both species and type of markers, the average PIC value was between 0.30 and 0. 35. Genetic structure of the durum wheat collection fastSTRUCTURE runs with 38,700 DArT markers divided the tetraploid wheat landraces into seven populations (K=7) ( Figure 3A). All the accessions belonging to subsp. dicoccon but one (BGE021775) were grouped in Pop5 and all belonging to subsp. turgidum were clustered in Pop3. The landraces in both populations came mostly from the North of Spain ( Figure 3B). All the subsp. durum landraces but one, (BGE013103) that was classified in Pop3, were distributed into five populations (Pop1, Pop2, Pop4, Pop6 and Pop7), containing from 10 to 80 accessions (Additional file 1). The Pop6 had the higher number of accessions, showed the greatest degree of admixture and was the population with the most diverse eco-geographical origin ( Figure 3). However, some subsp. durum populations showed a differentiated geographic distribution (Additional file 1). That is, landraces of Pop1 originated mostly from the eastern Spain, whereas those in Pop2 were from the south-western provinces. The Pop4 included landraces from the South and East of Spain, and the Canary Islands.
Genetic diversity parameters were calculated in the fastSTRUCTURE populations based on SNP data ( Table 2). The population with the highest genetic diversity value (Hs) was Pop6 We have also explored the genomic structure of the durum wheat collection, including landraces and reference varieties, by a principal coordinate analysis (PCoA) based on the 9,324 filtered SNP markers. The first two principal coordinates explained 21% of the total variation. Three discrete groups corresponding to the three subspecies could be clearly identified ( Figure 4A). This was in agreement with the results obtained with fastSTRUCTURE where the three subspecies were grouped into different populations. A fourth group corresponding to the reference varieties also appeared clearly separated from the subsp. durum landraces. The subsp. durum accessions were differentiated from the others by PCo1, but the difference between turgidum and dicoccon was due to PCo2, evidencing, as detected in the Hs analysis, that different sets of markers are responsible of the genetic divergence among subspecies. Some landraces of subsp. durum were close to subsp. dicoccon and turgidum. These landraces were from Pop6 and some of them (e. g. BGE019290) come from the North of Spain (Additional file 1).
We further investigated the allelic variability for a functional marker involved in wheat adaptability, the vernalization gene Vrn-A1, in relation to the population structure. Three different alleles were identified in the collection: the winter type allele vrn-A1, and two alleles related to spring growth habit, Vrn-A1b and Vrn-A1c. The representation of allelic variation in the PCoA showed that most of the accessions carrying the winter type allele vrn-A1 were grouped together as corresponded to dicoccon accessions.( Figure 4A) Most of the reference cultivars and subsp. durum accessions carried the Vrn-A1c allele, and almost all of the subsp. turgidum the Vrn-A1b allele. When analysed within the population structure, all durum accessions from Pop2, Pop7, and Pop1 but one, presented the Vrn-A1c allele, which was also identified in the 80% of durum wheat landraces clustered in Pop 6.
In Pop3 and Pop4 the most frequent allele was Vrn-A1b. According to passport data (see Additional file 1) accessions with the winter type allele vrn-A1 come mostly from the North of Spain.
The allelic variation was also studied for HMW-GS loci Glu-A1 and Glu-B1, related to wheat rheological properties, but no relationship with the population structure could be observed (data not shown).
As the subspecies was the main discriminant factor in the global PCoA, we decided to perform the analysis excluding the turgidum and dicoccon accessions to gain insight into the variability within the subsp. durum ( Figure 4B). The populations identified in the previous analysis with DArTs ( Figure 3A) were similarly grouped in the SNP-based PCoA, Pop6 being again the one that showed more dispersion due to its higher intrapopulation variability ( Figure 3A). The only durum accession clustered in Pop3 (BGE013103) appeared now closely located to Pop1 landraces ( Figure 4B). This local variety can be identified in the Figure Table 3 The relationship among the bread wheat accessions was also assessed by a PCoA, based on 8,238 SNPs in the whole bread wheat collection. The total amount of genetic variation explained by the first two principal coordinates was 19.2%. The first two coordinates clearly separated the Pop4 (by PCo1) which formed the most distant group, separated from the other three populations that appeared distributed along PCo2 ( Figure 6A). Some degree of overlapping was shown between Pop2 distributed along the PCo2 and both, Pop1 and Pop3 located in the upper extreme of PCo2. This is in agreement with the higher degree of admixture in Pop2 revealed by fastSTRUCTURE analysis ( Figure 5A). Reference varieties were located in a tight space, but overlapping with some Pop2 landraces.

As shown in
As in durum wheat, we investigated the allelic variability of Vrn-A1 gene in relation to the population structure. In bread wheat, three different alleles were identified in the collection, the winter type allele vrn-A1, and two alleles generally related to spring growth habit, Vrn-A1a and Vrn-A1b. Accessions from Pop4 were characterized by having almost exclusively the Vrn-A1b allele. The accessions presenting the other two alleles could be differentiated along the PCo2 axis, whose upper part corresponded to spring type landraces whereas the reference varieties and winter type landraces were included in the lower part ( Figure 6A).
The representation of Glu-B1 alleles specific of Iberian landraces in the PCoA showed that accessions that carried the Glu-B1f allele (HMW subunits 13+16) were clustered at the right part of the PCoA and corresponded to most of Pop4 samples. Again, an interesting tendency could be observed along the PCo2 axis: accessions with the Glu-B1e allele (HMW subunits 20x+20y) appeared grouped in the upper part, while the lower part corresponded to accessions with Glu-B1u or Glu-B1al alleles (HMW subunits 7+8 or 7OE+8) ( Figure 6B).
The allelic variation was also studied for HMW-GS loci Glu-A1 and Glu-D1, but no relationship with the population structure could be observed (data not shown).
The overlap between landraces and reference varieties in bread wheat was more remarkable than in durum wheat. Actually, reference varieties fully overlapped with Pop2 accessions. Some of the landraces closer to the references varieties were collected in the 90's (e. g. BGE025410) and according to their early flowering and lower height phenotype (http://webx.inia.es/web_coleccionescrf/CaracterizacionCRFeng.asp) may not be real landraces.

Divergence between landraces and reference varieties
The PCoA analysis showed a clear genetic divergence between landraces and reference varieties in durum wheat whereas it was not so evident in bread wheat, where landraces and reference varieties overlapped (Figures 4 and 6). Regarding to the overall genetic diversity, landraces showed a higher diversity than the reference varieties in both species (Table 4) In order to analyze the degree of allele fixation in the reference varieties, we studied the presence of monomorphic markers. Around the 40% of SNP markers were fixed in durum wheat reference varieties, the number of monomorphic markers (3,791) being comparable to the found in subsps. dicoccon and turgidum (4,079 and 2,119 respectively) and much higher than the found in subsp. durum (478 fixed markers). In bread wheat reference accessions, the number of fixed SNP markers was 1,771 (21%) clearly lower than the obtained in durum wheat (Table 4).
We also analyzed the Hs distribution along the genome in landraces and reference varieties ( Figure 7). We called low diversity regions in the reference varieties with respect to the landraces, as those might be regions fixed by breeding efforts. In durum wheat we detected 24 genomic regions located in 11 different chromosomes ( Figure 7A). All chromosomes except 1A, 1B and 4B presented at least a fixed region. Chromosome 3B, the largest one, with 5 regions spanning a total of 46Mb, presented the greatest number of low diversity regions. On its turn, chromosome 5A presented the widest region (42Mb).
In summary, low diversity regions spanned 288MB and include some key genes. Such as, the region located in chromosome 2B, that overlapped with Sr36 gene related with resistance to leaf rust. In bread wheat a similar number (22)  Wheat high throughput genotyping This is the first report of high throughput genotyping by GBS of Spanish wheat germplasm.
The GBS method has the potential to provide in-depth and robust genomic diversity estimates. Moreover, it may reveals new and favorable alleles, which might have a high value for prebreeding (25)(26)(27). The GBS-DArTseq methodology has been successfully applied in wheat species, where a standard GBS approach, that requires the genome resequencing, is still challenging due to their huge complex genome (7).
In our study, 50K SNP and 100K DArT markers in each of the wheat species were analyzed, from which we have been able to select around 10K SNP and 40K DArT high quality markers. The availability of the reference wheat genome (4) (WGA v0.4) allowed the marker mapping in both species. The results showed that markers were distributed all over the genome although D genome markers were remarkably less abundant than A and B genome markers in bread wheat. It can be noted that earlier hexaploid wheat maps based on expressed sequence tag loci have showed a fairly even distribution of markers among the three wheat genomes (e. g., (28)). Among homoeologous sets, group 4 chromosomes were the less covered, especially 4B and 4D. No satisfactory hypothesis has yet been proposed to explain the relative scarcity of markers consistently mapped on 4B compared to 4A (28,29) despite the latter has been the involved in several structural rearrangements during wheat (30).
Interestingly, the percentage of mapped markers was higher in durum wheat than in bread wheat although the reference genome employed is from the latter. This can be due to the fact that two different sets of mapping parameters were used. Parameters were less strict in durum wheat, as the reference genome corresponds to the hexaploid T. aestivum.
However, it will be advisable to repeat the mapping procedure in durum wheat once its reference genome becomes available, to avoid any mapping misplacements prior to GWAs analysis. The informativeness of markers was assessed by their PIC values whose distribution and average PIC value was comparable to the previously reported in wheat (17,(31)(32)(33)(34).

Population structure of Spanish wheat landraces
Ex situ collections preserve valuable genetic resources collected over time that include germplasm from different genepools. The identification of useful alleles is largely improved when the genetic structure of the germplasm collection is known. Moreover, the study of population structure is also important for genomic studies, being a previous and mandatory step to successfully perform further GWAS analysis. In our case, since GWAS will be based on the high quality SNPs identified, we decided to assess the structure with a different set of markers, such as DArTs. The population structure was assessed with FastSTRUCTURE algorithm, developed by the authors of the classical STRUCTURE software, but with faster runs, and comparable ancestry estimates and prediction accuracies (35). In both species, the analysis of population structure was complemented with a PCoA analysis performed with the SNPs data set. As expected, the results were in close agreement, but some additional information about subgroups and their relation to phenotypic traits could be extracted.

Durum wheat landraces
Spanish durum wheat (Triticum turgidum L.) landraces belong to three main interfertile subspecies (dicoccon, turgidum and durum). The subsp. dicoccon, also known as emmer wheat, is a hulled wheat only grown in the North of Spain that represents the feral situation in this crop. The subsp. durum is the most widely cultivated and well adapted to the dry-summer conditions of the South. The subsp. turgidum, less common and grown in colder areas than durum, corresponds mostly to winter wheats (36). In a previous research, SSRs were used to assess the genetic structure of the collection of durum wheat landraces analyzed here and 9 populations were established. Some of the populations included more than one subspecies and several genotypes could not be classified in any population (37). In the present work we identified seven populations, all of them composed by landraces from one subspecies with only two exceptions ( Figure 3). The discrepancies between this and the earlier work can be explain by the different type and number of markers employed for assessing the population structure in both studies (40K

DArTs vs 39 SSRs).
Analyses comprising other durum wheat materials have not been able to separate the subspecies durum and turgidum (38,39). However, we have clearly differentiated the three subspecies. This demonstrates the analytical power of DArT and SNP markers for taxonomical identification and supports the classification of Mac Key (40,41) with turgidum and durum as separate subtaxa. In our study, subsp. dicoccon was closer to turgidum than to durum (see F ST values, Table 2), indicating that both subspecies share a common allele pool. As both subspecies were grown in similar environmental conditions, this closeness might be due to similar selective pressures during local adaptation. The analysis allowed also the identification of accessions that could be admixtures between subspecies, such as BGE021775 a dicoccon landrace closer to durum, or BGE04564, a durum landrace grouped with the turgidum landraces. Those admixtures have been previously reported in durum wheat as not unusual in ancient local forms (40).
Vernalization genes determine winter/spring growth habit in temperate cereals and are one of the major genetic factors that, affecting their life cycle, enable the crop plant to adapt to a wide range of environments (42). The evolution of spring cultivars from winter habit ancestors is a main event in the post-domestication spread of wheat. However, studies of the major vernalization gene VRN1 have been mostly limited to hexaploid wheat species and very few reports in tetraploid species can be found in the literature (43,44).
None of the durum wheat accessions characterized here showed the Vrn-A1a allele described in spring-habit hexaploid wheats, which is according to what has been found in other studies (45,46). All seems to indicate that this allele appeared in wheat evolution after the last polyploidization event. The Vrn-A1c allele has been described as the most frequent determinant of spring habit in tetraploid wheats (45,47,48) but has been described as rare in emmer wheats (38). In our study, 78% of durum accessions presented this allele, but almost all turgidum accessions (83%) carried the Vrn-A1b, and 7 out of the 11 dicoccon landraces characterized for VRN1 carried the winter vrn-A1 allele (Figure 4). It is noteworthy the presence of alleles associated to spring growth habit in the remaining four accessions from subsp. diccocon , as in this subspecies, traditionally cultivated in mountainous regions where the vernalisation requirement is of importance, the winter habit is predominant. Our results are consistent with the description of few Spanish spring emmer wheats in the past (36), and remarks the value of the collection to provide favourable alleles for the creation of novel cultivars with improved adaptability to changing eco-climatic conditions. Low or none influence of the geographic origin on the genetic structure of durum wheat materials, due to the exchange of germplasm, has been reported for landraces of Iran, Central Fertile Crescent and Ethiopia (49)(50)(51). In our collection, a restricted geographic distribution exists for the Spanish landraces belonging to the less represented subspecies dicoccon (14 accessions) and to some extent for turgidum landraces (37 accessions).
However, landraces of subsp. durum, which were structured into five distinct populations, showed not only greater variability and complexity but also less influence of the origin, although some geographic areas were predominant in some of the populations.

Bread wheat landraces
The great majority of the Spanish bread wheat landraces conserved at CRF belong to Triticum aestivum subsps. vulgare, and thus our bread wheat set was composed exclusively by this subspecies. The population stratification of the bread wheat panel identified four groups of landraces with a great divergence according to the F ST values.
This clustering reflected better the geographic origin of the accessions than in the subsp.
durum. The genetic differentiation, estimated by D est , was smaller in the bread landraces collection than in the whole durum wheat collection, in agreement with the less level of stratification observed. Few studies have simultaneously addressed the variability of hexaploid and tetraploid wheats but a higher genetic diversity in durum than in bread wheat has been previously reported in landraces from other countries (e. g., (52)).
One of the four groups detected (Pop4) was clearly more genetically distant. This group included landraces from western Spain, where there is a prevalence of acidic or neutral soils (53). Most of the accessions from this population show spring growth habit, and carry the f allele at Glu-B1 HMW locus. Glu-B1f allele has a low frequency in worldwide collections but has been previously described as characteristic of Iberian landraces, which also holds for the Glu-B1e allele (54,55). The latter was predominant in Pop1 and Pop3, being also present in some landraces of Pop2 that were closely grouped by PCoA. The Glu-B1e allele is related to poor rheological properties in bread wheat, but the f allele has been associated with a good dough quality (56). The presence of this variant in a discrete group of more differentiated landraces supports their common origin.
In our study, the winter-type allele vrn-A1 was the most common Vrn-A1 allele found in T. aestivum reference cultivars (22 out of 29 varieties). However, the most frequent within the bread wheat landraces was the Vrn-A1a allele, found to be absent in durum wheats. It is the most potent spring-habit allele described, causing complete insensitivity to vernalization (57). Though the measured impact of growth habit on genetic differentiation is limited, spring and winter bread wheat can frequently be distinguished by cluster DAPC analyses (58,59). Two discrete groups regarding growth habit were not clearly defined in our PCoA analysis, but the allelic variability showed some relation to population structure and some biased tendency along the second PCo axis ( Figure 6A). On this matter, it has to be kept in mind that vernalization response is a complex process under polygenic control.
Vrn-A1 has been described as the main genotypic determinant of the vernalization requirements in temperate crops, but there are othergenes, as Vrn-B1 and Vrn-D1, whose allelic variability has not yet been addressed. This is the first time that the bread wheat resources maintained in the CRF-INIA Spanish national genebank are characterized at genetic level and compared to durum wheat accessions. The deep knowledge of their genetic structure represents the starting point for the development of a core collection of Spanish T. aestivum wheats, which will allow the efficient management and use of its valuable gene pool.

Relationship between landraces and modern cultivars
In diversity studies, wheat landraces usually cluster in a separate group from elite cultivars (50,51,59,60), but in some cases some degree of mixture has also been found (32,58). Concerning the durum wheat materials examined here, it is remarkable the great genetic divergence between the bulk of landraces and the reference set ( Figure 4A).
Moreover, some durum landraces are closer to turgidum and dicoccon than to the reference varieties (that belong all to subsp. durum). However, a great relatedness with the reference varieties has been detected for a reduced group of durum landraces that includes 'Caravaca' (BGE002869), a Spanish landrace used by CIMMYT in the development of some modern cultivars (see the Genetic Resources Information System for Wheat and Triticale of CIMMYT at http://www.wheatpedigree.net/). It can also be noted that other studies have reported a relation between Spanish and North-African landraces (32,38) and that the reference set included two old cultivars commonly cultivated in Spain in the past ('Senatore Capelli' and 'Bidi-17') both with a North African origin. Nevertheless, and overall, our results suggest a little involvement of Spanish landraces in the development of modern durum wheat varieties grown in Spain at present.
In bread wheat, the situation was quite different, and mixture between landraces and reference varieties was much higher, in special with accessions in Pop2 ( Figure 6). Likely, some of them are not true local landraces, but rather old adopted cultivars wrongly classified. Some others are related to Italian material, such us 'Ritchela Blanca', 'Montnegre' or 'Ardito' (61,62). Clustering of landraces and reference varieties can also be indicating a pedigree relationship. Then, it is possible that some of the landraces characterized in our study were among those utilized by the early breeders to develop pure lines.
Mediterranean wheat landraces have been considered highly valuable due to their huge genetic diversity and the presence of accessions with high resilience to abiotic stresses, resistance to pests and diseases and grain quality (63)(64)(65). This fact was patent in our study, where the overall genetic diversity of the reference cultivars was way lower than that of the landraces in both species (Table 2; Figures 4 and 6).
Genomic regions with patterns of variation that differ between landraces and reference varieties can aid to identify loci under selection during crop improvement which is of help to better target breeding efforts (66) (Figure 7). Our analysis allowed the identification of several of such genomic regions studying the distribution of genetic diversity along the genome. The number of regions in durum wheat was higher than in bread wheat, probably due to the higher number of mapped markers in this species. As expected, some of the chromosomes containing fixed regions in both species harbor genes related to agronomically important phenotypes, such as, Sr36 in chromosome 2B in durum wheat, or in bread wheat Rht-B1 (4B) and Vrn-B1 (5B), associated with vernalization (67,68). On their turn, the fixed region in chromosome 1A can be related with the presence determinants of bread quality, as Glu-1in this chromosomes. Coupling this analysis with future GWAs studies will help to identify the traits underlying each of the fixed regions detected in this work. Moreover deciphering of their gene content will be facilitated by the remapping of the markers against the durum reference genome once it will be available.

Conclusions
The replacement of local landraces by high-yielding wheat varieties that began in the times of the Green Revolution has led to a loss of genetic variation in crop wheats. This depletion has now encouraged the use of genetic resources in wheat breeding programs, but the genetic variability of these resources needs to be exhaustively characterized for their efficient use. This study demonstrated the relevance of DArTseq technology as a reliable and cost-effective tool for assessing the diversity within and between landraces and for establishing the genetic relationships between them and modern wheat cultivars.The study of the genome-wide diversity will also provide a resource for the design of high-power GWAS experiments, which will help to achieve the overarching goal of improving wheat for different environments, ecosystems and stress situations. The collections of Spanish landraces characterized in the present study are clearly clustered in different groups, thus representing different genepools capable to provide different sources of genes for plant breeding. The panel of genotypes investigated has shown an outstanding degree of diversity compared to the reference counterparts. It then clearly represent a strategic platform and a valuable genetic resource that must be further study to ensure not only its efficient conservation and management but also its useful exploitation in breeding programs.

Plan material
The plant material analyzed in the present study comprised 432 selected accessions The marker calling and mapping was performed by the genotyping services. For durum wheat, markers were blasted against the current available Triticum aestivum genome IWGSC WGA v0.4 ( https://wheat-urgi.versailles.inra.fr) a tightly related species. A marker was mapped when blast E-value < 5e-7 and sequence identity >70%. For bread wheat, markers were blasted against the same genome. However, in this case more strict parameters (blast E-value < 5e-10 and sequence identity >90%) were used, as accessions belonged to the reference genome species. Genotyping services provided two different sets of markers, including their genomic location. DArT markers were scored as binary data (0/1) indicating the presence or absence of a marker in each accession, and SNP markers were scored as 0/1/2 indicating the presence of the reference allele in homozygosis, the alternative allele in homozygosis or an heterozygous genotype, respectively. Raw data are available upon request to the corresponding author.
To contrast with the population structure based on GBS-DArTseq markers, we investigated the allelic variability for functional markers in Vrn-A1 gene, one of the most determinant loci involved on the transition from vegetative to reproductive growth (72,73). It has been described that carrying a dominant allele at Vrn-A1 locus is enough to confer a spring growth habit (48). Three alleles were characterized by PCR, Vrn-A1a, Vrn-A1b and Vrn-A1c according to (45,47), and following the protocols described in https://maswheat.ucdavis.edu/protocols/Vrn/index.htm.
Additionally, the panel of accessions was also genotyped for Glu-1 homoeoloci. These

Data analysis
Prior to any molecular-based analysis, the set of markers to be considered was filtered employing homemade R scripts (75) available upon request to the corresponding author.
For SNP markers, we did not expect any real heterozygous genotypes, as all the accessions genotyped were homozygous lines. Thus, when the genotypic values for a marker were only 0 or 2, we considered the heterozygous calling an error caused by the presence of the SNP marker in homoeologous genomes. In this case, the genotypes scored as 2 (putative heterozygotes) were recoded as 1. The same procedure was followed when only 1 or 2 genotypic values were present in a sample, but in those cases genotypes scored as 2 were recoded as 0. Finally, when the genotypic values for a marker included 0, 1 and 2 we recoded as missing data the genotypes scored as 2. From this step, we filtered SNP and DART markers following the same procedure. First, when several markers presented the same allelic profile, all but the one with the less missing data were removed. Then, markers with more than 10% missing data or monomorphic (Minimum Allele Frequency <0.05) were excluded.
Genetic substructure inside the durum and bread wheat landrace collections was investigated using the fastSTRUCTURE algorithm (35) and the DArT markers dataset.
Default parameters and K values from 1 to 15 were tested. The appropriate number of components that explained structure in the dataset was determined using the chooseK.py function (35). Results for the identified optimal values of K were visualized using DISTRUCT (76). Individual accessions were assigned to the population with the highest proportional membership.
Genetic similarity based on SNP data inside the full sets of accessions (landraces plus reference varieties) was analyzed by Principal Coordinates Analysis (PCoA) using the gl.pcoa function from dartR R package (77).
Gene diversity (Hs) within populations, landraces and reference varieties were calculated based on the SNP dataset according to (78) with the function basic.stats from hierfstat R package (79). Dest, a measure of population differentiation in collections with several populations, was calculated as defined by (80), with this same function. Genetic differentiation between populations was analyzed estimating a pairwise Fixation index (Fst) according to (81) with the stamppFst function from R StAMPP package (82) using the SNP dataset.
Fixed genomic regions on the reference varieties were identified by doing a scan of the Hs values along the different chromosomes. Hs was estimated as described previously.
However we did not filter out the SNPs markers with the same allelic profile in order ot increase the number of mapped markers to refine the estimations. A region was considered as "fixed" when it contained at least 5 consecutive markers with Hs equal to 0 in the reference varieties and at least 5 markers with Hs > 0.1 in the landraces, and spanned more than 5 Mb.
in genomes and chromosomes is presented.     Collection sites of the different T. aestivum accessions, colored by their STRUCTURE population assignment. When GPS coordinate data were not available, the coordinate of the capital of the province of origin has been used.