Consequences of breed formation on patterns of genomic diversity and differentiation: the case of highly diverse peripheral Iberian cattle

Background Iberian primitive breeds exhibit a remarkable phenotypic diversity over a very limited geographical space. While genomic data are accumulating for most commercial cattle, it is still lacking for these primitive breeds. Whole genome data is key to understand the consequences of historic breed formation and the putative role of earlier admixture events in the observed diversity patterns. Results We sequenced 48 genomes belonging to eight Iberian native breeds and found that the individual breeds are genetically very distinct with FST values ranging from 4 to 16% and have levels of nucleotide diversity similar or larger than those of their European counterparts, namely Jersey and Holstein. All eight breeds display significant gene flow or admixture from African taurine cattle and include mtDNA and Y-chromosome haplotypes from multiple origins. Furthermore, we detected a very low differentiation of chromosome X relative to autosomes within all analyzed taurine breeds, potentially reflecting male-biased gene flow. Conclusions Our results show that an overall complex history of admixture resulted in unexpectedly high levels of genomic diversity for breeds with seemingly limited geographic ranges that are distantly located from the main domestication center for taurine cattle in the Near East. This is likely to result from a combination of trading traditions and breeding practices in Mediterranean countries. We also found that the levels of differentiation of autosomes vs sex chromosomes across all studied taurine and indicine breeds are likely to have been affected by widespread breeding practices associated with male-biased gene flow. Electronic supplementary material The online version of this article (10.1186/s12864-019-5685-2) contains supplementary material, which is available to authorized users.


Background
The biological resources of the Mediterranean sub-region of the Palaearctic include a diversity of domesticated animals [1] comprising 53 officially recognized local breeds of taurine cattle (Bos taurus) in the Iberian Peninsula alone (Additional file 1: Table S1). Taurine cattle are thought to have been domesticated by Neolithic farmers from Bos primigenius populations in the Fertile Crescent around 10,000 years [2], and have since diversified into more than 1000 breeds [3]. Cattle genomes have been shaped not only by human-driven selection, but also by genetic bottlenecks associated with migrations from the origin of domestication, adaptation to different agro-ecological areas and a more strict division of animal populations into breeds led by Europeans since the eighteenth century [3]. Furthermore, multiple events of introgression have been proposed to have influenced European cattle breeds: i) ancestral hybridization with European populations of B. primigenius [4][5][6][7][8][9] (extinct in Europe since the seventeenth century [9]); ii) introgression from African taurine cattle [10]; iii) introgression from non-taurine sources such as indicine breeds (Bos indicus, the humped cattle type resulting from a separate domestication event in the Indus valley [11]) [10,12]. Such wide-spread gene-flow resulted in complex patterns of admixture and the difficulty in sometimes establishing whether a breed represents the taurine populations that were originally associated with a specific geographic region [10] and could explain the high levels of genetic diversity relative to other domesticated species [12].
Currently, there are two broad groups of cattle breeds, those under intensive management with strong specialization in dairy or meat phenotypes (such as the commercial transboundary Holstein, Charolais, Limousine, and more recently Angus), and the so-called "primitive" breeds, traditional cattle with a low dependence from external inputs that make use of naturally available food resources. Iberian native cattle are found in diverse agro-ecological systems including coastal, mountain, and lowland arid environments (Fig. 1a). Inheritable traits of these cattle have been modified at different times by the various cultures that inhabited this territory, and breeds are often defined based on morphological traits such as coat color, as well as horn size and body shape. For example, osteometric data validated by the molecular sexing of cattle metacarpals dated to the fifteenth century indicated that there was an overall size increase or improvement of cattle in southern Portugal following the Christian reconquista of the 11th-13th centuries AD [13]. Recently, the Food and Agriculture Organization (FAO) has warned that about 67% of the Iberian cattle breeds are at risk as many of these have less than 1000 breeding females and/or less than 20 breeding males [3], which reinforces the need for a continued conservation strategy. The complex origin of the Iberian primitive breeds is reflected in their high diversity in Y-chromosome haplotypes, including the major taurine Y1 and Y2 haplogroups [14,15] and unique patrilines [16], as well as distinct maternal lineages, i.e. common European T3-matrilines along with more distinct Q-haplotypes [15,17], and a strong influence of T1-lineages of African origin [18]. This higher diversity relative to their European counterparts is quite notable, given the geographic distance of this territory from the presumed Near-Eastern center of domestication [4,14,15,19,20]. This makes Iberian cattle a great example for investigating the genomic impact of the intricate processes of cattle diversification both regarding the last 200 years of specific breed formation and the putative earlier admixture events.
To uncover genome-wide patterns of diversity associated with the formation of primitive cattle breeds, we sequenced the genomes of 48 individuals belonging to eight breeds of native Iberian cattle (Fig. 1a). Their breed denominations have been shown to agree with population structure inferred from microsatellites [19][20][21][22]. Noteworthy, no clear structure is recovered when using genotypes determined with the Illumina Bovine High-Density 777 k SNP BeadChip in the context of European cattle [4], likely a result of ascertainment bias as Iberian breeds were not included in the discovery panel of the genotyping assay. This reinforces the need for full genome data to accurately determine genetic diversity and measure population differentiation [23].
We confirm that there is a clear genetic distinction between Iberian cattle breeds. In addition, we demonstrate that breed management and associated demographic processes had profound effects on genomic diversity and resulted in unusual patterns of genetic differentiation for autosomes vs sex chromosomes. We further describe genome-wide diversity and introgression in Iberian breeds in relation to 60 previously published taurine (B. taurus) and zebu (B. indicus) cattle genomes from Europe, Africa and Asia [24], and sequence data from one European aurochs (B. primigenius) [5]. We confirm that gene flow has occurred between African taurine and Iberian breeds. Overall, we show how whole-genome data are important for uncovering specific patterns related to recent events in breed formation and management, and provide the ground for future studies on the singularity of locally adapted European cattle breeds.

Results
The 48 Iberian cattle genomes and the previously published shotgun resequencing data from 60 additional individuals including taurine and indicine cattle (Table 1;  Additional file 1: Table S2) were mapped with BWA mem to three reference genomes: genome version UMD_3.1.1 (bosTau8) [25], genome version Btau_4.6.1 (bosTau7; contains an assembled Y-chromosome) [25] and to the outgroup wild yak (B. mutus) [26]. Details on the quality-based read trimming and filtering steps are included in the Methods section. Sequencing error rates for all 48 samples are below 0.2% (Additional file 1: Figure S1).
Population structure and individual ancestry were investigated with NGSadmix, which does not require definition of the exact genotypes thus is adequate for low-depth sequencing data [27]. Setting the number of expected clusters to eight (the number of breeds) resulted in the assignment of each individual to the source breed ( Fig. 1b) while assuring convergence of the method. This level of genetic homogeneity within Iberian cattle populations is also observed in the results of the principal components analyses (Fig. 1c). The first two PCs explain 10 and 9% of the total variation and show the high differentiation of Mirandesa and Brava.
Mirandesa in fact appears as an independent cluster when the number of ancestral K populations is set to two (Fig. 1b), and Brava individuals become a separate cluster when K = 4 (Additional file 1: Figure S2).
We assessed the levels of genetic differentiation between breeds by calculating the fixation index (F ST ). In general, we observed high levels of differentiation (average 9%), even when admixture has occurred, which precludes the use of Iberian cattle as a single evolutionary unit. Consistent with their higher heterogeneity, the breed pair Arouquesa/Mertolenga had a low F ST value of 0.06. The highest F ST corresponded to the pairwise comparison of Mirandesa and Alentejana (F ST = 0.16) and the lowest F ST values were obtained for Preta vs Mertolenga (F ST = 0.04) ( Table 2).
F ST values between Iberian breeds and other taurine cattle ranged from 12 to 33%, partially overlapping the divergence values observed for comparisons within Iberian breeds (Table 2). Mirandesa, the most divergent within the Iberian breeds, has the highest F ST values relative to all other breeds (Fig. 5a). The taurine breed with the overall highest F ST relative to the Iberian was the Jersey cattle which may be explained by the insular isolated status of this breed [28], although we must note that this might not be a representative sample of the breed. Overall, the F ST for autosomes was much higher than for chromosome X within taurine breeds and within the indicine cattle ( Fig. 5b; Additional file 1: Figure S7).  When compared to publicly available genomes [24] (database information in the Materials section) of taurine (European Holstein, Angus and Jersey, and African N'Dama) and African indicine cattle (Ogaden, Kenana and Borana), Iberian breeds are clearly assigned by NGSadmix [27] to the European cluster ( Fig. 2a) with a slight suggestion of African taurine admixture at K = 3 for autosomal data. As observed previously [24], at K = 3 the clusters observed represent European taurine, African taurine and African indicine ancestries.
All analyzed breeds have a positive Tajima's D (Additional file 1: Figure S4). As observed previously [24], the commercial European breeds have lower nucleotide diversity (average number of pairwise differences) relative to the African breeds (Fig. 2c). The Iberian breeds analyzed in this study have, overall, similar or higher values of nucleotide diversity compared to their European counterparts. The lowest values correspond to Mirandesa, Brava and Alentejana, which had been previously shown to have the lowest heterozygosity in a microsatellite panel [19,20]).
We then used the maximum likelihood approach implemented in Treemix [29] to uncover the historical relationships between the breeds (Fig. 2b). We intersected our whole genome data with the Illumina BovineHD SNP data of 25 European primitive breeds from [4], which shows that our selection of breeds is representative of the Iberian breed context (Fig. 2b). When allowing for one migration event, we observe gene flow from African taurine to the base of the Iberian clade (Fig. 2b) which had been previously suggested to have occurred [15,17,18]. We explicitly test for differential African cattle introgression into Iberian breeds, using the D-statistics [30,31] and publicly available genomes [24] (database information in the Materials section) of taurine (European Holstein, Angus and Jersey, and African N'Dama). We can confirm that there is a significant excess of shared derived alleles in varying amounts between Iberian breeds and the African taurine N'Dama when compared to a panel of European taurine breeds (Fig. 3). This was observed both for southern Iberian Brava that had the largest African (N'Dama) influence, but also in breeds from the north of Portugal such as Barrosã. These results are further supported by the occurrence of~17% of T1-matrilines in the Iberian cattle analyzed here (Fig. 4). We found no evidence for either indicine or aurochs introgression into Iberian cattle (Additional file 1: Figure S5).

Discussion
Signatures of breeding in the population structure and genetic differentiation of Iberian cattle breeds The observed genetic homogeneity within Iberian cattle and the high differentiation of Mirandesa and Brava are expected to result from genetic drift due to drastic demographic changes: in the 1970s, Mirandesa was raised in a vast area of the Portuguese territory with over 200,000 animals [32] and since has suffered a significant reduction in population size with less than 6000 breeding females registered in the herd book in 2017 (http://www.fao.org/dad-is/browse-by-countryand-species/en/); Brava has historically been reproductively isolated from other breeds living in semi-feral conditions for the main purpose of its use in bullfights [32]. PCs 3 and 4 separate Alentejana and Preta from the remaining Portuguese native breeds, whereas Maronesa, Barrosã and Mertolenga are separated by PCs 5 and 6 (Additional file 1: Figure S3).
Recent crossbreeding involving Arouquesa cattle is revealed in it being the last to form a discrete cluster, showing contributions from the other populations until K = 7 ( Fig. 1b and Additional file 1: Figure S2). This is consistent with an analysis of microsatellite loci, which showed Arouquesa as having the lowest mean genotype membership proportions [19]. This breed is mostly raised in a region located south of the Douro river in the district of Viseu (Fig. 1a), bordering the area of production of Maronesa and in remote times also of the once abundant Mirandesa cattle. Arouquesa has also historically been crossbred with the latter to produce the highly valued "vitela de Lafões", a meat product certified by the European Union with Protected Geographical Indication, and so admixture is intrinsically linked to its history. Another breed showing high heterogeneity was Mertolenga (Additional file 1: Figure S2), one of the most phenotypically diverse Iberian native breeds, with its three distinct coat color phenotypes mostly raised in separate herds [19]. In general, we observed high levels of differentiation (average 9%), even when admixture has occurred, which precludes the use of Iberian cattle as a single evolutionary unit.

Iberian genetic variation in the context of taurine and zebu cattle diversity
The positive Tajima's D (Additional file 1: Figure S4) indicates a reduction in the low-frequency polymorphisms, suggestive of population structure, bias in the choice of genomic markers or of a recent bottleneck probably associated with breeding practices. The lower nucleotide diversity of European breeds relative to the African breeds (Fig. 2c) can be explained by a combination of intensive selection and genetic drift in European cattle breeds [24]. Mirandesa, Brava and Alentejana present the lowest values of all, which probably results from management and demographic histories (as mentioned above, Mirandesa has, since the 1970s, suffered a drastic reduction in population size, and significant inbreeding has been detected in Brava and Alentejana [19]).
Iberian cattle show a clear signature of admixture from African cattle and high diversity in mitochondrial DNA and Y chromosome haplotypes We confirmed that there was gene flow between Iberian breeds and the African taurine N'Dama and this is supported by both nuclear and mitochondrial data ( Fig. 3 and Fig. 4). The Iberian Peninsula and the Maghreb regions share natural zoo-geographical affinities, and there were complex biogeographic and historic faunal and human relationships during much of the early Holocene (including maritime pioneer colonization in the West Mediterranean by the agropastoral communities that reached the Iberian Peninsula in the Early Neolithic [33]), which could explain these patterns of genomic admixture. Despite not finding evidence of indicine introgression in Iberian cattle, it is important to notice that the indicine cattle in our sample has taurine introgression (confirmed by the presence of T1 taurine mitochondrial haplotypes in all the indicine samples of Fig. 4), it is likely that these are not adequate for performing this test. Contrary to previous results [4], we did not find evidence for aurochs introgression into Iberian cattle (Additional file 1: Figure S5) when using sequence data from a 6750 year-old British aurochs [5]. Given the probable complex population structure of ancient wild cattle in Europe [5,9,34], this result does not preclude that local aurochs introgression occurred, but data from pre-domestic Iberian specimens is required for further testing of this hypothesis.
Y-specific markers are useful to investigate crossbreeding [14] as Y-chromosomal variation is geographically structured, with the Y1 and Y2 lineages being predominant in northern and central European taurine cattle, respectively, while the Y3 lineage is specific of indicine cattle [15]. In addition, the effective population size of the cattle Y-chromosome is strongly reduced by the reproductive success of popular sires. The paternal diversity (Y-chromosome) of Iberian cattle (Additional file 1: Figure S6 and Table S2) appears to have its origins in the dispersal of a heterogeneous male population since the Neolithic along the Mediterranean route, rather than in the recent admixture of transboundary commercial cattle which are generally fixed for a single patriline (e.g. Holstein-Friesian). Isolation and less intensive selection probably also contributed to preservation of much of the original diversity in this region. Interestingly, Jersey bulls shared a distinct patriline with African cattle (one Ogaden individual; Additional file 1: Figure S6). Previous analyses of Y-chromosome polymorphisms showed that Jersey is fixed for a specific haplotype that is intermediate between Y1 and Y2 haplogroups [15], this may-well represent an African Y-lineage but more comprehensive data from African bulls are needed.

The impact of breeding practices on chromosomal variation and general patterns of diversification
The lower effective population size in chromosome X relative to the autosomes should lead to stronger impact of the bottleneck (or population structure) caused by breeding practices, observed in an overall higher Tajima's (Additional file 1: Figure S4). In this scenario, genetic drift would be expected to result in higher F ST values for chromosome X (lower effective population size [35]) relative to autosomes, which is what we observe when we compare taurine and indicine cattle (Fig. 5b). However, comparisons within taurine and within indicine show a much higher F ST for autosomes than for chromosome X (Fig. 5b; Additional file 1: Figure S7). This agrees with extensive male-biased gene flow within taurine and within indicinesince males have a single copy of chromosome X, introgression will be more efficient on the autosomes. It is "known" that female populations are more likely to be geographically constrained and human-driven crossbreeding may have been carried out mainly using males [36]. This could also explain the difference in ancestry assignments for autosomes and chromosome X (Fig. 6), with signatures of previously described indicine admixture in the African taurine autosomes, but not observed in chromosome X.

Conclusion
Here we sequenced whole genomes of locally adapted Iberian cattle (for which genomic resources were lacking), and compared them to commercial cattle to uncover genomic patterns associated with the different breeding contexts. Our analyses confirm that these breeds are genetically very distinct and show high levels of genetic variation unlike what would be expected given their limited geographical distribution. Also, Iberian cattle retain much of the original paternal and maternal diversity, which appears to derive from the dispersal of a heterogeneous population since the Neolithic along the Mediterranean route with strong influences from North African taurine cattle, rather than from recent admixture with transboundary commercial cattle. This may have significant impact on the resilience of Iberian cattle to foreseen environmental changes. Not only these breeds produce high-quality certified meat products under local extensive conditions, as they can provide the source for genetic material to improve breeds with depleted genetic diversity, i.e. transboundary commercial cattle. Furthermore, we show that the complex processes underlying the formation of taurine breeds in general had profound effects on genomic diversity and resulted in unusual patterns of genetic differentiation for autosomes vs. sex chromosomes. Our results indicate that genetic differentiation measured using chromosome X might be more representative of the native populations of domesticated cattle, and that comparisons between breeds using autosomal data might be misleading without an appropriate demographic model.

Materials
Information regarding the breeds and the type of genetic data used to investigate genome diversity and genetic A B  Table 1 and supplementary Additional file 1: Note S1. We selected a total of 48 animals representative of Iberian cattle, namely from the Portuguese breeds Alentejana, Arouquesa, Barrosã, Brava de Lide, Maronesa, Mertolenga, Mirandesa and Preta (Fig. 1). The 6 animals of each breed included in our study were nonrelated back to the second generation, originated from several herds, and portray the genetic diversity observed for autosomal microsatellite loci, mitochondrial DNA and Y-chromosome sequences [15,19]. Sampling was done as described in [19], briefly 9 ml of whole-blood were collected from each animal by qualified veterinarians during their routine practice in the framework of official health control programs. Additionally, we used previously generated publicly available genomic data to make population genomics inferences in the context of worldwide cattle:   [18], and 946,518,556 [38] were used together with the mitochondrial consensus sequences obtained from our shotgun data (details below).

Laboratory procedures
Genomic DNA was extracted using a modified salting-out precipitation method (Gentra Puregene Blood Kit, Qiagen) according to the manufacturer's recommendations. We prepared equimolar DNA concentrations for all animals before library construction using Nanodrop™ 2000 (Thermo Scientific) and Qubit™ Fluorometer (Qubit™ dsDNA BR Assay Kit, 2-1000 ng, Invitrogen, Oregon, USA) measurements. Following DNA fragmentation by sonication using a program specific for 550 bp inserts (https://www.diagenode.com/en/ p/bioruptor-pico-sonication-device), genomic libraries were prepared using the TruSeq DNA PCR-free Library Preparation Kit (Illumina, San Diego, CA) according to the manufacturer's protocols. Whole-genome paired-end resequencing data was obtained by pooling 16 samples in each lane and using an Illumina HiSeq1500 instrument with 2 × 100 bp reads.

Sequencing error rates
Sequencing error rates were determined in ANGSD (version 0.917) [39] using a method that relies on an outgroup and a high quality genome to estimate the expected number of derived alleles (similar to a method described by Reich et al [44]). Briefly, if we observe a higher number of derived alleles in an individual we assume that this excess is due to errors. If the high-quality genome is error free, we will obtain an estimate of the true error rate. If there are errors in the high-quality genome, then the estimated error rate can roughly be understood as the excess error rate relative to the error rate of the high-quality genome.

Population structure
NGSadmix version 32 [27] was used to detect population structure with autosomal data from samples for which shotgun resequencing data was available. NGSadmix infers population structure from genotype likelihoods (that contain all relevant information on the uncertainty of the underlying genotype [45]). NGSadmix was run for K equal 2 to 8 for sites present in a minimum of 10% of the individuals: a total of 951,213 SNP sites for the 48 Iberian samples (Fig. 1b); 129,829 SNP sites for the data set including all 128 animals (Fig. 2a); 628,774 for SNP sites for the data set including the 94 female individuals (Fig. 6). The program was run with different seed values until convergence was reached. A principal component analysis using the same SNP set for the Iberian breeds was done with PCAngsd [40] which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Genotype likelihoods for all individuals were generated with ANGSD [39] (options -GL 1 -doGlf 2 -minQ 20 -minMapQ 30).

Phylogenetic analyses
Treemix (version 1.13) [29] was used to infer the admixture graphs (Fig. 2b) using allele counts for 512,358 SNP positions included in the Illumina Bovi-neHD SNP that can be unambiguously assigned to autosomal positions in the cattle reference genome version UMD_3.1.1 [25] using [46]. For shotgun resequencing data, allele counts were obtained from allele frequencies calculated in ANGDS [39] for positions covered in at least 3 individuals. Treemix was run using the global option and standard errors were estimated in blocks with 500 SNPs in each. Even though we do not call genotypes on the shotgun data, the individual breeds where correctly assigned to expected branches in the North/Central European and Iberian clades (Fig. 2b), confirming the robustness of our methodological approach.
The software RAxML [47] version 8.1.7 with 100 rapid bootstrap replicates was used to estimate the phylogenetic trees under the GTR + GAMMA model of sequence evolution for complete mitochondrial sequences from [7,18,37,38] together with consensus sequences from the shotgun resequencing data analyzed in this study obtained by choosing the most common base per position (−doFasta 2 in ANGSD [39]).

D-statistics
To determine the pattern of excess shared derived alleles between taxa, indicative of introgression, we estimated D-statistics using the wild yak (Bos mutus) as an outgroup. All samples were mapped to the yak outgroup genome assembly [26]. The D-statistic [30,31] is approximated by a Gaussian distribution with mean zero [41] in the absence of gene flow between the four populations, allowing for hypothesis testing. We apply an extended version of the D-statistic [41] which can use multiple individuals per population sequenced at low coverage and is implemented in ANGSD [39]. It takes observed allele frequencies for each individual in a population, and then combines them linearly to find an unbiased estimator of population frequency while minimizing the variance [41].

Assessment of genetic diversity and population differentiation
We used methods based on the site frequency spectrum (SFS) [48,49] to estimate nucleotide diversity, the neutrality test statistic Tajima's D (Fig. 2C; Additional file 1: Figure S7) and genome-wide F ST values (Fig. 5 and Additional file 1: Figure S7). Briefly, after estimating the SFS, posterior sample allele frequencies are calculated using the global SFS as prior. SFSs estimated separately were used to obtain joint SFSs for population pairs, which are then used to estimate F ST . For all pairwise breed comparisons, we determined F ST using autosomes 1 to 29. For comparisons relating to chromosome X, F ST was determined for the sex chromosome and autosomes using only female individuals.

Additional file
Additional file 1: Note S1. Brief description of the Iberian native cattle breeds sampled in our study. Table S1. Iberian breeds databases. Table  S2. Individual sample information. The Y-chromosome haplogroups in bold were determined in this study (n.a.: not applicable). Table S3. F ST values between taurine breeds. The highest value is shown in bold and the lowest in italic. Figure S1. Average error rate per sample. Figure S2. Population structure plots determined by NGSadmix; each individual is represented by a stacked column for 3, 4, 6 and 8 proportions. Other K values are shown in Fig. 1b. Figure S3. Principal component analysis done with PCAngsd (components 1 and 2 are shown in Fig. 1c). Variance explained by each component is shown in parenthesis. Figure S4. Differences in Tajima's D between autosomes and sex chromosome X (calculated using only the female individuals). Figure S5. D-statistics determined as in [4] using genome-wide data. Positive values indicate an excess of derived alleles shared by the breeds in H2 (ANG: Angus; HOL: Holstein; JER: Jersey) and the British Aurochs [5], as indicated by the tree depicted above. Figure S6. Approximately-maximum-likelihood phylogeny of cattle Ychromosome sequences (sites with a minimum of two minor alleles) determined in FastTree [6] which uses the Jukes-Cantor distance [7]. Labels for the Iberian cattle are according to Table S2. JER: Jersey; ANG: Angus; BOR: Borana; KEN: Kenana. 50% missing data was allowed. The taurine haplogroups Y1 and Y2 are shown in green and red, respectively, and the indicine Y3 in grey. Figure S7. Gaspar Ferreira (www.paulogasparferreira.com/) and Mariana Bruno-de-Sousa (https://www.behance.net/Mary_Anne_B) were responsible for the graphic concept of the breed distribution figure.

Funding
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The authors gratefully acknowledge the following for funding their research:

Availability of data and materials
The raw reads have been deposited in the Sequence Read Archive (SAMN11119414-SAMN11119461) with corresponding Bioproject PRJNA526824.
Author contributions R.D.F. and C.G. designed the study with input from A.E.P. and L.C.; C.G. carried out the sampling and DNA extraction; I.U. and S.A. performed the NGS laboratory work; R.D.F. and C.G. analyzed the data with contributions from E.J.; R.D.F. and C.G. wrote the manuscript with contributions from all authors. All authors have read and approved the manuscript.

Ethics approval
Not applicable: animals did not belong to any experimental design but were sampled by veterinarians and/or under veterinarian supervision for routine veterinary care. EU-wide welfare standards and authorizations from the National Governmental Agriculture and Veterinary Agencies were followed in these collection procedures. All samples were collected from commercial farms and the animal owners agreed to be involved in the project through their respective breeder associations.

Consent for publication
Not applicable.

Competing interests
We have no competing interests.