Signatures of selection in the genome of Swedish warmblood horses selected for sport performance

Background A growing demand for improved physical skills and mental attitude in modern sport horses has led to strong selection for performance in many warmblood studbooks. The aim of this study was to detect genomic regions with low diversity, and therefore potentially under selection, in Swedish Warmblood horses (SWB) by analysing high-density SNP data. To investigate if such signatures could be the result of selection for equestrian sport performance, we compared our SWB SNP data with those from Exmoor ponies, a horse breed not selected for sport performance traits. Results The genomic scan for homozygous regions identified long runs of homozygosity (ROH) shared by more than 85% of the genotyped SWB individuals. Such ROH were located on ECA4, ECA6, ECA7, ECA10 and ECA17. Long ROH were instead distributed evenly across the genome of Exmoor ponies in 77% of the chromosomes. Two population differentiation tests (FST and XP-EHH) revealed signatures of selection on ECA1, ECA4, and ECA6 in SWB horses. Conclusions Genes related to behaviour, physical abilities and fertility, appear to be targets of selection in the SWB breed. This study provides a genome-wide map of selection signatures in SWB horses, and ground for further functional studies to unravel the biological mechanisms behind complex traits in horses.


Background
The Swedish Warmblood (SWB) is a modern horse breed selected for equestrian sport purposes, mainly show jumping and dressage [1]. The origin of the breed dates back to the eighteenth century when the Royal Cavalry requested more agile and faster horses, leading to intensified breeding and selection of Swedish riding horses [2]. The SWB studbook was thus founded in 1928 with the initial aim to breed horses for multiple equestrian purposes. Since the demand for physical and mental abilities in sport horses has increased noticeably in the last decades, emphasised selection of SWB horses for specific disciplines, which are dressage or showjumping, is now common practice. The current goal of the SWB studbook is to breed internationally-competitive warmblood horses in terms of rideability, performance-oriented temperament, excellent gaits and/or jumping ability [1].
Recent advances in genomic methodologies have paved the way to explore the effects of selection in the genome. Positive selection reduces genetic variability and it results in increased genomic homozygosity. Stretches of consecutive homozygous loci, runs of homozygosity (ROH), have been used to identify genomic regions potentially under artificial selection in several species. The first large-scale study of how selection has shaped the equine genome involved 744 horses from 33 breeds, genotyped using the Illumina SNP50 Bead chip. The study was based on fixation index (F st ) statistics and provided evidence of genomic selection signatures for breed-specific morphological features and gaits [3]. Genomic regions with loci responsible for body size, milk yield and fertility were detected by ROH analysis in cattle [4][5][6], whereas regions involved in immune system and behavioural features were discovered in sheep and goats [7,8]. More recent examples of ROH and population structure analyses in horses were based on both medium and high-density genotype data and revealed key aspects on the history of European horse breeds [9][10][11][12]. ROH detected on whole-genome sequencing data from ten horses of different breeds showed potential selection for reproduction and fertility-related traits [13]. Similarly, the degree of homozygosity at haplotype level can be used to detect signatures of positive selection [14]. The Cross Population Extended Haplotype Homozygosity (XP-EHH) method estimates the length of extended haplotypes and evaluates differences between two populations. It has previously been used to detect breed specific signatures of selection in domestic species [15][16][17]. The XP-EHH analysis effectively detects signals of differentiation across breeds and it has previously been used to detect putative loci affecting height in Shetland ponies [18]. Genomic regions associated with selection for racing performance were identified by a scan for selective sweeps using whole-genome sequencing data from Thoroughbreds and the native Jeju breed [19].
The aim of the present study was to detect genomic regions under selection in Swedish Warmblood horses (SWB). In line with previous studies where comparisons between breeds were used to highlight signatures of selection [13,19], we performed a genome scan for signatures of selection in SWB horses and Exmoor ponies. The comparison with Exmoor ponies is justified as the Exmoor Pony Society primarily aims to protect the heritage of the breed by preserving its genetic diversity [20], and therefore selection for sport performance traits is not practiced [21]. A recent study showed that the Exmoor pony did not exhibit any common homozygous region with breeds intensively selected for sport performance, which supports its suitability for the purpose of this study [12]. Based on the hypothesis that performance-oriented selective breeding increased genomic homozygosity in specific regions in SWB horses, we used two different approaches: 1) analysis of ROH detected in SWB and Exmoor ponies, and 2) two population differentiation tests, F st and XP-EHH analysis, comparing SWB and Exmoor ponies.

Genotyping, quality control, and inbreeding coefficients
The total genotyping call rate was 0.99 in SWB (n = 380), and 0.97 in Exmoor ponies (n = 274). The LD measured by the squared correlation coefficient (r 2 ) showed significant mean differences (p < 0.005) between the two populations. In SWB horses, the average r 2 was 0.56 at 5 kb distance, compared to 0.60 in the Exmoor ponies. The estimated inbreeding coefficient based on loss of heterozygosity (f i ) in the 380 SWB ranged from − 0.134 to 0.098 with an average value of 0.006. The f i values in the 274 Exmoor ponies ranged from − 0.378 to 0.530, with an average of 0.170. Overall, 129 Exmoor ponies and six SWB horses were excluded from further analyses as their f i exceeded the threshold of 5% set in this study.

ROH as genomic signatures of selection
The average number of both short (< 125 kb), medium (125-500 kb) and long (> 500 kb) ROH per individual was higher in the Exmoor ponies than in the SWB horses (p < 0.0001); short ROH were five times more abundant in the Exmoor ponies than in SWB horses ( Table 1). The average number of short ROH per animal was twice as high when including all the 274 Exmoor ponies that passed the QC, compared to when only including the 145 ponies with a f i < 5% in the analysis, which supported the use of the f i threshold. Short and medium ROH were distributed equally along the genome in both SWB and Exmoor ponies. Long ROH were the rarest ones as shown in Table 1; nevertheless, such ROH covered, on average 286 Mb in SWB horses and 1182 Mb in Exmoor ponies. While long ROH in Exmoor ponies were spread across the genome, long ROH in the SWB population were found in a few chromosomes only. The same pattern was discovered when filtering long homozygous regions shared by at least 85% of SWB horses and Exmoor ponies, respectively. The 65 long shared ROH detected among SWB horses contained overlapping homozygous segments in five chromosomes: ECA4, ECA6, ECA7, ECA10 and ECA17 (Additional file 1: Figure S1a). In Exmoor ponies, 398 long shared homozygous segments were instead distributed over 24 out of 31 (77%) autosomal chromosomes (Additional file 1: Figure  S1b). The longest shared homozygous segment within ROH in SWB horses was 0.28 Mb and was located on ECA7:42,688,962-42,905,689, whereas in Exmoor ponies the longest shared homozygous region was 0.37 Mb on ECA22:46,333,459-46,708,156 (Fig. 1). The exact overlap in a ROH in ECA7 (7:49,160,767:49,212,921) coincided with a known QTL regions for body size (height at withers) [22]. The homozygous long ROH segments, shared by more than 85% of the studied Exmoor ponies, harboured 265 genes (Additional file 3: Table S1). In SWB, only 21 genes were located in the overlapping homozygous segments detected by long ROH analysis. Eighteen of them were annotated in the reference genome EquCab2 release 94 where ten had an annotated function, and eight were novel genes. The gene position, name and percentage of horses sharing an exact overlap in a ROH are listed in Table 2. All but one of the ten annotated genes located in shared ROH in SWB were included in a network based on co-expression, physical interactions or shared protein domains (Fig. 2). From the set of candidate genes five significantly overrepresented biological processes were found; actin cytoskeleton reorganization (GO:0031532), cellular macromolecule catabolic process (GO:0044265), apoptotic process (GO:0006915), glycoprotein metabolic process (GO:0009100) and synaptic signalling (GO: 0099536) (Additional file 4: Table S2).

Breed differentiation tests
The average F st for all windows between SWB and Exmoor ponies was equal to 0.11 with a standard deviation of 0.04. The distribution of F st estimates per SNP is shown in Additional file 2: Figure S2. Average F st per window for a total of 126 windows exceeded the significance threshold, corresponding to an averaged F st value ≥0.24 (Fig. 3). The top 10 windows with highest F st were located on ECA2, ECA4, ECA6, ECA7 and ECA22.
Genomic regions with an XP-EHH value above 4.34 and thus potentially under positive selection in SWB horses were detected on ECA1, ECA2, ECA4, ECA6, ECA17 and ECA26. The complete list of markers underlining potential signatures of selection is presented in Additional file 5: Table S3. Eight signatures of selection located on ECA1, ECA4, and ECA6 in SWB horses remained significant after correcting for false discovery rate (Fig. 4).  No position with an XP-EHH value lower than − 4.34 was found in Exmoor ponies, indicating a lack of recent selection. Four genomic regions were detected by both F st and XP-EHH population differentiation tests. The genes located within those regions were considered as potentially under selection in SWB horses ( Table 3).

Genomic traces of breed history
Selection in SWB horses for equestrian sport performance traits was highly intensified during the last decades.
In 1973, a field test for young horses was introduced in the SWB breeding program to allow for a more accurate selection of the breeding animals and increased genetic gain [24]. The overlapping homozygous segments shared in most of SWB horses agree with the intensified selective breeding program applied by the SWB studbook in the last 40 years. We believe this is an indication that shared homozygous segments in SWB horses can be a result of recent selection for performance, rather than inbreeding. This is supported by the estimated inbreeding coefficient (f i ) that was below 5% in all but six SWB individuals in our study and is likely to be a result of a semi-open studbook allowing inflow of genetic material from other warmblood populations. In contrast, ROH of all lengths were evenly spread across the genome of Exmoor ponies. Short ROH were as much as five times more frequent in Exmoor ponies than in SWB horses, indicating historical inbreeding in line with the Exmoor's known demography. The Exmoor pony breed is listed as "endangered" by the Rare Breeds Survival Trust [25], and thus founder effect and genetic drift were expected in the breed. In agreement with such assumption, several Exmoor ponies showed high f i , which indicates significant loss of heterozygosity. Therefore, we applied inbreeding correction to distinguish between homozygosity due to relatedness rather than selection. Our results support that long shared ROH may have different origins due to different SWB horse and Exmoor pony population histories. In SWB horses, long ROH may originate from recent intensive selection for sport traits, whereas in Exmoor ponies long ROH may be the result of past bottlenecks. However, the difficulty to clearly define the origin of reduction in genetic diversity in small populations has been pointed out as they are vulnerable to genetic drift [26,27]. Therefore, we did not further examine the results retrieved from ROH in the case of Exmoor ponies, but used the results as a comparison to the SWB.
In the current study, we performed a correction for linkage disequilibrium (LD) which has not been common practice for less dense SNP data for the ROH analysis.
The LD measured by the squared correlation coefficient (r 2 ) was significantly higher in the Exmoor ponies if compared to SWB horses, most likely because of breed history. Therefore, LD pruning was performed to reduce strong LD between markers originating from population history rather than from positive selection [28][29][30].

Signatures of selection based on ROH in SWB horses
Performance traits are known to be complex traits caused by mutations in many genes and regulatory elements, generally connected in aggregated networks. Thus, it is not surprising that several of the genes indicated to be under selection in SWB horses share pathways, co-expression, co-localisation, and interact physically or genetically. Fig. 2 GeneMANIA representation of the genes found in the shared ROH in the SWB horses. The ten annotated genes are represented as stripped grey circles. Physical interactions are displayed as red lines, co-expressions as violet lines, predicted related genes as orange lines, shared pathways as light blues lines, co-localisations as blue lines and genetic interaction as green lines. The three most related genes with the potentially under selection ones are shown as plain circle Selection for muscle strength and function is indeed a favoured trait in the performing horse, and we found several genomic regions comprising genes involved in this. An especially interesting example is the Histone deacetylase 9 (HDAC9) with a known regulatory function in neuronal electrical activity of excitable skeletal muscle cells [31]. Two of the ten annotated genes detected in long ROH share biological functions in synaptic signalling (GO:0099536); the Glutamate Ionotropic Receptor NMDA Type Subunit 2B gene (GRIN2B) and the Calcium Voltage-gated Channel Subunit 1 gene (CACNA1A). Both genes are involved in response to pain, synaptic transmission and receptor clustering. The protein encoded by GRIN2B is a NMDA receptor channel subunit critical for neuronal communication [32]. The GRIN2B gene has been pointed out as potentially important for performance in Icelandic horses and Coldblooded trotters [33,34] and as a target of selection in French Trotters and Gidran horses [12]. Out of the 21 genes found in shared long ROH of SWB horses, GRIN2B was the only one found also in shared long ROH in Exmoor ponies. We therefore believe that this gene may have been under selection prior to horse breed formation. The gene CACNA1A encodes the protein C av involved in muscle contraction, hormone and neurotransmitter release [35]. Behaviour analyses in mice suggested that C av contributes to pain perception [36]. In humans loss-of-function mutations in the CACNA1A were implicated in neurological and psychiatric diseases [37]. Our ROH analysis also detected a region comprising the Beta-1,3-glucuronyltransferase 1 (B3GAT1) which has been associated with learning abilities in domesticated rats [38].
Genes involved in neurological control and signalling pathways were likewise found in shared ROH among Hanoverian sport horses [13]. The SWB breed is genetically connected with the Hanoverian breed, as well as with other European sport horse breeds [39]. This indicates that some of our presented findings are also applicable in other warmblood breeds used for sport and that cognitive reactions and functions may be important targets of selection in Warmblood breeds. This latter result agrees with previous studies where differences in temperament between breeds have been shown [40].

XP-EHH and F st tests indicate recent selection in SWB horses
The average F st of 0.11 between the two breeds was similar to the previously estimated average F st of 0.10 between 37 horse populations across the globe [29], indicating that a comparison between SWB and Exmoor ponies is reasonable. In this study, we investigated signatures of selection for performance traits in SWB horses by comparing them with a breed not selected for sport performances. We could confirm that even the worstperforming SWB horse was genetically superior in equestrian sports to the best Exmoor pony, thus excluding any potential confounding effects in our study. Although the comparison with an ancestral breed of SWB horses would have been desirable, none of the remaining native Swedish breeds can be claimed as the ancestor of SWB horses. Further comparison with other warmblood horse populations could also verify the validity of this study. The XP-EHH analysis support our hypothesis that recent selection took place in SWB horses, but not in Exmoor ponies, as significant regions were only found in the former. Since the Exmoor pony genome presents widespread homozygous regions, we probably failed to identify all regions under selection in the SWB by comparing the two breeds. However, we believe that this comparison has considerably reduced the risk of finding false positives. Concordance across statistics is generally used to support putative sweeps, this is the reason why we only further analysed regions detected by both population differentiation tests [17,40,41]. Selection for eight genes involved in cellular response to stimulus (GO: 0051716) were found from the composite results of the F st and XP-EHH tests. As an example, the Adenylate Cyclase 1 (ADCY1) gene encodes adenylyl cyclase (AC) primarily expressed in the brain, influences neuroplasticity, as long-term potentiation, depression and memory formation [42]. In mice, ADCY1 and ADCY8 play an important role in the formation and maintenance of fear memory, dopaminergic responses, and behavioural sensitisation [43,44]. Additionally, we detected the Guanylate Cyclase 2C (GUCY2C) which is associated with human attention deficiency and hyperactive behaviour [45], along with the Rho GDP Dissociation Inhibitor Beta (ARHGDIB), and RAS Like Estrogen Regulated Growth Inhibitor (RERG) genes [46,47]. These four genes (ADCY1, GUCY2C, ARHGDIB, and RERG) are all involved in G-protein coupled receptor, and GTPase mediated signal transduction, common in the central nervous system, suggesting an important role in learning and reactivity in the performing horse.
Two other genes, represented in the cellular response to stimulus term, shared biological function related to signal transduction (GO:0007165) and regulation of cell communication (GO:0010646): The Insulin like growth factor binding protein 1 (IGFBP1) and the Insulin like growth factor binding protein 3 (IGFBP3). These were also detected in studies of selection signatures in German Warmblood breeds [11] and in French Trotters [12]. Both are members of the insulin-like growth factor binding protein (IGFBP) family that binds IGF-I and -II and regulates somatic growth with an important function in muscle growth distribution. The importance of growth traits was further supported by two detected regions overlapping with known QTLs related to conformation and morphology traits (body size and cannon bone circumference) [22,23].
In agreements with other studies, the region on ECA4: 19,913,037-20,413,037, in our study, seems to be under selection in warmblood horses [11,40]. This region contains several genes, for example Spermatogenesis associated 48 (SPATA48) and Zona pellucida binding protein (ZPBP), which play a role in equine fertility [11,40]. Also, the most significant SNP in this region in our XP-EHH test was located within the ZPBP gene (Fig. 5). SPATA48 also promotes osteoblast differentiation, which might have an important function in sport horses [48,49]. In our study the overrepresentation test showed as significant gene the Fidgetin Like 1 (FIGNL1) which has likewise a known function in osteoblast differentiation [50]. This gene was also found in the top ten enriched pathways from the iHS test in four German warmblood horse breeds [11]. In two other regions on ECA 4 and ECA7, our ROH analysis detected the Thrombospondin, type I (THSD7A) and the Deoxyribonuclease 2, lysosomal (DNASE2), respectively, which are both involved in bone metabolism [51,52]. These findings may indicate a link between fertility and bone metabolism.
To further validate our findings, sport performing horses should be re-sequenced to find candidate causative mutations, and functional studies are needed to confirm biological effects of the mutations.

Conclusions
We conclude that genes associated with behavioural, physical abilities, and fertility appear to be targets of selection in the SWB horse breed. Our analysis of SWB horses reveal putative signatures of selection in genomic  regions containing genes primarily involved in nervous system functionality, as well as muscle contraction and development. As expected, due to the complex nature of sport horse performance, many of the genes under selection in SWB interact with each other in complex biological networks. In line with this, our study reveals that selection for sport performance has likely occurred in numerous genomic regions. Our work unveils the effects of selection in sport horses primarily used for riding and highlights how selection has shaped the genome of Swedish Warmblood horses, a representative breed for modern sport horses.

Sample collection
The study included 380 Swedish Warmblood horses born in 2010-2011. The horses (182 males and 198 females) were assessed at young horse evaluation tests at the age of three [53] and descended from 145 sires with 1-11 offspring each, and 372 mares with 1-2 offspring each. The selected horses were either: 1) horses with high scores for show jumping but lower ones for gaits (n = 48), 2) horses with high scores for gaits but lower ones for show jumping (n = 48), 3) horses with high scores for both show jumping and gaits (n = 143), and 4) horses with low scores for both show jumping and gaits (n = 141). Genotype information (670 k SNP array data) of 280 Exmoor ponies was retrieved for comparison from a previous publication, which includes details about the ponies' selection procedure for genotyping [21].

DNA isolation
DNA was prepared from 20 hair roots, cut into 5% Chelex 100 Resin (Bio-Rad Laboratories, Hercules, CA, US), and 1.4 mg/ml Proteinase K (Merck KgaA, Darmstadt, Germany) in a total volume of 200 μl. The samples were incubated at 56°C, 1500 rpm for 2 h, followed by heat inactivation of Proteinase K at 96°C for 10 min. DNA concentration was normalised, and the DNA was re-suspended in lowTE (1 mM Tris, 0.1 mM EDTA) at a concentration of 10 ng/μl.

Genotyping, quality control and inbreeding coefficients
All samples were genotyped using the 670 K Affymetrix® Axiom® Equine Genotyping Array (Thermo Fisher Scientific, Santa Clara, CA, USA) [54]. Individual inbreeding coefficients (f i ) were estimated from loss of heterozygosity using the PLINK -het command and horses with an f i higher than 0.05 were excluded for further analyses. Quality Control (QC) was performed separately for each breed on the 31 autosomal chromosomes. The exclusion of poorly genotyped and faulty data was performed using PLINK v1.90 [55] based on the following criteria: minor allele frequency (MAF) (< 0.01), missing genotype per single SNP (GENO) (> 0.10), missing genotype per individual (> 0.10) and Hardy-Weinberg equilibrium (HWE) (p < 0.0001). A linkage disequilibrium pruning was applied for the ROH analyses. SNPs in linkage disequilibrium (LD) were excluded if the LD between each pair of SNPs was greater than 0.5 (r 2 > 0.5) in a window size of 50 SNPs moving 5 SNPs per window. In total 249,395 SNPs were used for ROH analysis.

Definition of ROH
ROH were detected in SWB and Exmoor separately, using a sliding-windows approach through the homozyg command in PLINK v1.90. As different signatures of selection are expected when using various ROH definitions, we classified ROH as short (< 125 kb), medium (125 kb to 500 kb) and long (> 500 kb). The three-length classes were defined by the following criteria: minimum number of SNPs, minimum SNP density (SNP/kb), maximum gap between two SNPs (bp) and the minimum length to define a ROH (kb). The sliding window was defined by using the options homozyg-window-snp, homozyg-window-missing and homozyg-window-het in PLINK v 1.90 (Table 4). The last two options were only applied when detecting long stretches of homozygosity, thus allowing for one heterozygous and one missing SNP [56]. The differences in the number of ROH in each ROH length-class between the two breeds were tested by a oneway analysis of variance in R (v3.4.0) [57]. Proportional differences of ROH between breeds and uniformity of ROH over each respective chromosome were tested with the Chi square test (χ 2 ) for proportions and goodness of fit in R.

ROH as genomic signatures of selection within breed
A custom-made script in R was used to filter homozygous regions within long ROH shared by more than 85% of the studied individuals within breed. The use of 85% as threshold was chosen to identify regions containing fundamental loci for breed-type shared by most SWB horses, regardless of which equestrian sport discipline they were bred for, and thus enable detection of important loci for general sport performance. The EqCab2 genomic coordinates of these regions were used to retrieve candidate gene lists and annotations from the Biomart web interface in Ensembl release 94 [58]. Genetic co-expression, physical interactions or shared protein domains, were visualised by GeneMANIA in Cytoscape with human genes as reference [59]. Additionally, genes were compared with QTL regions previously identified and present in the Horse QTL database [60]. Statistical overrepresentation test of biological processes (GO terms) of candidate genes was conducted using PANTHER 14.0 (http://pantherdb. org/) [61]. The level of significance for the overrepresented biological processes was set as p < 0.05.

Population differentiation tests
The genetic differentiation between SWB and Exmoor ponies was verified by the fixation index (F st ) as defined by Nei (1987) [62]. To identify highly differentiated regions, we divided the genome into non-overlapping 500 kb windows. A F st value was calculated for each SNP and the values were then averaged over the SNPs located in each window. Windows located at the extreme 1% of the empirical distribution of F st values were considered as candidate regions [63].
For analysis of population differentiation at haplotype level, haplotypes were phased using Shape-it software [64] and filtered using REHH Package in R [65], resulting in 503,829 SNPs. We then used XP-EHH statistics to identify regions displaying significantly higher, or lower, extended haplotype homozygosity in one population compared to the other [14]. A position was considered under selection if the XP-EHH in two population-pairwise comparisons was above 4.34 or below − 4.34 as suggested by [14]. To minimise the number of false positives and to account for large variation, selection signatures were averaged over windows. Only regions containing SNPs falling into the upper 99th percentile, defined as 500 kb windows, and containing at least three SNPs were considered as putative signatures of selection. A haplotype bifurcation diagram around a core marker was used to visualise signatures of selection indicated from the XP-EHH.
Furthermore, genes located in genomic regions containing significant SNPs based on both F st and XP-EHH analyses were retrieved using Biomart, as previously described, and potential overlaps with QTLs, present in the Horse QTL database [66], were considered. Statistical overrepresentation test of biological processes (GO terms) of the candidate genes was conducted using PANTHER 14.0 as described above. SM, SE and ÅV designed the study. MA, SM, SE, and ÅV carried out the experiments and data analysis and MA, SM, SE and ÅV drafted the manuscript. GL contributed with additional reference data and helped to finalise the manuscript. All authors read and approved the final manuscript.

Availability of data and materials
The datasets in the current study was generated and analysed in collaboration with the Swedish Warmblood Association and has a