Tests for genome-wide selection signatures were performed using the CSS approach in regional subgroups of Chinese Mongolian and Asian horse breeds that have experienced long-term natural and artificial selection as discrete population units. This contrasts with previous studies that have identified selection signals for traits in domestic horse breeds that have experienced intense selection pressure applied over a relatively recent evolutionary time period [7, 8].
The major genetic determinants of the ability to perform an alternative form of gait (i.e. pace) and various coat colour traits in the horse are well understood and therefore provide phenotypes to demonstrate the utility of the CSS test in these populations. Here, the detection of a known genomic selection signal for gaitedness in the Wushen population suggests that this method can be used to identify other genomic regions containing genes for traits under selection within the studied populations.
The genomic region containing the ASIP gene on ECA22 was not immediately evident as a selection signature in Abaga Black, which has been bred for black coat colour (Fig. S1 and Table S2). However, when we included the top 1% SNPs, evidence for variation at the ECA22 locus containing ASIP was identified (Fig. S2, Table S3). The weaker than expected signal may be explained by the breeding history of this population. Although it is recorded that local herdsmen have favoured black horses since the thirteenth century and it is common to see black horses in this geographic area, the Abaga Black has only been officially recognised as a distinct breed since 2006, therefore the selection history may not be long enough to be captured by the initial stringent threshold criteria. Also, it is possible that there is a novel gene in this population contributing to the coat colour phenotype observed in this population that may be different from the common black coat colour observed in modern horse breeds . Indeed, a number of rare coat colours have been identified in Mongolian horse populations that are not found in the more common modern European/American breeds . Since several genomic regions on other chromosomes were identified in the Abaga Black, these genomic regions may harbour genes and/or genomic regulatory elements influencing the coat colour phenotype and other uncharacterised traits in this population.
Selection signals for ‘Iron Hoof’
The equine hoof is a highly complex structure of the integument, which is produced by a modified epidermal layer covering the tip of the distal phalanx. The heritabilities of hoof shape and hoof wall quality phenotypes have previously been estimated as 0.12 and 0.10, respectively  indicating a modest genetic contribution to variation in hoof characteristics in the horse. Furthermore, a recent study that investigated the environmental effect on the morphometrics of the hooves of 100 feral Australian horses from five populations has suggested that a hard ground surface environment has an effect on the hooves, such as thickening of the hoof wall .
For the detection of selection signals in ‘Iron Hoof’, the CSPG4 gene on ECA1 has been observed to be expressed in connective tissues, basement membranes and developing blood vessels . Both CSPG4 and PEAK1 are expressed in equine hoof lamellar tissue  and involved in MMP-2 activity [21, 22], a member of the matrix metalloproteinases (MMP) family that is present in normal hoof wall lamellae . The lamellar basement membrane changes may be the first step in lamellar failure occurring prior to detection with conventional methods . Considering their function in MMP activity and expression in equine lamellar tissue, we hypothesise that genetic variation in CSPG4 and PEAK1 may influence MMP activity in the epidermal lamellae for maintenance of lamellar health. CSPG4 could represent a strong candidate as it is also involved in dermal thickness, skin vascularization, vessel development  and keratinization .
There are several other candidate genes contained within the selection signals with functions that may be relevant to hoof biology. The selected region on ECA7 flanked the EXPH5 gene, which is related to inherited skin fragility . Six candidate genes on ECA3, WWP2, PSMD7, NQO1, NOB1, NFAT5 and HAS3, are expressed in equine lamellar tissue and are likely to be involved in the biology of the hoof (Table 1). For example, in humans, a SNP (rs62051384) located within WWP2 is associated with plantar fibromatosis, a rare fibrous hyperproliferation of the deep connective tissue of the foot . A polymorphism (rs17336700) in the PSMD7 gene is associated with ankylosing spondylitis in Chinese subjects . The NQO1 is a representative target of the nuclear factor, erythroid 2 like 2 (NFE2L2) transcription factor, which causes suppression of the macrophage inflammatory response ; inflammatory damage to the lamellae is one of the biological processes known to cause laminitis . The protein product of NFAT5 regulates synovial proliferation and angiogenesis in chronic arthritis ; the NOB1-encoded protein is likely to be a potential target for the treatment of osteosarcoma ; and the HAS3 product is related to hyaluronan metabolism in human keratinocytes and atopic dermatitis skin .
Finally, several genes (SEMA7A , CSK , GUCY1A2 , EMCN , OPRM1 ) are associated with rheumatoid arthritis (RA). RA is a complex genetic disease that is relatively common and affects the joints with occasional skin manifestations . However, it is unclear how this may relate directly to hoof strength, therefore further investigation of the association between RA-associated candidate genes and hoof strength may provide new insights to understand equine hoof physiology and possibly shed further light on RA in humans. Also, it may be possible that the Iron Hoof horses have been selected for traits other than the principle hard hooves phenotype.
Selection signals relating to geographic origin
No genomic regions were common across the Asian versus European and Asian versus American/Middle Eastern population group comparisons. Only one selection signal was identified for the Asian versus American/Middle Eastern comparison. This selection signal contained the HMGA2 gene, which has been reported to be associated with height variation and metabolic traits in ponies [36,37,38]. Generally, Asian horses tend to have smaller stature than western breeds. However, in the absence of additional phenotypic information for the horses used in this study, it is not possible to understand which trait the selection signal may be underpinning.
For the Asian versus European comparison, the selection peak for the highest ranked selection signal on ECA3 (35,533,253 bp) was closest to two genes, ZNF469 (3:35654813–35,666,467 bp) and ZFPM1 (35,728,941-35,748,261 bp) (Fig. S4). To our knowledge, there are no known associations for ZNF469 among horse breeds. However, it is noteworthy that this selection signal overlaps with one of the two selection peaks identified among the domestic horse clade, DOM2. This clade represents the earliest domesticates that became geographically widespread following 4200 YBP . The previously described selection peak at ZFPM1 distinguished DOM2 from non-DOM2 archaeological specimens representing an early domestic group that did not contribute to modern horse populations. This selection signature has been hypothesised to reflect the emergence of docility and stress traits, which may reflect cognitive changes accompanying horse husbandry practices among intensively managed populations.
Two possible explanations may be proposed for the concordance of the ZFPM1 selection peaks. First, extant Asian and European populations differ considerably in their mode of husbandry, with Mongolian horses being free-ranging, mimicking natural herd structures with little human interaction, and therefore they may have quite different cognitive requirements . However, examination of genotype frequencies of the top SNP (chr3.35533253, rs68458737) among 806 horses from a range of global breeds (Table S9 and Fig. S5), suggests that the selection signal may be driven by native British Isles horse breeds rather than a behavioural phenotype distinctive of the Asian horses. The highest frequency of the GG genotype was observed in Shire (0.82), Clydesdale (0.71), Fell Pony (0.48) and Exmoor (0.33) and was absent or at low frequency (< 0.05) in six of the seven Asian breeds (Baicha Iron Hoof, Wuzhumuqin, Sanhe, Mongolian, Tuva and Wushen). Notably, Clydesdale and Shire are among the largest horse breeds , and the ZC3H18 gene, located within the selection peak, is associated with body weight in other species . However, an association of the G allele with body mass traits is not supported by the observation that Exmoor and Fell Pony breeds are small breeds and because the alternative homozygous genotype (AA) is fixed in the Belgian Draught sample.
We recently suggested that Mongolian horse populations may preserve ancient genomic variants that do not exist in other global horse breeds  as their paternal lineages can be dated to the early domesticates, at least 1400 YBP . Therefore, a second possible explanation is that DOM2 horses did not replace all locally domesticated horse populations and Chinese Mongolian (and Mongolian and Tuva) horses may be the result of an entirely separate domestication event not linked to the Volga-Don origins. However, this is not supported by modelling of ancestral population structure in the Chinese Mongolian horses . Consequently, based on these observations, a distinct contribution from native British Isles horses to modern horse genetic variation becomes plausible. Indeed, there is historical, morphological, and genetic evidence that has led to the assertion that wild horses survived in Great Britain and the Exmoor pony, for example, may be an ancient relic of pre-domestic horses . Therefore, accurate reconstruction of the history of horse domestication using paleogenomes requires integration of sufficient genome-wide data from geographically representative modern horse populations, including older landrace or heritage breeds.
To further study regional adaptations in Chinese Mongolian horse populations, future research should focus on genome sequencing data for appropriate regional populations, rather than genotype data arising from SNP genotyping arrays. There can be considerable ascertainment bias in equine genomic studies that focus on the more common European and European-derived breeds. This is because the EquineSNP50 Genotyping BeadChip was designed predominately using information from polymorphic SNPs observed in seven European breeds , which has resulted in a genotyping array with low representation of rare variants within non-European breeds such as Asian horses. This ascertainment bias may therefore be responsible for the lower number of selection signatures detected using the 36 K SNP data set compared to the 511 K SNP data set. In addition, one of the component tests of CSS, the XP-EHH test, is haplotype-based and likely compromised by lower SNP densities .