A selective sweep of >8 Mb on chromosome 26 in the Boxer genome

Background: Modern dog breeds display traits that are either breed-specific or shared by a few breeds as a result of genetic bottlenecks during the breed creation process and artificial selection for breed standards. Selective sweeps result from strong selection and can be detected as a reduction or elimination of polymorphism in a given region of the genome.
Results: Extended regions of homozygosity, indicative of selective sweeps, were identified in a genome-wide scan dataset of 25 Boxers from the United Kingdom genotyped at ~20,000 single-nucleotide polymorphisms (SNPs). These regions were further examined in a second dataset of Boxers collected from a different geographical location and genotyped using higher density SNP arrays (~170,000 SNPs). A selective sweep previously associated with canine brachycephaly was detected on chromosome 1. A novel selective sweep of over 8 Mb was observed on chromosome 26 in the Boxer and, over a shorter region, in English and French bulldogs. It was absent from 171 samples from eight other dog breeds and from 7 Iberian wolf samples. A region of extended increased heterozygosity on chromosome 9 overlapped with a previously reported copy number variant (CNV) that was polymorphic in multiple dog breeds.
Conclusion: A selective sweep of more than 8 Mb on chromosome 26 was identified in the Boxer genome. This sweep was likely caused by strong artificial selection for a trait of interest and could have inadvertently led to undesired health implications for this breed. Furthermore, we provide supporting evidence for two previously described regions: a selective sweep on chromosome 1 associated with canine brachycephaly and a CNV on chromosome 9 polymorphic in multiple dog breeds.


Background
It has been proposed that most modern dog breeds recognised today resulted from two population bottlenecks in dog evolution [1,2]. During the first bottleneck, pre-domestic breeds diverged from wolves some 15,000 years ago, probably through multiple domestication events. The second bottleneck for most breeds occurred within the last few hundred years, when the breed creation process led to a loss of genetic variation through strong bottleneck events that occurred in parallel with strong artificial selection for behavioural and physical characteristics favoured by humans.
The same bottlenecks and artificial selection forces that generated these breed-specific features have, in some instances, provoked undesired health effects. Detrimental variants can become fixed at random during bottlenecks. Similarly, risk alleles may be in linkage disequilibrium with selected phenotypic variants, or the selected variants may themselves have pleiotropic effects [3,4].
Several studies have previously aimed to identify genomic regions involved in defined traits and their relationship with disease using association mapping (reviewed in Karlsson and Lindblad-Toh [2]). However, phenotypic traits that have been driven to fixation within a dog breed by genetic drift or artificial selection cannot be mapped within that breed with this approach, because association mapping requires the trait to vary among the dogs studied. An alternative in these cases is selection mapping, which searches for selective sweeps (a reduction or elimination of genetic polymorphism in a region owing to strong selection) [2,5-8]. The aim of this work was to identify selective sweeps in the Boxer genome resulting from the breed creation process using high-density genome-wide SNP data. These regions are likely to govern phenotypic traits of interest and may be linked to the overrepresentation of certain genetic disorders in this breed.

Detection and replication in Boxer
Regions of homozygosity (ROHs) on the Canis familiaris chromosomes (CFA) were identified in 25 Boxers from the United Kingdom (UK) that had been genotyped on microarrays for ~20,000 SNPs (set A, Table 1). Eight ROHs meeting the criteria detailed in Methods were detected (Table 2), representing 22 Mb (~0.9%) of the dog genome. Three of these ROHs, two on CFA 1 and one on CFA 26, showed remarkably extended low heterozygosity (Figure 1a, Table 2). To confirm these ROHs, a second dataset (set B, Table 1) was generated using a higher density SNP array (~170,000 SNPs); it comprised Boxers collected from geographical locations different from those of set A. In set B, 27 ROHs were found, spanning 40.8 Mb (~1.7%) of the dog genome. Three regions on CFA X (Figure 1b) were discarded because they were not present when only female samples were analyzed (data not shown). Five ROHs were shared by both sets (Table 2). In general, these were notably shorter and/or split into two separate shorter regions when more samples and SNPs were genotyped (Additional file 1: Figure S1a-d, f-i). In contrast, the first 8 Mb of CFA 26 showed almost total loss of heterozygosity (average observed marker heterozygosity < 0.02) in both sets (Additional file 1: Figure S1e, j). A single SNP within the region of extended homozygosity on CFA 26 (BICF2G630807104, CFA 26:4,222,068 bp) had MAF = 0.5 (Additional file 1: Figure S1j); closer examination showed that heterozygous genotypes had been called for all Boxer samples. Possible explanations include a genotype-calling error from the intensity data or a structural variant affecting that single SNP. To check that SNPs deviating significantly from Hardy-Weinberg equilibrium (HWE) did not affect the identification of ROHs, the analysis of set B was repeated after removing SNPs with an HWE test p-value < 0.005, which gave similar results (Additional file 2 and Additional file 3).
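To illustrate why a SNP called heterozygous in every dog is both at MAF = 0.5 and a strong HWE outlier, the sketch below implements a Pearson chi-square HWE test with 1 degree of freedom. This is an assumption for illustration: it is the large-sample approximation, not necessarily the HWE test used in the original analysis, and the function names and the count of 273 dogs are hypothetical inputs.

```python
import math


def hwe_chisq(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium at a biallelic SNP.

    Returns (statistic, p-value). Large-sample approximation, not an exact HWE test.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip empty expected classes (monomorphic SNP) to avoid division by zero
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    p = (2 * n_aa + n_ab) / (2 * (n_aa + n_ab + n_bb))
    return min(p, 1.0 - p)


# Hypothetical SNP called heterozygous in all 273 dogs
stat, pval = hwe_chisq(0, 273, 0)
print(minor_allele_frequency(0, 273, 0), stat, pval < 0.005)
```

With every call heterozygous, the allele frequency is exactly 0.5, but the expected homozygote classes are empty, so the test statistic equals the sample size and such a SNP is removed by any reasonable HWE threshold.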
For the subsequent analyses we focused on the ROHs on CFA 1:58,710,420-61,801,815 bp and CFA 26:3,008,718-11,914,284 bp because they were markedly larger and less variable than the other regions common to both sets (Figure 1, Table 2). Finally, we found a region of increased heterozygosity on CFA 9:19,826,590-21,137,140 bp (Figure 1b); closer examination revealed a region of approximately 1.5 Mb with a pattern of alternating heterozygous and homozygous genotypes indicating a CNV (Figure 2).

Presence in other breeds
Since a reduction of genetic polymorphism in a region can result from strong selection, and brachycephaly is a breed-defining trait in the Boxer, we evaluated the presence of the ROHs on CFA 1 and 26 in non-brachycephalic and brachycephalic breeds. Brachycephaly is characterized by severe shortening of the muzzle (and therefore of the underlying bones) and a more modest shortening and widening of the skull [9]. For both selective sweeps, on CFA 1 and 26, normal levels of heterozygosity were observed in non-brachycephalic dog breeds and the Iberian wolf (Figure 3), based on a first dataset of 118 samples from 6 dog breeds and 7 Iberian wolf samples genotyped with the same SNP panel as set A, and a second dataset of 43 German shepherd dog samples genotyped with the same SNP panel as set B (Table 1). The selective sweep on CFA 1 was present, with an allelic match to the Boxer (data not shown), in other brachycephalic breeds such as the English bulldog, Pug and French bulldog, although in the French bulldog the reduction in heterozygosity was less extended than in the other two breeds and seemed to be located slightly upstream on the chromosome (…). Although the ROH was not apparent in the Pug in Figure 3, in the segment of the ROH shared between …
Of the SNPs on Illumina's CanineSNP20 array, only two covered the CNV on CFA 9 (Additional file 5). Both were nearly monomorphic, except for the SNP at position 20,274,406 bp in the Shar pei samples, for which an excess of heterozygous genotypes was observed (HWE test p-value < 0.001). A pattern of excess heterozygous genotypes was also observed in the region corresponding to the CNV on CFA 9 in the breeds genotyped with Illumina's CanineHD BeadChip (Additional file 6).

Genetic content and functional annotation analysis
The region of decreased heterozygosity observed on CFA 1 in our study overlapped with a region previously associated with canine brachycephaly [7]. That region was detected in dogs from brachycephalic and non-brachycephalic breeds using across-breed association and selection mapping, both of which identified a region on CFA 1 at 59 Mb. The decrease in averaged observed heterozygosity of brachycephalic dogs relative to non-brachycephalic dogs is indicative of a selective sweep at this position. Genes that have been associated with brachycephaly on CFA … [Ensembl:ENSCAFG00000017855], which is expressed primarily during embryogenesis and in adult bone tissue [11]. The ROH in the Boxer on CFA 26:3,008,718-11,914,284 bp contained 135 annotated elements, of which 95 (71.9%) were genes with an associated name; this proportion was similar to that observed in the whole dog genome (69.7%).
The ROH on CFA 26 mapped to two adjacent syntenic regions on Homo sapiens chromosome (Hs) 12 (Figure 4a). Because the human region syntenic to the CFA 26 region of interest is better annotated, it was used to perform functional annotation analysis with Ingenuity Pathways Analysis software [12]. One hundred and six annotated dog-to-human orthologous elements were used as input, yielding a list of biological process and disease categories enriched among the genes in the region of interest. Among functional categories related to biological processes, "skeletal and muscular system development and function" and "tissue morphology" were the two most significantly associated categories (Additional file 7). The selective sweep on CFA 26 also contained genes linked to inherited diseases that are overrepresented in the Boxer [4] (…) (Figure 4a), which comprised six genes significant in the functional annotation analysis (…) [17]. VWM is a neurological disorder manifesting progressive cerebellar ataxia, spasticity, inconstant optic atrophy and relatively preserved mental abilities. SETD8 [Ensembl:ENSG00000183955] encodes a lysine methyltransferase that regulates the tumor suppressor protein p53 [18]. Notably, the ~55 kb region of nearly complete homozygosity in the Pug lies upstream of these six genes and within the region of extended homozygosity shared by the three breeds. Interestingly, (i) the genes involved in skeletal and muscular system development and tissue morphology that were significant in the functional annotation analysis in the Boxer only (with the exception of POLE [Ensembl:ENSCAFG00000006215]) and (ii) the region shared by the Boxer and English and French bulldogs are both located within the region of CFA 26 that showed the greatest decay in averaged observed heterozygosity (Figure 4b).
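The enrichment of each category is scored with a right-tailed Fisher's exact test (see Methods). As a minimal sketch, not Ingenuity's implementation, that one-sided p-value is the upper tail of a hypergeometric distribution. The function name is hypothetical, and the reference size (20,000 genes), category size (300) and overlap (8) in the usage line are invented for illustration; only the 106 input elements come from the text.

```python
from math import comb


def fisher_right_tailed(k: int, n: int, K: int, N: int) -> float:
    """One-sided (over-representation) Fisher's exact test p-value.

    N: genes in the reference set; K: reference genes annotated to the category;
    n: genes in the input list; k: input genes annotated to the category.
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(K, n)  # the overlap cannot exceed the category or the input list
    # math.comb returns 0 when the lower argument exceeds the upper one
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical category: 8 of the 106 input genes fall in a 300-gene category
print(fisher_right_tailed(8, 106, 300, 20_000))
```

Exact integer binomials avoid underflow for the large reference sizes typical of genome annotations; the final division is the only floating-point step.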
The region of alternating heterozygous/homozygous genotype patterns observed on CFA 9 in the Boxers overlapped perfectly with a CNV previously described as polymorphic in a number of dog breeds [19,20]. This 1.5-Mb CNV region contained three protein-coding genes, one of which had two reported transcript variants, and four non-coding RNA genes (Additional file 8). Analysis of Gene Ontology Biological Process (GO BP) terms revealed that the two transcript variants of the protein-coding gene VPS13D [Ensembl:ENSCAFG00000016397] (CFA 9:21,079,541-21,164,823 bp) were associated with protein localization and viral envelope fusion with host membrane (GO:0008104 and GO:0019064, respectively). This gene also mapped to Hs 1:12,290,124-12,572,099 bp, where only the protein localization term was associated (GO:0008104). The remaining annotated elements had neither associated GO BP terms nor homology in H. sapiens.

Discussion
The fixation of a strongly selected mutation produces a selective sweep, which changes the frequencies of neutral alleles at linked loci and reduces local genetic variation [21-23]. The two selective sweeps detected on CFA 1 and 26 in the Boxer genome were replicated in a larger sample of the same breed obtained from a different geographical location and genotyped with a higher-density SNP panel. Assessing both the presence of these regions in other breeds and their genetic content can provide information on how they affect the phenotype and how they relate to the ancestral origin of breeds.
In our study, the selective sweep on CFA 1 previously associated with brachycephaly [7] was replicated in a larger sample of Boxers and in samples from other brachycephalic breeds. Moreover, the samples in this study came from a different geographic area than those of the previous work [7] (Europe and the US, respectively), suggesting that, within each breed, the selective sweep is shared by the two populations.
The selective sweep on CFA 26 indicates strong artificial selection for a trait of interest in the Boxer, although the phenotypic trait resulting from this particular sweep is unknown. The sweep was not present in the Iberian wolf, one ancient breed (Shar pei), Labrador retrievers, German shepherd dogs or four hound breeds. In contrast, it was present, although over a shorter length, in English and French bulldogs, breeds that share with the Boxer the brachycephalic trait and a related breed creation process. Altogether, these results suggest that the selection underlying the sweep predated the formation of the Boxer and both bulldog breeds. It is known that the English bulldog contributed to the creation of both the Boxer and the French bulldog [24]. The Boxer is believed to have originated from a long-established and now extinct German breed, the Bullenbeisser, which was crossed with a small number of English bulldogs exported from the UK. Likewise, the French bulldog originated from toy varieties of the English bulldog that were more popular in France. Moreover, it is interesting that the region of the sweep common to the three breeds coincides with the lowest heterozygosity along the sequence (Figure 4). In a selective sweep, genetic variation is lowest at the site of directional selection, and the reduction is less pronounced at more distant sites because of recombination, although asymmetry in the valleys of reduced heterozygosity may give imprecise information about the location of the sweep [25]. Based on our data, it is hard to assess whether the sweep on CFA 26 shared by the Boxer and English and French bulldogs was also present in the Pug, another brachycephalic breed, because only a short segment of reduced polymorphism (~55 kb) within the sweep was observed in this breed. Moreover, the history of the Pug differs from that of the three breeds mentioned above.
The Pug dates back to ancient China, and interbreeding with the Pekingese, Japanese chin and possibly Shih tzu is thought to have contributed to its breed creation process. Pugs were imported to Europe through Holland around the 1600s [24].
A possible scenario is that standing neutral variation on CFA 26 present in the original English bulldog was passed to both the Boxer and the French bulldog during breed creation. Some of these variants would then have become beneficial once selection for brachycephaly began, which plausibly happened during the breed creation process, since brachycephaly is part of the breed standard for all three breeds. Thus, strong selection of variants near position 8-10 Mb on CFA 26 contributing to brachycephaly might have swept away nearby genetic variation. The variable length of the sweep in the three breeds would reflect different breed histories, as sweep length depends on the strength of selection, the amount of recombination and the population size [21,22,25]. Therefore, if the recombination rate in this region is assumed to be similar across breeds, differences among breeds in the strength of selection and in population size may have caused the variable length of the sweep on CFA 26, which in the Boxer is more than ten times longer than in the French bulldog.
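The dependence of sweep length on selection strength and population size can be made concrete with a classic population-genetics approximation for a hard sweep from a new mutation: at recombination fraction r from the selected site, heterozygosity relative to its pre-sweep value is roughly H/H0 ≈ 1 - (2N)^(-2r/s). This is a hedged illustration, not an analysis of the data: the scenario proposed above involves standing variation, for which sweeps generally leave narrower footprints, and the conversion of ~1 cM/Mb, the selection coefficients and the population sizes below are all assumptions.

```python
import math

# Assumed conversion from recombination fraction to physical distance: ~1 cM/Mb
R_PER_BP = 1e-8


def relative_heterozygosity(r: float, s: float, n_e: int) -> float:
    """Approximate H/H0 at a neutral site, recombination fraction r from the
    selected site, just after a hard sweep (selection coefficient s, n_e diploids).

    Uses the classic approximation H/H0 ~= 1 - (2N)^(-2r/s).
    """
    return 1.0 - (2 * n_e) ** (-2.0 * r / s)


def half_width_r(s: float, n_e: int) -> float:
    """Recombination fraction at which H/H0 recovers to 0.5 on one side of the selected site."""
    # Solve (2N)^(-2r/s) = 1/2 for r
    return s * math.log(2) / (2 * math.log(2 * n_e))


for s in (0.05, 0.2):
    for n_e in (100, 10_000):
        width_mb = half_width_r(s, n_e) / R_PER_BP / 1e6
        print(f"s={s}, N={n_e}: H/H0 back to 0.5 about {width_mb:.2f} Mb from the selected site")
```

Under this approximation the valley widens in proportion to s, whereas a larger population narrows it only logarithmically. This is consistent with the qualitative argument above, but the model cannot by itself quantify the differences between breeds.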
Altogether, we suggest that CFA 26 may contain a footprint of selection for brachycephaly, especially in the Boxer. A brachycephalic head with a distinctive broad and blunt muzzle is a unique phenotype of the Boxer, and the Boxer breeding community pays particular attention to this trait. Although brachycephaly has been mapped to CFA 1, with the strongest association more than 100 times more significant than the second strongest, Bannasch et al. [7] suggested that the complex nature of the brachycephalic head phenotype may result from associations across multiple chromosomes. Verifying whether the previously reported genome-wide significant markers on CFA 26 [7], which gave the second highest association, lie within the ROH on CFA 26 in our data would provide some support for a link between this region and selection for brachycephaly. Genes on CFA 1 that have been associated with brachycephaly are involved in skeletal development [7,10,11]. Similarly, genes significant in the functional annotation analysis of our data were associated with skeletal and muscular system development and function, as well as with tissue morphology (Table 3).
It is possible that loci selected for breed-specific traits are in linkage disequilibrium with detrimental variants at other genes. Interestingly, some of the genes in the selective sweep region on CFA 26 could be related to diseases reported to be more common in the Boxer, particularly cancer (lymphoblastic lymphoma) and cardiovascular disorders (CMD) [4].
In addition, our data revealed a previously reported CNV on CFA 9 that is polymorphic in multiple dog breeds [19,20], providing evidence for within-breed variation in segment copy number. Our data suggest that the CNV on CFA 9 is present and variable in the 273 Boxers of set B (Figure 1b, 2), as well as in other breeds such as the German shepherd dog, Pug and English and French bulldogs (Additional file 6). This CNV may also be present in the Shar pei (Additional file 5), although in this breed it should be confirmed with a higher-density SNP panel. Variable copy numbers of genes within the CNV, such as VPS13D [Ensembl:ENSCAFG00000016397], which is annotated with viral envelope fusion with the host membrane, might affect susceptibility to viral infection. Likewise, the non-coding RNAs (ncRNAs) and small nuclear RNAs (snRNAs) that precede a relatively gene-rich region (data not shown) could have regulatory functions.

Conclusion
We have identified a selective sweep in excess of 8 Mb on CFA 26 in the Boxer which is not present in Iberian wolves or non-brachycephalic dog breeds. This region is a candidate for strong artificial selection in the Boxer for a trait of interest, possibly brachycephaly, and the inadvertent co-selection of linked genes during selection for this phenotype may have increased the incidence of certain related disorders in the breed. The fact that the selective sweep is also present in English and French bulldogs provides genetic evidence of a shared history of the three breeds.
Furthermore, we provide supporting evidence for two previously described regions: a selective sweep on CFA 1 associated with canine brachycephaly and a CNV on CFA 9 which is polymorphic in multiple dog breeds and contains genetic elements with potential biological implications.

Sample collection
A set of 27 Boxer samples from the UK (denoted as set A) was collected as residual samples from dogs sampled for clinical investigation. They were selected from a large archive of DNA samples (UK Companion Animal DNA Archive, University of Manchester), and informed owner consent was obtained for all samples. A second set of 274 Boxer samples was collected from Spain, Greece, Italy and Portugal (denoted as set B); samples from Spain represented > 90% of set B. Dogs in this set and the remaining breeds in Table 1 …

Data cleaning
Data cleaning was conducted using PLINK and R packages [26,27]. Set A was filtered to retain individuals and markers with call rates > 90%, leaving 25 Boxers and 22,300 SNPs for analysis. The same filters were applied to set B; in addition, intensity probes, markers on the boundary autosomal region of CFA X, and SNPs in the non-pseudoautosomal region of CFA X at which heterozygous genotypes were observed in male samples were excluded from this set. All samples in set B had an individual call rate > 90%, but one sample was excluded because it appeared as an outlier when the first two dimensions of the multidimensional scaling analysis were plotted (Additional file 9). This left 273 individuals and 171,772 SNPs for analysis.
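The two genotype filters described above can be sketched in plain Python, assuming genotypes are stored per sample as 0/1/2 allele counts with None for a missing call. This is an illustrative re-implementation, not the PLINK commands actually used: the data layout, the function names and the choice to filter markers before individuals are all assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

Genotype = Optional[int]  # 0/1/2 copies of one allele; None = missing call
GenoTable = Dict[str, List[Genotype]]  # sample ID -> genotypes ordered like the SNP list


def call_rate(calls: List[Genotype]) -> float:
    return sum(g is not None for g in calls) / len(calls) if calls else 0.0


def filter_call_rate(geno: GenoTable, snps: List[str], min_rate: float = 0.90
                     ) -> Tuple[GenoTable, List[str]]:
    """Keep markers, then individuals, with call rate > min_rate (filter order is an assumption)."""
    keep = [j for j in range(len(snps)) if call_rate([g[j] for g in geno.values()]) > min_rate]
    by_marker = {s: [g[j] for j in keep] for s, g in geno.items()}
    kept = {s: g for s, g in by_marker.items() if call_rate(g) > min_rate}
    return kept, [snps[j] for j in keep]


def drop_male_het_x(geno: GenoTable, snps: List[str], sex: Dict[str, str],
                    on_x_non_par: Callable[[str], bool]) -> Tuple[GenoTable, List[str]]:
    """Drop non-pseudoautosomal CFA X SNPs at which any male is called heterozygous."""
    males = [s for s in geno if sex.get(s) == "M"]
    keep = [j for j, snp in enumerate(snps)
            if not (on_x_non_par(snp) and any(geno[m][j] == 1 for m in males))]
    return {s: [g[j] for j in keep] for s, g in geno.items()}, [snps[j] for j in keep]
```

Males carry a single copy of the non-pseudoautosomal X, so a heterozygous male call at such a SNP points to a genotyping or mapping problem rather than true variation, which is why the whole marker is dropped.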

Statistical analysis
Averaged observed heterozygosity was calculated as the moving average of the observed heterozygosity over 50-SNP windows for both set A (20,451 windows) and set B (169,812 windows). In each set, the 1% of windows with the lowest averaged observed heterozygosity was selected (Additional file 2); windows separated by less than fifty times the mean SNP spacing (bp/SNP) of the BeadChip used were merged into single regions of homozygosity. ROHs common to both sets were defined as those overlapping by at least one SNP. The analysis was also performed after retaining only SNPs with a Hardy-Weinberg equilibrium test p-value > 0.005; the ROHs presented in Table 2 correspond to this second analysis.
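A minimal sketch of this window scan for a single chromosome follows, assuming the per-SNP observed heterozygosity (the fraction of genotyped dogs heterozygous at each SNP) has already been computed. The function names are hypothetical. Two further simplifying assumptions: the lowest 1% is taken per chromosome rather than genome-wide, and the gap is measured from the end of the current region to the start of the next selected window. By default the mean spacing is computed from the chromosome itself, whereas the study used the mean spacing of the BeadChip.

```python
import math
from typing import List, Optional, Tuple


def moving_heterozygosity(het: List[float], w: int = 50) -> List[float]:
    """Mean observed heterozygosity over every window of w consecutive SNPs."""
    return [sum(het[i:i + w]) / w for i in range(len(het) - w + 1)]


def candidate_rohs(pos: List[int], het: List[float], w: int = 50, frac: float = 0.01,
                   gap_factor: float = 50, mean_spacing_bp: Optional[float] = None
                   ) -> List[Tuple[int, int]]:
    """Merge the lowest-heterozygosity windows on one chromosome into (start_bp, end_bp) regions."""
    if mean_spacing_bp is None:  # fall back to the mean spacing on this chromosome
        mean_spacing_bp = (pos[-1] - pos[0]) / (len(pos) - 1)
    avg = moving_heterozygosity(het, w)
    k = max(1, math.floor(frac * len(avg)))  # number of windows in the lowest 1%
    lowest = sorted(sorted(range(len(avg)), key=avg.__getitem__)[:k])  # back to genomic order
    max_gap = gap_factor * mean_spacing_bp
    regions: List[List[int]] = []
    for i in lowest:
        start, end = pos[i], pos[i + w - 1]
        if regions and start - regions[-1][1] < max_gap:
            regions[-1][1] = max(regions[-1][1], end)  # close enough: extend current region
        else:
            regions.append([start, end])
    return [(a, b) for a, b in regions]
```

Overlapping 50-SNP windows always satisfy the gap criterion, so consecutive low windows collapse into one region; the 50 × spacing rule additionally bridges short interruptions between nearby low windows.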

Genetic content and functional annotation analysis
The position of the CNV detected on CFA 9 was defined as the union of the region identified in our data and the positions annotated in the Ensembl database [28] for the two previous studies describing this CNV [19,20]. The resulting region, CFA 9:19,778,695-21,332,928 bp, was searched using BioMart [29] for Gene Ontology biological process (GO BP) terms and for regions of synteny with H. sapiens. For the ROH on CFA 26, Ensembl IDs of the annotated elements in the syntenic region at Hs 12:108,311,620-133,784,108 bp were retrieved using BioMart [29] and used as input for functional annotation analysis. The annotated genes in Hs 12:108,311,620-133,784,108 bp were tested for enrichment of biological functions or diseases by comparison with the annotations in the Ingenuity database for the mouse, rat and human genomes [12]. A right-tailed Fisher's exact test was used to calculate, for each biological function and/or disease assigned to the data set, a p-value giving the probability that the assignment was due to chance alone. The disease categories associated with the region of interest were compared with the inherited diseases reported in the Boxer breed [4].
… collected the samples. JQO and VMD performed DNA extraction and genotyping. AS provided technical support for the data analysis. JQO performed data cleaning, statistical and genetic content analysis and wrote the manuscript. AS, LA, OF, VMD and WO edited the manuscript. All authors read and approved the final manuscript.