Genomic divergence of zebu and taurine cattle identified through high-density SNP genotyping

Background Natural selection has molded evolution across all taxa. At an arguable date of around 330,000 years ago there were already at least two different types of cattle that became ancestors of nearly all modern cattle, the Bos taurus taurus more adapted to temperate climates and the tropically adapted Bos taurus indicus. After domestication, human selection exponentially intensified these differences. To better understand the genetic differences between these subspecies and detect genomic regions potentially under divergent selection, animals from the International Bovine HapMap Experiment were genotyped for over 770,000 SNP across the genome and compared using smoothed FST. The taurine sample was represented by ten breeds and the contrasting zebu cohort by three breeds. Results Each cattle group evidenced similar numbers of polymorphic markers well distributed across the genome. Principal components analyses and unsupervised clustering confirmed the well-characterized main division of domestic cattle. The top 1% smoothed FST, potentially associated to positive selection, contained 48 genomic regions across 17 chromosomes. Nearly half of the top FST signals (n = 22) were previously detected using a lower density SNP assay. Amongst the strongest signals were the BTA7:~50 Mb and BTA14:~25 Mb; both regions harboring candidate genes and different patterns of linkage disequilibrium that potentially represent intrinsic differences between cattle types. The bottom 1% of the smoothed FST values, potentially associated to balancing selection, included 24 regions across 13 chromosomes. These regions often overlap with copy number variants, including the highly variable region at BTA23:~24 Mb that harbors a large number of MHC genes. Under these regions, 318 unique Ensembl genes are annotated with a significant overrepresentation of immune related pathways. Conclusions Genomic regions that are potentially linked to purifying or balancing selection processes in domestic cattle were identified. These regions are of particular interest to understand the natural and human selective pressures to which these subspecies were exposed to and how the genetic background of these populations evolved in response to environmental challenges and human manipulation. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-14-876) contains supplementary material, which is available to authorized users.


Background
Natural selection has shaped the genome of all living creatures in our planet, including domesticated animals. Nearly all modern cattle can be associated with one of two types or sub-species. This division between the types Bos taurus taurus (taurine cattle) and Bos taurus indicus (zebu cattle) is estimated to have occurred from a common ancestor between 330,000 [1] and 2 million [2] years ago. Since divergence, cattle types have accumulated different genetic variations, which have contributed to highly differentiated phenotypes. It is assumed that the divergence between cattle types was long before domestication, which is estimated to have occurred between 10,000 to 7,000 BC in two separate locations: the Fertile Crescent (taurine cattle) and the Indus Valley (zebu cattle) [3,4]. After domestication human-oriented selection added further complexity to the evolution of cattle.
For most of the history of human-cattle coexistence the environment was the main force driving changes in the animals' genome. Shortly after domestication, human breeders preferred traits that enabled easy management; however, breeders also sought production improvement traits as well [5]. The introduction of the concept of breed in the 19th century led to human-oriented selection imposing strong bottlenecks, which created population demes based on phenotypes. Breed formation was followed by breed expansion via the use of artificial insemination, which reduced genetic variability within breeds particularly in the sex chromosomes and mitochondrial DNA [6]. This is due to the fact that only one haplotype is passed on to the following generation, and subjected to stronger selective forces when compared to autosomal chromosomes.
Positive and balancing selections are terms used to characterize different aspects that selection forces might impose on a population. Positive selection, also termed directional or purifying selection, refers to the selection process through which a particular phenotype (or genotype) is favored in a given environment, which leads to an increase of its frequency in a population. In contrast, balancing selection refers to the selective process through which multiple alleles are selected, thus preserving the genetic diversity in a population. Balancing selection is often observed when heterozygous animals have a competitive advantage. Alternatively, these may be regions of convergent selection across groups. Importantly, both positive and balancing selection phenomena can be tracked using SNP genotypes or sequence data from the cattle genome. SNP genotyping has become widely used in animal genetics and a number of methods have been developed to identify regions under selection. Out of these F ST is a widely used statistic to evaluate the diversity of subpopulations of animals or to determine the relative distance between populations. Many variations of the F ST concept [7] exist, but all adhere to the core principle of being a metric of allele frequencies and their variance. This metric has also been used to identify loci under selection [8][9][10].
In this study, we used a pure drift F ST model [11] which assumes all animals originated from the same ancestral population. This model was applied to taurine and zebu animals to identify loci under selection. These two groups correspond to the main (and most ancestral) separation of domestic cattle, which in most but not all cases corresponds to animals adapted to tropical and temperate environments. The identification of such loci can aid in the identification of genes and genomic variants that are related to environmental adaptation and/or selection derived from human agropastoral activities.

Statement on the ethical use of animals
No ethics statement was required for the collection of genetic material. The DNA from animals included in this study were either part of previous analyses that obtained specific permissions [12] or were extracted from semen straws collected in accredited AI centers in accordance with the Brazilian legislation on animal welfare.

Cattle samples and SNP genotypes
All individuals were genotyped using the BovineHD BeadChip that includes~777 K SNP (Illumina, Inc. San Diego, USA) following standard procedures. The SNP set included in this genotyping platform was carefully selected to reduce the potential for ascertainment bias during SNP discovery. Seven different grouping of breeds were used to assess the minor allele frequency of all available SNP, this included Holstein, Angus, Nelore, Bos taurus taurus dairy excluding Holstein, Bos taurus taurus beef ignoring Angus, Bos taurus indicus excluding Nelore, and adapted Bos taurus taurus (e.g. Senepol). This was complemented with sequence data from 30 breeds that were compiled and weighted to minimize ascertainment bias. More information on the BovineHD can be found in the supplier's website (http://www.illumina.com/documents// products/datasheets/datasheet_bovineHD.pdf).
Only animals with call rates > = 98%, and SNP with more than 95% successful genotypes were kept in the final dataset. Filtering was also based on available pedigree information and the estimated proportion of alleles shared identical-by-descent (PI_HAT > 0.8) ( [13] http://pngu.mgh.harvard.edu/~purcell/plink/), animals with high relatedness were excluded. A total of 339 Bos taurus taurus or taurine individuals from the Bovine Hapmap DNA panel [12] were included in the analyses. Breeds represented in this group were: Angus (n = 44), Brown Swiss (n = 24), Charolais (n = 37), Guernsey (n = 21), Hereford (n = 36), Holstein (n = 63), Jersey (n = 39), Limousin (n = 47), Norwegian Red (n = 17), and Red Angus (n = 11). The Bos taurus indicus or Zebu animals (n = 166) were also from the Bovine Hapmap experiment, and they were complemented with additional individuals. Breeds represented in this group were: Nelore (n = 91), Gir (n = 50), and Guzera (n = 25). Even though Brahmans are considered zebu animals, it is known that taurine animals were also used during the breed formation and expansion; therefore they were not included in these analyses.

Population and linkage disequilibrium (LD) structure
Pairs of markers with high linkage disequilibrium (LD) provide redundant information and impose higher computational demands for population structure analyses.
To remove extraneous information, the dataset was pruned based on LD between markers using the PLINK [13] command -indep-pairwise 50 10 0.1, which calculates LD for each pair of marker in a window of 50 SNP. If a pair of SNP had r 2 > 0.1, then one of the SNP was removed, the window was moved 10 SNP and the process restarted. The pruned genotypes defined a dataset including 38,681 SNP that were then used to assess the population structure using three methods: 1) unsupervised clustering of individuals based on maximum likelihood as implemented in the program Admixture Version 1.20 [14] with cluster number (K) equal 2; 2) principal components analysis as implemented in GCTA [15]; and 3) estimated genetic relationship matrix [16] visualized as heat map using R [17]. For plots of LD between markers, r 2 were calculated using Haploview [18].

Identification of genomic regions under selection
F ST statistics were used to characterize the differentiation between taurine and zebu animals by first identifying SNP potentially under selection. Next, genomic regions with a high proportion of such SNP were identified, and then the genic content of regions with extreme signals for positive and balancing selection were further analyzed. The estimation of SNP F ST was based on a pure drift model defined by Nicholson et al. [11], following the simplification proposed by Flori et al. [10]. These analyses were performed using R [17] scripts. The SNP F ST were smoothed across the Bovine genome reference assembly UMD 3.1 [19] using a local variable bandwidth kernel estimator [20] (R package lokern), where every fifteen SNP F ST values generated one smoothed F ST value. This bandwidth was used because it covers a region of~50Kb which is the average extent of LD found in these populations. The genomic regions with predominantly higher F ST values usually resulted in high values of smoothed F ST and were potentially associated to positive selection. In contrast, regions with mainly low F ST values generated low smoothed F ST values and were potentially associated to regions under balancing selection. The top and bottom 1% smoothed F ST values were identified, translated into genomic position (UMD 3.1) and the genic content of each region was tested for gene ontology overrepresentation. The cattle chromosome X (BTAX) is highly differentiated between taurine and zebu animals. Therefore, the identification of the top and bottom 1% values included only the autosomes, being the BTAX analyzed separately as it contains regions under relatively strong positive selection. Similar analyses were also performed only within-taurine (n = 9 breeds, the Red Angus was excluded due to small sample size) and only within-zebu (n = 3 breeds). These analyses were performed to gather hints as to the origin of the differentially selected regions seen between zebu and taurine cattle.
Regions harboring copy number variants (CNV) might also be under selection and contributing to an observed selection signal, therefore CNV regions that coincide to smoothed F ST peaks were further explored. Gene content of cattle CNV regions was assessed using Ensembl (ftp://ftp.ensembl.org/pub/current_fasta/bos_ taurus/pep/). It is worthwhile to point out that F ST and CNV results did not use the exact same samples. CNV results are based on Bickhart et al. [21] that use a Holstein, a Nelore, a Hereford and 3 Angus samples, also included in the F ST analyses. Intersections between balancing selection region coordinates and exon positions were compared using MySQL queries. We obtained a catalog of all bovine peptides from Ensembl. This yielded 22,118 peptides, 345 of which overlap with 24 predicted balancing selection regions, and corresponded to 318 unique Ensembl genes. Using PANTHER version 7 [22], we tested for over representation of biological process, molecular function and pathway terms under the balancing selection regions. Results were Bonferroni [23] adjusted and PANTHER terms with less than five observations were not further analyzed. Similar analyses were performed on the peptides under the 48 positive selection regions. PANTHER results were similar when all peptides under the 24 balancing selection regions and 48 positive selection regions were combined in a single analysis.

SNP genotypes
After quality control, a total of 768,506 SNP were considered. In taurine, most of the autosomes had >90% of markers polymorphic and in zebu slightly less markers were polymorphic (between 80-90%). This distribution was similar across all autosomes; however, the taurine group had a reduced proportion of polymorphic markers when compared to the zebu on BTAX ( Figure 1A). Most autosomes had >80% of SNP polymorphic in both groups, with~10% polymorphic only in taurine and only a reduced number of SNP exclusively polymorphic in zebu. The zebu exclusive SNP group was different again for the BTAX where~50% of the SNP were polymorphic in both groups and close to 40% polymorphic only in zebu ( Figure 1B). Within cattle types, the average heterozygosity was 0.21 and 0.29 for zebu and taurine.

Population substructure
The separation between taurine and zebu is the most substantial type-distinction between domestic cattle. Clustering animals based on the genetic relationship matrix clearly demonstrates this division between cattle populations ( Figure 2), which is also seen using an unsupervised clustering with selected number of clusters K = 2 (Additional file 1: Figure S1A). This latter analysis evidences the majority of individuals are pure bred within each cattle type assigning an estimated proportion of more than 0.9 for either the zebu or taurine clusters.
The first principal component, which is the axis that explains the most variance, not surprisingly corresponds to the same main division. The second principal component starts to subdivide the taurine animals (Additional file 1: Figure S1B and C). This subdivision of taurine animals was also seen in four independent runs of principal components analyses that used the same number of individuals per breed and different random combinations of taurine breeds in addition to the three zebu breeds (Additional file 2: Figure S2). This agrees with the lower pair-wise F ST observed between zebu breeds in comparison to taurine breeds (Additional file 3: Table S1).

Regions under positive selection
The top 1% smoothed F ST values were distributed across 48 regions in 17 chromosomes (Table 1) including the BTAX (not shown in Figure 3). Of those, 12 regions were known to harbor copy number variations, and 22 regions had been described as under positive selection in previous studies ( Table 1). Twenty of them also overlapped on one or more breed specific peaks in the within cattle type analyses. Among the previously described peaks, 10 of them overlapped to taurine breed signals, and 1 to a zebu breed peak. The search for overrepresentation of gene ontology terms was not conclusive. Nevertheless, some regions can be highlighted because of their genic content and/or results from previous studies identifying them as being under selection. The BTA7:47.2-53.7 Mb region (Table 1: regions P10 and P11) harbors two closely linked regions that are potentially under selection. These regions contain a number of immune-related and imprinted genes (CD14, HSPA9 and PCDH family) previously identified to be under selection, and associated with cattle fertility (SPOCK). Moreover, a number of CNV are located in the same region and linkage disequilibrium (LD) blocks larger than the average genomic LD are present in both taurine and zebu animals with LD blocks varying in length (Additional file 4: Figure S3A). Another interesting region is the BTA14:24.6-25.2 Mb region (Table 1: region P22), which confirmed previous results [10] and was recently associated with cattle production-related traits. Interestingly, the zebu and taurine LD patterns also markedly vary within this region (Additional file 4: Figure S3B). The BTAX is the final region to be highlighted, as almost the entire chromosome was Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore Nelore   shown to be highly differentiated between taurine and zebu.

Regions under balancing selection
The bottom 1% smoothed F ST values consisted of 24 genomic regions across 13 chromosomes (Table 2). Of those, only a region on BTA23 had been previously described as a candidate for balancing selection. This region also overlapped taurine breed signals from the within-taurine analysis. In total, 6 regions overlapped within cattle type analyses, three to zebu breed peaks and four to taurine breeds. Fourteen of these regions have been described as having CNV. These included the large region (Table 2: B22) on BTA23:24.2-31.1 Mb comprising the BOLA gene family (MHC -II molecules) which harbors 30 described CNV. This region has also been previously associated with balancing selection [12,24] in cattle ( Table 2).
The 24 balancing selection regions overlap with 345 Ensembl peptides, corresponding to 318 unique Ensembl genes ( Table 2). Additionally~83% (20/24) of the balancing selection regions completely or partially span cattle Ensembl genes. We assigned PANTHER accessions to a total of 332 overlapping peptides. Statistically significant over represented peptides were observed for multiple categories. Five pathways were found significantly overrepresented (adjusted p-value <0.05): the olfactory transduction, systemic lupus erythematosus, type I diabetes mellitus, antigen processing and presentation, graftversus-host disease and allograft rejection pathways; all of which could be linked to immune response systems (a biological process also overrepresented).
The average F ST for each chromosome in each analysis can be found in the Additional file 5: Table S2. Also in the supplementary material all top and bottom F ST peaks for all analyses are presented (Additional file 6: Table S3 and Additional file 7: Table S4).

Discussion
In all, 505 animals derived from 10 taurine and 3 zebu cattle breeds were genotyped across more than 770,000 SNP markers to investigate the genomic changes subsequent to the separation between taurine and zebu cattle, which occurred at a date between 330 thousand and 2 million years ago [1,2]. Evaluation of the SNP genotyping platform suggested there was minimal bias in properly characterizing both subspecies of animals, except possibly on the sex chromosomes. As expected, most of the chromosomes had a higher proportion of polymorphic markers in taurine animals, also resulting in higher heterozygosity, when compared to zebu ( Figure 1A). This is due to the fact that most of the SNP described for cattle were identified using the reference sequence of a taurine animal [19,28], but this should not overly impact population diversity metrics [29]. Nevertheless, all chromosomes have >80% SNP polymorphic in both cattle types, exception made for BTA1, 13, X and Y (Figure 1B), providing a large number of informative markers.
Clustering animals based on the genetic relationship matrix, not surprisingly, split the animals into two groups (taurine and zebu) in agreement to the division along the first principal component and the magnitude of pair-wise F ST between breeds. The split along the second principal component between taurine breeds suggests that there is more variation within this cattle type than there is within zebu. Since it is known that unbalanced principal components analyses could mislead interpretations of population structures [30], four randomized evenly sampled analyses were run (Additional file 2: Figure S2). These additional analyses supported   Table S3. Ang -Angus; Bro -Brown Swiss, Cha -Charolais, Gue -Guernsey, Jer -Jersey, Her -Hereford, Hol -Holstein, Lim -Limousin, Nor -Norwegian Red, Guz -Guzera, Nel -Nelore.  Table S4. Ang -Angus; Bro -Brown Swiss, Cha -Charolais, Gue -Guernsey, Jer -Jersey, Her -Hereford, Hol -Holstein, Lim -Limousin, Nor -Norwegian Red, Guz -Guzera, Nel -Nelore.
the previous results. This could be partly due to more intensive selection and reproductive isolation in taurine breeds than among zebu cattle. However, even though the BovineHD BeadChip was developed to minimize potential ascertainment bias, one cannot entirely reject the possibility that the subdivision seen on principal component 2 was due to this potential bias carried over by the genotyping platform. In the near future when whole genome sequences from a number of breeds and cattle types become available a definitive conclusion about this aspect will be drawn.
The BTAX and Y carry a great number of SNP with high difference in allelic frequencies between groups. These chromosomes have probably undergone much stronger selection or, more parsimoniously, higher genetic drift, due to their unique inheritance [6], and the history of domestication, selection, breed formation. Furthermore, the intensive use of artificial insemination techniques have likely contributed to the reduction of genetic variability within breeds (or cattle types) in these chromosomes. It is understood that in the case of the SNP that are polymorphic in both cattle types, the alternative allele likely arose within the cattle population before the split between taurine and zebu, and remained in both populations at variable frequencies. Alleles that are fixed in one subspecies and variable in the other possibly arose after the split. However, this understanding does not take into account that alleles that were fixed in one population also might have arisen before the split, but were fixed due to different selection processes or as a result of different bottlenecks on the populations. The identification of the ancestral allele of these SNP, ideally using whole genome sequences of other Bovids, would contribute to understand the evolutionary processes behind these monomorphic sites.
The use of metrics based on variance of allelic frequencies in order to identify genomic regions that are potentially under selection, such as F ST , have already been explored in a number of studies using cattle [10,28,31], sheep [9] and dogs [8]. In this study a relatively high density of markers (average gap between markers 4Kb) was applied to detect genomic differences between zebu and taurine using F ST , identifying regions that were potentially associated with different types of selection. Due to their original geographic distribution, taurine cattle are more adapted to temperate climate, while zebu cattle are better adapted than most taurine cattle to tropical environments. Therefore, differences between these two cattle could be linked to adaptation to the environment; however, it is likely that selection imposed by humans in different geographical locations and livestock-product production goals may have also produced regions that were under differing selective pressures. This study, the most comprehensive to date for cattle, identified 48 regions under potential positive and 24 under balancing selection.
A number of these positive selection candidates have been identified to be under selection in previous studies (22 out of 48, Table 1). These previous studies cannot strictly be considered independent analyses since a subset of markers included in the analyses presented here were already used in those. However, in this work more than a 10 fold increase in marker density was used, thus reducing the overlap of SNP across experiments to less than 10%. Further, different cattle samples and populations were used. Thereafter, even though not absolutely independent, from previous studies, our results lend support to the findings from previous articles provide new insights on ancient differentiation between zebu and taurine cattle. These regions may be genomic segments that were under natural selection or drift, but in fact, might for instance represent zebu fragments that were introgressed in taurine breed potentially defining low-level admixed populations [24,25]. A parallel could also be drawn to described QTLs that overlap these highly divergent genomic regions, e.g. on BTA14:~25 Mb which harbors quantitative loci for stature [32], fertility [33][34][35] and subcutaneous fat [36]. The different LD structure in these regions supports the concept of introgressed segments as a way of sharing recent polymorphisms between the cattle types [37], and defines quantitative loci and signatures of selection.
The highest differentiation peak was found in BTA7: 50 Mb. This region had previously been identified as a site containing a signature of selection [12,24]. A number of features were also identified in this region, including different LD structure between zebu and taurine cattle, the presence of imprinted genes, and potential association to fertility traits. This region is among the very few regions for positive selection that also contain CNVs; which may seem antagonistic to purifying selection. It is not clear at this point how CNV are being kept in the population at this site and at the same time there is a differential signal for zebu and taurine cattle. It could be in consequence that these CNV being less likely in LD with neighbouring SNPs, because similar CNVs can occur on different haplotype backgrounds. Another possibility is that duplications can initiate gene conversion events, which can then decrease the LD surrounding such variants. Interestingly, CNVs were often observed at most candidate sites for balancing selection, where variation is expected.
Fourteen out of 24 balancing selection regions overlap identified CNVs, including the highly variable region on BTA23:~24 Mb with 30 described CNV (Table 2). This set of balance selection-derived genes possess a wide spectrum of molecular functions and provide a rich resource for testing hypotheses on the genetic basis of phenotypic variation within and among breeds. Consistent with similar analyses in other mammals (human, mouse and dog), several of these genes, which are important in drug detoxification, defense/innate and adaptive immunity, are also highlighted by these analyses in cattle. These gene families include the bovine MHC (BoLA), ATP-binding cassette (ABC) transporters, Glutathione S-transferases, Complement factors, Interleukin-17A (IL17A), Heat shock 70 kDa protein 1A (HSPA1A), Chloride intracellular channel protein 1 (CLIC1), and Casein kinase II subunit beta (CSNK2B), which support the shared GO terms among mammals. Conservation of these genes across mammals suggests that selective pressure may drive acquisition or retention of species-specific gene functions.
On the other hand, lineage-specific selection events were detected in mammals, especially in mice and rats. In this regard, it is intriguing to note that mammary gland development genes, such as Butyrophilin-like protein 2 (BTNL2) and Myelin-oligodendrocyte glycoprotein (MOG) were enriched in GO Biological process on the PANTHER analyses. We also detected marked variation between individuals and across diverse cattle breeds, which indicates that these selection events may have occurred within the artiodactyla and/or Bos lineages contributing to cattle speciation and domestication.
Genome-wide, most CNVs evolved under neutral evolutionary pressures. Their frequency and sequence context were shaped by demographic events, mutation rate and genetic drift. However, most CNVs in potentially functional regions, especially those overlapping genes, are under purifying selection and there are only a few examples of CNVs on these positive selection sites. Regions that differ in copy number between subspecies can be informative about ancient adaptations that may have led to species-specific phenotypes [38]. Recent copy number changes can inform about human selection that may have led to genetic and phenotypic differences between breeds.
Similar to selection for variability seen in balancing regions that result in low F ST values, it is worth noting that low values could also represent purifying selection forces that are simultaneously applied in both populations in the same direction, imposing high similarity between the compared groups which would result in low differentiation (low F ST ). In this case, a potential deleterious mutation affecting both populations would be selected against in both groups. This can partially explain the high frequency of genes associated to Mendelian diseases within those potential balancing selection regions. Highlighting a few examples, Dysferlin (DYSF) is associated to muscular dystrophy [39], ATPase, Ca (2+)-transporting, plasma membrane, 1 (ATP2B1), where mouse knockouts have identified variation underlying embryonic lethality, and has a critical role in male fertility [40], Plakophilin 2 (PKP2), which is linked to circulatory system conditions [41,42], and Cyclin A (CCNA2) that is an essential regulatory molecule for the cell's cycle [43]. It is not clear at this point, and it will require further investigation to define if the selection signals seen in these regions are due to the presence of those candidate genes or not.
It is not completely clear at this point how the observed signals of selection originated. The within-taurine and within-zebu F ST complement the taurine-zebu contrast analysis providing hints on the breed driving each signal. From the autossomal regions previously described as candidate regions under positive selection, around half of them overlap to signals of one or more breeds in the within-taurine analysis (10 out of 19), which is consistent with one's expectation, since the majority of previous work was done using mostly taurine breeds, and in a few cases also composite cattle. There was only one region previously described as a candidate for balancing selection, in BTA23, and this also overlaps with withinbreed type signals. A number of peaks were characterized with more than one breed specific peak in the within-breed analyses, supporting a commonality of selective pressure in at least a few regions in some breeds. However, not all observed signals from the comparison taurine-zebu could be attributed to a specific breed (s), and these suggest that they represent a deeper degree of separation and, possibly, adaptation between cattle types.
In summary, genomic regions that are linked to positive and balancing selection were detected within taurine and zebu cattle, which represent the major sub-division of domestic cattle. A number of previously described regions containing positive selection were confirmed. Novel selection regions were likely discovered due to the higher resolution of informative SNPs available in this study compared to previous analyses. Some of these regions overlap with production QTL, and e.g. immunerelated genes, suggesting that favorable variations to adaptation and production are present in the general cattle population, however the application of these results into breeding programs to accelerate creation of synthetic breeds with high production value in tropical environments remains elusive until subsequent investigations confirm the underlying effect of the variants underlying the signatures. This information is needed to define breeding systems able to efficiently introgress specific genomic fragments of zebu in taurine cattle and vice-versa.

Conclusions
Genomic regions that are potentially linked to purifying or balancing selection processes in domestic cattle were identified genome-wide. The genetic variants imposing such selective pressure are not known, even though for some regions candidate genes could be assigned, and could serve as resource for new hypothesis testing in the future. These regions are of particular interest to understand the natural and human selective pressures to which these subspecies were exposed and how the genetic background of these populations evolved in response to environmental challenges and human manipulation.