Phylogenetic reconstruction and the identification of ancient polymorphism in the Bovini tribe (Bovidae, Bovinae)

Background The Bovinae subfamily incorporates an array of antelope, buffalo and cattle species. All of the members of this subfamily have diverged recently. Not surprisingly, a number of phylogenetic studies from molecular and morphological data have resulted in ambiguous trees and relationships amongst species, especially for Yak and Bison species. A partial phylogenetic reconstruction of 13 extant members of the Bovini tribe (Bovidae, Bovinae) from 15 complete or partially sequenced autosomal genes is presented. Results We identified 3 distinct lineages after the Bovini split from the Boselaphini and Tragelaphini tribes, which has lead to the (1) Buffalo clade (Bubalus and Syncerus species) and a more recent divergence leading to the (2) Banteng, Gaur and Mithan and (3) Domestic cattle clades. A fourth lineage may also exist that leads to Bison and Yak. However, there was some ambiguity as to whether this was a divergence from the Banteng/Gaur/Mithan or the Domestic cattle clade. From an analysis of approximately 30,000 sites that were amplified in all species 133 sites were identified with ambiguous inheritance, in that all trees implied more than one mutation at the same site. Closer examination of these sites has identified that they are the result of ancient polymorphisms that have subsequently undergone lineage sorting in the Bovini tribe, of which 53 have remained polymorphic since Bos and Bison species last shared a common ancestor with Bubalus between 5–8 million years ago (MYA). Conclusion Uncertainty arises in our phylogenetic reconstructions because many species in the Bovini diverged over a short period of time. It appears that a number of sites with ambiguous inheritance have been maintained in subsequent populations by chance (lineage sorting) and that they have contributed to an association between Yak and Domestic cattle and an unreliable phylogenetic reconstruction for the Bison/Yak clade. Interestingly, a number of these aberrant sites are in coding sections of the genome and their identification may have important implications for studying the neutral rate of mutation at nonsynonymous sites. The presence of these sites could help account for the apparent contradiction between levels of polymorphism and effective population size in domesticated cattle.

lands and obligate drinking patterns [17]. In addition early Bovini fossils do not exist in nearby regions like Afghanistan and Iran suggesting that these early Bovini were restricted to the open forests of India and Pakistan until < 7 MYA when the earliest Bovini fossils begin to appear in Africa, Asia and Europe, respectively. This expansion coincides with a changing climate and expansion of C 4 dominated grasslands [16][17][18]. Hence estimates of the Bovinae Bubalina split > 10 MYA are most likely overestimates and a more realistic estimate is < 10 MYA. Within the Bovina subtribe, divergence of the remaining species also occurred recently, with a very sudden radiation in the early Pleistocene ~2 MYA and as a result speciation has not been complete, with many members of the tribe producing viable hybrid offspring or infertile offspring in bulls that may have fertility restored by repeated backcrossing [1,[4][5][6]19]. The dramatic range expansion of domestic cattle has resulted in an increased threat of introgression by domestic cattle DNA into the genomes of many wild populations, either intentionally or accidentally [11,20], all of which may confuse phylogenetic relationships.
Another confounding factor when reconstructing phylogenies may arise if a rapidly evolving clade's most recent common ancestor (MRCA) was highly polymorphic, as this may result in the random assortment of genetic variation in different lineages and aberrant modes of inheritance for certain polymorphisms. This phenomenon is known as lineage sorting, and it has been known to cause random associations among species, which may further confuse derived taxonomic relationships [21].
Recently phylogenetic reconstruction using mitochondrial lineages has led to clearer Bovina phylogenies [6,11,15]. However, because mitochondrial DNA is maternally inherited, relationships driven by male mediated introgression may be missed. This finding was confirmed with Y-chromosomal phylogenies identifying close relationships between Wisent and North American bison, while mitochondrial phylogenies identify an association between Bos taurus and Wisent [11]. More recently the Kouprey and Banteng were shown to be both valid species despite the introgression of the Kouprey mitochondrial genome in Cambodian Banteng [15]. Thus, some improvements have been made regarding the phylogenetic relationships of the Bovinae subtribe, but there is still concern over the accuracy of many trees due to low confidence values and contrasting results from different data and phylogenetic methods. This is especially evident when considering the position of the Wisent in the bovine phylogeny and the time of divergence for these species.
Despite concerns over the level of variation within nuclear gene sequences at the species level in the Bovini [22,23], by examining a number of nuclear genes (exons and flanking noncoding regions) we should identify enough genetic variation to differentiate between species, populations or breeds. Also by examining variation within genes and exons in closely related species, one can analyse haplotypes for similarity among species over short (variation per amplicon) and large distances (variation between exons and genes). The presence of identical, long haplotypes in different species may be evidence of recent introgression as recombination is likely to break up linked haplotypes after a number of generations. Alternatively, associations generated by the random assortment of polymorphic alleles should not create similar patterns of linked haplotypes over any significant distance. Therefore, lineage sorting and introgression can be differentiated by the analysis of haplotype similarity and sequence variation in genomic DNA between species.
The presence of alleles undergoing lineage sorting in the coding regions of the genome are of interest as their presence is most likely due to polymorphisms in a common ancestor that were sorted in a fashion that is inconsistent with the inferred phylogeny. In some cases these polymorphisms can persist for generations, even past the lifespan of a species. The fact that many of these polymorphisms do not appear to be linked over any large distances suggests they have been independently assorted and that they are ancient in origin and therefore not introduced by recent introgression. If overdominance, selection or demographic factors, like the fixation of slightly deleterious alleles in small populations, have been responsible for the assortment of these ancient polymorphisms they must have an effect on phenotype and should therefore show high dN/dS ratios. Therefore, examining estimates of dN/dS at these sites should help infer whether they are neutral or nearly neutral and if so, whether they are useful for determining the neutral rate of mutation in the Bovini.
In this paper we attempt to reconstruct phylogenies between 14 extant representatives of the Bovinae subfamily. We examined 84 autosomal gene sequences, from 15 different genes, sampled over 4 Bovine chromosomes. Neighbour joining trees and the neutral substitution rate per site at synonymous (dS) and noncoding (dI) regions are presented for all autosomal sequences in an effort to identify relationships and divergence times between spe-cies, and test for evidence of introgression or lineage sorting.

Sequencing
In total 84 amplicons from 15 genes were sequenced across 14 representatives of the Bovinae subfamily, including members of the Bovini and Tragelaphini tribes, which we used to infer the sequence from the most recent common ancestor (see Methods). From the 84 amplicons 83 amplified in at least one sample. The relative proportions from the amplicons that successfully amplified in each species were particularly high, ranging from 0.76 in Eland to 0.95 in Gaur (additional file 1). Surprisingly some of the wild relatives showed better amplification (Gaur, Yak, Mithan, Banteng and Bison) than different breeds of B. taurus (Holstein and Tuli). All sequences were submitted to Genbank and their primers and accession numbers are also available in additional file 1.

Genetic distances, divergence times and phylogenetic reconstruction
Both dI and dS are generally considered neutral substitutions and are often used for discriminating between phylogenetic lineages. Table 2 shows pairwise comparisons for dI and dS between 15 Bovinae representatives, lower and upper diagonal respectively. The minimum dI is within breeds of Indian river buffalo, followed by breeds of Domestic cow. The largest dI is between species like Eland and the Domestic cow (Table 2, lower diagonal). Similarly, the minimum dS is found between breeds, and the maximum dS is between Eland and other members of the Bovidae, such as Bison, Gaur, Mithan, and Domestic cow, respectively (    Bison-Banteng-Mithan/Gaur. Previous work has placed the lineage leading to Bison and Yak as a separate group, which diverged from the Banteng/Gaur/Mithan clade [6,11], and more recently as a divergence from the B. taurus branch [15].

Aberrant sites
To detect if recent gene flow or ancient polymorphisms were responsible for differences between our and other recently published phylogenies, we analysed all of the segregating sites from figure 1 for the presence of anomalies. We reconstructed a number of phylogenetic trees to iden-tify the tree that minimises the number of aberrant sites, which we defined as any site that must have arisen due to a double mutation or lineage sorting of an ancient polymorphism (see Methods). We identified 133 aberrant sites, of which 111 were not tree specific and were aberrant in all trees leaving 22 informative sites. From the 111 non-tree specific aberrant sites 53 ancient sites are segregating within both the Bovina and Bubalina lineages. A legend to all aberrant sites, their position and the relative age (i.e. approximately how long they have been segregating in the clade) is provided in additional file 3.
To examine these aberrant sites in more detail we have analysed a number of phylogenetic trees that have been suggested due to the poor bootstrapping confidence in figure 1, and trees presented in recent work by Verkaar et al and Hassanin and Ropiquet [6,11]. The phylogenetic tree that generates the smallest number of anomalies (125) places Yak off the lineage leading to Domestic cattle and Bison off the lineage leading to the Indochinese cattle ((B. taurus, Yak), (Bison, Indochinese cattle)); however, similar numbers (126) are also seen for the tree that places Bison and Yak as a lineage of the Indochinese clade ((B. taurus), ((Yak, Bison) Indochinese cattle)), which is only worse by one aberrant site. Therefore, some evidence exists for the trees recently proposed by Hassanin and Ropiquet [6] and Verkaar et al. [11]. However, it appears that a number of aberrant sites are generating an association between Yak and Domestic cattle, which has confounded the phylogenetic reconstruction. Removing all aberrant sites from the analysis leaves fewer substitutions to accurately estimate genetic distances between members of the Bovina subtribe and the tree is less certain (data not shown).

Introgression and ancient polymorphism
To identify the source of the aberrant substitutions within the Bovini tribe, we compared all variable sites in the alignment to find sites shared with B. taurus at normal and aberrant positions (   that Yak shares the largest number of alleles in common with Domestic cattle for all variable and aberrant sites (Table 4). Perhaps the most interesting finding is the high number of identical haplotypes Yak and Domestic cattle share over haplotypes resolved from the aberrant sites per amplicon ( Table 4, Table 5). When compared to haplotypes resolved from the all variable sites, Yak and Domestic cattle share the largest number in common. However, there are relatively few haplotypes that are similar for any of the species (Gaur, Mithan, Banteng, Bison and Yak) and Domestic cattle, and this is probably a function of the species specific mutations that have accumulated per amplicon since divergence from the MRCA.
To determine if the high proportion of haplotypes, resolved from aberrant sites, shared between Yak and Domestic cattle may be evidence of recent introgression, we looked for extended haplotypes that are shared between species. Thus, the extent of linkage between neighbouring amplicons and genes on the same chromosome was examined. Table 5 shows the B. taurus haplotype per amplicon from all aberrant sites and the species that share the same haplotype. A number of haplotypes are shared with all species. Many of the haplotypes that are based on one aberrant site (PIT1_e1) may be similar by chance and therefore may not be very informative. However, by extending haplotypes to adjacent exons and genes more power to detect recent hybridisation should be available. But as haplotypes are extended to closely linked amplicons or neighbouring genes on the same chromosome this similarity rapidly disappears (Table 5).
In a further attempt to identify recent introgression, sequence diversity estimates were examined between neighbouring amplicons and genes. Figure 2 shows plots of sequence divergence at noncoding sites (dI) for Hereford vs Yak (A), Hereford vs Holstein (B) and Yak vs Gaur (C). A similar pattern is seen to that described in table 5. A number of amplicons show similar amounts of sequence divergence between Hereford vs Yak as is seen between Hereford vs Holstein (NRIP1 and PIT1). However, genetic diversity at closely linked amplicons remains low between breeds of B. taurus for closely spaced genes on chromosome 1 (PIT1, ITGB5) and chromosome 2 (GMEB1 and EGF), which suggests that these animals have recently shared a common ancestor. In contrast, comparisons for Domestic cattle vs Yak, and Yak vs Gaur (Figure 2A and 2C, respectively) rarely show low sequence diversity over similar linked regions, and thus, genetic associations are most likely not the result of recent introgression.
An examination of the river and swamp buffalo samples from Southeast Asia (BubB and BubC) has identified large segments of chromosome 1 and 2 that are identical. Thus BubB has a closer relationship to swamp buffalo (BubC) than to other members of river buffalo that originate from India, which may suggest some level of introgression in these two samples as large portions of chromosomes 1 and 2 in BubB appear to be of swamp buffalo origin (Figure 3).

Classifying aberrant sites: age and role of selection
A number of aberrant sites were identified that were inconsistent with any of the phylogenetic trees suggested from either molecular or morphological characteristics. In total 33 aberrant sites occurred in coding regions, of which 6 are Ka and 27 are Ks from 11,563 synonymous and 3,482 nonsynonymous sites, which resulted in a dN/ dS ratio of 0.065. From the 33 aberrant sites 10 appear to be polymorphic since the common ancestor with buffalo, while the remaining 23 would have been polymorphic in the common ancestor of the Bos and Bison genera. Of the 10 ancient polymorphisms 3 are Ka and 7 are Ks. Therefore these 10 ancient polymorphisms were maintained for millions of years. They must have been polymorphic in the MRCA to the Bovina and Bubalina subtribes some 5-8 MYA and appear to have persisted to at least the divergence of the Bovina subtribe around 2 MYA. The small values estimated for the dN/dS ratios calculated suggest that these polymorphisms may have been neutral.

Sequencing
A high proportion of successful amplification was achieved for all species within the Bovinae subfamily from primers designed using the Bos taurus genome despite divergence times of 8-14 MY (Taurotragus oryx).
The high rate of amplification between species should facilitate further population genetic studies in the Bovini and possibly other ruminant subfamilies.

Bovine phylogeny reconstruction, mutation and substitution
Within the Bovini, difficulties resolving interspecific relationships arose when comparing pairwise differences between species for dI and dS with those deduced from the neighbour joining algorithm. In general the phylogenetic reconstruction created using Kimura's two parameter method and the neighbour joining algorithm agreed with the p-distances calculated for dI and dS in table 2, with strong separation between Bubalina and Bovina subtribes and strong support for separation of Bubalus and Syncerus genera, confirming previous studies from molecular data [6,11,26]. Within the Bubalus there appears to be a separation between swamp and river buffalo and a possible separation within river buffalo that may be due to due to geographic patterns and/or introgression of swamp and river buffalo. The phylogenetic tree we identified does not agree with some of the individual genetic distances within  ) where e = exon and n = number, therefore, PIT1_e1 is exon 1 of PIT1. Exons that extend over the largest sequencing read, which is typically 800 bp are represented as e(n.j) where j equals order of read from 5-prime to 3-prime. Therefore NRIP_e1.1 and NRIP_e1.2 represent two subsequent reads of the l The number of aberrant alleles making each haplotype are given in brackets as the (taurine haplotype). Table displays haplotype blocks shared with Hereford and 5 members of the Bovinae for exons within the each gene and for genes on the same chromosome.

Figure 2 Plots of sequence similarity for noncoding substitutions per site (dI) for pairwise comparisons between Hereford vs Yak (A), Hereford vs Holstein (B) and Yak vs Gaur (C) for all amplicons all plots are in chromosomal order.
sented above is based on over 29,000 bi-parentally inherited nucleotide characters and should have more power to resolve some of the more confusing intergenic relationships within the Bovini.
The association identified between Bison, Yak and B. taurus may also explain the similarity seen between Wisent and B. taurus. Although, no Wisent sample was represented in our study, Verkaar et al [11] and Hassanin and Ropiquet [6] have recently hypothesised that as a result of repeated male mitigated introgression of American bison populations into Eurasian cattle like populations lead to a new species (Wisent), which comprises the phenotype, autosomal genes and Y-chromosome of American bison and the mitochondrial genome from the maternal ancestor (B. taurus like species) [11]. Therefore, the phylogenetic reconstruction may have been confused due to an analysis of a hybrid river buffalo (BubB) that has large regions of chromosome 1 and 2 that apparently originated from swamp buffalo. Alternatively, the phenotypic and karyoptypic plasticity shown within water buffalo may indicate that genetic differences between river and swamp buffalo could all be within species polymorphism.

Aberrant sites, introgression and ancient polymorphism
One of the most astonishing features of the phylogenetic reconstruction was the identification of a large number of aberrant sites that did not fit any known phylogenetic reconstructions. A number of explanations may account for these phylogenetic aberrations, such as double mutations, introgressive hybridisation and lineage sorting. The probability of aberrant sites as a result of double mutations can be calculated as follows: which is the probability of no mutation in one lineage with mutation rate μ over t generations. Therefore, the probability of mutation in one lineage is Thus the probability that the same site mutates in two different lineages is Plot of sequence similarity (dI) for genes in chromosomal order between BubB (River buffalo) and BubC (Swamp buffalo), showing high degrees of similarity over large regions of chromosomes 1 and 2

Figure 3 Plot of sequence similarity (dI) for genes in chromosomal order between BubB (River buffalo) and BubC (Swamp buffalo), showing high degrees of similarity over large regions of chromosomes 1 and 2.
And the expected number of double mutations across n sites is The probability that both mutations result in the same base is approximately 1/3. Hence the expected number of double mutations in two lineages is Therefore, even if t is set to 10 million years as a conservative estimate for the divergence of B. taurus and B. bubalis and n = 30,000 then for equation 5 one would expect around 2 bases to have undergone a double mutation resulting in the same derived allele from the ~30,000 bases sequenced in each lineage since the divergence of B. taurus and B. bubalus from a common ancestor some 10 MYA. The time of divergence is somewhat less within the Bovini but there are more than 2 lineages. However, the likelihood of a mutation affecting the exact same basẽ 130 times in our dataset is highly improbable, so in general, most abnormalities must be evidence of lineage sorting or introgressive hybridisation and not double mutations.
Also a high proportion (40%) of aberrant sites were identified as sites that were deemed to be polymorphic in Bubalina and Bovina subtribes. Members of these two subtribes separated 5-8 MYA and no hybridisation has ever been documented between any members of these two subtribes [1]. Therefore, these aberrant sites cannot have arisen due to recent hybridisation events. If 40% of the aberrant sites are due to polymorphisms that existed in the common ancestor at Bubalina and Bovina, it seems likely that other polymorphisms in the common ancestor of Bovina will also have contributed to the other 60% of aberrant sites. However, recent introgression might have also contributed, as the incomplete speciation of the tribe has resulted in many viable hybrid offspring from various crossings [1].
The presence of identical haplotypes between Bovina representatives and Domestic cattle may also be a result of ancient geneflow between animals before reproductive isolation was achieved, but in essence this hypothesis would virtually be identical to the presence of ancestral polymorphisms. Another possibility for this finding may also be explained by hybridisation between another closely related species. A previous study from the labs of University Station in Texas also found an anomalous association between Yak and Taurine cattle, due to an animal descending from a B. indicus cow [5,9,11]. The fact that a Bos indicus sample was not included in the data set, meant tests could not determine if some of the aberrant sites in our sample were due to gene flow between B. grunniens and B. indicus and this will need to be investigated further.
The high degree of sequence similarity detected between Yak and Domestic cattle at all variable sites in the alignment, aberrant sites and over extended haplotypes suggests that hybridisation may have also occurred between these two species (Table 4). Yak-taurine hybrids are reported in China and Mongolia where their altitudinal ranges intersect [1]. However, no evidence of linked haplotypes at aberrant sites between neighbouring genes or on very small scales between exons within the same gene was detected (Table 5). Genes were selected from tightly linked regions of the B. taurus genome and the majority of amplicons within genes were only separated by 1-20 kilobases. No information on the genetic distance between genes has been collected but, the small physical distances between amplicons on the same gene suggest the recombination rate between many of these amplicons would be far below 1% per generation. The fact that all animals rarely share haplotypes with B. taurus between neighbouring amplicons and genes suggests that there is no evidence of linked haplotypes. This finding is also reflected in plots of sequence diversity between Yak and Domestic cattle with low sequence diversity detected for some amplicons but consistently larger estimates found for neighbouring amplicons from the same gene (Figure 2A). Thus recent introgression has not played a large role in the presence of these aberrant sites within the Bovina and they may be due to homologous polymorphisms that existed in the MRCA that have undergone lineage sorting in subsequent generations. However, the lack of information on the genetic ancestry for all of our animals may require some caution when interpreting some of our results.
Ancestral polymorphisms that have undergone lineage sorting in the Bovini would have required an ancestral species with an extremely large population size that was very polymorphic. Palaeontological evidence generally agrees with this assumption, with the earliest fossil for the Bovini tribe found in Asia ~8 MYA, with evidence of populations existing solely in Pakistan and India, south of the Himalayas. It was not until about 6-7 MYA that the Himalayas began increasing in size and areas of desert were becoming seasonally humid facilitating wide dispersal of Bovini ancestors [6]. Thus the 52 aberrant sites identified in the Bubalina and Bovina subtribes most likely originate in this once large restricted and panmictic population. The fact that each allele is found in the Bubalina and Bovina suggest that the polymorphisms must have remained segregating in both groups for what would appear to be ~8 million years. The remaining aberrant sites appear to have arisen in the independent ancestors of the Bovina, which suggests that the respective population sizes for these populations must have also been very large. These sites may have persisted for at least 2 million years, which is the approximate divergence time for the major Bovina lineages from molecular and palaeontological data [6,11,27,28,33,34]. The finding that some polymorphisms in the Bovini have been segregating for so long is completely novel. These ancestral polymorphisms have existed for very long periods and been maintained as neutral or overdominant polymorphisms. The low dN/dS among these polymorphisms suggest that they are neutral because it is unlikely the synonymous polymorphisms show overdominance.
Knowledge of polymorphisms segregating in humans and other primates has been known for some time with examples of polymorphisms in ABO, MN and Lewis blood groups [35], although it was not known whether these polymorphisms were the result of the same genetic variations. Recently, work on the MHC has identified a number of polymorphisms that are trans-specific and have existed in humans and other primates for millions of years [36,37]. However, the majority of these genes are involved with disease resistance and the maintenance of polymorphism has been hypothesised to exist due to balancing selection. To our knowledge this is one of the only examples where polymorphisms at the same alleles have been maintained for such long periods outside of the MHC or involving genes that are responsible for disease resistance.
The difficulties in phylogenetic reconstruction caused by lineage sorting in closely related species have previously been reported for neutral genetic markers and mitochondrial control regions [11,22,38]. Because the majority of phylogenetic studies typically only involve a few genetic markers or concentrate on specific regions of the mitochondrial genome the exact rate of alleles undergoing lineage sorting is unknown, especially for nuclear genes, but the maintenance of polymorphic sites beyond the lifespan of a species is expected to be rare [39]. Therefore, the presence of alleles undergoing lineage sorting in coding regions of the bovine genome should be extremely scarce. However, a relatively large number of these aberrant sites were detected within genomic coding regions of the Bovini tribe. These substitutions most likely would have been neutral or nearly neutral in the ancestral population from which they were derived, and thus the probability of their loss and fixation in future lineages is governed by the neutral theory of molecular evolution [40]. By sequencing a large number of close relatives researchers should be able to identify a high number of these aberrant sites and infer the dN/dS rate among neutral mutations.

Conclusion
In conclusion, a partial phylogenetic reconstruction of the Bovinae subfamily has identified three distinct lineages that have arisen after the Bovini split from the Bosephalaphini and the Tragelaphini tribes. Within the Bovini tribe two subtribes the Bubalina and Bovina were resolved, which we estimate diverged approximately 5-8 MYA. About 2-3 MYA the Bovina subtribe was still a single widespread species. It then appears that the Bovina split into 3 major groups: Domestic cattle, Bison/Yak and the Indochinese cattle or Gaur/Mithan and Banteng and this occurred very rapidly. Polymorphisms in the ancestral species continued to segregate for long periods of time, which has lead to some alleles undergoing lineage sorting.
The lineage sorting we have detected appears to have confused the algorithms and tree drawing programs used for this analysis, and the apparent star like phylogeny will most likely continue to confuse phylogenetic reconstructions. The majority of these sites appear to be neutral and must have arose in a polymorphic ancestor and continued to segregate within species for millions of years. The age and maintenance of these ancestral polymorphisms in coding regions may be useful for studying the neutral rate of evolution and could also help explain the discrepancies between effective population size and levels of polymorphism in domesticated cattle.

Gene candidates
Candidate genes were selected for sequencing using a priori knowledge of the gene's position in relation to suspected milk QTL on chromosomes 1, 2, 4 and 9. In addition MacEachern et al. have compared a large number of genes and EST for evidence of higher than average rates of molecular evolution between Bos taurus, Homo sapiens and Bos indicus [41,42]. Genes showing higher than average rates of molecular evolution between these species may be of evolutionary importance and are good candidates to study molecular evolution. This potential list of genes was further reduced by selecting genes that might have a role in milk production from examination of the most recent annotation of the human genome. Table 6 summarises the genes, descriptions and their chromosomal location from the latest cattle genomic resources. All primer sets were optimised using a number of samples from the Bovidae. If optimisation failed to amplify any products a second primer pair was trialled. If no improvement was recorded, the amplicon was dropped from the analysis. A complete list of primers for all exons, a summary of PCR success in each species and Genbank accession numbers are available in additional file 4. Amplicons where all species amplified are available as a multiple alignment spanning the exon and flanking intronic sequences. This alignment was used extensively for phylogenetic reconstruction and is presented in additional file 2. Sequencing was completed using the Big Dye Terminator v 3.1 cycle sequencing kit on an ABI 3700 DNA Analyser (Applied Biosystems) according to the manufacturer's instructions for the sense and antisense strands. GeneScan version 3.7 (Applied Biosystems) was used to analyse the data collected from each capillary. Seqscape V. 2.5 (Applied Biosystems) was used to analyse chromatograms and resolve any potential heterozygous sequences. In total over 6,000 sequencing reactions were performed, analysed and exported as FASTA sequence files.

Consensus sequence generation
Sense and antisense reads were aligned using CLUSTALW [45] and used to create a consensus sequence for the analysis using an automated computer program. Consensus sequences for each amplicon were created from multiple alignments of sense and antisense reads for each species and breed. Choosing the most common residue at each point in the alignment creates the consensus, if one of two alleles does not occur at a frequency of 0.7 or higher, the ambiguous character (N) is selected, which masks polymorphisms within species and may result in a conservative estimate of the number of sites undergoing lineage sorting.

Ancient sequence generation
Alignments for each genomic fragment typically contained sequences representing the majority of the extant members of the tribe Bovini (Bovidae, Bovinae) and outgroup representatives from the tribes Tragelaphini (Bovidae, Bovinae) and Hominini (Hominidae, Ponginae). These alignments were used to create an ancient nucleotide sequence, which should represent the common ancestor of the Bovini tribe.
A computer program was developed to infer the ancestral state of each base between the two lineages leading to the two main subtribes of the Bovini, Bovina and Bubalina. The first subtribe Bovina includes all of the species from the genera Bos (cattle) and Bison (bisons) and the second subtribe Bubalina incorporates all of the species from the genera Bubalus (Asian buffalo) and Syncerus (African buffalo) ( Table 1). The ancestral sequence for these two subtribes was inferred using Eland (Tragelaphini) and Human (Hominini) as outgroups. Wherever differences were detected between the two subtribes of Bovini the ancestral allele was inferred as the base that was shared between one of the subtribes and one of the outgroup species, the mutant or derived allele was assumed to be the unique allele with no other representatives in the phylogeny. Where both subtribes and the outgroup species had unique alleles the ancestral sequence was represented as an ambiguity character (N). Therefore, if j = a given posi-   [16,47,48]. Phylogenetic trees were generated using the neighbour joining algorithm and Kimura's two-parameter distance using nonparametric bootstrapping with 5,000 replicates from the total (noncoding and coding) dataset in the MEGA4 package [25].

Identifying sites with abnormal inheritance in the Bovini
A computer program was generated to identify sites within introns and exons that differed from the proposed molecular phylogeny. The program identified aberrant sites that were either the result of a double mutation, recent gene flow between species or ancient polymorphisms. Hence, a site is aberrant under a given phylogenetic tree if the same mutation occurred twice or more on at least two different branches of a given phylogenetic tree ( Figure 4). Derived alleles were inferred from the ancestral sequence and phylogenies were inferred from our and others' findings. Hence, if at position j in the alignment Mithan and Yak share an allele with Domestic cattle and Gaur, Banteng and Bison share an allele with the buffalo species, this site would be classified as an aberrant site. These aberrant sites are of interest, either because; they were polymorphic in an ancestral species and arisen due to lineage sorting, or they have arisen due to recent introgression, or are true homoplastic double mutations. The close relationship and limited divergence time of the Bovini subtribe make homoplasy (double mutations) highly unlikely. Thus, if we assume aberrant sites are not due to double mutations we should be able to test the accuracy of various phylogenetic reconstructions. By analysing the number of segregating sites within the dataset we were able to manually infer phylogenetic relationships and interrogate segregating sites for abnormal inherit- ance. Hence, the phylogenetic tree that minimised the number of abnormalities should add extra weight to the phylogenetic reconstruction inferred by our and others' studies.

Associations in the Bovini driven by lineage sorting or introgression
To identify whether lineage sorting or more recent introgression with domesticated cattle was responsible for any anomalies in genetic distance, an analysis was conducted examining genetic diversity between Domestic cattle and Mithan, Gaur, Bison and Yak at aberrant and all variable sites. Haplotype diversity was also examined between these species, with haplotypes resolved over all variable sites for each amplicon. Haplotypes generated from all variable sites were compared with B. taurus for 100% identity. Because Bovini subfamily is monophyletic all chromosomes originate from a common ancestor and therefore a high proportion of sites between all members of the Bovini are identical. Due to the geographic and sexual isolation of this clade many polymorphic sites are species or lineage specific, therefore identifying haplotypes that are 100% similar with B. taurus from variable single nucleotide sites are rare. To overcome this problem we examined haplotypes generated from homologous sites. Haplotypes at these sites should be more informative because they arose in the common ancestor, are present in all representatives of the clade and may be inherited abnormally and as a result may be responsible for confusing phylogenetic relationships. Also, due to the recent divergence of the Bovini tribe species-specific or double mutations at these exact sites are unlikely and haplotype structure should help identify recent introgression. Thus, if Bison or Yak have recently hybridised with Domestic cattle we would expect haplotypes generated from aberrant sites to be identical with those in B. taurus extended over amplicons in the same gene or possibly between genes on the same chromosome, while lineage sorting should show a more random mode of inheritance at these sites and consequently show low haplotype identity with B. taurus over the same region. However, care must be taken if haplotypes are generated over amplicons from only a few aberrant sites, as species may be identical by chance.
A hypothetical monophyletic tree with three distinct clades (1), (2) and (3), where the A allele appears to have arisen in the extant species more than once Figure 4 A hypothetical monophyletic tree with three distinct clades (1), (2) and (3), where the A allele appears to have arisen in the extant species more than once. This abnormal inheritance is due to either a homoplastic double mutation, a hybridisation event by two members of clade 2 with a member of clade 1 or lineage sorting has occurred on a homologous A/C polymorphism that arose in the most recent common ancestor (MRCA) of clades 1 and 2.