First, we identified SNPs and evaluated the performance of different SNP detection methods.
Functional analysis of SNPs
To exclude most of the false-positive SNP predictions from the functional analysis, we assessed possible functional effects only for the SNPs that were called by all four pipelines. Of these, 1,051,772 SNPs were located inside genic regions, including 27,722 coding SNPs, of which 11,545 were predicted either to cause non-synonymous amino acid substitutions in proteins or to generate the gain or loss of 182 stop codons. Eighty-one SNPs were located inside transcriptional splice sites, including 46 acceptor sites and 35 donor sites, and they have the potential to cause alternatively spliced transcripts [68–70]. Using SIFT [71, 72] and PolyPhen to predict the effect of missense mutations, we found 286 homozygous and 530 heterozygous SNPs that were predicted to be deleterious to protein function. These missense SNPs correspond to 5% and 9.1% of the homozygous and heterozygous non-synonymous SNPs (nsSNPs), respectively; the difference reflects the fact that deleterious alleles are less likely to be homozygous. We further observed that the frequency of deleterious SNPs within CNV regions was 3.3-fold higher than the average across the whole genome (Chi-square test, p = 0.0001). The excess of deleterious SNPs in CNV regions may be explained either by sequence duplication resulting in pseudogenes or by gene duplication where one copy harbours a deleterious mutation while the other retains the gene function. A gene set enrichment analysis for genes with nsSNPs, using all bovine genes as background, showed enrichment for olfactory transduction and immune-related pathways (p < 0.05). This result is similar to that of the enrichment analysis of genes located inside CNV regions (see the discussion of the CNV regions below).
We found that the SNP frequency on the X chromosome is about 4.9 times lower than on the autosomes. Whereas the autosomes are diploid, the X chromosome is haploid in the sequenced bull; the sequencing depth of the X chromosome will therefore be about half that of the autosomes, resulting in a lower SNP detection rate. However, data in this study and in previous studies have indicated an almost linear correlation between the SNP detection rate and sequencing depth in this depth range. Under such a relationship, halving the depth would account for only about a twofold reduction, suggesting that low sequencing depth only partially explains the low SNP frequency found on the X chromosome. The low variant frequency observed on the X chromosome of the sequenced animal also suggests that selection in cattle often favours a lower mutation rate on the X chromosome than on the autosomes, because deleterious recessive mutations are exposed on hemizygous chromosomes [73, 74].
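The depth argument can be made concrete with a short calculation. The sketch below assumes, for illustration only, a strictly linear relationship between sequencing depth and SNP detection rate and an X-chromosome depth of exactly half the autosomal depth; the function and variable names are hypothetical and not part of the analysis pipeline.

```python
def residual_fold_reduction(observed_fold, depth_ratio):
    """Fold reduction left unexplained by depth under a linear depth/detection model.

    observed_fold: observed fold reduction in SNP frequency (e.g. 4.9).
    depth_ratio:   X-chromosome depth divided by autosomal depth (assumed 0.5).
    """
    if not 0 < depth_ratio <= 1:
        raise ValueError("depth_ratio must be in (0, 1]")
    # Under a linear model, detection drops by the same factor as depth.
    fold_explained_by_depth = 1.0 / depth_ratio
    return observed_fold / fold_explained_by_depth


observed_fold = 4.9   # reported X vs autosome difference
x_depth_ratio = 0.5   # assumption: haploid X in a bull gives half the depth
residual = residual_fold_reduction(observed_fold, x_depth_ratio)
print(f"Fold reduction not explained by depth: {residual:.2f}")
```

Under these assumptions, roughly a 2.45-fold reduction remains after accounting for depth, which is the part attributed above to other causes.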
We also found that the SNP frequency in CpG islands is approximately half that in the rest of the genome (Chi-square test, p < 0.0001), indicating that CpG islands are under more stringent selective pressure. Variants in CpG islands can potentially disrupt CpG dinucleotides, thereby affecting the methylation status of the cytosine residues and possibly resulting in abnormal epigenetic regulation of gene expression.
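The frequency comparisons in this section rely on Chi-square tests. As a minimal sketch of how such a comparison can be set up, the code below computes a Pearson Chi-square statistic for a 2 x 2 table (region by SNP/non-SNP site) without continuity correction. The counts are hypothetical values chosen only to mimic a twofold difference in SNP frequency; they are not data from this study, and the exact test configuration used here is an assumption.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson Chi-square test for the 2 x 2 table [[a, b], [c, d]].

    Returns (chi2, p_value) with 1 degree of freedom and no continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the Chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts (NOT from the study): SNP sites vs non-SNP sites.
cpg_snp, cpg_other = 100, 99_900          # CpG islands: frequency 0.001
rest_snp, rest_other = 2_000, 998_000     # rest of genome: frequency 0.002
chi2_example, p_example = chi_square_2x2(cpg_snp, cpg_other, rest_snp, rest_other)
print(f"chi2 = {chi2_example:.1f}, p = {p_example:.2e}")
```

The same function could be applied to the other comparisons reported in this section (deleterious SNPs in CNV regions, nsSNPs in imprinted genes) given the corresponding counts.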
Because only a limited number of imprinted genes have been experimentally discovered in the cattle genome, we applied ortholog mapping of murine and human imprinted genes to the cattle genome based on Imprinted Gene Databases. We found 47 cattle genes that were potentially imprinted, and the frequency of heterozygous nsSNPs was 3.7-fold lower in these genes than in all the other genes in the genome (Chi-square test, p = 0.0048). This result suggests that the potentially imprinted genes underwent positive selection and is consistent with the parental conflict theory [76, 77]. However, because the imprint status of these genes is yet to be determined experimentally in cattle, these results need to be verified by future studies.
Based on the pedigree information for the sequenced animal (Additional file 3), the inbreeding coefficient was calculated with Pedigraph v2.4 to be 0.046875, suggesting that some identical by descent (IBD) regions exist in the genome. The SNPs detected in this study also provided a means to survey large IBD regions, which are revealed as runs of homozygosity (ROH) in the sequenced animal. Here, we defined ROH regions by applying a sliding window with a size of 1 Mb and a step size of 200 Kb, with the ratio of heterozygous to homozygous SNPs set to less than 0.1. Using these criteria, we identified 71 ROH ranging in size from 1 Mb to 92.4 Mb and corresponding to 13.5% of the whole genome of the sequenced individual.
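The sliding-window criterion described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the script used in the study: the function name `find_roh`, the input layout, the treatment of windows without homozygous SNPs (never called ROH), and the merging of overlapping qualifying windows are all assumptions. The toy SNP positions are synthetic.

```python
from bisect import bisect_left


def find_roh(snps, chrom_length, window=1_000_000, step=200_000, max_ratio=0.1):
    """Return merged (start, end) intervals whose windows pass the ROH criterion.

    snps: list of (position, is_heterozygous) for one chromosome, sorted by position.
    A window [start, start + window) qualifies when het / hom < max_ratio.
    """
    positions = [pos for pos, _ in snps]
    # Prefix sums of heterozygous SNPs allow fast per-window counts.
    het_prefix = [0]
    for _, is_het in snps:
        het_prefix.append(het_prefix[-1] + (1 if is_het else 0))

    runs = []
    start = 0
    while start + window <= chrom_length:
        end = start + window
        i = bisect_left(positions, start)
        j = bisect_left(positions, end)
        het = het_prefix[j] - het_prefix[i]
        hom = (j - i) - het
        # Assumption: a window with no homozygous SNPs is not called ROH.
        if hom > 0 and het / hom < max_ratio:
            if runs and start <= runs[-1][1]:
                # Overlapping or abutting qualifying windows extend the current run.
                runs[-1] = (runs[-1][0], end)
            else:
                runs.append((start, end))
        start += step
    return runs


# Synthetic toy chromosome: homozygous SNPs over the first 3 Mb, heterozygous after.
toy_snps = [(pos, False) for pos in range(0, 3_000_000, 10_000)]
toy_snps += [(pos, True) for pos in range(3_000_000, 5_000_000, 10_000)]
toy_runs = find_roh(toy_snps, chrom_length=5_000_000)
print(toy_runs)
```

With a 1 Mb window, the shortest run that can be reported is 1 Mb, which matches the lower bound of the ROH size range given above.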
Because the length of each IBD run is determined partly by the number of generations since the common ancestor, there is a continuum in the length of homozygous segments, depending on the degree of shared parental ancestry and its age. ROH due to recent inbreeding tend to be longer because there has been little opportunity for recombination to break up the IBD segments. On the other hand, ROH of much older origin are generally shorter because the chromosomal segments have been broken down by repeated meioses. According to pedigree data from this study, the two largest IBD regions, with sizes of more than 50 Mb, were most likely to have been derived from a shared ancestor three generations ago.