Massive screening of copy number population-scale variation in Bos taurus genome
BMC Genomics volume 14, Article number: 124 (2013)
Copy number variations (CNVs) represent a significant source of genomic structural variation. Their length ranges from approximately one hundred to millions of base pairs. Genome-wide screenings have clarified that CNVs are a ubiquitous phenomenon affecting essentially the whole genome. Although Bos taurus is one of the most important domestic animal species worldwide and one of the most studied ruminant models for metabolism, reproduction, and disease, relatively few studies have investigated CNVs in cattle, and little is known about how CNVs contribute to normal phenotypic variation and to disease susceptibility in this species, compared to humans and other model organisms.
Here we characterize and compare CNV profiles in 2654 animals from five dairy and beef Bos taurus breeds, using the Illumina BovineSNP50 genotyping array (54001 SNP probes). In this study we applied the two most commonly used algorithms for CNV discovery (QuantiSNP and PennCNV) and identified 4830 unique candidate CNVs belonging to 326 regions. These regions overlap with 5789 known genes, 76.7% of which are significantly co-localized with segmental duplications (SD).
This large scale screening significantly contributes to the enrichment of the Bos taurus CNV map, demonstrates the ubiquity, great diversity and complexity of this type of genomic variation and sets the basis for testing the influence of CNVs on Bos taurus complex functional and production traits.
Copy number variants (CNVs) represent a significant source of genomic structural variation. Their length ranges from 100 bp to several megabases (up to 5 Mb) and they comprise insertions, deletions, and duplications [1–5]. CNVs were initially thought to be associated only with diseases, but genome-wide screenings have clarified that they are ubiquitous and widespread in many animal genomes [6–11].
Recent studies have shown that genomic structural variations (including CNVs) are common among normal and healthy individuals [12–14]. They account for more differences between individuals, in terms of total bases involved, and have a higher per-locus mutation rate than SNPs. Understanding their distribution in the population at large is crucial in order to clarify their role in determining the phenotype and/or disease state. In humans, several studies have attempted to characterize CNVs in populations using data from the International Human HapMap Consortium [1, 9, 13, 17, 18] and other reference groups [2, 3, 16]. These studies have confirmed that CNVs are widespread throughout the genome and show a broad variation in their frequency of occurrence in populations. In addition, they are present throughout the genomes of all taxa investigated so far: mammals [19–26], birds and invertebrates [28, 29].
CNVs exist in at least two distinct, although non-exclusive, states: common CNV polymorphisms (i.e. frequency > 1%), often with multiple allelic states defined by variations in copy number and/or genomic structure, and rare CNVs, which typically lead to deletion or duplication of larger genomic segments and exist in fewer allelic states (i.e., hemizygous or trisomic). This latter class of CNVs is highly penetrant and short-lived in the population, either occurring de novo or persisting for only a few generations, and is subject to purifying selection. While these structural variations are often benign, they can sometimes influence or even disrupt biological functions. For example, CNVs have been identified as causative of a number of human diseases [5, 11].
Bos taurus is one of the most important domestic animal species worldwide. It is one of the most studied ruminant models for metabolism, reproduction, and disease. Consequently, understanding the genetic basis of the differences in productive and functional traits in this species has great economic importance and biological significance. In this context, knowledge of the abundance and distribution of CNVs and of their association with phenotypes is of major interest. However, until now, relatively few studies have investigated CNVs in cattle [32–40], none using a population-wide analysis. Therefore, little is known about how CNVs contribute to normal phenotypic variation and disease susceptibility in cattle, compared to humans and other model organisms.
The recent focus of the research community on the study of single nucleotide polymorphisms (SNPs) to assess genetic variation in cattle has promoted the use of genotyping arrays mapping to thousands of loci throughout the genome (e.g. Illumina BovineSNP50 BeadChip with 54,001 informative SNP probes). This type of array is now easily available to scan thousands of individuals at an affordable cost, allowing CNVs to be investigated on a wide scale. Compared with higher-density comparative genomic hybridization (CGH) arrays, which detect copy number changes at the level of 5–10 kb, SNP arrays have the advantage of providing normalized intensities (Log R ratio – LRR), allelic intensity ratios (B allele frequency – BAF) and a better estimate of the loss of heterozygosity (LOH), making CNV detection more robust. Several algorithms are able to detect CNVs using the intensity of fluorescent signals from SNP arrays. In this study we applied the two most commonly used and efficient ones, as implemented in the QuantiSNP and PennCNV software, to investigate the genome-wide characteristics of CNVs in five Bos taurus breeds. We scanned the 29 autosomal chromosomes in a panel of 2654 animals and identified 4830 unique CNV candidates belonging to 326 regions. We compared our findings with existing publicly available information on cattle CNVs and investigated the identity and function of genes located within the duplicated regions. Our results significantly enrich the current knowledge about copy-number variants in the Bos taurus genome by determining their distribution across the genome in five dairy and beef cattle breeds (Italian Friesian, Italian Brown, Italian Simmental, Marchigiana and Piedmontese). These findings are an important resource for follow-up studies on cattle genome structure and CNV-trait association [44, 45].
CNV discovery and distribution
After dataset cleaning, a total of 51582 SNPs from the BovineSNP50 BeadChip were independently analysed with QuantiSNP and PennCNV to identify cattle CNVs. After CNV calling, we identified the best Bayes Factor (BF) threshold by plotting the number and length of discovered CNVs as a function of the BF value, and used the adjusted R2 obtained by qRT-PCR (see Methods and Materials section) as a measure of the false positive rate. A BF threshold of 10 is very often used in the literature [44, 46, 47]; since there was no evident improvement in the R2 value for BF values higher than 15, we took 15 as the value that minimizes the false positive calling rate while maximizing the number of CNV calls [48, 49], thus obtaining good confidence also for CNVs observed only once (Figure 1). As expected, and as shown in Figure 1a and 1b, the proportion of CNV length classes detected changes as a function of BF. BF measures the confidence we have in a CNV and depends upon signals arising from a number of contiguous probes. Short CNVs detected by fewer probes yield low BF values; conversely, longer CNVs detected by more probes yield higher BF values. The somewhat larger than usual BF value used here is therefore unfavourable to short CNVs: by setting a high BF threshold we chose to identify fewer short CNVs, but with high confidence. It should be noted, however, that the skew of the distribution observed in Figure 1b is consistent with several studies reported in the literature [1, 30, 35].
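The dependence of call number and call length on the BF threshold can be illustrated with a minimal Python sketch. It assumes that each call is reduced to a (Bayes factor, length in bp) pair; the function name `summarize_by_threshold` and the toy values are hypothetical and are not data from this study.

```python
def summarize_by_threshold(calls, thresholds):
    """Count calls and compute their mean length for each BF threshold.

    calls: list of (bayes_factor, length_bp) pairs (assumed format).
    Returns a list of (threshold, n_calls, mean_length_bp).
    """
    summary = []
    for threshold in thresholds:
        kept_lengths = [length for bf, length in calls if bf >= threshold]
        mean_length = sum(kept_lengths) / len(kept_lengths) if kept_lengths else 0.0
        summary.append((threshold, len(kept_lengths), mean_length))
    return summary


# Hypothetical calls: short CNVs tend to have low BF, long CNVs high BF.
toy_calls = [
    (8, 40_000),
    (12, 90_000),
    (16, 300_000),
    (25, 800_000),
    (40, 1_500_000),
]

summary = summarize_by_threshold(toy_calls, [10, 15, 30])
for threshold, n_calls, mean_length in summary:
    print(f"BF >= {threshold}: {n_calls} calls, mean length {mean_length:.0f} bp")
```

In this toy set, raising the threshold removes the shortest calls first, so the number of calls drops while the mean length of the retained calls grows.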
A total of 2654 individuals from five breeds were analysed. We identified 7493 CNVs (4839 after eliminating redundancy) (Figure 2; Additional file 1: Table S1) and 326 CNV regions (CNVRs) (Additional file 2: Table S2), determined by aggregating overlapping CNVs across all samples.
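The aggregation of overlapping CNVs into CNVRs can be illustrated with a minimal Python sketch. It assumes that each call is a (chromosome, start, end) tuple and that two calls belong to the same region when they share at least one base; the function name `merge_into_cnvrs` and the toy coordinates are hypothetical and do not reproduce the pipeline used in this study.

```python
from collections import defaultdict


def merge_into_cnvrs(cnvs):
    """Merge overlapping CNV calls into CNV regions (CNVRs).

    cnvs: iterable of (chrom, start, end) with start <= end (assumed format).
    Returns a list of (chrom, start, end, n_calls), sorted by chromosome name
    and start position.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in cnvs:
        by_chrom[chrom].append((start, end))

    regions = []
    for chrom in sorted(by_chrom):
        region_start, region_end, n_calls = None, None, 0
        for start, end in sorted(by_chrom[chrom]):
            if region_start is not None and start <= region_end:
                # The call overlaps the open region: extend it.
                region_end = max(region_end, end)
                n_calls += 1
            else:
                # Close the previous region (if any) and open a new one.
                if region_start is not None:
                    regions.append((chrom, region_start, region_end, n_calls))
                region_start, region_end, n_calls = start, end, 1
        if region_start is not None:
            regions.append((chrom, region_start, region_end, n_calls))
    return regions


# Hypothetical calls pooled across samples.
toy_calls = [
    ("BTA6", 100_000, 400_000),
    ("BTA6", 350_000, 900_000),
    ("BTA6", 2_000_000, 2_300_000),
    ("BTA17", 50_000, 120_000),
]
cnvrs = merge_into_cnvrs(toy_calls)
for region in cnvrs:
    print(region)
```

The first two BTA6 calls overlap and collapse into a single region of two calls, while the other calls remain single-call regions.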
Each individual possesses an average of 2.8 CNVs, ranging from 23 kb to 4963 kb with mean and median lengths of 930 kb and 700 kb, respectively. CNV regions (CNVRs) include 18 CNVs on average and span between 53 kb and 10552 kb, with mean and median lengths of 1240 kb and 782 kb, respectively. Furthermore, 37 CNVRs have an observed frequency > 1%, 24 a frequency > 2% and 5 a frequency > 5%. Considering all 7493 CNVs, 92 (1.22%) are homozygous deletions, 5259 (70.18%) are heterozygous deletions, and 1592 (21.25%) and 550 (7.35%) are duplications with three and four copies, respectively (Table 1). We observed on average 258 CNVs per chromosome, a substantial fraction of which (10%) is located on BTA6 (Bos taurus autosome 6), while the lowest number of CNVs (0.3%) was on BTA28.
Eleven copy-number variation regions of homozygous and heterozygous deletions and duplications (Additional file 2: Table S2) were validated by quantitative real-time PCR. These were randomly selected across eleven autosomal chromosomes. Each CNV was amplified in a minimum of three and a maximum of seven specimens belonging to different breeds, for a total of 50 validation tests. The CNV copy number estimated by qRT-PCR was plotted against the BeadChip copy number determination (Figure 1c). Linear regression analysis showed a high level of correlation (R2 = 0.92) and a slope of 1.00 (standard error: 0.05; p-value = 2.2e-16).
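The kind of comparison summarized by the regression can be sketched in plain Python as an ordinary least-squares fit of qRT-PCR copy number on array copy number. The function name `linear_fit` and the paired values below are hypothetical and are not the validation data; the sketch returns the unadjusted R2.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared), with r_squared unadjusted.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical paired measurements (copy number from the array vs. qRT-PCR).
chip_copy_number = [0, 1, 1, 2, 3, 3, 4]
qpcr_copy_number = [0.1, 0.9, 1.2, 2.0, 2.8, 3.1, 4.1]

slope, intercept, r_squared = linear_fit(chip_copy_number, qpcr_copy_number)
print(f"slope={slope:.2f} intercept={intercept:.2f} R2={r_squared:.2f}")
```

A slope close to 1 with a high R2 indicates that the two platforms assign comparable copy numbers to the same samples.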
The analysis of the distribution of CNV size indicates that, with the BF threshold used, less than 2% of CNVs are ≤ 100 kb, 12% have a length between 100 and 250 kb, 27% have a length between 250 and 500 kb, 33% have a length between 500 and 1000 kb, and 25% are longer than 1 Mb. In a few samples we identified CNVs about 8 Mb long. CNVR number and length are not significantly correlated with chromosome length. BTA29 hosts three CNVRs, while BTA6 has 20 CNVRs, the highest value. Out of the 326 CNVRs, 192 include loss-only events, 31 gain-only events and 103 include both. Loss events are approximately 6.2-fold more common than gain events in CNVRs, while the corresponding ratio is 2.5-fold for CNVs. CNVRs affected by loss events have, on average, a smaller size than gain regions, in line with the recently published results of Hou et al.
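The loss-to-gain ratios quoted above follow directly from the counts given in the text, as the short Python check below shows. Only the variable names are introduced here for illustration.

```python
# Counts of CNVs by copy-number state, as reported in the text.
cnv_counts = {0: 92, 1: 5259, 3: 1592, 4: 550}

# Counts of CNVRs by event type, as reported in the text.
cnvr_loss_only, cnvr_gain_only, cnvr_both = 192, 31, 103

total_cnvs = sum(cnv_counts.values())
total_cnvrs = cnvr_loss_only + cnvr_gain_only + cnvr_both

# Deletions are states with fewer than two copies, duplications more than two.
cnv_losses = cnv_counts[0] + cnv_counts[1]
cnv_gains = cnv_counts[3] + cnv_counts[4]

cnv_loss_gain_ratio = cnv_losses / cnv_gains
cnvr_loss_gain_ratio = cnvr_loss_only / cnvr_gain_only

print(total_cnvs, total_cnvrs)
print(round(cnv_loss_gain_ratio, 1), round(cnvr_loss_gain_ratio, 1))
```

The CNVR ratio compares loss-only with gain-only regions; regions containing both event types are left out of that ratio.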
Looking at the genomic distribution of CNVs within the population, they collectively span a wide fraction of the genome, ~20% of the autosomal genome (497 Mb), in line with what has been found in humans (~16%). These findings indicate that potentially significant portions of the genome can vary in copy number. There is a substantial difference in the fraction of the genome affected by common (frequency > 1%) and rare CNVs. The common ones occupy only ~0.1% of the genome, suggesting that the bulk of the observed copy-number variations belong to the rare CNV set. There is also a different frequency distribution among CNV types (gain or loss). Duplications and heterozygous deletions are substantially retained in the population, while homozygous deletions are found only at very low frequency, generally in one or two samples. These findings suggest the existence of purifying selection in the population due to the potentially deleterious effect of homozygous deletions (Figure 3; Additional file 3: Table S3).
CNV association with segmental duplications and gene content
Although the complete set of mechanisms responsible for generating CNVs is unknown, studies on cattle [2, 37] and other mammalian species [5, 29, 40] highlighted an enrichment of CNVs near segmental duplications (SD). Segmental duplications, defined as genomic regions of high sequence identity (greater or equal to 90%) to more than one genomic locus, may mediate CNV genesis by acting as a substrate for non-allelic homologous recombination. These recombination events may result in amplification, deletion, inversion, or copy number variants. We tested whether there is a non-random association between the CNVs that we discovered and known SD regions  and found a significant overlap: 76.7% of the CNVs intersect with SDs (p-value < 0.001 as estimated by a random permutation test).
The 4839 non-redundant CNVs found within autosomes overlap with a total of 5789 known genes (Additional file 4: Table S4 and Additional file 5: Table S5). Among them, 5019 (87%) are protein coding genes, 676 (12%) non-coding RNAs (229 miRNA, 73 rRNA, 211 snRNA, 131 snoRNA, 32 misc_RNA), and 94 (1%) are pseudogenes and retrotransposable elements. The ~5000 loci included in CNVs contain about 25% of the estimated total number of genes of the species (Additional file 4: Table S4). This fraction is higher than what has been reported in similar papers (Hou et al., 1,263 , Bae et al., 538 ) but comparable with the results of the population-scale study in humans carried out by Mills and colleagues , who mapped genomic structural variations affecting more than 10000 genes.
We used the DAVID tool  to analyse the Gene Ontology (GO) functional categories of the protein coding genes located in CNVs (Table 2). Several GO terms were found to be significantly over-represented (p-adjusted < 0.05). The most enriched GO cellular component categories among the protein coding genes are related to ribosomal activity, with an enrichment fold larger than two (cytosolic small ribosomal subunit, 3.43; cytosolic ribosome, 3.2; small ribosomal subunit, 2.43; ribosomal subunit, 2.06). This set of genes has a limited spectrum of functions, with one-third of their GO terms being related to metabolism. This is also confirmed by a KEGG pathway enrichment analysis (Table 2). We found a significant enrichment (~2-fold) in Nitrogen metabolism, Ribosomal and Oxidative phosphorylation pathways. Interestingly, the same conclusion has been reached in a recent study of CNVs with next-generation sequencing in cattle , thus suggesting that CNVs may contribute to the genetic variance of production traits in this species.
Figure 4 shows the comparison of our data with those obtained in similar studies available in the literature [34, 35, 37, 51]. The four studies we considered used different approaches and different breeds and altogether detected 1810 CNVs from fewer than 1000 samples. Among them, the two studies based on the same genotyping array we used (BovineSNP50 v1), Bae et al. and Hou et al., detected 308 and 281 CNVs, respectively, overlapping with those described here. These correspond to 52% and 36% of the CNVs detected in our study.
The other two datasets obtained by Fadista et al.  and Liu et al.  who used a CGH array, show a more limited overlap with our dataset, namely 19% and 18%. The lower overlap in these cases is very likely due to the fact that the CGH array they used has a much higher density of probes (420 bases of average probe spacing ) compared to the BovineSNP50 beadchip (49 kb of average probe spacing). The identification with high confidence of short CNVs (< 50 kb), even the more frequent ones [35, 40], is much harder with the Illumina genotyping chip, which identifies CNVs having a distribution skewed towards large size. We also measured the percentage of overlap of the CNVs detected by us and by two other studies based on the next-generation sequencing approach [39, 40]. Even though the authors of these studies examined fewer samples (two samples in  and six in ), their more accurate methodology, at nucleotide resolution, shows a moderately higher overlap with our data (33% and 22% respectively, Additional file 1: Table S1). The only partial overlap of the CNVs we find with those detected in other studies can, in principle, be explained by the different breeds used here. Many CNVs appear to be breed specific and may contribute to breed differentiation. On the other hand several studies  suggest that the bulk of CNV variability is more individual than breed specific and therefore the larger number we find is most likely due to the fact that we tested a large number of individuals.
Bos taurus CNV features among breeds
We looked at the differences among the five Bos taurus breeds investigated: Italian Friesian (dairy), Italian Brown (dairy), Italian Simmental (dairy/beef), Piedmontese (beef), and Marchigiana (beef).
Among them, the Italian Brown shows the highest abundance of unique, single CNVs and CNVRs (Table 3, Figure 5a) (p-value < 0.0001), while Marchigiana and Italian Friesian have a higher number of single and unique CNVs than the Piedmontese and Italian Simmental (p-value < 0.001). The Italian Brown shows the highest rate of loss events (p-value < 0.0001), while the Piedmontese shows the lowest frequency of deletion events per sample (p-value < 0.01). The Italian Brown has, on average, significantly more gain events (p-value < 0.0001) than Italian Friesian and Italian Simmental, but not more than Marchigiana and Piedmontese, probably due to the wider distribution of the latter, while Italian Simmental has significantly fewer gain events than all breeds but Italian Friesian (p-value < 0.0001). When considering the average proportion of single CNVs per CNVR (CNV density) within each breed, it can be observed that the Italian Brown has a more concentrated distribution (more CNVs per CNVR), two times less sparse than the Italian Simmental, the Piedmontese and the Marchigiana (p-value < 0.006). We found no significant difference in the distributions of CNV lengths among breeds, with the sole exception of the Italian Simmental, which shows moderately lower mean and median lengths. The average number of CNVs per sample is comparable among the five breeds.
The distribution of CNVs among chromosomes (Figure 5b) is, in general, homogeneous and consistent across breeds, with the exception of two breeds showing a peak in CNV frequency on two different chromosomes (BTA5, BTA17). On BTA5 the percentage of CNVs in four breeds is only 3.4% (p-value < 1e-12), while in Marchigiana this chromosome carries 18.1% of all its observed CNVs (107/591 CNVs). The same is true for BTA17, where the Italian Simmental has 18.5% of its CNVs (107/578 CNVs), compared with 7.8% for the other breeds (p-value < 0.04). Considering all the other CNV features (length, population frequency and chromosome position), no significant difference was observed among breeds. Overall, these findings also suggest that differences between individuals seem to be much larger than differences between breeds.
Gene ontology enrichment was computed taking into account the genes involved in CNVs for each breed. Only the 17 genes of the Italian Simmental (Additional file 6: Table S6, Additional file 7: Table S7) showed functional enrichment (Table 2). In particular, we observed a significant enrichment for GO terms involving somatotropin and prolactin/lactogen/growth activity genes, caused by a single breed-specific CNV (chr23:33,906,415-36,330,036; three copies) that contains 12 loci (LOC751562-3, PRP1,3,4,6,9, CSH2, PRP-VII, PRL, HDGFL1, MIR2284C). These genes belong to the PRL family (prolactin-related proteins); they are expressed in the placenta around the first 60 days of gestation and are involved in the establishment and maintenance of pregnancy. Prolactin genes (PRL) are known to have undergone rapid evolution in the lineage leading to ruminants [51–54] and to be duplicated in all well-studied ruminant species. The evidence presented here suggests a possible role of this cluster in explaining the genetic variation of production traits.
In this investigation we find more CNVs than in previous studies [34–36, 39, 40, 51]. This is likely due to the large number of individuals analysed. There is also a (probably less relevant) difference in the analysis tools that we have used, PennCNV (as in previous studies) and QuantiSNP, known to be more efficient . Given the high number of individuals analysed we detected a number of previously unidentified rare CNVs. It has been reported that in humans, for example, the bulk of the observed copy-number variation is present at ~0.02%–1% frequency .
We cannot exclude the presence of false positives in our dataset, but the results of the qRT-PCR validation of 50 individuals for the presence of 11 CNVs (see Figure 1c, R2 = 0.92) suggest that the BF threshold used to limit the detection of false positive CNVs (BF = 15 vs the commonly used threshold of 10) was rather effective. Only the validation reported by Fadista et al. is comparably extensive (65 individuals and 6 CNVs). Furthermore, the number of CNVs per individual in our case averages 2.8, a lower value than that found in other studies (around 3.6 in Bos taurus with the same SNP chip). We are therefore confident that the rate of false positives is reasonably low and does not affect the overall picture.
Notwithstanding the high number of samples examined and CNVs identified, we have likely not yet drawn a complete picture of CNV presence in cattle, mainly because of the limitations of the genotyping array used. We are well aware that the relatively low density of the Illumina arrays compared with other methods (CGH arrays, whole-genome re-sequencing) makes the detection of short CNVs very hard, while deep-sequencing studies have documented that in Homo sapiens [18, 55], and more recently in Bos taurus [39, 40], the most populated class of CNVs is that of variants shorter than 50 kb. This limitation will only be partially overcome by using the more recent higher-density BovineHD BeadChip (777 k SNPs). This chip, with its 3430 bp average probe distance, is ~8 times less dense than the available CGH arrays and therefore would not solve the problem of incompleteness. It is unlikely that any single available technology will capture all genome structural variations, and the use of multiple experimental methods (sequence assembly comparisons, paired-end sequencing, sequencing analysis and high-resolution tiling arrays) will be needed to unravel the complexity of genome variations.
Our study presents the first population-scale description of copy number variants in Bos taurus, obtained by analysing data from more than 2500 individuals belonging to five different dairy and beef breeds and using two different bioinformatics algorithms. We found that CNVs collectively span ~20% of the genome and that a significant portion of the genome is potentially subject to variation in copy number, as observed in humans. We described here the frequencies, the patterns, and the potential impact on the gene landscape of such cattle-specific and breed-specific CNVs. Many CNVs include genes having specific biological roles, e.g. in metabolism, and are thus likely to be functional. Our population-scale analysis reveals that, because of their very low frequency, many CNVs are likely to arise independently, generating increased diversity among individuals and providing insight into the penetrant behaviour of CNVs in the population. This cattle CNV map provides information that complements SNP information and may be added to SNP-based genome-wide association and selection studies. A more comprehensive knowledge of the full landscape of bovine genetic variation permits a better understanding of ruminant biology and a further improvement of selection methods in this species.
Animal handling and DNA extraction were carried out following national guidelines and were approved by the animal ethics committee.
Systematic genome-wide CNV analysis
We studied CNVs in a sample of 2654 Italian bulls (B. taurus males used for reproductive purposes in Italian breeding). Only bulls were selected because males are usually the ones screened for genotyping and genetically evaluated to record the production traits of their offspring. The animals belong to five different breeds (891 Italian Friesian, 705 Italian Brown, 482 Italian Simmental, 369 Piedmontese, 207 Marchigiana). Genomic DNA of all samples was analysed using the BovineSNP50 v1 BeadChip (54001 probes; Illumina, San Diego, CA) according to the standard protocol. Sex chromosomes were excluded from the analysis and only autosomes were used. The QuantiSNP and PennCNV tools were used to identify copy number deletions and duplications. Both methods are based on a Hidden Markov Model for the detection of CNVs from Illumina high-density SNP genotyping data. PennCNV is the most frequently used algorithm for CNV studies of this type, partly because of the user-friendly design of the program. Its low false positive rate is another convenient aspect. QuantiSNP, in turn, outperformed six other methods in a recent evaluation study of CNV calling algorithms. We deemed the combined use of both algorithms to be a valid strategy.
Samples with LogR ratio (the normalized total intensity at each locus) higher than 0.30 were filtered out, together with individuals carrying CNVs longer than 8 Mb, as these animals are likely to be affected by diseases. For both QuantiSNP and PennCNV, a quality control step for GC content was performed to check the GC-wave factor, which was subsequently taken into account to correct the bias in the analysis. To optimally tune the parameters, such as the GC-wave factor correction, a training dataset composed of 10% of the data was used. Next, a quality filter for CNV calling based on Bayes Factor thresholds, using parameters reported previously [44–47], was applied, followed by quantitative PCR (qRT-PCR). The qRT-PCR was used to select the BF threshold with the lowest false positive rate. When both the QuantiSNP and PennCNV algorithms detected overlapping CNVs, those with the higher BF were selected. All statistical tests to estimate differences in CNV features among breeds were performed using the Wilcoxon-Mann–Whitney rank sum test as implemented in R (wilcox.test, http://www.r-project.org).
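The rule for resolving overlapping calls from the two algorithms can be sketched in Python as a greedy selection by Bayes factor. This is only an illustration under stated assumptions: each call is a dictionary with `chrom`, `start`, `end` and `bf` keys, both tools are assumed to report a comparable BF, and calls made by only one tool are assumed to be retained (the text does not specify this). The function names and toy calls are hypothetical.

```python
def overlaps(a, b):
    """True when two calls lie on the same chromosome and share at least one base."""
    return a["chrom"] == b["chrom"] and a["start"] <= b["end"] and b["start"] <= a["end"]


def select_by_bayes_factor(calls_a, calls_b):
    """Among overlapping calls, keep the one with the higher Bayes factor.

    Calls are visited from the highest to the lowest BF; a call is kept only
    if it does not overlap a call that has already been kept.
    """
    pooled = sorted(calls_a + calls_b, key=lambda c: c["bf"], reverse=True)
    kept = []
    for call in pooled:
        if not any(overlaps(call, other) for other in kept):
            kept.append(call)
    return sorted(kept, key=lambda c: (c["chrom"], c["start"]))


# Hypothetical calls for one sample from the two tools.
quantisnp_calls = [
    {"chrom": "BTA5", "start": 1_000, "end": 5_000, "bf": 32.0},
    {"chrom": "BTA6", "start": 100, "end": 900, "bf": 18.0},
]
penncnv_calls = [
    {"chrom": "BTA5", "start": 4_000, "end": 8_000, "bf": 21.0},
    {"chrom": "BTA23", "start": 10, "end": 20, "bf": 16.5},
]

consensus = select_by_bayes_factor(quantisnp_calls, penncnv_calls)
for call in consensus:
    print(call)
```

In the toy data, the two BTA5 calls overlap, so only the one with BF 32.0 is kept; the BTA6 and BTA23 calls have no competitor and pass through unchanged.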
Association between CNV, segmental duplication and gene content
The non-random association between CNVs and segmental duplications was tested by determining the direct overlap of CNV boundaries with the segmental duplication locations available from the literature. The association test was performed by comparing the data with those obtained by randomly selecting, 1000 times, a segment length from the distribution of CNV lengths and a valid chromosomal location.
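A random permutation test of this kind can be sketched in Python as follows. The sketch assumes a toy genome, draws segment lengths from the observed CNV lengths, places each segment uniformly on a chromosome chosen in proportion to its length, and uses the add-one convention for the empirical p-value. These choices, the function names and all coordinates are assumptions made for illustration, not details taken from the analysis.

```python
import random


def overlap_fraction(segments, sds_by_chrom):
    """Fraction of segments that overlap at least one segmental duplication."""
    hits = 0
    for chrom, start, end in segments:
        if any(s <= end and start <= e for s, e in sds_by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(segments)


def permutation_p_value(cnvs, sds_by_chrom, chrom_lengths, n_perm=1000, seed=0):
    """Empirical p-value for the observed CNV/SD overlap fraction."""
    rng = random.Random(seed)
    observed = overlap_fraction(cnvs, sds_by_chrom)
    lengths = [end - start for _, start, end in cnvs]
    at_least_as_extreme = 0
    for _ in range(n_perm):
        random_set = []
        for _ in cnvs:
            length = rng.choice(lengths)
            # Only chromosomes long enough to host the segment are valid.
            fitting = [c for c in chrom_lengths if chrom_lengths[c] >= length]
            weights = [chrom_lengths[c] for c in fitting]
            chrom = rng.choices(fitting, weights=weights)[0]
            start = rng.randint(0, chrom_lengths[chrom] - length)
            random_set.append((chrom, start, start + length))
        if overlap_fraction(random_set, sds_by_chrom) >= observed:
            at_least_as_extreme += 1
    # Add-one convention (an assumption): the p-value can never be exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical toy genome, segmental duplications and CNV calls.
chrom_lengths = {"chr1": 1_000_000, "chr2": 800_000}
sds_by_chrom = {"chr1": [(100_000, 110_000)], "chr2": [(400_000, 405_000)]}
cnvs = [
    ("chr1", 95_000, 115_000), ("chr1", 105_000, 125_000),
    ("chr1", 90_000, 101_000), ("chr1", 109_000, 130_000),
    ("chr2", 398_000, 420_000), ("chr2", 380_000, 401_000),
    ("chr2", 404_000, 430_000), ("chr2", 390_000, 410_000),
    ("chr1", 600_000, 620_000), ("chr2", 100_000, 125_000),
]

observed, p_value = permutation_p_value(cnvs, sds_by_chrom, chrom_lengths)
print(f"observed overlap fraction = {observed:.2f}, empirical p-value = {p_value:.4f}")
```

With 1000 permutations, the smallest p-value this convention can return is 1/1001, which is compatible with reporting a p-value below 0.001 when no random set reaches the observed overlap.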
Gene content of the cattle CNV regions was obtained via the Ensembl BioMart tool using the genome version Btau_4.0. The obtained list of protein coding genes was used to determine the GO terms and pathway enrichment using the DAVID Bioinformatics resource. The Benjamini method for multiple testing correction was used.
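For readers unfamiliar with this type of correction, the sketch below implements the Benjamini-Hochberg step-up adjustment in plain Python, assuming that this is the procedure meant by the Benjamini method. The function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

A term is then reported as over-represented when its adjusted p-value falls below the chosen cut-off, 0.05 in the enrichment analysis described here.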
To validate the discovered CNVs, TaqMan quantitative real-time PCR was performed on 50 individuals in 11 regions (Additional file 1: Table S1). Reactions were performed in triplicate in a volume of 25 μl with the Maxima Probe qPCR master mix (Fermentas) on a LightCycler® 480 System (Roche). The PCR cycling conditions were: pre-incubation for 15 min at 95°C, followed by 55 cycles of 15 s at 95°C and 30 s at 58°C. The PCR products were also sequenced to verify the correctness of the amplification region. Primer efficiency was tested for each primer pair (Additional file 1: Table S1) over five dilution points using Maxima SYBR Green qPCR master mix (Fermentas). BTF3 was used as the reference gene for all qPCR experiments, as in Bae et al. 2010. The quantification analysis was performed with the R package qpcR (http://www.dr-spiess.de/qpcR.html) using the ΔΔCt method [21, 62]. The regression analyses were calculated with the linear model fit function (lm) implemented in R (http://www.r-project.org).
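The ΔΔCt calculation can be illustrated with a short Python sketch. It assumes 100% amplification efficiency (a doubling per cycle) and a calibrator sample carrying two copies of the target; the function name and all Ct values are hypothetical, and the sketch does not reproduce the qpcR implementation.

```python
def copy_number_from_ddct(ct_target_sample, ct_ref_sample,
                          ct_target_calibrator, ct_ref_calibrator,
                          calibrator_copies=2):
    """Estimate copy number with the delta-delta-Ct method.

    Assumes perfect doubling per cycle and a calibrator with a known copy
    number (two copies by default).
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    ddct = delta_ct_sample - delta_ct_calibrator
    return calibrator_copies * 2 ** (-ddct)


# Hypothetical Ct values; the reference gene Ct is 24.0 in every sample.
normal = copy_number_from_ddct(25.0, 24.0, 25.0, 24.0)        # same as calibrator
deletion = copy_number_from_ddct(26.0, 24.0, 25.0, 24.0)      # target one cycle later
duplication = copy_number_from_ddct(24.415, 24.0, 25.0, 24.0)  # target ~0.585 cycles earlier

print(normal, deletion, round(duplication, 2))
```

A target that appears one cycle later than in the calibrator corresponds to half the template, that is a heterozygous deletion, while a shift of about 0.585 cycles earlier corresponds to three copies.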
- BTA: Bos taurus autosome
- CGH: Comparative genomic hybridization
- CNV: Copy number variation
- CNVR: Copy number variation region
- GO term: Gene Ontology term
- LRR: Log R ratio
- qRT-PCR: Quantitative real-time polymerase chain reaction
- snoRNA: Small nucleolar RNA
- SNP: Single nucleotide polymorphism
- snRNA: Small nuclear RNA
Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, González JR, Gratacòs M, Huang J, Kalaitzopoulos D, Komura D, MacDonald JR, Marshall CR, Mei R, Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J, Valsesia A, Woodwark C, Yang F, Zhang J, Zerjal T, Zhang J, Armengol L, Conrad DF, Estivill X, Tyler-Smith C, Carter NP, Aburatani H, Lee C, Jones KW, Scherer SW, Hurles ME: Global variation in copy number in the human genome. Nature. 2006, 444: 444-454. 10.1038/nature05329.
Sharp AJ, Locke DP, McGrath SD, Cheng Z, Bailey JA, Vallente RU, Pertz LM, Clark RA, Schwartz S, Segraves R, Oseroff VV, Albertson DG, Pinkel D, Eichler EE: Segmental duplications and copy-number variation in the human genome. Am J Hum Genet. 2005, 77: 78-88. 10.1086/431652.
Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, Haugen E, Hayden H, Albertson D, Pinkel D, Olson MV, Eichler EE: Fine-scale structural variation of the human genome. Science. 2004, 304: 581-584. 10.1126/science.1092500.
Feuk L, Carson AR, Scherer SW: Structural variation in the human genome. Nat Rev Genet. 2006, 7: 85-97.
Zhang F, Gu W, Hurles ME, Lupski JR: Copy number variation in human health, disease, and evolution. Annu Rev Genomics Hum Genet. 2009, 10: 451-481. 10.1146/annurev.genom.9.081307.164217.
Goidts V, Cooper DN, Armengol L, Schempp W, Conroy J, Estivill X, Nowak N, Hameister H, Kehrer-Sawatzki H: Complex patterns of copy number variation at sites of segmental duplications: an important category of structural variation in the human genome. Hum Genet. 2006, 120: 270-284. 10.1007/s00439-006-0217-y.
Khaja R, Zhang J, MacDonald JR, He Y, Joseph-George AM, Wei J, Rafiq MA, Qian C, Shago M, Pantano L, Aburatani H, Jones K, Redon R, Hurles M, Armengol L, Estivill X, Mural RJ, Lee C, Scherer SW, Feuk L: Genome assembly comparison identifies structural variants in the human genome. Nat Genet. 2006, 38: 1413-1418. 10.1038/ng1921.
Komura D, Shen F, Ishikawa S, Fitch KR, Chen W, Zhang J, Liu G, Ihara S, Nakamura H, Hurles ME, Lee C, Scherer SW, Jones KW, Shapero MH, Huang J, Aburatani H: Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays. Genome Research. 2006, 16: 1575-1584. 10.1101/gr.5629106.
McCarroll SA, Hadnott TN, Perry GH, Sabeti PC, Zody MC, Barrett JC, Dallaire S, Gabriel SB, Lee C, Daly MJ, Altshuler DM: Common deletion polymorphisms in the human genome. Nat Genet. 2005, 38: 86-92.
Newman TL, Rieder MJ, Morrison VA, Sharp AJ, Smith JD, Sprague LJ, Kaul R, Carlson CS, Olson MV, Nickerson DA, Eichler EE: High-throughput genotyping of intermediate-size structural variation. Hum Mol Genet. 2006, 15: 1159-1167. 10.1093/hmg/ddl031.
Wong KK, deLeeuw RJ, Dosanjh NS, Kimm LR, Cheng Z, Horsman DE, MacAulay C, Ng RT, Brown CJ, Eichler EE, Lam WL: A Comprehensive Analysis of Common Copy-Number Variations in the Human Genome. Am J Hum Genet. 2007, 80: 91-104. 10.1086/510560.
Feuk L, MacDonald JR, Tang T, Carson AR, Li M, Rao G, Khaja R, Scherer SW: Discovery of Human Inversion Polymorphisms by Comparative Analysis of Human and Chimpanzee DNA Sequence Assemblies. PLoS Genet. 2005, 1: 10-10.1371/journal.pgen.0010010.
Conrad DF, Andrews TD, Carter NP, Hurles ME, Pritchard JK: A high-resolution survey of deletion polymorphism in the human genome. Nat Genet. 2006, 38: 75-81. 10.1038/ng1697.
Hinds DA, Kloek AP, Jen M, Chen X, Frazer KA: Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat Genet. 2006, 38: 82-85. 10.1038/ng1695.
Lupski JR: Genomic rearrangements and sporadic disease. Nat Genet. 2007, 39: S43-S47. 10.1038/ng2084.
Pinto D, Marshall C, Feuk L, Scherer SW: Copy-number variation in control population cohorts. Hum Mol Genet. 2007, 16 Spec No: R168-R173.
Locke DP, Sharp AJ, McCarroll SA, McGrath SD, Newman TL, Cheng Z, Schwartz S, Albertson DG, Pinkel D, Altshuler DM, Eichler EE: Linkage disequilibrium and heritability of copy-number polymorphisms within duplicated regions of the human genome. Am J Hum Genet. 2006, 79: 275-290. 10.1086/505653.
Mills RE, Walter K, Stewart C, Handsaker RE, Chen K, Alkan C, Abyzov A, Yoon SC, Ye K, Cheetham RK, Chinwalla A, Conrad DF, Fu Y, Grubert F, Hajirasouliha I, Hormozdiari F, Iakoucheva LM, Iqbal Z, Kang S, Kidd JM, Konkel MK, Korn J, Khurana E, Kural D, Lam HYK, Leng J, Li R, Li Y, Lin C-Y, Luo R, Mu XJ, Nemesh J, Peckham HE, Rausch T, Scally A, Shi X, Stromberg MP, Stütz AM, Urban AE, Walker JA, Wu J, Zhang Y, Zhang ZD, Batzer MA, Ding L, Marth GT, McVean G, Sebat J, Snyder M, Wang J, Ye K, Eichler EE, Gerstein MB, Hurles ME, Lee C, McCarroll SA, Korbel JO: Mapping copy number variation by population-scale genome sequencing. Nature. 2011, 470: 59-65. 10.1038/nature09708.
Kehrer-Sawatzki H, Cooper DN: Structural divergence between the human and chimpanzee genomes. Hum Genet. 2007, 120: 759-778. 10.1007/s00439-006-0270-6.
Lee AS, Gutiérrez-Arcelus M, Perry GH, Vallender EJ, Johnson WE, Miller GM, Korbel JO, Lee C: Analysis of copy number variation in the rhesus macaque genome identifies candidate loci for evolutionary and human disease studies. Hum Mol Genet. 2008, 17: 1127-1136. 10.1093/hmg/ddn002.
Graubert TA, Cahan P, Edwin D, Selzer RR, Richmond TA, Eis PS, Shannon WD, Li X, McLeod HL, Cheverud JM, Ley TJ: A High-Resolution Map of Segmental DNA Copy Number Variation in the Mouse Genome. PLoS Genet. 2007, 3: 9. 10.1371/journal.pgen.0030009.
Egan CM, Sridhar S, Wigler M, Hall IM: Recurrent DNA copy number variation in the laboratory mouse. Nat Genet. 2007, 39: 1384-1389. 10.1038/ng.2007.19.
Snijders AM, Nowak NJ, Huey B, Fridlyand J, Law S, Conroy J, Tokuyasu T, Demir K, Chiu R, Mao J-H, Jain AN, Jones SJM, Balmain A, Pinkel D, Albertson DG: Mapping segmental and sequence variations among laboratory mice using BAC array CGH. Genome Res. 2005, 15: 302-311. 10.1101/gr.2902505.
Guryev V, Saar K, Adamovic T, Verheul M, Van Heesch SAAC, Cook S, Pravenec M, Aitman T, Jacob H, Shull JD, Hubner N, Cuppen E: Distribution and functional impact of DNA copy number variation in the rat. Nat Genet. 2008, 40: 538-545. 10.1038/ng.141.
Chen W-K, Swartz JD, Rush LJ, Alvarez CE: Mapping DNA structural variation in dogs. Genome Res. 2009, 19: 500-509.
Ramayo-Caldas Y, Castelló A, Pena RN, Alves E, Mercadé A, Souza CA, Fernández AI, Perez-Enciso M, Folch JM: Copy number variation in the porcine genome inferred from a 60 k SNP BeadChip. BMC Genomics. 2010, 11: 593. 10.1186/1471-2164-11-593.
Wang X, Nahashon S, Feaster TK, Bohannon-Stewart A, Adefope N: An initial map of chromosomal segmental copy number variations in the chicken. BMC Genomics. 2010, 11: 351. 10.1186/1471-2164-11-351.
Emerson JJ, Cardoso-Moreira M, Borevitz JO, Long M: Natural selection shapes genome-wide patterns of copy-number polymorphism in Drosophila melanogaster. Science. 2008, 320: 1629-1631. 10.1126/science.1158078.
Maydan JS, Lorch A, Edgley ML, Flibotte S, Moerman DG: Copy number variation in the genomes of twelve natural isolates of Caenorhabditis elegans. BMC Genomics. 2010, 11: 62. 10.1186/1471-2164-11-62.
Itsara A, Cooper GM, Baker C, Girirajan S, Li J, Absher D, Krauss RM, Myers RM, Ridker PM, Chasman DI, Mefford H, Ying P, Nickerson DA, Eichler EE: Population Analysis of Large Copy Number Variants and Hotspots of Human Genetic Disease. Am J Hum Genet. 2009, 84: 148-161. 10.1016/j.ajhg.2008.12.014.
Elsik CG, Tellam RL, Worley KC: The genome sequence of taurine cattle: a window to ruminant biology and evolution. Science. 2009, 324: 522-528.
Ibeagha-Awemu EM, Kgwatalala P, Ibeagha AE, Zhao X: A critical analysis of disease-associated DNA polymorphisms in the genes of cattle, goat, sheep, and pig. Mamm Genome. 2008, 19: 226-245. 10.1007/s00335-008-9101-5.
Liu GE, Van Tassell CP, Sonstegard TS, Li RW, Alexander LJ, Keele JW, Matukumalli LK, Smith TP, Gasbarre LC: Detection of germline and somatic copy number variations in cattle. Dev Biol. 2008, 132: 231-237.
Bae JS, Cheong HS, Kim LH, NamGung S, Park TJ, Chun J-Y, Kim JY, Pasaje CFA, Lee JS, Shin HD: Identification of copy number variations and common deletion polymorphisms in cattle. BMC Genomics. 2010, 11: 232. 10.1186/1471-2164-11-232.
Fadista J, Thomsen B, Holm L-E, Bendixen C: Copy number variation in the bovine genome. BMC Genomics. 2010, 11: 284. 10.1186/1471-2164-11-284.
Seroussi E, Glick G, Shirak A, Yakobson E, Weller JI, Ezra E, Zeron Y: Analysis of copy loss and gain variations in Holstein cattle autosomes using BeadChip SNPs. BMC Genomics. 2010, 11: 673. 10.1186/1471-2164-11-673.
Hou Y, Liu GE, Bickhart DM, Cardone MF, Wang K, Kim E, Matukumalli LK, Ventura M, Song J, VanRaden PM, Sonstegard TS, Van Tassell CP: Genomic characteristics of cattle copy number variations. BMC Genomics. 2011, 12: 127. 10.1186/1471-2164-12-127.
Kijas JW, Barendse W, Barris W, Harrison B, McCulloch R, McWilliam S, Whan V: Analysis of copy number variants in the cattle genome. Gene. 2011, 482: 73-77. 10.1016/j.gene.2011.04.011.
Stothard P, Choi J-W, Basu U, Sumner-Thomson JM, Meng Y, Liao X, Moore SS: Whole genome resequencing of Black Angus and Holstein cattle for SNP and CNV discovery. BMC Genomics. 2011, 12: 559. 10.1186/1471-2164-12-559.
Bickhart DM, Hou Y, Schroeder SG, Alkan C, Cardone MF, Matukumalli LK, Song J, Schnabel RD, Ventura M, Taylor JF, Garcia JF, Van Tassell CP, Sonstegard TS, Eichler EE, Liu GE: Copy number variation of individual cattle genomes using next-generation sequencing. Genome Res. 2012, 22: 778-790. 10.1101/gr.133967.111.
Dellinger AE, Saw S-M, Goh LK, Seielstad M, Young TL, Li Y-J: Comparative analyses of seven algorithms for copy number variant identification from single nucleotide polymorphism arrays. Nucleic Acids Res. 2010, 38: e105. 10.1093/nar/gkq040.
Colella S, Yau C, Taylor JM, Mirza G, Butler H, Clouston P, Bassett AS, Seller A, Holmes CC, Ragoussis J: QuantiSNP: an Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res. 2007, 35: 2013-2025. 10.1093/nar/gkm076.
Wang K, Li M, Hadley D, Liu R, Glessner J, Grant SFA, Hakonarson H, Bucan M: PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 2007, 17: 1665-1674. 10.1101/gr.6861907.
Pagnamenta AT, Bacchelli E, De Jonge MV, Mirza G, Scerri TS, Minopoli F, Chiocchetti A, Ludwig KU, Hoffmann P, Paracchini S, Lowy E, Harold DH, Chapman JA, Klauck SM, Poustka F, Houben RH, Staal WG, Ophoff RA, O’Donovan MC, Williams J, Nöthen MM, Schulte-Körne G, Deloukas P, Ragoussis J, Bailey AJ, Maestrini E, Monaco AP: Characterization of a Family with Rare Deletions in CNTNAP5 and DOCK4 Suggests Novel Risk Loci for Autism and Dyslexia. Biol Psychiatry. 2010, 68: 320-328. 10.1016/j.biopsych.2010.02.002.
Liu GE, Ventura M, Cellamare A, Chen L, Cheng Z, Zhu B, Li C, Song J, Eichler EE: Analysis of recent segmental duplications in the bovine genome. BMC Genomics. 2009, 10: 571. 10.1186/1471-2164-10-571.
Pagnamenta AT, Wing K, Sadighi Akha E, Knight SJL, Bölte S, Schmötzer G, Duketis E, Poustka F, Klauck SM, Poustka A, Ragoussis J, Bailey AJ, Monaco AP: A 15q13.3 microdeletion segregating with autism. Eur J Hum Genet. 2009, 17: 687-692. 10.1038/ejhg.2008.228.
Cronin S, Blauw HM, Veldink JH, Van Es MA, Ophoff RA, Bradley DG, Van Den Berg LH, Hardiman O: Analysis of genome-wide copy number variation in Irish and Dutch ALS populations. Hum Mol Genet. 2008, 17: 3392-3398. 10.1093/hmg/ddn233.
Griswold AJ, Ma D, Cukier HN, Nations LD, Schmidt MA, Chung R-H, Jaworski JM, Salyakina D, Konidari I, Whitehead PL, Wright HH, Abramson RK, Williams SM, Menon R, Martin ER, Haines JL, Gilbert JR, Cuccaro ML, Pericak-Vance MA: Evaluation of copy number variations reveals novel candidate genes in autism spectrum disorder-associated pathways. Hum Mol Genet. 2012, 21: 3513-3523. 10.1093/hmg/dds164.
Leblond CS, Heinrich J, Delorme R, Proepper C, Betancur C, Huguet G, Konyukh M, Chaste P, Ey E, Rastam M, Anckarsäter H, Nygren G, Gillberg IC, Melke J, Toro R, Regnault B, Fauchereau F, Mercati O, Lemière N, Skuse D, Poot M, Holt R, Monaco AP, Järvelä I, Kantojärvi K, Vanhala R, Curran S, Collier DA, Bolton P, Chiocchetti A, Klauck SM, Poustka F, Freitag CM, Waltes R, Kopp M, Duketis E, Bacchelli E, Minopoli F, Ruta L, Battaglia A, Mazzone L, Maestrini E, Sequeira AF, Oliveira B, Vicente A, Oliveira G, Pinto D, Scherer SW, Zelenika D, Delepine M, Lathrop M, Bonneau D, Guinchat V, Devillard F, Assouline B, Mouren M-C, Leboyer M, Gillberg C, Boeckers TM, Bourgeron T: Genetic and Functional Analyses of SHANK2 Mutations Suggest a Multiple Hit Model of Autism Spectrum Disorders. PLoS Genet. 2012, 8: e1002521. 10.1371/journal.pgen.1002521.
Huang DW, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009, 4: 44-57.
Liu GE, Hou Y, Zhu B, Cardone MF, Jiang L, Cellamare A, Mitra A, Alexander LJ, Coutinho LL, Dell’Aquila ME, Gasbarre LC, Lacalandra G, Li RW, Matukumalli LK, Nonneman D, De A, Regitano LC, Smith TPL, Song J, Sonstegard TS, Van Tassell CP, Ventura M, Eichler EE, McDaneld TG, Keele JW: Analysis of copy number variations among diverse cattle breeds. Genome Res. 2010, 20: 693-703. 10.1101/gr.105403.110.
Takahashi T, Yamada O, Soares MJ, Hashizume K: Bovine prolactin-related protein-I is anchored to the extracellular matrix through interactions with type IV collagen. J Endocrinol. 2008, 196: 225-234. 10.1677/JOE-07-0069.
Wallis M: The molecular evolution of vertebrate growth hormones: a pattern of near-stasis interrupted by sustained bursts of rapid change. J Mol Evol. 1996, 43: 93-100. 10.1007/BF02337353.
Wallis M: Episodic evolution of protein hormones: molecular evolution of pituitary prolactin. J Mol Evol. 2000, 50: 465-473.
Durbin RM, Altshuler DL, Abecasis GR: A map of human genome variation from population-scale sequencing. Nature. 2010, 467: 1061-1073. 10.1038/nature09534.
Jakobsson M, Scholz S, Scheet P, Gibbs JR, VanLiere J, Fung H, Szpiech Z, Degnan J, Wang K, Guerreiro R, Bras J, Schymick J, Hernandez D, Traynor B, Simon-Sanchez J, Matarin M, Britton A, Van De Leemput J, Rafferty I, Bucan M, Cann H, Hardy J, Rosenberg N, Singleton A: Genotype, haplotype and copy-number variation in worldwide human populations. Nature. 2008, 451: 998-1003. 10.1038/nature06742.
Steemers FJ, Chang W, Lee G, Barker DL, Shen R, Gunderson KL: Whole-genome genotyping with the single-base extension assay. Nat Methods. 2006, 3: 31-33. 10.1038/nmeth842.
Ballif BC, Hornor SA, Jenkins E, Madan-Khetarpal S, Surti U, Jackson KE, Asamoah A, Brock PL, Gowans GC, Conway RL, Graham JM, Medne L, Zackai EH, Shaikh TH, Geoghegan J, Selzer RR, Eis PS, Bejjani BA, Shaffer LG: Discovery of a previously unrecognized microdeletion syndrome of 16p11.2–p12.2. Nat Genet. 2007, 39: 1071-1073. 10.1038/ng2107.
Diskin SJ, Li M, Hou C, Yang S, Glessner J, Hakonarson H, Bucan M, Maris JM, Wang K: Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms. Nucleic Acids Res. 2008, 36: e126. 10.1093/nar/gkn556.
Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A, Huber W: BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics. 2005, 21: 3439-3440. 10.1093/bioinformatics/bti525.
Benjamini Y, Hochberg Y: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J Roy Stat Soc B. 1995, 57: 289-300.
Livak KJ, Schmittgen TD: Analysis of Relative Gene Expression Data Using Real-Time Quantitative PCR and the 2^(−ΔΔCT) Method. Methods. 2001, 25: 402-408. 10.1006/meth.2001.1262.
This research was funded by the Italian Ministry of Agriculture, grants SelMol and Innovagen. The authors wish to thank ANABIC, ANABORAPI, ANAFI, ANAPRI, ANARB, the Regione Lazio and EPIGEN.
The authors declare that they have no competing interests.
FC, CM and AN conceived and designed the project. FC and GC carried out all the bioinformatics analyses under the supervision of AT and AV. FC and CM carried out the qRT-PCR experiments. FC, GC, AT and PAM wrote the manuscript. All authors have read and approved the manuscript for publication.
Electronic supplementary material
Additional file 2: CNVRs dataset. Complete list of the CNVRs found in this study, including the types of CNV in each region. (TSV 16 KB)
Additional file 3: CNV distribution. List of CNVs and their copy number, length, frequency in the population and number of genes included. (TSV 165 KB)
Additional file 6: Breed specific genes. Complete list of genes specific for each of the five studied Bos taurus breeds. (TSV 21 KB)
Cicconardi, F., Chillemi, G., Tramontano, A. et al. Massive screening of copy number population-scale variation in Bos taurus genome. BMC Genomics 14, 124 (2013). https://doi.org/10.1186/1471-2164-14-124