- Research article
- Open Access
Analysis of copy number variants by three detection algorithms and their association with body size in horses
BMC Genomics volume 14, Article number: 487 (2013)
Copy number variants (CNVs) have been shown to play an important role in genetic diversity of mammals and in the development of many complex phenotypic traits. The aim of this study was to perform a standard comparative evaluation of CNVs in horses using three different CNV detection programs and to identify genomic regions associated with body size in horses.
Analysis was performed using the Illumina Equine SNP50 genotyping beadchip for 854 horses. CNVs were detected by three different algorithms, CNVPartition, PennCNV and QuantiSNP. Comparative analysis revealed 50 CNVs that affected 153 different genes mainly involved in sensory perception, signal transduction and cellular components. Genome-wide association analysis for body size showed highly significant deleted regions on ECA1, ECA8 and ECA9. Homologous regions to the detected CNVs on ECA1 and ECA9 have also been shown to be correlated with human height.
Comparative analysis of CNV detection algorithms was useful to increase the specificity of CNV detection but had certain limitations dependent on the detection tool. GWAS revealed genome-wide associated CNVs for body size in horses.
The determination of copy number variants (CNVs) has become increasingly important for the evaluation of genomic traits in domestic animals . CNVs have been shown to be a major source for genetic variation especially in complex traits influencing gene expression, phenotypic variation, adaption and the development of diseases [2, 3]. Analyses using different detection methods have been performed in diverse species like cows, pigs and horses [4–7]. Recent whole-genome sequencing revealed 282 CNVs in a quarter horse mare and suggested CNVs to be an important resource for future studies of complex diseases and traits in horses . CNV studies for horses were carried out by the array comparative genomic hybridization (CGH) methodology and the CNV detection algorithm PennCNV using whole genome SNP genotyping chips [4, 5]. Both techniques have been shown to be a valid method to detect CNVs [3, 6, 9]. It was proposed that a comparison between these methods might be an important step for understanding the advantages and disadvantages of these platforms . SNP arrays were proposed to be advantageous due to lower prices, lower signal-to-noise ratios and the use of the parameter B-allele frequency which facilitates the interpretation of results [1, 6, 10]. Furthermore, less samples per experiment were assumed to be required for SNP genotyping compared to CGH analyses . However, the main bias for SNP arrays was considered to be the low SNP coverage of the genome in regions of CNVs due to difficulties of assay development and implementation . In CGH arrays the genome coverage was shown to be highly dependent of the study design. In pig breeds CGH analyses have been proposed to be advantageous due to an enhanced marker density and a uniform distribution of probes . CNV analyses in horses were performed on a CGH array which only targeted exons and therefore did not allow an identification of CNVs in intergenic regions . This was taken into account for the benefit of detecting smaller CNVs in coding exons of annotated genes .
CNV calls in CGH as well as SNP arrays have been considered to be highly dependent on the algorithms used for the identification of CNVs [11, 13, 14]. For SNP arrays, various CNV detection programs have been available. Comparative analyses of CNV detection algorithms suggested that multiple predictions from different detection programs increased the confidence in the data and helped to eliminate false positive results . Nevertheless the accuracy of CNV detection by different algorithms has also been shown to be limited due to false negative results [15, 16]. SNP array analyses in horses have not been performed in more than one CNV detection algorithm yet . On the other hand, the analysed horse populations varied from a low number of horses (16) of diverse breeds used in the CGH study, to a higher number of horses (520) of a few breeds (4) in SNP array based PennCNV detections [4, 5, 17]. Due to the development of the horse population into a highly variable group, evaluations of different breeds have been shown to be important, especially for complex variations which are supposed to be challenging due to genetic heterogeneity and variations in the phenotypic expression [18, 19].
Genome-wide association studies (GWAS) for height have been performed for CNVs in human and revealed copy-number variants that were proposed to play a role in the development of short stature . Recent studies in horses identified CNVs in the region of genes mainly involved in sensory perception, signal transduction and metabolism but also in candidates for neuronal homeostasis, coat colour, blood group antigens, keratin formation and height . A quantitative trait locus (QTL) for stature was found on equine chromosome (ECA) 16 at 75 Mb. Size variation in horses has been investigated in various GWAS for single nucleotide polymorphisms (SNPs) using BeadChip data [19, 21–23]. On the whole, five loci on ECA3, ECA6, ECA9, ECA11 and ECA28 have been discussed to be probably involved in the determination of withers height. A functional polymorphism on ECA3 was shown to be highly associated with body size and with the relative expression levels of the adjacent gene ligand dependent nuclear receptor corepressor-like (LCORL). It was assumed that LCORL might be a main regulator for the determination of body size in horses.
The aim of this study was to perform CNV detection analyses in accordance with current standards using three CNV detection algorithms in a large number of horses of various breeds and to compare these results with current microarray studies. The CNVs detected were further analysed for their association with body size as a model for complex traits.
Results and discussion
The detection of CNVs was performed on the data of the Illumina Equine SNP50 beadchip using the algorithms CNVPartition (Illumina), PennCNV  and QuantiSNP . Analysis revealed 166, 860 and 1090 CNVs using these programs for the detection (Additional file 1, Additional file 2 and Additional file 3). The mean size for all detected CNVs was 487,562 bp and the median 169,367 bp. Considering the distribution of CNVs over the chromosomes, the detection results of PennCNV revealed the largest number of CNVs on ECA1, ECA12 and ECA13 while the results of QuantiSNP showed an enrichment of CNVs on ECA1, ECA3 and ECA12 (Figure 1). Detection analysis by CNVPartition revealed a high number of detected CNVs on ECA1, ECA12 and ECA23. However, the chromosomes with the highest numbers of CNVs did not necessarily show the highest coverage with CNV regions. We found strong CNV coverage enrichment on ECA23 (CNVPartition), ECA13 (PennCNV), ECA27 and ECA28 (PennCNV and QuantiSNP) and ECA12 (all three programs, Table 1). Across all three detection algorithms, ECA12 was not only significantly enriched by CNVs but also showed the largest number of detected CNVs. An accumulation of CNVs has also been reported in CGH analyses in horses for ECA12, ECA17 and ECA23 and was shown in Illumina Equine SNP50 beadchip based PennCNV analyses for ECA1, ECA2 and ECA17 [4, 5]. We assume that the amount of detected CNVs on specific chromosomes is dependent on the detection method and can vary among different populations. Nevertheless, there is much evidence to presume that ECA12, ECA27 and ECA28 are considerably enriched for CNVs.
Comparison between three detection programs
Comparative analysis between the algorithms showed that PennCNV and QuantiSNP detected similar numbers of CNVs identified on each autosome. They displayed a CNV detection overlap of 28.4% and 22.8% (Figure 2). The percentage of CNVs overlapping with CNVPartition, the algorithm with the lowest number of detected CNVs, was 32.5% (PennCNV) and 37.4% (QuantiSNP). In total, 50 CNVs could be detected by all three programs (Additional file 4). The average size of these 50 CNVs detected by all three programs was 388,892 bp and ranged from 516 to 978,353 bp (median size 293,244 bp). The number of losses and gains was computed among those breeds that revealed a CNV in all three algorithms (see Additional file 4). On the whole, five CNVs showed higher copy numbers in some and lower copy numbers in other horses while further 28 CNVs only displayed losses and 17 CNVs only gains in these horses.
Comparisons between our detection results of the 50 CNVs detected in all three programs with recent CNV studies [4, 5] showed that these 50 CNV regions were highly similar to those CNV regions in previous studies especially for SNP array analyses (Table 2). Nevertheless, the study by Dupuis et al. revealed a considerably higher number of detected CNVs . We propose that this effect was a result of more stringent quality criteria and the combination of three detection algorithms used in our study.
CGH analyses showed less concordance with our detection events. It was proposed that the results of CGH and SNP arrays were generally not easily comparable . In particular, the CNV-study designs in horses were different, as CGH analyses only targeted exons, while the SNP arrays covered the whole genome. Furthermore, our analysis was performed on a considerably higher number of samples that might allow a more general view on the distribution of CNVs in the horse population. Comparisons between the CNV regions detected by CGH analyses and those identified by the three SNP array detection algorithms individually revealed a relatively high overlap with the detection results of QuantiSNP (25%), less consistency with the PennCNV results (7%) and an extremely low overlap with CNVPartition (3%, Table 3). Despite the relatively low overlap in total, PennCNV showed concordance up to 100% in some chromosomal regions. These comparisons confirm the assumption that the prediction accuracy of CNVPartition seems to be relatively poor due to the high rates of missed events . We suppose that this was the reason for the considerably high number of detected CNVs by individual programs in comparison with the number of overlapping detection events. A closer look at each algorithm showed that PennCNV and QuantiSNP provided a similar distribution of CNVs over the chromosomes and had a considerably higher number of detection events in comparison with CNVPartition. PennCNV and QuantiSNP are both hidden Markov model (HMM) based algorithms that use the log R ratio and B allele frequency independently (QuantiSNP) or in combination (PennCNV) . Comparative analyses of CNV detection methods for SNP arrays confirmed that PennCNV and QuantiSNP had a large overlap of detection events . A study of bladder cancer in human which used a study design similar to ours suggested PennCNV and QuantiSNP to be a more reliable method for the detection of CNVs than CNVPartition . QuantiSNP was even assumed to be the best of these three methods and also outperformed the other algorithms .
CNV sharing among breeds
After filtering out samples with low call rate and quality features we were able to detect CNVs in 717 horses. The total number of detected (identical and not identical) CNVs was 4013 for these horses of different breeds. Due to the choice of unrelated horses, we supposed this number of CNVs gives a good view on the distribution of CNVs in the horse population with the restriction that different numbers of samples were available per breed. Comparative analysis of all three algorithms revealed 536 CNVs that could be detected in all programs for the same breed. These CNVs were found in Arabian, Hanoverian, Holsteiner, Lusitano, Maremanno, Oldenburg, Thoroughbred, Westphalian (Table 4). With regard to the individuals we could confirm 21 CNVs that were detected in the same horse using all algorithms. Theses CNVs ranged in size from 516 to 862,853 bp and showed an average size of 368,720 bp. Further 29 CNVs showed an overlap among different breeds (Table 5).
Comparisons between the 50 CNVs derived from all three detection algorithms showed that the largest number of CNVs occurred in the Hanoverians, supposably due to the largest number of samples used in this study. On the whole 18 Hanoverian specific CNVs could be detected. Two of these showed gains and losses in different horses (Additional file 4). In general, only one CNV derived from the comparative analysis of three detection algorithms showed losses and gains among different breeds. This CNV was located on ECA1 at 155.49-155.55 Mb and showed losses in Hanoverian and Lusitano and gains in Holsteiner and Thoroughbred horses.
Furthermore a relatively high number of CNVs shared with the Hanoverian could also be found in Oldenburg, Westphalian and Holsteiner, presumably due to the close relationship among these breeds. A CNV overlap between Arabian, Lusitano and Maremanno horses could be explained by the strong influence of the Arabian and Lusitano bloodlines on the Maremanno breed. In addition, we investigated the distribution of CNVs in two Przewalski horses and detected on the whole 10 CNVs which could not be identified by all three detection algorithms but showed an overlap with different breeds (Additional file 5). We assume that these CNVs might be conserved for a long time and passed on during the domestication of the horse.
Evaluations of the individual horses showed that most horses shared their CNVs with at least another animal according to previous observations . Comparing the average size of CNVs detected in more than one animal (397,708 bp) with the size of CNVs not shared by a second horse (250,773 bp) confirmed the suggestion that CNV sharing among horses is correlated with CNV length as larger CNVs are more likely to be shared .
Analysis of genes within CNV regions
Analysis of genomic regions of 50 CNVs, derived from comparative analysis of CNV detection algorithms, revealed 153 different genes within 45 CNVs. In five CNV regions we were not able to find any functional gene. The major category of genes consisted of olfactory receptor (OR) genes (66.7%), a group of genes which is known to be significantly enriched in CNV regions in human, cattle, pig and rat [6, 7, 27, 28]. It was supposed that the overrepresentation of OR genes is not a result of positive selection but of the frequent appearance of these genes in segmentally duplicated regions . These regions are known to be more susceptible to CNVs while genomic regions with dosage-sensitive genes usually show a low number of CNVs [3, 27].
Functional analysis was performed using human orthologs of the horse genes due to the poor annotation of the horse genome. The results of the Database for Annotation, Visualization and Integrated Discovery (DAVID) 6.7 showed an enrichment of 131 genes involved in sensory perception, signal transduction and cellular components (Additional file 6). The highest P-values could be observed for the classifications olfaction (P = 4.20x10-142), olfactory receptor (P = 2.40x10-134) and sensory perception of smell (P = 6.90x10-132). Further analyses with the PANTHER classification system revealed statistically over- and underrepresented biological processes and confirmed the overrepresentation of genes involved in sensory perception (P = 4.53x10-01) and signal transduction as well as response to stimuli (Table 6). Processes involved in cell cycle, transcription and metabolism were underrepresented.
Analysis of the molecular functions demonstrated significantly overrepresented processes concerning receptor activities and particularly underrepresented catalytic and hydrolase activities (Additional file 7). The PANTHER protein class analysis showed a comparatively reduced number of nucleic acid binding, transferase and hydrolase proteins. Our results suggest that CNV regions are not randomly distributed among the genes of the horse genome, but they harbour certain genes overrepresented in those regions while other genes involved in specific biological processes are underrepresented. Similar distributions were observed in cattle and in CGH array analyses of 16 horses, showing sensory perception and signal transduction to be the primary processes affected by CNVs [5, 7].
Furthermore, an enrichment of genes involved in metabolic processes was also observed in CGH analyses of horses . Our study revealed an impoverishment of CNVs in regions of genes responsible for a metabolic process. This term is defined by PANTHER as chemical reactions and pathways by which living organisms transform chemical substances. A closer look at our analysis of metabolic pathways shows that primary metabolic processes, protein metabolic processes, nucleobase, nucleoside, nucleotide and nucleic acid metabolic processes were underrepresented while the cyclic nucleotide metabolic process was overrepresented. We assume that metabolic process is a very general term and has to be interpreted in context of more specific ontology terms for metabolic processes. Furthermore, we suppose that the high number of horses used in our study in comparison with previous CGH analysis might give a more general view on the distribution of enriched and impoverished genes in CNV regions. Similar to our results, human CNV studies revealed a significant underrepresentation of genes involved in nucleic acid metabolism in deletion regions . Furthermore, enriched gene categories concerning sensory perceptions of smell, chemical stimuli, smell and taste as well as neurophysiological processes, brain development, immune responses and external biotic stimuli could be shown in human [3, 30, 31]. It was assumed that the underrepresentation of genes occurs due to the strong selective pressure for genes involved in important processes like transcriptional regulation and development .
Association analysis with body size
GWAS for body size was performed for each algorithm on basis of its CNV detection results and for the intersection of all three detection programs. A highly genome-wide significant peak on ECA1 could be shown for the data generated by PennCNV (P = 0.006) and QuantiSNP (P = 0.010). Analysis of the intersection of 50 CNVs revealed a P-value at the threshold of significance (P = 0.057) presumably influenced by CNVPartition which did not reveal any significant association. The associated CNVs were deletions in the region of 156 Mb (Table 7). They were located in the area of the candidate genes olfactory receptor 4, subfamily K5 (OR4K5), subfamily K2 (OR4K2), subfamily N2 (OR4N2) and subfamily M1 (OR4M1). GWAS for copy number variation in human revealed the syntenic region of 40.25-40.39 Mb to be significantly associated with stature . It was proposed that individuals with short stature show an excess of lower-frequency deletions . Our analysis revealed the heterozygous deletions on ECA1 to be associated with larger sized horses.
A second CNV region, with the highest association for body size (P = 0.0002), was located on ECA8 at 4.43-4.62 Mb. All these heterozygous deletions could be found in larger sized horses and were absent in smaller horse breeds. The peak region harboured the candidate genes immunoglobulin lambda variable 3–19 (IGLV3-19), variable 3–27 (IGLV3-27) and variable 2–11 (IGLV2-11) which all belong to the immunoglobulin lambda light chain variable gene cluster, which is important for immunoglobulin structure .
A third candidate region could be detected by QuantiSNP on ECA9 (P = 0.006) at 29.89 Mb. This CNV showed deletions for 37 larger sized warmblood horses. The neighbouring candidate genes opioid receptor, kappa 1 (OPRK1) and ATPase, H + transporting, lysosomal, 50/57-KD, V1 subunit H (ATP6V1H) have been shown to be associated with body conformation in pigs . Vacular-type H + ATPase proton pump is a complex located in the ruffled border plasma membrane of bone-resorbing osteoclasts and is important for bone resorption. Mutations or deletions in V-ATPase subunits encoding genes have been shown to decrease resorptive activity in bones [35–37]. Further candidate genes in the region of the associated CNV are the v-yes-1 yamaguchi sarcoma viral related oncogene homolog (LYN), the trimethylguanosine synthase, S. cervisae, homolog of (TGS1) and the pleiomorphic adenoma gene 1 (PLAG1). Genome scans for sequence variants in human revealed these genes to be in the region of SNPs with the strongest correlation for height in a meta-analysis . The candidate gene PLAG1 is known to be involved in developmental processes [39, 40]. Its influence on bovine stature could be shown by variants modulating the expression of a chromosome domain encompassing PLAG1. A targeted disruption of PLAG1 in mice caused early growth retardation which was maintained throughout adult life .
Our results showed that body size in horses is mainly associated with homozygous or heterozygous deleted CNVs on ECA1, ECA8 and ECA9. These regions do not correspond with the region of the potential main regulator LCORL or further associated SNPs or CNVs for body size in horses [5, 21–23]. We assume that the associated CNVs detected in our analysis might represent additional regulators in the complex process for the determination of height.
The association of the CNV regions on ECA1 and ECA8 could be confirmed by two detection algorithms, while the peak region on ECA9 was exclusively detected by QuantiSNP. The Illumina-based algorithm CNVPartition did not reveal any association for body size although the associated CNVs on ECA1 and ECA8 could be detected by all three programs. We assume that the generally low number of detected CNVs in only few horses were the reason for the missing association. For this reason we propose that CNVPartition did not allow an appropriate association analysis due to the high false negative rates. This correlates well with our analysis for the detection accuracy of each algorithm. The results demonstrate certain limitations that have to be considered for the use of multiple predictions. It was proposed that analyses of different detection algorithms have to be taken with care due to false negative events but are also useful to avoid false positives caused by noisy data and call attention to discrepancies in the data [11, 15]. PennCNV and QuantiSNP were shown to be more reliable in detecting CNVs than CNVPartition . Our analysis confirmed this suggestion and showed that not only the number of compared CNV detection events but also the choice of the program is important for an effective analysis.
For validation of detected CNVs we performed qPCR analysis in two CNVs on ECA1 and one CNV on ECA8 for twenty horses each (Additional file 8). We chose three CNV regions that were detected by all three programs of which two were associated with body size and one was also validated in horses of previous analysis . All three CNV regions could be confirmed by qPCR (Figure 3). The rate for the accurate copy number detection of at least one algorithm was 95% for Olfr1284 (ENSECAG00000006791) on ECA1 and 80% for IGLV3-32 (ENSECAG00000005113) on ECA8. For the CNV region on ECA1 at the candidate gene OR4K2 (ENSECAG00000006318) all horses could be validated. The results show that the analysed CNVs could be validated although few horses did not show the copy number detected by SNP chip analysis. Similar analyses in pigs revealed accurate discovery rates of 71% .
Considering the individual programs, the false negative discovery rate of CNVPartition was particularly high, while QuantiSNP showed the lowest false negative discovery rates in all three CNV regions (Additional file 9). We assume that these results underline our comparative analyses for the detection algorithms showing QuantiSNP to be the most reliable and accurate program.
In addition to this validation, we compared CNVs of previous CGH analyses that were validated by qPCR with our detection results. On the whole, four previously verified CNVs could be confirmed in our analysis. One of these CNVs was rare in our population and could only be detected by QuantiSNP and CNVPartition, further three CNVs could be found in a considerably higher number of horses by QuantiSNP and also by PennCNV in contrast to CNVPartition (Additional file 10). This comparison helps to verify our CNV detection results in these regions and confirms the assumption that there are significant differences in the detection abilities of the analysed algorithms. We conclude that a validation of detected CNVs is crucial especially for further analyses of associated regions, and helps to evaluate the accuracy of the detection results by specific programs.
The aim of our analysis was to investigate three different algorithms for the detection of CNVs in horses and to find associated CNVs for the regulation of body size. Comparative analysis of the detection programs in 717 horses revealed 50 CNVs with an average size of 388,892 bp. Functional analysis of genes located in these CNVs confirmed the high amount of OR genes (66.7%) and showed an overrepresentation of genes involved in sensory perception, signal transduction and cellular components while cell cycle, transcription and metabolic processes were underrepresented.
We conclude that in general the creation of an intersection of three CNV-programs is useful to increase the accuracy of CNV detection and to reduce the number of false positive results. Nevertheless, the comparison between these programs also provides a strong restriction and a higher number of false negative results which is highly dependent on the choice of the detection tool. We recommend taking advantage of the different algorithms used in detection programs and to perform multiple predictions in a first step. Such an analysis will show possible limitations and establish suitable algorithms for further evaluations. In our study the combined use of PennCNV and QuantiSNP was most effective for the accurate detection of CNVs. GWAS for the CNV detection results of these algorithms identified for body size three homozygous or heterozygous deleted CNV regions associated with larger sized horses on ECA1, ECA8 and ECA9. Two of these regions were analysed by qPCR and could be validated. We conclude that a comparative CNV analysis is a useful approach as it reveals the limitations of individual programs and helps to estimate the reliability of the detection results.
All animal work has been conducted according to the national and international guidelines for animal welfare. The sampling was approved by the Lower Saxony state veterinary office Niedersächsisches Landesamt für Verbraucherschutz und Lebensmittelsicherheit, Oldenburg, Germany (registration numbers 509c-42502-01A60, 02A-138 and 07A-482).
EDTA-blood samples of 17 different breeds were collected including 148 Arabian, one Anglo-Arabian, two Brandenburger, 514 Hanoverian, nine Holsteiner, 13 Oldenburg, five Trakehner, 35 Westphalian, one Selle Francais, one German Riding Pony, 47 Lusitano, 48 Maremanno, two Przewalski, 12 Rhinelander horse, two Rhenish-German Cold-Blood, 14 Thoroughbred and one Zweibrücker. The horses were chosen out of different breeding lines to provide unrelated horses as far as possible. For genotyping we isolated genomic DNA using standard methods with RBC (Red Blood Cell) lysis buffer and SE (sodium EDTA) buffer. The DNA concentration of the samples was adjusted to 50 ng/μl using the Nanodrop ND-1000 (Peqlab Biotechnology, Erlangen, Germany) and quality control was performed by gel electrophoresis using 1% agarose gels (peqGold Universal Agarose, Peqlab Biotechnologie, Erlangen, Germany). All 854 animals were genotyped using the Illumina equine SNP50 BeadChip (Illumina, San Diego, USA) including 54,602 SNPs.
To provide reliable results, quality control was performed in a first step by the choice of DNA samples with high qualities and the determination of a call rate >90% for animals and SNPs. Further criteria for the exclusion of noisy data were chosen individually for each algorithm. Primary visualisation and quality control of the CNV data was performed with the GenomeStudio software (Illumina). Data files containing SNP name, chromosome, position, B-allele frequency and log R ratio were exported as a unique file and separated using the split option in PennCNV package. Analysis of the autosomal regions for CNV was done in three different programs. PennCNV was run according to default criteria (Illumina)  using the command line detect_cnv.pl –hmm example.hmm –pfb horse909Tiere.pfb –minsnp 3 –lastchr 31 –test –tabout –coord –conf –listfile list.txt –out penncnv1.txt and probe counts less than three were omitted afterwards. Quality control was performed employing standard exclusions of the Log Ratio (SD (LRR)) >0.3 and the GC-content caused fluctuation of signal intensity (│GCWF│) >0.02. An individual-call mode was used for all samples. Then we used QuantiSNP 2.0, a program based on an Objective Bayes Hidden-Markov Model (HMM) , for CNV analysis. The detection was performed by the command: quantisnp2 –outdir horse –logfile 909Pferde –chr [1:31] –chrx 32 –beadstudio-files 909Pferde.txt. After CNV detection the minimum probe count of three, a threshold for the Log Bayes Factor of less than ten was set. The GenomeStudio plug-in cnvPartition 3.1.6 was run by default criteria including a confidence threshold of 35, a minimum homozygous region size of 1,000,000 a minimum probe count of 3 and GcWaveAdjustLRR.
GWAS for the CNV detection results of each individual programme and the intersection of PennCNV, QuantiSNP and cnvPartition was performed using PLINK version 1.07 (http://pngu.mgh.harvard.edu/purcell/plink/)  and SAS/Genetics, version 9.3 (2013) in a quantitative trait analysis. A mixed linear model (MLM) was employed to control data for stratification. Size ranges for every breed were estimated and results were averaged as it was shown in our previous analysis for body size (Table 4) . The chromosomal enrichment was also accounted with SAS/Genetics for each algorithm similar to the enrichment analysis in the CGH study  by merging overlapping CNVs on basis of the SNP map of 48860 SNPs to CNV regions. The number of SNPs covered by CNV regions was divided by the length of the chromosome. The whole-genome coverage of CNV regions was used as standard after dividing by the total length of the autosomes. Significant chromosomal enrichment occurred when the enrichment of the genome was lower than the enrichment of the chromosome.
Genes located in the CNV regions of the intersection of all three programs were identified using NCBI MapViewer http://ftp.ncbi.nih.gov/genomes/MapView/ Equus_caballus. Due to the insufficient annotation of the horse genome, we determined the human orthologs for these genes by NCBI gene http://www.ncbi.nlm.nih.gov/gene and HomoloGene http://www.ncbi.nlm.nih.gov/homologene to perform ontology analysis. The Database for Annotation, Visualization and Integrated Discovery (DAVID) 6.7 (http://david.abcc.ncifcrf.gov/home.jsp) was used for functional annotation of genes in CNV regions [42, 43]. The functional annotation chart was run with a count threshold of 2, an EASE Score threshold of 0.1 and the Bonferoni correction. Results were grouped in functional groups. Further ontology analysis was performed using the PANTHER (Protein ANalysis THrough Evolutionary Relationships, version 8.0) classification system http://www.pantherdb.org/ for the classification of genes by their molecular function, biological process and cellular component with default Bonferoni correction .
Validation of three different genomic regions harbouring CNVs was performed by quantitative real-time (qRT)-PCR on the ABI7300 sequence detection system (Applied Biosystems by Life Technologies, Darmstadt, Germany). On the whole eighteen warmblood horses and two Arabians were analysed. DNA was isolated from blood samples in the same way as it was done for SNP chip analysis. The DNA concentration of the samples was adjusted to 10 ng/μl using the Nanodrop ND-1000 (Peqlab Biotechnology).
Primer pairs and probes were designed using Primer express 3.0 (Applied Biosystems) and Primer3 (http://frodo.wi.mit.edu/primer3/). Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) on ECA6 was used as reference gene according to previous CNV analyses in horses . Reactions were assembled in a final volume of 12 μl containing 2.0 μl gDNA, 1.0 μl reverse and 1.0 μl forward primes (10 pmol; Additional file 6), 0.25 μl VIC-labeled TaqMan probe for CNV regions (6 nmol; 100 μM) and FAM-labeled TaqMan probe for GAPDH, 1.68 μl nuclease free water and 6 μl Maxima Probe qPCR master mix 2x supplemented by 0.07 μl ROX Solution (50 μM; Fermentas Life Sciences, St. Leon-Rot, Germany). Quantitative real-time (qRT)-PCR was performed for 10 minutes (min) at 95°C followed by 40 cycles at 95°C for 15 seconds and 60°C for 1 min. Analysis was done for every sample two times and the average value was used for further calculations. One sample without copy number variations in the analysed regions was used as reference. For the relative quantification of CNVs we used the 2-ΔΔCt method . The ΔCT values of the test samples were subtracted from the ΔCT of the reference sample and put in the formula 2-ΔΔCt.
Comparative genomic hybridization
Copy number variant
Hidden Markov model
Olfactory receptor 4, subfamily K5
Olfactory receptor 4,subfamily K2
Olfactory receptor 4, subfamily N2
Olfactory receptor 4 ,subfamily M1
Immunoglobulin lambda variable 3–19
Immunoglobulin lambda variable 3–27
Immunoglobulin lambda variable 2–11
v-yes-1 yamaguchi sarcoma viral related oncogene homolog
Polymerase chain reaction
Pleomorphic adenoma gene 1
The trimethylguanosine synthase, S. cervisae, homolog of.
Clop A, Vidal O, Amills M: Copy number variation in the genomes of domestic animals. Anim Genet. 2012, 43 (5): 503-517. 10.1111/j.1365-2052.2012.02317.x.
Beckmann JS, Estivill X, Antonarakis SE: Copy number variants and genetic traits: closer to the resolution of phenotypic to genotypic variability. Nat Rev Genet. 2007, 8 (8): 639-646. 10.1038/nrg2149.
Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, et al: Global variation in copy number in the human genome. Nature. 2006, 444 (7118): 444-454. 10.1038/nature05329.
Dupuis MC, Zhang Z, Durkin K, Charlier C, Lekeux P, Georges M: Detection of copy number variants in the horse genome and examination of their association with recurrent laryngeal neuropathy. Anim Genet. 2012, 44 (2): 206-208.
Doan R, Cohen N, Harrington J, Veazy K, Juras R, Cothran G, McCue ME, Skow L, Dindot SV: Identification of copy number variants in horses. Genome Res. 2012, 22 (5): 899-907. 10.1101/gr.128991.111.
Ramayo-Caldas Y, Castello A, Pena RN, Alves E, Mercade A, Souza CA, Fernandez AI, Perez-Enciso M, Folch JM: Copy number variation in the porcine genome inferred from a 60 k SNP BeadChip. BMC Genomics. 2010, 11: 593-10.1186/1471-2164-11-593.
Seroussi E, Glick G, Shirak A, Yakobson E, Weller JI, Ezra E, Zeron Y: Analysis of copy loss and gain variations in Holstein cattle autosomes using BeadChip SNPs. BMC Genomics. 2010, 11: 673-10.1186/1471-2164-11-673.
Doan R, Cohen ND, Sawyer J, Ghaffari N, Johnson CD, Dindot SV: Whole-genome sequencing and genetic variant analysis of a quarter horse mare. BMC Genomics. 2012, 13: 78-10.1186/1471-2164-13-78.
Pinto D, Darvishi K, Shi X, Rajan D, Rigler D, Fitzgerald T, Lionel AC, Thiruvahindrapuram B, Macdonald JR, Mills R, et al: Comprehensive assessment of array-based platforms and calling algorithms for detection of copy number variants. Nat Biotechnol. 2011, 29 (6): 512-520. 10.1038/nbt.1852.
Alkan C, Coe BP, Eichler EE: Genome structural variation discovery and genotyping. Nat Rev Genet. 2011, 12 (5): 363-376. 10.1038/nrg2958.
Winchester L, Yau C, Ragoussis J: Comparing CNV detection methods for SNP arrays. Brief Funct Genomic Proteomic. 2009, 8 (5): 353-366. 10.1093/bfgp/elp017.
Li Y, Mei S, Zhang X, Peng X, Liu G, Tao H, Wu H, Jiang S, Xiong Y, Li F: Identification of genome-wide copy number variations among diverse pig breeds by array CGH. BMC Genomics. 2012, 13: 725-10.1186/1471-2164-13-725.
Lai WR, Johnson MD, Kucherlapati R, Park PJ: Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics. 2005, 21 (19): 3763-3770. 10.1093/bioinformatics/bti611.
Tsuang DW, Millard SP, Ely B, Chi P, Wang K, Raskind WH, Kim S, Brkanac Z, Yu CE: The effect of algorithms on copy number variant detection. PLoS One. 2010, 5 (12): e14456-10.1371/journal.pone.0014456.
Zhang D, Qian Y, Akula N, Alliey-Rodriguez N, Tang J, Gershon ES, Liu C: Accuracy of CNV detection from GWAS data. PLoS One. 2011, 6 (1): e14511-10.1371/journal.pone.0014511.
Baross A, Delaney AD, Li HI, Nayar T, Flibotte S, Qian H, Chan SY, Asano J, Ally A, Cao M, et al: Assessment of algorithms for high throughput detection of genomic copy number variation in oligonucleotide microarray data. BMC Bioinforma. 2007, 8: 368-10.1186/1471-2105-8-368.
Wang K, Li M, Hadley D, Liu R, Glessner J, Grant SF, Hakonarson H, Bucan M: PennCNV: an integrated hidden markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 2007, 17 (11): 1665-1674. 10.1101/gr.6861907.
Glazier AM, Nadeau JH, Aitman TJ: Finding genes that underlie complex traits. Science. 2002, 298 (5602): 2345-2349. 10.1126/science.1076641.
Brooks SA, Makvandi-Nejad S, Chu E, Allen JJ, Streeter C, Gu E, McCleery B, Murphy BA, Bellone R, Sutter NB: Morphological variation in the horse: defining complex traits of body size and shape. Anim Genet. 2010, 41 (Suppl 2): 159-165.
Dauber A, Yu Y, Turchin MC, Chiang CW, Meng YA, Demerath EW, Patel SR, Rich SS, Rotter JI, Schreiner PJ, et al: Genome-wide association of copy-number variation reveals an association between short stature and the presence of low-frequency genomic deletions. Am J Hum Genet. 2011, 89 (6): 751-759. 10.1016/j.ajhg.2011.10.014.
Signer-Hasler H, Flury C, Haase B, Burger D, Simianer H, Leeb T, Rieder S: A genome-wide association study reveals loci influencing height and other conformation traits in horses. PLoS One. 2012, 7 (5): e37282-10.1371/journal.pone.0037282.
Makvandi-Nejad S, Hoffman GE, Allen JJ, Chu E, Gu E, Chandler AM, Loredo AI, Bellone RR, Mezey JG, Brooks SA, et al: Four loci explain 83% of size variation in the horse. PLoS One. 2012, 7 (7): e39929-10.1371/journal.pone.0039929.
Metzger J, Schrimpf R, Philipp U, Distl O: Expression levels of LCORL are associated with body size in horses. PLoS One. 2013, 8 (2): e56497-10.1371/journal.pone.0056497.
Colella S, Yau C, Taylor JM, Mirza G, Butler H, Clouston P, Bassett AS, Seller A, Holmes CC, Ragoussis J: QuantiSNP: an objective bayes hidden-markov model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res. 2007, 35 (6): 2013-2025. 10.1093/nar/gkm076.
Dellinger AE, Saw SM, Goh LK, Seielstad M, Young TL, Li YJ: Comparative analyses of seven algorithms for copy number variant identification from single nucleotide polymorphism arrays. Nucleic Acids Res. 2010, 38 (9): e105-10.1093/nar/gkq040.
Marenne G, Rodriguez-Santiago B, Closas MG, Perez-Jurado L, Rothman N, Rico D, Pita G, Pisano DG, Kogevinas M, Silverman DT, et al: Assessment of copy number variation using the illumina infinium 1M SNP-array: a comparison of methodological approaches in the spanish bladder cancer/EPICURO study. Hum Mutat. 2011, 32 (2): 240-248. 10.1002/humu.21398.
Young JM, Endicott RM, Parghi SS, Walker M, Kidd JM, Trask BJ: Extensive copy-number variation of the human olfactory receptor gene family. Am J Hum Genet. 2008, 83 (2): 228-242. 10.1016/j.ajhg.2008.07.005.
Guryev V, Saar K, Adamovic T, Verheul M, Van Heesch SA, Cook S, Pravenec M, Aitman T, Jacob H, Shull JD, et al: Distribution and functional impact of DNA copy number variation in the rat. Nat Genet. 2008, 40 (5): 538-545. 10.1038/ng.141.
Conrad DF, Andrews TD, Carter NP, Hurles ME, Pritchard JK: A high-resolution survey of deletion polymorphism in the human genome. Nat Genet. 2006, 38 (1): 75-81. 10.1038/ng1697.
Wong KK, DeLeeuw RJ, Dosanjh NS, Kimm LR, Cheng Z, Horsman DE, MacAulay C, Ng RT, Brown CJ, Eichler EE, et al: A comprehensive analysis of common copy-number variations in the human genome. Am J Hum Genet. 2007, 80 (1): 91-104. 10.1086/510560.
Feuk L, Carson AR, Scherer SW: Structural variation in the human genome. Nat Rev Genet. 2006, 7 (2): 85-97.
De Smith AJ, Walters RG, Froguel P, Blakemore AI: Human genes involved in copy number variation: mechanisms of origin, functional effects and implications for disease. Cytogenet Genome Res. 2008, 123 (1–4): 17-26.
Frippiat JP, Williams SC, Tomlinson IM, Cook GP, Cherif D, Le Paslier D, Collins JE, Dunham I, Winter G, Lefranc MP: Organization of the human immunoglobulin lambda light-chain locus on chromosome 22q11.2. Hum Mol Genet. 1995, 4 (6): 983-991. 10.1093/hmg/4.6.983.
Fan B, Onteru SK, Du ZQ, Garrick DJ, Stalder KJ, Rothschild MF: Genome-wide association study identifies loci for body composition and structural soundness traits in pigs. PLoS One. 2011, 6 (2): e14726-10.1371/journal.pone.0014726.
Qin A, Cheng TS, Pavlos NJ, Lin Z, Dai KR, Zheng MH: V-ATPases in osteoclasts: structure, function and potential inhibitors of bone resorption. Int J Biochem Cell Biol. 2012, 44 (9): 1422-1435. 10.1016/j.biocel.2012.05.014.
Kornak U, Kasper D, Bosl MR, Kaiser E, Schweizer M, Schulz A, Friedrich W, Delling G, Jentsch TJ: Loss of the ClC-7 chloride channel leads to osteopetrosis in mice and man. Cell. 2001, 104 (2): 205-215. 10.1016/S0092-8674(01)00206-9.
Frattini A, Orchard PJ, Sobacchi C, Giliani S, Abinun M, Mattsson JP, Keeling DJ, Andersson AK, Wallbrandt P, Zecca L, et al: Defects in TCIRG1 subunit of the vacuolar proton pump are responsible for a subset of human autosomal recessive osteopetrosis. Nat Genet. 2000, 25 (3): 343-346. 10.1038/77131.
Gudbjartsson DF, Walters GB, Thorleifsson G, Stefansson H, Halldorsson BV, Zusmanovich P, Sulem P, Thorlacius S, Gylfason A, Steinberg S, et al: Many sequence variants affecting diversity of adult human height. Nat Genet. 2008, 40 (5): 609-615. 10.1038/ng.122.
Karim L, Takeda H, Lin L, Druet T, Arias JA, Baurain D, Cambisano N, Davis SR, Farnir F, Grisart B, et al: Variants modulating the expression of a chromosome domain encompassing PLAG1 influence bovine stature. Nat Genet. 2011, 43 (5): 405-413. 10.1038/ng.814.
Hensen K, Braem C, Declercq J, Van Dyck F, Dewerchin M, Fiette L, Denef C, Van de Ven WJ: Targeted disruption of the murine Plag1 proto-oncogene causes growth retardation and reduced fertility. Dev Growth Differ. 2004, 46 (5): 459-470. 10.1111/j.1440-169x.2004.00762.x.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, De Bakker PI, Daly MJ, et al: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007, 81 (3): 559-575. 10.1086/519795.
Da Huang W, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009, 4 (1): 44-57.
Da Huang W, Sherman BT, Lempicki RA: Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009, 37 (1): 1-13. 10.1093/nar/gkn923.
Mi H, Thomas P: PANTHER pathway: an ontology-based pathway database coupled with data analysis tools. Methods Mol Biol. 2009, 563: 123-140. 10.1007/978-1-60761-175-2_7.
Livak KJ, Schmittgen TD: Analysis of relative gene expression data using real-time quantitative PCR and the 2(−delta delta C(T)) method. Methods. 2001, 25 (4): 402-408. 10.1006/meth.2001.1262.
We thank the Hanoverian state stud Celle, Westphalian state stud Warendorf, Hanoverian Breeding Association, the Arabian horse Society (Verband der Züchter und Freunde des Arabischen Pferdes e.V.) and horse owners for the provision of data and samples. We also thank J. Wrede for the help with data analysis. This study has been funded by the German Research Council (DFG), Bonn, Germany (DFG (Di333/ 11–2, Di333/12-2) and the Mehl-Mülhens Stiftung, Köln, Germany.
The authors declare that they have no competing interests.
JM, OD designed the study, carried out the experiments and data analysis, drafted and finalized the manuscript. OD, MSL, ACM, MF and MS provided samples and data for the study and helped to revise the manuscript. UP took part in the design of qPCR analysis. All authors read and approved the final manuscript.
Electronic supplementary material
Additional file 1: CNVs detected by CNVPartition. The table summarises start and end positions of detected CNVs, their size, copy number, number of samples and genes located in CNV regions. Text in PDF format. (PDF 58 KB)
Additional file 2: CNVs detected by PennCNV. The table summarises start and end positions of detected CNVs, their size, copy number, number of samples and genes located in CNV regions. Text in PDF format. (PDF 688 KB)
Additional file 3: CNVs detected by QuantiSNP. The table summarises start and end positions of detected CNVs, their size, copy number, number of samples and genes located in CNV regions. Text in PDF format. (PDF 939 KB)
Additional file 4: Comparative analysis of three CNV detection algorithms. The table shows 50 CNVs derived from three detection algorithms, their position, size, copy number per sample and breed (HAN: Hanoverian; LUS: Lusitano; MAR: Maremanno; OLD: Oldenburg; RDK: Rhenish-German Cold-Blood; WES: Westphalian; HOL: Holsteiner; TB-H: Thoroughbred; AV: Arabian; RHD: Rhinelander horse; PRZ: Przewalski; BRAN: Brandenburger; TRAK: Trakehner; RPON: German Riding Pony). Text in PDF format. (PDF 174 KB)
Additional file 5: Display of CNVs detected in two Przewalski horses by comparative analysis of three algorithms. The overlap of CNVs with different breeds is shown. (TIFF 401 KB)
Additional file 6: Functional annotation analysis of enriched genes derived from 45 CNV regions using DAVID 6.7. In five CNV regions we could not detect any genes for evaluation. Text in DOC format. (DOCX 18 KB)
Additional file 7: Gene ontology analysis of significantly over- and underrepresented genes involved in molecular functions and protein classes. The software PANTHER was used for the evaluation of 45 CNV regions detected by all three algorithms. (DOCX 17 KB)
Additional file 8: Genomic regions analysed for copy number variants (CNVs) by real-time quantitative RT-qPCR. Primer sequences, their position, product size, annealing temperature and TaqMan probes are shown. Text in DOC format. (DOCX 19 KB)
Additional file 9: Copy number detection accuracy of three SNP array algorithms. The deleted (0, 1) or duplicated (3) copy numbers validated by qPCR are compared with the detection results of CNVPartition, PennCNV and QuantiSNP. Reference samples with the copy number 2 are not displayed as they were selected specifically for two copies in all three programs. (DOCX 18 KB)
Additional file 10: et al. ) with CNVs detected by CNVPartition, PennCNV and QuantiSNP. The number of samples detected by QuantiSNP is considerably high. (DOCX 14 KB)
About this article
Cite this article
Metzger, J., Philipp, U., Lopes, M.S. et al. Analysis of copy number variants by three detection algorithms and their association with body size in horses. BMC Genomics 14, 487 (2013). https://doi.org/10.1186/1471-2164-14-487
- Body size
- Copy number variation