Genome-wide detection of CNV regions and their potential association with growth and fatness traits in Duroc pigs

Background In the process of pig breeding, the average daily gain (ADG), days to 100 kg (AGE), and backfat thickness (BFT) are directly related to growth rate and fatness. However, the genetic mechanisms involved are not well understood. Copy number variation (CNV), an important source of genetic diversity, can affect a variety of complex traits and diseases and has gradually been thrust into the limelight. In this study, we reported the genome-wide CNVs of Duroc pigs using SNP genotyping data from 6627 animals. We also performed a copy number variation region (CNVR)-based genome-wide association studies (GWAS) for growth and fatness traits in two Duroc populations. Results Our study identified 953 nonredundant CNVRs in U.S. and Canadian Duroc pigs, covering 246.89 Mb (~ 10.90%) of the pig autosomal genome. Of these, 802 CNVRs were in U.S. Duroc pigs with 499 CNVRs were in Canadian Duroc pigs, indicating 348 CNVRs were shared by the two populations. Experimentally, 77.8% of nine randomly selected CNVRs were validated through quantitative PCR (qPCR). We also identified 35 CNVRs with significant association with growth and fatness traits using CNVR-based GWAS. Ten of these CNVRs were associated with both ADG and AGE traits in U.S. Duroc pigs. Notably, four CNVRs showed significant associations with ADG, AGE, and BFT, indicating that these CNVRs may play a pleiotropic role in regulating pig growth and fat deposition. In Canadian Duroc pigs, nine CNVRs were significantly associated with both ADG and AGE traits. Further bioinformatic analysis identified a subset of potential candidate genes, including PDGFA, GPER1, PNPLA2 and BSCL2. Conclusions The present study provides a necessary supplement to the CNV map of the Duroc genome through large-scale population genotyping. In addition, the CNVR-based GWAS results provide a meaningful way to elucidate the genetic mechanisms underlying complex traits. The identified CNVRs can be used as molecular markers for genetic improvement in the molecular-guided breeding of modern commercial pigs. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07654-7.


Background
Genetic variation occurs in many forms, including single nucleotide polymorphisms (SNPs), insertions/deletions (INDELs) of small genetic fragments, and copy number variations (CNVs), in human and animal genomes. CNVs are a particular subtype of genomic structural variation that range from approximately 50 bp to several Mb and are mainly represented by deletions and duplications [1][2][3][4]. Adjacent copy number variation areas with overlapping regions can be combined into a large genome segment, known as the copy number variation region (CNVR) [5]. In terms of the total bases involved, CNVs encompass more nucleotide sequences and arise more frequently than SNPs [6]. Therefore, they have higher mutation probability and more significant potential impacts [7], such as changing gene structure and altering gene dosage and thus dramatically affect gene expression and adaptive phenotypes [8]. Additionally, some CNVs are associated with several complex diseases [9][10][11]. These observations led us to predict that CNVs are a primary contributor to phenotypic variation and disease susceptibility.
Indeed, multiple studies have suggested that CNVs play an essential role in affecting some complex traits and causing disease. In humans, Aitman et al. [12] demonstrated that copy number polymorphism in the Fcgr3 gene is a determinant of susceptibility to immunologically mediated renal disease; additionally, a recent study identified that copy number variation in NPY4R might be related to the pathogenesis of obesity [13]. Similarly, phenotypic variations and diseases caused by CNVs are also widespread in domesticated animals. For example, in pigs, the focus of this study, an increase in copy number (CN) of the KIT gene is associated the dominant white phenotype [14,15]. With regard to reproductive performance, CNV in the MTHFSD gene was reportedly correlated with litter size in Xiang pigs [16]. Zheng et al. [17] also showed that a higher CN of the AHR gene had a positive effect on litter size. With regard to productive performance, Revilla et al. [18] discovered a CNVR containing the GPAT2 gene, which might be associated with several growth-related traits. Thus, analyzing CNVs and identifying their potential association with complex traits has gradually become an essential part of genetic studies.
Growth rate and fatness are vital objectives in the process of pig breeding, and are directly associated with economic advantages. The growth rate measured at different stages mainly include average daily gain (ADG) during the test period as well as with age (AGE), which was defined as estimated age at a certain weight [19]. Fat deposition is also a critical biological process that is generally measured as the backfat thickness (BFT). Until now, considerable association analysis has focused on identifying single-site variants, quantitative trait loci (QTLs), and related candidate functional genes that might influence growth and fatness traits [20][21][22]. However, systematic association studies of complex quantitative traits based on CNVs have rarely been conducted [18,23], and the full relevance of CNVs to the genetic basis of these traits is yet to be clarified. In addition, the genetic architecture of these traits is complex and usually controlled by multiple genes [19]. The majority of association studies for growth and fatness traits in pigs have used only a small number of genotyped animals, which has limited the statistical power of the association analysis [24]. It is therefore necessary to conduct CNV association analysis in a population with a sufficiently large sample size.
In this study, we performed genome-wide CNV detection in a large population of Duroc pigs of U.S. and Canadian origin. Moreover, CNVR-based genome-wide association studies (GWAS) of growth and fatness traits were applied to the two experimental populations. We identified CNVR and candidate genes that can provide additional information on the molecular mechanisms underlying important economic traits and promote the rapid development of molecular breeding approaches in pigs.

Detection of genome-wide CNVs in two pig populations
We detected CNVs in 18 autosomes in Duroc pigs of Canadian and U.S. origin using PennCNV software v1.0.5 [25]. A total of 33,347 CNVs (5403 losses and 27, 944 gains) were identified in 5928 pigs. Among these, 19,987 CNVs were from 3271 Duroc pigs of U.S. origin, and 13,360 CNVs were from 2657 Duroc pigs of Canadian origin. These CNVs were merged to identify CNVRs (see Additional file 1: Table S1). A total of 953 CNVRs were identified in the two populations with 388 gains, 376 losses, and 189 mixed variations (gains and losses occurring in the same region). Table 1 and the CNVR map ( Fig. 1) summarize the distribution of total CNVRs on different autosomes. CNVRs in chromosome 4 (SSC4) had the highest coverage (20.64%) while those in SSC1 had the lowest (6.43%). The number of CNVRs varied from 20 (SSC18) to 82 (SSC1), and the total size of CNVRs detected in this study was 246.89 Mb, accounting for~10.90% of the pig autosomal genome.
By matching the CNVs in each population to the corresponding CNVRs, we identified 802 CNVRs in the U.S. Duroc pigs, 499 CNVRs in the Canadian Duroc pigs, with 348 CNVRs that were shared by both populations (see Additional file 2: Table S2). CNVs in U.S. Duroc pigs ranged in size from 10.4 kb to 2.6 Mb, averaging 183.6 kb (Fig. 2a), while CNVR size ranged from 10.4 kb to 2.7 Mb (Fig. 2b). In Canadian Duroc pigs, CNV size ranged from 10.4 kb to 2.1 Mb, with an average of 165.2 kb (Fig. 2c), while CNVR size ranged from 10.4 kb to 2.7 Mb (Fig. 2d). In summary, most CNVs and CNVRs in both populations were 50-500 kb in size, with the CNVRs covering~9.56 and 7.44% of the porcine genome (Sus scrofa 11.1) in U.S. and Canadian Duroc pigs, respectively. Notably, CNV duplications were more likely to occur in both populations. In addition, we found that among the top 20 largest CNVRs, 19 were mixed types. More intriguingly, 15 of them (75%) were resided in telomeric regions (Fig. 1), indicating that CNVs occur more frequently towards telomeres, which are hot spots for the recombination and duplication of large fragments [26].

Comparison of CNVRs detected in previous swine studies
We compared the CNVRs identified in this study with those in nine previous swine studies based on Scrofa11.1 assembly (see Additional file 3: Table S3). For CNVRs based on the early porcine assembly 10.2, we converted the data to Scrofa11.1 assembly using the UCSC Lift-Over tool (http://genome.ucsc.edu/cgi-bin/hgLiftOver). The results show varying levels of overlapping CNVRs in the studies (Table 2), due to differences in breed, platform, algorithm, and CNV definition, which significantly impact the results [33]. We used a much looser definition of overlap, where two CNVRs were considered to overlap as long as they shared at least one nucleotide base [34]. The most considerable overlap in CNVRs identified between this study and previous studies was observed with results obtained from next-generation sequencing platforms (see Additional file 3: Table S3). The percentages of overlapped CNVRs were 21.72 and 21.82%, respectively [17,34].

Validation of identified CNVRs using qPCR
To confirm the reliability of the identified CNVRs, we randomly selected nine CNVRs (CNVR 149, 359, 374,  Table S4.

CNVR frequency in two Duroc pig populations
We also calculated the frequencies of the CNVRs in the U.S. (Fig. 4a) and Canadian (Fig. 4b) Duroc pig populations. The frequency of CNVR in U.S. Duroc pigs varied from 0.030% (detected in one pig) to 40.6% (1327 of 3271 pigs). In the Canadian Duroc pigs, CNVR frequencies ranged from 0.038% (detected in one pig) to 52.2% (1386 of 2657 pigs). Moreover, the frequency of CNVRs was concentrated at 0.03-0.3%, indicating most CNVRs are rare, only exist in a few animals and are challenging to measure reliably [35]. For this reason, CNVR-based GWAS were performed using CNVRs with frequencies exceeding 0.5% [32].

Phenotypic and CNVR-based GWAS statistics
To further characterize the functions of CNVRs in pigs, GWAS were performed for three quantitative traits. The statistical summaries of ADG, AGE, and BFT in the two populations are listed in Table 3. All phenotypic data approximately followed a normal distribution.
Since most CNVRs have a low frequency that is challenging to measure reliably, we used CNVRs with frequencies higher than 0.5% in each population for further analysis, to improve the reliability of the GWAS results [32]. A total of 139 CNVRs from 3303 U.S. Duroc pigs and 92 CNVRs from 2677 Canadian Duroc pigs were selected for association analysis. The Manhattan plots and significant CNVRs obtained from separate association analyses in these two populations are shown in Figs. 5 and 6, Tables 4 and 5.
GWAS in two populations identified five CNVRs as the most significantly associated with growth and fatness traits. Additional file 5: Table S5 were summarized to reflect the phenotypic effect of the CNVRs more Based on the data from all pigs, we further investigated the function of genes encompassing these significant CNVRs. Several common significant CNVRs that are associated with both ADG and AGE traits were found to overlap with numerous genes, and nine of these were   identified as major functional candidates, including PNPLA2, SDK1, PFKL, and BSCL2. For BFT, we identified seven candidate genes, including GPER1, PDGFA, and GRTP1.

Functional analysis of genes associated with trait-related CNVRs
A total of 606 genes overlapping with 31 significant CNVRs were detected based on the Ensembl [36] annotation of the Sus scrofa 11.1 genome (see Additional file 6: Table S6). These include 447 protein-coding genes and 110 lncRNA genes, as well as some miRNAs, small nucleolar genes (snoRNA), and processed pseudogenes. To further investigate the functional genes affecting growth performance and fatness, the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and gene ontology (GO) analyses of protein-coding genes were carried out using the KOBAS software (version 3.0) [37]. Gene set enrichment analysis revealed many terms relevant to growth and fatness traits (see Additional file 7: Table S7, the accession numbers were obtained from Ensembl database [36]). In brief, KEGG analysis revealed that these genes mainly participate in glycosaminoglycan degradation, oxytocin signaling, and the cholinergic synapse pathway. Furthermore, GO analysis was primarily enriched in positive regulation of protein kinase B signaling, MAP kinase activity, carbohydrate metabolic process, and other important biological processes. Using information Fig. 6 Manhattan plots of CNVR-based GWAS in the Canadian Duroc pig population. Manhattan plots consisted of average daily gain at 100 kg (a) and days to 100 kg (b), and backfat thickness at 100 kg (c). The x-axis represents the chromosomes, and the y-axis represents the -log10(Pvalue). The solid and dashed lines indicate the 5% genome-wide (5.43E-04) and suggestive (1.09E-02) Bonferroni-corrected thresholds, respectively from the GeneCards database and relevant literature, we further identified several genes involved in critical pathways and biological processes (Tables 4 and 5). Here, we highlight four genes of interest that overlapped with significant CNVRs and were enriched in gene set enrichment analysis (P < 0.05): plateletderived growth factor subunit A (PDGFA), G proteincoupled estrogen receptor 1 (GPER1), patatin-like phospholipase domain containing 2 (PNPLA2) and Bernardinelli-Seip Congenital Lipodystrophy Type 2 Protein (BSCL2).

Discussion
Over the past decade, GWAS have made remarkable contributions to the discovery of common SNPs that influence complex traits [38]. However, most variants explain only a small proportion of heritability, a phenomenon known as "missing heritability" [39]. To this end, CNVs, as an important source of genetic diversity, may provide a new way for explaining the genetic variability that GWAS cannot detect [40].
In this study, we successfully identified 19,987 and 13, 360 CNVs in U.S. and Canadian Duroc pigs, respectively, using rigorous criteria to reduce false-positive rates. All CNVs were merged to generate 953 CNVRs in the two populations, accounting for~10.90% of the pig autosomal genome (Sus scrofa 11.1). The results showed that the size and frequency of duplications were much higher than those of deletions in the large fragment (> 10 kb) CNVs (27,944 gains vs. 5403 losses). Previous CNV studies reported similar cases. For example, a CNV study conducted by Long et al. [41] using Porcine SNP60 BeadChip, identified approximately 70.6% duplications and 29.4% deletions. Using Next-generation sequencing data, Zheng et al. [17] also reported that the frequency of duplications was higher than that of deletions in the Duroc and Meishan pigs. This phenomenon suggests that although CNVs can cause duplications or deletions at the same locus in different populations [42], the genome is more tolerant to duplications than it is to deletions [43], and these duplications are more likely to occur in large CNVs (> 10 kb) [5,44]. In addition, based upon SNP chip design principles, it can be inferred that if there are more than 2 copies (duplications) in a diploid organism, then the likelihood of identifying a high frequency SNP and the chance of detecting variation may be greater than if there are only 0, 1 or 2 copies [45].
To evaluate the accuracy of the PennCNV software in identifying CNVs, we performed qPCR validation for nine randomly selected CNVRs and successfully confirmed seven of these (~77.8%). This percentage is similar to that reported by Wang et al. [28] (75%), Dong et al. [46] (70%), and Wang et al. [33] (80%). We also observed that two "failed" CNVRs were inconsistent with our expectations. Multiple factors may have contributed to the discordance in the results. For example, the sparse probes on the SNP chip may cause the estimated size of CNVRs to be larger than their actual length. Consequently, the primers may have been designed outside the exact boundaries of the CNVRs [46]. Additionally, these results indicate that a high proportion of singleton CNVs exists in the population [47].
We also compared our results with those of previous studies on CNVRs and found a low overlap rate [17,[27][28][29][30][31][32][33][34]. In brief, a total of 465 CNVRs entirely or partially overlapped with previously reported CNVRs. A considerable overlap rate was observed with the results reported by Zheng et al. [17], whereas those reported by Xie et al. [31] gave the lowest overlap. These discrepant observations may be due to differences in the breeds studied. In this study, the large number of samples used for CNV detection led us to identify more novel CNVRs than previous studies. It also suggests that a vast number of CNVs in the pig genome have not been discovered [48]. In addition, most of the previous studies were based on the Sscrofa10.2 genome version, whereas the comparative work in our study was based on version Sscrofa11.1. Thus, based on the vast differences between these two genome versions [49], many CNVRs in Sscrofa10.2 could not be successfully converted to Scrofa11.1 ( Table 2). Differences in SNP density after quality control, as well as different CNV detection platforms, algorithms, and criteria for CNV determination could also explain this outcome [33]. Intriguingly, even within the same breed, different genetic backgrounds may have significant effects on reproducibility. In our previous GWAS, principal component analysis (PCA) and linkage disequilibrium (LD) decay analysis suggested that the U.S. Duroc population had a genetic background that differed from that of the Canadian Duroc population [50,51]. As shown by our results, only 348 of 953 CNVRs were detected in both populations. In addition, population size might also affect CNV detection. In our study, the number of U.S. Duroc pigs used for CNV detection was 1.3 times higher than that of Canadian Duroc pigs (3770 vs. 2857), which may have led to differences in the final numbers of CNVs (19,987 vs. 13,360) and CNVRs (802 vs. 499).
Although CNVs are widespread in pigs and are associated with economically important traits, the full relevance of CNVs to the genetic architecture of growth rate and fatness across all stages is yet to be elucidated. To further investigate the relationship between CNVs and complex traits (ADG, AGE, and BFT), we performed CNVR-based GWAS on these two pig populations. We identified 16 significant CNVRs that were associated with ADG or AGE in U.S. Duroc pigs, including 10 CNVRs that were significant for both traits. A similar pattern was observed in the Canadian Duroc pigs. For instance, we detected nine CNVRs that affect both traits. The computational formula of the adjusted ADG was inversely proportional to that of the adjusted AGE in this study, and both traits also had a relatively high genetic association [19]. This may explain why most CNVRs were significant for both traits. However, the results of GWAS between U.S. and Canadian Duroc pigs differed substantially, and we found no shared CNVRs when we analyzed ADG and AGE in the two populations. Moreover, we detected 14 BFT-related CNVRs in U.S. pigs, but only one was identified in the Canadian population. This finding highlights the complex genetic architecture of growth and fatness traits. Although Duroc is considered a single breed, substantial genetic differences exist between subpopulations [52], as shown for the U.S. and Canadian Duroc pigs in this study. These results are consistent with those of Zhou et al. [50] and Zhuang et al. [51]. It is presumed that, due to differences in natural and human selective pressures, genetic drift and the exchange of genetic material has resulted in less consistency in CNVRs between the two populations [53]. Therefore, genetic differentiation between the two populations may have a substantial impact on the genome localization of genetic variants [51]. More notably, four CNVRs-CNVR 152 (SSC3: 2.8-3.5 Mb), CNVR 315 (SSC5: 52.3-52.7 Mb), CNVR 514 (SSC9: 0.6-1.3 Mb), and CNVR 732 (SSC13: 206.6-208.2 Mb)-were associated with growth and fatness traits in U.S. Duroc population. These results suggest that these CNVRs may play a pleiotropic role in regulating pig growth and fat deposition [18,20].
To better understand the molecular function of the genes involved in significant CNVR, we examined their GO and KEGG classification. Many of these genes participated in carbohydrate metabolic process, MAP kinase activity, glycosaminoglycan degradation, and O-glycan biosynthesis. Consequently, we highlighted four genes; PDGFA, GPER1, PNPLA2, and BSCL2, which were previously recognized as important for body weight and fat deposition, but their roles in pigs are poorly understood. White adipose tissue is recognized as an energy-storing organ that is closely associated with fat deposition and body weight [54]. Gonzalez et al. [55] found that PDGFA plays a vital role in the proliferation and maintenance of adipocyte progenitors in dermal adipose tissue through the PI3K-Akt pathway. Previous studies also reported that PDGFRα is activated by the homodimers PDGFA, PDGFB, and PDGFC, whereas PDGFRβ is activated by PDGFB and PDGFD [56,57]. More importantly, human adipose tissue differentiation into beige or white adipocytes depends on PDGFRα/PDGFRβ signaling [58]. The BSCL2 gene also participates in adipocyte differentiation and lipid droplet formation. Mutations in the BSCL2 gene cause human congenital lipodystrophy, an autosomal recessive genetic disease characterized by almost complete loss of adipose tissue, insulin resistance, and fatty liver [59,60]. The gene GPER1 encodes G proteincoupled estrogen receptor 1, which is involved in metabolism and immunity [61]. Sharma et al. [62] reported that weight gain in male GPER-knockout (KO) mice was associated with visceral and subcutaneous fat. However, these GPER KO mice showed no differences in food intake or exercise activity levels compared with wild-type littermates. This observation demonstrates that GPER may regulate metabolic parameters associated with obesity. As an important triglyceride hydrolase in mammalian cells, PNPLA2 predominantly performs the first step in triglyceride hydrolysis. Dai et al. [63] revealed that functional polymorphisms in the 5′ upstream region of PNPLA2 are potential DNA markers for backfat thickness in Duroc pig breeding programs.
In recent years, studies on the influence of CNVs on complex traits have gradually been thrust into the limelight [17,33]. To the best of our knowledge, the present study represents the largest sample size used for the detection of genome-wide CNVs in Duroc pigs. However, due to the sparse markers in the SNP chip used, we may have overestimated the frequency of large-scale CNVs detected in our study. Accordingly, high-density SNP chips or whole-genome sequencing technologies should be applied for further CNV detection.

Conclusions
In this study, we performed genome-wide CNV detection and CNVR-based GWAS for growth and fatness traits in a large population of U.S. and Canadian Duroc pigs. A total of 953 CNVRs were detected in these two populations, accounting for~10.90% of the pig autosomal genome. Moreover, 35 CNVRs were associated with growth and fatness traits. However, we found no shared CNVR QTL in the two populations among these CNVRs. These findings indicate that genetic differences between the two populations may have a substantial impact on the genomic localization of genetic variants. We also identified major candidate genes, including PDGFA, GPER1, PNPLA2, and BSCL2, that may be related to growth and fatness traits. Our results provide valuable insights into the genetic mechanisms underlying growth and fatness traits in pigs.

Ethics statement
The animals and experimental methods used in this study follow the guidelines of the Ministry of Agriculture of China and the Use Committee of South China Agricultural University (SCAU). The ethics committee of SCAU (Guangzhou, China) approved all animal experiments. The experimental animals were not anesthetized or euthanized in this study.

Samples and phenotype data
Experimental animals were raised at the Wens Foodstuff Group Co., Ltd. (Guangdong, China) of Duroc core breeding farms. A total of 6627 Duroc pigs were used, including 3770 (2280 males and 1490 females) Duroc pigs of U.S. origin and 2857 (1570 males and 1287 females) Duroc pigs of Canadian origin, born between 2013 and 2018. Once these 6627 Duroc pigs reached a body weight of 30 ± 5 kg, they were transferred to the test station. During the experiment, all pigs were raised under normal management conditions, provided with drinking water, and were freely fed. Additionally, data on average daily gain at 100 kg (ADG), days to 100 kg (AGE), and backfat thickness at 100 kg (BFT) were collected from each population; a more detailed description of the phenotypic measures can be found in our previous publication [50]. In brief, when their body weight reached approximately 100 kg (100 ± 5 kg), ADG and AGE traits were measured and adjusted to 100 kg. The adjusted formula for AGE is as follows [19,50]: The following formula was used for adjusted ADG [19,50]: ADG adjusted to 100 kg kg=day ð Þ ¼ 100 kg AGE adjusted to 100 kg In addition, when their body weight reached 100 ± 5 kg, the BFT phenotype was measured using an Aloka 500 V SSD B ultrasound probe (Corometrics Medical Systems, USA) from the 10th-rib to the 11th-rib of the pig [64]. Adjusted 100 kg BFT was obtained from the Canadian Centre for Swine Improvement (http://www. ccsi.ca/Reports/Reports_2007/Update_of_weight_ adjustment_factors_for_fat_and_lean_depth.pdf) using the following formula:

SNP genotyping and quality control
Genomic DNA was extracted from ear tissue using the traditional phenol/chloroform method, and the quality of DNA in all samples (6627 DNA samples) was assessed based on light absorption ratio (A 260/280 and A 260/230 ) and gel electrophoresis, using a DNA concentration of 50 ng/μL [65]. Samples were genotyped using the Illumina GeneSeek 50 K SNP array (Neogen, Lincoln, NE, United States) with 50,649 SNP markers across the entire genome. Quality control was performed using the PLINK software v1.90 [66]. SNPs located in sex chromosomes, or without positional information, were discarded and only samples with high-quality genotyping (call rate of 90% and above) were retained [27,41,67]. Finally, a set of 46,458 informative SNPs from 3770 Duroc pigs of U.S. origin and 46,458 informative SNPs from 2857 Duroc pigs of Canadian origin was used for CNV detection.

CNV detection
The PennCNV software v1.0.5 was used to identify individual-based CNVs by combining the SNP signal data of log R ratio (LRR) and B allele frequency (BAF) as well as the population frequency of the B allele (PFB). The LRR and BAF values for each SNP were computed using the GenomeStudio software (v2.0; Illumina, Inc., USA). The Perl comppile_pfb.pl command in PennCNV was used to calculate PFB based on the BAF of each SNP. Moreover, the wave adjustment procedure was conducted using the -gcmodel option in the PennCNV to reduce the impact of genomic waves [68]. We calculated the GC content in the 500 kb genomic region around each SNP derived from the Sscrofa 11.1 version of the pig reference genome (http://ensemble.org/Sus_ scrofa/Info/Index). PennCNV was run using the -test option without considering pedigree information, as the relationship among the pigs in our study population is unknown. The final CNVs were identified by retaining high-quality samples according to the following criteria: LRR < 0.3, BAF drift < 0.01, and GC wave factor of LRR < 0.05. Meanwhile, to further decrease false-positive CNV calls, CNVs with consecutive SNPs ≥3 and CNV length ≥ 10 kb were retained. We also used the BED-Tools software v2.26.0 [69] to merge CNVs with at least 1 bp overlap in all samples to define the CNV region (CNVR) [17]; CNV and CNVR borders were determined based on the location of SNP markers. The CNVRuler software v1.3.3.2 [70] was used to define three types of CNVR: loss, gain and mixed (gains and losses occurring in the same region). In addition, we matched CNVs with the corresponding CNVR in each population to obtain the CNVRs. In other words, CNVRs with full coverage CNV sequences were considered population-based CNVRs. A final set of 802 CNVRs mapped in 3303 U.S. Duroc pigs and 499 CNVRs mapped in 2677 Canadian Duroc pigs was used for subsequent analyses.

Quantitative PCR validation
We chose real-time quantitative polymerase chain reaction (qPCR) to validate the CNVRs detected by PennCNV. A total of nine CNVRs were randomly selected based on the CNVR type (loss, gain, and mixed) and frequency in the population. Due to uncertainty in the boundaries of the identified CNVRs, we used the Oligo 7 software [71] to design primers for specific regions in the ELFN1, PUSL1, MAPRE2, SGMS2, PCID2, DSCAM, GATD3A ADGRA1, and LIFR genes (see Additional file 4: Table S4). We also selected the GCG gene as the reference locus because this gene is highly conserved among pigs and exists as a single copy in the reference genome [17,33,72]. A total of 74 DNA samples were randomly selected for qPCR validation, and normal samples identified with no copy number change in the test region were used as references. Real-time quantitative PCR was conducted using Qiagen's Quantitative Reaction Kit (QuantiFast SYBR Green PCR kit, Qiagen, Hilden, Germany). The PCR reaction was performed using a total 10 μL volume consisting of the following reagents: 1 μL DNA (50 ng/μL), 0.3 μL of both forward and reverse primers (10 pM/μL), 5 μL Blue-SYBR-Green mix (2×), and 3.4 μL water. The PCR conditions were as follows: 95°C denaturation, hot start 5 min; 45-50 PCR cycles (95°C, 10 s, 60°C, 15 s, and 72°C, 20 s); dissolution curve (95°C, 15 s, 55°C, 15 s, and 95°C, 15 s). All reactions were carried out on 384-well clear reaction plates, and each sample was amplified in triplicate, with average C t values calculated for further copy number determination. The relative copy number difference in the test region was determined using 2 × 2 -ΔΔCt , where ΔΔC t = [(mean C t of the target gene in the test sample) -(mean C t of GCG in the test sample)] -[(mean C t of the target gene in the reference sample) -(mean C t of GCG in the reference sample)] [73]. Values of approximately 2 were considered normal. A value of 3 or more and a value of 1 or less represented gain and loss statuses, respectively.
In this study, the GEMMA software v0.98.1 [76] was applied to a univariate linear mixed model to conduct GWAS on a single population. To improve the accuracy of the GWAS results, we filtered the CNVR datasets with frequencies lower than 0.5% in each population [32]. A final set of 139 CNVRs in 3303 U.S. Duroc pigs and 92 CNVRs in 2677 Canadian Duroc pigs was selected for association analysis. Before GWAS, genomic relatedness matrix (GRM) and principal component analysis (PCA) based on SNP datasets for each population were generated using the GEMMA and GCTA software v1.92.4beta [77]. The statistical model used was as follows: where y represents a vector of the corrected phenotypic value for each population; W is the incidence matrix of covariates, including fixed effects of the top three eigenvectors of PCA, sex, birth weight, and parity; α represents the vector of corresponding coefficients including the intercept; X is the vector of CNVR marker genotypes; β specifies the corresponding effect size of the CNVR; u is the vector of random effects, with u~MVN n (0, λτ −1 K); ε is the vector of random residuals, with ε~MVN n (0, τ −1 In); λ signifies the ratio between two variance components; τ −1 is the variance of the residual errors; K is GRM; I is an n × n identity matrix; MVN n denotes the n-dimensional multivariate normal distribution. In the CNVR-based GWAS, the Bonferroni method was used to determine the genome-wide significant (0.05/N) threshold, where N represents the number of CNVRs. Given that is a stringent criterion, a more lenient threshold was also used for detecting the suggestive (1/N) CNVRs [78,79].

Candidate gene identification and functional enrichment analysis
The physical position information was obtained from the Sscrofa 11.1 version of the pig reference genome. Genes that overlapped with significant CNVRs were selected for KEGG pathway and GO analyses using KOBAS v3.0. Enriched terms with P < 0.05 based on Fisher's exact test were selected for further exploration of the genes involved in biological pathways and processes [65,80]. GeneCards (http://www.genecards.org/) and Ensembl (www.ensembl.org/biomart/martview) were used to query gene functions.
Abbreviations CNV: Copy number variation; CNVR: Copy number variation region; CN: Copy number; ADG: Average daily gain; AGE: Days to 100 kg; BFT: Backfat thickness; SSC: Sus scrofa chromosome; GWAS: Genome-wide association study; SNP: Single nucleotide polymorphism; INDEL: Insertion and deletion; QTL: Quantitative trait locus; qPCR: Quantitative polymerase chain reaction Guangdong Province (2019BT02N630). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Availability of data and materials
The SNP genotyping data containing variant information for the U.S. (n = 3,770) and Canadian (n = 2,857) Duroc pigs are not publicly available because the genotyped animals belong to commercial breeding companies, but they can be obtained from the corresponding author under reasonable requirements. Pig genome (Sus scrofa 11.1), annotations (v103) and the accession numbers listed in Table S6 and Table S7 can be obtained from ENSEMBL (http://ftp.ensembl.org/pub/release-103/).

Declarations
Ethics approval and consent to participate The animals and experimental methods used in this study follow the guidelines of the Ministry of Agriculture of China and the Use Committee of South China Agricultural University (SCAU). The ethics committee of SCAU (Guangzhou, China) approved all animal experiments. The informed consent was obtained from Wens Foodstuff Group Co., Ltd. (Guangdong, China) to data collection. There was no use of human participants, data, or tissues.

Consent for publication
Not applicable.