Skip to main content

A genome-wide scan of copy number variants in three Iranian indigenous river buffaloes



In Iran, river buffalo is of great importance. It plays an important role in the economy of the Country, because its adaptation to harsh climate conditions and long productive lifespan permitting its farming across the Country and to convert low-quality feed into valuable milk. The genetic variability in Iranian buffalo breeds have been recently studied using SNPs genotyping data, but a whole genome Copy Number Variants (CNVs) mapping was not available. The aim of this study was to perform a genome wide CNV scan in 361 buffaloes of the three Iranian river breeds (Azeri, Khuzestani and Mazandarani) through the analysis of data obtained using the Axiom® Buffalo Genotyping Array 90 K.


CNVs detection resulted in a total of 9550 CNVs and 302 CNVRs identified in at least 5% of samples within breed, covering around 1.97% of the buffalo genome. and A total of 22 CNVRs were identified in all breeds and a different proportion of regions were in common among the three populations. Within the more represented CNVRs (n = 302) mapped a total of 409 buffalo genes, some of which resulted associated with morphological, healthy, milk, meat and reproductive traits, according to Animal Genome Cattle database.


This work provides a step forward in the interpretation of genomic variation within and among the buffalo populations, releasing a first map of CNVs and providing insights about their recent selection and adaptation to environment. The presence of the set of genes and QTL traits harbored in the CNVRs could be possibly linked with the buffalo’s natural adaptive history together to a recent selection for milk used as primary food source from this species.


The Asian water buffalo (Bubalus bubalis) and the African wild buffalo or Cape buffalo (Syncerus caffer) are the two main species of buffalo in the world [1]. The domestication of the Asian water buffalo with two subspecies i.e. the river (Bubalus bubalis bubalis 2n = 50) and the swamp (Bubalus bubalis carabanensis 2n = 48) buffalo, occurred about 3000–6000 years before present in the Indo–Pakistani area and in the vicinity of borders of China, respectively [2, 3]. River buffalo is common in India, Egypt, Southwest Asia and Europe, and swamp buffalo is common in China and Southeast Asia [3, 4].

According to the FAOSTAT data ( accessed 2020/10/08) the proportion of buffalo population in Iran respect to cattle species was 2.5%. A very similar data is reported by Beldman et al. [5] who analyzed the dairy farming sector in Iran indicating a proportion of 3% of buffalo heads over the total population of cattle.

In the developing countries including Iran, river buffalo breeding is nowadays recognized of great importance because of (i) the ability of buffalo to convert low-quality feed to valuable milk, (ii) adaptation to harsh climate conditions and resistance to local parasites, (iii) long productive lifespan [6], and (iv) their potential in milk and meat production performances [7]. As described by Safari et al. [7] the main breeding activities in Iranian Buffalos are carried by the Animal Breeding Center of Iran and envisage milk recording and genetic evaluation of reproducers. In Iran, there are three main buffalo breeds including Azeri (from the north-west and north), Khuzestani from west and south-west, and Mazandarani (from north) [4, 6]. The recently released buffalo SNP genotyping array has been reported as a suitable tool for studying genetic diversity of river buffalo breeds as well as a potential starting point for genome-wide association and genomic selection programs [1, 8]. The genetic variability in these three breeds have been recently studied using SNPs markers. Davoudi et al. [9] investigated the haplotypic structure and genetic diversity in Khuzestani river buffalo, while Mokhber et al. [4] evaluated the genetic structure of the Azeri and Khuzestani breeds to identify genomic regions associated to different environmental conditions and production goals. Additionally, Ghoreishifar et al. [6] identified ROH in the Azeri and Khuzestani breeds.

Copy number variants (CNVs) are a source of structural variability that have been utilized to identify genetic variability among breeds in several species. Redon et al. [10] defined a CNV as a DNA segment of one kilobase (kb) or larger that is present at a variable copy number in comparison with a reference genome. Several studies have been performed in different species, finding that CNVs are also related to phenotypic variability [11,12,13,14,15,16] as well as disease susceptibility [17, 18] describing up to 30% of the genetic variation in gene expression. Studies regarding the CNVs detection based on SNP chip data in buffalos were not available, consequently, the aim of this research was to perform a genome wide CNV mapping in samples of the three Iranian river buffalo breeds through the analysis of data obtained using the Axiom® Buffalo Genotyping Array 90 K.


The PCA and the FST results based on the SNPs genotypes confirms that the three populations are clearly differentiated (Fig. S1).

A total of 9550 CNVs (5154–53.97% – deletions and 4396–46.03% – gains) on all 24 autosomes were detected in the 361 samples (Supplementary Table S1). Table 1 reports the descriptive statistics for CNVs for each of the three buffalo populations, as well as the loss/gain ratio calculated considering the number of loss on number of gains. The largest loss/gain ratio was found in the MAZ breed (1.32) while the KHU showed a value close to the unit.

Table 1 Summary of statistics for CNV detected in the three Buffalo populations

The graphical representations of CNV statistics are shown for each breed in Fig. 1. In details, the relationship between CNV count and the averaged total length of CNV for each individual is shown in Fig. 1a The graphical distribution allowed to identify a similar pattern of distribution for the samples belonging to the three buffalo breeds with few individuals counting a low number of CNVs with a high average length. Even when CNVs are classified according to classes of length (5 classes as in Fig. 1b legend), the three populations showed similar structure in CNVs. An exception was observed for the proportion of CNVs for the longer class of length (> 500 Kb), resulted higher in MAZ respect those identified for AZE and KHU breeds.

Fig. 1
figure 1

Graphical representation of CNV statistics. a Relationship between number and mean total length (bp) of CNV identified in each sample: Blu dots – AZE breed, Green dots KHU breed, Red dots – MAZ breed; b Proportion of the 5 classes of CNV Length (Green > 500 Kb, Yellow 100–500 Kb, Grey 50–100 Kb, Orange 10–50 Kb, blue <10Kb) for each of the three breeds

Overlapping CNVs across samples within population were summarized into 1678, 1060, and 257 copy number variable regions (CNVRs) for AZE, KHU and MAZ, respectively (Table 2). The Supplementary Table S2 reports the list of CNVRs found in each population, together with the samples CNV count and states.

Table 2 Summary of all CNVRs identified in the three Buffalo populations. Min, Max and Mean length are expressed in base pair (bp)

The total number of regions identified, as reported in Table 1, is possibly linked to the sample’s population size, especially for singleton_CNVRs that in the MAZ breed result to be a very large proportion (73%) of identified CNVRs: it is more likely, in fact, to identify singleton_CNVRs in small populations as they are those not in common among individuals.

A total of 101, 133 and 68 CNVRs (n. 302 in total, 203 non redundant regions) are the regions defined by CNVRs mapped in at least 5% of samples (n. 12 – AZE, n. 5 – KHU, and n. 2 – MAZ). Non redundant regions include pop_CNVRs (i.e. non singleton regions within population) plus the common_CNVRs (i.e. common CNVRs across populations), counting these latter only once when found in more than one population. All the following statistics and graphical representation have been obtained using the pop_CNVRs, covering 28.40 Mb (1.14%), 29.91 Mb (1.21%), and 11,44 Mb (0.46%) of the buffalo autosomes total length, respectively.

For each population, a graphical representation of CNVRs frequencies on autosome is shown, together with the mean CNVRs coverage length (Fig. 2). The number of detected CNVRs among chromosomes is uneven, and no correlation between chromosomes length and mean CNVRs length resulted in these populations. The mean CNVRs length is not uniform along all chromosomes, mainly for MAZ breed. Two peaks (mean length) are evident for all breeds: on chromosomes 13 the mean CNVRs lengths were about five (AZE), and three (KHU and MAZ) times higher respect to the total mean length calculated for all autosomes. The second peak, found on chromosome 11, is lower respect to the previous one, but observable in all breeds.

Fig. 2
figure 2

Frequencies (columns) and mean length in Mb (line) of CNVRs (identified in more than 5% of samples) for each chromosome for each of the three breeds: Blue – AZE; Green – KHU; Red – MAZ

Within the 203 non redundant CNVRs a different proportion of common_CNVRs were observed among the three populations (Fig. 3): i) MAZ shared the 45.6% of AZE regions (corresponding to 30.7% of AZE regions); ii) MAZ shared the 42.6% of KHU regions (corresponding to the 23.3% of KHU regions); iii) AZE shared the 60.4% of KHU regions (corresponding to the 45.8% of KHU regions). Among the common_CNVRs, 22 are those identified in all three populations (Table 3).

Fig. 3
figure 3

a Venn diagram of pop_CNVRs and common_CNVRs and count of genes associated with QTL_Terms (Trait Type and Trait Class) for each buffalo breed (Blue – AZE; Green – KHU; Red – MAZ); b Proportion of QTL_Terms (trait class) and c proportion of QTL_term (trait type) for each of the three breeds (Blue – AZE; Green – KHU; Red – MAZ) referred to association studies available for bovine species (, accessed on 30 October 2020)

Table 3 Details of the 22 common_CNVRs identified in all three Buffalo breeds. Number of samples together with annotated genes and QTL_Terms (Trait Names) and IDs – traits are also reported. QTL are in concordance with Animal Genome Cattle QTL Database

The pop_CNVRs represented about half of total CNVRs number, except for AZE for which the common_CNVRs are more than twice that the pop_CNVRs.

A total of 234, 365, and 158 genes were annotated within AZE, KHU and MAZ CNVRs, respectively (corresponding to 409 non redundant buffalo genes) (Supplementary Table S3) and their functional classification according to DAVID database is reported in Supplementary Table S4 and in Table 4 (Nominal P-value < 0.10).

Table 4 Gene annotation according to DAVID Database (Functional Annotation Clustering tool) (P-Value < 0.10). Species used as genetic background: Bos taurus; list uploaded = n.409 genes; genes recognized = n.334). Bold = P-value < 0.05

The Animal Genome Cattle Database was accessed to reveal the potential relationships between CNVRs and QTL. As shows in Fig. 3, in Table 3, and in Table 5, 59 genes resulted associated with a total of 80 different “Trait Name”, grouped in 23 “Trait Type” (Anatomy, Blood parameters, Chemistry, Conformation, Disease, Fatness, Feed intake, Fertility, General reproduction parameters, Growth, Life history traits, Lifetime production, Limb traits, Mastitis, Milk composition – fat, Milk composition – protein, Milk processing trait, Milk composition – other, Milk yield, Organ disorders, Semen quality, Sensory characteristics, Udder traits) corresponding to six “Trait Class” (Exterior, Healthy, Meat and Carcass, Milk, Production, and Reproduction Traits) according to Animal Genome Database Cattle Traits nomenclature.

Table 5 Common and pop_CNVRs for which genes and QTL annotation were available. Complete list of common and pop_CNVRs and QTL_IDs (referring to Bos taurus species) are reported in Supplementary Table 3. QTL_Terms: Trait Names are grouped in Fig. 3 as Traits Type according to Animal Genome Cattle QTL Database. The Table does not include common_CNVRs reported in Table 3

Results of PCA showed a spatial distribution of samples (Fig. 4) due to the difference in CNV across the three pop_CNVRs. This is reflecting the overlapping among common_CNVRs and proprietary CNVRs or each breed as shown in Fig. 3 (A) in the Venn diagram.

Fig. 4
figure 4

CNVRs Principal Component Analysis (PCA) graph: Blue – AZE; Green – KHU; Red – MAZ


In Iran, breeding of river buffaloes plays an important role in the economy of the country, also impacting on social and cultural activities [7]. In this country the most common local buffalo breeds are the Azeri and Khuzestani. The Mazandrani breed is also farmed in Iran, but its numeric consistency of about 4000 individuals is lower respect to those reported for AZE and KHU breeds (i.e. 119,000 and 81,000 buffaloes, respectively) [4]. All these three breeds, different in morphology, have been undergone a different selection processes and are well adapted to different environments proper of the geographical areas of the country in which they are reared: Azari – the north-west-north of Iran (70%), Khuzestani – west and south-west (22%) of the Country while the Mazandarani is farmed in the north (8%) region (

The genetic variability in Iranian buffalo breeds have been recently studied and findings based on the Linkage Disequilibrium (LD) obtained from the Affymetrix 90 K SNP genotypes [19] showed a close genetic relation between AZE and KHU due to a high LD consistency across the two populations and a lower similarity when the comparison involved MAZ with both AZE and KHU due to variability in the LD trend within breed. These results are consistent with those found in an additional Principal Component Analysis (Supplementary Figure S1) based on SNPs genotypes showing a clear clustering of the three breeds. Additionally, the FST statistics here calculated as in [3], shows a differentiation among the three breeds with similar FST values: the lowest between AZE vs KHU (FST = 0.017 our study, FST = 0.021 [3]); an intermediate between AZE vs MAZ (FST = 0.045 our study, FST = 0.038 [3]); the highest between KHU and MAZ (FST = 0.058 our study, FST = 0.045 [3]).

To improve the knowledge on these populations, we investigated the genomic structure of the Iranian buffaloes through the analysis of the CNVs, in order to provide additional information that could be used for breeding and conservation programs of these populations, as concluded by Clop et al. [20].

A genome wide CNVs detection has been here performed resulting in a total of 9550 CNVs and 2.995 CNVRs in the three breeds. The AZE and KHU show about 50% of their CNVR as singleton, while MAZ breed 73%. The large singleton_CNVRs proportion for MAZ is most likely due to the reduced sample size that does not permit to identify a larger proportion of pop_CNVRs in this breed. Excluding singleton_CNVRs the CNVRs resulted 1605 in the three breeds. Among the 1605 pop_CNVRs, 302 were identified across breeds in at least 5% of individuals within breed, covering about 1.97% of the buffalo genome (calculated considering the no redundant 203 CNVRs). Out of these latter CNVRs, 22 (10.8%) are those shared by all populations, identifying a common genomic structural background of among these buffaloes. In fact, when considering the common_CNVRs counts (those shared among breeds), the highest number of common_CNVRs are found in AZE having the lowest proportion of pop_CNVRs (30.7%) respect to those identified for KHU (48.9%) and for MAZ (44.1%). MAZ breed shared a higher number of CNVRs with AZE respect to those resulted in common with KHU. This result suggest that the moderate level of admixture identified for MAZ and AZE by [6], is confirmed here and graphically visualizable in the PCA (Fig. 4), where MAZ buffaloes are mainly distributed close to AZE individuals. The PCA showed in Fig. 4 is also showing that CNVs are not the same across the three breeds as they do partially overlap: in fact only 22 CNVRs are in common among the three breeds, while 31, 65 and 30 were found only in the AZE, KHU and MAZ respectively, as showed by Venn diagram in Fig. 3. CNVs are in fact a different class of markers respect to SNP and it is expected that PCA is discriminating the population not in the same exact manner.

The most represented common_CNVR, defined by the CNVs identified in 209 buffaloes of all the three breeds (Table 3), mapped on chr13. It does not show a specific CNV pattern of state in a particular breed, i.e. only gain or only loss, but is having a complex behavior. A part of this region harbors 21 LOC gene IDs annotated consecutively over the genome (i.e. from 14,252,993 to 17,645,518 bp), all correspond to the same multidrug resistance-associated protein 4-like protein coding gene, also known as ATP-binding cassette sub-family C member 4 (ABCC4). This gene appears to be involved in several basic metabolic pathways including resistance/susceptibility to intestinal nematodes [21], feed efficiency [22], and marbling score [23]. Recently, in a study conducted on dairy buffaloes, for the ABCC4 gene it has been also highlighted its possible contribution on reproduction traits, resulting among the top genes associated with number of services per conception [24]. Additionally, the expression of ABCC4 gene increases in pregnant cattle and pigs’ endometrium and it could be important to support pregnancy given its role in prostaglandin efflux from cells [25, 26].

Other two common_CNVRs were defined by CNVs called in more than 100 individuals. The first region mapped on chr8 (at 38,899,124-39,035,018 pb) and the second one on chr16 (at 49,791,674-50,216,803 bp). Only this latter region, resulted complex in AZE and KHU and gain in MAZ, harbors genes (ABCC8, USH1C, MYOD1, OTOG, KCNC1, SERGEF), which all resulted close (i.e. surrounding 200 kb) to the significant SNPs affecting Nellore age at first calving [27]. Also, the MYOD1 gene is a component of myogenic regulatory factors involved in myoblast differentiation and in concordance with [28, 29] results, this gene may potentially affecting meat production.

The CNVRs in which genes or QTL were not annotated are those identified in lower number of individuals (within breed). For these samples the CNVs defining CNVRs also resulted with a large proportion of complex state (21% for AZE; 17% for KHU, and 3% for MAZ, excluding singleton_CNVRs.

Recent literature support that CNV may have a role in selection mechanism in addition to SNPs and that CNV changes may indicate that artificial selection may cause difference in genetics and phenotypes among breeds [30, 31]. We could speculate that the lack of a strong directional selection for a specific trait did not favor the increase in copies for specific genes as occurred in human [32], dogs [33] or polar bears [34], where the dietary shift produced an increase in DNA copy number where the AMY gene family was involved, an example of positive selection on CNV. In livestock populations where the directional selection is not focused towards a unique direction or for a specific trait, the proportion of complex CNVRs is comparable to the ones here found. This is the case of creole cattle in Mexico where the proportion of complex CNVRs was 16% [35] and of the Aosta Red Pied (Valdostana Red Pied) where the selection goal contemporarily focuses on milk, meat and adaptation to summer pasture practice, i.e. adaptation to harsh environment [36]. A further evidence is provided in the avian species as for strongly selected populations, as in chicken, where a very low proportion of complex CNVRs, 0 to 5% was found, while in a the non-selected Mexican creole poultry population the proportion of complex CNVRs was up to 14% [14].

Within the CNVRs here found in gain in state (i.e. defined by CNVs all duplicated), several annotated genes with a well-known associated phenotype in cattle or in other species were found in addition to those reported in Tables 3 and 5. Regarding the PCSK6 gene there are evidences of an association with follicle development in human [37], while for GALNTL6 with feed efficiency and growth traits in cattle [38], with saturated fatty acids profile in intramuscular fat of the longissimus thoracis muscle of Nellore [39], and with cow and heifer conception rate [40]. For the same breeds, gain regions also harbored GALNT6 and FYN genes, both involved in reproduction traits in cattle ( and in mouse [41], respectively. The FYN gene, considered by [42] as thermotolerant gene, seems to have a role in cow conception and early embryo development in cattle. Finally, PCSK6 and SNRPA1, together with PLXNA2, mapping within an AZE gain CNVR, resulted lying within a positive selection signature region identified using SNP as genetic markers in creole breeds [30]. The PLXNA2 gene has been also associated with cattle temperament [43]. For the MAZ breed, there are few gain CNVRs harboring genes: this can be related to the very low number of samples available in this study. Among these regions, the one located on chr1 overlaps EPHB3, a gene resulted associated with muscling at weaning (MW) and muscling at yearling (MY) in Bos indicus populations [44]. Also, EPHB3 maps in a selection signature region identified by VST analysis based on CNVs performed through a comparison between Valdostana Red Pied vs Italian Brown Swiss, a double proposal and a dairy cattle breed, respectively [36].

According to the annotation analysis performed with DAVID Database, buffalo CNVRs are enriched in genes (n = 334 recognized IDs) mainly involved general biological processes (Supplementary Table S3). Also, a total of 80 different “QTL_Trait-Terms” associated with 59 genes have been identified and classified in 6 major QTL trait categories (Fig. 4 and Tables 3 and 4), of which the most represented are Milk (i.e. Milk composition – fat and –protein) and Production Traits (i.e. Growth) (Tables 3 and 4, Supplementary Table S3). We did not observe a prevalence of a particular QTL_Trait Term (taking in to account the differences in CNVRs and annotated genes counts) in one of the three breeds, except for AZE – General Milk– and for MAZ – Reproductive QTL_Trait Terms.


The knowledge of genomic variation in the water buffalo species is still very limited and most of the recent findings still rely on comparison with cattle species. This work provides a step forward in the biological interpretation of genomic variation in the buffalo species. As the CNVs are known to be mostly non-neutral markers, these results may contribute to interpret genomic variation within and among the buffalo populations, that can be used to provide insights about their recent selection and adaptation to environment.

We may speculate that the presence of the set of genes and QTL traits harbored in the CNVRs here mapped could be linked with the buffalo’s natural adaptive history, i.e. to their ability to adapt to diverse and severe environmental conditions (different for AZE and MAZ respect to KHU) and may be occurred because these bovid in recent time have started a selection program for milk yield, that is a primary food source from this species.



A total of 384 Iranian Buffalo raw genotyping data (i.e. cel.files) obtained using the Axiom® Buffalo Genotyping Array 90 K (Affymetrix, Santa Clara, CA, USA) were available for the three breeds [4, 6, 9] and utilized as input files in order to obtain the Log R Ratio (LRR) values for each sample. As extensively described in [4] and in [19], the SNPs data comes from populations sampled with the aim to be representative of each breed. The AZE breed (252 samples) was sampled in the East and West-Azarbaijan, Ardebil and Gilan provinces, the KHU (110 samples) in Khuzestan and Kermanshah provinces, and the MAZ (22 samples) from Miankaleh wildlife sanctuary of Mazandaran province.

A quality control of raw intensity files using the standard protocol in the Affymetrix Power Tools package ( was performed in order to guarantee a high quality of obtained data. Individuals with a value of call rate less than 97% and Dish Quality Control less than 82% were removed. After quality control a total of 9 low quality samples have been identified and not used.

The marker positions of the Genotyping Array 90 K array were recently updated and based on the newly released University of Adelaide water buffalo assembly (UOA WB v. 1;

A total of 35,114 SNPs was filtered out, in part for their undefined chromosomal locations, and in part because a proportion of SNPs had more than one nearby probeset: for these latter the Axiom Analysis suite picked the best performing probeset per each SNP. A total of 70,230 SNPs was then retained in the analysis mapped on the Bubalus bubalis (UOA_WB_1) genome assembly.

To confirm the clustering of individuals to the three populations, the genetic diversity within and among breeds was explored using SNP genotypes by Principal Component Analysis (PCA) and by the pairwise Fixation Index obtained according to the pipelines of Golden Helix (SVS) 8.8.4 software (Golden Helix Inc., Bozeman, MT, USA).

This study did not require approval from the Animal Care and Use Committee as we use already available data obtained in previous researches compliant [4, 6, 9] with rules and regulations for animal sampling.

CNVs, CNVRs detection and subsequent analyses

The CNV detection was performed on the 24 autosomes, using the Copy Number Analysis Module (CNAM) of SVS (Golden Helix, Bozeman, MT, USA), through the univariate analysis based on LRR values obtained using the Axiom® CNV Summary Tool software ( A quality assurance of LRR raw data and filtering of outlier samples was performed before CNV calling with the SVS software through: i) the overall distribution of derivative log ratio spread (DLRS) values as described by [45]; ii) the GC-wave factor (GCWF) that measures the GC-content causing the fluctuation of a signal intensity file [46]. A total of 14 samples were excluded during the quality assurance because of their high DLRS and GCWF values. Consequently, the CNV mapping was performed on a final dataset of 361 samples (n. 242 – AZE, n. 100 – KHU, and n. 19 – MAZ) including as parameters in CNAM: maximum 100 segments per 10,000 markers; minimum of 3 marker per segment; 2000 permutations per pair with a p-value cut off of 0.005.

CNV regions (CNVRs) at population level were obtained by merging CNVs that overlapped by at least 1 bp in at least two animals using the -mergeBed command of Bedtools [47]. CNVRs were then cataloged as gain, loss and complex (i.e. CNVRs comprising both gain and loss CNVs) regions. A CNV identified in only one individual was classified as singleton_CNVR. Finally, the “-intersectBed” command of Bedtools software was employed to catalogue CNVRs as “pop_CNVRs” and “common_CNVRs” if they have been mapped in only one population (no intersection among CNVRs identified in more than one breed) or they resulted in common among breeds (part – at least 50% – or full overlapping), respectively. Only CNVRs found in at least 5% of the individuals of a breed were considered to infer statistics at population level and for the gene annotation.

The complete list of buffalo protein coding genes was downloaded from NCBI online Database (!/proteins/791/374666|Bubalus%20bubalis/). Genes with official “gene name ID” and LOC genes associated with a protein coding gene name (excluding uncharacterized ones) were annotated within the detected CNVRs using the Bedtools “-intersectBed” command. Gene Ontology terms (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were performed using the DAVID Bioinformatic Database (

As the Quantitative Trait Loci (QTL) database for the buffalo species is not available, the QTL associated to the genes here found in the CNVRs were identified in the cattle QTL database (QTLdb: by gene name, using the “Search by associated gene” option of QTLdb.

In order to disclose diversification of the three buffalo breeds based on CNVs, a Principal Component Analysis (PCA) using Past software [48]. Sample-CNV genotypes were coded as “deletion”, “duplication”, and “normal” states for each of the identified CNVRs and used in the PCA analysis.

Availability of data and materials

The datasets supporting the results and conclusions of our study are included within the article and in the additional files.



Single Nucleotide Polymorphisms


Quantitative Trait Loci


Copy Number Variants


Copy Number Variant regions


Azeri breed


Khuzestani breed


Mazandrani breed


Principal Component Analysis


Linkage Disequilibrium


  1. Iamartino D, Nicolazzi EL, Van Tassell CP, Reecy JM, Fritz-Waters ER, Koltes JE, et al. Design and validation of a 90K SNP genotyping assay for the water buffalo (Bubalus bubalis). PLoS One. 2017;12(10):e0185220.

  2. Yindee M, Vlamings BH, Wajjwalku W, Techakumphu M, Lohachit C, Sirivaidyapong S, et al. Y-chromosomal variation confirms independent domestications of swamp and river buffalo. Anim Genet. 2010;41(4):433–5.

    CAS  Article  PubMed  Google Scholar 

  3. Colli L, Milanesi M, Vajana E, Iamartino D, Bomba L, Puglisi F, et al. New insights on water Buffalo genomic diversity and post-domestication migration routes from medium density SNP Chip data. Front Genet. 2018;9:53.

  4. Mokhber M, Moradi-Shahrbabak M, Sadeghi M, Moradi-Shahrbabak H, Stella A, Nicolzzi E, et al. A genome-wide scan for signatures of selection in Azeri and Khuzestani buffalo breeds. BMC Genomics. 2018;19(1):449.

  5. Beldman A, Van Berkum S, Kortstee H, Zijlstra J. Dairy farming and dairy industry in Iran. Wageningen Economic Research; 2017.

    Book  Google Scholar 

  6. Ghoreishifar SM, Moradi-Shahrbabak H, Fallahi MH, Jalil Sarghale A, Moradi-Shahrbabak M, Abdollahi-Arpanahi R, et al. Genomic measures of inbreeding coefficients and genome-wide scan for runs of homozygosity islands in Iranian river buffalo, Bubalus bubalis. BMC Genet. 2020;21(1):16.

  7. Safari A, Hossein-Zadeh NG, Shadparvar AA, Arpanahi RA. A review on breeding and genetic strategies in Iranian buffaloes (Bubalus bubalis). Trop Anim Health Prod. 2018;50(4):707–14.

    Article  PubMed  Google Scholar 

  8. Ghoreishifar SM, Moradi-Shahrbabak H, Moradi-Shahrbabak M, Nicolazzi EL, Williams JL, Iamartino D, et al. Accuracy of imputation of single-nucleotide polymorphism marker genotypes for water buffaloes (Bubalus bubalis) using different reference population sizes and imputation tools. Livest Sci. 2018;216:174–82.

  9. Davoudi P, Moradi-Shahrbabak H, Mehrabani-Yeganeh H, Ghoreishifar SM, Gholami S, Abdollahi-Arpanahi R. Exploring the structure of Haplotype blocks, runs of Homozygosity and effective population size in Khuzestani River Buffalo. Slovak J Anim Sci. 2020;53:67–77.

  10. Redon R, Ishikawa S, Fitch K, Feuk L, Perry G, Andrews T, et al. Global variation in copy number in the human genome. Nature. 2006;444(7118):444–54.

  11. Genova F, Longeri M, Lyons LA, Bagnato A, Gandolfi B, Aberdein D, et al. First genome-wide CNV mapping in FELIS CATUS using next generation sequencing data. BMC Genomics. 2018;19(1):895.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  12. Guan D, Martínez A, Castelló A, Landi V, Luigi-Sierra MG, Fernández-Álvarez J, et al. A genome-wide analysis of copy number variation in Murciano-Granadina goats. Genet Sel Evol. 2020;52(1):44.

  13. Schiavo G, Dolezal MA, Scotti E, Bertolini F, Calò DG, Galimberti G, et al. Copy number variants in Italian large white pigs detected using high-density single nucleotide polymorphisms and their association with back fat thickness. Anim Genet. 2014;45(5):745–9.

  14. Gorla E, Cozzi MC, Román-Ponce SI, Ruiz López FJ, Vega-Murillo VE, Cerolini S, et al. Genomic variability in Mexican chicken population using copy number variants. BMC Genet. 2017;18(1):61.

  15. Lee Y-L, Bosse M, Mullaart E, Groenen MAM, Veerkamp RF, Bouwman AC. Functional and population genetic features of copy number variations in two dairy cattle populations. BMC Genomics. 2020;21(1):89.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  16. Solé M, Ablondi M, Binzer-Panchal A, Velie BD, Hollfelder N, Buys N, et al. Inter- and intra-breed genome-wide copy number diversity in a large cohort of European equine breeds. BMC Genomics. 2019;20(1):759.

  17. Durán Aguilar M, Román Ponce SI, Ruiz López FJ, González Padilla E, Vásquez Peláez CG, Bagnato A, et al. Genome-wide association study for milk somatic cell score in Holstein cattle using copy number variation as markers. J Anim Breed Genet. 2017;134(1):49–59.

  18. Schurink A, da Silva VH, Velie BD, Dibbits BW, Crooijmans RPMA, Franҫois L, et al. Copy number variations in Friesian horses and genetic risk factors for insect bite hypersensitivity. BMC Genet. 2018;19(1):49.

  19. Mokhber M, Shahrbabak MM, Sadeghi M, Shahrbabak HM, Stella A, Nicolzzi E, et al. Study of whole genome linkage disequilibrium patterns of Iranian water Buffalo breeds using the axiom buffalo genotyping 90K Array. PLoS One. 2019;14(5):e0217687.

  20. Clop A, Vidal O, Amills M. Copy number variation in the genomes of domestic animals. Anim Genet. 2012;43(5):503–17.

    CAS  Article  PubMed  Google Scholar 

  21. Liu GE, Brown T, Hebert DA, Cardone MF, Hou Y, Choudhary RK, et al. Initial analysis of copy number variations in cattle selected for resistance or susceptibility to intestinal nematodes. Mamm Genome. 2011;22(1-2):111–21.

  22. Chen Y, Arthur PF, Barchia IM, Quinn K, Parnell PF, Herd RM. Using gene expression information obtained by quantitative real-time PCR to evaluate Angus bulls divergently selected for feed efficiency. Anim Prod Sci. 2012;52(11):1058–67.

    CAS  Article  Google Scholar 

  23. Feitosa FLB, Pereira ASC, Venturini GC, Tonussi RL, Espigolan R, Gordo DM, et al. Genome wide association study between copy number variation regions with marbling score in Nelore cattle. In: Proceedings, 10th World Congress of Genetics Applied to Livestock Production. Vancouver; 2014.

  24. de Camargo GMF, Aspilcueta-Borquis RR, Fortes MRS, Porto-Neto R, Cardoso DF, Santos DJA, et al. Prospecting major genes in dairy buffaloes. BMC Genomics. 2015;16(1):872.

  25. Spencer TE, Forde N, Dorniak P, Hansen TR, Romero JJ, Lonergan P. Conceptus-derived prostaglandins regulate gene expression in the endometrium prior to pregnancy recognition in ruminants. Reproduction. 2013;146(4):377–87.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  26. Seo H, Choi Y, Shim J, Yoo I, Ka H. Prostaglandin transporters ABCC4 and SLCO2A1 in the uterine endometrium and conceptus during pregnancy in pigs. Biol Reprod. 2014;90(5):100–1.

    CAS  Article  PubMed  Google Scholar 

  27. Mota LFM, Lopes FB, Júnior GAF, Rosa GJM, Magalhães AFB, Carvalheiro R, et al. Genome-wide scan highlights the role of candidate genes on phenotypic plasticity for age at first calving in Nellore heifers. Sci Rep. 2020;10:1–13.

    Article  Google Scholar 

  28. Zhou D, Xu H, Chen W, Wang Y, Zhang M, Yang T. Study on the transcriptional regulatory mechanism of the MyoD1 gene in Guanling bovine. RSC Adv. 2018;8(22):12409–19.

    CAS  Article  Google Scholar 

  29. Dong X, Cao H, Mao H, Hong Q, Yin Z. Association of MyoD1 gene polymorphisms with meat quality traits in domestic pigeons (Columba livia). J Poult Sci. 2019;56(1):20–6.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  30. O’brien AMP, Utsunomiya YT, Mészáros G, Bickhart DM, Liu GE, Van Tassell CP, et al. Assessing signatures of selection through variation in linkage disequilibrium between taurine and indicine cattle. Genet Sel Evol. 2014;46:1–14.

    Article  Google Scholar 

  31. Cardoso DF, de Albuquerque LG, Reimer C, Qanbari S, Erbe M, Do Nascimento AV, et al. Genome-wide scan reveals population stratification and footprints of recent selection in Nelore cattle. Genet Sel Evol. 2018;50:1–12.

  32. Perry GH, Dominy NJ, Claw KG, Lee AS, Fiegler H, Redon R, et al. Diet and the evolution of human amylase gene copy number variation. Nat Genet. 2007;39(10):1256–60.

  33. Axelsson E, Ratnakumar A, Arendt M-L, Maqbool K, Webster MT, Perloski M, et al. The genomic signature of dog domestication reveals adaptation to a starch-rich diet. Nature. 2013;495(7441):360–4.

  34. Rinker DC, Specian NK, Zhao S, Gibbons JG. Polar bear evolution is marked by rapid changes in gene copy number in response to dietary shift. Proc Natl Acad Sci U S A. 2019;116(27):13446–51.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  35. Cozzi MC, Martinez-Ruiz CP, Roman-Ponce SI, Murillo VEV, Utrera ÁR, Montaño-Bermúdez MM, et al. Copy number variants reveal genomic diversity in a Mexican creole cattle population. Livest Sci. 2019;229:194–202.

  36. Strillacci MG, Gorla E, Cozzi MC, Vevey M, Genova F, Scienski K, et al. A copy number variant scan in the autochthonous Valdostana red pied cattle breed and comparison with specialized dairy populations. PLoS One. 2018;13:1–18.

    Article  Google Scholar 

  37. Wang Y, Wang X-H, Fan D-X, Zhang Y, Li M-Q, Wu H-X, et al. PCSK6 regulated by LH inhibits the apoptosis of human granulosa cells via activin a and TGFb2. J Endocrinol. 2014;222(1):151–60.

  38. Seabury CM, Oldeschulte DL, Saatchi M, Beever JE, Decker JE, Halley YA, et al. Genome-wide Association Study for Feed Efficiency and Growth Traits in U.S. Beef Cattle. BMC Genomics. 2017;18:386.

    Article  Google Scholar 

  39. Lemos MVA, Chiaia HLJ, Berton MP, Feitosa FLB, Aboujaoud C, Camargo GMF, et al. Genome-wide association between single nucleotide polymorphisms with beef fatty acid profile in Nellore cattle using the single step procedure. BMC Genomics. 2016;17(1):213.

  40. Parker Gaddis KL, Null DJ, Cole JB. Explorations in genome-wide association studies and network analyses with dairy cattle fertility traits. J Dairy Sci. 2016;99(8):6420–35.

    CAS  Article  PubMed  Google Scholar 

  41. McGinnis LK, Kinsey WH, Albertini DF. Functions of Fyn kinase in the completion of meiosis in mouse oocytes. Dev Biol. 2009;327(2):280–7.

    CAS  Article  PubMed  Google Scholar 

  42. Sigdel A, Liu L, Abdollahi-Arpanahi R, Aguilar I, Peñagaricano F. Genetic dissection of reproductive performance of dairy cows under heat stress. Anim Genet. 2020;51(4):511–20.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  43. Gutiérrez-Gil B, Ball N, Burton D, Haskell M, Williams JL, Wiener P. Identification of quantitative trait loci affecting cattle temperament. J Hered. 2008;99(6):629–38.

  44. Zhou Y, Utsunomiya YT, Xu L, Abdel HEH, Bickhart DM, Alexandre PA, et al. Genome-wide CNV analysis reveals variants associated with growth traits in Bos indicus. BMC Genomics. 2016;17:419.

    Article  Google Scholar 

  45. Pinto D, Darvishi K, Shi X, Rajan D, Rigler D, Fitzgerald T, et al. Comprehensive assessment of array-based platforms and calling algorithms for detection of copy number variants. Nat Biotechnol. 2011;29(6):512–20.

  46. Diskin SJ, Li M, Hou C, Yang S, Glessner J, Hakonarson H, et al. Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms. Nucleic Acids Res. 2008;36(19):e126.

  47. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  48. Hammer Ø, Harper DAT, Ryan PD. PAST: paleontological statistics software package for education and data analysis. Palaeontol Electron. 2001;4:9.

    Google Scholar 

Download references


The authors gratefully acknowledge the Animal Breeding Center of Iran (ABCI) and Towsee Kesht va Dam Noandish Alborz Co (Takdna) for access to the animals and recording. The Author also acknowledge Dr. Davide Passaro for his technical support in data management.

Online database!/proteins/791/374666|Bubalus%20bubalis/ accessed 2020/10/08.


This research did not receive external funding.

Author information

Authors and Affiliations



MGS and AB: Conceived the experiment, run the data analyses, interpreted the results and wrote the manuscript; HMS: provided the genotypes; PD, SMG, MM and HMS collaborated in interpreting the results and in writing the manuscript. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Hossein Moradi-Shahrbabak.

Ethics declarations

Ethics approval

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplementary Table S1

List of CNVs identified for each Iranian river buffalo breed.

Additional file 2: Supplementary Table S2

List of all CNVRs identified for each Iranian river buffalo breed.

Additional file 3: Supplementary Table S3

List of 203 non redundant CNVRs. Samples count for each breed, genes and QTL are also reported.

Additional file 4: Supplementary Table S4

Gene annotation according to DAVID Database.

Additional file 5: Supplementary Figure S1

PCA and FST based on SNP genotypes.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Strillacci, M.G., Moradi-Shahrbabak, H., Davoudi, P. et al. A genome-wide scan of copy number variants in three Iranian indigenous river buffaloes. BMC Genomics 22, 305 (2021).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI:


  • Water Buffalo
  • Structural variation
  • Copy number variant
  • Biodiversity