- Research article
- Open Access
Prospecting major genes in dairy buffaloes
BMC Genomicsvolume 16, Article number: 872 (2015)
Asian buffaloes (Bubalus bubalis) have an important socio-economic role. The majority of the population is situated in developing countries. Due to the scarce resources in these countries, very few species-specific biotechnology tools exist and a lot of cattle-derived technologies are applied to buffaloes. However, the application of cattle genomic tools to buffaloes is not straightforward and, as results suggested, despite genome sequences similarity the genetic polymorphisms are different.
The first SNP chip genotyping platform designed specifically for buffaloes has recently become available. Herein, a genome-wide association study (GWAS) and gene network analysis carried out in buffaloes is presented. Target phenotypes were six milk production and four reproductive traits. GWAS identified SNP with significant associations and suggested candidate genes that were specific to each trait and also genes with pleiotropic effect, associated to multiple traits.
Network predictions of interactions between these candidate genes may guide further molecular analyses in search of disruptive mutations, help select genes for functional experiments and evidence metabolism differences in comparison to cattle. The cattle SNP chip does not offer an optimal coverage of buffalo genome, thereafter the development of new buffalo-specific genetic technologies is warranted. An annotated reference genome would greatly facilitate genetic research, with potential impact to buffalo-based dairy production.
Asian buffaloes are a livestock species with a high socio-economic importance and promising characteristics for production. The species is mostly found in developing countries integrating production system by providing meat and milk to local communities. Asian buffaloes are also used as draught animals. In developed countries, such as Italy, the buffalo population is selected for dairy production, specially the production of mozzarella cheese, the most famous trademark product. Buffalo milk has high fat content and solids concentration and these intrinsic characteristics are favourable for cheese manufacturing.
There are two species of domesticated buffaloes: river or water buffalo (Bubalus bubalis, 2n = 50) and swamp buffalo (Bubalus carabanesis, 2n = 48)  . The genetic difference is marked by a fusion of chromosomes 4 and 9 in swamp buffalo. The first cross between the two species produces fertile offspring, but fertility is reduced in subsequent crosses . In comparison to cattle, buffalo metacentric chromosomes (five) are a fusion of two cattle acrocentric chromosomes and the other chromosomes conserve a high homology between species .
There are two water buffalo genomes sequenced [4, 5]. Both of them have the sequences available at NCBI platform in scaffolds. However, the sequences are not displayed in chromosomes and genes are not annotated (UMD_CASPUR_WB_2.0; http://www.ncbi.nlm.nih.gov/assembly/GCF_000471725.1/). In an effort to generate a reference set to aid polymorphism discovery and gene annotation of the buffalo genome, RNA from 30 different tissues was extracted and sequenced . The lack of buffalo genomic data means that researchers need to refer to a “next of kin” species: cattle.
Cattle and buffalo species are in a close evolutionary relationship and the cattle genome is far better characterised than buffalo. An initial genome maps for buffaloes using cattle-derived markers and possible rearrangements were identified between species . Positional candidate genes and physical mapping were generated giving a better understanding of buffalo genome structure.
More recently, a cattle SNP chip was applied to characterize buffalo genome. Genotyping 10 buffaloes with a 54 k cattle SNP chip (Illumina) and found that ~80 % of the SNPs were successfully genotyped, but only ~2 % (1,159) were segregating in the population . This result indicates that genome sequences are conserved between the species but not necessarily the polymorphisms. The authors also identified that the SNPs genotyped are not equally distributed in the buffalo genome. There are some SNP-rich and some regions with poor SNP coverage, and therefore the cattle SNP chip does not offer an optimal coverage of buffalo genome. Genotyping 384 buffaloes using 777 k cattle SNP chip (Illumina) and showed that ~88 % of the SNPs were genotyped in buffaloes, but also only ~2 % (16,580) of the SNPs were segregating . In a linkage disequilibrium study, these authors reported a mean value of r2 equal to 0.28 indicating that these SNP could be used for genomic selection and SNP association analyses. Studies that used cattle SNP chips did identify SNP associated with production and reproductive traits in buffaloes; using the 54 k cattle SNP chip  and a 777 k cattle SNP chip . However, given that only ~2 % of the SNP in these cattle chips was segregating in buffaloes, a species specific SNP chip would be more informative. Importantly, SNPs present in the cattle chips that segregate in buffaloes are probably “old” polymorphisms, existent before the speciation event that separated cattle from buffaloes. Old polymorphisms might not be appropriate to study the result of artificial selection in dairy buffaloes. Based on this market necessity, a commercial buffalo SNP chip array was recently released (Axiom ® Buffalo Genotyping Array 90 K Affymetrix). The selection of SNP included in the chip array is based on buffalo sequencing data (Affymetrix), but SNP position and annotation to genes used the cattle genome as a reference (UMD3.1 assembly). Due to the fact that this is the most appropriate tool available, it was used in the present study.
The aim of the study was to identify SNPs, genomic regions and genes that affect production and reproductive traits. To this aim genome-wide association analyses and gene network predictions were carried. Gene network analyses aid the identification of genes that have pleiotropic effects and/or regulatory roles . The genes identified might be candidates for future fine-mapping studies in search of causative mutations. The interpretation of results herein might also trigger genome structure, metabolism and physiology comparisons between species, supporting evolutionary studies.
Animal ethics committee approval was not required for the present study. The data and samples used here were obtained from an existent databank of the Animal Science Department from São Paulo State University (Unesp), Jaboticabal-SP, Brazil. The department is responsible for the Milk-Recording Buffalo Program. The farmers gently contribute with phenotypes, pedigree information and samples of the animals.
Six production traits and four reproductive traits were targeted. The production traits were: milk production (MP), fat production (FP), protein production (PP), fat percentage (%F), protein percentage (%P) and somatic cell score (SCS). The reproductive traits were: age at first calving (AFC), calving interval (CI), open days (OD) and number of services per conception (NSC). The data analyzed was based on 11,530 lactations of 3,431 buffaloes, monthly recorded from 1995 to 2013. Murrah buffaloes were from 12 farms with 186 sires with registered daughters. The final pedigree archive had 14,346 animals. The structure of the data is presented in Table 1. The SCS doesn’t have normal distribution and it was transformed to the log scale, using the function: SCS = (log2(CCS/100.000)) + 3 .
Records were obtained apart from the fifth production day in milk. First five day of colostrum production were not considered. Only lactations longer than 90 days were used in the analyses. The cumulative milk production over 305 days (MP), fat production (FP) and protein production (PP) were calculated apart from the production in the milk-recording day. The %F, %P and SCS were the monthly record means per lactation. The age at first calving (AFC) was defined as the difference, in months, between the first calving and the birth of the buffalo. The calving interval (CI) was defined as the difference, in months, between consecutive calving events. The number of services per conception (NSC) is the number of artificial inseminations per conception for each buffalo. The open day (OD) is the difference, in days, between the calving and the subsequent conception. The contemporary group (CG) was formed by herd, year and calving season (October-March and April-September) for all the traits, except CI and by herd, year and birth season for CI. Each CG had at least four animals and trait records between ±3.5 standard-deviations of the group mean.
Breeding value prediction
A repeatability animal model was used for the genetic analyses of all traits except AFC. For AFC, an animal model without repetition was used, because this trait can only be measured once. Variance components were estimated by Restricted Maximum Likelihood method (REML) using the Wombat software . The model included the fixed effect of CG, age fitted as a co-variable (age of buffalo at calving, linear and quadratic) (except for AFC) and the random effects of additive genetic value, permanent environment (except for AFC) and residual. Fitted model scan be represented in matrix notation:
Where β, a, p and e are the vectors of fixed effects, additive genetic value, permanent environment and residual, respectively; X , Z and W are the incidence matrix of fixed effects, additive genetic value and permanent environment. A brief report of the parameter estimates (heritability, genetic and phenotypic correlations) was included (Tables 2 and 3). The genetic values and their accuracies were obtained (Table 4), de-regressed as was proposed by  and used as pseudo-phenotypes for GWAS.
Genotyping and quality control
A total of 452 buffaloes (57 sires and 395 dams) were genotyped using the 90 K Axiom ® Buffalo Genotyping (Affymetrix). The animals genotyped were the ones with the best accuracies. The sires have at least 40 progenies and the dams at least three calvings and many of the dams are mothers of sires used in the herds. Initially, the SNP chip contained 92,826 markers. Sample quality control observed the call rate of 0.95 and above, and a heterozygozity of ± 3 standard-deviation of the mean. For SNPs quality control, thresholds were set for call rate (superior to 0.98), Hardy-Weinberg equilibrium (P-value test less than 10−6), and correlation between markers (if higher than 0.998 one SNP of the correlated pair was excluded). Also, coincident SNPs were eliminated. Minor allelic frequency was not used to discard markers. Some SNPs were genotyped twice when there was a probe in each strand for the same SNP. In the case of coincident SNP, the probe with the most animals genotyped was used. Markers present in Y chromosome and mitochondrial DNA were discarded. Markers in X chromosome were considered. The males have only one X chromosome, so they are always homozygous for the markers (0 or 2), females have two, so they were codified like the autosomes (0, 1 or 2). After the quality control, the number of SNPs retained for association analysis was 61,145.
Genome-wide association study
In order to associate SNPs with the de-regressed breeding values (DEBV) of the studied traits, a mixed linear model was implemented using R software and GenABEL package . The DEBV have information of the record of the animal genotyped as well as from their relatives. The reliability (source of information) varies among the animals, so the DEBV have heterogeneous variances corrected by the residual weights as proposed by . The model implemented was:
Where y is the vector of the DEBVs, X is the vector of the genotypes in the locus being tested, β is the fixed additive genetic value attributed to the locus, μ is the vector of the polygenic with normal distribution μ ~ N(0, Gσ u 2 ) and ε is the vector of the residual error with normal distribution ε ~ N(0, Iσ e 2 ).
The pedigree relationship matrix based on pedigree, G, describes the relation of the whole genome among the individuals, since it is estimated based on alleles identical by state (IBS) of the markers. The parameters σ u 2 and σ e 2 were estimated using Restricted Maximum Likelihood (REML) method for each SNP. The generalized least square (GLS) was used to estimate the β effects using the F test for the null hypothesis H 0 : β = 0.
Where y is the vector of the DEBVs, x is the design matrix, β is the vector of coefficients of the regression on recoded SNP genotypes; z is the incidence matrix for animal effects; μ ~ N(0, Aσ a 2 ) is a vector of the polygenic animal effects and e ~ (0, Iσ e 2 ) is the vector of residuals, in which A is an additive genetic relationship matrix of animals and I is an identity matrix, and σ a 2 and σ e 2 are the animal’s additive polygenic variance and residual error variance, respectively. SNP allele substitution fixed effects (β) and random background polygenic effects were evaluated in this model. Values in the design matrix, x, were coded as 0, 1, 2 for the SNP genotypes, representing the number of copies of the minor allele carried by the individual. The parameters and were estimated using Restricted Maximum Likelihood (REML) method for all SNP. The generalized least square (GLS) was used to estimate the β effects using the F test for the null hypothesis H 0 : β = 0.
Subsequently, a Wald chi-square statistics was used to determine if the SNP was associated with the traits studied .
The percentage of the phenotypic variance (Vp) explained by each SNP was estimated according to the equation:
α = allelic substitution effect
p = allelic frequency for ith observed SNP in the population
σ p 2 = Phenotypic variance estimate of the trait
Multi-trait analysis, pleiotropic effects and gene network prediction
The association weight matrix (AWM) methodology  was adapted and used to build a gene network from GWAS output data. In the original description of AWM a key trait is selected to weight network predictions. In this study, the main idea was to identify genes that equally contribute to the variation observed in all ten traits studied, as these pleiotropic SNPs might be more useful for genetic evaluation in buffaloes. In this context, it was used the methodology described by : instead of using SNPs P-values, t-values calculated served to ground gene network predictions, (t ≥ 2.80 ≈ p ≤ 0.05). These statistics determine the importance of the SNPs across the traits and are interpreted as a measure of pleiotropic effect. All the SNPs were used in the analysis regardless of their location. Normally in AWM, SNP-to-gene distances are considered prior to construction of gene networks. However, in this study, inclusion of all SNPs was preferred since genotyped SNPs were buffalo variants with locations annotated in the cattle genome (precise SNP-to-gene distances are actually unknown).
To identify significant SNP-SNP interactions we used the partial correlation and information theory (PCIT) algorithm . Pairwise correlations across matrixrows were used to predict SNP-SNP interactions and hence build a genenetwork . The SNP pairs significantly co-associated and with correlation higher than 0.85 had an edge (connection) established in the gene network, which was visualized using the Cytoscape software  and MCode App . In the network, every SNP was a node and every significant interaction was an edge connecting two nodes. When a SNP was next to a gene (Variant Effect Predictor default), the gene ID was included in the network.
Identification of SNP location and gene enrichment
Variant Effect Predictor (VEP) from Ensembl website was used to verify if the significant SNP was near a gene and determine the distance. Analyses were done using the cattle genome.
Gene ontology (GO) enrichment analyses were carried using Gorilla tool (http://cbl-gorilla.cs.technion.ac.il/) to aid interpretation of GWAS results. The top genes associated with the traits were compared to a genome-wide background gene list. Top genes were defined as genes with a SNP whose P < 0.001 (distance of the SNP to gene determined by VEP default). These GO enrichment analysis were carried for each trait separately.
Results and Discussion
Single-trait-single-SNP GWAS was carried for six milk production and four reproductive traits in a population of dairy buffaloes. These GWAS in buffaloes used a specific SNP-chip designed for the species. Although the selection of SNP included in the chip array is based on buffalo sequencing data, SNP position and annotation to genes used the cattle genome as a reference (UMD3.1 assembly) because there is no public complete genome reference available for buffalos. The numbers of significant SNP were similar between traits, within significance thresholds (Table 5). They were also similar in number to those obtained by studies that used low density SNP chips in cattle [12, 22].
We described the data structure (Table 1) and have estimated heritabilities for the studied traits, which range from 0.06 to 0.42 (Table 2). The genetic correlations range from −0.11 to 0.93 (Table 3). These parameters are reported as they underpin GWAS results and gene network predictions. Data structure awareness is important context for GWAS common concerns: multiple testing and sample size limitations.
The SNPs that explained most of the phenotypic variance indicated regions of the genome that have an influence in the traits studied and indicate new candidate genes. Phenotypic variance percentage, positions and nearby genes were provided for these significant SNPs (Table 6). Significance of all the SNPs tested and percentages of phenotypic variance explained were reported as well (Additional file 1: Table S1).
Association analyses for the reproductive traits resulted in candidate genes for buffaloes that have known roles in reproductive physiology. For age at first calving (AFC), the gene coding for interferon-Tau, IFN-TAU, and other interferon genes were identified. Embryonic production of IFN-TAU is the primary signal for maternal recognition of pregnancy in buffaloes . Another gene associated with AFC was LOC100299005 (SELP), a gene with up-regulated expression during inflammatory processes related to follicular atresia in cattle . It is clear that modifications in the protein structure and/or in the expression levels of these genes could affect conception outcome and therefore impact on AFC [25, 26]. SELP gene, mapped to chromosome X, had the most significant SNP associated to AFC. The sexual chromosomes influence reproductive and andrological traits in cattle [27–29], among others traits, such as SCS and milk content in dairy cattle [30–33]. The results presented here add to this list and encourage the inclusion of sexual chromosomes in GWAS to avoid missing important information.
For calving interval (CI), a gene involved in spermatozoa acrosome reaction in humans was identified: TPCN1 . Spermatozoa acrosome reaction is necessary for fertilization and tends to be studied in the context of male fertility. The association of TPCNI with CI suggests an interesting thought: a gene related to male fertility might be more relevant to herd performance (in terms of CI) than genes related to female fertility. Increased conception rates after calving, and, as consequence, decreased CI may also reflect fertilization ability of bulls in the studied population. As a complex trait, CI may be influenced by several component traits linked to both male and female fertility, including spermatozoa quality andrological parameters .
For number of services per conception (NSC), the top gene found was LOC100336232 (ABCC4). This gene has its expression increased in the endometrium of pregnant cows  and pigs  and seems to be important to support pregnancy since it acts on prostaglandin efflux from cells . Prostaglandin has a variety of roles in reproduction being responsible for maternal recognition of the pregnancy and conceptus implantation, processes that closely related to NSC. Moreover, in a whole genome re-sequencing of Hanwoo cattle, ABCC4 was identified as the gene with the biggest number of non-synonymous SNPs, splice-site variants, and coding indels . ABCC4 may be a useful source of variation to be studied in buffaloes and cattle. In Angus cattle, the ABCC4 expression was significantly correlated with residual feed intake (RFI) , being up-regulated in high RFI animals. In Nelore cattle, a CNV within intron 22 of ABCC4 was correlated with marbling score . The emerging hypothesis is that ABCC4 acts in basic metabolic pathways and is highly polymorphic with potential effect in a variety of phenotypes (i.e. reproduction, meat quality, etc.).
Gene ontology enrichment analyses were also performed using GOrilla to compare the top genes associated with the traits (P < 0.001) with a genome-wide background gene list. For AFC and CI, many processes involving neural development and activity were listed (GO terms: GO:0048814 and GO:0021836 for AFC and GO:0050807, GO:0045666, GO:00501962, GO:0050769, GO:0051960, GO:0031290, GO:0051960, GO:0031290, GO:0021819 for CI). There were other genes expressed in the central nervous system that were associated to puberty in female cattle [12, 22, 41]. The role of these genes in reproduction may be due to the neuronal activity in the hypothalamus-pituitary axis, responsible for initiating the hormone cascade that is a trigger for puberty followed by the initiation of the estrous cycle in females . It is reasonable to assume that genes involved with pubertal development and maintenance of estrous cycle could be associated with AFC and CI.
Regarding %F and FP, four genes related to the carbohydrate metabolism (KCTD8, FOXO4, SSTR3, LOC782855) and one gene related to lipid metabolism (ESRRG) were identified. The KCTD8 gene interacts with genes that act in the insulin secretion and glucagon liberation pathways, participating in the glucose absorption . FOXO4 gene down-regulates gluconeogenesis and up-regulates glycolysis . SSTR3 inhibits the activity of Glucose-dependent insulinotropic polypeptide’s function in intestine, promoting the accumulation of glucose and fat . LOC782855 (RPS26) was related to diabetes in humans . The association of ESRRG to fat production in the present GWAS could be expected, since this gene regulates other adipogenic genes . In cattle, ESRRG was also considered a key regulator of puberty in a multi-trait analyses that included fat deposition traits . Most of the top genes associated with fat percentage and fat production integrate the carbohydrate metabolism and not the lipid metabolism as in cattle [48, 49]. This fact could suggest some differences between buffalo and cattle fat production in milk. On average, buffaloes have higher contents of milk fat than cattle. In buffaloes the percentage of milk fat ranges from 6.7 % to 12.0 % [11, 50–52], while in cattle it ranges from 3.1 % to 4.5 % [53, 54]. The difference in milk fat might be explained by a more efficient acetate metabolism to produce lipids in buffaloes, as the results suggest. In comparison to cattle and under same high fibber diet, buffaloes have a higher average daily gain . It means that buffaloes have a better capacity to digest fibber content in rumen. Fibber fermentation generates acetate, a fat precursor . This characteristic might generate a bigger contribution of genes related to acetic acid metabolism in the fat production traits in buffaloes, differentiating considerably cattle and buffalo metabolism for fat milk content .
SNP associations with milk protein production suggested CCND3 as a candidate gene in buffaloes. This gene has a role in alveolar development in the mammary gland, in cooperation with prolactin . Variances in the biological activity of protein coded by CCND3 may affect the structure and/or physiology of the alveolus. Milk production is a function of blood circulation in the mammary alveolus. Other genes, whose physiological activity is within the alveolus, were already correlated with milk content in buffaloes . Now, CCND3 has been added to the list with specific implications for milk protein content.
Gene ontology enrichment analyses were also carried for production traits. For the somatic cells score (SCS), the biological process of regulation of lymphocyte migration was significant (GO:2000401). This immunological metabolic process is correlated with SCS because this trait is used as an indirect measure of mastitis which severely diminishes milk production. Buffaloes with a more efficient immune system (better variants for genes that regulate lymphocyte migration) might do better in avoiding the disease.
Buffalo and cattle chromosomes have an extensive similarity and 84 % of cattle markers were successfully used in buffaloes . However, despite good examples of putative candidate genes reported herein (mainly obtained due to the similarities), many genes that were associated to milk production traits have no known role in milk production. Some of these genes are described as having a role in basic metabolism and many are not characterized at all. The associations presented here open the door to study these genes in the context of milk production. It also reinforces that basic research to characterize and identify the function of the genes is still necessary, especially in buffaloes. According to , there are only 493 annotated genes in buffalo. The regular number of genes in mammalian genomes is around 20,000, so the discrepancy is evident. Moreover, rearrangements and inversions in the cattle homologue chromosomes complicate the annotation of buffalo genes . A species specific genome reference for buffalo is needed.
The identification of genes with pleiotropic effects could contribute to the genetic evaluation of many traits. An example of a gene with important pleiotropic effects is PLAG1 in cattle . To identify genes with a pleiotropic effect and regulatory role in buffaloes, we predicted a gene network from the ten studied traits. The gene network was visualized using Cytoscape software. Data from 1,723 SNPs were used in network predictions, selected SNP were associated with the majority of traits. Of these, a total of 608 SNPs were identified to be close or within a gene or a known transcript. The SNPs, that didn’t have a gene close to them, remained on the network as nodes named after their chromosome position. The final gene network had 1068 nodes and 3307 edges. The nodes interactions varied from 23 to 1 with an average of 3.9 interactions per node (Fig. 1a).
A SNP located in C14H8orf34 gene was a central node of the network, having 23 predicted interactions. Information about this gene is scarce; however some indications of its function and pleiotropic effect might be discussed. SNPs in this gene were associated, in humans, with “fasting serum aspartate aminotransferase” and “urinary free epinephrine excretion per day” . Some considerations might be done concerning these phenotypes. Aspartate aminotransferase is a test carried to check for liver damage . The liver has more than 400 functions and participates in the general metabolism. A gene with a liver function would be a logical candidate for pleiotropic or regulatory roles. The association of C14H8orf34 with epinephrine production may also support a pleiotropy claim. Epinephrine is a hormone and a neurotransmitter synthesized in the adrenal glands that acts via many pathways to accelerate metabolism under stress situations .
With the aim to find high interconnected regions, termed clusters, further analyses were carried using MCode App (Cystoscape). The first cluster had CA10 as its seed (Fig. 1b). The function of CA10 is the inter-conversion between carbon dioxide and bicarbonate, with essential physiological function in many tissues . In humans, SNPs in CA10 were related to menarche, weight and body mass index [66, 67]. The association of this gene with growth and reproductive traits reinforces the wide effect and supports the findings herein. In this context, CA10 might be a regulator of fat metabolism and reproductive development in buffaloes.
Transcription factor (TF) genes, a total of 28 in the network, also had their clusters mined for regulatory information. Genes that work as TF are important in terms of pleiotropy effect since they can guide transcription and interact with many other genes. The TF with highest number of predicted interactions was RARB, a retinoic acid receptor important for cell growth and differentiation . RARB is expressed in many tissues from liver and intestine  to sperm  in cattle. The gene seems also to be a very important for bovine mammary gland cell viability . Studies of cell cycle and apoptotic events in mammary glands  suggests that the role RARB in its development is expressive. Considering the traits herein analysed, six out of ten traits analysed are related to the udder and its physiology, so the finding of a TF that has a crucial role in its development as a central regulator in the network is plausible (Fig. 1c).
The TF gene with the second largest number of connections in the network was ATF1. This gene regulates other genes involved in growth and survival and was associated with angiogenesis in the mammary gland . Milk is derived from blood due to difference of pressure in the alveolus. A suitable explanation is that the better vascularised the alveolus are, the bigger is the milk and contents production, resulting in a suitable explanation. This gene was also indicated as a key TF for meat quality  (Fig. 1c).
The RARB gene has eight interactions, three of them are with known genes (RILPL1, bta-mir-10a, PRKCA) and ATF1 gene has six interactions, four of them are with known genes (ABCA13, TBC1D19, MAGI2, CUL5). It is curious to verify that some of these genes also have a variety of roles in the metabolism and are also candidates to have pleiotropic effects.
A genome wide association study with milk production was done with the buffalo SNP chip in Mediterranean buffaloes  with 78,137 SNPs considered in the analyses. However, the SNPs reported to be highly associated with the trait  are not the same to the ones found in the present study. Some of the SNPs weren’t included in the present analyses and some didn’t have significance. The divergent results could be explained in many ways: different genetic composition of the breeds (Mediterranean x Murrah), selection pressure in the population (expected to be higher in the Italian population), inbreeding (higher in Brazilian population), SNPs segregating and analyzed in both studies (78,137 SNPs in Mediterranean x 61,144 SNPs in Murrah), methodologies for estimation of the SNP effects and etc.
GWAS studies were done for reproductive and production traits in China , and using partially the same animals of the present study  and a low and a high density bovine SNP, respectively. The genes found were not the same as the above discussed neither. There are many differences in the markers used in these studies. They worked with 935 SNPs  and 15,745 SNPs  that are cattle variants and are also segregating in buffaloes. These markers should exist before the differentiation of the species. In the present work, the density of the SNPs is much higher (61,144 markers) and the selection of them based on buffaloes data. Differences in SNPs used may explain the contrasting results. The bovine SNP chip does not cover the buffalo genome with the same efficiency, even if the species are close in evolution terms. The use of cross-species SNP chips might not be as informative as initially proposed . Even if the DNA sequences are similar, the functional variants may not be, as suggested before  and . The interesting candidate genes discussed herein resulted from significant SNP and are largely supported by the literature in terms of its biological function. These promising GWAS results emphasize the importance of selecting SNP that are species specific.
Dairy cattle GWAS of varying breeds and traits were further evidence for species differences when compared to results herein [30, 32, 33, 49, 74–79]. Only one SNP (rs41610147) was associated with three fertility traits in Danish Holstein cattle (female fertility index, interval from calving to first insemination, days from first to last insemination in heifers)  and was located in an association region in buffaloes. The SNP (rs41610147) in cattle is 398 bp far from the second most significant SNP associated with NSC in buffaloes. These SNPs may be indicating the same causative mutation or major gene associated to reproductive physiology in both species. It is possible to conclude that despite high genome homology between buffalo and cattle, the contribution and influence of genes and variants to studied traits is mostly different. Candidate genes that might be buffalo-specific could be explained by the presence of underpinning causative mutations that are not found in cattle. Some examples of divergent time of genes between the species were discussed before .
Comparing different breeds of the same species already results in differences regarding associated genes and the proportion of phenotypic variance that they explain [80, 81]. These differences might be explained by epistatic effects, selection pressure, different environment, recombination rate, effective population size, allelic frequencies differences, genome coverage by the SNP chip etc. Logically, if differences are found between breeds of the same species, comparing different species can only result in stronger contrast.
Species-specific technologies are important and needs to be further developed. Particularly, for the buffalo species, the lack of a publicly available complete and annotated genome complicates the advance and development of new methodologies for genetic evaluation for the specie.
The genes identified in this study are candidates for fine-mapping with the aim to find putative causative mutations. The incorporation of this information in a low density SNP chip is informative and auxiliary to genetic evaluation, with cost-benefit for producers. The identification of causative mutations reduces the need for tag SNP (in linkage disequilibrium with causative mutations), promotes higher accuracy for genomic breeding values, which can persist over generations and permits a higher transferability across breeds . The results presented expand our knowledge and indicate regions for possible genes not yet annotated in buffaloes, but potentially important. It also serves as basis for further functional genes studies.
The present article is a genome wide association and gene network analyses in buffaloes using a SNP chip specifically developed for the species. Putative genes for production and reproductive traits were found and these are candidates for searching causative mutations. Comparative analyses between cattle and buffaloes support that although the genome sequence is similar, the variants between them are different. Evidence that species-specific technology should be developed for buffaloes was presented discussed herein.
Age at first calving
Association weight matrix
Copy number variation
De-regressed breeding values
Generalized least square
Genome-wide association study
National Center for Biotechnology Information
Number of services per conception
Partial correlation and information theory
Restricted Maximum Likelihood method
Residual feed intake
Somatic cell score
Single nucleotide polymorphism
Variant Effect Predictor
Gallagher Jr DS, Womack JE. Chromosome conservation in the Bovidae. J Hered. 1992;83(4):287–98.
Harisah M, Azmi TI, Hilmi M, Vidyadaran MK, Bongso TA, Nava ZM, et al. Identification Of Crossbred Buffalo Genotypes And Their Chromosome Segregation Patterns. Genome. 1989;32(6):999–1002.
Iannuzzi L, Di Meo GP, Perucatti A, Schibler L, Incarnato D, Gallagher D, et al. The river buffalo (Bubalus bubalis, 2n = 50) cytogenetic map: assignment of 64 loci by fluorescence in situ hybridization and R-banding. Cytogenet Genome Res. 2003;102(1–4):65–75.
Tantia MS, Vijh RK, Bhasin V, Sikka P, Vij PK, Kataria RS, et al. Whole-genome sequence assembly of the water buffalo (Bubalus bubalis). Indian J Anim Sci. 2011;81(5):465–73.
Zimin AMG, Ferre F, Biagini T, Shroeder S. The Buffalo Reference Genome Assembly. San Diego: XXI Plant and Animal Genome; 2013.
Strozzi FPM, Iamartino D, Ferre F, Chillemi G, Zimin A, Williams J. The Buffalo Transcriptome. San Diego: XXI Plant and Animal Genome; 2013.
Amaral MEJ, Grant JR, Riggs PK, Stafuzza NB, Rodrigues Filho EA, Goldammer T, et al. A first generation whole genome RH map of the river buffalo with comparison to domestic cattle. BMC Genomics. 2008;9:631.
Michelizzi VN, Dodson MV, Pan Z, Amaral MEJ, Michal JJ, McLean DJ, et al. Water Buffalo Genome Science Comes of Age. Int J Biol Sci. 2010;6(4):333–49.
Borquis RRA, Baldi F, de Camargo GMF, Cardoso DF, Santos DJA, Lugo NH, et al. Water buffalo genome characterization by the Illumina BovineHD BeadChip. Genet Mol Res. 2014;13(2):4202–15.
Wu JJ, Song LJ, Wu FJ, Liang XW, Yang BZ, Wathes DC, et al. Investigation of transferability of BovineSNP50 BeadChip from cattle to water buffalo for genome wide association study. Mol Biol Rep. 2013;40(2):743–50.
Venturini GC, Cardoso DF, Baldi F, Freitas AC, Aspilcueta-Borquis RR, Santos DJA, et al. Association between single-nucleotide polymorphisms and milk production traits in buffalo. Genet Mol Res. 2014;13(4):10256–68.
Fortes MRS, Reverter A, Zhang Y, Collis E, Nagaraj SH, Jonsson NN, et al. Association weight matrix for the genetic dissection of puberty in beef cattle. Proc Natl Acad Sci U S A. 2010;107(31):13642–7.
Dabdoub SM, Shook GE. Phenotypic relations among milk yield, somatic count cells, and mastitis. J Dairy Sci. 1984;67:163–4.
Karin M. WOMBAT - A tool for mixed model analyses in quantitative genetics by restricted maximum likelihood (REML). J Zhejiang Univ Sci B. 2007;8(11):815–21.
Garrick DJ, Taylor JF, Fernando RL. Deregressing estimated breeding values and weighting information for genomic regression analyses. Genet Sel Evol. 2009;41:55.
Aulchenko YS, Ripke S, Isaacs A, Van Duijn CM. GenABEL: an R library for genorne-wide association analysis. Bioinformatics. 2007;23(10):1294–6.
Wald A. Tests f statistical hypotheses concerning several parameters when the number of observations is large. Trans Am Math Soc. 1943;54:426–82.
Bolormaa S, Pryce JE, Reverter A, Zhang Y, Barendse W, Kemper K, et al. A Multi-Trait, Meta-analysis for Detecting Pleiotropic Polymorphisms for Stature, Fatness and Reproduction in Beef Cattle. Plos Genetics. 2014;10:3.
Reverter A, Chan EKF. Combining partial correlation and an information theory approach to the reversed engineering of gene co-expression networks. Bioinformatics. 2008;24(21):2491–7.
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13(11):2498–504.
Bader GD, Hogue CW. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics. 2003;4:27.
Hawken RJ, Zhang YD, Fortes MRS, Collis E, Barris WC, Corbet NJ, et al. Genome-wide association studies of female reproduction in tropically adapted beef cattle. J Anim Sci. 2012;90(5):1398–410.
Chethan SG, Singh SK, Nongsiej J, Rakesh HB, Singh RP, Kumar N, et al. IFN-tau Acts in a Dose-Dependent Manner on Prostaglandin Production by Buffalo Endometrial Stromal Cells Cultured in vitro. Reprod Domest Anim. 2014;49(3):403–8.
Hayashi K-G, Ushizawa K, Hosoe M, Takahashi T. Differential genome-wide gene expression profiling of bovine largest and second-largest follicles: identification of genes associated with growth of dominant follicles. Reprod Biol Endocrinol. 2010;8:11.
de Camargo GMF, Costa RB, de Albuquerque LG, Regitano LCD, Baldi F, Tonhati H. Association between JY-1 gene polymorphisms and reproductive traits in beef cattle. Gene. 2014;533(2):477–80.
Killeen AP, Morris DG, Kenny DA, Mullen MP, Diskin MG, Waters SM. Global gene expression in endometrium of high and low fertility heifers during the mid-luteal phase of the estrous cycle. BMC Genomics. 2014;15.
Fortes MRS, Lehnert SA, Bolormaa S, Reich C, Fordyce G, Corbet NJ, et al. Finding genes for economically important traits: Brahman cattle puberty. Anim Prod Sci. 2012;52(2–3):143–50.
McDaneld TG, Kuehn LA, Thomas MG, Snelling WM, Smith TPL, Pollak EJ, et al. Genomewide association study of reproductive efficiency in female cattle. J Anim Sci. 2014;92(5):1945–57.
de Camargo GMF, Porto-Neto LR, Kelly MJ, Bunch RJ, McWilliam SM, Tonhati H, et al. Non-synonymous mutations mapped to chromosome X associated with andrological and growth traits in beef cattle. BMC Genomics. 2015;384:384.
Abdel-Shafy H, Bortfeldt RH, Tetens J, Brockmann GA. Single nucleotide polymorphism and haplotype effects associated with somatic cell score in German Holstein cattle. Genet Sel Evol. 2014;46:35.
Bouwman AC, Visker MHPW, van Arendonk JAM, Bovenhuis H. Genomic regions associated with bovine milk fatty acids in both summer and winter milk samples. BMC Genet. 2012;13:1–13.
Strucken EM, Bortfeldt RH, de Koning DJ, Brockmann GA. Genome-wide associations for investigating time-dependent genetic effects for milk production traits in dairy cattle. Anim Genet. 2012;43(4):375–82.
Jiang L, Liu J, Sun D, Ma P, Ding X, Yu Y, et al. Genome Wide Association Studies for Milk Production Traits in Chinese Holstein Population. PLoS One. 2010;5:10.
Arndt L, Castonguay J, Arlt E, Meyer D, Hassan S, Borth H, et al. NAADP and the two-pore channel protein 1 participate in the acrosome reaction in mammalian spermatozoa. Mol Biol Cell. 2014;25(6):948–64.
Kennedy CE, Krieger KB, Sutovsky M, Xu W, Vargovic P, Didion BA, et al. Protein Expression Pattern of PAWP in Bull Spermatozoa Is Associated With SpermQuality and Fertility Following Artificial Insemination. Mol Reprod Dev. 2014;81(5):436–49.
Spencer TE, Forde N, Dorniak P, Hansen TR, Romero JJ, Lonergan P. Conceptus-derived prostaglandins regulate gene expression in the endometrium prior to pregnancy recognition in ruminants. Reproduction. 2013;146(4):377–87.
Seo H, Choi Y, Shim J, Yoo I, Ka H. Prostaglandin Transporters ABCC4 and SLCO2A1 in the Uterine Endometrium and Conceptus During Pregnancy in Pigs. Biol Reprod. 2014;90:5.
Lee K-T, Chung W-H, Lee S-Y, Choi J-W, Kim J, Lim D, et al. Whole-genome resequencing of Hanwoo (Korean cattle) and insight into regions of homozygosity. BMC Genomics. 2013;14:519.
Chen Y, Arthur PF, Barchia IM, Quinn K, Parnell PF, Herd RM. Using gene expression information obtained by quantitative real-time PCR to evaluate Angus bulls divergently selected for feed efficiency. Anim Prod Sci. 2012;52(11):1058–67.
Feitosa FLB, Pereira ASC, Venturini GC, Tonussi RL, Espigolan R, Gordo DM, et al. Genome wide association study between copy number variation regions with marbling score in Nelore cattle. In: Proceedings, 10th World Congress of Genetics Applied to Livestock Production. Vancouver. 2014.
Fortes MRS, Snelling WM, Reverter A, Nagaraj SH, Lehnert SA, Hawken RJ, et al. Gene network analyses of first service conception in Brangus heifers: Use of genome and trait associations, hypothalamic-transcriptome information, and transcription factors. J Anim Sci. 2012;90(9):2894–906.
Bliss SP, Navratil AM, Xie J, Roberson MS. GnRH signaling, the gonadotrope and endocrine control of fertility. Front Neuroendocrinol. 2010;31(3):322–40.
Al Safar HS, Cordell HJ, Jafer O, Anderson D, Jamieson SE, Fakiola M, et al. A Genome-Wide Search for Type 2 Diabetes Susceptibility Genes in an Extended Arab Family. Ann Hum Genet. 2013;77:488–503.
Xiong X, Tao R, DePinho RA, Dong XC. Deletion of Hepatic FoxO1/3/4 Genes in Mice Significantly Impacts on Glucose Metabolism through Downregulation of Gluconeogenesis and Upregulation of Glycolysis. PLoS One. 2013;8:8.
Moss CE, Marsh WJ, Parker HE, Ogunnowo-Bada E, Riches CH, Habib AM, et al. Somatostatin receptor 5 and cannabinoid receptor 1 activation inhibit secretion of glucose-dependent insulinotropic polypeptide from intestinal K cells in rodents. Diabetologia. 2012;55(11):3094–103.
Plagnol V, Smyth DJ, Todd JA, Clayton DG. Statistical independence of the colocalized association signals for type 1 diabetes and RPS26 gene expression on chromosome 12q13. Biostatistics. 2009;10(2):327–34.
Kubo M, Ijichi N, Ikeda K, Horie-Inoue K, Takeda S, Inoue S. Modulation of adipogenesis-related gene expression by estrogen-related receptor gamma during adipocytic differentiation. Biochimica Et Biophysica Acta-Gene Regulatory Mechanisms. 2009;1789(2):71–7.
Wang X, Wurmser C, Pausch H, Jung S, Reinhardt F, Tetens J, et al. Identification and Dissection of Four Major QTL Affecting Milk Fat Content in the German Holstein-Friesian Population. PLoS One. 2012;7:7.
Raven L-A, Cocks BG, Hayes BJ. Multibreed genome wide association can improve precision of mapping causative variants underlying milk production in dairy cattle. BMC Genomics. 2014;15:62.
Pirlo G, Terzano G, Pacelli C, Abeni F, Care S. Carbon footprint of milk produced at Italian buffalo farms. Livest Sci. 2014;161:176–84.
Mihaiu M, Lapusan A, Bele C, Mihaiu R, Dan S, Taulescu CM. Compositional Particularities of the Murrah Hybrid Buffalo Milk and its Suitability for Processing in the Traditional System of Romania. Bull UASVM Vet Med. 2011;2:68.
Aspilcueta-Borquis RR, Di Palo R, Araujo Neto FR, Baldi F, de Camargo GMF, de Albuquerque LG, et al. Genetic parameter estimates for buffalo milk yield, milk quality and mozzarella production and Bayesian inference analysis of their relationships. Genet Mol Res. 2010;9(3):1636–44.
Cui X, Hou Y, Yang S, Xie Y, Zhang S, Zhang Y, et al. Transcriptional profiling of mammary gland in Holstein cows with extremely different milk protein and fat percentage using RNA sequencing. BMC Genomics. 2014;15:226.
Bernabucci U, Biffani S, Buggiotti L, Vitali A, Lacetera N, Nardone A. The effects of heat stress in Italian Holstein dairy cattle. J Dairy Sci. 2014;97(1):471–86.
Lapitan RM, Del Barrio AN, Katsube O, Ban-Tokuda T, Orden EA, Robles AY, et al. Comparison of fattening performance in Brahman grade cattle (Bos indicus) and crossbred water buffalo (Bubalus bubalis) fed on high roughage diet. Anim Sci J. 2008;79(1):76–82.
Li X, Chen H, Guan Y, Li X, Lei L, Liu J, et al. Acetic Acid Activates the AMP-Activated Protein Kinase Signaling Pathway to Regulate Lipid Metabolism in Bovine Hepatocytes. PLoS One. 2013;8:7.
Asher JM, O'Leary KA, Rugowski DE, Arendt LM, Schuler LA. Prolactin Promotes Mammary Pathogenesis Independently from Cyclin D1. Am J Pathol. 2012;181(1):294–302.
Araujo DN, de Camargo GM F, da Silva Fonseca PD, Cardoso DF, Hurtado-Lugo NA, Aspilcueta-Borquis RR, et al. Polymorphisms in Oxytocin and alpha(1a) Adrenergic Receptor Genes and Their Effects on Production Traits in Dairy Buffaloes. Anim Biotechnol. 2015;26(3):165–8.
Moaeen-ud-Din M, Bilal G. Sequence diversity and molecular evolutionary rates between buffalo and cattle. J Anim Breed Genet. 2015;132(1):74–84.
El Nahas SM, Abou Mossallam AA, Mahfouz ER, Bibars MA, Sabry N, Seif El-Din S, et al. Radiation hybrid map of buffalo chromosome 7 detects a telomeric inversion compared to cattle chromosome 6. Anim Genet. 2014;45(5):762–3.
Fortes MRS, Kemper K, Sasazaki S, Reverter A, Pryce JE, Barendse W, et al. Evidence for pleiotropism and recent selection in the PLAG1 region in Australian Beef cattle. Anim Genet. 2013;44(6):636–47.
Comuzzie AG, Cole SA, Laston SL, Voruganti VS, Haack K, Gibbs RA, et al. Novel Genetic Loci Identified for the Pathophysiology of Childhood Obesity in the Hispanic Population. PLoS One. 2012;7:12.
Zheng X-N, Wang X-W, Li L-Y, Xu Z-W, Huang H-Y, Zhao J-S, et al. Pu-erh Tea Powder Preventive Effects on Cisplatin-Induced Liver Oxidative Damage in Wistar Rats. Asian Pac J Cancer Prev. 2014;15(17):7389–94.
Straub RH, Cutolo M. Involvement of the hypothalamic-pituitary-adrenal/gonadal axis and the peripheral nervous system in rheumatoid arthritis - Viewpoint based on a systemic pathogenetic role. Arthritis Rheum. 2001;44(3):493–507.
Pastorekova S, Parkkila S, Pastorek J, Supuran CT. Carbonic anhydrases: Current state of the art, therapeutic applications and future prospects. J Enzyme Inhib Med Chem. 2004;19(3):199–229.
Tu W, Wagner EK, Eckert GJ, Yu Z, Hannon T, Pratt JH, et al. Associations Between Menarche-Related Genetic Variants and Pubertal Growth in Male and Female Adolescents. J Adolesc Health. 2015;56(1):66–72.
Tanikawa C, Okada Y, Takahashi A, Oda K, Kamatani N, Kubo M, et al. Genome Wide Association Study of Age at Menarche in the Japanese Population. PLoS One. 2013;8:5.
Wang Y, Baumrucker CR. Retinoids, retinoid analogs, and lactoferrin interact and differentially affect cell viability of 2 bovine mammary cell types in vitro. Domest Anim Endocrinol. 2010;39(1):10–20.
Kruger KA, Blum JW, Greger DL. Expression of nuclear receptor and target genes in liver and intestine of neonatal calves fed colostrum and vitamin A. J Dairy Sci. 2005;88(11):3971–81.
Kasimanickam VR, Kasimanickam RK, Rogers HA. Immunolocalization of retinoic acid receptor-alpha, −beta, and -gamma, in bovine and canine sperm. Theriogenology. 2013;79(6):1010–8.
Jones DT, Lechertier T, Mitter R, Herbert JMJ, Bicknell R, Jones JL, et al. Gene Expression Analysis in Human Breast Cancer Associated Blood Vessels. PLoS One. 2012;7:10.
Ramayo-Caldas Y, Fortes MRS, Hudson NJ, Porto-Neto LR, Bolormaa S, Barendse W, et al. A marker-derived gene network reveals the regulatory role of PPARGC1A, HNF4G, and FOXP3 in intramuscular fat deposition of beef cattle. J Anim Sci. 2014;92(7):2832–45.
Iamartino D, Williams JL, Sonstegard T, Reecy J, Van Tassell C, Nicolazzi EL, et al. The Buffalo Genome and the Application of Genomics in Animal Management and Improvement. Buffalo Bull. 2013;32:151–8.
Meredith BK, Berry DP, Kearney F, Finlay EK, Fahey AG, Bradley DG, et al. A genome-wide association study for somatic cell score using the Illumina high-density bovine beadchip identifies several novel QTL potentially related to mastitis susceptibility. Front Genet. 2013;4:229. Article No.: 229.
Strillacci MG, Frigo E, Schiavini F, Samore AB, Canavesi F, Vevey M, et al. Genome-wide association study for somatic cell score in Valdostana Red Pied cattle breed using pooled DNA. BMC Genet. 2014;15:106.
Hoglund JK, Sahana G, Guldbrandtsen B, Lund MS. Validation of associations for female fertility traits in Nordic Holstein, Nordic Red and Jersey dairy cattle. BMC Genet. 2014;15:8.
Guo J, Jorjani H, Carlborg O. A genome-wide association study using international breeding-evaluation data identifies major loci affecting production traits and stature in the Brown Swiss cattle breed. BMC Genet. 2012;13:82.
Hoglund JK, Guldbrandtsen B, Lund MS, Sahana G. Analyzes of genome-wide association follow-up study for calving traits in dairy cattle. BMC Genet. 2012;13:71.
Pryce JE, Bolormaa S, Chamberlain AJ, Bowman PJ, Savin K, Goddard ME, et al. A validated genome-wide association study in 2 dairy cattle breeds for milk production and fertility traits using variable length haplotypes. J Dairy Sci. 2010;93(7):3331–45.
Allais S, Leveziel H, Hocquette JF, Rousset S, Denoyelle C, Journaux L, et al. Fine mapping of quantitative trait loci underlying sensory meat quality traits in three French beef cattle breeds. J Anim Sci. 2014;92(10):4329–41.
Saatchi M, Schnabel RD, Taylor JF, Garrick DJ. Large-effect pleiotropic or closely linked QTL segregate within and across ten US cattle breeds. BMC Genomics. 2014;15:442.
Hayes BJ, MacLeod IM, Daetwyler HD, Bowman PJ, Chamberlain AJ, Vander Jagt CJ, et al. Genomic Prediction from Whole Genome Sequence in Livestock: the 1000 Bull Genomes Project. In: Proceedings, 10th World Congress of Genetics Applied to Livestock Production. Vancouver. 2014.
The work was supported by Fundação de Amparo à Pesquisa do Estado de São Paulo (Fapesp), grant 2010/20887-1 and for the scholarship of the first author 2013/12851-5. We thank the farmers for allowing sampling the animals.
The authors declare that they have no competing interests.
HT, RRAB, GMFC, MRSF and LRPN conceived the study and structured the analyses. RRAB and DJAS did the genomewide association analyses. MRSF and LRPN did the meta-analyses and genenetwork analyses. GMFC wrote the manuscript and participated in the analyses. DFC, SAL, AR and SSM participated in the analyses and contributed to the discussion. All the authors have read and approved the manuscript.
GMF de Camargo, RR Aspilcueta-Borquis, MRS Fortes and R. Porto-Neto contributed equally to this work.
P-values, allelic substitution effect and percentage of phenotypic genetic variance of the SNPs used in the GWAS for the ten traits in dairy buffaloes. (XLSX 30.7 mb)