Genomic analysis of worldwide sheep breeds reveals PDGFD a major target of fat-tail selection in sheep

Fat tail is a special trait in sheep acquired during sheep domestication. Several genomic analyses have been conducted in sheep breeds from limited geographic origins to identify the genetic factors underlying this trait. Nevertheless, these studies obtained different candidates. The results of these regional studies were easily biased by the breed structures. To minimize the bias and distinguish the true candidates, we used an extended data set of 968 sheep representing 18 fat-tailed breeds and 14 thin-tailed breeds from around the world, and integrated two statistic tests to detect selection signatures, including Genetic Fixation Index (FST) and difference of derived allele frequency (ΔDAF). The results showed that platelet derived growth factor D (PDGFD) exhibited the highest genetic differentiation between fat- and thin-tailed sheep breeds. Further analysis of sequence variation identified that a 6.8-kb region within the first intron of PDGFD is likely the target of positive selection and contains regulatory mutation(s) in fat-tailed sheep. Histological analysis and gene expression analysis demonstrated that PDGFD expression is associated with maturation and hemostasis of adipocytes. Luciferase reporter assays showed that a segment of conserved sequence surrounding the orthologous site of one sheep mutation is functional in regulating PDGFD expression in human. These results reveal that PDGFD is the predominant factor for the fat tail phenotype in sheep by contributing to adiopogenesis and maintaining the hemostasis of mature adipocytes. This study provides insights into the evolution of fat-tailed sheep and has important application to animal breeding, as well as obesity-related human diseases.


Introduction
Following domestication in the Fertile Crescent approximately 8,000-9,000 years ago [1], sheep has spread and distributed over a wide geographical range worldwide mainly due to their adaptability to nutrient-poor diets and extreme environments, and thus developed a substantial variation in many phenotypic traits [2]. One of the main morphological changes is the lengthening of the tail and distinct patterns of tail fat deposition. It is believed that fat-tailed sheep was selected in response to the steppe and desert conditions in central Asia from 3, 000 BC [3], which is several thousand years after the domestication of its thin-tailed ancestor, and then spread east into north China and west into South Africa.
However, nowadays, fat-tail phenotype loses its earlier advantages and a decrease in the size of sheep tail is desirable for both producers and consumers, partially because that fat tails have a significant influence on fat deposition in other parts of the body [4], mating and normal locomotion of the animal [5]. Besides, the consumers in many instances show lab [14]. Tail type for these breeds was determined by either direct observation or description of previous literatures. The detailed information of these sheep breeds was provided in Supplementary Table S1. All the SNP data were generated using the Illumina Ovine 50K Beadchip and were thus readily merged together. The final dataset included 968 sheep individuals and 47,415 common autosomal SNPs (based on genome Oar_v3.1).

Determination of ancestral allele and data quality control
Ancestral allele information for a subset of 33,059 SNPs were obtained from Ovine HapMap [13]. For the remaining SNPs, the ancestral allele was deduced according to the major allele in Mouflon sheep. Finally, a total of 46,540 SNPs with available ancestral allele information were kept for next analysis. PLINK v2.05 [15] was applied for further SNP data quality control. SNPs were removed if any of the following conditions were met: 1) with call rate ≤90%; 2) with minor allele frequency (MAF) ≤0.05. Sheep individuals with an average call rate below 90% were discarded. To ensure independence among the studied sheep individuals, cryptic relatedness among individuals within each breed were identified using pair-wise Identity-By-Descent (IBD) metric (referred to as PI-HAT in PLINK). One individual from a pair of sheep individuals was removed from the following analyses if their PI-HAT value was over 0.3.

Phylogenetic analysis
A pruned set of 32,450 SNPs were used to investigate the genetic relationship among these sheep breeds from different geographic locations. Principle Component Analysis (PCA) was performed with the GCTA software [16] Table S2). The VCF file was converted to PLINK PED file using VCFtools [18]. PLINK [15] was used to perform the SNP quality control (removing SNPs with call rate ≤90% or MAF ≤0.05) and to calculate the allele frequency of each SNP in thin-and fat-tailed sheep group. The absolute difference of allele frequency (ΔAF) of each SNP between fatand thin-tailed sheep group were calculated using R. Kilobase of exon per Million fragments mapped) were calculated from raw counts and used to quantify relative gene expression for each sample. times. Relative gene expression was converted using the 2 -CT method against the internal control house-keeping gene ACTB where CT= (CTexperimental gene -CTexperimental ACTB) -(CTcontrol gene -CTcontrol ACTB). The relative gene expression in control group was set to 1. [23], adipocytes from non-diabetic lean and non-diabetic obese Pima Indian subjects (GSE2508) [24] were obtained from GEO database and analyzed with online build-in tool GEO2R.

Dual luciferase reporter assay
Using human genomic DNA as template, two fragments (WT, 222 bp; Del, 202 bp) spanning PDGFD gene promoter region were synthetized directly by BerryGenomics Company (Beijing, China), and then the products were cloned into the site between Kpn1 and EcoRV restriction sites of pGL4.23 (Promega) luciferase reporter vector. All plasmids were sequenced to verify the integrity of the insert. Transfection was performed with Lipofectamine 3000 reagent (ThermoFisher) essentially following manufactory's instruction. The promoter activity was evaluated by measurement of the firefly luciferase activity relative to the internal control minimal TK promoter driven-Renilla luciferase activity using the Dual Luciferase Assay System as described by the manufacturer (Promega). A minimum of four independent transfections was performed and all assays were replicated at least twice.   (Fig. 1c).

Results
Eight out of the 16 common significant SNPs were further ruled out from our candidate list because they have DAF > 0.5 (or DAF < 0.5) in more than 30% of the thin-tailed (or fat-tailed) sheep breeds, after examining the DAF of these SNPs in additional three American thin-tailed sheep breeds and three African fat-tailed sheep breeds (See Methods) (Supplementary Fig. S1). The remaining eight SNPs are highly diverged between thin-and fat-tailed sheep, and keep ancestral allele in most thin-tailed sheep breeds while derived allele in most fat-tailed sheep breeds worldwide (Fig. 1d), indicative of promising candidates associated with fat tail phenotype. PCA and phylogenetic tree analysis using these eight SNPs showed clearly separated clades between thin-and fattailed sheep, which is quite distinct from the results obtained based on genome-wide variants showing geographic clustering (Supplementary Fig. S2) and further supported the strong divergence of these loci. Gene annotation of 150 kb-long genomic region surrounding the eight SNPs revealed 24 genes and PDGFD corresponding SNP OAR15_3091174 among them that is located on chromosome 15 was most highly differentiated (Fig. 1d).

Analysis of sequence variations of PDGFD region
To better investigate these selected genes, we extracted their genome sequence variants showed that PDGFD is relatively more abundantly expressed in adipose tissue than that in other tissues (Supplementary Fig. S3). These already known involvements of PDGFD in lipid metabolism provide further support for the hypothesis that this gene is an ideal candidate for fat tail in sheep.
Furthermore, ΔAF is noted to be particularly elevated in a 6. within this region revealed a consistent differentiation between sheep breeds with different tail types (Fig. 2B). Therefore, it is likely that this 6.8-kb region is the target of positive selection for fat tail and is the best candidate region for the functional mutation(s).
This region is located at the first intron of PDGFD (Fig. 2B, bottom), suggesting that the mutations causing the association with the fat-tail phenotype are regulatory. We next genotyped a total of 13 SNPs that exhibit highest ΔAF value (>0.8) between thin-and fat- analysis confirmed that all these SNPs are highly divergent (Fig. 2C). Collectively, these observations strongly suggested that the 6.8-kb region identified here deserves further investigation for revealing the causative mutation(s).
In addition to PDGFD, several other selected genes identified in this study have known functions that are associated with lipid metabolism, such as ENPP2 [27], ANAPC5 [28], RNF34 [29], KDM2B [30], CAMKK2 [31], TSHR [32], and CDH1 [33]. The genetic differentiation for these genes was not as pronounced as for the PDGFD loci, implying that these loci likely contribute to the phenotypic difference with relatively small effects.
We also examined the DAF distribution of the top candidate SNPs that were proposed in previous studies and the results showed that the majority of these SNPs are not consistently diverged between thin-and fat-tailed sheep populations [7-12] (Supplemental Fig. S4).

PDGFD expression is associated with adipogenesis
As an initial step to elucidate the role of PDGFD in the development of fat tail in sheep, tail tissues from a fat-tailed Chinese sheep breed, namely Tan sheep, at four different embryonic time points, including embryonic day 60 (E60), E70, E80 and E90, were collected and subjected to histological analysis. HE staining showed that lipid vacuole is absent in cells at E60 and E70, while accumulates in cells at E80 and E90, as evidenced by observation that one large vacuole was present in the cell and the nucleus was packed in a corner (Fig. 3A), a typical feature of mature adipocytes. In line with this, Oil Red O staining which visualizes fat-containing droplets revealed that a large portion of cells at E80 and E90 showed distribution of lipid drops (Fig. 3B). These observations suggested that committed preadioocytes are present in tail tissues at E60 and E70 which differentiate to mature adipocytes around E80 and E90.
Further RNA-seq analysis revealed that the expression of some marker genes which are known to be down-regulated (FKBP5, BMP5 and PDGFRA) and up-regulated (LPL, FABP4 and OPLAH) during adipogenesis had no obvious difference between E60 and E70 while significantly decreased and increased at E80 compared to earlier stages, respectively (Fig. 3C), confirming the histological results that cells in tail tissues of fattailed sheep undergo adipogenesis from E60/70 to E80/90. Interestingly, PDGFD expression was undistinguishable between E60 and E70, but was dramatically decreased at E80 (Fig. 3D). Subsequent qRT-PCR analysis confirmed the large reduction of PDGFD expression at E80, followed with a continued but not significant decline at E90, as compared to that at E60 and E70 (Fig. 3E). Other candidate genes including BMP2 which were proposed by several studies [8, 9, 12], exhibited no specific oscillation in expression mirroring the process of adipocyte maturation in tail tissues along with embryonic development (Supplemental Fig. S5). Gene expression correlation analysis revealed that PDGFD expression is highly positively correlated with markers enriched in preadipocytes while reversely correlated with markers up-regulated in mature adipocytes ( Fig. 3F). Accordantly, retrospective analysis of public transcriptomic data sets revealed that PDGFD expression is higher in adipose progenitors than that in other cell types isolated from human white adipose tissues (Fig. 3G) and is dramatically reduced during the adipogenesis of mouse 3T3-L1 (Fig. 3H). These results together suggest that PDGFD expression is enriched in preadipocytes and is down-regulated during adipogenesis.

PDGFD is dysregulated between adipose tissues of lean and fat individuals across species
Further qRT-PCR analysis showed that PDGFD expression is higher in tail tissues of fattailed sheep than that in thin-tailed sheep at both E70 and adult stages ( Fig. 4A and B).
Interestingly, re-analysis of public data sets revealed that PDGFD is also more abundantly expressed in adipose tissues of obese individuals than that of lean individuals in both mouse (Fig. 4C) and human ( Fig. 4D and E), suggesting an evolutionarily conserved role of PDGFD involving in hemostasis of mature adipocytes.

A segment of conserved intronic region affects PDGFD expression in human
We next sought to explore the regulatory role of the most highly differentiated SNPs found within the first intron of PDGFD (Fig. 2). It was noted that a mutation (SNP OAR15_3859314) displays high frequency in fat-tailed sheep (Fig. 5A) and occurs at extremely conserved region between species including human (Fig. 5B). We therefore suspected that it could be critical for regulating PDGFD expression and selected it for further investigation. Together with the previous results showing that similar to observations in sheep, PDGFD expression is also enriched in human preadipocytes and higher in obese human individuals ( Fig. 3G and Fig. 4E-F), we set out to extend our finding in sheep to human by investigating human PDGFD orthologous region. To do this, we cloned a fragment of human sequence (WT, wild type) as well as a mutated fragment with depletion of a conserved 20-bp region surrounding the site orthologous to sheep SNP OAR15_3859314 (Del) into a luciferase reporter vector and performed dual luciferase reporter assays (Fig. 5C). We first transduced the vectors into human A549 cells and the results showed that the activity of reporter carrying WT human sequence is remarkably suppressed as compared to blank luciferase vector (Fig. 5D), suggesting that the inserted WT sequence plays an inhibition role in regulating PDGFD expression.
Interestingly, activity of the mutated reporter was restored relative to the reporter carrying the WT sequence (Fig. 5D), indicating that the conserved region is functional and is capable of transcriptionally activating the PDGFD expression. These observations were further confirmed by luciferase reporter assay performed in mouse 3T3-L1 preadipocytes ( Fig. 5E). More interestingly, eQTL (Expression quantitative trait loci) analysis revealed that a mutation located 532 bp immediately downstream of the corresponding locus of sheep OAR15_3859314 mutation in human genome significantly increases the expression of PDGFD in human adipose tissues (Fig. 5F).

Discussion
To accurately map the candidate gene(s) underlying the fat tail phenotype of sheep, herein we comprehensively analyzed genomic variation data from a large cohort of sheep breeds with different tail types from around the world and integrated two different selection tests to detect genomic regions with signals of selection. We demonstrated that PDGFD gene loci exhibited the highest genetic differentiation between fat-and thin-tailed sheep and further found that the potential causal mutations are located with regulatory region.
In addition, we show that PDGFD expression is negatively associated with maturation of adipocytes and is higher in fat tissues of fat individuals than that in lean individuals across different species. Finally, we provide evidence showing that a mutation occurring at conserved region of first intron is functional to transcriptionally activate the expression of PDGFD in human.  Fig. S4). Unfortunately, the genetic differentiation level of the former two SNPs is below the genome-wide threshold in one (MEF vs SAT) and two group-pair comparison(s) (MEF vs SAT, MEF vs EUT), respectively (Supplemental Table S3-5), which are probably caused by the low abundance of derived allele in some fat-tailed sheep populations, such as Afshari, LocalAwassi and EthiopianMenz (Supplemental Fig. S4). Therefore, BMP2 has lower priority than the PDGFD locus as the candidate for fat-tailed phenotype.
Furthermore, a recent study based on whole-genome re-sequencing data revealed that PDGFD is the most highly differentiated loci between fat-tail sheep populations from China and Middle East and thin-tailed Tibetan sheep, followed by BMP2 as the second top highly divergent gene [9], providing further support to our finding that PDGFD is the predominant candidate for fat tail in sheep, although our study suffers limitation that only genes covered by the Ovine 50K Beadchip were investigated.
Adipose tissue development is associated with specification and differentiation of precursor cells (preadipocytes) to mature adipocytes as well as expansion of adipocyte size [34]. This finely orchestrated process involves changes in expression of a series of molecular regulators which remain to be discovered. Our histological analysis and gene expression analysis suggest that PDGFD expression is negatively associated with adipocyte accumulation in fat tissues during embryonic development in sheep (Fig. 3A-F). Consistently, re-analysis of data generated from previous studies in human and mouse reveals that PDGFD expression is enriched in preadipocytes and is decreased during the adipogenesis from precursor cells (Fig. 3G-H) [21, 22]. These observations strongly suggest that PDGFD plays a critical role in the initiation and/or commitment of preadipocytes and this function is conserved and universe across species. Despite of a continuous decline of expression during development, we find that PDGFD sustains higher expression in tail tissues of fat-tailed sheep than that of thin-tailed sheep from embryonic (E70) to adult stage ( Fig. 4A and B). Supporting this, a previous study reported that the expression of PDGFD is significantly upregulated in tail adipose tissue We discover a list of mutations that are located within the first intron of PDGFD and display high frequency in fat-tail sheep while low abundance in thin-tailed sheep ( Fig. 2 and Supplementary Table S6). Since intronic mutations mainly affect the transcriptional efficiency of genes by creating or disrupting binding sites for transcriptional factors [38,39], it is likely that the elevated expression of PDGFD in fat-tailed sheep is attributed to some/one of these mutations. We focus on one mutation that occurs within evolutionarily conserved region (Fig. 5A-B)  Interestingly, a mutation located 532 bp downstream and supposedly to be within the same linkage disequilibrium block of corresponding site of sheep SNP OAR15_3859314 significantly increase PDGFD expression in human adipose tissues (Fig. 5F). Coupled to the observations that PDGFD expression in adipose tissues of obese individuals is higher than that of lean individuals in both sheep and human (Fig. 4), we consider that mutations occurring at the intronic region orthologous to the candidate region that we identified in sheep (Fig. 2) are prone to elevate PDGFD expression and contribute to fat accumulation.
In conclusion, we have demonstrated that PDGFD gene is the predominant factor      (C) Pdgfd expression is higher in inguinal fat of high weight gaining mice than that of in low weight gaining mice revealed by microArray analysis (GSE4692). *P <0.05 (n=3).
PDGFD expression is higher in adipocytes from lean than that from obese Indian individuals in both (D) male and (F) female subjects revealed by microarray (GSE2508).