Methylome and transcriptome analyses of yaks of different ages revealed that DNA methylation and transcription factor ZGPAT co-regulate milk production CURRENT

Domestic yaks play an indispensable role in sustaining the livelihood of Tibetans and other ethnic groups on the Qinghai-Tibetan Plateau (QTP), by providing milk and meat, and have evolved numerous physiological adaptabilities to high-altitude landscape, such as strong capacity of blood oxygen transportation and high metabolism. The molecular mechanisms underlying milk production and adaptation to high altitudes of yak need further exploration. We performed genome-wide DNA methylome and transcriptome analyses of breast, lungs, and gluteal muscle from yaks of different ages. We identified differentially methylated regions (DMRs) across age groups within the each tissue, and breast tissue had considerably more differential methylation than that from the three younger age groups. Hypomethylated genes with high expression level might regulate milk production by influencing protein processing in the endoplasmic reticulum. Weighted gene correlation network analysis revealed that the “hub” gene ZGPAT was highly expressed in adult breast tissue and that it potentially regulated the transcription of 280 genes, which play roles in regulating protein synthesis, processing, and secretion. Besides, Tissue network analysis indicates that high expression of HIF1A regulates energy metabolism in the lung.


Abstract
Background Domestic yaks play an indispensable role in sustaining the livelihood of Tibetans and other ethnic groups on the Qinghai-Tibetan Plateau (QTP), by providing milk and meat, and have evolved numerous physiological adaptabilities to high-altitude landscape, such as strong capacity of blood oxygen transportation and high metabolism. The molecular mechanisms underlying milk production and adaptation to high altitudes of yak need further exploration.

Results
We performed genome-wide DNA methylome and transcriptome analyses of breast, lungs, and gluteal muscle from yaks of different ages. We identified differentially methylated regions (DMRs) across age groups within the each tissue, and breast tissue had considerably more differential methylation than that from the three younger age groups. Hypomethylated genes with high expression level might regulate milk production by influencing protein processing in the endoplasmic reticulum. Weighted gene correlation network analysis revealed that the "hub" gene ZGPAT was highly expressed in adult breast tissue and that it potentially regulated the transcription of 280 genes, which play roles in regulating protein synthesis, processing, and secretion. Besides, Tissue network analysis indicates that high expression of HIF1A regulates energy metabolism in the lung.

Conclusions
The results of this comprehensive study provide a solid basis for understanding the epigenetic mechanisms underlying milk production in yaks, which could be helpful to breeding programs aimed at improving milk production.

Background
Domestic yaks play an indispensable role in sustaining the livelihood of Tibetans and other ethnic groups on the Qinghai-Tibetan Plateau (QTP) and in the Himalayas and connecting Central Asian highlands by providing milk, meat, hide, fiber, fuel, and transportation [1,2]. Milk is an important source of high-quality protein because of its high content of essential amino acids, such as lysine, which is deficient in many human diets [3], and because of its well-known physiological effects, such as immunomodulatory and gastrointestinal activities [4]. The milk protein content and composition influence the technological properties of milk and are therefore important for the dairy industry, especially in Europe, where the majority of the milk produced is transformed into cheese. In recent decades, there have been extraordinary advances in our knowledge of the physiology and biochemistry of the lactating mammary gland. It has been clearly demonstrated that milk protein synthesis in the mammary gland depends on hormonal and developmental cues that modulate the transcriptional and translational regulation of genes through the activity of specific transcription factors, non-coding RNAs, and alterations of the chromatin structure in the mammary epithelial cells [5]. The interplay between all these aforementioned factors might play a key role in milk protein synthesis, which is crucial during the onset of and throughout lactation in high-producing dairy cattle.
Despite such advances, little is currently known about the regulation of the physiological and cellular mechanisms required for milk protein synthesis and secretion in yak. We hypothesized that genes related to milk production might be regulated by DNA methylation and distinct sub-modules of correlated expression variation might be indentified. In this study, we performed genome-wide DNA methylome and transcriptome analyses of lung, breast, and biceps brachii muscle tissues at four different month ages (MA) from yaks to identify the regulatory networks associated with milk protein synthesis, metabolism, and secretion in yak.

Results
Global DNA methylation and gene expression in the breast, lungs, and biceps brachii muscle at different ages We generated the methylomes and transcriptomes of lung, breast, and biceps brachii muscle tissues at four different month ages (MA), representing four life stages with 3 replicates (MA = 6, 30, 54, and 90 months; childhood, juvenile, youth, and adult), from a total of 12 female Riwoqe yaks. After performing sequence quality control and filtering, we obtained a single-base resolution methylome covering 85.6% (27,471,373/32,092,725) of CpG sites across the genome with an average depth of 22.5×. We first calculated pairwise Pearson's correlations of CpG sites with at least 10 × coverage depth across all samples, and the samples were well clustered by tissue (Fig. 1a). Biological replicates highly correlated with each other (median Pearson's r = 0.74), and the correlation between ages (median Pearson's r = 0.72) was relatively weaker and showed the lowest coefficients among tissues (median Pearson's r = 0.66) (Fig. 1c). The transcriptome sequencing data of all samples were aligned to our newly assembled yak genome reference (unpublished), and subsequently obtained the transcripts. In total, we obtained a total of 2,0504 transcripts, which were annotated to the Gene Ontology (GO) [6], InterPro [7], Kyoto Encyclopedia of Genes and Genomes (KEGG) [8], Swiss-Prot [9], and TrEMBL [10] databases (Table S1). We also calculated pairwise Pearson's correlations of all transcripts and obtained similar results as with DNA methylation (Fig. 1b). Biological replicates showed the highest correlation coefficients, while different tissues showed the lowest correlation coefficients (Fig. 1d). Differential DNA methylation among the age groups was involved in milk production We looked for differentially methylated regions (DMRs) across age groups within the breast, lungs, and gluteal muscle (Table S2-4). Within the lung and gluteal-muscle tissues, age groups did not differ in age-related DMRs (A-DMRs), but adult breast tissue had considerably more differential methylation than the three younger age groups (Fig. 2a). At ~ 90 months, yaks enter the lactation period, so it is possible that the observed methylation may at least partially control yak lactation. We then selected 375 hypomethylated (highly expressed) genes along with 207 hypermethylated (lowly expressed) genes from adult yak breast tissue. The hypomethylated genes were enriched in only "protein processing in endoplasmic reticulum (ER)" (nine genes, 2.964-fold enrichment, p = 0.0049).
We examined A-DMRs that overlapped across age groups. Childhood and adult tissues rarely shared A-DMRs when comparing the lung and muscle tissues (childhood, muscle: 586 A-DMRs, breast: 2,278, lung: 496; adult, muscle: 471, breast: 12,050, lung: 772) (Fig. 2b, c). Juvenile and youth stages also rarely shared A-DMRs across muscle and lung tissues (Fig. 2d), suggesting that methylation patterns were already established at the childhood stage and that no extensively divergent epigenetic difference occurred through the different ages under natural high-altitude conditions.

Consensus Network Analysis For Tissues And Age Groups
We first performed a multi-way ANOVA test for each gene among all samples (n = 36), to test the null hypothesis that the gene expression level did not differ among age groups and tissues. At the threshold for significance (p < 0.05), 417 age-related and 8,560 tissue-related genes were selected for further weighted gene correlation network analysis (WGCNA), which takes advantage of the correlations among genes and groups genes into modules using network topology [13]. Subsequently, we conducted WGCNA for tissue-and age-related gene expression separately to identify a "consensus network"-a common pattern of genes that are correlated in all conditions. We performed consensus network, module statistic, and eigengene network analyses to identify modules, assess relationships between modules and traits, and study the relationships between co-expression modules [14]. The consensus networks identified for tissues and age groups had clearly delineated modules (Fig. 1a, 1b), and the modules identified were significantly correlated with tissues and age groups (Fig. 1c, 1d).
Age network analysis indicates that ZGPAT might regulate milk production Within the age-related gene network, the largest module ("turquoise", n = 356) had opposite directions of correlation for breast tissue (r=-0.57, p = 3e-04) and age (r = 0.37, p = 0.03) and showed a positive correlation with biceps brachii muscle (Table S5). The breast tissue had a stronger signal than age and possibly overwhelmed the signal from age. Genes in this module were enriched in the GO categories of "protein polyubiquitination," "RNA polymerase II core promoter proximal region sequence-specific DNA binding," "ATP binding," "transcription, DNA-templated," and "negative regulation of transcription from RNA polymerase II promoter" (p-values of 0.000344, 0.000822, 0.000991, 0.00139, and 0.00244, respectively). The "blue" (n = 48) and "grey" (n = 13) modules showed only a negative correlation with age (r= -0.58, p = 2e-04; r= -0.76, p = 6e-08, respectively) and exhibited no enrichment of GO categories for genes. After applying the threshold of the absolute value of gene significance for age (|GS| >0.5) and module membership measures (|MM| >0.6) in each module, we defined 20 and 7 "hub" genes in the "turquoise" and "blue" modules ( Table 1). The gene expression of the "hub" genes was well clustered by modules, which was consistent with the opposite directions of correlation with age ("turquoise" r = 0.37, "blue" r=-0.58). The upregulated expression level of the "hubs" in breast tissue at 90 months of age indicated that the "turquoise" module had a stronger correlation with breast tissue than with age ( Fig. 4a). We then used the AnimalTFDB 3.0 database [15] to examine transcription factors in these 27 "hubs" and found that ZGPAT encodes a transcription regulator protein and was significantly upregulated in breast tissue at 90 months of age (Fig. 4a). previous study reported that this protein specifically binds A-3' consensus sequence and represses transcription by recruiting the chromatin multi-complex NuRD to target promoters [16]. High expression of ZGPAT in breast tissue at 90 months of age was observed, and it potentially regulated the transcription of 280 genes (weight > 0.15) in the network from the "turquoise" module. To identify the most important cellular activities controlled by this TF regulatory network, we analyzed over-represented GO biological process and molecular function terms and KEGG pathways. These potential target genes were enriched in the GO categories of "protein binding," "ATP binding," and "zinc ion binding," among others, and the KEGG categories of "aminoacyl-tRNA biosynthesis," "autophagy animal," and "protein processing in endoplasmic reticulum" (Fig. 4b). These enriched GO terms and KEGG pathways are likely to play roles in regulating protein synthesis, processing, and secretion in breast tissue. For example, 6 of 7 genes from "protein processing in endoplasmic reticulum" were also upregulated at 90 months of age in breast tissue (Fig. 4c) and involved in multiple processes in the endoplasmic reticulum, including vesicle trafficking (SEC24C), folding and assembly (SELENOS), transportation (BCAP31), and ubiquitination and degradation (BAG1, UBE2G2, and MARCH6) [11,12]. Only DNAJC10 was downregulated at 90 months of age in breast tissue, and this gene encodes an endoplasmic reticulum co-chaperone that is part of the endoplasmic reticulum-associated degradation complex involved in recognizing and degrading misfolded proteins [12].
Tissue network analysis indicates that high expression of HIF1A regulates energy metabolism in the lung Within the tissue-related gene module network, four modules showed a positive correlation and two modules showed a negative correlation with the lung, and all significant module-trait relationships were negative in the muscle but positive in the breast (Fig. 3d). Moreover, 99.54% of the total 8,560 tissue-related genes were related to the top 4 modules ("turquoise," n = 3833, "blue," n = 2795, "brown," n = 1052, "yellow," n = 339) (Fig. 5a, Table S6), and these modules were also highly correlated with other modules; for example, "brown," "yellow," and "black" showed a high eigengene adjacency with each other (Fig. 5b).
We applied the more stringent threshold of the absolute value of gene significance for the age and module membership measures in the top four modules to identify "hub" genes in the "turquoise," "blue," "brown," and "yellow" modules. With the threshold values of |GS|>0.7 and |MM| >0.8, 34 "hub" genes were identified in the "turquoise" module. Then, 24 "hubs" were filtered from the gene significance of module-lung relationships, and 10 "hubs" were filtered from the gene significance of module-breast relationships; these were further divided into 3 clusters by hierarchical clustering, and they showed high expression levels in the breast (cluster 1), lung (cluster 2), and biceps brachii muscle (cluster 3) tissues, respectively, with distinct clustering patterns by tissue (Fig. 5c). neurogenesis and may also function as tumor suppressors [17]. STAT6 is a member of the STAT family of transcription factors. In response to cytokines and growth factors, STAT family members are phosphorylated by the receptor associated kinases and then form homo-or heterodimers that translocate to the cell nucleus where they act as transcription activators [18]. HIF1A, hypoxiainducible factor-1, functions as a master regulator of the cellular and systemic homeostatic response to hypoxia by activating the transcription of many genes, including those involved in energy metabolism, angiogenesis, and apoptosis and other genes whose protein products increase oxygen delivery or facilitate metabolic adaptation to hypoxia [19]. HIF1A was found to be upregulated in the lung and to potentially regulate the transcription of 2008 genes (weight > 0.15), which were enriched in multiple GO biological process and molecular function categories and KEGG pathways. Notably, most of the enriched GO terms and KEGG pathways were related to energy metabolism, such as "mitochondrial respiratory chain complex I assembly," "NADH dehydrogenase (ubiquinone) activity," "ATP binding," "mitochondrial translation," "tricarboxylic acid cycle," and "GTP binding" in GO terms and "thermogenesis," "carbon metabolism," and "citrate cycle TCA cycle" in KEGG pathways (Fig. 5d). Mitochondria function as the primary energy producers of the cell and serve as the center of biosynthesis, the oxidative stress response, and cellular signaling, placing them at the hub of a variety of immune pathways [20]. NADH dehydrogenase is a core subunit of the mitochondrial membrane respiratory chain and believed to belong to the minimal assembly required for catalysis [21]. Protein which binds ATP or GTP, carries three phosphate groups esterified to a sugar moiety and represents energy and phosphate sources for the cell [22,23]. The tricarboxylic acid cycle, a series of metabolic reactions in aerobic cellular respiration, occurs in the mitochondria of animals and plants, and during this cycle, acetyl-CoA, formed from pyruvate produced during glycolysis, is completely oxidized to CO 2 via the interconversion of various carboxylic acids. It results in the reduction of NAD and FAD to NADH and FADH2, whose reducing power is then used indirectly in the synthesis of ATP by oxidative phosphorylation [24]. The "thermogenesis" pathway is essential for warm blooded animals, ensuring normal cellular and physiological functioning under conditions of environmental challenge [25].

Discussion
Numerous studies reported transcription profiling of mammary gland in different livestock animals such as cattle [26], sheep [27], and goat [28], and DNA methylation profiling of mammary gland in cattle [29]. These studies have indicated temporal and spatial specificity in the methylome and transcriptome profiles of the mammary gland in different species. However, these work only reported differential gene expression profile in mammary gland, and the regulatory network is still unknown. In the present study, we generated the methylomes and transcriptomes of lung, breast, and biceps One of the contributions of this study is that hub genes were identified by WGCNA. The data show that the hub genes with the highest MM and GS in modules of interest should be considered as the natural candidates for further research. This study identified turquoise module genes associated with milk yield, and 20 genes were considered as the hub genes and showed the highest mRNA expression level in breast tissue at 90 months, when yaks enter the lactation period. In these hub genes, ZGPAT was annotated as transcription factor and it potentially regulated the transcription of 280 genes in the network from the "turquoise" module, which were enriched in the KEGG categories of "aminoacyl-tRNA biosynthesis," "autophagy animal," and "protein processing in endoplasmic reticulum". This result revealed that ZGPAT is likely to play driving role in regulating protein synthesis, processing, and secretion in breast tissue. Moreover, the 7 genes potentially regulated by ZGPAT in "protein processing in endoplasmic reticulum" were totally different from aforementioned 9 genes regulated by hypomethylation, illustrating that DNA methylation and transcription factor possibly co-regulate milk production. In addition, the tissue network analysis indicates the central role of HIF1A in regulating energy metabolism, which is important in adaptation to low temperature and hypoxia in high altitude environment.

Conclusions
The results of this comprehensive study provide a solid basis for understanding the molecular mchanisms underlying milk protein synthesis and high-altitude adaptation in yaks. This information advances our understanding of regulatory network in mammary gland at different development stages and could be helpful to breeding programs aimed at improving milk production.

Animals and samples
In total, twelve female yaks that were 6, 30, 54 or 90 months old with 3 replicates for each month age

Identification Of DMRs
After filtering the low-quality reads, the methylation sequencing data of the samples were aligned to the yak reference genome (unpublished data) using BSMAP (Version 2.74) [30]. The methylated CpG (mCG) sites were identified following a previously described algorithm [31]. Differentially methylated regions (DMRs) were identified using metilene (Version 0.2-6) within a 500 bp sliding window at 250 bp steps, applying the thresholds of differential methylation β >=15%, two-dimensional Kolmogorov-Smirnov-Test p-value < 0.05, and Mann-Whitney-U test p-value < 0.05 [32].
Total RNA extraction, library preparation, and sequencing The TRIzol reagent (Invitrogen, Carlsbad, CA, USA) was used to isolate the total RNA of each sample.
The purity, concentration, and integrity of RNA were checked using the NanoPhotometer spectrophotometer ( Quality Analysis, Transcriptome Assembly, And Abundance Estimation Clean reads were obtained by removing reads containing adapter or poly-N and low-quality reads from the raw data using in-house perl scripts. The Q20, Q30, and GC contents of the clean reads were calculated. All the downstream analyses were based on the good-quality clean reads. Paired-end clean reads were mapped to the yak reference genome (unpublished data) with STAR (available at https://github.com/alexdobin/STAR/releases). The mapped reads of each sample were assembled using StringTie [33]. The mapped reads of each sample were assembled using StringTie. Then, all transcriptomes from the samples were merged to reconstruct a comprehensive transcriptome using perl scripts. After the final transcriptome was generated, StringTie and edgeR were used to estimate the expression levels of all transcripts [34]. StringTie was used to assess the expression level of mRNAs by calculating fragments per kilobase of transcript per million fragments mapped (FPKM).

Weighted Gene Correlation Network Analysis
A WGCNA network [13] was generated for age-related genes and tissue-related genes. Consensus networks and module statistics followed the overall approach described by Langfelder et al. (2008).
Briefly, the network was derived based on a signed Spearman correlation using a b of 10 as a weight function. The topological overlap metric (TOM) [14] was derived from the resulting adjacency matrix and used to cluster the modules using the blockwiseModules function (blockwise Consensus Modules, for the consensus modules) and the dynamic tree cut algorithm [14] with a height of 0. 25

Consent for publication
Not applicable.

Availability of data and material
The DNA methylation data and RNA transcriptome data in this study are available in SRA under the accession numbers PRJNA530286 and PRJNA512958, respectively. the samples. ZC and CZ performed the library constraction and sequencing and the quality control analysis. QZ, YZ and HC performed downstream analysis of the data and assisted in the generation of additional files for the manuscript. All authors read and approved the final manuscript.  Overview of month age-associated DMRs. (a) Basic statistics for A-DMRs within each tissue.
Overlap of A-DMRs associated with the 6 months age group (b), 90 months age group (c), and 30 and 54 months age groups (d) in the muscle, breast, and lung respectively.