Comparative methylation and RNA-seq expression analysis in CpG context to identify genes involved in Backfat vs. Liver diversification in Nanchukmacdon Pig

Background DNA methylation and demethylation at CpG islands is one of the main regulatory factors that allow cells to respond to different stimuli. These regulatory mechanisms help in developing tissue without affecting the genomic composition or undergoing selection. Liver and backfat play important roles in regulating lipid metabolism and control various pathways involved in reproductive performance, meat quality, and immunity. Genes inside these tissue store a plethora of information and an understanding of these genes is required to enhance tissue characteristics in the future generation. Results A total of 16 CpG islands were identified, and they were involved in differentially methylation regions (DMRs) as well as differentially expressed genes (DEGs) of liver and backfat tissue samples. The genes C7orf50, ACTB and MLC1 in backfat and TNNT3, SIX2, SDK1, CLSTN3, LTBP4, CFAP74, SLC22A23, FOXC1, GMDS, GSC, GATA4, SEMA5A and HOXA5 in the liver, were categorized as differentially-methylated. Subsequently, Motif analysis for DMRs was performed to understand the role of the methylated motif for tissue-specific differentiation. Gene ontology studies revealed association with collagen fibril organization, the Bone Morphogenetic Proteins (BMP) signaling pathway in backfat and cholesterol biosynthesis, bile acid and bile salt transport, and immunity-related pathways in methylated genes expressed in the liver. Conclusions In this study, to understand the role of genes in the differentiation process, we have performed whole-genome bisulfite sequencing (WGBS) and RNA-seq analysis of Nanchukmacdon pigs. Methylation and motif analysis reveals the critical role of CpG islands and transcriptional factors binding site (TFBS) in guiding the differential patterns. Our findings could help in understanding how methylation of certain genes plays an important role and can be used as biomarkers to study tissue specific characteristics. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-08123-x.


Background
Pork is an important high-protein food consumed across the world and requires timely effort to monitor and sustain the quality of meat. Several molecular breeding programs are being run around the world to understand and fulfil future requirements with enhanced food quality which largely depends upon the taste and composition, and these factors ultimately shape the breeding program by the choice of meat [1,2]. The Republic of Korea is one of the highest pig-consuming countries and there is a huge domestic demand for its Jeju native black pig (JNP) for its superior taste [3,4]. Due to the enhanced taste but low reproduction of JNP, a threat of extinction has loomed over the JNP breed [5]. To address this issue, an breeding program was conducted to develop a pig breed with a high reproduction rate and sustain the superior taste characteristics [6]. In the course of the intensive breeding program and continued close monitoring using modern biological methods, a pig breed referred to as 'Nanchukmacdon' was developed. It has increased fat deposition, a better metabolism rate and maintained superior characteristics features in subsequent generations. The enhanced characteristics displayed by the mixed breed involve the expression of genes and different biological pathways in different tissues that play important roles in maintaining the harmony of the cells and the development of tissue from single cells [7,8].
Despite having the same genome, an unknown mechanism is governing the gene expression, development, genome imprinting, diseases, and diversification, and has been involved in evolutionary changes in different tissues [9,10]. A single cell at embryonic stages differentiates to form different tissues which could show contrasting physical characteristics with almost unchanged genomic composition governed by DNA methylation [11,12] (Fig. 1). These epigenetic mechanisms provide plasticity to the organism and the ability to adapt to different situations by altering the expression pattern of genes to regulate related pathways [13,14]. However, it is still unclear whether methylation profiles can help in identifying tissue-specific genes that may have a role in influencing tissue-specific features or may be involved in biological functions by directing different pathways. There is consequently, a void in understanding the tissue specific diversification through methylation and gene regulation patterns. While DNA methylation in the mammalian tissue development process is a conserved process, understanding of the conversion process at the genome-wide level is still not very well understood. In eukaryotic organisms, DNA methylation leads to epigenetic modification which "at the promoter site" leads to curbs the transcription process by binding to regulatory protein and primarily occurs in the CpG island that is more abundant in the upstream region of the gene [12,13,15]. Comparative analyses of methylation in CpG island have primarily focused on cross-species comparative analysis and have revealed intriguing trends in both the conserved and divergent features of DNA methylation in eukaryotic evolution [14,16,17]. Studying these factors provides a way to better understand the genes that influence these processes which could help us in understanding the overall regulation mechanism [15,18]. In an attempt to understand methylation, a previous study reported that the role of backfat deposition is associated with growth rate, meat quality, and reproductive performance [19]. Backfat thickness is also considered as one of the main parameters when selecting female pigs for breeding herds since it dominates several reproductive performance parameters [20,21]. Liver is also a major organ involved in the regulation of lipid metabolism with fatness and plays a crucial role in animal growth, meat quality, immunity, and reproduction rate. Comparative understanding of tissue diversification is a complex process that involves the expression of certain genes in one tissue while remaining unchanged in another. To understand the hidden mechanism that sustain such superior characteristics methylation studies in tissue diversification could open a new front in identifying the biological phenomena associated with the new pig breed. The phenomena underlying these processes will ultimately provide better insight to understand the regulatory mechanism of genes in different tissues controlling biological pathways.
In the present work, we reported genes involved in tissue-specific changes at the methylation level and the role of gene expression in the regions. To this end, we performed WGBS and RNA-seq of (N=5+5) samples of backfat and liver respectively and integrated the analysis results to understand the tissue characteristics. The methylation pattern in the CpG island was further studied to determine the potential role in the hypermethylated region with the respective expression pattern in the specific tissues. RNA-seq analysis abled us to analyze the differential expression patterns, as well as gene ontology studies, reveals the close association in different biological important pathways that were enriched in various tissue under methylated conditions. Along with this, we also aimed to conduct a de novo whole genome motif analysis to identify the methylated motif and the transcription factor binding sites in terms of overall changes of tissue and specific pathways.

WGBS data analysis
WGBS data analysis was performed to compare methylation patterns amongst backfat and liver tissue. Overall mapping of WGBS data on the reference genome was 75 % with an average conversion rate of methyl call exceed for reverse and forward (C+T)> 99.4 %. The overall methylation composition was inclined towards liver ( Fig. 2A) and methylation in the CpG context was higher in backfat than in the liver, with 77 % of total methylation vs. 71 %, respectively ( Fig. 2B & Additional file: File S1). Using SeqMonk (https://www.bioinformatics. babraham.ac.uk/projects/seqmonk/), a tool for visualization of high throughput mapped sequence data, we detected a sharp increase at the 2 kb upstream region of the TSS region and at downstream of TTS region. CpG methylation help in stabilizing chromatin structure as well as it also controls the regulation of related gene expression and these effects could overall responsible for stabilization of genome (Fig. 2C). This methylation level remains stable after the promoter region and contributes to structural stability and regulation of gene expression. CpG island studies also confirmed a sharp decrease in the methylated CpG level outside the 2 kb CpG island ( Fig. 2C and D). Individual methylation patterns for all the identified genes confirms that the pattern of methylation corresponds with the distribution of gene promoters, which are usually prone to transcription (Additional file: Figures S1). A DMR study was carried out to compare the tissue-specific methylation level and a de novo motif analysis for TFBS in backfat vs. liver DMRs was conducted using Homer software (Table 1) (Additional file: Table S1).

Identification of DEGs, CpG methylation, and Gene ontology
DESeq2 an R package was implemented to identify statistically significant differences in gene expression obtained from featureCounts, which is used to count reads from RNA or DNA sequence data in terms of genomic features [22]. The overall relationship between backfat and liver was represented in a volcano plot (Fig. 3A). A total of 2761 and 2375 DEGs in liver and backfat were observed between samples from respective tissues of Nanchukmacdon pig and the filter parameters used for DEGs were false discovery rate (FDR) values of ≤ 0.05 and log2FoldChange≥±2 [23].
Lists of DEGs with FDR ≤0.05 were compiled and submitted to DAVID v6.8 [24] for functional annotation and enrichment analysis. To perform the gene ontology analysis, we divided the obtained DMRs according to their expression pattern into four sets: hyper-methylated upregulated (729 genes) and downregulated (630 genes) in backfat, and hyper-methylated upregulated (792 genes) and downregulated (1032 genes) in liver, for a total of 3183 genes (Additional file: File S2). For each list, enriched gene ontology (GO) in Biological Processes (BP), Molecular functions (MF), Cellular Compartments (CC), and KEGG pathway analyses were performed (Additional file: File S3). These terms were then clustered semantically using the ReviGO server. Throughout the transcriptome of Nanchukmacdon pig, an enriched function with an elevated GO-term function was identified and the clustered representation of these elevated GO terms was observed using a treemap (Additional file: Figures S2: (BP)1a-4a, (CC)1b-4b, and (MF)1c-4c respectively). We identified the significantly expressed genes (P-value ≤ 0.05) related to the KEGG pathway that ranges from metabolic pathway, fatty acid biosynthesis, ErbB signaling pathway, adipocytokine signaling pathway, calcium signaling pathway, and oxidative phosphorylation. The CpG island plays a major role in the differential expression of genes. Methylation of CpG islands has been reported to affect their gene expression. After identification of differentially expressed methylated regions in the backfat and liver we retrieved coordinates for all the autosome chromosomes from the UCSC browser and mapped them to the identified regions. We obtained a total of 16 genes were methylated at the CpG island (Table 2).

Circos plot
A circular plot was generated with five rings where the outermost ring represents the 18 autosome chromosomes of S. scrofa and the inner four rings were composed of different conditions. The second and fourth  represent the hypermethylated and upregulated genes identified in the DMRs and DEGs for backfat and liver tissues respectively. Similarly, the third and fifth rings represent the downregulated genes in the methylated regions and are generated using the CIRCOS tool [25] ( Fig. 4).

Discussion
In the present investigation, to understand the role of genes involved in tissue-specific diversification, we have presented a comprehensive view of comparative methylation pattern with differentially expressed genes amongst backfat and liver tissue in Nanchukmacdon Pig. Methylation analysis is one of the most promising methods recently evolved and used to accurately decipher the diversification in cross tissue differentiation patterns as well as to identify close relationships amongst different tissues. Studying these patterns will ultimately help us in identifying markers that specifically target breeds to enhance tissues of interest. Therefore, methylation analysis at CpG islands were performed by integrating WGBS with RNA-seq data from different tissues and profiled to identify potential genes and regions that are hyper-methylated and upregulated or downregulated in backfat and liver in CpG islands which could be playing important roles in tissue-specific diversification.
We performed a DMR analysis for de novo methylated regions and found the rank 1 motif includes "TATA box" promoter sequence, which specifies to other molecules where transcription begins and strongly modulates cell-and tissue-specific RANKL expression and the osteoclastogenesis process [26]. We observed a uniform pattern of motif methylation in the highly conserved regulatory factor x gene family, which has been reported in the early development and maturation of cells [27] ( Table 1). The top identified motifs were of particular interest, with most motifs actively involved in upstream binding to transcription factors and affecting cis and epicistrome features that regulate DNA landscape [28]. Similarly, the observed RAR/RXR binding sites are enriched in differentiation regions and the identified motif was found to have a strong association with regulatory transcription factors and previously has been involved in the differentiation process [29].
Our findings on common genes in CpG islands with methylation and differentially expressed patterns have a limited total number of genes of 16. Amongst these, 13 genes were hyper-methylated in the liver, and three were hyper-methylated in backfat. Among the identified genes, SIX2 was already reported to be involved in the differentiation process [30]. Methylation in the CpG island is necessary to control aberration and it regulates different pathways. To assess the impact, we performed gene ontology analysis on genes retrieved from four different approaches: hyper-methylated upregulated in backfat and liver, hyper-methylated down-regulated genes in backfat and liver tissues. The KEGG pathway analysis correlated with the calcium signaling pathway, fat digestion and absorption, cAMP signaling pathway, etc. (Fig. 3C). Downregulated genes identified in hypermethylated regions of backfat were related to complement activation, cholesterol biosynthesis, and tissue development. In contrast, the up-regulated genes in hyper-methylated regions were strongly associated with locomotory behavior, BMP signaling pathways, and collagen fibril development processes. Similarly, identified genes in liver hyper-methylation and upregulated were involved in biological important processes such as cholesterol biosynthesis, bile acid, and bile salt transport, response to glucose, and the immune response mechanism. As well, we found that downregulated genes have a role in the embryonic skeletal system, signaling pathways, cell adhesion, etc. Each rectangle in the treemap represents a single cluster which joined into 'superclusters' of loosely related terms, visualized with different colors (Fig. 3B & Additional file S3].

Conclusions
In conclusion, an integrated methylation and RNA-seq analysis provided a comprehensive overview of methylation and transcriptomic pattern in backfat and liver tissues. The results indicate that methylation dominantly occurred during backfat development at the CpG island in order to control aberrations. We have identified 16 common genes that were highly expressed and differentially methylated and could be used as potential markers in molecular breeding processes and to enhance biological relevant tissues.

Preparation of gDNA and Total RNA and Sequencing
In the present study, Nanchukmacdon pigs were grown in farm of National Institute of Animal Science located on Jeju island with close monitoring of their health. Five boars with average age of 26 months were randomly selected for effective size calculation to collect samples for WGBS, and RNA-seq analysis (n=5 for liver and backfat tissues, respectively). The pigs were euthanized with an anesthetic injection given into the ear vein with an overdose of Alfaxan (0.7 mg/kg) and bulk tissue (10 mg) was thereafter collected [31]. Subsequently, samples were stored in a sterile container and immediately frozen at −70°C until further analysis. Ethical committee of National Institute of Animal Science (NIAS) approved and verified all the experimental procedures and followed ARRIVE guidelines to perform the experiments [32]. Genomic DNA was isolated using a DNeasy Blood & Tissue Kit (Qiagen, Valencia, CA, USA), and total RNA was isolated using the TRIzol method according to manufacturer protocols. The concentrations of DNA and RNA were determined using a Qubit fluorometer (Invitrogen, UK), NanoDrop (Thermo Scientific, USA), and 364 Bioanalyzer (Agilent, UK), and integrity was monitored by agarose gel electrophoresis. gDNA from Nanchukmacdon pig backfat and liver was subjected to bisulfite conversion using the fragment size (250 bp±25 bp), and the sequencing libraries were constructed as previously described [6]. Similarly, RNAseq data were generated for Nanchukmacdon pig (N=5) after RNA isolation of backfat and liver tissues using the TRIzol method following the prescribed protocol and previously described [33]. The sequencing libraries were constructed using a RNA sample preparation kit (Illumina, San Diego, CA, USA), and they were run in the Illumina NovaSeq instrument for 50 × 2 cycles.

DMRs and DEGs analysis of WGBS and RNA-seq data
The WGBS data were analyzed as previously described to understand methylation patterns in the identified genes [34]. Trim_galore was utilized for quality check of sequencing data [35] subsequently, sam_blaster was used to remove duplicate reads from the alignment following reads were sorted using SAMtools [36]. The reads were then mapped to the reference genome of sscrofa11.1 using Bismark [37] and the methylation level in CpG, CHH, and CHG island measured using bismark methyl extractor. Sorting of Bam file was undertaken before running the methylcall and performed with an average conversion rate of >99.4 % by applying filters based on a minimum coverage of 10 and a mapping quality of at least 10. Since we were interested in identifying the differential pattern in the respective tissues, we later performed DMR studies across backfat and liver using the methylKit an R package [38][39][40]. Logistic regression approach was implemented to model the odd log probability to observe the ratio. Following, false discovery rate (Q ≤ 0.01) and percent methylation difference larger than 25 % were selected and DMRs were extracted.
Similarly, we performed a RNA-seq analysis as it enables a comprehensive understanding of the expression pattern of tissue-specific changes in genes. With statistical advanced tools, we performed a quality check by FastQC to assess low-quality pair-end reads [41] and further removed potential adapters by using the Trimmomatic tool before sequence alignment [42]. Paired-end reads were aligned to the S. scrofa genome (Sscrofa11.1) using Hisat2 [43,44] following, read count was performed using FeatureCount [22]. Finally, DESeq2 [45] was used to identify DEGs.

De novo motif discovery
Hyper-methylated regions were predicted with a cutoff of ±25 in DMRs in backfat and liver. We were interested in understanding the motif for these methylated regions in GC% of the CpG island found near to the transcription start site and these motif analysis was performed by findMotifsGenome.pl module of HOMER software at the default parameter [46]. Rank-wise motifs were detected with sorted p-value, %target, and %background targets.

Functional enrichment analysis of methylated genes with differentially expressed genes
After identifying DEGs commonly found in backfat and liver, methylated regions with FDR ≤ 0.05 and log2Fold-Change ≥±2 were compiled and submitted to DAVID v6.8 [24] for functional annotation and enrichment analysis. For each list, enriched Gene Ontology (GO) studies were performed for BP, MF, and CC. ReviGO a web server utilize the GO terms to present a treemap from respective process and clustered semantically with different colors [47] and the Clusterprofiler R package [48] was used for summarizing the GO terms.

CpG island and methylation pattern analysis
Based on DMRs, we aimed to identify regions either inclined towards backfat or liver by comparing CpG island coordinates retrieved from the UCSC genome browser [49]. A total of 46,218 regions were retrieved across the genome by following the Table browser with the pig genome assembly Sscrofa11.1 as the reference and selected a track for the CpG island. The identified island was used to extract DMRs located in the genomic coordinates and we extracted the region of interest that plays a crucial role in tissue diversification.
Additional file 1 Figure S1. Comparative methylation pattern of identified genes using SeqMonk.