Role of DNA methylation on the association between physical activity and cardiovascular diseases: results from the longitudinal multi-ethnic study of atherosclerosis (MESA) cohort

Background The complexity of physical activity (PA) and DNA methylation interaction in the development of cardiovascular disease (CVD) is rarely simultaneously investigated in one study. We examined the role of DNA methylation on the association between PA and CVD. Results The Multi-Ethnic Study of Atherosclerosis (MESA) cohort Exam 5 data with 1065 participants free of CVD were used for final analysis. The quartile categorical total PA variable was created by activity intensity (METs/week). During a median follow-up of 4.0 years, 69 participants developed CVD. Illumina HumanMethylation450 BeadChip was used to provide genome-wide DNA methylation profiles in purified human monocytes (CD14+). We identified 23 candidate DNA methylation loci to be associated with both PA and CVD. We used the structural equation modeling (SEM) approach to test the complex relationships among multiple variables and the roles of mediators. Three of the 23 identified loci (corresponding to genes VPS13D, PIK3CD and VPS45) remained as significant mediators in the final SEM model along with other covariates. Bridged by the three genes, the 2nd PA quartile (β = − 0.959; 95%CI: − 1.554 to − 0.449) and the 3rd PA quartile (β = − 0.944; 95%CI: − 1.628 to − 0.413) showed the greatest inverse associations with CVD development, while the 4th PA quartile had a relatively weaker inverse association (β = − 0.355; 95%CI: − 0.713 to − 0.124). Conclusions The current study is among the first to simultaneously examine the relationships among PA, DNA methylation, and CVD in a large cohort with long-term exposure. We identified three DNA methylation loci bridged the association between PA and CVD. The function of the identified genes warrants further investigation in the pathogenesis of CVD. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-08108-w.


Conclusions:
The current study is among the first to simultaneously examine the relationships among PA, DNA methylation, and CVD in a large cohort with long-term exposure. We identified three DNA methylation loci bridged the association between PA and CVD. The function of the identified genes warrants further investigation in the pathogenesis of CVD.
Keywords: Cardiovascular disease, Physical activity, DNA methylation, MESA, Structural equation modeling Background Cardiovascular disease (CVD) is the leading cause of mortality in the United States [1]. Understanding contributors to the etiology of CVD will help us better control the disease at the level of primary prevention [2]. The etiology of CVD involves complex multifactorial mechanisms including environmental and genetic factors [3]. Among the environmental or behavioral exposures, physical activity (PA) has consistently been shown in epidemiological studies to be associated with reduced risk of CVD [4]. Although multiple mediating mechanisms have been proposed (e.g., an anti-inflammatory response, insulin-sensitivity) [4,5], our knowledge about the underlying mechanisms of the benefits of PA on cardiovascular health is still sparse. Owing to the evolutionary development of genomic study methods, recent studies suggest that epigenetic modifications are also involved in the association between PA and CVD [6].
DNA methylation is the best-studied epigenetic modification. It is a dynamic epigenetic mechanism in several cellular process such as regulating gene expression, genomic imprinting, and X-chromosome inactivation [7]. In eukaryotes, the methylated cytosine (5-methylcytosine, 5mC) occurs predominantly at C followed by guanine (G) forming the so-called CpG site. Although the result of DNA methylation is usually linked to the silencing of genes [8], growing evidences showed that DNA methylation can also increase gene expression in some instances [9]. Epigenome-wide association studies (EWAS) have shown that many DNA methylation loci are associated with CVD traits [10,11] and influenced by physical activity [12]. However, because the associations between PA or CVD and DNA methylation were studied separately in previous studies, the role of DNA methylation on PA's benefits on cardiovascular health was only deduced from the biological functions of identified genes [13]. There is a lack of evidence directly showing the complexity of PA-DNA methylation interaction in the development of CVD [14].
To investigate the complex relationships among PA, DNA methylation, and CVD, we applied the structural equation modeling (SEM) approach to a cohort. Compared with common statistical methodologies, SEM allows for testing of complex relationships among multiple variables including the roles of mediators [15].
In epigenetic epidemiology, it is useful to investigate whether DNA methylation is a component of the pathway linking the exposure and the outcome [14]. In the current study, we evaluated the role of DNA methylation on the longitudinal association between PA and CVD, along with other covariates, using data from the Multi-Ethnic Study of Atherosclerosis (MESA).

Data source and study cohort
The MESA study is a prospective cohort study designed to investigate the prevalence, associations, and progression of subclinical CVD in 6814 men and women aged 45-84 years and free of clinical CVD at the first examination ( Exam 1, 2000Exam 1, -2002. Following Exam 1, five follow-up examinations (Exam 2-6) have occurred. Written informed consent was obtained from all participating subjects under research protocols approved by the review boards of the six filed centers. A detailed description of the MESA study be found elsewhere [16]. Corresponding to Exam 5 (2010Exam 5 ( -2012, the MESA Epigenomics and Transcriptomics Study (2010-2012) was conducted to investigate the association between DNA methylation and gene expression in purified human monocytes (CD14 + ) [17]. Asian participants were not included since the study was not available for participants from two of six MESA sites [18]. The DNA methylation data are publicly accessible through the GEO website with accession ID GSE56046.
In this study, we linked the MESA DNA methylation data to Exam 5 data through unique MESA participant IDs. As a result, we used exposure (physical activity), covariate, and DNA methylation data of 1264 participants collected in 2010-2012 (Exam 5). Data were analyzed and compared for CVD outcomes occurring through December 31, 2015. We excluded participants with prevalent CVD at baseline (n = 155) and those with missing exposure or covariate data (n = 44). The final sample size of the analyzed cohort was 1065.

Variables and analysis Physical activity
Physical activity and sedentary behaviors were assessed through an interviewer-administered MESA Typical Week Physical Activity Survey (TWPAS) which was designed to estimate participants' habitual activities as of their last interview. The survey contains 28 question item categories asking participants about the duration of time when doing each category of activities. The unit of the continuous activity variables was measured as both minutes per week and metabolic equivalence of tasks (METs) per week. According to the MET value of each activity, the intensity was classified according to MET value as sedentary behavior (METs ≤1.5), light PA (1.5 < METs < 3.0), moderate PA (3 ≤ METs < 6), and vigorous PA (METs ≥6) [19,20]. In the screening stage of this study, the continuous total PA variable was defined as the sum of total METs per week spent doing light, moderate, and vigorous PAs. The quartile categorical total PA variable (METs/week) was created from the continuous total PA variable used both in screening stage and in the SEM model to examine potential non-linear pattern. In addition to total PA, each PA components (i.e., light, moderate, and vigorous PA), two combinations of PA intensity (i.e., moderate to vigorous PA [MVPA] and exercise PA), and leisure sedentary behavior were also used for screening PA related DNA methylation loci to increase the screening power. The detailed descriptions of all PA variables are listed in Supplementary Table S1.

CVD outcomes
Each MESA participant was contacted every 9-12 months to collect information on new CVD conditions, which were classified using medical records and death certificates [16]. MESA assessed 11 individual cardiovascular events. The detailed descriptions of all CVD outcomes are listed in Supplementary Table S2. In this study, CVD (all types), which was the combination of all 11 individual cardiovascular events, was used as the CVD outcome in the SEM model. The endpoint outcome was the binary response (Yes/No) to CVD (all types) occurring through December 31, 2015. However, in the screening stage of DNA methylation loci identification, all individual CVD outcomes were used separately to increase the sensitivity of screening.

Covariates
Atherosclerotic CVD (ASCVD) 10-year risk score is developed by the American College of Cardiology (ACC) and American Heart Association (AHA) in collaboration with the National Heart Lung and Blood Institute (NHLBI). The score is in a percentage form to estimate the risk of a hard ASCVD event within the next ten years [21,22]. The risk estimator is calculated based on an individual's sex, race, age, total cholesterol (mg/dL), high-density lipoprotein cholesterol (HDL, mg/dL), systolic blood pressure, use of blood pressure medication, smoking status, and diabetes status. The calculation additionally takes into account the potential interactions between these variables.
Some potential confounders associated with CVD in the literature [23][24][25], but absent in the ASCVD score, included ethnicity (Hispanic or not), body mass index (BMI), waist circumference (WC, cm), and coronary artery calcium (CAC) score. BMI and WC were categorized into three categories based on well-established definitions [26]. CAC score is the quantification of the coronary artery calcification by computed tomography (CT) [27]. In this study, we used the Agatston CAC score and categorized it into four levels (0, 1-100, 101-400, > 400) according to one of the most widely used classification systems [28].

Identification of DNA methylation locus
Illumina HumanMethylation450 (450 K) BeadChip was used to provide genome-wide DNA methylation profiles with over 485,000 DNA methylation loci. Quality control was descripted in detail in a previous study [18]. Prior to identification, CpG sites were filtered according to beta values (cut-off = 0.2) to exclude background changes [29]. A two-stage strategy was then used. In the first stage (screening stage), EWAS were performed using limma and preprocessCore packages in R [30,31]. Separate univariate logistic regression models were fitted using the normalized M-value for each CpG site as the dependent variable. Each of the PA related variables (continuous or categorical) or each of the CVD outcome variables (binary) was used as the independent variable. P-values were adjusted for multiple testing using the Benjamini-Hochberg method for controlling the false discovery rate (FDR), which was discussed in our previous publications [32,33]. CpG sites with adjusted p-value < 0.05 were selected as potential candidates. All candidate CpG sites screened from regression models with PA related variables were pooled as Set A, and all candidate CpG sites from models with CVD outcome variables were pooled as Set B. In the second stage (identification stage), CpG sites overlapped by Set A and Set B were finally identified as the final candidate DNA methylation loci for the further SEM model. The corresponding genes were determined using Illumina annotation. Volcano plots and heatmaps were created using ggplot2 and pheatmap packages in R [34,35] to show the differentially methylated level on the identified DNA loci.

SEM model
SEM analyses were performed to examine the proposed conceptual model, namely, the role of DNA methylation on the association between PA and CVD. Model fits were conducted using Mplus version 8.5 (Muthen and Muthen, 2017). All the parameters in the SEM model were estimated by a maximum likelihood method [36]. The estimated coefficient (a.k.a. coefficient load, β) generated from the SEM model was used to measure the strength of association between two variables. Prior to model fitting, the correlation matrix of all identified DNA methylation loci was examined based on M-values to avoid multicollinearity. We did SEM model selection by including the categorical total PA variable (METmins/week), CVD (all), and all identified DNA methylation loci, along with all covariates. Association was considered non-significant if the coefficient load β had a pvalue > 0.1. Variables lacking significant associations with other variables were removed from the SEM model. The final SEM model was fitted using variables with at least one significant direct or indirect path from PA to CVD. Sensitivity analyses were conducted by fitting SEM models and substituting the continuous PA variable for the categorical PA variable or excluding mediators from the final SEM model.

Demographic and clinical characteristics of the studied samples
During a median of 4.0 years' follow-up, 69 of 1065 participants had a cardiovascular disease event. The cumulative incidence is 6.5%. The baseline characteristics of the 1065 participants revealed an average age of 69.2 years (range: 54-92 years) with a slightly higher proportion of females (52.3%) than males ( Table 1). The largest group of the participants were non-Hispanic White (46.9%), followed by Hispanics (31.3%) and non-Hispanic Blacks (21.8%). Compared with participants who did not develop any CVD event (non-incident cases), incident cases were more likely to be older people (72.7 years vs 69.0 years), to have hypertension (71.0% vs 57.0%), and undergoing hypertension treatment (69.6% vs 52.4%). Notably, compared with non-incident cases, incident cases were more likely to be identified in the lowest PA quartile (39.1% vs 24.4%), but less likely to be in the second (17.4% vs 25.5%) or third PA quartile (15.9% vs 25.5%). At baseline, the mean ASCVD (atherosclerotic cardiovascular disease) score of the future incident cases was higher (28.5%) than that of non-incident cases (19.3%). Also, incident cases were more likely to have a CAC (coronary artery calcium) score greater than 100 (55.0%) than non-incident cases (33.6%).
As shown in Table 2, several baseline summary measures were significantly different across the PA quartiles (PA Q1-4). From the PA Q1 to Q4, the mean ages of participants decreased from 72.6 years to 66.0 years, which indicated that older people were more likely to be physically inactive. A higher proportion of males (55.1%) than females (44.9%) were observed in PA Q1. Females were dominant (> 54.0%) in all other three higher quartiles (PA Q2-3), indicating that females were more likely to be physically active than males. Regarding the clinical characteristics, hypertension prevalence at baseline was higher in lower PA quartiles (i.e., PA Q1 and Q2; > 62.0%) than that in upper PA quartiles (i.e., PA Q3 and Q4; < 55%). Notably, the mean ASCVD score was highest in PA Q1 (25.9%) and lowest in the PA Q4 (15.8%). Similarly, the proportion of the participants with a high CAC score (i.e., greater than 400) decreased from 20.6% (PA Q1) to 14.4% (PA Q4). Table 3 displays the cumulative incidence of each CVD outcome across the PA quartiles. Besides overall CVD (6.5%), the most frequently identified outcome during the follow-up was hard CVD (3.9%), followed by stroke (2.3%). Across the PA quartiles, it is noticeable that more CVD (all) incident cases were observed in PA Q1 and PA Q4 with the cumulative incidences of about 10.0%. The middle two quartiles (i.e., PA Q2 and Q3) had less CVD (all) incident cases, with no more than 4.5% cumulative incidence. A similar "U" shape pattern was also observed for the CVD (hard) outcome. In contrast, stroke incidence did not have this pattern. The highest stroke cumulative incidence (5.2%) was observed in PA Q1. All the other three quartiles (PA Q2-4) had lower incidences (≤ 1.5%). Although the interesting "U" shape pattern was observed in this study, the similar pattern was not appreciable in cross-sectional studies using MESA dataset [37].

DNA methylation loci identification
Representative volcano plots ( Fig. 1a and b), Manhattan plots (Fig. S1), and Q-Q plots (Fig. S2) show the differentially methylated loci identified at the screening stage. The total number of candidate loci identified from each variable is presented in Supplementary Tables S3 and S4. As shown in the Venn diagram (Fig. 1c), a total number of 1048 loci were selected from the EWAS on PA variables (Set A), and 124 loci were from the EWAS on all the CVD variables (Set B). The overlapped 23 loci were then identified from the two sets. The detailed information of the overlapped 23 loci and corresponding gene names are listed in Supplementary Table S5. The heatmap shows the differential degree of methylation of the 23 loci grouped by four PA quartiles (PA Q1-4) and CVD (all) outcome (Fig. 1d).

SEM model
The final SEM model was fitted among PA, 3 identified loci and CVD, along with factors such as waist circumference (WC), ASCVD score, CAC score, and ethnicity. 20 loci were excluded during the model selection due to non-significant correlations with CVD outcome. Similarly, sedentary time and BMI were also excluded due to a lack of correlation with other variables. The remaining Table 1 Baseline demographic, behavioral, and risk factor characteristics by CVD outcome during follow-up (mean, SD or N, %)   Table 3 Total number of cases (N) and crude cumulative incidences (%) of CVD outcomes during follow-up by baseline physical activity (PA) intensity three DNA methylation loci were cg17850088, cg04287259, and cg17078656, which corresponded to gene VPS45, PIK3CD, and VPS13D, respectively. Figure 2 shows the path diagram of the final SEM with estimated standardized regression coefficients (β). A positive load suggested a positive association with CVD incidence (i.e., a "stimulation" path), while a negative load suggested an inverse association (i.e., an "inhibition" path). Focusing on the correlations among PA, DNA methylation, and CVD, direct loads on CVD were found from PA quartile 2 (PA Q2 ⇒ CVD, β − 0.782), PA quartile 3 (PA Q3 ⇒ CVD, β − 0.805), and the three genes  . PA quartile 2 (PA Q2) and PA quartile 3 (PA Q3) showed negative loads on CVD (i.e., inhibition on CVD incidence), with the greatest absolute value in the association from PA quartile 3 (|β| = 0.805). Among the three genes, VPS45 and PIK3CD had positive loads ("stimulation") on CVD, indicating they were positively associated with CVD incidence. In contrast, gene VPS13D was inversely associated with CVD incidence ("inhibition") due to a significant negative load. Located in the central part of the path diagram, the three genes, along with ASCVD score, CAC score, and WC, engaged as mediators bridging other variables to CVD through indirect paths. Counted from the total number of path arrows pointed to the mediators, gene VPS45 (seven in total) was found to have the largest number of indirect paths mediating to CVD, followed by mediators such as gene PIK3CD, ASCVD score, and gene VPS13D.
The standardized total effect, direct path effect, and indirect path effects are summarized in Table 4. We calculated the total effects of PA on CVD from the sum of all direct and indirect path effects. . However, different from PA Q2, PA Q3 and Q4 started to have indirect "stimulation" path to CVD incidence. The total effect of PA Q3 on CVD consisted of three "inhibition" paths, and one "stimulation" path through VPS13D (β = 0.084). The total effect of PA Q4 on CVD was the summation of five "inhibition" paths, and one "stimulation" indirect path through VPS13D (β = 0.096). Despite the existence of the "stimulation" paths, the total effects of all PA quartiles were negative, indicating inverse associations ("inhibition") with CVD incidence. However, the overall association of PA Q4 with CVD was much weaker (β = − 0.355) than PA Q2 (β = − 0.959) and Q3 (β = − 0.944). In sensitivity analyses, the coefficients and significant paths between CVD and other variables were robust in all models described in the Methods section.

Discussion
The current study is the first to simultaneously examine the relationships among PA, DNA methylation, and CVD in a large multiethnic cohort with long-term exposure. The major finding of the present work is the identification of three DNA methylation loci (corresponding to genes PIK3CD, VPS45, and VPS13D) playing as mediators linking the association between PA and CVD. Although we used the screened significant loci to fit the SEM model, the p-values distribution of the screened loci in SEM complied uniform pattern. Results of the SEM depicted a network with both direct and indirect pathways showing how PA was associated with the reduced CVD incidence through different genes and/or other risk factors. Consistent with most previous studies [38], we showed that the increase of habitual PA time was associated with decreased incidence of CVD, although not totally linear (strongest with PA Q2 and Q3, but relatively weaker with Q4). The magnitude of the association between CVD and PA Q2 (β = − 0.959) or Q3 (β = − 0.944) was equivalent to > 30% of the association between CVD and ASCVD score (β = 2.454). Such strong quantified associations highlighted the importance of being physically active in the reduction of CVD incidence. However, the highest level of PA quartile (PA Q4) was found to have the weakest protective effect on the cardiovascular system due to the greatest contribution of the "stimulation" path components ( Fig. 2 and Table 4). The reduction of the inhibition effect of PA as the intensity approaches vigorous activity has been previously observed [39,40]. Additionally, growing evidence has shown that vigorous activity can even acutely and transiently increase the risk of CVD in susceptible persons with congenital anomalies or existing cardiac diseases [41]. It is noted that in the present study we fitted models using total PA, which may be different from the studies focusing on vigorous PA or excessive exercise [20,37]. The difference may explain why we observed harmful components (with overall protective effect) of heigh level PA, but not overall harmful effect reported in some previous studies [42]. Although the mechanism by which vigorous PA has harmful components or effect on the cardiovascular system is not well understood [43], we provided one promising biological mechanismthe epigenetic modification of gene VPS13Dbased on the role as a mediator in the SEM model.
The biological functions of the identified genes in the SEM model were strongly related to the development of atherosclerosis. As the dominant cause of CVD, atherosclerosis is in fact a chronic inflammatory condition involving immune cells (e.g., macrophages, etc.) and pro-inflammatory cytokines [44]. The monocyte-derived macrophage migrates to lesions in the vessel and contributes to every phase of atherogenesis [45]. The initial atherosclerotic plaque is characterized by the accumulation of lipid-laden macrophages. The activated macrophages then secrete cytokines (e.g., TNF-α and IL-1), enzymes, and growth factors (e.g., PDGF), and contribute to the further development of atherosclerotic plaque [46]. As the key mediators in the SEM model, genes VPS13D and PIK3CD are both related to monocytemacrophage differentiation [47]. The VPS13D (vacuolar protein sorting 13 homolog D) gene encodes a protein that plays an important role in autophagy [48]. The induction of autophagy is found to be essential for the survival and differentiation of monocytes to macrophages [49]. PIK3CD (phosphatidylinositol-4,5-bisphosphate 3-kinase [PIK3] catalytic subunit δ) encodes a subunit of the enzyme PIK3 which phosphorylates inositol and is involved in the immune response through the PIK3/AKT pathway [50]. The PIK3/AKT/mTOR pathway has been demonstrated to regulate monocyte-derived macrophage activation and polarization [47]. Furthermore, PIK3 was also found to limit the production of TNF-α in activated monocytes [51] and to regulate the activation of platelet [52]. Notably, the polymorphism of the genes VPS13D and PIK3 were also identified to be associated with CVD in other cohorts [52,53]. Gene VPS45 (vacuolar protein sorting 45 homolog) is currently not well studied. The knowledge of VPS45related disease in humans is limited to its association with congenital neutropenia [54]. Considering the important effect of VPS45 identified in our SEM model, future biological studies on VPS45 and its role in the development of CVD are promising.
Several limitations in this study are noted. First, the habitual PA time was collected through surveys at Exam 5 along with the collection of samples for epigenetics. Although the interviewers were well trained, recall bias may have been introduced because of self-reporting. Also, although the formation of PA habitus occurred prior to the collection of epigenetics information, the time order of PA habitus and epigenetic pattern formation was still not clear. Second, the time-varying variables are difficult to control in an observational study without a follow-up visit between baseline and endpoint. Third, a survivorship bias was introduced when we used MESA Exam 5 as a baseline and excluded the CVD prevalent cases. By Exam 5, the MESA study has continued for nearly 10 years since Exam 1. People who did not develop CVD during the last four visits were possibly those who were less likely to have CVD due to genetic or environmental factors. However, since the DNA methylation data were collected during Exam 5, the introduction of a survivorship bias is inevitable. Fourth, although the MESA study included Asian-American participants, this group was not included in the current study because they were not represented in the MESA Epigenomics and Transcriptomics Study. Thus, the interpretation of the results may not be generalizable to Asian-American populations. In addition, because of the potential heterogeneity [55] and misclassification [56] in the Hispanic population of the MESA cohort. The interpretation of the direct effect of Hispanics on CVD had limitation. Fifth, although two of the three identified genes in this study were also identified by other GWAS cohorts, the current study was the only study reported the association between CVD and the methylation of the genes PIK3CD, VPS45, and VPS13D. The possibility of false positive results should be noted due to the lack of replication. Finally, the effects of genes in this study were only indicative due to the lack of causal inference regarding both statistics and biology. The term "mediator" used in this study only defined the linking variable in SEM, without demonstration of causal mediation effects. Although it is reasonable to assume that DNA methylations mediate the effects of an exposure on a disease since DNA methylation is malleable to environmental changes [57,58], causation assumptions in highdimensional omics data are usually not justified without the application of approaches such as global test [59] and Mendelian randomization [60]. It is noteworthy that we interpreted the "stimulation" or "inhibition" effects of genes based on the positive or negative signs of the coefficients, and the assumption that DNA methylation commonly represses gene expression. However, growing evidences showed that DNA methylation can also increase gene expression in some instances [9]. Thus, the actual biological roles of identified genes should be further studied by mechanism research.
Interpreted with a few limitations in mind, the current study is still among the first to simultaneously examine the relationships among PA, DNA methylation, and CVD in a large multiethnic cohort with long-term exposure. The identified genes might have the potential to be therapeutic targets for future CVD prevention or to be involved in the modification of CVD risk score.

Conclusion
Our study was among the first to simultaneously examine the complexity of physical activity (PA) and DNA methylation interaction in the development of (CVD) in a large cohort. The major finding of the present work is the identification of three DNA methylation loci (corresponding to genes PIK3CD, VPS45, and VPS13D) bridging the association between PA and CVD. The function of those three genes warrants further investigation in the pathogenesis of CVD, with potentially therapeutic or predictive implications for clinical care.