The effect of smoking on DNA methylation of peripheral blood mononuclear cells from African American women

Background Regular smoking is associated with a wide variety of syndromes with prominent inflammatory components such as cancer, obesity and type 2 diabetes. Heavy regular smoking is also associated with changes in the DNA methylation of peripheral mononuclear cells. However, in younger smokers, inflammatory epigenetic findings are largely absent which suggests the inflammatory response(s) to smoking may be dose dependent. To help understand whether peripheral mononuclear cells have a role in mediating these responses in older smokers with higher cumulative smoke exposure, we examined genome-wide DNA methylation in a group of well characterized adult African American subjects informative for smoking, as well as serum C-reactive protein (CRP) and interleukin-6 receptor (IL6R) levels. In addition, complementary bioinformatic analyses were conducted to delineate possible pathways affected by long-term smoking. Results Genome-wide DNA methylation analysis with respect to smoking status yielded 910 significant loci after Benjamini-Hochberg correction. In particular, two loci from the AHRR gene (cg05575921 and cg23576855) and one locus from the GPR15 gene (cg19859270) were identified as highly significantly differentially methylated between smokers and non-smokers. The bioinformatic analyses showed that long-term chronic smoking is associated with altered promoter DNA methylation of genes coding for proteins mapping to critical sub-networks moderating inflammation, immune function, and coagulation. Conclusions We conclude that chronic regular smoking is associated with changes in peripheral mononuclear cell methylation signature which perturb inflammatory and immune function pathways and may contribute to increased vulnerability for complex illnesses with inflammatory components.


Background
Smoking is the largest preventable cause of morbidity and mortality in the United States. It largely exerts these effects by increasing liability to complex disorders, such as cancer, chronic obstructive pulmonary disease (COPD), type 2 diabetes (T2DM) and obesity [1]. Smoking driven chronic diseases contribute to early death, disabilities, and strain the health care system [2]. Therefore, understanding the mechanism(s) through which smoking increases vulnerability to these disorders may establish new avenues for prevention or treatment of these complex disorders.
Although some of the details remain unclear, one of the key mechanisms through which smoking may increase liability to these complex disorders is inflammation.
Although serological quantification of well characterized serum markers such as C-reactive protein (CRP) and interleukin 6 receptor (IL6R) may provide a partial understanding of inflammatory changes with respect to smoking, this approach provides limited comprehension of molecular perturbation at a genome-wide scale.
A surge in recent publications has suggested that smoking associated changes in DNA methylation may contribute to these perturbations. This surge began with sporadically published single gene studies that linked smoking to changes in MAOA promoter methylation as well as to an increased risk for coronary heart disease mediated through methylation changes at F2RL3 [3,4]. But in the past two years, these more limited examinations have been joined by several genome-wide investigations that have identified a growing number of loci whose methylation status is associated to smoking. The initial attempt at a more systematic approach was reported by Breitling and colleagues who used the Illumina HumanMethylation 27K BeadChip to probe DNA from peripheral mononuclear cell pellets and identified several candidate loci including F2RL3 (cg03636183), GPR15 (cg19859270) and ORAI2 (cg02564523) [5]. The first truly genome-wide results using the then newly introduced Illumina HumanMethylation 450K BeadChip were first reported by Monick and colleagues who studied methylation in lymphoblast and lung macrophage DNA and found a large number of loci with particular emphasis on differential methylation at the Aryl Hydrocarbon Receptor Repressor (AHRR) [6]. Six months later, using cord blood, Joubert and colleagues confirmed and extended these findings at AHRR and further nominated GF1 and CY1A1 as genes affected by smoking status [7]. Finally, in a study just published, Zeilinger and colleagues identified a larger set of findings that confirmed the prior loci noted above and extended the gene list to include loci such as HIVEP3 (cg15542713), and CACNA1D (cg15417641 and cg21188533) [8].
These genome-wide finding present a potential portal for a better integrated understanding of pathways through which smoking potentially accelerates disease states. Previous studies have established several smoking associated disease pathways. One such is the cyclooxygenase-2 (COX-2) pathway where the expression of COX-2 induced by smoking leads to an increase in prostaglandin E2 (PGE2) that mediates tumor progression and an increase in thromboxane A2 (TxA2), contributing to tumor growth [9]. The COX-2 pathway that has been shown to be involved in numerous smoking related cancers is only one in many similar pathways that could conceivably hold the key to designing new therapeutic interventions and in some instances aid in improving current medical care.
A more recently developed method of identifying pathways perturbed by smoking is by coalescing information from network biology and epigenetics, specifically DNA methylation. DNA methylation is a critical mediator between the genome and the environment. Although genome-wide DNA methylation analyses allow the identification of differentially methylated CpG sites and genes, there is growing evidence that genes do not function independently, but rather in networks. Hence, it is crucial to not only delineate individual loci affected by long-term smoking, but, in addition, use network theory to translate that single locus methylation information into a more holistic understanding of the effects of smoking on the function of the proteome and the cell in general.
This global understanding is necessary to help understand the reasons why some individuals are affected more severely than others. Even though the rate of adults smoking is roughly equivalent in Whites and African Americans, African Americans disproportionately experience smoking associated medical comorbidities [10]. The reason for this disparity is not known but both genetic and sociological factors have been postulated. Epigenomic interrogation of this cohort would allow the disparity examination from a different perspective.
In prior studies, we have analyzed the relationship of smoking to peripheral mononuclear cell DNA methylation in young adult African American smokers [11]. To determine whether smoking and peripheral mononuclear cells may have a role in the systemic inflammation in long-term smokers, we conducted a study to examine the relationship of smoking to genome-wide DNA methylation in an independent and well characterized African American cohort who are older and have greater cumulative smoke exposure. In addition, we use a recently developed network based analysis technique to understand the effect of these smoking associated changes in DNA methylation on protein interaction networks. Our data suggest that regular chronic smoking is associated with highly significant differential methylation at a large number of loci including AHRR (cg05575921 and cg23576855) and GPR15 (cg19859270). Furthermore, the bioinformatic analyses demonstrate enrichment of loci involved in regulation of inflammation, immune function and coagulation.

Clinical characteristics of the cohort
The clinical characteristics of the 111 African American females from the FACHS project who participated in this study are given in Table 1. In brief, the subjects were middle-aged and tended to be severely obese. In this well characterized cohort, pack years was used as a quantitative measure of long-term cumulative smoking. Consistent with the high rate of medical comorbidities noted in this population, the average serum CRP levels (mg/l) were 5.0 in smokers (n = 50) and 2.9 in non-smokers (n = 61). The average serum IL6R levels (pg/ml) were 227 in smokers and 209 in non-smokers. The level of serum CRP (p < 0.05), but not the level of serum IL6R, was significantly higher in smokers.
As a first step of our analyses, we analyzed the relationship of the two serum inflammatory markers featured in this study, CRP and IL6R, to each other and to pack years. Figure 1 illustrates the results of those analyses. A positive correlation was present between CRP and IL6R. Significant positive association was only observed between pack years and CRP (p < 0.0092; Adj R 2 = 0.064).

Genome-wide DNA methylation analyses
We analyzed the genome-wide methylation data of peripheral mononuclear cell DNA using MethLAB with respect to smoking status, controlling for slide, plate and mixed cell population effects. This analysis was geared towards identifying differentially methylated CpG loci as a result of long-term smoking. Since inflammation is closely related to smoking, we speculated that the methylation of genes involved in inflammation would be perturbed. From the analysis, after genome-wide Benjamini-Hochberg correction, 910 probes remained significantly associated at the 0.05 level to smoking status. The 30 most significant CpG sites of this analysis are shown in Table 2 and all significant CpG sites are shown in Additional file 1. As hypothesized, the most significant gene noted was GPR15 (cg19859270) which has been implicated in immune function as a gene that encodes for the G protein-coupled chemokine receptor for human immunodeficiency virus type 1 and 2.
Two other CpG probes, cg05575921 and cg23576855 (ranked 2 nd and 4 th , respectively) were situated in the body of the aryl hydrocarbon receptor repressor (AHRR) gene. The protein encoded by this gene provides feedback inhibition of AHR activation of the xenobiotic pathway by several different mechanisms [12,13]. Our analyses indicate that on average, at these two loci, smokers tended to be hypomethylated. Moreover, our analyses are consistent with prior results of young adult African American smokers. In young adult African Americans, cg05575921 was also highly significantly affiliated to smoking and smokers tended to be hypomethylated. However, although hypomethylation was apparent at cg23576855 in both young and older smokers, this site was only significant with respect to long-term smoking and therefore could be a potential cumulative smoking biomarker.

Validation of Illumina array findings
Validation of genome-wide DNA methylation results is crucial, considering that the average beta values between smokers and non-smokers at numerous loci differed merely by about 0.1. Our previous study and this study  indicate that cg05575921 as opposed to cg19859270 is not only sensitive to long-term smoking, but also to early smoking. Therefore, cg05575921 being the more consistently sensitive smoking biomarker was independently validated using quantitative-PCR on a subset of 62 African American females. The characteristics and methylation levels from both methods for cg05575921 for the individuals in this subset is given in Additional file 2. A strong correlation of 0.94 (p-value < 0.0001) between these two methylation detection methods was observed (Figure 2(a)). Furthermore, alongside the cg05575921 methylation level obtained via the Illumina platform, this independent approach also identified cg05575921 as a biomarker for smoking (ANOVA p-value < 0.0001 for both Illumina and quantitative-PCR as depicted in Figures 2(b) and 2(c), respectively).

GoMiner analyses
We next analyzed the differential distribution of the methylation results using the traditional GoMiner algorithm which treats each probe as an independent assessment. These results are shown in Table 3. Surprisingly, after FDR correction, only 2 of the top 30 pathways were significant at the 0.05 level. Both of these pathways were associated with phospholipid transport. All average methylation values are non-log transformed beta-values. Island status refers to the position of the probe relative to the island. Classes include: 1) Island, 2) N (north) shore, 3) S (south) shore, 4) N (north) shelf, 5) S (south) shelf and 6) blank denoting that the probe does not map to an island. Protein-protein interactions: network analyses using the miPALM algorithm Even though conventional genome-wide DNA methylation analyses yielded consistent identification of differentially methylated CpG loci at the AHRR gene, this analysis did not capture the connectivity of co-regulated genes as a result of long-term smoking. Hence, we employed an alternative approach for understanding the possible effects of smoking associated DNA methylation changes on cellular function by reducing the dimensionality of data and taking advantage of our understanding of the protein interactome. We used the miPALM algorithm which employs a greedy search strategy on weighted networks to re-analyze the genome-wide methylation data (See figure on previous page.) Figure 2 Correlation between Illumina cg05575921 methylation and quantitative-PCR validation cg05575921 methylation and their respective association to smoking status. Panel a illustrates the correlation between cg05575921 methylation from the Illumina platform and cg05575921 methylation from an independent quantitative-PCR validation (r = 0.94, p < 0.0001). Panels b and c depict the relationship between both independently obtained cg05575921 methylation level and smoking status. utilizing only the subset of DNA methylation data mapped to gene promoter regions. Using this approach, smoking associated DNA methylation mapped to 10 significant protein sub-networks ranging from 5 (Figure 3(a)) to 24 (Additional file 3: Figure S1(b)) proteins with miPALM p-values ranging from p < 0.002 (Figure 3(a)) to p < 0.05 (Figure 3(c)). Figure 3 and Tables 4, 5, 6, 7 detail four sub-networks and the differential distribution of its constituent proteins with respect to Gene Ontology (GO) pathways. The first sub-network consists of 5 proteins, second consists of 6 proteins, third consists of 16 proteins and the fourth consists of 17 proteins. Consistent with our hypothesis, the BiNGO analysis demonstrated significant enrichment of proteins in the first and second sub-network to GO pathways involved in immune function (Tables 4 and 5), the third sub-network for the JAK-STAT signaling cascade (Table 6) and the fourth subnetwork for coagulation (Table 7). BiNGO analysis of the other 6 sub-networks resulted in pathways being enriched for molecular transport (Additional file 4: Figure S2(a)), wounding (Additional file 4: Figures S2(b) and (c)), cell signaling (Additional file 3: Figure S1(a)), RNA processing (Additional file 3: Figure S1(b)), and central nervous system (Additional file 3: Figure S1(c)). Please see Additional file 3, Additional file 4, Additional file 5, Additional file 6, Additional file 7, Additional file 8, Additional file 9, Additional file 10 for complete details on these 6 subnetworks.

Discussion
In summary, we confirm prior observations demonstrating that on average, the level of CRP is higher in smokers than non-smokers, consistent with increased inflammation levels in smokers. We also extend findings by ourselves and others on the effect of smoking on peripheral mononuclear cell DNA methylation to show that continued smoking is associated with altered methylation signatures at AHRR and in pathways associated with immune, coagulation and CNS function.
The strength of the genome-wide findings with respect to the effects of smoking on DNA methylation indicates a need for a better longitudinal understanding of the response of the genome to tobacco smoke. In two consecutive prior studies of younger African American smokers with considerably less total cumulative tobacco smoke exposure, we demonstrated that the principal changes of the DNA methylation signature in response to smoking were in the xenobiotic response pathway regulated by the AHRR Figure 3 Protein sub-networks identified by the miPALM algorithm and visualized using Cytoscape. Strength of the interaction between two proteins is depicted by line width. The color of each node represents the negative log p-value from the t-test (see color bar insert) between smokers and non-smokers after correction for multiple comparisons (-log P'). Sub-network (a) and (b) consists of 5 proteins (p < 0.002) and 6 proteins (p < 0.008) respectively, enriched for immune function based on the BiNGO analysis. Sub-network (c) consists of 16 proteins (p < 0.05) enriched for the JAK-STAT signaling pathway based on the BiNGO analysis. Sub-network (d) consists of 17 proteins (p < 0.03) enriched for coagulation based on the BiNGO analysis.
gene [11]. In contrast, in this group of African American female smokers, many of whom have been smoking for 25 or more years, the magnitude of the remodeling of the DNA methylation signature is significantly greater and extends to a number of other genes and pathways including those relevant to inflammation, immune, coagulation and CNS function. At the single gene level, the findings at GFI1 (probe rank 24), F2RL3 (probe rank 49) and GPR15 (probe rank 1) serve to directly confirm prior findings by other investigators [5,7,14]. At a more integrated level, the implication of alterations of pathways involved in immune/ inflammatory activities and coagulation (see Figure 3) is consistent with the known effects of smoking on risk for inflammatory related diseases and stroke [1,15].
Since there is a strong association between cigarette smoking and innate immunity, we hypothesized that networks enriching for immune system/inflammation would be generated. Two such networks are depicted in Figures 3 (a) and (b) that highlight the potential role of peripheral mononuclear cells in moderating vulnerability to some but not all complex illnesses with inflammatory components. The BiNGO analysis of network 3(a) indicated that the proteins are involved in the response to acute inflammation and complement activation. Specifically, MASP2 with a differential methylation adjusted p-value of 3.78 × 10 -6 is known to be a protease that activates the complement cascade, enabling the clearing of pathogens. It has been shown that MASP2 is also associated with the recurrence and survival rate of cancer (colorectal cancer in specific) [16]. Therefore, studying the effects of smoking in a weighted network approach allows us to speculate that patients with colorectal cancer may increase their survival chances and prevent the recurrence of cancer by avoiding smoking. In clinical settings, knowing smoking history could assist physicians in making better judgments of the efficacy of treatments.
Based on network 3(a), it can also be seen that MASP2 interacts strongly with MASP1 and FCN1. MASP1 is known to support MASP2 in the complement activation cascade [17] and this network validates that a strong interaction is present between these two serine proteases. MASP2, MASP1, and FCN1 play key roles in innate immunity. Aberrant methylation of these genes as a result of smoking could possibly impair an individual's Table 4 Top 10 pathways from BiNGO pathway analysis of protein sub-network depicted in Figure 3(  innate immunity, leading to the development, progression and recurrence of cancer. The network shown in Figure 3(b) is enriched for chemotaxis. Here, it can be noted that CCL13 with a differential methylation adjusted p-value of 0.00012 forms relatively strong interactions with MMP3, CCL2 and CCBP2. Again, similar to network 3(a), these interactions corroborate the role of smoking in the progression of inflammation. Our work also builds on the prior work by Breitling and colleagues who showed the role of smoking associated differential methylation at coagulation factor II (thrombin) receptor like-3 (F2RL3) in moderating coronary artery disease, which also has a strong inflammatory component and whose presence is positively associated with CRP [3,18]. Furthermore, the network in Figure 3(d) that enriches for coagulation showed F2 as the central protein, forming interactions with other proteins including F11 and CRB2. The miPALM analyses which generated pathways with important implications demonstrate that the pathways involving IL6R and F2RL3 may not be the only important inflammatory genes/pathways operant in peripheral mononuclear cells and suggest the possibility that smoking associated DNA methylation may act through a variety of mechanisms to increase systemic inflammation and vulnerability to some illnesses with an inflammatory component.
The time frame in which these cells begin to experience smoking associated inflammatory changes in DNA methylation is not clear. In our prior examinations of young adult smokers, these changes were not evident on genome-wide basis. But these subjects were relatively young (19)(20)(21)(22) years old) at the time of ascertainment and the cumulative smoking exposure of even the heaviest of the smokers was not as extensive as that the lightest of the current smokers [11]. Understanding when that shift to inflammatory processes occurs may be important to the allocation of smoking cessation resources. Although it is well established that in general smoking cessation decreases risk for adverse outcomes such as myocardial infarctions and cancer, this may not be true for all patients who quit smoking. According to the "pulmonary overflow" hypothesis, many of the inflammatory changes associated with smoking are not due to the direct effects of smoke but are rather secondary to the "spillover" of inflammatory cytokines from activated macrophages in the lung parenchyma. Because this inflammatory process tends to be self-perpetuating once it has been initiated, if "overflow" hypothesis is correct, stopping smoking in Table 6 All pathways from BiNGO pathway analysis of protein sub-network depicted in Figure 3(  individuals with severe diseases may have little effect on subsequent inflammatory changes in the peripheral mononuclear cell compartment. If so, focusing smoking cessation resources on those smokers who do not have substantial diseases as evidenced by either peripheral or direct measures, may lead to more effective use of prevention dollars.

Conclusions
In conclusion, we reiterate that smoking contributes to inflammation which is consistent with increase in the average serum inflammatory protein, CRP. We also demonstrate that methylation residues at the AHRR locus are sensitive to cumulative smoking and could be potential smoking biomarkers. However, the main finding in this well characterized cohort of long-term smokers is that differential methylation between smokers and non-smokers are key in delineating epigenetically contextual pathways and to identify highly connected networks that could be used for smoking related health diagnostics like cancer and coronary heart disease. We show the extensive effects of smoking on peripheral mononuclear cell DNA methylation on pathways involved in coagulation, CNS and immune function. Understanding aberrant methylation caused by smoking in a network setting provides an integrated stage for understanding complex diseases like cancer while designing potential therapies. Finally, smoking is an important confounder and should be included in future diagnostic models to accurately understand diseases.

Informed consent
The participants provided written consent to participate in this study. The written consent form, consent procedure, and protocols pertaining to this study were approved by the Institutional Review Board at the University of Iowa, the University of Georgia and Iowa State University.

Human subjects
The subjects described in the current study are adult African American females from the states of Iowa and Georgia who participated in data and biomaterial collection during the most recent waves of the Family and Community Health Study (FACHS). The FACHS study is a longitudinal examination of factors affecting the health and health related outcomes of rural African American families [19]. These adults were identified because they had an 11 or 12-year-old child who resided in eligible recruitment area in either Iowa or Georgia. These geographical zones of eligibility varied with respect to socioeconomic status and were not selected for any other characteristic except relative enrichment in African American residents. Lists of these potential caretaker-offspring dyads living in these areas were then generated by community officials in Georgia and school officials in Iowa.
To recruit these individuals in the study, potential participant families were selected randomly from these lists and were then sent an introductory letter of invitation followed by a phone call. Families expressing interest in the study were then provided with a complete description of the study and if still willing, consented. The protocols and procedures pertaining to this study were approved by the Institutional Review Board at the University of Iowa, the University of Georgia and Iowa State University.
Each participant in the FACHS study is interviewed every two years in their homes with a battery of structured and semi-structured instruments that characterize the medical and psychological status of each subject and environment in which the subject is embedded. The data and biomaterials used in this study were collected during the most recent wave (Wave 10) of the project. As part of that examination, subjects were phlebotomized to provide biomaterials for the current study with a subset additionally being interviewed with the tobacco module from the Semi-Structured Assessment for the Genetics of Alcoholism-II (SSAGA-II) [20]. Individuals who were actively smoking were characterized as smokers and those denying any use of tobacco products were characterized as non-smokers.

Genome-wide methylation profiling
For the current study, peripheral mononuclear cell DNA samples from FACHS subjects were assessed for genomewide methylation status using our standard protocols [6,11]. In brief, DNA was prepared from ficoll purified peripheral mononuclear cell DNA pellets using a Qiagen (Valencia, CA) DNA Mini Kit. Measurement of genomewide peripheral mononuclear cell DNA methylation status of FACHS participants was conducted using the Illumina (San Diego, CA) HumanMethylation450 Beadchip by the University of Minnesota Genome Center (Minneapolis, MN) using the protocol specified by the manufacturer [6]. This array contains 485,577 probes recognizing CpG positions of at least 20,216 known transcripts, potential transcripts or CpG islands. Subjects were randomly assigned to 12 sample "slides" with groups of 8 slides being bisulfite converted in a single batch. Four replicates of the same DNA were also included to monitor slide-to-slide and batch bisulfite conversion variability. The resulting data were inspected for complete bisulfite conversion and average beta values for each targeted CpG residue was determined using the Illumina Genome Studio Methylation Module, Version 3.2. The beta value at a CpG locus is the ratio between the intensity of the methylated probe to the sum of intensities of the methylated and unmethylated probes, also known as the total probe intensities. The resulting data was then processed using a custom PERL script to remove beta values with a detection p-value greater than 0.05.

Accounting for mixed cell population
In order to account for mixed cell population in peripheral mononuclear cell mixture, a regression calibration approach similar to that developed by Houseman and colleagues was performed [21]. Since the method developed by Houseman and colleagues was based on the Illumina HumanMethylation 27K BeadChip, we instead utilized the Illumina HumanMethylation 450K BeadChip data of purified cells (CD4+ T cells, CD8+ T cells, CD14+ monocytes, CD19+ B cells and CD56+ Natural Killer cells) contributed by Reinius and colleagues (GEO database under accession number GSE35069) and performed a regression in MethLAB, Version 1.5 [22,23]. The Benjamini-Hochberg method at a 0.05 significance level was used for genome-wide correction. This resulted in 815 sites being significantly differentially methylated with respect to cell types. The top 100 sites were then imported into JMP Version 10 (SAS Institute, Cary, USA) and principal components analysis (PCA) was performed. Based on the Scree plot of this PCA, the first 5 factors were chosen to be included in downstream genome-wide analyses to control for signal as a result of mixed cell population.

Analysis of genome-wide DNA methylation data
Genome-wide methylation data analyses were conducted using the R package, MethLAB, Version 1.5 as previously described [11]. Briefly, the genome-wide differential methylation between smokers and non-smokers were evaluated while controlling for slide, plate and mixed cell population effects. In MethLAB, the phenotype dependent differential methylation p-value at each CpG site is determined using a linear model [23]. Genome-wide false discovery rate was corrected using the method of Benjamini-Hochberg using the MethLAB default significance level of 0.05 [24].

Validation of differential methylation at cg05575921
An independent validation of the differential methylation at cg05575921 was performed on a subset of 62 individuals using quantitative-PCR primer probe set obtained from Behavioral Diagnostics Inc (Iowa City, IA). Quantitative-PCR was performed with an ABI7900 (Life Technologies, New York) using provided reagents, internal controls and standard manufacturer protocols. Interpolation of internal control methylation was used to deduce percent methylation of each sample. Pearson correlation was performed to determine the correlation between the methylation level obtained via the Illumina platform and that obtained from the quantitative-PCR. Also, the relationship between cg05 575921 methylation from the quantitative-PCR and smoking status was assessed using an analysis of variance (ANOVA) test.

Protein network analyses
Protein sub-networks pertinent to smoking were identified using our previously described methods [25]. In brief, the genome-wide methylation data were first divided into bins based on the position of the CpG site with respect to the gene. We then focused the analyses on CpG probes that mapped to the promoter region of the gene. This is due to the known influence of promoter methylation on transcription [26]. This resulted in a total of 22,609 genes or potential gene transcripts with at least one CpG probe mapping to their putative promoter region. Using the data from all probes mapping to a given promoter region, two promoter methylation vectors were constructed for each gene; one for smokers and the other for non-smokers. Using the statistical programming language, R [27], for each gene, an unpaired t-test was performed between the two vectors resulting in p-values of differential methylation between smokers and non-smokers, that were adjusted for multiple comparisons using the Benjamini-Hochberg method [24].
Human protein-protein interaction data were downloaded from the iRefIndex database [28]. This database provides a comprehensive and non-redundant version of protein-protein interactions (PPI) available in several other primary interaction databases including BIND, BioGRID, CORUM, DIP, HPRD, IntAct, MINT, MPact, MPPI, and OPHID. The 22,609 genes were then compared to the non-redundant PPI that were extracted. There were 29,912 interactions that mapped to the list of genes. In order to construct an edge-weighted network, the subset of interactions alongside the adjusted p-values was used to calculate the PPI edge weights.
The following equation was used to calculate the edge weight between two genes: Where w ij is the edge weight between genes i and j, p' i is the adjusted p-value of the t-test between smokers and non-smokers for gene i, p' j is the corresponding adjusted p-value for gene j, and p' min is the minimum adjusted p-value among the 22,609 adjusted p-values calculated for the respective 22,609 genes.
Computed edge weights were assigned to their respective PPI and the resulting output was used to identify significant sub-networks using the miPALM algorithm implemented in JAVA. Briefly, this algorithm, which uses a greedy search method, identifies highly connected subnetworks using a novel graph theoretic measure, parametric local modularity [25,29].
Statistically significant sub-networks at the 0.05 significance level were imported into Cytoscape, a bioinformatics platform that allows visualization and analysis of biological networks [30]. The Cytoscape plugin BiNGO was used to determine Gene Ontology (GO) categories of statistically overrepresented genes [31]. In addition, the adjusted p-values of individual genes were imported as a node attribute and visualized as a color gradient. Edge weights were used as an edge attribute to represent the strength of interactions illustrated by the width of the lines connecting nodes.
Enzyme linked immunosorbent assay (ELISA) Sera were obtained using serum separator tubes. After centrifugation, sera were frozen at -80°C until use. ELISA assessments of CRP and IL6R serum levels were conducted using a DuoKit from R&D Systems (Minneapolis, USA) according to manufacturer's recommendations.

Regression analyses
The analyses of clinical and serological data were analyzed using the suite of general linear model algorithms (e.g. ANOVA, Bivariate, ordinal logistic regression) contained in JMP, Version 10 (SAS Institute, Cary, USA). Where indicated, the portion of the variance explained by the model tested is given by the Adjusted R square (Adj R 2 ).

Availability of supporting data
The Illumina Human Methylation 450K BeadChip data used in this study has been deposited to the Gene Expression Omnibus (GEO) database. The GEO accession number is GSE53045.

Additional files
Additional file 1: All 910 significant CpG sites with respect to smoking status after genome-wide correction.
Additional file 2: Characteristics and average methylation levels at cg05575921 of a subset of 62 individuals used in an independent validation.
Additional file 3: Figure S2. Protein sub-networks identified by the miPALM algorithm and visualized in Cytoscape.
Additional file 4: Figure S1. Protein sub-networks identified by the miPALM algorithm and visualized in Cytoscape.
Additional file 5: Gene ontology pathways of Additional file 4: Figure S1(a) identified by the Cytoscape plugin BiNGO.
Additional file 6: Gene ontology pathways of Additional file 4: Figure S1(b) identified by the Cytoscape plugin BiNGO. Additional file 7: Gene ontology pathways of Additional file 4: Figure S1(c) identified by the Cytoscape plugin BiNGO.
Additional file 8: Gene ontology pathways of Additional file 3: Figure S2(a) identified by the Cytoscape plugin BiNGO.
Additional file 9: Gene ontology pathways of Additional file 3: Figure S2(b) identified by the Cytoscape plugin BiNGO.
Additional file 10: Gene ontology pathways of Additional file 3: Figure S2(c) identified by the Cytoscape plugin BiNGO.