Heritability and genome-wide association analyses of fasting plasma glucose in Chinese adult twins

Background Currently, diabetes has become one of the leading causes of death worldwide. Fasting plasma glucose (FPG) levels that are higher than optimal, even if below the diagnostic threshold of diabetes, can also lead to increased morbidity and mortality. Here we intend to study the magnitude of the genetic influence on FPG variation by conducting structural equation modelling analysis and to further identify specific genetic variants potentially related to FPG levels by performing a genome-wide association study (GWAS) in Chinese twins. Results The final sample included 382 twin pairs: 139 dizygotic (DZ) pairs and 243 monozygotic (MZ) pairs. The DZ twin correlation for the FPG level (rDZ = 0.20, 95% CI: 0.04–0.36) was much lower than half that of the MZ twin correlation (rMZ = 0.68, 95% CI: 0.62–0.74). For the variation in FPG level, the AE model was the better fitting model, with additive genetic parameters (A) accounting for 67.66% (95% CI: 60.50–73.62%) and unique environmental or residual parameters (E) accounting for 32.34% (95% CI: 26.38–39.55%), respectively. In the GWAS, although no genetic variants reached the genome-wide significance level (P < 5 × 10− 8), 28 SNPs exceeded the level of a suggestive association (P < 1 × 10− 5). One promising genetic region (2q33.1) around rs10931893 (P = 1.53 × 10− 7) was found. After imputing untyped SNPs, we found that rs60106404 (P = 2.38 × 10− 8) located at SPATS2L reached the genome-wide significance level, and 216 SNPs exceeded the level of a suggestive association. We found 1007 genes nominally associated with the FPG level (P < 0.05), including SPATS2L, KCNK5, ADCY5, PCSK1, PTPRA, and SLC26A11. Moreover, C1orf74 (P = 0.014) and SLC26A11 (P = 0.021) were differentially expressed between patients with impaired fasting glucose and healthy controls. Some important enriched biological pathways, such as β-alanine metabolism, regulation of insulin secretion, glucagon signaling in metabolic regulation, IL-1 receptor pathway, signaling by platelet derived growth factor, cysteine and methionine metabolism pathway, were identified. Conclusions The FPG level is highly heritable in the Chinese population, and genetic variants are significantly involved in regulatory domains, functional genes and biological pathways that mediate FPG levels. This study provides important clues for further elucidating the molecular mechanism of glucose homeostasis and discovering new diagnostic biomarkers and therapeutic targets for diabetes.


Background
Diabetes, as a chronic and metabolic disease, can cause serious damage to the blood vessels, heart, kidneys, nerves and eyes. This condition is one of the leading causes of death worldwide, and higher fasting plasma glucose (FPG) levels, even if below the diagnostic threshold of diabetes, can also lead to increased morbidity and mortality. Diabetes and higher-than-optimal FPG level together leaded to 3.7 million deaths from 1980 to 2014 worldwide [1]. Therefore, it is important to elucidate the underlying pathogenesis of increased FPG levels.
Chinese population are different from other ethnic populations in the aspect of genetic constitutions. Genetically related individuals (e.g. twins) will greatly increase the power of genetic association analysis and effectively identify the genetic variants potentially associated with complex traits [28]. Here, we performed this twin-based genetic epidemiological study to evaluate the magnitude of the genetic influence on FPG variation and further conducted a GWAS to identify specific genetic variants related to the FPG level in a sample of 382 Chinese twin pairs.
After adjustment for the effect of covariates, the DZ twin correlation for the FPG level (r DZ = 0.20, 95% CI: 0.04-0.36) was much lower than half of the MZ twin correlation (r MZ = 0.68, 95% CI: 0.62-0.74), suggesting the genetic effect on the FPG level (Additional file 2).

Gene-based analysis
Although none of the genes reached the genome-wide significance level (P < 2.63 × 10 − 6 ), a total of 1007 genes were nominally associated with the FPG level (P < 0.05). The top 20 genes ranked by P-values are shown in Table 3. Several genes potentially related to FPG levels, including BRAT1, TSPO, SLC2A12, KCNK5, PTPRA, ADCY5, PCSK1, and VPS16, were found.

Pathway enrichment analysis
A total of 719 biological pathways were nominally associated with the FPG level (emp-P < 0.05) were found, and the top 30 pathways are shown in Table 4. The important pathways were mainly involved in β-alanine metabolism, regulation of insulin secretion, glucagon signaling in metabolic regulation, IL-1 receptor pathway, signaling by platelet derived growth factor (PDGF), cysteine and methionine metabolism, etc.

Validation analysis
The gene expression levels of 25 genes in patients with impaired fasting glucose (IFG) and healthy controls (Additional file 5) were tested by the Wilcoxon rank sum method, and C1orf74 (P = 0.014) and SLC26A11 (P = 0.021) were differentially expressed between the two independent groups.

Discussion
In this study, we evaluated the genetic contributions to FPG variation by twin modelling analyses and further identified the genetic variants associated with FPG levels by GWAS. We found that the heritability of FPG was 0.68, which was consistent with the previously reported range (0.22-0.71) in mainland China [16,[30][31][32][33][34]. Even no SNPs reached the genome-wide significance level, 28 SNPs showed suggestive evidence of an association with the FPG level. We found one promising genetic region (2q33.1) where 17 suggestive SNPs were linked to SPATS2L. SPATS2L might indirectly affect FPG levels by regulating the protein expression of β 2 -adrenergic receptors [29] that could increase glucose uptake [35,36]. In addition, SPATS2L was the topmost gene in the gene-based analysis. Thus, SPATS2L may serve as candidate gene to be further validated and a potential biomarker for diabetes.
Post-imputation analysis revealed that one SNP, rs60106404, was significantly associated with the FPG level. This SNP is located at an important gene, SPATS2L, that has been discussed above. Furthermore, more than 200 SNPs were found to reach the level of a suggestive association. We compared our results with previously reported SNPs [25][26][27][37][38][39][40] and found that 8 SNPs could be replicated, indicating our results were credible.
In the gene-based analysis, 1007 genes were nominally associated with FPG levels. Several interesting genes might influence FPG levels through the following mechanisms: (1) BRAT1 deficiency could lead to increased glucose consumption [41]; (2) TSPO expression plays an important role in maintaining healthy adipocyte functions, and the activation of TSPO in adipocytes could improve glucose uptake [42]; (3) SLC2A12, a member of the solute carrier family, catalyzes the uptake of sugars through facilitated diffusion [43]; (4) the proteins encoded by KCNK5 could influence the homeostasis of glucose by regulating insulin secretion [44]; (5) the protein encoded by the PTPRA gene is a member of the protein tyrosine phosphatase (PTP) family. PTPRA might play a role in insulin signaling as a negative regulator and further influence glucose homeostasis [45]. Moreover, the association between PTPRA and FPG levels has previously been reported [26]; (6) ADCY5 plays a role in the normal regulation of insulin secretion [46], which might influence FPG levels. In addition, ADCY5 has been previously reported to be associated with FPG levels [25,27]; (7) the protein encoded by PCSK1 is prohormone convertase 1/3 (PC1/3), which is essential to activate some peptide hormone precursors involved in regulating glucose homeostasis [47], and its association with FPG levels has also been previously reported [27]; (8) although the association of VPS16 with FPG levels has been previously reported [26], its function in glucose metabolism is still unclear. However, other genes, especially the top 20 genes, were currently have unknown Manhattan plot of GWAS based on imputed SNP data. The x-axis shows the numbers of autosomes and the X chromosome, and the y-axis shows the -log 10 of P-values for statistical significance. The dots represent the SNPs functions in glucose metabolism, and they may be potential candidate genes that need to be researched and validated in the future.
In addition, we tested the gene expression levels of several top genes in IFG cases and healthy controls, and found that C1orf74 and SLC26A11 were differentially expressed. SLC26A11 was involved in the transport of glucose and other sugars, bile salts and organic acids, metal ions and amine compounds, as indicated by the Gene-Cards database, while the mechanism of C1orf74 involved in blood glucose metabolism still needs to be explored.
The pathway enrichment analysis identified some important FPG-associated biological pathways: (1) β-alanine could significantly decrease glycolytic metabolism and change glycolytic-related gene expression [48]; (2) glucagon binding to its receptor could activate adenylate cyclase and improve cyclic adenosine monophosphate (cAMP) levels, which could promote insulin secretion [49][50][51]; (3) the IL-1R signaling system can regulate glucose homeostasis by sustaining the health and function of islet β-cells. When pancreatic IL-1R signaling is absent, the whole-body glucose homeostasis is disrupted [52]; (4) in the presence of sufficient PDGF receptor, PDGF can activate protein kinase B and result in the transportation of glucose transporter 4 (GLUT 4) to the surface of the cell, which finally promotes the absorption of glucose and produces an insulin-like effect [53][54][55]; (5) experimental and clinical studies have indicated that cysteine affects the regulation of insulin secretion and glucose levels. In addition, methionine could improve insulin sensitivity [56]; (6) PIPs can be phosphorylated by phosphatidylinositol 3-kinase to produce PIP3, which is involved in the insulin secretion signaling system by activating a PH-containing signaling protein such as protein kinase B [57,58].   The strength of twin samples in our study was observed. The variation of human phenotype may be due to effects of genetic structure, gender, age and certain environmental exposures. Twin samples, as genetically related individuals, will highly increase the power of genetic association analysis and effectively find the genetic variants potentially associated with complex traits [28]. Hence, our results would be more credible.
Nevertheless, this study also has some limitations. This study was with a relatively small sample size because of the difficulties of recruiting and identifying qualified twin pairs. However, our results could still provide useful clues for hypotheses to be further replicated and validated while exploring the molecular mechanism of diabetes. Considering that the genetic influence on FPG variation is expected to be comprised of a lot of SNPs, a meta-analysis with a larger number of samples will be an ideal and desirable method.

Conclusions
Our study has confirmed the significant contribution of genetic effects on FPG variation. The FPG level is highly heritable in the Chinese population, and some genetic variants are involved in regulatory domains, functional genes and biological pathways that mediate FPG levels. The results provide important clues for further elucidating the molecular mechanism of glucose homeostasis. The potential candidate biomarkers of FPG level presented here merit further verification.

Materials and methods
The main materials and methods of this study were similar to our previously published studies [59,60].

Participants
Briefly, we collected samples through the Qingdao Twin Registry (QTR) [61,62]. All twins took a questionnaire (Additional file 6) and underwent a health examination. We tested the FPG level by using the semiautomatic analyzer (Hitachi 7600, Japan). Twins who were pregnant or lactating, took hypoglycaemic agents, or used insulin were eliminated. We also dropped incomplete twin pairs. Finally, a total of 382 twin pairs (139 DZ pairs and 243 MZ pairs) aged 18 years or older were included in the heritability analysis and the subset of 139 DZ twin pairs was further included in the GWAS. All participants signed the written informed consent.

Genotyping, quality control and imputation
Briefly, we firstly genotyped DNA samples, and then conducted quality control analysis [63]. 1,365,181 SNPs were obtained for subsequent SNP-based analysis. IM-PUTE2 software [64] was used to impute untyped SNPs [65], and 7,405,822 SNPs were finally obtained.

Heritability analysis
Genetic analysis were conducted by using Mx programme [66]. The FPG level was transformed following Blom's formula for normality. Pearson's product-moment correlation coefficient was calculated to measure intraclass phenotypic correlations per zygosity. When the correlation of DZ twins (r DZ ) was much lower than half of that of MZ twins (r MZ ), the ADE model was taken into account.
In the classical twin design, the phenotypic variation was decomposed into that due to additive genetic (A), dominant genetic (D), and unique environmental or residual (E) influences. Standard structural equation modeling (SEM) methods were used to estimate the A, D, and E components while adjusting for age, sex, and BMI. We performed a likelihood ratio test to compare the performances of the full ADE model and its nested model, i.e., AE model. And the better fitting model was chosen according to the parsimonious principle and a lower Akaike's information criterion (AIC) value [67]. The power of this classical twin design was above 90%, which was computed based on the sample size combining significance level α and degree of freedom by the Mx programme.

Gene-based analysis
We performed this analysis by using VEGAS2 tool [71,72]. The genes showing more signal or strength of association than expected by chance were found. P < 2.63 × 10 − 6 was set as genome-wide statistical significance.

Pathway enrichment analysis
We used the PASCAL tool to compute the gene and pathway scores [73]. Individual SNPs from GWAS were firstly mapped to genes involved in each pathway. The default parameter values were employed, including all markers inside the gene ±50 flanking kb and the maximum number of 3000 SNPs per gene. Then the gene scores for all genes in one pathway were computed and a joint score was estimated. At last, the pathway scores were computed, and the pathway enrichment of highscoring genes was evaluated through two parameter-free procedures, e.g., chi-square and empirical scores. Bio-Carta, KEGG and Reactome were selected in the MSigDB database to obtain pathways and corresponding gene annotation [74].

Validation analysis
The blood samples of 8 subjects (4 cases and 4 healthy controls) were sequenced to obtain the gene expression data. The cases were defined as those with FPG ≥ 6.1 mmol/L, i.e., IFG status, and the healthy controls were defined as those with FPG ≤ 4.7 mmol/L. Then, the expression levels of 25 genes, i.e., the genes where the top SNPs were located and the top 20 genes in the VEGAS2 analysis, between the two independent groups were compared by the Wilcoxon rank sum test. We set the Pvalue < 0.05 as statistically significant.