Identification of plasma protein markers of allergic disease risk: a Mendelian randomization approach to proteomic analysis

Background While numerous allergy-related biomarkers and targeted treatment strategies have been developed and employed, significant limitations and challenges remain in the early diagnosis and targeted treatment of allergic diseases. Our study aims to identify circulating proteins causally associated with allergic disease-related traits through a Mendelian randomization (MR)-based analytical framework.
Methods Large-scale cis-MR was employed to estimate the effects of thousands of plasma proteins on five main allergic diseases. Additional analyses, including MR Steiger analysis and Bayesian colocalization, were performed to test the robustness of the associations. These findings were further validated using meta-analytical methods in the replication analysis. Both proteome- and transcriptome-wide association study approaches were applied, and a protein-protein interaction analysis was then conducted to examine the interplay between the identified proteins and the targets of existing medications.
Results Eleven plasma proteins were identified with links to atopic asthma (AA), atopic dermatitis (AD), and allergic rhinitis (AR). Subsequently, these proteins were classified into four distinct target groups, with a focus on Tier 1 and 2 targets due to their higher potential to become drug targets. MR analysis and extra validation revealed STAT6 and TNFRSF6B to be Tier 1 and IL1RL2 and IL6R to be Tier 2 proteins with the potential for AA treatment. Two Tier 1 proteins, CRAT and TNFRSF6B, and five Tier 2 proteins, ERBB3, IL6R, MMP12, ICAM1, and IL1RL2, were linked to AD, and three Tier 2 proteins, MANF, STAT6, and TNFSF8, to AR.
Conclusion Eleven Tier 1 and 2 protein targets that are promising drug target candidates were identified for AA, AD, and AR. These proteins may influence the development of allergic diseases and represent potential new diagnostic and therapeutic targets.
Supplementary Information The online version contains supplementary material available at 10.1186/s12864-024-10412-0.


Introduction
Allergic diseases arise from inappropriate initiation of type 2 immune responses against innocuous environmental antigens and include disorders such as atopic dermatitis (AD), allergic asthma (AA), allergic rhinitis (AR), allergic conjunctivitis (AC) and allergic urticaria (AU) [1,2]. The development of AD in early life, followed by other allergies, such as asthma, has been described as the atopic march [3,4]. Allergic disease incidence has increased over the past 3 decades to affect an estimated 433 million people worldwide and exert a considerable economic and social burden [5,6]. A combination of genetics, exposure to environmental allergens and irritants, microbial interactions, and abnormal immune responses contributes to inflammation and the atopic march [7,8]. However, allergic disease pathogenesis is complex and poorly understood, and few drug options address its recurrent and potentially life-long nature. These observations illustrate the need for further mechanistic studies and identification of drug targets.
The human plasma proteome participates in inter-tissue communications via metabolic, signaling, and physiological pathways and includes potential drug targets [9-13]. The link between levels of specific plasma proteins and allergic disease risk has been reported previously, but residual confounding and reverse causation mean that observational studies do not demonstrate causality [14,15]. Mendelian randomization (MR) uses genetic variation as an instrumental variable (IV) to explore the causal effect of an exposure on outcomes. The impact of genes or proteins is difficult to interpret from isolated GWAS outcomes because GWAS associations are reported at the level of single nucleotide polymorphisms (SNPs) rather than genes or proteins. However, proteome-wide association studies (PWAS) give a clearer picture of the influence of proteins on disease [16]. In addition, functional summary-based imputation software allows transcriptome-wide association studies (TWAS) to assess the impact of whole blood protein-coding gene expression on allergic disease risk [17].
The current study used a PWAS approach, combining GWAS data with protein quantitative trait loci (pQTL), and TWAS with expression quantitative trait loci (eQTL), to investigate potential proteinaceous diagnostic and therapeutic targets in allergic disease. Bayesian colocalization analysis and phenotype scanning were conducted to verify causality between candidate proteins and disease pathogenesis. A protein-protein interaction (PPI) network was constructed and pathway enrichment was performed to illuminate potential mechanisms. The detailed research flowchart is presented in Fig. 1.

Data sources and selection of IVs
Whole blood pQTL data concerning 4,657 proteins derived from 7,213 participants of European descent in the ARIC study were analyzed [18]. SomaScan technology on the v.4.1 platform (SomaLogic) was used for proteomic profiling. pQTLs were filtered according to the following criteria: (1) variants had cis-acting effects, defined as being situated within a 1 Mb region up- or downstream of the gene encoding the plasma protein; (2) pQTLs met the genome-wide significance threshold of p < 5 × 10⁻⁸; (3) no significant linkage disequilibrium (LD) was present among pQTLs (r² < 0.001); (4) pQTLs were located outside the major histocompatibility complex (MHC) region (chr6, 26-34 Mb). The analysis was repeated using pQTL data from 35,559 Icelandic individuals, including 4,907 plasma proteins (Icelandic Cancer Project and deCODE genetics, Reykjavík, Iceland) on the SomaScan v.4.0 platform [19].
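The instrument-selection criteria above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `PQTL` record, its field names, and the helper functions are hypothetical, and criterion (3), LD clumping at r² < 0.001, is omitted because it requires a genotype reference panel.

```python
from dataclasses import dataclass

CIS_WINDOW_BP = 1_000_000            # criterion 1: within 1 Mb of the encoding gene
P_THRESHOLD = 5e-8                   # criterion 2: genome-wide significance
MHC_REGION = ("6", 26_000_000, 34_000_000)  # criterion 4: chr6, 26-34 Mb

@dataclass
class PQTL:
    """Hypothetical record for one variant-protein association."""
    snp: str
    chrom: str
    pos: int
    pval: float
    gene_chrom: str
    gene_start: int
    gene_end: int

def is_cis(v: PQTL) -> bool:
    """True if the variant lies within the cis window around the gene."""
    return (v.chrom == v.gene_chrom
            and v.gene_start - CIS_WINDOW_BP <= v.pos <= v.gene_end + CIS_WINDOW_BP)

def in_mhc(v: PQTL) -> bool:
    """True if the variant falls inside the excluded MHC region."""
    chrom, start, end = MHC_REGION
    return v.chrom == chrom and start <= v.pos <= end

def passes_filters(v: PQTL) -> bool:
    """Apply criteria 1, 2 and 4 (LD clumping is not implemented here)."""
    return is_cis(v) and v.pval < P_THRESHOLD and not in_mhc(v)

def select_instruments(variants):
    """Keep only the variants that pass all implemented filters."""
    return [v for v in variants if passes_filters(v)]
```

In practice the surviving variants would then be LD-clumped against a European reference panel before being used as instruments.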
Data for individual allergic diseases, including 38,369 cases and 411,131 controls for AA, 613 cases and 464,657 controls for AC, and 1,057 cases and 482,892 controls for AU, were obtained from a UKB cohort GWAS study [20]. A UKB cohort from a trans-biobank meta-analysis study [21] yielded data for 22,474 cases of AD and 774,187 controls. Data on 27,415 cases of AR and 457,183 controls came from a separate GWAS study [22]. Summary statistics for the allergic diseases under scrutiny were obtained from the FinnGen Biobank Analysis Consortium database (https://finngen.gitbook.io/documentation/). Allergic diseases were diagnosed according to ICD-10 (International Classification of Diseases) criteria. Data sources are presented in detail in S. Table 1.

Mendelian randomization analysis
MR analysis was performed using R software (version 4.1.3) and the TwoSampleMR package (version 0.5.6) [23]. Effect estimates of plasma proteins with a single SNP were generated using the Wald ratio [24] and of those with two or more SNPs by the Inverse Variance Weighted (IVW) method [25]. A false discovery rate (FDR, p < 0.05) correction was applied to account for multiple comparisons in validating MR results [26]. Results are expressed as odds ratios per standard deviation increase in genetically determined plasma protein levels. Considering the genetic representativeness of the results, the ARIC/UKB combination, which includes a mixed European genetic background, was chosen as the discovery analysis.
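For readers less familiar with these estimators, the sketch below shows the arithmetic behind the Wald ratio, a fixed-effect IVW estimate, and a Benjamini-Hochberg FDR adjustment. It is an illustrative Python re-implementation under stated assumptions, not the TwoSampleMR code used in the study: function names are hypothetical, the Wald ratio standard error uses the first-order delta approximation, and the IVW shown is the simple fixed-effect form.

```python
import math

def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-SNP causal estimate: outcome effect divided by exposure effect."""
    beta = beta_outcome / beta_exposure
    se = abs(se_outcome / beta_exposure)  # first-order delta approximation
    return beta, se

def ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance weighted estimate across several SNPs."""
    num = sum(x * y / s**2 for x, y, s in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(x**2 / s**2 for x, s in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)

def odds_ratio(beta):
    """Convert a log-odds estimate to an odds ratio per SD of protein level."""
    return math.exp(beta)

def benjamini_hochberg(pvals):
    """Return FDR-adjusted p values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A protein would be carried forward when its adjusted p value falls below 0.05, matching the threshold stated above.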

Sensitivity analysis
Reverse causality between the proteins identified by discovery analysis and allergic diseases was evaluated by bidirectional MR analysis and the MR Steiger test [26]. Four additional methods were employed in bidirectional MR analysis to assess IV validity, focusing on the three IV assumptions: strong association with the exposure, influence on the outcome only through the exposure, and no association with confounders of the exposure-outcome relationship. Violations of these assumptions could compromise the accuracy of the results. MR-Egger regression slope and intercept were used to estimate pleiotropy across IVs and to give a pleiotropy-adjusted estimate that does not require all IVs to be valid [27,28]. MR-PRESSO was used to identify outliers causing significant pleiotropy and heterogeneity and to give an outlier-corrected causal effect estimate [29]. The weighted-median method gives consistent estimates provided that at least 50% of the weight comes from valid IVs [30]. The MR-Robust Adjusted Profile Score (MR-RAPS) increased statistical power and gave robust estimates in the presence of weak instrument bias and horizontal pleiotropy [31]. By default, IVW results were preferred [32], but MR-Egger results were used when significant pleiotropy was detected by the MR-Egger intercept test. If the MR-PRESSO global test identified significant outliers, results corrected by MR-PRESSO were prioritized. A Bonferroni correction was applied to account for multiple comparisons and a corrected value of p < 0.01 was considered to indicate significance in reverse MR. Plasma proteins that appeared significant during both MR Steiger and reverse MR analysis, suggestive of reverse causality, were excluded from further analysis.
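As an illustration of one of these sensitivity methods, the following Python sketch computes MR-Egger point estimates: a weighted regression of SNP-outcome effects on SNP-exposure effects with an unconstrained intercept, after orienting each SNP so its exposure effect is positive. It is a simplified sketch under stated assumptions (hypothetical function name, inverse-variance weights from the outcome standard errors, no standard errors or p values returned), not the implementation used in the study.

```python
def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Return (intercept, slope) from a weighted MR-Egger regression.

    The intercept estimates average directional pleiotropy; the slope is
    the pleiotropy-adjusted causal estimate. Requires at least three SNPs
    with differing exposure effects.
    """
    # Orient every SNP so that its effect on the exposure is positive.
    signs = [1.0 if x >= 0 else -1.0 for x in beta_exposure]
    x = [abs(v) for v in beta_exposure]
    y = [s * v for s, v in zip(signs, beta_outcome)]
    w = [1.0 / s**2 for s in se_outcome]

    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope
```

An intercept that differs significantly from zero would be read as evidence of directional pleiotropy, in which case the slope is preferred over the IVW estimate as described above.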
Potential links between all identified proteins and confounders were investigated via phenotype scanning using the Phenoscanner database [33] with a genome-wide significance threshold of p < 5 × 10⁻⁸. pQTLs linked to known allergic disease factors, indicative of pleiotropic effects, were interpreted with caution. Then, proteins identified by discovery analysis were selected for replication analysis in further multi-center MR studies. Validation alternated between the two sets of pQTL and outcome data from FinnGen and UKB (including three validation sets: ARIC/FinnGen, deCODE/UKB, and deCODE/FinnGen), using genome-wide significant SNPs as genetic instruments. The stability of causal associations was evaluated through meta-analysis, with a value of I² > 50% indicating significant heterogeneity and necessitating a random-effects model [34].
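The meta-analytic decision rule can be sketched as follows. This Python example pools log odds ratios from the discovery and replication sets, computes Cochran's Q and I², and switches to a random-effects model when I² exceeds 50%. The function name is hypothetical, and the DerSimonian-Laird estimator for the between-study variance is an assumption, since the text does not name the random-effects estimator that was used.

```python
import math

def pool_estimates(betas, ses, i2_cutoff=0.5):
    """Pool log-OR estimates; use random effects when I^2 exceeds the cutoff."""
    w = [1.0 / s**2 for s in ses]
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sum(w)

    # Cochran's Q and the I^2 heterogeneity statistic.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    if i2 <= i2_cutoff:
        return {"model": "fixed", "beta": fixed,
                "se": math.sqrt(1.0 / sum(w)), "i2": i2}

    # DerSimonian-Laird between-study variance (assumed estimator).
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_re = [1.0 / (s**2 + tau2) for s in ses]
    beta_re = sum(wi * b for wi, b in zip(w_re, betas)) / sum(w_re)
    return {"model": "random", "beta": beta_re,
            "se": math.sqrt(1.0 / sum(w_re)), "i2": i2}
```

The pooled `beta` is on the log-odds scale, so `math.exp(beta)` gives the odds ratio reported in the forest plots.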

Bayesian colocalization analysis
Colocalization analysis [35] was employed to determine whether a specific genetic variant influenced both an exposure factor and an outcome by modulating gene expression at common loci. Bayesian analysis to calculate the posterior probability of a shared causal variant influencing two traits was performed using the R package 'coloc' (version 5.0, available at https://github.com/chr1swallace/coloc) with the default prior probabilities: a prior probability of 1e-4 for any single SNP being associated with each trait (P1 and P2) and a prior probability of 1e-5 for a SNP being associated with both traits (P12) [36]. Assuming at most a single causal variant per trait, five hypotheses were considered: H0: no causal variants for either trait; H1: a causal variant for the first trait only; H2: a causal variant for the second trait only; H3: separate causal variants for both traits; and H4: a shared causal variant for both traits [37]. Significant colocalization was inferred when the posterior probability of H4 was > 0.8, implying strong evidence of a shared causal influence [38].
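To make the five-hypothesis calculation concrete, the sketch below combines per-SNP log approximate Bayes factors for two traits into posterior probabilities for H0 to H4 using the priors given above. It is an illustrative Python re-implementation under stated assumptions, not the 'coloc' package itself: function names are hypothetical, per-SNP evidence uses Wakefield's approximate Bayes factor, and the prior effect-size standard deviation passed to `log_abf` is left to the caller.

```python
import math

def log_sum(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_diff(a, b):
    """log(exp(a) - exp(b)) for a >= b; returns -inf when nothing remains."""
    if a <= b:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))

def log_abf(beta, se, prior_sd):
    """Wakefield log approximate Bayes factor for one SNP and one trait."""
    r = prior_sd**2 / (prior_sd**2 + se**2)
    z = beta / se
    return 0.5 * (math.log(1.0 - r) + r * z * z)

def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [PP.H0, ..., PP.H4] for one locus."""
    both = [a + b for a, b in zip(labf1, labf2)]
    s1, s2, s12 = log_sum(labf1), log_sum(labf2), log_sum(both)
    log_h = [
        0.0,                                                   # H0
        math.log(p1) + s1,                                     # H1
        math.log(p2) + s2,                                     # H2
        math.log(p1) + math.log(p2) + log_diff(s1 + s2, s12),  # H3
        math.log(p12) + s12,                                   # H4
    ]
    total = log_sum(log_h)
    return [math.exp(v - total) for v in log_h]
```

A locus would be called colocalized when the last element of the returned list (PP.H4) exceeds 0.8.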

Extra validation analysis
Considering that gene expression and protein synthesis are influenced by numerous factors beyond simple genetic processes, we conducted extra validation analyses to verify the results of our discovery analysis at both the tissue and protein levels. This validation was performed using both TWAS and PWAS methods, which predict gene influence on phenotype, as implemented in the Functional Summary-Based Imputation (FUSION) software (available at http://gusevlab.org/projects/fusion), based on the utility of GWAS summary statistics to indicate associations between GWAS phenotypes and functional phenotypes. TWAS indicated the association of protein-coding genes with allergic disease risk at the tissue level and was used as an external validation analysis, utilizing the pre-computed eQTL reference panel for target proteins derived from the GTEx8 (Genotype-Tissue Expression version 8) database. Likewise, the PWAS served as an internal validation analysis that integrated the GWAS summary statistics and the pre-computed plasma proteome reference weights, also from the ARIC study [17], to calculate the genetic impact on allergic disease. Thus, the impact of significant SNPs from the GWAS on protein abundance could be evaluated, and candidate genes linked to allergic disease that regulate plasma protein levels could be identified. A similar FDR-corrected p value < 0.05 was the threshold of significance in the extra validation analysis.
A PPI network was constructed using the Search Tool for the Retrieval of Interacting Genes (STRING) database (version 11.5) [39] with a minimum required interaction score (IAS) threshold of 0.4 to indicate interactions among identified proteins and pre-existing anti-allergy drug targets [40]. Information on anti-allergy drug targets was sourced from the DrugBank database.

Evidence-based grading of potential drug targets
Proteins were graded according to the criteria of Feihong Ren [41].

Replicative MR and meta-analysis
A total of 1,394 novel proteins with 4,144 SNPs were identified from deCODE data by a similar IV selection process (S. Table 3) and used to validate the findings of discovery analysis from the FinnGen datasets (Table 1). Replication analysis using deCODE data showed that GALK1, IL1RL2, and TNFRSF6B failed to replicate for AA, as did VTA1 and TNFRSF6B for AD and FCRLB, IL1RL2 and MANF for AR. IL7R, identified for AR, could not be replicated during any iteration and was excluded from further analysis. Meta-analysis showed robust associations for other proteins with AA except APOE (p = 0.8196; OR: 1.0152, 95% CI: 0.8918, 1.1557) and LRRC32 (p = 0.1864; OR: 0.8048, CI: 0.5832, 1.1107, S. Figure 1). Significant associations also remained for proteins with AD, apart from LRRC32 (p = 0.0990; OR: 0.6015, 95% CI: 0.3288, 1.1002, S. Figure 2). Only STAT6 and PILRA retained a significant association with AR following post-replication analysis (S. Figure 3).

Sensitivity analysis
Steiger filtering confirmed causal relationship directionality, with only the relationship of APOE to AA failing to pass the Steiger test (R²xy = 3.36 × 10⁻⁶; p = 0.878).
Further bidirectional MR analysis confirmed the significant reverse causal association between AA and APOE (OR: 2.5883, CI: 1.8731, 3.5766, S. Table 4). Consequently, APOE was excluded from further analysis.
Phenotype scanning indicated potential pleiotropic effects, with APOE, GALK1, ICAM1, MAX, PRSS8, and VTA1 being associated with body mass and blood lipid levels. APOE and PILRA have been previously linked to diabetes, a condition that may be comorbid with allergic diseases. Furthermore, APOE has been associated with the phenotype of maternal diabetes, a condition that correlates with a higher risk of allergic disease in offspring. In addition, ERBB3, IL1R1, IL1RL2, IL6R, LRRC32, STAT6, and TNFRSF6B have been strongly linked to allergic diseases, such as asthma and rhinitis (S. Table 5).
A PPI network was constructed from DrugBank data to illustrate interactions among anti-allergic drug targets and proteins of interest (S. Table 7). Interactions were found between STAT6, TNFRSF6B, IL1RL2, IL6R, and established drug targets, as evidenced by an IAS greater than 0.4, in the PPI network for AA (S. Figure 8). Similarly, interactions were found for ERBB3, IL6R, and MMP12 in the AD-specific PPI network (S. Figure 9) and for ERBB3, ICAM1, IL1RL2, MANF, STAT6 and TNFSF8 in the AR-specific PPI network (S. Figure 10). Further gene-disease enrichment analysis revealed that STAT6 was enriched in various allergic diseases, most notably in AA, with a strength of 1.86 and an FDR P-value of 1.8e-13. IL1RL2, IL6R, ERBB3, ICAM1, IL1R1, and GALK1 were also enriched in categories such as immune system disease, autoimmune disease, skin disease, and disease of anatomical entity (S. Table 8). We also conducted KEGG and GO enrichment analyses on genes within the PPI network. With the exception of GALK1, NPNT, PRSS8, VTA1, LRRC32, PILRA, MMP12, MANF, TNFSF8, and FCRLB, genes of other proteins were successfully enriched in multiple pathways, particularly STAT6, IL6R, and IL1R1, which were involved in several immune and inflammation-related pathways. Except for FCRLB, MANF, and TNFSF8, all other proteins were successfully enriched in specific biological processes (see S. Tables 9 and 10 and S. Figure 11).

Potential drug targets
Ultimately, 11 proteins for allergic asthma (AA), 6 for atopic dermatitis (AD), and 9 for allergic rhinitis (AR) were evaluated for their potential as drug targets. Among these, STAT6 was identified as an excellent Tier 1 potential drug target for AA, as were CRAT and TNFRSF6B for AD. No Tier 1 proteins were identified for AR. Tier 2 proteins included TNFRSF6B and IL1RL2 for AA; ERBB3, IL6R, MMP12, ICAM1 and IL1RL2 for AD; and ICAM1, IL1RL2, MANF, STAT6 and TNFSF8 for AR. Other proteins were assigned to Tier 3 or below (Table 2).

Discussion
To the best of our knowledge, this study is the first to scrutinize causal associations between plasma proteins and allergic diseases through the integrated approach of MR, colocalization, Steiger filtering analysis, PAV assessment, eQTL overlap determination, PPI analysis, pathway enrichment, and drug target evaluation. Eleven plasma proteins were identified with links to AA, AD, and AR. MR analysis and extra validation revealed STAT6 and TNFRSF6B to be Tier 1 and IL1RL2 and IL6R to be Tier 2 proteins with the potential for AA treatment. Two Tier 1 proteins, CRAT and TNFRSF6B, and five Tier 2 proteins, ERBB3, IL6R, MMP12, ICAM1, and IL1RL2, were linked to AD and three Tier 2 proteins, MANF, STAT6, and TNFSF8, to AR.
Many novel biomarkers have been identified by proteomic and metabolomic analyses, although studies on allergic diseases have generally used low throughput methods [42]. Nieto-Fontarigo et al. [43] identified 18 potential biomarkers of asthma phenotype and disease severity, including HSPG2 and IGFALS for AA, through a bottom-up/non-targeted proteomics approach. However, the sample size was modest, with 32 healthy controls, 43 AR patients, and 192 asthmatics, and the study could not distinguish protein biomarkers from pathogenic factors for allergic disease because observational designs are susceptible to reverse causation.
Several proteins identified by the current study have previously been linked to allergic disease by epidemiological or laboratory studies. Indeed, STAT6 is known to participate in IL-4 signaling and its role in asthma has been extensively studied, since both doctor-diagnosed asthma and blood eosinophil counts are known to be linked to STAT6 signaling and the IL-1 receptor family [44]. Baris S et al. [45] have identified a novel inborn error of immunity arising from a STAT6 gain-of-function mutation causing severe allergic dysregulation, which is treated by Janus kinase inhibitor therapy. The TNFRSF6 (also called Fas) receptor binds TNFSF6 (FasL) ligands expressed on CD8+ T cells and oligodendrocytes [46-48]. Th1 cells secrete IFN-γ to activate the Fas/FasL system and induce keratinocyte apoptosis in the spongiosis area, which may influence the progression of AD. IFN-γ and Fas ligand are secreted by activated CD4+ T cells, TNFRSF6 is expressed on keratinocytes, and tumor necrosis factor (TNF)-α is secreted by both activated CD4+ T cells and keratinocytes, while cell-mediated cytotoxicity is induced by perforin and granzyme B released by CD8+ T cells [46-48].
The IL-36 receptor (IL-36R, IL-1Rrp2, IL1RL2, or IL-1R6) binds all α, β, and γ members of the IL-36 family. IL-36R is expressed in skin, mammary, and mucosal epithelial cell lines, and IL-36 mediates intracellular signaling through the IL-36R and IL-1 receptor accessory protein (IL-1RAcP) [49]. IL-36α, IL-36β, and IL-36γ bind IL-36R, form a signal transduction complex with IL-1RAcP, and recruit myeloid differentiation factor 88 to activate mitogen-activated protein kinases mediated by c-Jun N-terminal kinase, extracellular regulated protein kinases 1/2 and the nuclear factor kappa B pathway. The resulting inflammatory mediators have roles in adaptive immunity. IL-36 cytokines released from keratinocytes increase the immunoglobulin (Ig)E production mediated by IL-4 in B cells from AD patients, and treatment with anti-IL-36R antibodies decreases IgE and alleviates the disease phenotype [50,51]. The RNA helicase DDX5, which regulates the alternative splicing of IL-36R pre-mRNA, was found to be down-regulated in keratinocytes from AD patients, which promoted the inflammatory response [52]. TNFSF8 (CD30L) is a ligand for TNFRSF8/CD30, a cell surface antigen and marker for Hodgkin lymphoma and related hematologic malignancies. Inhibition of CD30L/CD30 signaling may constitute a novel biological therapy for AR, since CD30L was shown to amplify the Th2 cell effector response in animal models of AR. In vivo treatment with anti-CD30 antibody suppressed AR development, and this may be a suitable target for the treatment of allergic inflammation [53].
Some novel proteins were suggested to have potential causal effects on allergic diseases by the current work. For example, CRAT is a mitochondrial enzyme that transfers acetyl groups between CoA and carnitine during lipid metabolism, and links with dermatitis have not been previously reported. CRAT is a key regulator of mitochondrial dysfunction-induced cellular senescence in dermal fibroblasts [54,55]. Silencing of CRAT is known to cause mitochondrial dysfunction, inflammation and senescence via activation of the cGAS-STING and NF-κB pathways [54]. In addition, functional variants of IL6R have been linked to increased risk of AA and AD but mechanisms remain unclear, although IL-6/soluble IL-6R trans-signaling may affect AD and AA development [56-58]. Genetic variants of ERBB3 have been identified as AD susceptibility factors, and serum MMP12 may be an indicator of AD and AR disease pathways [57,59-61]. Adhesion molecules are known to be involved in T cell homing to skin lesions in AD patients, one example being ICAM-1, which is highly expressed and may have a pathogenic role [62,63]. Lastly, little attention has been paid to MANF, although the protein is measurable in serum and reflective of extracellular biomarkers in AD [64]. However, these putative mechanisms are speculative and experimental mechanistic studies are required to extend the findings of the present study.
We acknowledge several limitations to the current study. First, the current focus was on circulating plasma proteins, which may differ from those within cells and tissues; the latter should also be explored for disease associations. Second, European participants accounted for the vast majority of the current cohort and findings may not be generalizable to populations of other ancestries. Third, a cis-pQTL coding variant might change a protein's amino acid sequence without necessarily impacting its function or level. Equating sequence alterations with functional changes could lead to incorrect conclusions, so caution should be exercised in the interpretation of these results. Last, the publicly available datasets used for target identification are not new, although novel insights and perspectives may be drawn from them.

Conclusion
An MR analysis was conducted to explore the proteomic pathogenesis of allergic disease. Examples of Tier 1 and 2 protein targets that are promising drug target candidates were STAT6, TNFRSF6B, IL1RL2, and IL6R for AA; CRAT, TNFRSF6B, ERBB3, IL6R, MMP12, ICAM1 and IL1RL2 for AD; and ICAM1, IL1RL2, MANF, STAT6 and TNFSF8 for AR. These proteins may influence the development of allergic diseases and represent potential new diagnostic and therapeutic targets. Further experiments are required to validate the current findings regarding proteinaceous allergic disease markers.

Tier 1 Targets: substantial evidence (PPH4 > 0.8) for drug targeting, confirmed by replication analysis and extra TWAS or PWAS validation.
Tier 2 Targets: direct linkage to known drug targets within the PPI network, validated either by replication analysis or extra TWAS or PWAS.
Tier 3 Targets: proteins with a PPH4 > 0.8, validated by either replication analysis or extra TWAS criteria or linked to known drug targets within the PPI network.
Tier 4 Targets: proteins not classified under the first three tiers.
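The tier definitions amount to a short decision rule, sketched here in Python. The function and argument names are hypothetical, and two readings are assumptions rather than statements from the source: Tier 3 is taken to mean either colocalization plus one form of validation, or a PPI link alone, and a single `extra_validated` flag stands in for TWAS or PWAS support in every tier.

```python
def grade_target(pph4, replicated, extra_validated, ppi_linked, pph4_cutoff=0.8):
    """Assign an evidence tier (1 to 4) to a candidate protein.

    pph4            posterior probability of a shared causal variant (H4)
    replicated      True if the MR association held in replication analysis
    extra_validated True if supported by extra TWAS or PWAS validation
    ppi_linked      True if linked to a known drug target in the PPI network
    """
    colocalized = pph4 > pph4_cutoff

    # Tier 1: colocalization plus both replication and extra validation.
    if colocalized and replicated and extra_validated:
        return 1
    # Tier 2: PPI link to a known drug target plus one form of validation.
    if ppi_linked and (replicated or extra_validated):
        return 2
    # Tier 3 (assumed reading): colocalization with one form of validation,
    # or a PPI link on its own.
    if (colocalized and (replicated or extra_validated)) or ppi_linked:
        return 3
    # Tier 4: everything else.
    return 4
```

Because the checks run from Tier 1 downward, a protein that satisfies several definitions receives the highest tier it qualifies for.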

Fig. 2 Manhattan plot illustrating the chromosomal distribution of identified plasma proteins for allergic diseases. The standard line in the plot represents the threshold of FDR P = 0.05. A Allergic asthma; B Allergic conjunctivitis; C Atopic dermatitis; D Allergic rhinitis; E Allergic urticaria

Fig. 4 Forest plot of MR results from the discovery analysis. AA: Allergic asthma; AD: Atopic dermatitis; AR: Allergic rhinitis

Table 1 Details of the discovery analysis and replicative validations

P Outcome ARIC and UKB ARIC and FinnGen deCODE and UKB deCODE and FinnGen Meta-analysis
OR Odds ratio, FDR False discovery rate, AA Allergic asthma, AD Atopic dermatitis

Table 2 Assessment of druggability and evidence grading

analysis of Replication analysis Colocalization analysis External validation Steiger direction test PPI network Levels of
Tier 1 Targets: These are proteins backed by substantial evidence (PPH4 > 0.8) and confirmed through both replication analysis and external TWAS or PWAS validation
Tier 2 Targets: Proteins in this category are directly linked to known drug targets within the PPI network and have been validated either by replication analysis or through external TWAS or PWAS
Tier 3 Targets: This tier includes proteins with a PPH4 > 0.8 that either meet the replication analysis or external TWAS validation criteria, or are solely linked to known drug targets within the PPI network
Tier 4 Targets: Proteins not classified under the first three tiers fall into this category
OR Odds ratio, FDR False discovery rate, MM Malignant melanoma, SCC Squamous-cell carcinoma, BCC Basal cell carcinoma