
Identification of hub genes based on integrated analysis of single-cell and microarray transcriptome in patients with pulmonary arterial hypertension

Abstract

Background

Pulmonary arterial hypertension (PAH) is a devastating chronic cardiopulmonary disease without an effective therapeutic approach. The underlying molecular mechanism of PAH remains largely unexplored at single-cell resolution.

Methods

Single-cell RNA sequencing (scRNA-seq) data from the Gene Expression Omnibus (GEO) database (GSE210248) were analyzed comprehensively. Microarray transcriptome data comprising 15 lung tissue samples from PAH patients and 11 normal samples (GSE113439) were also obtained. The Seurat R package was used to process the scRNA-seq data. Uniform manifold approximation and projection (UMAP) was used for dimensionality reduction and cluster identification, and the SingleR package was used for cell annotation. FindAllMarkers analysis and the limma package were applied to identify differentially expressed genes (DEGs) for each cluster in GSE210248 and in GSE113439, respectively. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were used for functional enrichment analysis of DEGs. The Microenvironment Cell Populations counter (MCP counter) was applied to evaluate immune cell infiltration. STRING was used to construct a protein-protein interaction (PPI) network of DEGs, followed by hub gene selection with Cytoscape software and Venn diagrams.

Results

A total of 19,576 cells from 3 donors and 21,896 cells from 3 PAH patients remained for subsequent analysis after filtering. A total of 42 cell clusters were identified through UMAP and annotated with the SingleR package. The 10 clusters with the highest cell counts were selected for subsequent analysis. Compared with the control group, the proportions of adipocytes and fibroblasts were significantly reduced, while those of CD8+ T cells and macrophages were notably increased in the PAH group. MCP counter revealed decreased abundance of CD8+ T cells, cytotoxic lymphocytes, and NK cells, as well as increased infiltration of the monocytic lineage in PAH lung samples. Among 997 DEGs in GSE113439, module 1 with 68 critical genes was identified with the MCODE plug-in in Cytoscape software. The top 20 DEGs in each cluster of GSE210248 were selected with the CytoHubba plug-in using the MCC method. Finally, WDR43 and GNL2 were found to be significantly increased in PAH and were identified as the hub genes after overlapping the DEGs from GSE210248 and GSE113439.

Conclusion

WDR43 and GNL2 might provide novel insight into the molecular mechanisms of PAH and serve as potential therapeutic targets.


Introduction

Pulmonary arterial hypertension (PAH) is a chronic, severe, and progressive cardiopulmonary disease characterized by elevated pulmonary arterial pressure and right ventricular hypertrophy [1]. The prevalence of PAH is currently 10.6 per million adults in the United States [2]. Although treatments targeting the nitric oxide, prostacyclin, and endothelin pathways delay PAH progression and improve survival, only lung transplantation is considered a curative approach [3]. PAH remains an incurable chronic disease with a poor prognosis [4]. Vasoconstriction, obstructive pulmonary vasculopathy characterized by a hyperproliferative and anti-apoptotic phenotype of pulmonary artery smooth muscle cells (PASMCs), excessive fibrosis, inflammation, thrombosis, and altered mitochondrial metabolism are all implicated in the mechanisms of PAH [5]. However, the pathogenesis of PAH remains largely unexplored. Therefore, systematic analysis of the function of different cell types in the pulmonary tissue of PAH patients might help deepen understanding of the pathological mechanism of PAH.

Microarray transcriptome analysis has been increasingly used to examine gene expression in PAH [6, 7]. However, microarray transcriptome data represent the average gene expression of the various cells across the whole tissue [8]. Lung tissues contain various cell types, including smooth muscle cells, endothelial cells, fibroblasts, immune cells, and inflammatory cells, which play different roles throughout the development of PAH. Single-cell RNA sequencing (scRNA-seq) is an emerging technology for investigating cell heterogeneity, characterizing each cell subpopulation, and inferring putative intercellular communication [9, 10]. This technology has advanced our understanding of PAH at the cell subpopulation level. scRNA-seq has been carried out in lung samples of both PAH rodent models and PAH patients. Previous research reported NF-κB signaling activation in immune cells of monocrotaline-induced and hypoxia-induced PH rat models [11]. Based on scRNA-seq data of lung endothelial cells (ECs) from mice with hypoxic pulmonary hypertension, Rodor and colleagues indicated that CD74 was involved in the regulation of endothelial cell proliferation and barrier integrity [12]. However, scRNA-seq data on PAH remain limited, and the field is still in its infancy.

In the present study, an integrated bioinformatics analysis of scRNA-seq and microarray transcriptome data from the GEO database was performed to identify the hub genes in PAH. Differentially expressed genes (DEGs) from GSE210248 and GSE113439 were identified and common DEGs were selected. A protein-protein interaction (PPI) network was constructed using the aforementioned DEGs, followed by hub gene selection with Cytoscape software. Finally, GNL2 and WDR43 were identified as hub genes, which might provide new insight into the pathogenesis of PAH and act as novel candidate therapeutic targets for PAH.

Materials and methods

Data acquisition

All data were processed and analyzed with R software (Version 4.3.0). Both the scRNA-seq (GSE210248) and microarray transcriptome (GSE113439) data were obtained from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) database [13] and downloaded with the GEOquery package (Version 2.68.0). GSE210248 and GSE113439 were selected for the current research because the samples in the two datasets were obtained from the lungs or pulmonary arteries of participants with pulmonary hypertension, rather than from PAH rodent models. Additionally, GSE113439 has a relatively large sample size. The details of the two datasets enrolled in this study are listed in Table 1. The GSE210248 scRNA-seq data and GSE113439 array data were generated on the GPL20301 Illumina HiSeq 4000 and GPL6244 Affymetrix Human Gene 1.0 ST Array platforms, respectively. The GSE210248 data included pulmonary arteries from 3 PAH patients and 3 healthy donor controls. The dataset contains 19,576 cells from the control group and 22,704 cells from the PAH group. The GSE113439 data included fresh frozen lung samples from the recipients’ organs of 15 PAH patients and 11 normal lung samples obtained from tissue flanking lung cancer resections.

Table 1 Overview of the enrolled datasets in the current study

Processing of scRNA-seq data

The Seurat package (Version 4.3.0) was used for quality control. Cells with 200–2500 detected genes and < 5% mitochondrial genes were selected for subsequent analysis. A total of 19,576 cells in the control group and 21,896 cells in the PAH group were retained for analysis. Gene expression data were normalized using the “LogNormalize” method and then scaled. Then, the top 2000 highly variable genes (HVGs) were identified by the FindVariableFeatures function with the “vst” method. Subsequently, principal component analysis (PCA) was applied to identify significant principal components (PCs), and their p-values were visualized using the JackStraw and ScoreJackStraw functions. Uniform manifold approximation and projection (UMAP) was used for dimensionality reduction with 20 PCs and for cluster identification across these cells. The “Harmony” R package was used for batch correction to avoid batch effects of sample identity, which might disrupt the downstream analysis [14]. The SingleR package (Version 2.2.0) [15] was used for cell annotation with the reference datasets HumanPrimaryCellAtlasData [16] and BlueprintEncodeData [17]. FindAllMarkers analysis with |log2 fold change (FC)| > 1 and adjusted p value < 0.05 was performed to identify the differentially expressed genes (DEGs) for each cell cluster. The scRNAtoolVis package (Version 0.0.5) was used to display the top DEGs, which were visualized with jjVolcano.
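The quality-control thresholds and the log-normalization step described above can be illustrated with a minimal Python sketch. It is not the Seurat implementation used in this study. The inclusive handling of the 200 and 2500 boundaries, the scale factor of 10,000 (taken here to be Seurat's default) and the toy cells are all assumptions made for illustration.

```python
import math

def passes_qc(n_genes, pct_mito, min_genes=200, max_genes=2500, max_pct_mito=5.0):
    """Return True if a cell meets the thresholds quoted in the text.

    Treating 200 and 2500 as inclusive bounds is an assumption.
    """
    return min_genes <= n_genes <= max_genes and pct_mito < max_pct_mito

def log_normalize(counts, scale_factor=10_000):
    """Divide each count by the cell total, multiply by scale_factor, apply log1p."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]

# Hypothetical toy cells: (number of detected genes, percent mitochondrial reads)
cells = {
    "cell_a": (1500, 2.1),   # passes both filters
    "cell_b": (150, 1.0),    # too few genes
    "cell_c": (1800, 7.5),   # mitochondrial fraction too high
    "cell_d": (3000, 3.0),   # too many genes
}
kept = [name for name, (n_genes, pct_mito) in cells.items() if passes_qc(n_genes, pct_mito)]
print(kept)

# Hypothetical raw counts for three genes in one cell
normalized = log_normalize([0, 5, 95])
print(normalized)
```

In this toy input only one cell survives all three filters, which mirrors how the filtering step reduces the number of cells retained for downstream analysis.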

Processing of microarray transcriptome data

DEGs between the control and PAH groups with an adjusted p value < 0.05 were identified using the limma package (Version 3.56.2) [18]. All DEGs were visualized in a volcano plot, and the top 50 DEGs were visualized in a heatmap using the “ggplot2” package.
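As a rough illustration of the thresholding step, the sketch below adjusts raw p values and labels genes as upregulated, downregulated or not significant. It assumes the Benjamini-Hochberg procedure (understood to be the default adjustment in limma; the text does not name the method) and borrows the |log2FC| > 0.856 cutoff reported in the Results. The gene names and statistics are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value to the smallest so the result stays monotone
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def classify(log2fc, adj_p, fc_cutoff=0.856, p_cutoff=0.05):
    """Label a gene using the fold-change and adjusted p value cutoffs."""
    if adj_p < p_cutoff and log2fc > fc_cutoff:
        return "up"
    if adj_p < p_cutoff and log2fc < -fc_cutoff:
        return "down"
    return "ns"

# Hypothetical genes: name -> (log2 fold change, raw p value)
stats = {
    "GENE_A": (1.40, 0.010),
    "GENE_B": (-1.10, 0.040),
    "GENE_C": (0.30, 0.030),
    "GENE_D": (2.00, 0.005),
}
names = list(stats)
adj = bh_adjust([stats[g][1] for g in names])
labels = {g: classify(stats[g][0], p) for g, p in zip(names, adj)}
print(labels)
```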

Functional enrichment analysis

Gene Ontology (GO) [19] and Kyoto Encyclopedia of Genes and Genomes (KEGG) [20] analyses were carried out with the clusterProfiler package [21]. GO enrichment included 3 subontologies: biological process (BP), molecular function (MF), and cellular component (CC) [19]. P < 0.05 was considered statistically significant.
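Over-representation analysis of this kind is commonly based on a hypergeometric test. The following sketch shows that calculation for a single term, on the assumption that this is the test applied by the enrichment functions used here. The function name and the numbers in the example are hypothetical.

```python
from math import comb

def enrichment_pvalue(universe_size, term_size, deg_count, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: number of background genes
    term_size:     background genes annotated to the GO/KEGG term
    deg_count:     number of DEGs tested
    overlap:       DEGs annotated to the term
    """
    upper = min(term_size, deg_count)
    total = comb(universe_size, deg_count)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, deg_count - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical small example: 10 background genes, 4 in the term,
# 3 DEGs of which 2 belong to the term
p = enrichment_pvalue(universe_size=10, term_size=4, deg_count=3, overlap=2)
print(p)
```

In practice the p value of every tested term would then be corrected for multiple testing before the significance cutoff is applied.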

Microenvironment Cell Populations counter (MCP counter)

The infiltration of microenvironment immune cells including B lineage, CD8 T cells, cytotoxic lymphocytes, endothelial cells, monocytic lineage, myeloid dendritic cells, neutrophils, NK cells, and T cells was quantified with the MCP counter (Version 1.2.0) [22] based on the microarray transcriptome data of GSE113439.
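A simplified view of how such an abundance score can be computed is sketched below. It assumes that the score of a cell population in one sample is the arithmetic mean of the log2-scale expression of that population's marker genes, and that the 0–1 values shown in a heatmap come from min-max scaling across samples. The marker names and expression values are hypothetical placeholders, not the marker sets shipped with the MCP counter.

```python
def mcp_style_score(expression, markers):
    """Mean log2 expression of the marker genes that are present in one sample."""
    values = [expression[gene] for gene in markers if gene in expression]
    if not values:
        raise ValueError("none of the marker genes are present in this sample")
    return sum(values) / len(values)

def min_max_scale(scores):
    """Rescale scores to the 0-1 range; constant input maps to zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

# Hypothetical log2 expression values for three samples
samples = [
    {"MARKER_1": 8.0, "MARKER_2": 6.0, "OTHER": 3.0},
    {"MARKER_1": 5.5, "MARKER_2": 4.5, "OTHER": 9.0},
    {"MARKER_1": 6.0, "MARKER_2": 6.0, "OTHER": 2.0},
]
markers = ["MARKER_1", "MARKER_2", "MISSING_MARKER"]  # hypothetical marker set

raw_scores = [mcp_style_score(s, markers) for s in samples]
scaled_scores = min_max_scale(raw_scores)
print(raw_scores, scaled_scores)
```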

PPI network construction and identification of hub genes

DEGs in the scRNA-seq and microarray transcriptome data were identified by FindAllMarkers analysis in the Seurat package and by the limma package, respectively. Subsequently, protein–protein interaction (PPI) networks were constructed to predict the internal connections among the selected DEGs using the STRING database (Version 11.5, https://string-db.org/) with the interaction confidence score set to 0.4 [23]. Then, hub genes were selected and the network was visualized using Cytoscape software (Version 3.10.0) [24]. The Molecular Complex Detection (MCODE) plug-in was used to build clustered functional modules in the PPI network. Then, the CentiScaPe plug-in was used to calculate the degree, betweenness, and centroid value of each gene within the network. The CytoHubba plug-in was used to rank nodes in the target network using the Maximal Clique Centrality (MCC) method. The Venn diagram was produced with the jvenn website (https://jvenn.toulouse.inrae.fr/app/example.html) for gene overlapping and common gene selection.
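To make the ranking step more concrete, the sketch below filters edges by a confidence score and then scores each node with Maximal Clique Centrality. It assumes the usual definition in which the MCC of a node is the sum of (|C| − 1)! over all maximal cliques C that contain it. The clique enumeration uses a plain Bron-Kerbosch recursion and is not the CytoHubba code. The node names, the edge scores and the 0–1 score scale are hypothetical.

```python
from math import factorial

def build_graph(edges, min_score=0.4):
    """Undirected adjacency sets from (node_a, node_b, score) tuples."""
    graph = {}
    for a, b, score in edges:
        if a != b and score >= min_score:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph

def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)
            return
        for node in list(candidates):
            expand(current | {node}, candidates & graph[node], excluded & graph[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(graph), set())
    return cliques

def mcc_scores(graph):
    """Sum (clique size - 1)! over every maximal clique containing each node."""
    scores = {node: 0 for node in graph}
    for clique in maximal_cliques(graph):
        weight = factorial(len(clique) - 1)
        for node in clique:
            scores[node] += weight
    return scores

# Hypothetical toy network; the last edge falls below the 0.4 cutoff
edges = [
    ("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.7),
    ("C", "D", 0.5), ("D", "E", 0.3),
]
graph = build_graph(edges)
scores = mcc_scores(graph)
ranking = sorted(scores, key=lambda n: (-scores[n], n))
print(scores, ranking)
```

Node C ranks first in this toy network because it belongs to both the triangle and the C-D edge, while a node whose neighbours are not connected to each other simply receives its degree.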

Results

scRNA-seq profiling in PAH

The scRNA-seq data of GSE210248 were downloaded from the GEO database and analyzed with R software. In total, 42,280 cells comprising 19,576 cells from donors (control) and 22,704 cells from PAH patients were included. After filtering out cells with improper gene counts or mitochondrial genes ≥ 5%, 19,576 cells from donors and 21,896 cells from PAH patients remained. Figure 1A presents the expression characteristics of each sample. As shown in Fig. 1B, nCount_RNA (the number of unique molecular identifiers) was positively correlated with nFeature_RNA (the number of genes), with a correlation coefficient of 0.93. Figure 1C displays and labels the top 10 HVGs: SFTPC, CCL21, SFTPA1, IGKC, SFTPA2, SFTPB, PGC, TPSB2, TPSAB1, and S100A12. The top 20 PCs identified by PCA were visualized by JackStrawPlot (Fig. 1D). In addition, the top 10 DEGs in each cluster are presented in a heatmap and labeled in yellow (Fig. 1E).
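The correlation between nCount_RNA and nFeature_RNA can be reproduced in principle with a few lines of code. The sketch below assumes a Pearson coefficient, which the text does not state explicitly, and uses hypothetical per-cell values rather than data from GSE210248.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-cell values, not taken from GSE210248
n_count_rna = [1200, 2500, 4000, 5200, 8000]
n_feature_rna = [600, 1000, 1500, 1700, 2300]
r = pearson(n_count_rna, n_feature_rna)
print(round(r, 3))
```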

Fig. 1

Single-cell RNA sequencing analysis of GSE210248 in PAH. A The features, counts, and mitochondrial gene percentage of each sample. B Correlation between genes and counts in each sample. C HVGs are colored in red, and the top 10 HVGs are labeled. D PC selection using the JackStraw function. E Heatmap of the top 10 DEGs in each cluster. The top 10 DEGs are labeled in yellow

Cell clusters identification in scRNA-seq

A total of 41,472 cells were divided into 42 cell clusters and visualized through UMAP, and cell annotation was performed with the SingleR package (Fig. 2A). Because the number of cells in some clusters was too small, the 10 clusters with the highest cell counts (adipocytes, CD4+ T cells, CD8+ T cells, chondrocytes, endothelial cells, epithelial cells, fibroblasts, macrophages, monocytes, and NK cells) were selected for subsequent analysis. The cell numbers of each cluster are shown in Table 2. The distribution of the selected 10 clusters is presented in Fig. 2B, and the cluster distribution grouped by control and PAH is displayed in Fig. 2C. Additionally, the number and proportion of cells in each sample are shown in Fig. 2D. Compared with the control group, the proportions of adipocytes (39.4% vs. 5.6%) and fibroblasts (18.1% vs. 2.5%) were significantly reduced in the PAH group, while the proportions of CD8+ T cells (3.3% vs. 50.0%) and macrophages (4.3% vs. 14.8%) were notably increased in PAH lung tissues.
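The percentages compared above are per-group proportions of annotated cells. A minimal sketch of that calculation is given below; the labels are hypothetical toy data and do not reproduce the counts in Table 2.

```python
from collections import Counter

def cluster_proportions(labels):
    """Percentage of cells assigned to each cluster within one group."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cluster: 100.0 * n / total for cluster, n in counts.items()}

# Hypothetical annotated cells per group (toy data)
control_labels = ["adipocytes"] * 2 + ["fibroblasts"] + ["macrophages"]
pah_labels = ["CD8+ T cells"] * 3 + ["macrophages"] + ["adipocytes"]

control_pct = cluster_proportions(control_labels)
pah_pct = cluster_proportions(pah_labels)

# Report control vs. PAH for every cluster seen in either group
for cluster in sorted(set(control_pct) | set(pah_pct)):
    print(cluster, control_pct.get(cluster, 0.0), pah_pct.get(cluster, 0.0))
```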

Fig. 2

Clustering and annotation of single-cell RNA sequencing data. A UMAP visualization of the PAH and donor groups. B UMAP visualization of the top 10 cell clusters. C UMAP visualization of the top 10 cell clusters in the PAH and donor groups. D Cluster distribution with the average cell number and cell proportion in each sample

Table 2 Cell numbers in each cluster

DEGs of each cluster in GSE210248

The DEGs of each cluster between the control and PAH groups were identified using the FindMarkers function. The top 10 DEGs in each cluster are listed in Table 3. For instance, based on adjusted p-value, GZMB, ATP5E, GNB2L17, GKN2, ATP5E6, SLPI5, SELM2, SFTPC3, SFTPC2, and GNB2L11 were the most significant DEGs in NK cells, macrophages, fibroblasts, monocytes, epithelial cells, endothelial cells, chondrocytes, CD8+ T cells, CD4+ T cells, and adipocytes, respectively. The scRNAtoolVis package was further used to illustrate the top 5 upregulated and the top 5 downregulated genes in the PAH group compared with the control group, visualized with jjVolcano (Fig. 3).

Table 3 The top 10 DEGs in each cell cluster between control and PAH group
Fig. 3

The top 5 upregulated and top 5 downregulated DEGs in the PAH group compared with the control group using the jjVolcano map

DEGs of pulmonary tissue in GSE113439

The limma package was used to explore the DEGs between the lung samples of 11 controls and 15 PAH patients. DEGs with |log2FC| > 0.856 and adjusted p-value < 0.05 are presented in Fig. 4A. Compared with the control group, 828 genes were upregulated and 169 genes were downregulated in the lung tissue of PAH patients. A heatmap of the top 50 DEGs is shown in Fig. 4B. The majority of these DEGs were upregulated; only GPR146 was downregulated among the top 50 DEGs. The results of KEGG functional enrichment analysis are shown in Fig. 4C. The upregulated DEGs were enriched in ribosome biogenesis in eukaryotes, herpes simplex virus 1 infection, RNA transport, homologous recombination, cell cycle, proteoglycans in cancer, aminoacyl-tRNA biosynthesis, spliceosome, fatty acid metabolism, small cell lung cancer, etc. The downregulated DEGs were enriched in systemic lupus erythematosus, Notch signaling pathway, hypertrophic cardiomyopathy, alcoholism, asthma, vascular smooth muscle contraction, cAMP signaling pathway, cardiac muscle contraction, breast cancer, and adrenergic signaling in cardiomyocytes. The cellular component (CC), biological process (BP), and molecular function (MF) results of GO enrichment analysis are presented in Fig. 4D-F. The top 10 enriched terms in CC included chromosomal region; nuclear speck; spindle; microtubule; condensed chromosome; chromosome, centromeric region; spindle pole; mitotic spindle; midbody; and centriole. The top 10 enriched terms in BP included chromosome segregation; organelle fission; nuclear division; ribonucleoprotein complex biogenesis; nuclear chromosome segregation; mitotic nuclear division; sister chromatid segregation; mitotic sister chromatid segregation; regulation of chromosome organization; and protein localization to chromosome. The top 10 enriched terms in MF included ATPase activity; tubulin binding; microtubule binding; catalytic activity, acting on DNA; GTPase binding; helicase activity; DNA-dependent ATPase activity; protein folding chaperone; RNA helicase activity; and RNA-dependent ATPase activity.

Fig. 4

DEGs of lung tissue from the GSE113439 dataset. A Volcano plot of DEGs with |log2FC| > 0.856 and adjusted p value < 0.05. Upregulated and downregulated genes are colored in red and blue, respectively. B Heatmap displaying the top 50 DEGs of GSE113439. C KEGG analysis of DEGs in GSE113439. Dot plots of the top 10 CC (D), BP (E), and MF (F) terms of GO in GSE113439. The size and color of the dots represent the gene count and the adjusted p value of the selected term, respectively

Different immune cell infiltration of pulmonary tissue in GSE113439

Using microarray transcriptome data from GSE113439, the MCP counter was used to evaluate immune cell infiltration in control and PAH lung samples. As shown in Fig. 5A, significantly decreased abundance of CD8+ T cells, cytotoxic lymphocytes, and NK cells was found in lung tissues of PAH patients compared with control subjects. In contrast, increased infiltration of the monocytic lineage was found in PAH lung tissue. The heatmap further displays the abundance of each cell type in each sample of the control and PAH groups, with normalized values ranging from 0 to 1 (Fig. 5B) (PAH: GSM3106326-GSM3106340; Control: GSM3106341-GSM3106351).

Fig. 5

Dysregulated immune cell infiltration in PAH lungs. A Box plot of immune cell abundance in the control and PAH groups. B Heatmap displaying the abundance of immune cells in each lung tissue sample from controls and PAH patients

Protein-protein interaction (PPI) network and common DEG identification in GSE210248 and GSE113439

The PPI network of DEGs from GSE113439 was generated with STRING (Fig. 6). The PPI network built from the 997 DEGs consisted of 945 nodes and 7266 edges. Then, the PPI network of the DEGs in each cluster of GSE210248 was constructed through the STRING website. Figure 7A-H presents the PPI networks of 914 DEGs in adipocytes, 411 DEGs in CD8+ T cells, 572 DEGs in chondrocytes, 377 DEGs in endothelial cells, 93 DEGs in epithelial cells, 1139 DEGs in fibroblasts, 822 DEGs in macrophages, and 1013 DEGs in monocytes with adjusted p-value < 0.05. A Venn diagram was drawn to identify the common hub genes from GSE210248 and GSE113439. As shown in Fig. 8, a series of genes was identified by overlapping the DEGs from each cluster in GSE210248 with the DEGs from GSE113439. The details of the overlapping genes are listed in Table 4.

Fig. 6

PPI network of DEGs from GSE113439

Fig. 7

PPI network of DEGs in each cluster of GSE210248. PPI network of the DEGs in adipocytes (A), CD8+ T cells (B), chondrocytes (C), endothelial cells (D), epithelial cells (E), fibroblasts (F), macrophages (G), and monocytes (H)

Fig. 8

Venn diagrams showing the common DEGs from GSE210248 and GSE113439. A Venn diagram showing the common DEGs from GSE113439 and from adipocytes, CD4+ T cells, and CD8+ T cells in GSE210248. B Venn diagram showing the common DEGs from GSE113439 and from chondrocytes and endothelial cells in GSE210248. C Venn diagram showing the common DEGs from GSE113439 and from epithelial cells, fibroblasts, and macrophages in GSE210248. D Venn diagram showing the common DEGs from GSE113439 and from monocytes and NK cells in GSE210248

Table 4 Common DEGs from each cluster of GSE210248 and GSE113439

Hub genes identification in PAH

Because the number of selected common DEGs was relatively large, MCODE was used to select candidate hub genes from the PPI network of the 997 DEGs in GSE113439. Module 1, which had the highest score (68 nodes and 1142 edges), was selected (Fig. 9A). The centralities of the candidate genes in module 1 were evaluated with the CentiScaPe plug-in, and the details are shown in Table 5. Additionally, the CytoHubba plug-in was used to rank the nodes in module 1 using the MCC method. The MCODE score of each gene is also summarized in Table 5. The CytoHubba plug-in was then used to further narrow down these hub genes and select the most critical genes using the MCC method. No overlapping genes were found between the DEGs from adipocytes or CD4+ T cells and the DEGs from GSE113439. Therefore, the top 20 genes were selected in the remaining 8 clusters of GSE210248 and are presented in Fig. 9B-I. Fewer than 20 DEGs are presented for chondrocytes and epithelial cells because only 15 DEGs in chondrocytes and 14 DEGs in epithelial cells were included in the network in Cytoscape software. We further identified the common hub genes using the genes in module 1 and the top 20 genes in these clusters. As shown in Fig. 10, WDR43 in chondrocytes and GNL2 in CD8+ T cells were finally identified as the most significant genes in PAH. Furthermore, we examined the expression of WDR43 and GNL2 in the lung samples of 15 PAH patients and 11 control subjects in GSE113439 and found significantly increased WDR43 and GNL2 expression (Fig. 11).
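The final overlap step amounts to intersecting the module 1 gene list with the top-ranked genes of each cluster. The sketch below illustrates the idea with sets. Only WDR43 and GNL2 and their clusters are taken from the text; every PLACEHOLDER name is hypothetical, and the real lists are those in Table 5 and Fig. 9.

```python
def common_hub_genes(module_genes, top_genes_by_cluster):
    """Genes shared between the module and each cluster's top-ranked genes.

    Clusters with no shared gene are omitted from the result.
    """
    module = set(module_genes)
    shared = {}
    for cluster, genes in top_genes_by_cluster.items():
        overlap = sorted(module & set(genes))
        if overlap:
            shared[cluster] = overlap
    return shared

# Toy inputs: PLACEHOLDER names are hypothetical stand-ins for the real lists
module_1 = ["WDR43", "GNL2", "PLACEHOLDER_1", "PLACEHOLDER_2"]
top_genes = {
    "chondrocytes": ["WDR43", "PLACEHOLDER_3"],
    "CD8+ T cells": ["GNL2", "PLACEHOLDER_4"],
    "fibroblasts": ["PLACEHOLDER_5", "PLACEHOLDER_6"],
}

hub_genes = common_hub_genes(module_1, top_genes)
print(hub_genes)
```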

Fig. 9

Hub genes of GSE113439 and hub genes of each cluster from GSE210248. A Selected hub genes in module 1 of GSE113439 using the MCODE plug-in. The top 20 hub genes in CD8+ T cells (B), chondrocytes (C), endothelial cells (D), epithelial cells (E), fibroblasts (F), macrophages (G), monocytes (H), and NK cells (I) from GSE210248, identified with the CytoHubba plug-in according to the node scores from the MCC method

Table 5 The centralities and MCODE score of candidate genes evaluated by the CentiScaPe and CytoHubba plug-ins
Fig. 10

Venn diagrams showing the common hub genes from GSE210248 and GSE113439. A Venn diagram showing the common hub genes from GSE113439 and from CD8+ T cells, chondrocytes, and endothelial cells in GSE210248. B Venn diagram showing the common hub genes from GSE113439 and from epithelial cells, fibroblasts, and macrophages in GSE210248. C Venn diagram showing the common hub genes from GSE113439 and from monocytes and NK cells in GSE210248

Fig. 11

The expression of selected hub genes. The expression of WDR43 (A) and GNL2 (B) in lung samples of 11 healthy controls and 15 PAH patients in GSE113439. **p < 0.01

Discussion

The present study indicated for the first time that WDR43 and GNL2 might act as key genes involved in the pathogenesis of PAH, providing a novel potential underlying mechanism of PAH. In the current study, common DEGs were identified through an integrated analysis of scRNA-seq and microarray transcriptome data using the Seurat and limma packages in R software. Subsequently, the PPI network of DEGs was constructed using the STRING website. Then, Cytoscape software was used to select the hub genes in each cluster of GSE210248 and in GSE113439. Ultimately, we identified two hub genes (WDR43 and GNL2) in PAH through a series of bioinformatics analyses.

MCP counter revealed a dysregulated immune cell landscape in lung tissues of PAH patients, which is consistent with previous reports. Rabinovitch et al. reviewed immune dysregulation in PAH and how immune-mediated vascular injury promotes PAH development [25]. For instance, circulating autoantibodies against endothelial cells might enhance endothelial cell apoptosis in PAH [26]. T cells and NK cells have been considered beneficial factors in the pathogenesis of PAH [27, 28]. Additionally, the role of perivascular macrophages has received extensive attention from researchers. Widespread CD68+ macrophages were detected in occlusive plexiform lesions in clinical and experimental PAH models [29]. Inactivation or deletion of macrophages could prevent the development of PAH [30]. Further research is needed to explore the roles of various immune cells in PAH and the underlying mechanisms.

The WD40 repeat (WDR) domain is the most abundant protein interaction domain in the human proteome. The WDR43 gene is located on chromosome 2 and encodes the WDR43 protein, which contains 677 amino acids [31]. Of note, WDR43 is an essential subunit of multiprotein complexes and is involved in a series of signaling pathways including the ubiquitin-proteasome pathway, epigenetic regulation, DNA damage repair, and immune-related pathways [32]. For instance, the NOL11-WDR43-Cirhin protein complex is necessary for mitotic chromosome segregation [33]. Intriguingly, several bioinformatics analyses based on the GEO and The Cancer Genome Atlas (TCGA) databases identified WDR43 as a crucial oncogene contributing to the development of colorectal and lung cancer by promoting the migration and proliferation of cancer cells. Mechanistically, c-MYC/WDR43/MDM2-mediated p53 degradation and cyclin-dependent kinase 2 were involved in the underlying mechanism [34,35,36]. However, the role of WDR43 in PAH remains uninvestigated. As in cancer, an imbalance between proliferation and apoptosis in pulmonary artery smooth muscle cells (PASMCs) is a key characteristic of pulmonary hypertension [37]. Therefore, we speculate that WDR43 might contribute to PASMC proliferation and migration, thereby leading to pulmonary artery remodeling.

GNL2 (G protein nucleolar 2) was found to be essential for cell growth and development through its participation in the cell-cycle regulation pathway [38]. GNL2 acts as a checkpoint for ribosome export and plays a vital role in facilitating ribosomal biogenesis and protein synthesis [39]. GNL2 was found to play a critical role in the RNA metabolic network and was associated with proliferation [40]. Increased expression of GNL2 was correlated with poor prognosis in ovarian cancer patients with 1p34.3 amplifications [41]. Results from scRNA-seq data of periodontitis revealed that GNL2 was upregulated in T cells [42]. However, the role of GNL2 in PAH and its potential underlying mechanisms need further exploration. In combination with the KEGG analysis in the current study, GNL2 might participate in the underlying mechanism of PAH through its influence on ribosome biogenesis and the cell cycle.

High-throughput RNA sequencing has been widely used to explore novel mechanisms of PAH [6, 43]. In particular, with the rapid development of single-cell sequencing, integrated bioinformatics analysis of microarray transcriptome and scRNA-seq data has recently attracted researchers’ attention as an emerging research method [44]. A recent study indicated that hpgd was a key gene in pulmonary artery endothelial cells (PAECs) using scRNA-seq data from PAECs of control and PAH rodents [45]. The mechanism of PAH remains largely unexplored through integrated bioinformatics analysis. The current research might provide novel insight into the pathogenesis of PAH.

Conclusion

In summary, we performed an integrated bioinformatics analysis of single-cell sequencing and microarray transcriptome data. Multi-step analysis suggested that WDR43 and GNL2 were increased in PAH lung tissues, and they were identified as hub genes in the pathogenesis of PAH. Our results highlight WDR43 and GNL2 as potential biomarkers and pharmacological therapeutic targets for PAH.

Availability of data and materials

Publicly available GEO datasets were analyzed in this study (https://www.ncbi.nlm.nih.gov/geo/) under the accession numbers GSE210248 and GSE113439.

References

  1. Runo JR, Loyd JE. Primary pulmonary hypertension. Lancet. 2003;361(9368):1533–44.

  2. Ruopp NF, Cockrill BA. Diagnosis and treatment of pulmonary arterial hypertension: a review. JAMA. 2022;327(14):1379–91.

  3. Galiè N, Channick RN, Frantz RP, Grünig E, Jing ZC, Moiseeva O, Preston IR, Pulido T, Safdar Z, Tamura Y, et al. Risk stratification and medical therapy of pulmonary arterial hypertension. Eur Respir J. 2019;53(1):1801889.

  4. Lau EMT, Giannoulatou E, Celermajer DS, Humbert M. Epidemiology and treatment of pulmonary arterial hypertension. Nat Rev Cardiol. 2017;14(10):603–14.

  5. Thenappan T, Ormiston ML, Ryan JJ, Archer SL. Pulmonary arterial hypertension: pathogenesis and clinical management. BMJ (Clinical research ed). 2018;360:j5492.

  6. Ruffenach G, Medzikovic L, Aryan L, Li M, Eghbali M. HNRNPA2B1: RNA-binding protein that orchestrates smooth muscle cell phenotype in pulmonary arterial hypertension. Circulation. 2022;146(16):1243–58.

  7. Li D, Shao NY, Moonen JR, Zhao Z, Shi M, Otsuki S, Wang L, Nguyen T, Yan E, Marciano DP, et al. ALDH1A3 coordinates metabolism with gene regulation in pulmonary arterial hypertension. Circulation. 2021;143(21):2074–90.

  8. Thind AS, Monga I, Thakur PK, Kumari P, Dindhoria K, Krzak M, Ranson M, Ashford B. Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology. Brief Bioinform. 2021;22(6):bbab259.

  9. Papalexi E, Satija R. Single-cell RNA sequencing to explore immune cell heterogeneity. Nat Rev Immunol. 2018;18(1):35–45.

  10. Clark IC, Gutiérrez-Vázquez C, Wheeler MA, Li Z, Rothhammer V, Linnerbauer M, Sanmarco LM, Guo L, Blain M, Zandee SEJ, et al. Barcoded viral tracing of single-cell interactions in central nervous system inflammation. Science. 2021;372(6540):eabf1230.

  11. Hong J, Arneson D, Umar S, Ruffenach G, Cunningham CM, Ahn IS, Diamante G, Bhetraratana M, Park JF, Said E, et al. Single-cell study of two rat models of pulmonary arterial hypertension reveals connections to human pathobiology and drug repositioning. Am J Respir Crit Care Med. 2021;203(8):1006–22.

  12. Rodor J, Chen SH, Scanlon JP, Monteiro JP, Caudrillier A, Sweta S, Stewart KR, Shmakova A, Dobie R, Henderson BEP, et al. Single-cell RNA sequencing profiling of mouse endothelial cells in response to pulmonary arterial hypertension. Cardiovasc Res. 2022;118(11):2519–34.

  13. Barrett T, Suzek TO, Troup DB, Wilhite SE, Ngau WC, Ledoux P, Rudnev D, Lash AE, Fujibuchi W, Edgar R. NCBI GEO: mining millions of expression profiles–database and tools. Nucleic Acids Res. 2005;33(Database issue):D562-566.

  14. Korsunsky I, Millard N, Fan J, Slowikowski K, Zhang F, Wei K, Baglaenko Y, Brenner M, Loh PR, Raychaudhuri S. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods. 2019;16(12):1289–96.

  15. Aran D, Looney AP, Liu L, Wu E, Fong V, Hsu A, Chak S, Naikawadi RP, Wolters PJ, Abate AR, et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat Immunol. 2019;20(2):163–72.

  16. Mabbott NA, Baillie JK, Brown H, Freeman TC, Hume DA. An expression atlas of human primary cells: inference of gene function from coexpression networks. BMC Genomics. 2013;14:632.

  17. Martens JH, Stunnenberg HG. BLUEPRINT: mapping human blood cell epigenomes. Haematologica. 2013;98(10):1487–9.

  18. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.

  19. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25–9.

  20. Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000;28(1):27–30.

  21. Yu G, Wang LG, Han Y, He QY. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012;16(5):284–7.

  22. Becht E, Giraldo NA, Lacroix L, Buttard B, Elarouci N, Petitprez F, Selves J, Laurent-Puig P, Sautès-Fridman C, Fridman WH, et al. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. 2016;17(1):218.

  23. Szklarczyk D, Morris JH, Cook H, Kuhn M, Wyder S, Simonovic M, Santos A, Doncheva NT, Roth A, Bork P, et al. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 2017;45(D1):D362–8.

  24. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13(11):2498–504.

  25. Rabinovitch M, Guignabert C, Humbert M, Nicolls MR. Inflammation and immunity in the pathogenesis of pulmonary arterial hypertension. Circ Res. 2014;115(1):165–75.

  26. Arends SJ, Damoiseaux JG, Duijvestijn AM, Debrus-Palmans L, Vroomen M, Boomars KA, Brunner-La Rocca HP, Reutelingsperger CP, Cohen Tervaert JW, van Paassen P. Immunoglobulin G anti-endothelial cell antibodies: inducers of endothelial cell apoptosis in pulmonary arterial hypertension? Clin Exp Immunol. 2013;174(3):433–40.

  27. Tamosiuniene R, Tian W, Dhillon G, Wang L, Sung YK, Gera L, Patterson AJ, Agrawal R, Rabinovitch M, Ambler K, et al. Regulatory T cells limit vascular endothelial injury and prevent pulmonary hypertension. Circ Res. 2011;109(8):867–79.

  28. Ormiston ML, Chang C, Long LL, Soon E, Jones D, Machado R, Treacy C, Toshner MR, Campbell K, Riding A, et al. Impaired natural killer cell phenotype and function in idiopathic and heritable pulmonary arterial hypertension. Circulation. 2012;126(9):1099–109.

  29. Vergadi E, Chang MS, Lee C, Liang OD, Liu X, Fernandez-Gonzalez A, Mitsialis SA, Kourembanas S. Early macrophage recruitment and alternative activation are critical for the later development of hypoxia-induced pulmonary hypertension. Circulation. 2011;123(18):1986–95.

  30. Thenappan T, Goel A, Marsboom G, Fang YH, Toth PT, Zhang HJ, Kajimoto H, Hong Z, Paul J, Wietholt C, et al. A central role for CD68(+) macrophages in hepatopulmonary syndrome. Reversal by macrophage depletion. Am J Respir Crit Care Med. 2011;183(8):1080–91.

  31. Bi X, Xu Y, Li T, Li X, Li W, Shao W, Wang K, Zhan G, Wu Z, Liu W, et al. RNA targets ribogenesis factor WDR43 to chromatin for transcription and pluripotency control. Mol Cell. 2019;75(1):102-116.e109.

  32. Schapira M, Tyers M, Torrent M, Arrowsmith CH. WD40 repeat domain proteins: a novel target class? Nat Rev Drug Discov. 2017;16(11):773–86.

  33. Fujimura A, Hayashi Y, Kato K, Kogure Y, Kameyama M, Shimamoto H, Daitoku H, Fukamizu A, Hirota T, Kimura K. Identification of a novel nucleolar protein complex required for mitotic chromosome segregation through centromeric accumulation of Aurora B. Nucleic Acids Res. 2020;48(12):6583–96.

  34. Di Y, Jing X, Hu K, Wen X, Ye L, Zhang X, Qin J, Ye J, Lin R, Wang Z, et al. The c-MYC-WDR43 signalling axis promotes chemoresistance and tumour growth in colorectal cancer by inhibiting p53 activity. Drug Resist Updat. 2023;66:100909.

  35. Sun H, Sun Q, Qiu X, Zhang G, Chen G, Li A, Dai J. WD repeat domain 43 promotes malignant progression of non-small cell lung cancer by regulating CDK2. Int J Biochem Cell Biol. 2022;151:106293.

  36. Li Z, Feng M, Zhang J, Wang X, Xu E, Wang C, Lin F, Yang Z, Yu H, Guan W, et al. WD40 repeat 43 mediates cell survival, proliferation, migration and invasion via vimentin in colorectal cancer. Cancer Cell Int. 2021;21(1):418.

  37. Qin Y, Qiao Y, Li L, Luo E, Wang D, Yao Y, Tang C, Yan G. The m(6)A methyltransferase METTL3 promotes hypoxic pulmonary arterial hypertension. Life Sci. 2021;274:119366.

  38. Essers PB, Pereboom TC, Goos YJ, Paridaen JT, Macinnes AW. A comparative study of nucleostemin family members in zebrafish reveals specific roles in ribosome biogenesis. Dev Biol. 2014;385(2):304–15.

  39. Matsuo Y, Granneman S, Thoms M, Manikas RG, Tollervey D, Hurt E. Coupled GTPase and remodelling ATPase activities form a checkpoint for ribosome export. Nature. 2014;505(7481):112–6.

  40. Iuchi S, Paulo JA. RNAmetasome network for macromolecule biogenesis in human cells. Commun Biol. 2021;4(1):1399.

  41. Nakamura K, Reid BM, Chen A, Chen Z, Goode EL, Permuth JB, Teer JK, Tyrer J, Yu X, Kanetsky PA, et al. Functional analysis of the 1p34.3 risk locus implicates GNL2 in high-grade serous ovarian cancer. Am J Hum Genet. 2022;109(1):116–35.

  42. Wang Z, Chen H, Peng L, He Y, Wei J, Zhang X. DNER and GNL2 are differentially m6A methylated in periodontitis in comparison with periodontal health revealed by m6A microarray of human gingival tissue and transcriptomic analysis. J Periodontal Res. 2023;58(3):529–43.

  43. Wang J, Niu Y, Luo L, Lu Z, Chen Q, Zhang S, Guo Q, Li L, Gou D. Decoding ceRNA regulatory network in the pulmonary artery of hypoxia-induced pulmonary hypertension (HPH) rat model. Cell Biosci. 2022;12(1):27.

  44. Kuksin M, Morel D, Aglave M, Danlos FX, Marabelle A, Zinovyev A, Gautheret D, Verlingue L. Applications of single-cell and bulk RNA sequencing in onco-immunology. Eur J Cancer. 2021;149:193–210.

  45. He M, Tao K, Xiang M, Sun J. Hpgd affects the progression of hypoxic pulmonary hypertension by regulating vascular remodeling. BMC Pulm Med. 2023;23(1):116.

Funding

This study was supported by grants from the National Natural Science Foundation of China (No. 82170433 and No. 81970237).

Author information

Contributions

Yuhan Qin and Gaoliang Yan designed the study. Yuhan Qin, Gaoliang Yan, Yong Qiao, and Dong Wang collected the data and performed the bioinformatics analysis. Yuhan Qin wrote the manuscript. Gaoliang Yan and Chengchun Tang reviewed and edited the manuscript. All authors read and approved the manuscript.

Corresponding authors

Correspondence to Gaoliang Yan or Chengchun Tang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Qin, Y., Yan, G., Qiao, Y. et al. Identification of hub genes based on integrated analysis of single-cell and microarray transcriptome in patients with pulmonary arterial hypertension. BMC Genomics 24, 788 (2023). https://doi.org/10.1186/s12864-023-09892-3

  • DOI: https://doi.org/10.1186/s12864-023-09892-3

Keywords