Detection of protein-protein interaction modules relevant to human neuron maturation
To comprehensively investigate changes of functional modules during the process of neuron maturation in humans, we adapted the module detection algorithm based on the topological overlap matrix (TOM) [6], from the widely used gene co-expression network analysis pipeline WGCNA [7], to the protein-protein interaction network annotated by Reactome [8, 9]. To include gene differential expression information, each edge in the network was weighted by the difference of expression level changes between linked genes, which were smoothed with the insulated heat diffusion procedure to reduce influence of noise (see Materials and Methods). Gene expression level changes during the neuron maturation process in humans were estimated based on the published single cell RNA-seq (scRNA-seq) data of fetal and adult human brains. [10].
The analysis resulted in 109 functional modules with sizes ranging from 21 to 203 genes, with a median size of 38 genes (Fig. 1a). As shown by the calculated adjusted random index (ARI) [11, 12], choice of insulating parameter did influence the identified modules, but the modular composition remained generally robust (Additional file 1: Figure S1). A two-sided Wilcoxon signed rank test was applied to each module in order to identify functional modules with significant expression level changes with concordant direction. Thirty-three functional modules with significant directional changes, which were referred to discriminating modules, were identified (Benjamini-Hochberg (BH) corrected P < 0.05, Additional file 2: Table S1). Among them, 17 modules accounting for 964 genes in the network showed higher activity in mature neurons (referred as mature-high modules). On the other hand, the remaining 16 modules accounting for 1125 genes showed higher activity in immature neurons (referred as immature-high modules). Among the 33 discriminating modules, 31 of them were significantly overlapped with the three co-expression modules identified by applying WGCNA to the 20 cell-pooling samples (Fig. 1b). Gene Ontology (GO) enrichment analysis by topGO [13] and GOSemSim [14] indicated that genes encoding for membrane proteins which participate in cell communication, signaling and oxidation-reduction processes for energy generation were strongly enriched in mature-high modules (Fig. 1b, Additional file 3: Table S2). On the other hand, genes encoding for nuclear proteins related to transcription and post-transcriptional processing including splicing and translation were enriched in immature-high modules (Fig. 1b, Additional file 3: Table S2). The neuron specificity index (NSI) for each of the 33 discriminating modules suggested that, genes in majority of those modules (22 out of 33, one-sided binomial test P = 0.04, Fig. 1b) presented a trend of higher gene expression levels in neurons compared to glial cells, which further implied the biological importance of those modules in neurons.
Although lacking additional data for in vivo transcriptome of human neurons across the whole neuron maturation process, it has been reported that neuron maturation explains the majority of brain transcriptome changes during prenatal and new-born postnatal development [15]. Therefore, we took the advantage of fetal and early postnatal brain RNA-seq dataset in BrainSpan and another age series RNA-seq data [16], to compare the brain transcriptome before and after postnatal day 100. Remarkably, 28 out of the 33 discriminating modules showed significant concordant expression level changes (one-sided Wilcoxon signed rank test to fold changes (FC), BH-corrected P < 0.1) in at least one dataset, while 20 of them showed significant concordance in both datasets (Fig. 1). In addition, although not significant, another two modules showed consistent direction of changes in both datasets. These results suggest that the discriminating modules represent the reproducible functional modules discriminating mature and immature neurons.
Adversarial functional module pairs
Interestingly, a further comparison with PPI functional modules, which were detected without integrating with expression level differences, identified six adversarial functional module (AFM) pairs. The two modules in one AFM pair were corresponding to the same module when the differential expression information was not integrated (Fig. 1). Three out of six AFM pairs (M4-M14, M36-M108, M8-M72) showed significant expression changes with consistent directions in at least one bulk brain RNA-seq dataset (one-sided Wilcoxon signed rank test, BH corrected P < 0.01). In addition, consistent discordance in all pairs were observed in both bulk brain datasets (one-sided Wilcoxon rank sum test, P < 0.01). Further functional analysis revealed highly consistent, connected but varied GO term and biological pathway enrichment in each pair of adversarial modules (topGO with the parentchild algorithm for GO terms, one-sided Fisher’s exact test for pathways; BH-corrected P < 0.05, Additional file 3: Table S2). This analysis indicated that highly connected biological pathways may play distinct roles during neuron maturation in humans. They may reflect decoupling of components in the same pathway during the neuron maturation process.
A good example is the AFM pair M4-M14 (Fig. 2). Genes in both modules participate in signaling by Rho GTPases, and more specifically, by activating the Rhotekin and Rhophilins pathway according to the Reactome annotation. Interestingly, this pathway splits into two parts: RHOB/C and RTKN in mature-high M4, and RHOA, RHPN1/2 and TAX1BP3 in immature-high M14. This partition implies that, although Rhotekin and Rhophilins both participate in Rho GTPases signaling, they interact with different members of the Rho protein family and play different roles in the process of neuron maturation. Rhophilins interact with RhoA and take part in neuron maturation including neuron migration, which is supported by previous studies suggesting interaction between them [17] as well as the role of Rhophilins in cell migration [18]. Rhetekin, on the other hand, while being important in neural differentiation and neurite outgrowth, is also required for neuron survival [19]. This may explain why the expression level of RTKN gene remains high in mature neurons.
Another AFM pair, M32-M39, represents another scenario. While both modules show significant enrichment of pathways related to endocytosis, genes in the two modules also participate in distinct pathways (Fig. 2). Spry regulation of FGF signaling pathway, which has been reported to be required for cortical development [20], only appears in the immature-high module M32, whereas EPH-ephrin mediated cell repulsion, whose role extends from development to adulthood regulating neuronal plasticity [21], only appears in the mature-high module M39. In summary, the pleiotropy of genes and pathways leads to the separation of the two modules.
Identified functional modules discriminated different maturity states of neurons from in vitro models
To further estimate how well the neuron-maturation-related transcriptome transitions we identified, especially genes participating in the detected discriminating modules, reflect status of neuron maturation, we establish a machine-learning-based quantitative estimate of neuronal maturity state and tried to apply it to other data sets.
In brief, we constructed a LASSO-regularized logistic regression model based on the standardized expression level of genes involved in each identified module. Each model provided a value ranging between zero and one, namely a modular Neuron Maturity Index (mNMI), with values closer to 1 indicating higher maturity. Ten-fold cross-validation suggested high performances for most of the mNMIs (median AUC = 0.87, Additional file 4: Figure S2). Applying the models in the test set also resulted in accurate estimations (median AUC = 0.84, Additional file 4: Figure S2), with those based on discriminating modules performing marginally better (two-sided Wilcoxon rank sum test, P = 0.11). The mNMIs were further added to two integrated NMIs to represent the overall maturity state, by taking their averages weighted by their performances. This procedure was done by either including all mNMIs (transcriptome NMI or tNMI), or only those based on discriminating modules (discriminating NMI or dNMI). Both general NMIs performed perfectly in distinguishing mature and immature neurons in the test set (AUC = 1, Additional file 4: Figure S2).
With the NMI models constructed, we applied them to neuron transcriptome data sets of in vitro neuron models in order to check whether the identified transcriptome transition could be reproduced and therefore represent the general molecular signature of neuron maturation. In a previous study, Bardy et al. combined patch clamping and scRNA-seq to investigate the relationship between transcriptome and electrophysiology of iPSC-derived neurons [22]. The estimation of NMIs indicated trend of increased neuron maturity accompanying increased action potential, i.e. the electrophysiological maturity, especially between the most immature and mature neurons (one-sided Wilcoxon rank sum test, P = 0.12 for dNMI, P = 0.02 for tNMI, Fig. 3 and Additional file 5: Figure S3).
While this dataset was limited by its relatively small number of neurons (N = 56), Close et al. applied scRNA-seq to interneurons generated by in vitro differentiation of human embryonic stem cells (hESCs) to characterize temporal interneuron transcriptome during its maturation, generating another dataset which involved 1733 cells [23]. By estimating NMIs for each DCX+ interneuron (N = 993), we observed the significant increase of integrated NMIs across the time course, especially between 54-day and 100-day (Wilcoxon rank sum test, P < 0.0001, Fig. 3 and Additional file 5: Figure S3). We also noticed that both tNMI and dNMI did not present significant increase between 100-day and 125-day interneurons (Wilcoxon rank sum test, P = 0.26 for tNMI, P = 0.58 for dNMI), which is consistent with the weak discrimination between them at the whole transcriptome level proposed by Close et al.
It is worth noting that even at the most electrophysiologically mature state (Bardy et al. dataset) or at the latest time point (Close et al. dataset), a large proportion of interneurons were still in immature state (Fig. 3 and Additional file 5: Figure S3). These observations may be due to the technical issue that the NMI model failed to provide prediction of mature neurons, or reflected the failure to complete the neuron maturation process in vitro. To answer this question, we examined the human single neuronal nucleus RNA-seq in adult brains [24], resulting in both tNMI and dNMI values significantly larger than 0.5 (Fig. 3 and Additional file 5: Figure S3). As expected, no significant difference of both tNMI and dNMI was observed between excitatory and inhibitory neurons (Wilcoxon rank sum test, P = 0.22 for tNMI, P = 0.27for dNMI, Fig. 3 and Additional file 5: Figure S3). Hierarchical clustering based on Pearson’s correlation coefficient among samples revealed that cell type makes more contributions to sample separation than source of dataset, showing that the estimation is less likely to be biased by batch effect (Additional file 6: Figure S4). The above results suggested the potential maturation arrest of the in vitro differentiated neurons.
Transcriptome transition during maturation is conserved in mouse neurons
To check whether the detected transcriptome transition during neuron maturation was conserved in mice, the most widely used animal model for brain development and mental disorders, we applied the constructed human-based NMI model to neuron transcriptome data in mice. Chen et al. extracted maturing interneurons from mouse embryonic medial ganglionic eminence (MGE) and applied scRNA-seq to measure their transcriptome [25]. Estimation of dNMI suggested a boost of maturity state at E17.5, the latest time point across the time course. This result suggested that the human-based NMI models successfully recurred the neuron maturation process in mouse, implying the conserved maturation programs of neuron between humans and mice. Interestingly, the three subtypes of maturing interneurons identified in the study showed significantly different dNMIs (ANOVA, df1 = 2, df2 = 130, F = 55.2, P < 0.0001, Fig. 4a), suggesting that they represented interneurons at distinct stages of maturation.
Activities of mature-high modules reflect mature neuron functionality level
Interestingly, applying the dNMI model to the purified neuron transcriptome of PS2APP Alzheimer’s disease mouse model [26] suggested a significantly weaker maturity state than controls (median dNMIPS2APP = 0.782, median dNMIcontrol = 0.791, two-sided Wilcoxon rank sum test, P = 0.003). Further studies on each of the mNMIs indicated that three mNMIs, all of which were based on mature-high modules, significantly decreased in PS2APP neurons (Wilcoxon’s rank sum test, BH corrected P < 0.1). In addition, among the top-ten of the 27 discriminating modules with reliable mNMIs (AUC > 0.8 in cross-validation in training set) and strongest decrease in PS2APP comparing to control neurons, eight were mature-high modules (Fisher’s exact test, odds ratio (OR) = 4.25, P = 0.1). The bias of changes to the mature-high modules was different from observation from the above MGE interneurons data set, as only nine out of 15 (60%) modules with mNMIs significantly different among the three subtypes of maturing interneurons were mature-high modules (Fisher’s exact test, OR = 1.83, P = 0.49).
Considering that the mature-high modules are more likely to be responsible for mature neuronal function maintenance, the biased changes implied that the lower tNMI of PS2APP neurons represented impairment of neuronal function rather than maturation, which has been reported previously [27]. Therefore, we constructed the third integrated index, the neuron functionality index (NFI), which integrated mNMIs from only the mature-high discriminating modules. As expected, the estimated NFIs of PS2APP neurons were significantly lower than those of control neurons (median NFIPS2APP = 0.836, median NFIcontrol = 0.850, Wilcoxon rank sum test, P = 0.05, Fig. 4b). On the other hand, the integrated NMIs of immature-high discriminating modules did not show any significant difference (Wilcoxon rank sum test, P = 0.58, Fig. 4b). For comparison, no significant difference of either dNMI or NFI was observed between purified neuron transcriptome of a lipopolysaccharide-treated neuroinflammation mouse model and control mouse (Fig. 4b). These results indicated that the activities of mature-high, but not the immature-high, modules may act as signatures of neuron functionality.