- Research article
- Open Access
When pitch adds to volume: coregulation of transcript diversity predicts gene function
BMC Genomics volume 19, Article number: 926 (2018)
Genes corregulate their overall transcript volumes to perform their physiological functions. However, it is unknown if they additionally coregulate their transcript diversities. We studied the reliability, consistency and functional associations of co-splicing correlations of genes of interest, across two independent studies, multiple tissues and two statistical methods. We thoroughly investigated the reproducibility of co-splicing correlations of APP, the candidate gene of Azheimer’s disease (AD). We then studied how co-splicing correlations in different tissues contributed to predict functional interactions of three other genes and finally computed co-splicing frequency for 17 thousand genes across 52 human tissues.
We replicated co-splicing correlations between APP and 5 AD-related genes and reproduced expected enrichment of APP co-splicing in synaptic vesicle cycle and proteosome pathways. We observed novel associations for tissue vulnerability to disease with enrichment in APP co-splicing, co-expression and epistasis in AD. APP co-splicing was the strongest predictor and replicated between studies. We confirmed known gene interactions of PRPF8 and GRIA1 in testis and brain cortex, and observed a novel interaction of FGFR2, in breast and prostate, modulated by cancer risk-variants. We produced a co-splicing map across 52 human tissues to help predict the function of over 17 thousand genes.
We show that coregulation of transcript diversities provides novel biological insights in gene physiology and helps to interpret GWAS results. Co-splicing correlations are reliable and frequent and should be further pursued to help predict gene function. Our results additionally support current AD interventions aiming at the ubiquitin proteosome pathway but unveil the need to consider transcript diversity in addition to volume to assess treatment response and susceptibility to the disease.
The correlation between gene expression levels has been a prominent tool to study the emergence and conservation of biological functions . However, it is now clear that most genes support a diverse repertoire of transcript isoforms , produced by alternative transcription initiation, alternative polyadenylation and alternative splicing . Consequently, gene expression can change not only in volume but also in quality when some isoform groups are more frequently produced than others. As coregulation of gene expression contributes to orchestrate the function of genes, it is then conceivable that genes may also tune in their transcript distributions in functions that involve selected isoforms.
The transcript diversity of genes is strongly dependent on the tissue , suggesting ample physiological consequences of its regulation . Some insights have been made from functional associations of gene coregulation networks. Recently, Iancu and colleagues described co-splicing correlations across species as a measure of the coregulation of the transcript diversities between gene pairs . They observed that network hubs strongly represented neurobiological functional pathways. More recently, Saha et al. demonstrated strong presence of splicing regulatory hubs from co-splicing networks across numerous tissues, using Genotype Tissue-Expression (GTEx) data . To gain further understanding into the biological consequences of the regulation of transcript diversity, we ask whether significant coregulation among genes is consistent with their physiological functions. Therefore, unlike previous studies that aimed to identify structural properties of co-splicing networks, here, we considered whether the co-splicing correlations of a gene of interest play a relevant role in the gene’s biological function. In this context, we attempted to assess how reliable, widespread and informative on gene function co-splicing correlations are.
We therefore aim to first study the reproducibility and biological significance of co-splicing correlations of a candidate gene for which there is evidence that splicing ratios play an important role. APP is a prominent candidate gene for Alzheimer’s disease (AD) . The gene supports numerous isoforms  and research indicates that its spliceform ratios are different between AD patients and controls, triggering the amyloid cascade and apoptosis or being involved in aging hippocampi [10,11,12,13]. In addition to the hypothesized toxicity of APP’s amyloid-β fragment, disruptions of the gene’s function may also be important contributors to the disease . Increasing our understanding of APP’s physiological regulation is, therefore, essential. Validating between independent studies, methods and brain regions, we therefore aimed to determine the robustness of APP co-splicing correlations with genes and pathways that contribute to the gene’s physiological function and role in AD. We therefore analyzed two independent studies (GTEx and BRAINEAC) [15, 16], two co-splicing methods and ten different brain regions to fully characterize the transcriptome-wide co-splicing of APP. To assess the further biological insight given by the regulation of transcription diversity, we studied the relative contribution of APP co-splicing, co-expression and genome-wide epistasis (defined in Methods) to a transcriptomic signature of tissue vulnerability for the disease .
Since, to our knowledge, this is the first effort to try to infer a gene’s function through transcriptome-wide coregulation of transcript diversity, we then studied whether transcriptome-wide co-splicing correlations were also relevant to predict the function of other genes in different tissues. We therefore hypothesized that transcriptome-wide co-splicing of genes of interest in relevant tissues predicts expected protein-protein interactions and informs on plausible new interactions. We thus studied PRPF8, GRIA1 and FGFR2, which are involved in the regulation of splicing in testis, glutamatergic neurotransmission in brain cortex and susceptibility to breast cancer, respectively [18,19,20,21]. Finally, we assessed the frequency of co-splicing correlations across 17 thousand genes and 52 human tissues and made the results available at coSplicing4GTEx web application as novel gene function predictor .
Methods to compute co-splicing correlations
Gene expression levels are intensively used to determine groups of genes that by coregulating their transcription volumes perform biological functions. However, it is conceivable that genes also coregulate their transcription diversities to determine which transcript isoforms should be involved in a particular function. The coregulation of transcript diversity has been recently estimated from co-splicing correlations [6, 7]. From RNA-seq data, co-splicing correlations across subjects are derived from the exon count distributions of genes [6, 7]. As defined by Iancu and colleagues, the co-splicing between two genes can be computed from the Mantel’s correlation between the distance matrices of exon count distributions across subjects  (see Methods). We propose to visualize these co-splicing correlations back in terms of exon count data as shown in Fig. 1. The figure illustrates two genes in high co-splicing, APP and UBQLN1, and low co-splicing, APP and SSC5D, derived from the exon count data (RNA-seq) of hippocampus from the GTEx project, version 6. High co-splicing between two genes, as measured by Mantel’s correlation, results in a coherent ranking of individuals across numerous exons between genes. We thus observe that the subject ranking of count frequencies at APP’s exon 1 is coherent with the rankings across numerous exons of APP and UBQLN1 but not across the exons of SSC5D (compare Fig. 1a with b and with c). To gain further insight into the coregulation of transcription diversity, we additionally propose an alternative measure of co-splicing based on the first principal component (PC) of the exon count distributions across individuals. Interpreting the first PC as providing main differences between transcript mixtures between individuals for a given gene, co-splicing can then be measured as the correlation between the first PCs of two genes. PC-based co-splicing is computed by the partial correlation adjusting for covariates. We observed, in the example of Fig. 1, that the partial correlation between the first PCs of genes is high between APP and UBQLN1 and low between APP and SSC5D (Fig. 1d and e). The PC based method was used for validation of Mantel’s correlations and to test interactions with genomic variants.
Co-splicing complements co-expression
We aimed to characterize the co-splicing of APP from its relationship to co-expression. We first studied the association between transcriptome-wide co-expression and co-splicing of APP with 17,368 genes in the hippocampus data of GTEx (N = 94), one of the main regions affected in AD. We computed the transcriptome-wide co-expression of APP using a partial Pearson’s correlations of kilobase per million mapped reads (RPKM), adjusting by tissue-specific covariates. We estimated transcriptome-wide co-splicing from Mantel’s correlations, also adjusting by covariates. We noticed high correlation between APP co-expression and co-splicing (Pearson’s r = 0.51, P < 1 × 10− 16, Fig. 2a). Interestingly, genes in high co-splicing with APP appear as a clear subset of those in high co-expression, indicating that coregulation of transcript diversity is likely accompanied by coregulation of transcription volume; as some isoforms become more abundant, they increase the overall transcription of genes. Whereas, changes in trascription levels do not necessarily imply changes in splicing ratios.
Reliability of co-splicing correlations between different methods, studies and tissues
We studied the replication of co-splicing of APP with different inference methods and its validation across two independent studies and numerous brain regions. Using exon-array data of hippocampus from the BRAINEAC project (N = 130), we investigated the reproducibility of transcriptome-wide co-splicing of APP between studies and inference methods. We computed, in both studies, the co-splicing correlations of APP with 12,577 commonly-annotated genes, using the Mantel and PC methods, to determine the level of agreement between analyses and studies. We found high agreement with a Pearson’s correlation of 0.71 and 0.52 between methods, in GTEx and BRAINEAC, respectively (Additional file 1: Figures S1-S2). We also calculated the consistency of the transcriptome-wide co-splicing of APP between studies (Fig. 2b). While the correlation between studies was moderate (0.36), it was highly significant (P < 1 × 10− 16), demonstrating that a high degree of APP co-splicing across the genome was reproduced between studies. This is a relevant result, as the differences between studies are numerous: particularly, microarray’s exon expression distributions are reconstructed from subject ranking (see Methods); and more generally, they differ in samples, experiments (RNAseq and microarray), data reduction algorithms and co-splicing inference.
We then studied the co-splicing of APP across multiple brain regions, where deposition of amyloid plaques begins . We considered six different brain regions from GTEx and four form BRAINEAC (see Methods), and searched for the genes that were significantly co-spliced with APP across methods, brain regions and studies. We observed, in particular, that the PC-based co-splicing method in hippocampus had more statistical power than Mantel’s correlation; P-values for Mantel’s correlations were computed from the normal distribution of z-transformed correlations, further standardized and cube-root transformed to account for right skewness (Additional file 1: Figures S3-S4). Genes with significant co-splicing corrected for multiple comparisons were identified from the PC-based method and validation between studies was based on Mantel’s correlations, the reference method. As a result, a total of 11 genes with reproducible co-splicing across methods, regions and studies were identified, see Table 1. Five genes, UBQLN1, APLP2, ATP2A2, ATP6AP2 and DNER, have been previously linked to AD, suggesting probable disruptions of APP’s physiological regulation during disease. UBQLN1 links proteosomes and ubiquitin ligases for protein degradation and its variants have been associated to AD , APLP2 has been involved in glutamatergic transmission and synaptogenesis in association with APP [25, 26], ATP2A2 is a SERCA Ca(2+) ATPase, involved in calcium homeostasis in the endoplasmatic reticulum, and regulates amyloid-β production , ATP6AP2 is a constituent of the renin-angiotensin system whose down-regulation has been associated to AD  and DNER promotes glia differentiation activating gamma secretase signaling .
Modulation of co-splicing by a SNP
Notably, we found reproducible APP co-splicing with UBQLN1, which in physiological state is a chaperone of APP that contributes to its synthesis and processing . UBQLN1’s alternative splicing and common variation has been associated with increased susceptibility to AD [24, 31,32,33], not without controversy [34, 35]. Since the polymorphism UBQ-8i (rs12344615) in UBQLN1 has been associated with increased risk of late-onset Alzheimer’s disease (LOAD) and differences in UBQLN1 exon 8 splicing, we asked whether APP/UBQLN1 co-splicing was modulated by UBQ-8i. We found that, in the frontal cortex, the interaction between the first PC of APP’s exon count distribution and the SNP was significantly associated with the first PC of UBQLN1 (β = − 0.08, P = 0.035, Additional file 1: Figure S5). We adjusted for 6 genome-wide surrogate covariates and sex. While the interaction was not significant in other tissues tested, they all showed a consistent direction of the estimate, suggesting a link between UBQ-8i and APP splicing ratios.
Contribution of APP’s co-splicing, co-expression and epistasis in tissue vulnerability to AD
We then studied whether transcriptome-wide co-splicing of APP lead to significant enrichment of biochemical pathways, exploring further the physiological functions of APP at pathway level through co-splicing correlations. We computed the pathway enrichment of APP co-splicing on 277 KEGG pathways, given by genes with APP co-splicing P-value > 0.05. The overall agreement between the GTEx and BRAINAC studies to declare a pathway significantly enriched in APP co-splicing was given by a Cohen’s κ of 0.37 (P = 1.71 × 10− 10), in line with the observed reproducibility at gene level. After correcting for multiple comparisons, that included number of pathways and brain regions, we observed a total of six significant pathways validated between studies (Table 2). Consistent with known APP’s physiological functions, Synaptic vesicle cycle showed the most significant combined P-value between experiments  (P = 8.67 × 10− 7), followed by Protein processing in endoplasmic reticulum  (P = 2.10 × 10− 5), a pathway in which UBQLN1 is involved (Additional file 2: Table S1). Also relevant to AD, Ubiquitin mediated proteolysis was reproducible between studies, as it was long term potentiation. Consistent with co-splicing being a factor influencing co-expression, we confirmed that all pathways with reproducible enrichment in co-splicing were also enriched in co-expression and that many more pathways were enriched in co-expression than co-splicing (Additional file 2: Table S1).
To determine a putative impact of a deregulation of transcript diversity in disease, we correlated the pathway’s enrichment in APP co-splicing with a recent transcriptional signature of tissue vulnerability to AD  The signature is based on expression differences between Braak regions I to III and unaffected brain regions, and it is enriched for pathways that co-aggregate with amyloid-β and tau protein, in plaques and tangles . As a measure between normal APP functioning and genetic susceptibility to disease derived from inadequate protein degradation, we tested whether the pathway vulnerability score correlated with enrichment in APP co-splicing (Additional file 2: Table S1). We observed that the tissue vulnerability score of pathways strongly correlated with their enrichment in APP co-splicing, both in GTEx (β = 0.18, P = 8.6 × 10− 9) and BRAINEAC (β = 0.10, P = 0.018, Fig. 3a and b). We also found that enrichment in co-expression correlated with the vulnerability score weaker than co-splicing and was not validated between studies (β = 0.08, P = 1.1 × 10− 4 in GTEx, β = 0.05, P = 0.13 in BRAINEAC). This results shows that the specificity gained by enrichment in co-splicing improved the associations between the physiological function of APP and the vulnerability score.
We applied an enrichment analysis of epistatic effects as described in Caceres et al. , with the objective of finding significant genomic associations of co-splicing correlations at a pathway level. Specifically, we asked which pathways were enriched in interactions with multiple SNPs in APP, and if the pathways correlated with those enriched in APP co-splicing. This analysis is well powered (80%) to detect interactions with SNPs having odds ratios greater than 1.28 in studies with more than 1000 subjects, and false positive rate is kept under control when adjusting for multiple comparisons for the number of pathways tested (Additional file 1: Figures S6-S11). We analyzed three LOAD GWAS: ADG (2686/935 cases/controls), NIA (587/289) and GENADA (806/782). We pruned SNPs within APP keeping those that were in pair-wise linkage equilibrium at R2 < 0.2 within studies. We performed 27 genome-wide interaction analyses (GWIAs) for the 27 SNPs in APP, covered in the three studies (Additional file 1: Tables S2-S4). Each GWIA tested the interactions of one the SNPs in APP with all the SNPs genotyped across the genome. For each GWIA, we then performed enrichment analysis on 277 KEGG pathways. A pathway’s multiple enrichment in APP epistasis was defined as the total number of GWIAs in which the pathway was found significantly enriched (Additional file 3: Table S5). We did not find statistical evidence for the association between multiple enrichment in APP epistasis and enrichment in APP co-splicing (β = 0.18, P = 0.36 in GTEx, β = 0.13, P = 0.62 in BRAINEAC). However, we observed a strong correlation between multiple enrichment in APP epistasis and the tissue vulnerability score (robust regression, β = 0.033, P = 3.96 × 10− 5, R2 = 0.045, Fig. 3c), confirming a strong link between the vulnerability score and the genetic variability of APP. We fitted, in GTEx, a regression model for the vulnerability score that included enrichment of APP co-splicing, co-expression and epistasis, and observed that all the factors contributed significantly to the model (F(3,208) = 19.56, P = 3.2 × 10− 11, R2 = 0.22), being co-splicing the strongest contributor of all (β = 0.15, P = 2.7 × 10− 7).
Co-splicing correlations predict gene function of PRPF8 and GRIA1 in testis and brain
We studied two additional genes of interest for which alternative splicing have shown to be relevant for the genes’ function. We first considered PRPF8. Prp8p is a component of the U5 assembly within the catalytic center of the spliciosome . As testis is a tissue with high levels of splicing events, we looked at the genes that co-spliced the strongest with PRPF8 in this tissue. Remarkably, in the top 20 genes with highest co-splicing, we observed SNRNP200 (small nuclear ribonuclear protein U5 subunit 200), SF3B3 (splicing factor 3B subunit 3), PRPF6 (Pre-mRNA-processing factor 6) and SF3B2 (splicing factor 3B subunit 2) with Mantel’s correlations of 0.9, 0.87, 0.86 and 0.85, respectively. Enrichment analysis of KEGG pathways confirmed spliceosome as the most significant pathway (adjusted-P = 1.24 × 10− 4). These findings suggest that the complex cascade of spliceosome interactions may be a physiological feature of alternative splicing beyond developmental stages  and are in line with the enrichment in splicing genes for the transcriptome-wide networks that combine expression and relative isoform levels .
We further looked at the transcriptome-wide co-splicing of GRIA1 (Glutamate Receptor 1) in the brain cortex. Different isoforms of GRIA1–4 combine in various tetramers to produce different versions of AMPA receptors for glutamate, each with a specific function . We found that the most co-spliced gene in the genome with GRIA1 was GRIA3 (Mantel’s correlation 0.85). Interestingly NSF, which binds to AMPA receptor GluR2 subunit (GRIA2), was the second highest in co-splicing with GRIA1 (0.84). From the top 20 genes in co-splicing with GRIA1, 4 were part of synaptic vesicle cycle (adjusted-P = 5.85 × 10− 4) and 5 form the cAMP signaling pathway (adjusted-P = 1.42 × 10− 3). We therefore confirmed expected interactions between genes in a physiological function that is regulated by the alternative splicing of its factors. We further computed the genome-wide co-expression network of GRIA1 to determine whether the top genes that co-spliced with it also topped in co-expression. We observed a similar but more dramatic pattern than that for APP (Additional file 1: Figure S12). That is, many more genes highly co-expressed with GRIA1 than those that highly co-spliced with it. We also observed that no significant pathways were found for the top 20 co-expressed genes. In particular, we found that the co-expression of GRIA1 and GRIA3 was much lower ranked (1347 highest correlation with value 0.80) than their co-splicing, suggesting that expression levels are not enough to drive the coregualtion between the genes. This type of specific observations indicate a tight regulation between the genes’ isoforms that can be further investigated.
FGFR2 /CASC4 co-splicing is modulated by cancer risk SNPs in breast and prostate
We also asked whether co-splicing correlations could add information to validated transcriptome-wide association studies (GWAS) results. In breast data, we looked at the transcriptome-wide co-splicing correlations of FGFR2, whose SNP rs2981582 has been associated with breast cancer  and for which numerous splicing forms have been found specifically expressed in breast cancer cells lines . Although correlations were smaller than in previous cases, we found that the top 20 genes that co-spliced with FGFR2 significantly enriched the pathway Central carbon metabolism in cancer (adjusted-P = 1.57 × 10− 2). We then considered CASC4, other gene whose alternative splicing has been linked to breast cancer . We tested whether the co-splicing between the genes was modulated by rs2981582. We used PC derived co-splicing, to test the association of the interaction between the first PC of FGFR2 and rs2981582 with the first PC of CASC4, adjusting for covariates (interaction-β = 0.10, P = 0.017). Interestingly, we replicated the interaction in prostate (interaction-β = 0.09, P = 0.027), and also found a significant interaction with rs10749415 (interaction-β = − 0.14, P = 0.038), a SNP in FGFR2 recently associated with prostate-specific antigen levels . While needing to be confirmed, co-splicing correlations suggest an underlying coregulation of the alternative splicing of FGFR2 and CASC4, modulated by risk SNPs for cancer.
Co-splicing is ubiquitous across tissues
We finally analyzed the exon count data from GTEx corresponding to 52 human tissues. For each tissue, we removed genes for which a single exon accounted for 90% of the total gene counts, leaving a total of 17,368. With the aid of a supercomputing facility from the Spanish network for supercomputing (https://www.res.es/), we computed the Mantel co-splicing correlations of all gene pairs across all tissues (coSplicing4GTEx - http://cosplicing.isglobal.org/). We found that co-splicing is ubiquitous across tissues, as numerous gene pairs were observed with high co-splicing with respect to the co-splicing distribution of all possible gene pairs per tissue (Fig. 4a). We computed the fraction of outlier pairs, given by the number of pairs with z-transformed correlations higher than 2.5 times the interquartile range, for each tissue (Fig. 4b). We observed that testis was the tissue with most co-splicing pairs, consistent with previous studies showing this tissue with high rates of alternative splicing events .
We showed that coregulation of transcript diversity among genes is a novel predictor of gene function. To support this, we extracted reliable and robust correlates of the transcript diversity of APP and we illustrated with three other examples how coregulation of transcript diversity can recover established gene interactions, help to interpret GWAS results and provide new biological insights on their function.
We particularly demonstrated that coregulation of transcript diversity among genes provides new important observations of APP’s physiological function and its links to AD. We produced a series of robust and coherent results indicating physiological links between APP’s transcript distribution and the proteosomal degradation system, which may be affected during disease. The results are in line with previous knowledge on APP’s associations with proteosomal degradation and synaptic signaling , adding the important contribution of APP’s transcript variety to health and disease, besides transcript volume . Our observations particularly support a role played by the regulation of transcript diversity in shaping the gene’s function. In particular, we observed a physiological modulation of the coregulation between APP and UBQLN1 given by UBQ-8i. This result suggests that inconsistencies in the risk of UBQ-8i associated to AD could be derived from the non-accounted disruption of the coregulation between the genes. In line with the association between the physiological coregulation of APP’s transcription diversity and proteosomal degradation pathways, we also found evidence for the association between APP co-splicing and protein aggregation [17, 36, 44]. In particular, we observed that the association of tissue vulnerability to AD was stronger for APP’s co-splicing than co-expression. Our result may help to explain why no strong evidence of APP’s function is derived from co-expressed genes, as provided by gene function predictors such as GeneMania  or STRING , in addition to recent transcriptome meta-analyses that have not been able to confirm up-regulation of APP in cases with respect to controls [44, 47]. We observe that specific coregulation of transcript diversity explains better the links between the physiological function of APP and tissue-vulnerability to disease. Our analyses in particular support new calls for interventions targeting the ubiquitin proteosomal system  and underscores the need to study its transcript diversity regulation in association to AD.
We used co-splicing correlations to study the coregulation of transcript diversities among genes. Our estimates are based on exon count distribution data (RNA-seq) which is an indirect measure of isoform mixture . New methods to estimate co-splicing correlations methods should be considered for full-length RNA-sequencing, as it becomes more available. In addition, co-splicing correlations include transcript diversity that could also be derived from alternative transcription at different start and end sites or splicing. The contributions of each of these processes to the correlations need to be further explored because mayor differences in transcript diversity across tissues are given by alternative transcription and not splicing . However, regardless their origin, we show that the coregulation of transcript diversity predicts expected gene correlates. In relation to transcription volume, we observed that changes in transcription quality is likely accompanied with increases in overall gene expression (Fig. 2a), as increments in isoform subgroups are likely to increase the overall expression. Increments in only transcription volume without changes in diversity are also common.
Co-splicing correlations predict functional correlates of genes. To support this, we extracted reliable and robust correlates of the transcript diversity of APP and we illustrated with three other examples how co-splicing correlations can recover established gene interactions, help to interpret GWAS results and provide new biological insights on their function. We also produced a comprehensive and accessible map of co-splicing between genes across numerous human tissues (http://cosplicing.isglobal.org/), showing that the specific co-regulation of genes at isoform level is ubiquitous and can be an important contributor to overall co-expression. Numerous gene function predictors, such as STRING , GeneMania  or FunCoup , amongst many others, currently use multiple sources of evidence that include co-expression but are yet to consider co-splicing. Co-splicing correlations of genes of interest are an important additional source of evidence to predict their function.
Co-splicing correlations between two genes
The fact that individual exons from different genes can have correlation expression levels, even when there is no correlation between overall expression levels, is due to the coregulation of splicing [6, 7]. Co-splicing networks have been derived in two different ways, each based on a different method to measure the pair-wise coregulation of splicing, see Fig. 5. For instance, Saha et al. have recently used the correlation between individual isoform levels . Here, exon sequencing data is first used to infer the isoform abundance in each gene and then subject correlations between the isoforms across genes are computed. A co-splicing network can then be built form the correlation network based on isform correlations. While this method informs on specific splicing events, it requires isoform inference and numerous correlations to be tested. Using RNA-sequencing (RNA-seq) data, the co-splicing between two genes can also be estimated from the correlation between the exon count distributions of the genes, as proposed by Iancu et al. . This approach is based on distance matrix correlations where a single measure is obtained for each gene-pair. No intermediate isoform inference is required, reducing data processing and likelihood of introducing modeling error. Regarded as a multivariate isoform method, greater detection power can be expected at the expense of interpreting correlations at isoform level. Despite this, the method is adequate for our objective to identify the genes that coregulate their splicing with a gene of interest, and from this perspective, inform on its possible physiological functions.
For matrix based cosplicing cosplicing correlations, the exon count distributions of the genes, per subject, are obtained from the relative frequencies at which the exons of a gene are mapped by reads. The exon count distribution of a gene a for subject i is therefore defined as the normalized vector
of dimension equal to the number of exons in a. ai is vector with Euclidean norm that encodes the counts for each exon of a for individual i. dThe distance between subjects i and j, given by their respective exon count distributions of a, is computed by the inner product
The matrix da measures the differences in exon count distributions among individuals, which correlates with the distance matrix of another gene b when the genes coregulate their transcript distributions, i.e. their transcript diversities. The overall co-splicing between genes a and b is computed from the Mantel’s correlation between their distance matrices Mantel(da, db) . The correlation can be adjusted for transcriptome-wide batch effects using a partial Mantel’s correlation, where the correction matrix is the distance between subjects given by their exon count distributions of the entire genome.
We propose to measure the co-splicing between two genes from the correlation between the first principal components (PC) of the exon count distributions of each gene across subjects. A PC is performed on the exon count distribution of a gene a across subjects. We assume that first PC captures the variability in the isoform transcript diversity among subjects. As such, we correlate the first PCs obtained for two genes a and b using a partial correlation that can be adjusted by covariates. The PC-based co-splicing correlation was used to validate between methods significant co-splicing identified by Mantel’s correlation. In addition the method allows the testing of interactions with genomic variants.
We downloaded version-6 data from the GTEx project web-site . RNA-seq count data was obtained for 52 different tissues. Pair-ended RNA-seq was performed with Illumina HiSeq 2000 following the TrueSeq RNA protocol. See The GTEx Consortium (2017) and Saha et al. for further details . We were given access to download GTEx genotypes from dbGAP  with accession number phs000424.v6.p1. Approximately 1.9 million SNPs were genotyped using whole blood samples with Illumina HumanOmni 2.5 M and 5 M BeadChips. Tissue specific covariates that included transcriptome-wide PC, inferred batch effects and sex were also downloaded from the GTEx web-site.
We also downloaded the brain expression data of the BRAINEAC project  that consists on transcriptomic data of about hundred healthy individuals across 9 different brain tissues. We obtained exon expression data corresponding to Affymetrix Human Exon 1.0 ST array. Gene expression data, from winsorized values, was also downloaded. All data had been previously normalized and corrected for batch effects. See the BRAINEAC consortium for further details.
Reproducibility of transcriptome-wide co-splicing of APP between two independent studies
We studied the co-splicing of APP across multiple brain regions, where deposition of amyloid plaques begins. We selected brain regions within the neocortex and the limbic system, which are early affected by amyloid-β deposition, Thal stages 1–3 (23). From the GTEx project, we analyzed data corresponding to the hippocampus (N = 94), amygdala (N = 72) and hypothalamus (N = 96) from the limbic system, and anterior cingulate cortex (N = 84), cortex (N = 114) and frontal cortex (N = 108) from the neocortex. Transcriptome-wide expression data was obtained from the BRAINEAC project, which includes exon-wise expression of 318,197 probes and 26,493 transcripts and genotype data for 134 neurological normal brains in 9 brain regions. Normalized and batch corrected data for Affymetrix 1.0 ST exon array was directly downloaded. From the limbic region, we obtained data for hippocampus (N = 130) and, from the neocortex, we downloaded data for frontal cortex (N = 134), occipital cortex (N = 134) and temporal cortex (N = 134). We selected genes between 3 and 40 exon probes, leaving 19,027 transcripts. For this microarray study, we adjusted the exon data by the principal components of the transcriptome-wide distance matrix and use the residuals as exon expression levels. As microarrays are subject-wise normalized, to recover a measure of the exon expression distribution in a gene a, we ranked the individuals within each exon and used the subject rankings across a to compute Pa(i) for each subject i.
Pathway enrichment in APP co-splicing
We computed, for each study, the enrichment of 277 KEGG pathways in APP’s co-splicing correlations, using Bioconductor’s package clusterProfiler. The genes selected for enrichment analysis were those with Mantel correlation P-values < 0.05. For each pathway and brain region, we computed the enrichment P-values, (Benjamini and Hochberg) adjusted for the number of pathways tested. We additionally adjusted for the number of brain regions and took the minimum adjusted value, across regions, as the P-value for the pathway. We look for the pathways significantly enriched in co-splicing in both studies and computed the combined P-value between studies. We also tested, using Cohen’s κ, the agreement between studies to declare a pathway significantly enriched in APP co-splicing.
Pathway enrichment in epistases with multiple SNPs in APP
We designed a framework to test whether pathways were enriched in epistases with more than one independent SNP of APP. The method, an extension of a previous method , comprised three steps: (1) We first performed genome-wide interaction analyses (GWIA) of late-onset Alzheimer’s disease (LOAD), for uncorrelated SNPs in APP (paired-wise R2 < 0.2) (see Supplementary Methods). A set of genome-wide additive-by-additive interaction P-values was obtained for all uncorrelated t SNPs in the gene (APPsnpi, i = 1...t), using the likelihood ratio test, χ2(1), between the logistic models
where y is the case/control status, snpj encodes the number of variant alleles of the SNP with index j in the array, j varies over the whole array, and the covariates included the genome-wide principal components and the principal components times APPsnpi. Q-Q plots were created for each GWIA in order to detect and remove APP-SNPs with possible false interactions due to unobserved latent variables (Supplementary Tables S1-S3). (2) We then applied an enrichment pathway analysis for all the GWIAs and counted the number of GWIAs, or associated SNPs in APP, for which each pathway was found significantly enriched. We used iGSEA4GWAS-v2 on KEGG pathways , where we mapped genes to SNPs within 100Kb distance. (3) We tested if the number of GWIAs, for which a pathway was found enriched, was significantly greater than what would be found by chance (Supplementary Methods). False positive rates and statistical power of the method were also assessed (Supplementary Methods).
LOAD genotype data
We used data of three genome-wide studies of LOAD, all accessible in dbGAP . We selected individuals from European ancestry. The studies are: (1) National Institute of Aging (NIA) study (accession: phs000168.v1.p1) with 587 cases and 289 controls with 590,247 SNPs. (2) GenADA study (phs000219.v1.p1) with a total of 806 cases, 782 controls and 349,252 SNPs. (3) Alzheimer’s Disease Genetics Consortium (ADGC) (phs000372.v1.p1). We kept the first genotyped batch (ADG12) as the second one (ADG3) had limited number of individuals with age at on-set. ADG12 has 2686 cases and 935 controls with 592,652 SNPs. Written informed consent was obtained for all participants at local ethics committee of each study, see dbGap under the accession numbers.
We filtered SNPs and samples by standard quality control measures. We analyzed SNPs with minor allele frequency > 5%, Hardy-Weinberg Equilibrium Z score < 4 and call rate > 80%. At the sample level, genome-wide principal components were calculated with Bioconductors’s snpStats package. Individuals with more than four standard deviations from the mean of the first two components were removed from the analysis. Principal components were recomputed with the selected individuals and the first five components were used in epistasis models as covariates to control for ancestry.
- GTEx project:
Genotye tissue expression project
Genome-wide association studies
Genome-wide interaction analysis
Kyoto encyclopedia of genes and genomes
Late-onset Alzheimer’s disease
Single nucleotide polymorphism
Stuart JM, Segal E, Koller D. Kim SK. A gene-coexpression network for global discovery of conserved genetic modules. Science. 2003;302:249–55.
Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet. 2008;40:1413–5.
de Klerk E. AC‘t Hoen P. alternative mRNA transcription, processing, and translation: insights from RNA sequencing. Trends Genet. 2015;31:128–39.
Wang ET, Sanberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, et al. Alternative isoform regulation in human tissue transctriptomes. Nature. 2008;456:470.
Reyes A, Huber W. Alternative start and termination sites of transcription drive most transcript isoform differences across human tissues. Nucleic Acids Res. 2017;46:582–92.
Iancu OD, ColvilleA, Oberbeck D, Darakjian P, McWeeney SK, Hitzemann R. Cosplicing network analysis of mammalian brain RNA-Seq data utilizing WGCNA and mantel correlations. Front Genet 2015;6:174.
Saha A, Kim Y, Gewirtz ADH, Jo B, Gao C, McDowell IC, et al. Co-expression networks reveal the tissue-specific regulation of transcription and splicing. Genome Res. 2017;27:1843–58.
Selkoe DJ, Hardy J. The amyloid hypothesis of Alzheimer's disease at 25 years. EMBO Mol Med. 2016;8:595–608.
Sandbrink R, Masters CL. Beyreuther K. APP gene family alternative splicing generates functionally related isoforms. Ann N Y Acad Sci. 1996;777:281–7.
Menéndez-González M, Pérez-Pinera P, Martínez-Rivera M, Calatayud MT, Blázquez Menes B. APP processing and the APP-KPI domain involvement in the amyloid cascade. Neurodegener Dis. 2005;2:277–83.
Matsui T, Ingelsson M, Fukumoto H, Ramasamy K, Kowa H, Frosch MP, et al. Expression of APP pathway mRNAs and proteins in Alzheimer's disease. Brain Res. 2007;1161:116–23.
Masliah E, Mallory M, Alford M, Tanaka S, Hansen LA. Caspase dependent DNA fragmentation might be associated with excitotoxicity in Alzheimer disease. J Neuropathol Exp Neurol. 1998;57:1041–52.
Stilling RM, Benito E, Gertig M, Barth J, Capece V, Burkhardt S, et al. De-regulation of gene expression and alternative splicing affects distinct cellular pathways in the aging hippocampus. Front Cell Neurosci. 2014;8:373.
Gaiteri C, Mostafavi S, Honey CJ, De Jager PL, Bennett DA. Genetic variants in Alzheimer disease—molecular and brain network approaches. Nat Rev Neurol. 2016;12:413–27.
GTEx. https://gtexportal.org. Accessed 4 Apr 2018.
BRAINEAC. http://www.braineac.org. Accessed 4 April 2018.
Freer R, Sormanni P, Vecchi G, Ciryam P, Dobson CM, Vendruscolo M. A protein homeostasis signature in healthy brains recapitulates tissue vulnerability to Alzheimer’s disease. Sci Adv. 2016;2:e1600947.
Grainger RJ, Beggs JD. Prp8 protein: at the heart of the spliceosome. RNA. 2005;11:533–57.
Derkach VA, Oh MC, Guire ES, Soderling TR. Regulatory mechanisms of AMPA receptors in synaptic plasticity. Nat Rev Neurosci. 2007;8:101.
Easton DF, Pooley KA, Dunning AM, Pharoah PD, Thompson D, Ballinger DG, et al. Genome-wide association study identifies novel breast cancer susceptibility loci. Nature. 2007;447:1087.
Moffa AB, Ethier SP. Differential signal transduction of alternatively spliced FGFR2 variants expressed in human mammary epithelial cells. J Cell Physio. 2007;210:720–31.
Cosplice4GTEx. http://cospicing.isglobal.org. Accessed 4 Apr 2018.
Thal DR, Rüb U, Orantes M, Braak H. Phases of a β-deposition in the human brain and its relevance for the development of AD. Neurology. 2002;58:1791–800.
Bertram L, Hiltunen M, Parkinso M, Ingelsson M, Lang C, Ramasam K, et al. Family-based association between Alzheimer's disease and variants in UBQLN1. New Eng J Med. 2005;352:884–94.
Schrenk-Siemens K, Perez-Alcala S, Richter J, Lacroix E, Rahuel J, Korte M, et al. Embryonic stem cell-derived neurons as a cellular system to study gene function: lack of amyloid precursor proteins APP and APLP2 leads to defective synaptic transmission. Stem Cells. 2008;26:2153–63.
Wang Z, Wang B, Yang L, Guo Q, Aithmitti N, Songyang Z, et al. Presynaptic and postsynaptic interaction of the amyloid precursor protein promotes peripheral and central synaptogenesis. J Neurosci. 2009;29:10788–801.
Green KN, Demuro A, Akbari Y, Hitt BD, Smith IF, Parker I, et al. SERCA pump activity is physiologically regulated by presenilin and regulates amyloid β production. J Cell Biol. 2008;181:1107–16.
Goldstein B, Speth RC, Trivedi M. Renin–angiotensin system gene expression and neurodegenerative diseases. J Renin-Angiotensin-Aldosterone Syst. 2016;17:1470320316666750.
Eiraku M, Tohgo A, Ono K, Kaneko M, Fujishima K, Hirano T, et al. DNER acts as a neuron-specific notch ligand during Bergmann glial development. Nat Neurosci. 2005;8:873–80.
Stieren ES, El Ayadi A, Xiao Y, Siller E, Landsverk ML, Oberhauser AF, et al. Ubiquilin-1 is a molecular chaperone for the amyloid precursor protein. J Biol Chem. 2011;286:35689–98.
Slifer MA, Martin ER, Haines JL, Pericak-Vance MA. The ubiquilin 1 gene and Alzheimer's disease. (Letter) New Eng J Med. 2005;352:2752–3.
Thornton-Wells TA, Moore JH, Martin ER, Pericak-Vance MA, Haines JL. Confronting complexity in late-onset Alzheimer disease: application of two-stage analysis approach addressing heterogeneity and epistasis. Genet Epidemiol. 2008;32:187–203.
Yue Z, Wang S, Yan W, Zhu F. Association of UBQ-8i polymorphism with Alzheimer's disease in Caucasians: a meta-analysis. Int J Neurosci. 2015;125:395–401.
Smemo S, Nowotny P, Hinrichs AL, Kauwe JS, Cherny S, Erickson K, et al. Ubiquilin 1 polymorphisms are not associated with late-onset Alzheimer's disease. Ann Neurol. 2006;59:21–6.
Martin XE, Martinez MF, Fernandez DG, Busto FG, Alcelay LG, Calzada Ferreras MJ, et al. UBQ-8i polymorphism is not an independent risk factor for mild cognitive impairment and Alzheimer’s disease in APOE-4 carriers. Curr Alzheimer Res. 2012;9:467–72.
Kohli BM, Pflieger D, Mueller LN, Carbonetti G, Aebersold R, Nitsch RM, et al. Interactome of the amyloid precursor protein APP in brain reveals a protein network involved in synaptic vesicle turnover and a close association with Synaptotagmin-1. J Proteome Res. 2012;11:4075–90.
Plácido AI, Pereira CM, Duarte AI, Candeias E, Correia SC, Santos RX, et al. The role of endoplasmic reticulum in amyloid precursor protein processing and trafficking: implications for Alzheimer's disease. Biochim Biophys Acta. 2014;842:1444–53.
Cáceres A, Vargas JE, González JR. APOE and MS4A6A interact with GnRH signaling in Alzheimer's disease: enrichment of epistatic effects. Alzheimers Dement. 2017;13:493–7.
Barberan-Soler S, Zahler AM. Alternative splicing regulation during C. elegans development: splicing factors as regulated targets. PLoS Genet. 2008;4:e1000001.
Anczuków O, Akerman M, Cléry A, Wu J, Shen C, Shirole NH, et al. SRSF1-regulated alternative splicing in breast cancer. Mol Cell. 2015;60:105–17.
Hoffmann TJ, Passarelli MN, Graff RE, Emami NC, Sakoda LC, Jorgenson E, et al. Genome-wide association study of prostate-specific antigen levels identifies novel loci independent of prostate cancer. Nat Commun. 2017;8:14248.
Yeo G, Holste D, Kreiman G, Burge CB. Variation in alternative splicing across human tissues. Gen Bio. 2004;5:R74.
Dawkins E, Small DH. Insights into the physiological function of the β -amyloid precursor protein: beyond Alzheimer’s disease. J Neuroch. 2014;129:756–69.
Ciryam P, Kundra R, Freer R, Morimoto RI, Dobson CM, Vendruscolo M. A transcriptional signature of Alzheimer’s disease is associated with a metastable subproteome at risk for aggregation. Proc Natl Acad Sci U S A. 2016;113:4753–8.
Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, et al. STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2014;43:D447–52.
Warde-Farley D, Donaldson SL, Comes O, Zuberi K, Badrawi R, Chao P, et al. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res. 2010;38:W214–20.
Wruck W, Schröter F, Adjaye J. Meta-analysis of transcriptome data related to hippocampus biopsies and iPSC-derived neuronal cells from Alzheimer’s disease patients reveals an association with FOXA1 and FOXA2 gene regulatory networks. J Alzheimers Dis. 2016;50:1065–82.
Gadhave K, Bolshette N, Ahire A, Pardeshi R, Thakur K, Trandafir C, et al. The ubiquitin proteasomal system: a potential target for the management of Alzheimer's disease. J Cell Mol Med. 2016;20:1392–407.
Schmitt T, Ogris C, Sonnhammer EL. FunCoup 3.0: database of genome-wide functional coupling networks. Nucleic Acids Res. 2013;42:D380–8.
DbGap. https://www.ncbi.nlm.nih.gov/gap. Accessed 4 Apr 2018.
iGSEA4GWAS-v2. http://gsea4gwas-v2.psych.ac.cn. Accessed 25 Feb 2017.
We would like to thank Pau Torres for his help in part of the analysis. We would like to thank GTEx, BRAINEAC and dbGAP for access to data.
This work supported by Red Española de Supercomputación [BCV-2016-3-0002] and by Ministerio de Economía e Innovación (Spain) [MTM2015–68140-R].
Availability of data and materials
The datasets generated during the current study are available in the CoSplicing4GTE repository http://cosplicing.isglobal.org/. The datasets analysed during the current study are available in DbGap, http://www.ncbi.nlm.nih.gov/gap, under accession numbers phs000424.v6.p1 (GTEx), phs000168.v1.p1 (NIA), GenADA (phs000219.v1.p1), phs000372.v1.p1(ADGC). Transcriptomic data for GTEx and BRAINEAC are available at https://www.gtexportal.org/ and http://www.braineac.org/.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Methods, Supplemental Figures S1-S11. and Tables S2-S4. (PDF 968 kb)
Table S1. KEGG Pathways’ enrichment in APP co-splicing, co-expression, epistasis and vulnerability index (Delta BI-III) (XLS 58 kb)
Table S5. Pathways with multiple enrichment in APP epistasis across three LOAD GWAS. (XLS 58 kb)
About this article
Cite this article
Cáceres, A., González, J.R. When pitch adds to volume: coregulation of transcript diversity predicts gene function. BMC Genomics 19, 926 (2018). https://doi.org/10.1186/s12864-018-5263-z
- Alternative splicing
- Transcription diversity
- Alzheimer’s disease
- Gene function prediction