Mapping splice QTLs reveals distinct transcriptional and post-transcriptional regulatory variation of gene expression and identifies putative alternative splicing variation mediating complex trait variation in pigs
BMC Genomics volume 24, Article number: 240 (2023)
Alternative splicing is an important step in gene expression, generating multiple isoforms for the same genes and greatly expanding the diversity of proteomes. Genetic variation in alternative splicing contributes to phenotypic diversity in natural populations. However, the genetic basis of variation in alternative splicing in livestock including pigs remains poorly understood.
In this study, using a Duroc x Pietrain F2 pig population, we performed genome-wide analysis of alternative splicing estimated from stranded RNA-Seq data in skeletal muscle. We characterized the genetic architecture of alternative splicing and compared its basic features with those of overall gene expression. We detected a large number of novel alternative splicing events that were not previously annotated. We found heritability of quantitative alternative splicing scores (percent spliced in or PSI) to be lower than that of overall gene expression. In addition, heritabilities showed little correlation between alternative splicing and overall gene expression. We mapped expression QTLs (eQTLs) and splice QTLs (sQTLs) and found them to be largely non-overlapping. Finally, we integrated sQTL mapping with phenotype QTL (pQTL mapping to identify potential mediator of pQTL effect by alternative splicing.
Our results suggest that regulatory variation exists at multiple levels and that their genetic controls are distinct, offering opportunities for genetic improvement.
Expression of genes is regulated by a multitude of molecular processes which collectively determine the abundance and spatiotemporal distribution of gene expression products of different forms. Both genetic and environmental factors can lead to variation in gene expression regulation, which can ultimately cause variation in organismal phenotypes within and between species. For example, it has been well established that variation in gene expression contributes to the divergence between species . Genome-wide mapping efforts have identified numerous expression quantitative trait loci (eQTLs), i.e. DNA variants that are associated with changes in gene expression, within different organisms, including plants , humans , insects , and livestock .
Alternative splicing of pre-mRNA is a major step in post-transcriptional regulation of complex eukaryotic gene expression. The most prominent consequence of alternative splicing is the vast expansion of the transcriptome and proteome by forming multiple transcript isoforms, which have the potential to code for different proteins . Alternative splicing is tightly controlled, often in a cell type and developmental stage specific manner, to regulate fundamental biological processes including growth and development. Despite the importance of alternative splicing in gene expression regulation, the genetic basis of alternative splicing variation between and within species and its relationship with phenotypic variation remain less understood than steady state RNA abundances.
Many non-genetic and factors can influence alternative splicing, including developmental stages , cell and tissue types , disease status , as well as sequence variation between individuals . Importantly, genetic variation in alternative splicing is abundant and contributes significantly to phenotypic variation in natural populations . For example, aberrant alternative splicing caused by mutations in cis regulatory elements or splicing factors underlies many human diseases [11,12,13]. In livestock, QTLs affecting alternative splicing have been mapped in dairy and beef cattle and found to be extensively shared across tissues . Many tissue specific QTLs influencing alternative splicing were also present . In addition, eQTLs and splice QTLs (sQTLs) were found to be associated with meat quality traits in a population of Angus-Brahman crossbred cattle .
In a previous study, we mapped hundreds of eQTLs for RNA abundance in the longissimus dorsi muscle of pigs from a Duroc x Pietrain F2 population, providing potential links to facilitate the functional interpretation and dissection of phenotype QTLs . In the present study, we extended this work to map sQTLs, taking advantage of the paired-end and strand-specific RNA-Seq data. We compared the heritability of overall gene expression and alternative splicing which showed little correlation. Furthermore, the comparison of mapped eQTLs and sQTLs showed little overlap and different location patterns, suggesting distinct transcriptional and post-transcriptional regulatory variation of gene expression. Finally, we integrated sQTLs with mapped phenotype QTLs (pQTLs) to identify putative causal genes whose genetic variation in alternative splicing may have mediated the pQTL effects.
Identification of alternative splicing events in pigs using RNA-Seq
To understand the genetic basis of alternative splicing regulation in skeletal muscle in pigs, we analyzed the RNA-Seq dataset from a previous study  to obtain both overall gene expression and alternative splicing quantifications in longissimus dorsi muscle in the MSU Pig Resource Population derived from a F2 cross of Duroc x Pietrain pigs. We focused on 143 of the 168 animals for which strand-specific RNA-Seq libraries were sequenced. On average, 45.8 million 125 bp paired-end reads were sequenced per sample, 37.5 million of which (Additional File 1, Table S1) were uniquely mapped using HISAT2 . Importantly, of the uniquely mapped reads, 21.5 million covered junction sites, enabling us to classify and quantify alternative splicing (Table S1).
We assembled transcript models from each sample using StringTie  and merged and filtered them to produce a combined annotation. We identified alternative splicing events from merged transcript annotations across all RNA-Seq samples using rMATs , which classified them into five classes, including skipped exon (SE), retained intron (RI), alternative 5′ splice site (A5SS), alternative 3′ splice site (A3SS), and mutually exclusive exons (MXE). We considered an alternative splicing event supported by experimental evidence in our muscle RNA-Seq data if both splice isoforms were supported by at least two junction reads in at least two samples. There were 88,670, 4,980, 3,044, 3,591, and 23,187 SE, RI, A5SS, A3SS, and MXE events respectively, among which, 7,323, 2,285, 1,720, 1,963, and 2,081 were novel (Fig. 1). Despite that only muscle RNA was sequenced, the large number of unannotated alternative splicing events testified to the power of population-scale RNA-Seq and the incompleteness of the current gene annotation in pigs. However, it should be noted that novelty of alternative splicing events in RNA-Seq data depends on completeness of reference annotations, which improves as large-scale genome annotation projects such as the FAANG accumulate data .
Quantitative genetics of alternative splicing and steady state RNA abundance
To understand the genetic basis of alternative splicing and steady state RNA abundance, we first estimated the genomic heritability of these two types of traits. Genomic heritability quantifies the proportion of variation in alternative splicing or overall gene expression that can be attributed to additive genetic variance. We defined the percent spliced-in (PSI) score as the proportion of junction reads supporting one of the splice isoforms versus the sum of both isoforms to quantitatively represent alternative splicing.
We applied several non-specific filters to PSI and TPM (transcripts per million) representing overall gene expression, including filters on overall magnitude of variances, coefficient of variation and skewness of distribution. The remaining genes or alternative splicing events were Normal quantile transformed and fitted in a linear mixed model with genomic relationship matrix calculated from 50K SNP chip genotypes of the animals. Significance of the deviation of additive genetic variance from zero was determined using a likelihood ratio test comparing the full mixed model and a reduced model without the additive genetic variance component. At a false discovery rate (FDR) = 0.01, we identified 2,200 genes out of a total of 11,368 expressed genes that had significant heritability. For alternative splicing, 405 alternative splicing events in 190 genes were significant out of a total of 22,018 events in 6,004 genes. A total of 65 genes had both significant heritable overall gene expression and alternative splicing.
The estimated heritability was in general larger for overall gene expression than for all five types of alternative splicing (Fig. 2). Importantly, the heritability of these two aspects of gene expression showed little correlation (Fig. 3). Bivariate quantitative genetic analysis suggested that the genetic correlation was generally small. Of the 20,625 gene expression and alternative splicing pairs, only 585 had a significant (FDR = 0.01) genetic correlation. These results suggested that the heritable genetic control of transcriptional and post-transcriptional regulation of gene expression may be distinct, even though the two molecular processes are highly coupled spatially, temporally, and functionally .
Mapping splice QTLs (sQTLs) and expression QTLs (eQTLs)
To understand the distinct genetic control of transcription and splicing, we mapped splice QTLs (sQTLs) and expression QTLs (eQTLs) for all genes using genome-wide association analysis. At an empirical FDR = 0.01, we identified a total of 35,687 sQTLs for 796 alternative splicing events in 475 genes and 43,432 eQTLs for 1,098 genes (Fig. 4a, b). The large number of sQTLs and eQTLs per alternative splicing or gene expression trait was the result of the high level of linkage among genetic markers in the F2 population. We therefore performed a forward model selection approach to eliminate associations due to linkage and classified sQTLs and eQTLs based on the markers retained in the models. Both sQTLs and eQTLs exhibited an enrichment of cis (defined as 1Mb within the boundaries of genes) QTLs as compared to trans. Despite that trans QTLs have a much larger space to exist, of the 1,338 genes with at least one eQTL and 961 alternative splicing events with at least one sQTL, 552 (41%) and 467 (48%) had cis eQTLs and sQTLs, respectively (Table 1). These results, as many studies have shown for eQTLs previously, indicated that proximal regulation is a primary mode of regulation for both transcription and alternative splicing.
Among the 1,338 genes with eQTLs and 586 genes with sQTLs, 105 were in common. Of the 105 genes with mapped eQTLs and sQTLs, only 16 contained SNPs that were associated with both overall gene expression and alternative splicing. These results, together with the lack of correlation between SNP based heritability estimates for overall gene expression and alternative splicing and low genetic correlation, suggested that the genetic control of transcription and alternative splicing were likely distinct.
Identification of alternative splicing variation that putatively mediates QTL effects on economic traits
To understand the effects of sQTLs on economic traits and their distinction with eQTL effects, we performed GWAS to identify phenotype-associated QTLs (pQTLs) in this pig population for 67 economic traits . Using a permutation based FDR threshold of 0.05, we identified 208 pQTLs for 17 unique traits. Similar to eQTL and sQTLs, pQTLs were clustered due to high level of LD in the F2 population. Many of these pQTLs are also sQTLs or eQTLs. Among the pQTLs, we identified a total of 389 trios of sQTLs, alternative splicing events (PSI), and economic traits, as well as 1,070 trios of eQTLs, genes, and economic traits. This allowed us to test for mediation of the pQTL effects on economic traits by alternative splicing and gene expression. We performed two tests for this mediation analysis. First, we asked if the association between the pQTL and the trait diminished when PSI or gene expression when PSI or gene expression was included as a covariate. Reduction in association would indicate that the pQTL effect was at least in part mediated by the alternative splicing or gene expression. Second, we tested for the proportion of mediation using a model based approach implemented in the mediation R package . At an FDR = 0.05, we identified 15 sQTLs (Table S2) and 152 eQTLs (Table S3) that mediated pQTL effect on economic traits. For example, we identified a significant pQTL (ASGA0070742) for the protein percent in the muscle (Fig. 5a, P = 2.86e-5). The same pQTL is also a sQTL (Fig. 5b, P = 1.75e-6) for a skipped exon (SE) event in the caldesmon 1 (CALD1) gene, which plays an important role in muscle contraction. Remarkably, the PSI of the SE event in CALD1 is highly associated with protein percent (Fig. 5c, P = 4.78e-10). However, adjusting for CALD1 PSI nearly completely abolished the pQTL effect (Fig. 5d, P = 0.06), strongly suggesting that the pQTL effect was mediated by the alternative splicing in CALD1. Taken together, these results suggest that in addition to eQTLs , sQTLs can also provide important information regarding the molecular mechanism of how regulatory variation contribute to phenotypic variation.
In this study, we carried out an analysis to estimate heritability of alternative splicing and map sQTLs in skeletal muscle of 143 pigs in a Duroc x Pietrain F2 population . The samples were deeply sequenced with stranded paired-end reads, enabling us to estimate both alternative splicing and steady state RNA abundances. While many studies have mapped eQTLs for overall gene expression in livestock, genetic characterization of alternative splicing remains understudied.
With a single tissue type, we found a significant number of novel alternative splicing events supported by multiple reads in multiple samples (Fig. 1). Skeletal muscle is one of the most intensively studied tissues in pigs and yet we were able to substantially improve existing annotations, underscoring the incompleteness of the present annotation, which is expected to further improve as large-scale annotation projects are completed .
By comparing genomic heritability of quantitative gene expression and alternative splicing, we found heritability of alternative splicing was generally lower than that of overall expression (Fig. 2). It’s important to point out that the magnitude of heritability depends on the precision of measurements. An explanation for the lower heritability for alternative splicing may be that alternative splicing utilizes only reads spanning junctions, whereas overall gene expression takes advantage of all exonic reads.
We found little correlation between the heritabilities of overall gene expression and alternative splicing (Fig. 3). This suggested that genetic controls of transcription and alternative splicing were likely distinct. Heritabilities for overall gene expression were generally higher than alternative splicing. There can be several reasons, including but not limited to higher environmental (non-genetic) sensitivity of alternative splicing and the necessity to use junction reads thus smaller coverage for PSI estimation. In addition, we estimated generally low genetic correlation between overall expression and alternative splicing with high standard errors. Out of 20,625 gene expression and alternative splicing pairs, only 585 had a significant (FDR = 0.01) genetic correlation. Both eQTLs and sQTLs were enriched for SNPs proximal to the genes they regulated, where cis regulatory elements were concentrated. A substantial fraction of genes contained both mapped eQTLs and sQTLs, yet only a small fraction shared the same genetic markers as eQTLs and sQTLs. The resolution of the F2 design and the density of the SNP chip did not allow us to perform a comprehensive evaluation of overlap. Taken together, these results suggested that transcription and alternative splicing were distinctly regulated genetically.
Ultimately, regulatory variation in transcription and splicing lead to phenotypic variation. We performed mediation analyses to identify alternative splicing events whose genetic variation may mediate the effects of pQTLs on phenotypic variation. Our results suggested that in addition to RNA abundance, alternative splicing can also facilitate interpretation of genetic associations between pQTLs and economic traits and identification of putative causal mediating factors. These efforts ultimately enhance genetic improvement of animals. Our study suggests that when considering regulatory variation as an intermediate step that connects DNA variation and organismal phenotypic variation, alternative splicing should not be ignored. Indeed, studies in humans suggest that alternative splicing may be a major contributor to phenotypic variation . As large-scale annotation projects improve annotation of the pig genome, estimation of alternative splicing will become easier and more accurate, which will ultimately improve our ability to understand and utilize its contribution to phenotypic variation for genetic improvement of animals.
Using a stranded RNA-Seq dataset in a F2 population of pigs, we mapped eQTL and sQTL in the skeletal muscle. We found overall gene expression and alternative splicing to have low genetic correlation, and heritabilities of these two classes of traits to have low correlation. In addition, although both classes of traits had QTLs concentrated near the genes, we found very little overlap between sQTLs and eQTLs. These results suggested that the genetic regulation of transcription and alternative splicing were likely distinct that both need to be considered when mapping regulatory variation that connects DNA variation and phenotypic variation.
Materials and methods
RNA-Seq data mapping and alternative splicing event calling
All RNA-Seq data were obtained from a previously published study with accession number PRJNA403969 . To reduce the influence of batch effect, we focused on 144 samples which were prepared uniformly to produced stranded sequencing libraires and sequenced by the Illumina HiSeq 2500 platform with 125 bp paired end reads. These 144 F2 individuals were offspring of 35 F1 animals descended from 19 purebred animals. Raw sequence reads were aligned to the Sus scrofa reference genome (11.1) and transcriptome (Ensembl Release 97) using HISAT2 2.1.0 . There were on average 45.8 million of reads per sample, with 90.9% overall mapping and 81.9% unique mapping rates. One of the samples (1833) had an uncharacteristically low (49.64%) unique mapping rate and was removed from further analysis, resulting in 143 samples. Mapped reads that covered junctions comprised 46.9% of all reads on average, providing a strong basis to estimate alternative splicing accurately. StringTie 2.04  was used to assemble transcripts based on HISAT2 alignments, producing a gene annotation per sample. The StringTie parameters were set to allow a minimum transcript length of 200 bp, minimum transcript coverage of 2.5 and minimum isoform fraction of 0.1. Annotations from all samples were merged to a set of 78,351 transcripts, among which 29,193 (37.26%) were novel. Alternative splicing events were identified by rMATs 4.1.0  using the combined gene annotations and novel alternative splicing was called if they did not exist in the reference annotation. rMATs identifies alternative splicing events by considering assembled transcript models and classifies them into five major categories including alternative 5’ splice site (A5SS), alternative 3’ splice site (A3SS), mutually exclusive exons (MXE), retained intron (RI), and skipped exon (SE). We conservatively considered novel events where both isoforms were supported by at least two reads in at least two samples.
Heritability estimation of alternative splicing and gene expression
To quantify alternative splicing, percent splice-in (PSI) scores  were calculated for each gene using the following formula:
When estimating heritability, we only considered alternative splicing events with at least 5 total reads supporting both isoforms in at least 50% of the samples. Only alternative splicing events with a mean absolute deviation greater than 1, log2 variance greater than 2, skewness < 1.1 were considered variable and retained. In addition, to eliminate the influence of sex specific alternative splicing, we only retained events where samples from either sex contributed at least 20% of the estimable samples. PSI was normal quantile transformed using the R package bestNormalize . The transformed PSI was used to estimate genomic heritability (h2) using the R package rrBLUP  with sex of the animal as a fixed effect. The R package fdrtool was then used to calculate the FDR value based on P values obtained from likelihood ratio tests.
Overall gene expression was estimated using transcript per million (TPM) output by StringTie. TPM of each gene was also subjected to a normal quantile transformation. The transformed TPM was then used to estimate the genomic heritability (h2) of gene expression by rrBLUP package. Genetic correlation was computed using the GCTA package . The R package fdrtool was then used to calculate the FDR values based on the Benjamini-Hochberg approach.
Expression QTL and splice QTL mapping
eQTL and sQTL mapping was performed using the fastGWA function of GCTA  with sex of the animal fitted as a covariate. To estimate empirical FDR, we permuted the data set (animal ID) 200 times while preserving correlation among genes and performed eQTL and sQTL mapping in the permuted datasets. Empirical FDR for each was estimated as the observed number of significant QTLs at a given threshold divided by the average number of significant QTLs among the 200 permutations. eQTLs or sQTLs located within 1 Mb of gene boundaries were consider cis, and otherwise trans.
To identify potential mediator of pQTL effects, we first performed pQTL mapping in 67 traits using the same procedure as eQTL and sQTL mapping as described above. We then integrated pQTL, sQTL/eQTL, PSI/gene expression, and phenotypes using two approaches. First we tested association between pQTLs and phenotypes by adjusting the phenotypes for PSI or gene expression. Second, we performed a model based mediation analysis  to estimate proportion of the pQTL effect on phenotype that is mediated by the PSI/gene expression mediator. Significant mediators were identified at FDR = 0.05.
Availability of data and materials
All Data are downloaded from a previous study with the accession number PRJNA403969  from SRA. Codes are available at: https://github.com/zhangfei1947/MSUPRP.
King MC, Wilson AC. Evolution at two levels in humans and chimpanzees. Science. 1975;188:107–16.
Zhang H, Mao R, Wang Y, Zhang L, Wang C, Lv S, et al. Transcriptome-wide alternative splicing modulation during plant-pathogen interactions in wheat. Plant Sci. 2019;288:110160.
GTEx Consortium, Laboratory, Data Analysis & Coordinating Center (LDACC)—Analysis Working Group, Statistical Methods groups—Analysis Working Group, Enhancing GTEx (eGTEx) groups, NIH Common Fund, NIH/NCI, et al. Genetic effects on gene expression across human tissues. Nature et al. 2017;550:204–13.
Huang W, Carbone MA, Magwire MM, Peiffer JA, Lyman RF, Stone EA, et al. Genetic basis of transcriptome diversity in Drosophila melanogaster. Proc Natl Acad Sci U S A. 2015;112:E6010–6019.
Velez-Irizarry D, Casiro S, Daza KR, Bates RO, Raney NE, Steibel JP, et al. Genetic control of longissimus dorsi muscle gene expression variation and joint analysis with phenotypic quantitative trait loci in pigs. BMC Genomics. 2019;20:3.
Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature. 2010;463:457–63.
Graveley BR, Brooks AN, Carlson JW, Duff MO, Landolin JM, Yang L, et al. The developmental transcriptome of Drosophila melanogaster. Nature. 2011;471:473–9.
Brown JB, Boley N, Eisman R, May GE, Stoiber MH, Duff MO, et al. Diversity and dynamics of the Drosophila transcriptome. Nature. 2014;512:393–9.
Ule J, Blencowe BJ. Alternative Splicing Regulatory Networks: functions, mechanisms, and evolution. Mol Cell. 2019;76:329–45.
Li YI, van de Geijn B, Raj A, Knowles DA, Petti AA, Golan D, et al. RNA splicing is a primary link between genetic variation and disease. Science. 2016;352:600–4.
Waldron D. Human genetics: splicing: linking genetic variation and disease. Nat Rev Genet. 2016;17:317.
Sterne-Weiler T, Sanford JR. Exon identity crisis: disease-causing mutations that disrupt the splicing code. Genome Biol. 2014;15:201.
Garcia-Blanco MA, Baraniak AP, Lasda EL. Alternative splicing in disease and therapy. Nat Biotechnol. 2004;22:535–46.
Xiang R, Hayes BJ, Vander Jagt CJ, MacLeod IM, Khansefid M, Bowman PJ, et al. Genome variants associated with RNA splicing variations in bovine are extensively shared between tissues. BMC Genomics. 2018;19:521.
Leal-Gutiérrez JD, Elzo MA, Mateescu RG. Identification of eQTLs and sQTLs associated with meat quality in beef. BMC Genomics. 2020;21:104.
Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37:907–15.
Pertea M, Pertea GM, Antonescu CM, Chang T-C, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33:290–5.
Shen S, Park JW, Lu Z, Lin L, Henry MD, Wu YN, et al. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data. Proc Natl Acad Sci U S A. 2014;111:E5593–5601.
Andersson L, Archibald AL, Bottema CD, Brauning R, Burgess SC, Burt DW, et al. Coordinated international action to accelerate genome-to-phenome with FAANG, the functional annotation of animal genomes project. Genome Biol. 2015;16:57.
Montes M, Becerra S, Sánchez-Álvarez M, Suñé C. Functional coupling of transcription and splicing. Gene. 2012;501:104–17.
Tingley D, Yamamoto T, Hirose K, Keele L, Imai K. Mediation: R Package for Causal Mediation Analysis. J Stat Softw. 2014;59:1–38.
Peterson RA. The R Journal: finding Optimal Normalizing Transformations via bestNormalize. R J. 2021;13:294–313.
Endelman JB. Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. Plant Genome. 2011;4:250–5.
Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88:76–82.
We thank the MSU Institute for Cyber Enabled Research for providing the computing infrastructure.
This work was supported by Agriculture and Food Research Initiative competitive grant no. 2014–67015-21619 from the USDA National Institute of Food and Agriculture and USDA Hatch Project MICL02560. DVI was supported by a fellowship from the Food and Agricultural Sciences National Needs Graduate and Postgraduate Fellowships Grants Program award no. 2012–38420-30199 from the USDA National Institute of Food and Agriculture. The funders had no role in the study design, data collection, analysis, and interpretation, or in writing the manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhang, F., Velez-Irizarry, D., Ernst, C.W. et al. Mapping splice QTLs reveals distinct transcriptional and post-transcriptional regulatory variation of gene expression and identifies putative alternative splicing variation mediating complex trait variation in pigs. BMC Genomics 24, 240 (2023). https://doi.org/10.1186/s12864-023-09314-4