EventPointer: an effective identification of alternative splicing events using junction arrays

Background: Alternative splicing (AS) is a major source of variability in the transcriptome of eukaryotes, and there is increasing interest in its role in different pathologies. Before sequencing technology appeared, AS was measured with specific arrays. However, these arrays did not perform well in the detection of AS events and provided very large false discovery rates (FDR). Recently the Human Transcriptome Array 2.0 (HTA 2.0) has been deployed. It includes junction probes. However, the interpretation software provided by its vendor (TAC 3.0) does not fully exploit its potential (it does not jointly study the exons and junctions involved in a splicing event) and can only be applied to case–control studies. New statistical algorithms and software must be developed in order to exploit the HTA 2.0 array for event detection.

Results: We have developed EventPointer, an R package (built under the aroma.affymetrix framework) to search for and analyze alternative splicing events using HTA 2.0 arrays. This software uses a linear model that broadens its application from plain case–control studies to complex experimental designs. Given the CEL files and the design and contrast matrices, the software retrieves a list of all the detected events indicating: 1) the type of event (exon cassette, alternative 3′, etc.), 2) its fold change and its statistical significance, and 3) the potential protein domains affected by the AS events and the statistical significance of the possible enrichment. Our tests have shown that EventPointer has an extremely low FDR value (only 1 false positive within the tested top-200 events). This software is publicly available and has been uploaded to GitHub.

Conclusions: This software empowers the HTA 2.0 arrays for AS event detection as an alternative to RNA-seq, considerably simplifying the required analysis, speeding it up and reducing the required computational power.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2816-x) contains supplementary material, which is available to authorized users.


Introduction
EventPointer is an R package used to identify alternative splicing events in complex experimental designs, such as time course studies, paired samples or any other design. The algorithm only requires the design and contrast matrices that correspond to the experiment.
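To make the idea of design and contrast matrices concrete, here is a minimal base R sketch for a simple two-group comparison. The sample labels, group sizes and column names are hypothetical, and the layout follows the usual linear-model convention (one row per sample in the design, one row per coefficient in the contrast); the exact format EventPointer expects is not stated in this text, so treat this as an assumption.

```r
# Hypothetical experiment: 3 control and 3 treated samples (labels are illustrative).
group <- factor(c("control", "control", "control",
                  "treated", "treated", "treated"),
                levels = c("control", "treated"))

# Design matrix: one row per sample, an intercept column plus
# a column for the treatment effect.
design <- model.matrix(~ group)
colnames(design) <- c("Intercept", "TreatedVsControl")

# Contrast matrix: one row per design coefficient, one column per contrast.
# The single contrast here picks out the treated-vs-control coefficient.
contrast <- matrix(c(0, 1), ncol = 1,
                   dimnames = list(colnames(design), "TreatedVsControl"))

print(design)
print(contrast)
```

More complex designs (paired samples, time courses) would add columns to the design matrix and, if several comparisons are of interest, columns to the contrast matrix.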
The algorithm tests all the events identifiable by the Affymetrix arrays: Human Transcriptome Array 2.0 (HTA 2.0) and Human Junction Array (Hjay). Each event is statistically tested to identify whether the most expressed isoform changes between conditions. This vignette does not cover the detection of alternative splicing events; it explains how to use the aroma.affymetrix package on these arrays.

    # Standard preprocessing pipeline for the microarray data.
    # Functions and parameters are predefined according to
    # the aroma.affymetrix R package.

    # Conversion of the CDF from Brainarray to a binary file. Needs to be done only once.
    library(aroma.affymetrix)
    setOption(aromaSettings, "memory/ram", 8)

    # UNCOMMENT THESE TWO LINES TO PERFORM THE CONVERSION
    #setwd("~/../aroma.affymetrix/annotationData/chipTypes/HTA20")
    #convertCdf ("hta20_Hs_ENSG.cdf",r_ENSG,brainarray,v19.cdf")

    # CONVERSION DONE
It is necessary to have a directory structure as explained at http://www.aroma-project.org/setup/QuickSummaryOfRequiredFileStructure/. Once the structure is properly set up, this code should run without errors. It performs background removal, quantile normalization and summarization for all the arrays in an experiment.
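As a rough illustration of that directory structure, the sketch below creates the two folders the aroma framework looks for: one for the annotation (CDF) files and one for the raw CEL files. It works under a temporary directory so no real data folders are touched. The chip-type folder name HTA20 is taken from the path in the commented code above; the data set name MyExperiment is hypothetical.

```r
# Build the layout under a temporary directory so no real folders are modified.
root <- file.path(tempdir(), "aroma_sketch")

chip_type <- "HTA20"         # chip-type folder name, as in the path used in the text
data_set  <- "MyExperiment"  # hypothetical data set name

# annotationData/chipTypes/<chipType>/  holds the (binary) CDF file.
annotation_dir <- file.path(root, "annotationData", "chipTypes", chip_type)

# rawData/<dataSet>/<chipType>/  holds the CEL files of the experiment.
raw_dir <- file.path(root, "rawData", data_set, chip_type)

dir.create(annotation_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)

# List the created tree relative to the root.
print(list.dirs(root, recursive = TRUE, full.names = FALSE))
```

In a real analysis the root would be the working directory from which the aroma.affymetrix code is run, with the converted CDF and the CEL files copied into the respective folders.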

Gene Ontology Analysis
It is possible to run an enrichment analysis with the data.

    # GO enrichment analysis
    library(org.Hs.eg.db)
    library(topGO)

    # Underexpressed
    GOdata@geneSelectionFun <- function(t) t < -3
    resultFisher <- runTest(GOdata, algorithm = "weight01", statistic = "fisher")

One of the advantages of the topGO package is that it avoids providing redundant information. For example, if a function is strongly enriched, its parent in the ontology will probably be enriched as well. By using proper pruning, only the most significant leaves of the tree are provided. In some cases, a function expected to be affected does not appear simply because one of its descendant (and therefore more specific) functions is included in the output. The "classical" analysis can be performed by using a different algorithm that does not prune the significant functions.
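For a single GO term, the unpruned "classical" test amounts to a Fisher exact test on a 2x2 table of selected versus annotated genes. The base R sketch below illustrates this with made-up values: the gene names, t-statistics and term membership are all hypothetical, and it does not reproduce the graph-aware weighting that topGO's weight01 algorithm applies. The selection rule t < -3 is the one used in the code above.

```r
# Hypothetical t-statistics for 10 genes (made-up values for illustration).
t_stat <- c(g1 = -4.2, g2 = -3.5, g3 = -3.1, g4 = -0.4, g5 = 0.2,
            g6 = 1.1, g7 = -5.0, g8 = 0.9, g9 = 2.3, g10 = -0.7)

# Same selection rule as in the text: a gene is "underexpressed" if t < -3.
selected <- t_stat < -3

# Hypothetical annotation: the genes that belong to one GO term.
in_term <- names(t_stat) %in% c("g1", "g2", "g4", "g7")

# 2x2 contingency table: selected vs. not selected, in term vs. not in term.
tab <- table(selected = factor(selected, levels = c(TRUE, FALSE)),
             in_term  = factor(in_term,  levels = c(TRUE, FALSE)))

# One-sided Fisher exact test: is the term over-represented among selected genes?
res <- fisher.test(tab, alternative = "greater")

print(tab)
print(res$p.value)
```

Running this test independently for every term is what makes the classical output redundant: a parent term inherits the genes of its enriched children and tends to be reported alongside them.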