Global analysis of aberrant pre-mRNA splicing in glioblastoma using exon expression arrays

Background Tumor-predominant splice isoforms were identified during comparative in silico sequence analysis of EST clones, suggesting that global aberrant alternative pre-mRNA splicing may be an epigenetic phenomenon in cancer. We used an exon expression array to perform an objective, genome-wide survey of glioma-specific splicing in 24 GBM and 12 nontumor brain samples. Validation studies were performed using RT-PCR on glioma cell lines, patient tumor and nontumor brain samples. Results In total, we confirmed 14 genes with glioma-specific splicing; seven were novel events identified by the exon expression array (A2BP1, BCAS1, CACNA1G, CLTA, KCNC2, SNCB, and TPD52L2). Our data indicate that large changes (> 5-fold) in alternative splicing are infrequent in gliomagenesis (< 3% of interrogated RefSeq entries). The lack of splicing changes may derive from the small number of splicing factors observed to be aberrantly expressed. Conclusion While we observed some tumor-specific alternative splicing, the number of genes showing exclusive tumor-specific isoforms was on the order of tens, rather than the hundreds suggested previously by in silico mining. Given the important role of alternative splicing in neural differentiation, there may be selective pressure to maintain a majority of splicing events in order to retain glial-like characteristics of the tumor cells.


Background
In alternative pre-mRNA splicing, multiple transcript isoforms are expressed from a single gene by varying the combination of exons that are included in the mature mRNA. These isoforms may differ in their transcript and protein stabilities and/or in their protein structures and activities, which allows for functional and physiological diversity [1,2]. Alternative splicing affects up to 74% of all genes and may cause epigenetic instability when aberrant [3]. In cancer, two major mechanisms lead to the dysregulation of proper splicing: somatic mutations in splice regulatory cis-elements and mis-expression of trans-acting factors [4,5]. The second phenomenon has been reported in numerous cancers including glioma, ovarian and colon cancer [6][7][8][9][10][11]. Furthermore, many individual genes have cancer-predominant splicing patterns that contribute to tumorigenesis [5,12,13]. However, it is unclear whether the tumor-specific misexpression of splice factors leads to global aberrant splicing in cancer. Genome-wide attempts to address this have been performed mostly in silico by aligning and comparing EST libraries. Several hundred isoforms appear to be unique to tumor libraries, but these analyses are largely incomplete as they can miss known isoforms and are intrinsically biased in their scoring of single clones [14][15][16][17][18][19][20].
Of all tissues, the brain has the most cassette exon alternative splicing [21,22]. This tissue-specific splicing is responsible for proper neural cell differentiation and neurotransmitter signaling that may be misregulated to allow stem-cell like proliferation to form brain tumors [23][24][25][26][27]. Gliomas are glial-like tumors, with glioblastoma (GBM) being the most severe form [28]. Independent and in silico genome-wide assessments have identified genes expressing particular splice isoforms more frequently in glioma than in normal brain. Among the 27 individual (Table 1; see Additional file 1) and the five in silico studies (see Additional file 2), only three of the genes, BIN1, MAX and MPZL1 were in common. Because of these discrepancies, we performed a global, unbiased study using the human exon expression array (Affymetrix) to experimentally identify common glioma-specific aberrant splicing events and misexpressed splicing factors. Our data indicate that overall aberrant tumor-specific cassette exon splicing events involve small changes, less than 5-fold.
Few splicing factors had dramatically altered expression in glioma, but could be targeting the genes that were identified as having significant glioma-specific splicing in our study.

Exon array analysis
To measure significant alternative splicing changes associated with primary brain tumors, we compared genomewide exon expression levels of 24 grade IV glioblastoma (GBM) and 12 nontumor brain samples using the Human Exon Array 1.0 ST (Affymetrix, Santa Clara, CA) [29]. Multiple statistical algorithms were developed to identify alternative pre-mRNA splicing events (see Methods). The Differential Expression (DE) value describes the difference in the average expression of all exons for a given RefSeq entry between two groups of samples (tumor vs. nontumor). A DE = 0 indicates no change in transcript expression between the two groups, a DE < 0 and a DE > 0 indicates decreased or increased gene expression in the disease group, respectively. The Alternative Splicing (AS) score was generated to identify all types of alternative pre-mRNA splicing events as detected by differential hybridization of a PSR (probe selection region). The higher the AS score for a given RefSeq entry, the greater the extent to which at least one PSR deviated in its differential hybridization compared with all other probe sets in that RefSeq. To assign a biologically relevant parameter to the AS score, we performed modeling of a cassette exon splicing event for a 3-exon gene. This gave median theoretical AS scores when comparing a 5% level of exon inclusion in one sample with a 25%, 50%, and 100% inclusion level in the second sample of 38.3, 78.7 and 132.9, respectively. As a final parameter, we included a p-value calculation, which indicates the probability that an AS score would show the presence of alternative splicing. We focused on a core set of RefSeq entries (20,157) with p-values of less than 0.05, and used the DE value plotted against the AS score to evaluate the relationship between expression and change in alternative splicing.

Examination of FGFR1 glioma-specific splicing by exon array
In order to assess the specificity, sensitivity and feasibility of an array-based, genome-wide approach to identify alternative splice events, we determined the effect of altering a glioma-specific splicing event in U251 cells. Antisense morpholino oligonucleotides were used to switch exon 3 inclusion in FGFR1 mRNA precursors from its aberrant skipping to inclusion, as occurs in normal brain [30] ( Figure 1A Inset). Figure 1A shows the plot of DE vs. AS values for 20,157 core set RefSeq entries from this exon array experiment. The FGFR1-specific splicing switch was easily distinguishable for five FGFR1 RefSeq entries that include probesets that monitor exon 3 inclusion ( Figure  1A; see Additional file 3). The change in exon 3 splicing led to significant AS scores (81.99 to 91.68, p-value < 0.05; see Additional file 4), which were well above the median derived value for a 10-fold increase in exon inclusion. Finally, the DE values between 0.38 to 0.89 also agreed with RT-PCR results that showed little change in FGFR1 expression when splicing of exon 3 was altered.
The induced FGFR1 splicing switch also caused a large change in the splicing score of NRG1 (AS score of 82.57, p-value < 0.05). However, the hybridization map and RT-PCR validation suggest it is a cross-hybridization artifact (see Additional file 3). There were 16 other RefSeq entries, representing 11 genes that showed significant (p-values < 0.05) changes in exon 1 usage after treatment (see Additional file 4). These changes may be regulated by the effect of exon 3 inclusion on FGFR1 signaling. Overall, the U251 experiments confirmed that targeted changes in alternative splicing of cassette exons would be reflected in high AS scores and p-values < 0.05 on the exon array. These data indicated the general feasibility of the exon array and our analytical approach to identify cassette exon changes on a genome-wide level.

Detection of alternative splicing in GBM patient samples
Next, we compared the genome-wide exon expression levels in 24 GBM and 12 nontumor samples ( Figure 1B; see Additional file 5 for RefSeq entries with significant p-values < 0.05). The shape and distribution of the RefSeq values differed greatly for the patient samples compared to the FGFR1 experiment. Only four genes had AS scores above the derived median for a 10-fold change in splicing (CNTNAP4, HIST1H3B, MAL2 and MOBP; see Additional file 3). Unlike FGFR1, these genes had high levels of differential expression between nontumor and tumor samples, which appeared to impact their AS scores. CNTNAP4 and HIST1H3B are intronless genes that do not represent alternative splicing. The event in MAL2 involved exon 1 and could not be explained by alternative splicing. The event identified for MOBP was not amenable to RT-PCR verification (see Additional file 3). Finally, a significant 3' splice site event was predicted in the heat map of NKX6-2, which could not be confirmed by RT-PCR ( Figure 1A; see Additional file 3, and data not shown).
In the absence of readily identifiable large splicing changes, we went on to validate our algorithm parameters. From several hundred manually examined RefSeq entries, we chose to validate over 50 genes with hybridization heatmaps suggestive of cassette exon pre-mRNA splicing. These genes represented a broad range of AS scores, DE values, and p-values ( Figure 2A). Validation was performed by semi-quantitative RT-PCR on three glioma cell lines (U251, SNB19 and T98G), a normal brain control, and a subset of samples from the patient set used for the array experiments ( Figure 2B). We could make confident conclusions for 38 (76%) of these genes (see Additional file 6). We chose to exclude the remaining genes from our analysis due to low expression levels, which made it difficult to make definitive interpretations.  Figure 2; see Additional file 6). However, RT-PCR data suggested that most of these genes would have had AS scores exceeding the 5-fold threshold with the exception of NF1 and CLTB. Figure 2B shows the RT-PCR results of 10 representative genes. In total, we found 14 genes with glioma-specific splicing: A2BP1, APPA4, BCAS1, CACNA1G, CALD1, CLTA, CLTB, DYNC1I2, KCNC2, NF1, RTN4, SNCB, TNC and TPD52L2.
As an additional step to determine the accuracy of the exon array platform and our algorithm, we specifically queried previously reported GBM-specific splicing events within our dataset. Figure 3A plots the array values obtained in our patient set for 32 independently identified GBM-specific pre-mRNA splicing events (Table 1), represented by a total of 55 RefSeq entries (see Additional files 1 and 8). For all of these genes, the AS scores fell below the derived 5-fold change in splicing (the highest Evaluation of glioma-specific splicing events The arrows indicate the isoform(s) that is differentially expressed in GBM; the involved exons are schematically presented to the right (NB, nontumor brain; GBM, GBM tumor brain). MBP and UBE2C were not observed to generate GBM-specific bands. GAPDH was used as a loading control. MW, molecular weight marker. For hybridization intensity maps for the highlighted genes see Additional file 7.
An array-based examination of published and in silico-predicted glioma-specific events  Figure 1B showing RefSeq entries that monitor published glioma-specific splicing events ( . CACNA1G had enhanced exon inclusion in the glioma cell lines, with a less pronounced change in the GBM patient samples (data not shown). The CALD1 differential exon inclusion event was present predominantly in glioma cell lines and GBM patient samples. For TNC, the GBM patient samples had more exon inclusion compared with the nontumor brain samples, which was opposite to the cell lines and previously reported findings [31]. In NF1, both cell line and patient samples had greatly enhanced inclusion of exon 23A, which agreed with published findings [14]. The results for FGFR1 were inconclusive since the glioma cell lines showed the expected predominant skipping of exon 3 compared with our normal brain control, but enhanced exon skipping was not as prominent in the patient samples. The three remaining genes (MBD1, MDM2 and PECAM1) had no detectable GBM-specific splicing, consistent with their low AS scores (circled in Figure 3A). Therefore, the RT-PCR analyses on these seven published events were consistent with arrayderived results except for exon 23A of NF1, which was not represented by an established RefSeq.

Low concordance with in silico studies
Previous in silico studies identified at least 186 genes with purported glioma-specific splicing events [15][16][17][18]20]. ; they were also listed in one of these reports [18]. MBP and UBE2C had p-values < 0.05 (Figure 2A and 3B). However, RT-PCR analysis failed to validate the presence of a consistent glioma-specific splicing event ( Figure 2B).

Expression profiling of RNA processing factors
Recently, upregulation of the splicing factor SF2/ASF was shown to be oncogenic [11]. To determine changes in splicing factor expression, we also performed expression profiling on a subset of 10 GBM and 10 nontumor sam-ples using the established U133 Plus 2 expression array. Overall, only 13 of 499 probe sets queried (2.6%) showed significant differences (> 4-fold, p-value < 0.05) in expression levels between GBM and nontumor samples (see Additional file 9). To determine whether splice factor expression could be used as a marker of gliomagenesis, we performed an unsupervised clustering analysis. In this clustering, the samples separated into GBM and nontumor groups with the exception of a single nontumor sample (data not shown). Clustering of the 25 most differentially expressed probe sets with significant p-values representing 19 unique genes (see Additional file 10) is shown in Figure 4. The majority of these genes are not known to be associated with alternative splicing. However, three splice factors could be linked to GBM-specific splicing events having p < 0.05 (see Additional file 9): A2BP1 regulates CLTB, GRIN1, MAG, NF1 and SCN8A splicing, PTB regulates CLTB, GABA, GRIN and FGFR2 splicing, and CUGBP2 regulates RASGRF1 splicing [32][33][34][35].

Discussion
Aberrant pre-mRNA splicing may be an important epigenetic factor for tumor progression. However, it is unclear how many genes are mis-spliced in a given tumor and whether aberrant expression of splice factors is responsible for their appearance. We used an exon array and designed analytical algorithms and parameters to identify GBM-specific splicing events in an unbiased manner. By plotting scores for differential expression (DE) against those for alternative splicing (AS) for genes and exons interrogated by the exon array, we were able to distinguish a single targeted induced splicing change in FGFR1 among 20,157 RefSeq entries and to monitor the concomitant splicing and gene expression changes. Using the same approach for comparing an extended GBM and nontumor sample set, GBM-specific splicing events of similar magnitude were not identified, which suggests that large-scale aberrant splicing in GBM are infrequent. We do not discount the fact that individual instances of dramatic changes in splicing were present; however, they were not shared by the majority of samples. The lack of events with large changes in differential exon expression led us to mine our data for splicing changes with at least a 2-fold change in probe set hybridization (AS score) and a p-value < 0.05. In the hundreds of heatmaps examined, there were many changes indicative of the usage of alternative promoters or polyadenylation sites. However, we chose to focus on cassette exons as events that could be readily examined by RT-PCR. This led to the validation of 14 GBM-specific events: A2BP1, APPA4, BCAS1, CACNA1G, CALD1, CLTA,CLTB, DYNC1I2, KCNC2, NF1, RTN4, SNCB, TNC and TPD52L2. Moreover, our expression profiling analysis indicated that there were relatively few GBM-specific changes for splicing regulators. Among the genes with the greatest differential expression only A2BP1, CUGBP2, ELAV1, MBNL2, PTBP1 and YBX1, have known functions in alternative splicing. At least three of these (A2BP1, PTBP1 and CUGBP2) could be linked to GBMspecific splicing events.
The identified and validated GBM-specific isoforms encode proteins that primarily affect cell growth and mobility. A2BP1, which shows both differential expression and splicing, is a neuronal-specific splicing regulator for multiple targets [36]. CLTB and DYNC1I2 are involved in intracellular transport and may play a role in cell migration [37,38]. Four genes have clearly identified functions in the central nervous system: APPA4 functions in Notch signaling during neural development, cell adhesion and apoptosis; RTN4 is a neurite growth inhibitor; and SNCB, which is upregulated in glial tumors, is thought to regulate phospholipase D2 activity. NF1 is believed to be a glial-cell marker and mutated in multiple CNS tumors [14,39]. For the remaining genes, little is known about their function in normal brain or gliomagenesis. Comparing glioblastoma and oligodendroglioma as two histological glioma subgroups on the same exon array platform that we used here, French and colleagues recently identified a total of 11 differentially expressed splice variants [40], one of which overlapped with our validated genes (CAMK2A) (see Additional file 1). In the exon array study for prostate and colon cancer, only CALD1 was in common with our validated gene list [29,33]. Therefore, it is unclear whether common pathways are targeted for splicing changes during tumorigenesis. It remains to be determined whether these splicing targets have a synergistic effect on gliomagenesis.
Genes with glioma-predominant splice isoforms have previously been identified through global EST alignments.
MAX was the only gene found both experimentally and in silico (see Additional files 1 and 2). Despite using similar datasets, the number of in silico derived genes with GBMspecific isoforms varied and only three genes were found to be shared between two of the five studies: AP2A1, CPNE1 and KPNB1 [14,15,17,26,27]. The lack of agreement between all of these studies can be explained by several technical and biological factors. First, the small sample sizes used did not allow for statistical calculations. Second, sample heterogeneity affected normalization and interpretation. Third, the nature of the samples being compared, which can be normal or nontumor vs. tumor, or tumor type A vs. tumor type B. For EST libraries, the true splicing frequency could be masked by too few ESTs, normalization strategies, and/or low-stringency criteria that enriched for rare ESTs [41,42]. The bias towards 3' and 5' ends of transcripts could also lead to the under-representation of isoforms with internal differences [41,42]. In contrast, using a large sample set on exon arrays circumvented these problems and allowed for objective measurements of isoform frequencies. It should be noted, however, that array-based studies are limited by the quality of the target preparation, probe specificity and sensitivity, and for the Affymetrix platform we used the lack of interrogation of exons < 25 bp., and can be confounded by tumor tissue heterogeneity. Of our 14 validated GBMassociated splicing events, eight (CAMK2A,CACNA1G, CALD1, CLTA, NF1,RTN4, TNC and TPD52L2) were previously reported (see Additional files 1 and 2). Most of the genes that were discordant had splicing changes that were outside the range of detection for the array (less than 2fold). Two additional genes (MBP and UBE2C) had a pvalue < 0.05 that were concordant with in silico determined genes, but could not be validated by RT-PCR (Figure 2B). Finally, the lack of complete agreement in all of these gene lists could be due to the overall low level of aberrant splicing in GBM.
Many studies have shown that overexpression of a single cancer-enhancing isoform is sufficient to alter glioma cell proliferation or invasion [43][44][45]. What remains unclear is how these specific isoforms are produced. Many cancers have over-or under-expression of splicing factors, which suggests that global aberrant splice events are possible. However, our analysis revealed that aberrant splicing factor expression does not lead to either large changes in specific exon utilization or widespread changes in the splicing of multiple targets. It is likely that titration of levels of multiple splicing factors is required to "fine tune" splicing decisions.

Conclusion
The relatively small number of aberrant pre-mRNA splicing events that we observed in our GBM sample set suggests that systematic, epigenetic targeting of splicing that leads to large changes may not be an important mechanism in gliomagenesis. Other exon array studies measuring tumor-specific aberrant alternative pre-mRNA splicing in prostate cancer and colon cancer also support the finding that extreme splicing changes in cancer are less frequent than was suggested by in silico studies [29,33,40,46]. We found that only 612 of 20,157 (3%) fully annotated RefSeq entries on the exon array showed significant changes in exon expression. We interpret our results to indicate that aberrant pre-mRNA splicing in GBM is a low frequency event. However, this analysis does not rule out patient-or gene-specific aberrant splicing events, or smaller magnitude splicing changes with critical functions in gliomagenesis. Furthermore, the differential expression of several RNA processing factors not involved in alternative splicing suggests that other aspects of RNA biology may play critical roles in gliomagenesis. Our validation experiments indicate that GBM-specific splicing is generally a partial event, with varying degrees of exon inclusion. The 14 genes identified in our study are potentially the most important GBM-specific splicing events and constitute promising targets for therapeutic intervention.

Cell line information
The U251, SNB19 and T98G cell lines were grown as previously described [30]. Morpholino oligonucleotides targeting the intron splicing silencer sequences flanking FGFR1 exon 3 was performed by double "scrape-loading" as previously described [30]. RNA isolation was performed using the RNeasy Micro kit according to the instructions provided (Qiagen, Valencia, CA).

Patient information
Informed consent was obtained from all patients prior to sample collection in accordance with the guidelines set by the Institutional Review Board. Biopsy samples from surgical resections were collected and banked through the brain tumor tissue bank at the Brain Tumor Center at the University of Texas M. D. Anderson Cancer Center. All samples were immediately placed on ice and snap-frozen for -80°C storage within 30 min of devascularization after removal of portions needed for pathological diagnosis. We obtained 24 newly diagnosed primary glioblastoma (WHO grade IV astrocytoma; K. Aldape) samples from patients who received no therapy, but underwent gross total resection before sample collection. Glioblastoma samples contained at least 90% tumor. 12 frozen samples of nonneoplastic brain tissue without histologic evidence of tumor or another significant abnormality were used for comparison. RT-PCR validation also included Human Brain Total RNA, extracted from a 78-year-old Caucasian female with congestive heart failure (Ambion, Austin, TX).

RNA extraction
RNA was extracted using the TriZol Reagent according to the manufacturer's suggestions (Invitrogen, Carlsbad, CA) and further purified using the RNeasy kit. The quality and integrity of the RNA was then analyzed on an Agilent Bio-Analyzer (RNA 6000 Nano LabChip). Total cellular RNA samples with a RIN (RNA integrity number) > 7 were used for further microarray studies. Reactions were incubated at 70°C for 5 min. Magnetic beads used for the ribo-reduction step were washed twice with RNase-free water, once with hybridization buffer with betaine and resuspended with 20 μl of hybridization buffer with betaine. The magnetic beads were warmed at 37°C for 2 min prior to the addition of the RNA poly(A) control probe. Following the 70°C incubation, the samples were immediately cooled on ice for 2 min. Samples were then added to the magnetic beads, mixed, centrifuged and incubated at 37°C for 10 min with a mix after 5 min. Following incubation the samples were placed on a magnetic stand for 5 min, and the supernatant was removed and placed on ice. The beads were washed with hybridization buffer and betaine and incubated at 50°C for 5 min. The samples were then placed back on the magnetic stand for 5 min, and the supernatant was removed and placed on ice. The rRNA-reduced RNA was concentrated using the GeneChip IVT cRNA Cleanup Kit (Affymetrix). One μl of concentrated RNA was analyzed on an Agilent BioAnalyzer using the RNA 6000 Nano LabChip to check the quality and percent of ribosomal reduction of the starting material. Percent ribosomal reduction of the RNA ranged between 60-90%. Four μl of ribosomal reduced RNA was added to 500 ng/μl of T7- Reactions were mixed and briefly centrifuged then incubated at 99°C for 5 min, cooled to 45°C for 5 min and then centrifuged at maximum speed for 1 min to collect precipitate. Exon arrays were equilibrated to room temperature prior to injection; 200 μl of the hybridization solution was added to each array and incubated in a rotating hybridization oven (Affymetrix) at 45°C and 60 rpm for 16 hrs. Following hybridization, the arrays were washed and stained on an Affymetrix Fluidics 450 workstation using the FS450_0001 fluidics script, and scanned on an Affymetrix GeneChip 3000. GeneChip Operating Software (GCAS) v1.3 was used to produce .cel intensity files.

Exon array data analysis
Exon arrays were quantified using the PLIER algorithm introduced by Affymetrix. The arrays were quantile-normalized, and GC-specific background was estimated and subtracted using the PM-GCBG option. All quantifications used log 2 (PLIER + 8) values, where the value "8" is an arbitrary shrinkage constant. PLIER summarizes groups of probe-level intensities, and we used two different groupings: (1) all probes within a single PSR, and (2) all probes within a given gene. Genes were defined as Ref-Seq clusters using groupings of PSRs supplied by Affymetrix. While many PSRs on the array are not part of a RefSeq cluster, we chose to focus primarily on those where we had good annotation. Exon numbering within the gene was checked by mapping the reported sequences against data from the UCSC genome browser (Build 16).

Alternative splicing
In order to identify alternative splicing within a gene, we determined instances where the PSR-level quantifications differed significantly from the gene-level quantifications. For example, let us consider the case of a hypothetical gene with several component PSRs (Gene A). The genelevel PLIER quantifications suggest mean log 2 intensities of 8 in GBMs and 7 in nontumor brain (a two-fold increase in expression in GBMs). Quantifications of the first PSR suggest mean values of 7 in GBMs and 6 in nontumor brain, again showing a two-fold increase in the GBMs, consistent with the overall behavior of the gene. Taking the mean difference for the PSR (7 -6 = 1), we subtract the mean difference for the gene (8 -7 = 1) to get the amount by which the behavior of the PSR differs from that of the overall gene; large differences specific to the PSR indicate potential alternative splicing. While "large" can be assessed in terms of fold-change, we prefer to scale the observed (PSR change) -(gene change) by an estimate of the within group (GBM or nontumor) standard deviation, as in a standard t-test. For example, let us assume that these differences for one PSR have values of (0 -2, 0 -1, 0, 0 + 1, 0 + 2) for the GBM and (2 -2, 2 -1, 2, 2 + 1, 2 + 2) for the nontumor samples. The mean difference between groups is 2. Now let us assume that for another PSR we see (0 -6, 0 -3, 0, 0 + 3, 0 + 6) vs. (2 -6, 2 -3, 2, 2 + 3, 2 + 6). The mean difference (on a log 2 scale) is still 2, but the var-iability is a lot higher. The latter PSR would get a smaller t-value than the former, since the signal does not stand out as clearly above the noise. This approach is very similar to starting with a standard two-way Analysis of Variance (ANOVA) with main effects for condition (GBM or nontumor) and PSR, and testing for a significant interaction between condition and PSR. There, the scaled PSRgene differences are squared and summed to give the test statistic. However, since we are looking for differences in cassette exons (a very small number of PSRs), and summing spreads the differences across several PSRs, we decided to work first with the squared terms for each individual PSR and look at the size of the largest of these terms [the maxT 2 , which is our Alternative Splicing (AS) score]. This, in turn, is similar to the ANalysis of Splice Variation (ANOSVA [47]) save that separate variance estimates are computed for each PSR. This focus on PSR-specific variation also differs from the correlation-based method used by French et al. [40]. We estimated the null distribution of this statistic through simulations. Some of the PSRs found using this approach were driven by a combination of "dead probes" and gene-level differential expression: e.g., the mean level of gene expression went from 6 in the GBMs to 9 in the nontumor, but the values for one PSR stayed fixed at 3 throughout. While this PSR behaves differently from the rest of the gene, it may be doing so because the probes simply fail to hybridize to the target at all. In order to address this problem, we added a further filtration step. First, every PSR was assessed a prior probability of being measurable by taking the PLIER values across all samples from PSRs that were part of a RefSeq gene, forming a histogram, and modeling these expression values as coming from a mixture of a normal distribution (noise) and an exponential distribution (signal). Then, using the estimated parameters of the normal distribution, the prior probability with mean intensity × is normcdf(x, mu+3 * sigma, sigma 2 ). PSRs close to the noise level are not likely to be measurable. Based on the observed data, this prior is updated to give a posterior distribution suggesting whether the PSR is measurable. If the PSR shows reasonable variation or difference in expression between conditions, the posterior probability of measurability should be near certainty. Updating was performed by computing p-values for PSR specific t-tests without adjusting for gene-level differences; the associated p-values were then used to compute posterior odds using the method of Sellke et al. [48]. We then computed p-values (using the conservative assumption that the maxT 2 distribution was the maximal order statistic from k independent tests) for each inclusion/exclusion combination, i.e., there are 2 k-k-1 possible combinations of "include this PSR, exclude that one" for defining a gene if we omit the null cases of 0 or 1 PSRs from consideration. For each such combination, we compute a T 2 value and p-value as noted above, and we also assign a weight proportional to the product of the probabilities of inclusion (exclusion) used. The weighted sum of these is the p-value reported here. In the case of the "dead" PSR discussed above, the terms with that PSR included would have small p-values (it is different from all of the others), but low weight (because that PSR is deemed not measurable), and the final weighted p-value will be large.
We also determined modal values of the test statistic via simulation. Our simulations assumed the "ideal case." The mean expression levels for individual PSRs were chosen from the interval [3,12], reflecting the range of log 2 PLIER scores observed; normal noise was added. Variability was allowed to be different for each PSR, with the variance drawn at random from the set of PSR variances observed in our studies. Pooled variance estimates were used. Differences were concentrated in a single PSR, with the size of the difference being log 2 of 5, 10, or 20, respectively. A difference of 0 was also examined to supply estimates of the null distribution. Simulations were run for genes involving 4, 8 and 20 exons.
Differential expression Differential expression at the gene level was assessed using 2-sample t-tests. This is our measure of Differential Expression (DE). We corrected for multiple testing by using betauniform mixture (BUM) models [49] and targeting a falsediscovery rate (FDR) of 5%. An arbitrary constant (0.1) was added to the pooled variance estimates before the tstatistics were computed to ensure a certain minimal foldchange that was statistically significant.

Microarray expression profiling Target preparation
For RNA expression profiling on the U133 Plus 2 Gene-Chip (Affymetrix, Santa Clara, CA), a total of 5 μg of total cellular RNA from each sample was used for cDNA synthesis according to the manufacturer's protocol. Briefly, a mixture of in vitro transcribed cRNAs of cloned bacterial genes for lysA, pheB, thrB, and dap (American Type Culture Collection) was added as external controls to monitor the efficiency of cRNA synthesis. First-strand cDNA synthesis was performed at 42°C for 1 hr with the Superscript II system (GIBCO/BRL) at a final concentration of 1× firststrand synthesis buffer, 10 mM DTT, 500 μM dNTPs, 100 pmol of T7-(T) 24  Expression array data analysis For the U133 Plus 2.0 expression array, we identified a comprehensive set of 499 probe sets representing RNA processing factors (RPFs) with known functions in general and alternative splicing, RNA export, RNA degradation, miRNA processing and nonsense-mediated decay (see Additional file 9). Gene-level RMA quantifications were used for unsupervised clustering analysis on the 499 probe sets. To identify the most relevant RPFs, we performed two-sample t-tests for each gene to contrast the groups revealed by the clustering analysis. Values with a pvalue < 0.05 were extracted and sorted by the difference in mean expression between the two groups. Values for the top 25 positive or negative mean expression differences were then used to perform a second clustering analysis shown in Figure 4.

RT-PCR validations
cDNA was generated from 2 μg of DNaseI treated brain, tumor, or cell line RNA using standard methods. Equal amounts of random decamer and oligo-(dT) primed cDNA were pooled and diluted to 200 μl with molecular grade H 2 O. For PCR, 5 μl for patient samples and 3 μl for glioma cell lines of cDNA were used. PCR was carried out in 1× PCR buffer (Invitrogen, Carlsbad, USA), 1.5 mM MgCl 2 , 0.25 mM dNTPs, 20 pmol of each forward and reverse primer (for primers see Additional file 111), and 2.5 U of Taq Polymerase (Invitrogen, Carlsbad, USA). The reaction was carried out for 35-37 cycles at 55-62°C annealing depending on the primer set and transcript abundance. The PCR products were electrophoresed on 1% agarose gels and visualized with ethidium bromide staining.
and wrote the statistical sections of the manuscript. ST performed statistical analyses of exon array data and generated data tables. LLB contributed to experimental design, data interpretation, and writing of the draft manuscript. VLN and TJN performed all exon array experiments. KDA ascertained the patient samples and their clinical information. GJC and RK designed the experiments, interpreted exon array data and analyses, performed the RNA processing factor analysis, and wrote the final manuscript. RK supervised the overall project. All authors read and approved the final manuscript.