Prediction of alternatively skipped exons and splicing enhancers from exon junction arrays
© Kechris et al; licensee BioMed Central Ltd. 2008
Received: 15 March 2008
Accepted: 20 November 2008
Published: 20 November 2008
Alternative splicing of exons in a pre-mRNA transcript is an important mechanism which contributes to protein diversity in human. Arrays for detecting alternative splicing are available using several different probe designs, including those based on exon-junctions. In this work, we introduce a new method for predicting alternatively skipped exons from exon-junction arrays. Predictions based on our method are compared against controls and their sequences are analyzed to identify motifs important for regulating alternative splicing.
Our comparison of several alternative methods shows that an exon-skipping score based on neighboring junctions best discriminates between positive and negative controls. Sequence analysis of our predicted exons confirms the presence of known splicing regulatory sequences. In addition, we also derive a set of development-related alternatively spliced genes based on fetal versus adult tissue comparisons and find that our predictions are consistent with their functional annotations. Ab initio motif finding algorithms are applied to identify several motifs that may be relevant for splicing during development.
This work describes a new method for analyzing exon-junction arrays, identifies sequence motifs that are specific for alternative and constitutive splicing and suggests a role for several known splicing factors and their motifs in developmental regulation.
Eukaryotic gene expression is controlled by a series of biological events involving various interactions among proteins, DNA and RNA that are subject to complex regulation. One of the essential processes is pre-mRNA splicing, in which the spliceosome complex recognizes splice sites (SS) in the precursors of mRNA following transcription, removes noncoding introns and joins neighboring exons to form mature messenger RNA that can be further translated into protein sequence. Both 5' splice sites (marking exon-intron boundaries, or donor sites) and 3' splice sites (marking intron-exon boundaries, or acceptor sites) consist of short basal signals well conserved from yeast to human, but contain insufficient information to accurately identify exon-intron boundaries in vertebrates . Consequently, there are numerous cryptic splices embedded throughout the pre-mRNA transcript. Variations in the splice site detection process can create multiple transcript variants for a gene and is referred to as alternative splicing. Alternative splicing events may include the skipping or retention of entire exons, intron retention or alternative usage of 3' or 5' splice sites. These changes often lead to modifications in the encoded proteins and have been shown to play a critical role in development and disease [2–4]. Alternative splicing also provides an efficient means to expand protein diversity from the limited gene pool. It is estimated that 50–80% of the approximately 25,000 human protein-coding genes are subject to alternative splicing [5–7] underscoring the importance of splicing regulation.
The correct recognition of splice sites is facilitated by various protein-protein and protein-RNA interactions. In addition to exon-bridging or intron-bridging interactions of splice sites, there are many non-splice-site sequences in exons and adjacent introns, termed enhancers or silencers, that stimulate or repress the splicing of constitutive or alternative exons. Both enhancers and silencers have been identified through in vitro and in vivo methods [8–11]. Large-scale identification of these sequence elements are based on computational methods which analyze multiple sequences simultaneously. Some authors have focused on constitutively spliced exons [12–15], while others have focused on identifying the sequence signals that discriminate between constitutively and alternatively spliced exons [16–21]. More recent methods now take advantage of the availability of genomes from related species to make predictions based on evolutionarily conserved sequences [22–25]. Although motif search methods such as  identify sequences that are similar to previously determined enhancers and silencers, most computational approaches require as input a large collection of alternatively spliced exons.
The most widely used method to identify alternative splice variants is based on aligning EST and mRNA sequences to the genome [27–29]. Application of this method has provided estimates for the percentage of human genes with multiple splice variants between 40% and 60% [27–29]. However, EST-based methods are often biased towards the transcript ends, prone to sequencing errors, biased towards more highly expressed genes, limited by the availability of clone libraries for particular cell and tissue types and not uniformly annotated. The recent adaptations of microarray technology in the form of splice arrays are now providing new directions for detecting alternative splicing. DNA microarrays were originally developed to measure expression levels for a large number of genes simultaneously. Arrays for detecting alternative splicing (splice arrays) contain probes specifically designed to investigate splice events. They have been used to explore splicing in yeast.[30, 31] and in mammalian cells (e.g., [6, 32–34]). Several different designs reviewed in  have been developed including tilling arrays, arrays with specific probes that distinguish between known splice forms [32, 36], arrays based on exon probes [37, 38], exon-junction probes [6, 39] or both [32, 36, 40, 41]. Most of the data analysis methods for splice arrays and levels of alternative splice inference depend on the array design (see review in ), and the design of these arrays typically rely heavily on known or predicted annotation of gene structures. In summary, splice arrays have been shown to complement the EST-based approaches by identifying a large number of novel alternative splices with no previous EST support, despite a moderate validation rate of ~50% by RT-PCR [6, 42, 43].
In this paper, we propose a new statistical approach for identifying alternatively spliced exons from exon-junction arrays, and predict a large set of alternatively skipped exons. This is achieved using the most comprehensive exon-junction array analysis to date , which monitored all exon junctions of over 10,000 RefSeq genes in 52 tissues samples. Our approach consists of a novel exon-skipping score that serves as a quantitative measure of evidence for alternatively skipped exons. By applying kmer-contrast and regression-based sequence analysis methods to the top ~8400 exons according to our score, we are able to recover several known splicing enhancers and identify additional novel candidate splicing regulatory motifs associated with skipped exons. Finally, we also identify a set of development-related alternative splices and their associated enhancers using a tissue-pair analysis followed by de novo motif finding algorithms. Enrichment analysis of gene ontology annotation supports the functional roles of the predicted development-related alternative splices and suggests a new scheme for identification of process-specific alternative splicing.
Results and discussion
Analysis of exon-junction array
The pre-processed array data described in  were obtained from the NCBI Gene Expression Omnibus (GEO) database with accession number GSE740. The dataset contains the average background subtracted intensity yi, j, kfrom two dye-swapped arrays for each exon-junction probe j of gene k in tissue i, where i is one of the 52 tissues surveyed and j = 1, ..., n k for gene k, which has n k probes bridging n k +1 exons. Note that intensity patterns of exon-junction (EJ) probes alone cannot distinguish between intron retention or alternative 5' or 3' splice site usage, and hence we only focus on the effective detection of exon skipping in the 8618 genes containing 5 exons or more (n k ≥ 4 on the array). For each gene, we fit a linear model with terms for probe and tissue effects. Because of the large number of effects that need to be estimated, we must restrict the analysis to genes where there are enough data points. We found that a minimum of 5 exons was an adequate cutoff that avoids over-fitting in our statistical analysis but does not filter too many genes. For each gene, 5 exons corresponds to 4 exon-junctions (4 data points per tissue) and consequently 3 adjacent exon-junction pairs.
If we treat exon inclusion as a simplified binary "on" or "off" event in each tissue, then conceptually, there are three main signal sources, other than noise, for the observed EJ probe intensity yi, j, k: (i) probe-specific effect describing sequence binding affinity, (ii) tissue-specific effect resulting from differential expression, and (iii) alternative splice (AS) effect determined by exon inclusion or alternative splice site selection. A simple approach to recover the variance components of individual signal contribution is to use linear models on the properly transformed data (EJ probe intensities), essentially assuming that the transformed intensity is the sum of the probe, tissue, AS effect and white noise. A skipped exon will result in the two spanning EJ probes to be switched "off". This supports an exon-skipping score that measures the magnitude and concordance of AS effects in adjacent EJ probes spanning the same exon. Identifying exons that are alternatively skipped in some of the tissues surveyed is then equivalent to the detection of scores that deviate across tissues. In summary, our data analysis method can be described as a three-step procedure:
1. Estimate the variance stabilizing transformation  for the EJ probe intensities to satisfy the constant variance assumption for linear models.
2. Fit a linear model on the transformed data to estimate probe and tissue effects. The residuals of the fit ri, j, krepresent AS effects adjusted for probe- and tissue-specific effects.
3. Summarize exon-skipping events at the exon and gene level by defining an exon-skipping score based on the residuals ri, j, kand test for large deviations of this score across tissues.
Note that , among others (e.g., ), also used a linear model on log-transformed (step 1) intensities but estimated linear model parameters by medians (step 2), followed by ad hoc thresholding of the residuals r i,j,k to define AS scores (step 3). We considered several alternatives to this approach. First, the original method of  only examined individual exon-junctions by thresholding the residuals into four categories from 0,1,2 and 3. The largest residuals, those given a score of 3, were predicted as splicing events. For each junction, the authors then tallied the number of tissues where the residuals had a score of 3. There appeared to be no systematic evaluation of neighboring pairs of exon-junctions in this scoring procedure. Therefore, as an alternative, we calculated the novel statistic of residual products so that both flanking exon-junctions for an exon contribute to its score. Second, our preliminary analysis (data not shown) indicated that the logarithm function used by  to stabilize the variance may be insufficient. Therefore, we also tried alternative transformations. Third, we used statistical significance cutoffs to threshold the residuals (or residuals products).
In particular, at each stage of the three-step procedure, we considered several alternative methods for the same objective: applying the logarithm or arcsinh-based variance stabilizing transformation (VSN) function for stabilizing variances in step 1; using the standard least square fit (mean) or robust fit (median or median polish) for linear model parameter estimation in step 2; and measuring deviations from normality of residuals in step 3 using the non-parametric Kolmogorov-Smirnov (KS) test, a weighed non-parametric Kolmogorov-Smirnov test (WKS), parametric Wilk-Shapiro test (WS)  or simple thresholding of the coefficient of variation (CV, standard deviation normalized by the mean) on the residuals or residual products (see Methods). We viewed the selection of these options as an optimization problem, and determined the best procedure in terms of prediction accuracy using control data. We used two sets of alternatively and constitutively spliced exons curated from independent sources as controls (see Methods). The positive control (AS) consists of 164 genes supported by EST data. The negative control (CS) consists of 282 genes selected from the analysis of an Affymetrix exon array experiment that show constitutive splicing across a panel of 11 tissues. These two examples represent "approximate" controls because they rely on available ESTs or other data on selected tissues. However, we expect that they will be dominated by the desired events.
To further validate these results, we examined the WKS scores on another set of positive controls; the positive RT-PCR results in  and the 2656 RefSeq entries on the array with "cassetteExon" events from the UCSC "knownAlt" table hg18 annotation . Although the RT-PCR set also contain genes with alternative splice site selections and intron insertions, the two new combined positive controls had WKS-statistics significantly greater than the negative control (Mann-Whitney test p-value = 4.9 × 10-4). Several of the highest scoring examples using the WKS method, such as UBA3 (ubiquitin-activating enzyme 3 isoform 1), MAPKAP1 (mitogen-activated protein kinase associated) and PPM1B (protein phosphatase 1B isoform 1) were identified as "cassetteExon" gene entries from UCSC but were low scoring using the CV method, which only uses data from a single exon-junction. This illustrates that the combined product score based on neighboring exon-junctions can predict known examples which are missed by single junction methods. In summary, given the original linear model in , we derived a new exon-skipping score that 1) is based on an alternative transformation which improves the correction for non-constant variance and 2) draws on information from both splice junctions of an exon using the product residuals.
Using a score cutoff that balances the rate of true positive and false positives in our original two controls, we identified a set of alternatively spliced genes and corresponding exons (see Methods). We then determined the lowest ranking genes and exons according to our score to obtain an equivalently sized set of constitutive spliced exons. In total we predicted 8433 alternatively skipped exons and 8113 constitutive exons. An analysis of the Gene Ontology (GO) terms  for our predicted alternatively spliced genes (see Methods) using GOstat  showed enrichment in 114 GO terms. More than half of these GO terms are related to metabolism, cell death, regulation, transcription, splicing and protein targeting and localization (See Additional file 1, Table S1), which are categories consistent with prior studies [19, 29, 45].
Sequences associated with alternatively splicing
To identify sequence motifs associated with the regulation of alternatively skipped exons, we adapted the contrast-kmer-based RESCUE-ESE algorithm for splicing  and the regression-based REDUCE program for transcription factor binding sites . The RESCUE-ESE algorithm accounts for the fact that splicing enhancers compensate for weak splice site signals, and finds candidate exonic splicing enhancers (ESE) for constitutive splicing from statistically over-represented hexamers in exons versus introns and associated with exons defined by weak splice sites versus strong splice sites. REDUCE enumerates all possible kmers in the promoters up to a specified length and uses a linear regression model to find kmers that show significant correlation with gene expression levels from a single microarray experiment. To identify putative splicing regulatory elements associated with alternative splicing, we used different contrast sets for RESCUE-ESE and employed our exon-level statistic for REDUCE on exons and their flanking intron sequences.
Exonic splicing enhancers (ESE) for alternatively and constitutively spliced exons
We also focused the contrast analysis on constitutively spliced exons to select for ESEs relevant for constitutive splicing using CE versus AE and exons versus introns comparisons and found 23 motifs for the 5' analysis and 12 motifs for the 3' analysis. The majority of motifs detected are CG-rich with diverse consensus patterns, most likely reflecting the coding bias of CE (Figure 4 and Additional file 1, Tables S5 and S6). Besides the CG-rich motifs, 6 of the 23 non-redundant motifs found also overlap known motifs or natural occurring enhancers, including CA-rich motifs (CAAC, CCAC) for splicing factor YB1, TGCCGTT for SC35 and functional SELEX motifs for Schaal-II-D (TCTCC) . The resulting motifs were similar when we also restricted the comparison to weak splice sites. Finally, no motifs were found to be associated with strong splice sites in support of CE, consistent with the notion that splicing enhancers are required only when splice sites do not contain enough information in order to facilitate unambiguous exon recognition.
Significant kmers found by the REDUCE algorithm.
We applied similar contrast analysis to identify intronic motifs associated with AE by selecting for kmers over-represented in introns flanking AE versus CE-associated introns and in introns versus exons. We found A/T rich motifs associated in introns flanking AE and pyrimidine tract-like motifs and G-triplets in introns flanking CE (see Figure 4, Additional file 2). Randomization runs were performed on the exon data to assess the frequency of observing the predicted kmers in random data. The results indicate that we are observing more predicted kmers than expected by chance (see Additional file 2).
Development-related alternative splices derived from tissue-pair analysis
These results imply that some SR proteins may be involved in a general developmental response. To explore this hypothesis, we further investigated the exon-junction array, which has extensive results from different tissues (e.g., liver), developmental stages (e.g., fetal) and disease cell lines (e.g., lung carcinoma). Therefore, it is possible to identify AS involved in specific biological process by examining splice patterns between tissue pairs. For instance, genes that show different splicing profiles in fetal versus adult brain tissues may be indicative of functional roles in the regulation of brain development through splicing control. By comparing fetal versus adult tissues for several different organs, we adapted our gene-level exon-skipping score to obtain alternatively spliced genes that are associated with development.
Development-related alternatively spliced genes.
Genes with PubMed tissue keyword
% of all genes with keyword
Genes with "splicing" keywords
% of all genes with keywords
Genes with PubMed tissue, "development" or "splicing" keywords
% of all genes with keywords
To systematically assess the functional validity of our predictions, we analyzed the functional annotation of the predicted genes in the literature. Tissue-specific keywords were searched in the abstracts listed in the PubMed entries for each gene (see Methods). For example, for our lung-developmental genes (column 1 in Table 2), we searched for the keyword "lung" in each abstract, and a p-value was derived for tissue-specific keyword enrichment compared to all genes on the exon-junction arrays using Fisher's exact test (Table 2). We also specifically searched for a splicing-related keyword to check that the tissue annotations are not only due to transcription related information. In summary, all our predicted tissue-specific gene sets showed statistically significant (p < .05) enrichment for the respective tissue-specific keyword, splicing-related keyword and for a combination of tissue-, splicing- and development-related keywords. A separate analysis of the GO terms in the 127 predicted brain-development associated genes with the "brain" keyword in PubMed showed functional enrichment in 26 GO terms using GOstat (Benjamini-corrected false discovery rate < 0.01). The most common GO terms (11 out of 26) were those with functions related to signaling and ion transport (Additional file 1, Table S12), which are relevant for synaptic transmission and neurogenesis. In summary, the two annotation analyses collectively support the validity of our predicted genes.
Experimentally verified binding sites in their natural context and from SELEX experiments are listed in Additional file 1, Table S14 for the three SR proteins that we identified based on the expression data as possibly involved in general developmental roles (SRp55, SRp40 and ASF/SF2). Most of these sequence elements contain the "GAA" sequence and are similar to the motifs discovered by our motif search in the development set of AS exons (Figure 5). The known binding sites, expression data, annotation results, and motif analysis provide consistent evidence supporting the hypothesis that these proteins may be involved in the regulation of alternative splicing during development.
We have developed a novel approach for the analysis of exon-junction arrays that will specifically search for exon-skipping alternative splice events. Our proposed score evaluates the product of expression levels across exon-junctions to more accurately reflect exon-skipping. Our score also accounts for overall expression levels for a gene and uses an improved variance stabilizing correction. We have shown that the combination of these approaches improves the discrimination of positive and negative control sets determined from independent data sources. We could not, however, directly compare our method with existing methods in  because that analysis was based on individual array replicates, which we could not access. Nevertheless, utilizing another source of external validation, we demonstrate that the annotations for our alternatively spliced gene predictions were consistent with previous literature. There are several other exon-junction studies, but our method was not applicable because they were either disease specific or a detailed examination of a relatively smaller number of genes [39, 63, 64].
Following the array analysis, we examined our sequence predictions for dual purposes; to validate our analysis of the exon-junction array and to predict novel splicing enhancers and silencers. The results from the sequence analysis provide a further source of validation for the quality of our alternative and constitutive exon predictions. Using randomization trials, we found that our predictions were enriched in sequences that were not contained in random sets of sequences. We also identified several sequence signals that are consistent with the experimental literature (e.g., GGG triplets and pyrimidine-rich motifs in introns) and identified several motifs that are highly specific. For example, the known exonic sequence enhancer CACC was discovered in the constitutive exons but not in the alternative exons. Another known enhancer, GAAGAA, appears to be more specific to alternative exons than constitutive exons. Furthermore, we found that the occurrence of GAAGAA is biased towards exons with weak splice sites, while the other identified motifs associated with alternatively skipped exons do not have splice strength specificity. The sequence analysis also shows that, although originally developed for gene expression data , correlation-based methods utilizing whole genome data like REDUCE are applicable to splice arrays and corroborate the kmer enrichment analysis without relying on pre-determined cutoffs.
By our definition, the alternative exons show patterns of exon skipping among different tissues. The presence of known exonic enhancers in these sequences supports the hypothesis that depending on tissue-specific expression, the corresponding binding factors are enhancing the splicing of these exons, which would otherwise be skipped. An alternative hypothesis, not supported by these results, but also discussed in the literature , is that silencer sequences in the exon or flanking introns prevent proper splicing and are responsible for exon-skipping. These hypotheses are not mutually exclusive because both enhancer and silencer sequences may function cooperatively in splicing regulation. Furthermore, some sequence elements have been shown to act as both enhancers and silencers .
A useful feature of the Rosetta array is that both fetal and adult samples were included for three different tissues. We used this to develop a method for predicting genes with pairwise tissue differences in exon-skipping and applied our procedure to explore both the cis and trans-regulation of splicing during development. Our method for pairwise tissue analysis is not limited to the developmental comparison but can be extended to other tissues or cell lines (e.g., cancer versus normal cell lines). We made gene predictions for three tissue types (lung, brain and liver) but only looked at the intersection to focus on general developmental alternative splicing, which was motivated by observed expression changes in specific SR proteins. Our final predictions for development-related alternative splicing were consistent with functional annotation and literature searches. In our sequence analysis of the development-related predictions, we found that a form of the GAAGAA motif may also have a role in alternative splicing during development. Furthermore, the changes in gene expression between fetal and adult tissues for several SR proteins that bind to this motif provide further evidence for their roles in developmental regulation.
Because of the array design, our work focuses on the detection of exon-skipping. However, these data can also be used to predict and analyze other types of alternative splice events. In particular, if alternative splice site selection or intron retention occurs, we would observe variation in a single exon-junction across tissues, but not necessarily in consecutive junctions, as observed with exon-skipping. Therefore, a product summarization step would not be necessary, but we could adapt our procedure in other ways to predict alternative splice selection and intron retention. We will, however, not be able to discriminate between these other types of splice events because of the nature of the array design.
A recent direction taken by several groups is to use sequence conservation across multiple species to aid in the search for enhancers and silencers [22–25, 66, 67]. Comparative approaches have also been used to examine the conservation of alternative splice events by comparing genomic or EST data from multiple species [29, 68–71]. As more splice array data become available from different organisms and tissues, there may also be opportunities to explore the conservation of splicing events from meta-analysis of splice arrays, without relying on ESTs from orthologous genes that are often limited across species.
Within each gene, the alternative splice (AS) effect for every probe j and tissue i is estimated by the residual ri, j, kfrom a linear model fit. To identify whether exon-skipping has occurred in some tissues, we considered four exon-skipping scores that measure the deviations of the residuals across tissues. All calculations described in this and all other Methods sections are performed in the software R http://www.r-project.org/ unless otherwise noted.
One score is based on the two-sample Kolmogorov-Smirnov (KS) test statistic of the residual products of adjacent exon-junction (EJ) probes spanning exons j, pi, j, k= ri, j, kri, j +1, k. For the exon-level score, residual products across tissues i for exon j are examined, and for the gene-level score, the residual products across tissues i and exons j for gene k are examined. To obtain p-values based on the two-sample KS test, we compared the observed residual products with randomly sampled residual products from the null hypothesis that that they are a product of normally distributed random variables with mean 0, ri, j, k~N(0, σ2), where σ is sampled from the empirical distribution for all genes.
Exploratory data analysis showed greater variability for genes with low overall expression signal and thus we considered a more appropriate exon-skipping score, the two-sample weighted Kolmogorov-Smirnov (WKS = w k KS k ) test statistic. The linear weight function is constructed to adjust for the overall expression signal strength and results in increasing the weight of more highly expressed genes. It is defined as with , where yi, j, kis the average background subtracted intensity. To obtain p-values, observed values of WKS were compared with randomly sampled WKS statistics based on the product of randomly sampled weights, from a uniform distribution within the empirical range of calculated weights for all genes, and randomly sampled KS statistics, sampled from the theoretical distribution of the KS statistic.
Another score is based on the coefficient of variation (CV), a statistic that measures large variations and is defined as the standard deviation divided by the mean. For exon-level score, we calculated the CV of the residual products across tissues, and for the gene-level score, we calculated the standard deviation across exons within the gene, . This method is closest to the Rosetta approach  which relies on a statistic similar to the coefficient of variation.
Finally, we considered a score based on a direct test for normality using the Wilk-Shapiro test (WS) . In this case we did not use the product summarization, but tested the following hypothesis using the WS test: ri, j, k~X where X is distributed as N(0, σ2). As before, the tests are performed across exons or genes to obtain the exon- or gene-level scores respectively.
A collection of sample Affymetrix Human Exon 1.0 ST array data (from ) was used to generate an independent set of negative controls. These arrays represent a collection of 11 tissues with 3 replicates each. All of the 11 tissues were included in the Rosetta data set  except breast. Identifying gene sets from these arrays that demonstrate no changes across all 11 tissues provide an independent source of experimental evidence for constitutive exons. The data are background-corrected, quantile-normalized and summarized at the exon level using the Robust Multi-Array Average (RMA) method  to generate a logarithm base 2 RMA value, xi, j, k, for each exon probe set. To identify genes that are likely to be constitutively spliced, we rank genes according to their expression variability across the 11 tissues and across corresponding core exon probesets, . The top 5% least variable RefSeq genes that are both present in the Affymetrix exon arrays and Rosetta exon-junction arrays (n = 282) were selected as our negative controls.
The positive control (AS) consists of 164 EST-supported alternatively spliced genes that were placed on the arrays in Le et al., 2004 .
Constitutive and alternative exons
To determine a set of alternatively spliced genes, we used a cutoff for the gene-level WKS-statistic that produced a true and false positive rate of ~70% and ~25% respectively on the controls. This cutoff resulted in 3912 predicted alternatively spliced genes. Of the exons in these genes, we then extracted exons in the 75th percentile according to their exon-level KS-statistic, which resulted in 9178 predicted exons. The KS-statistic is used for the exon-level score because the WKS-statistic has a gene-level correction for expression variability that is the same for all exons in a gene. Using the UCSC Genome Browser collection of RefSeq sequences (July 2003), 8433 exons were identified along with their neighboring introns (n = 13813) by sequence matching. Multiple occurrences of neighboring introns were removed. For the GO term analysis, we used a more stringent cutoff and only examined genes in the 95th percentile according to their WKS-statistic (n = 431). GO term enrichment was determined using the Benjamini-Hochberg correction for controlling the false discovery rate at the .001 level.
To obtain a similar sized set of constitutive genes and exons so that the sequence comparisons would be balanced, we evaluated the lowest ~15th percentile of genes according to the gene-level WKS-statistic and extracted the exons within these genes that scored below the median exon-level KS-statistic. Using the UCSC Genome Browser collection of RefSeq sequences, 8113 of these exons were identified along with their neighboring introns (n = 10081). Multiple occurrences of neighboring introns were removed.
We performed a search for over-represented kmers in our exonic and intronic sequences using the RESCUE-ESE method as described in . Briefly, using the Perl programming language, counts for all kmers between two sets of sequences are tallied and evaluated for significant over-representation between the sets using a Z-test, and all kmers with Bonferroni-adjusted p-value < 0.01 were clustered and aligned into groups of motifs. Distances between motifs were defined as the absolute deviation between the nucleotide frequencies of two aligned positions in the best local alignment. Motif pairs with distance less than 2 were deemed as a motif match. For kmer calculations with exons, the comparisons are 1) exons versus introns and 2) constitutive versus alternative exons. We also considered two comparisons for each of 5' and 3' SS based on splice site strength scored by the position-specific weight matrices and applied the 1st and 3rd quartiles of all known splice sites as the cutoffs for weak and strong splice sites: 3) weak versus strong splice site exons and 4) strong versus weak splice site exons. For kmer calculations with introns, the comparisons are 1) introns versus exons and 2) flanking introns of constitutive versus alternative exons; 3) flanking introns of weak versus strong splice site exons and 4) flanking introns of strong versus weak splice site exons. All sequence analyses used intronic/exonic sequences up to 200 bp from the SS and excluded the splice site regions that cover the first 5 bp at each end of exons and the first 20 bp and 10 bp for the 3' and 5' ends of introns respectively.
In addition, we searched for novel sequence motifs utilizing the whole data set without the need to dichotomize exons into AE and CE by applying the correlation-based method REDUCE  to the splice site proximal sequences and our exon-level score for measuring exon skipping (n = 83789). REDUCE enumerates all possible 5–6 base pair kmers and finds those that show significant correlation with expression values from a single microarray experiment. Multiple kmers are identified by iteratively removing the significant kmers from sequences and re-evaluating the correlation between the remaining kmer frequencies and residuals of the linear regression fit from the previous run (i.e., subtracting the contribution from the previously selected kmer(s)). We masked the splice sites by removing 5 bp at the 5' and 3' end of each exon. The output lists all significant kmers with p-values adjusted using the Bonferroni correction.
To look for motifs in the development-related AS exon sets, we examined the genes that had appropriate keyword hits (see below). Within these genes, sequences were extracted for exons predicted to be AE based on our exon-skipping score. De novo motif finders MEME  and BioProspector  were applied to these selected exon sequences subtracting the first 5 bp at either end of each exon. BioProspector was run with the following options: only examine the forward stand (-d), assume every sequence has a motif (-a 1) and motif width range from 6 to 18 (-w 6,8,10,12,14,16). MEME was run with the following options: default value of only looking at the forward strand, DNA sequence (-dna), several choices for the expected number of motif occurrences (zero or one in each sequence -zoops, one in each sequence -oops, variable number -tcm) and the motif width (-w 6,8,10,12, and the range -minw 6 -maxw 20). For each of the 15 MEME runs and the top three motifs from the 6 BioProspector runs, we aligned the consensus sequences for the final predicted motifs with ClustalW . All motifs and alignments are displayed in the Figures using WebLogo 
PubMed keyword search
For all genes in the Genome Browser RefSeq transcript list, the software R http://www.r-project.org/, with the "annotate" and "XML" packages was used to search each abstract of the references listed in the PubMed report for this gene. A hit for a gene was defined if the keyword(s) occurred in at least two of the abstracts for the gene. For each keyword, we performed a Fisher's exact test to evaluate whether the keyword appeared more frequently than expected by chance in our gene set compared to all genes. For the splicing keyword we searched for either "splicing" or "splice".
alternatively spliced exons
constitutively spliced exons
coefficient of variation
exonic splicing enhancers
receiver operating characteristic.
This research was supported by the Center for Bioinformatics and Molecular Biostatistics at University of California, San Francisco and a Sloan Foundation/DOE post-doctoral fellowship to KK.
- Lim LP, Burge CB: A computational analysis of sequence features involved in recognition of short introns. Proc Natl Acad Sci USA. 2001, 98 (20): 11193-11198. 10.1073/pnas.201407298.PubMedPubMed CentralView ArticleGoogle Scholar
- Lopez AJ: Alternative splicing of pre-mRNA: developmental consequences and mechanisms of regulation. Annu Rev Genet. 1998, 32: 279-305. 10.1146/annurev.genet.32.1.279.PubMedView ArticleGoogle Scholar
- Blencowe BJ: Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem Sci. 2000, 25 (3): 106-110. 10.1016/S0968-0004(00)01549-8.PubMedView ArticleGoogle Scholar
- Black DL: Mechanisms of alternative pre-messenger RNA splicing. Annu Rev Biochem. 2003, 72: 291-336. 10.1146/annurev.biochem.72.121801.161720.PubMedView ArticleGoogle Scholar
- Modrek B, Resch A, Grasso C, Lee C: Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res. 2001, 29 (13): 2850-2859. 10.1093/nar/29.13.2850.PubMedPubMed CentralView ArticleGoogle Scholar
- Johnson JM, Castle J, Garrett-Engele P, Kan Z, Loerch PM, Armour CD, Santos R, Schadt EE, Stoughton R, Shoemaker DD: Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science. 2003, 302 (5653): 2141-2144. 10.1126/science.1090100.PubMedView ArticleGoogle Scholar
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W: Initial sequencing and analysis of the human genome. Nature. 2001, 409 (6822): 860-921. 10.1038/35057062.PubMedView ArticleGoogle Scholar
- Coulter LR, Landree MA, Cooper TA: Identification of a new class of exonic splicing enhancers by in vivo selection. Mol Cell Biol. 1997, 17 (4): 2143-2150.PubMedPubMed CentralView ArticleGoogle Scholar
- McCullough AJ, Berget SM: G triplets located throughout a class of small vertebrate introns enforce intron borders and regulate splice site selection. Mol Cell Biol. 1997, 17 (8): 4562-4571.PubMedPubMed CentralView ArticleGoogle Scholar
- Fairbrother WG, Chasin LA: Human genomic sequences that inhibit splicing. Mol Cell Biol. 2000, 20 (18): 6816-6825. 10.1128/MCB.20.18.6816-6825.2000.PubMedPubMed CentralView ArticleGoogle Scholar
- Schaal TD, Maniatis T: Selection and characterization of pre-mRNA splicing enhancers: identification of novel SR protein-specific enhancer sequences. Mol Cell Biol. 1999, 19 (3): 1705-1719.PubMedPubMed CentralView ArticleGoogle Scholar
- Fairbrother WG, Yeh RF, Sharp PA, Burge CB: Predictive identification of exonic splicing enhancers in human genes. Science. 2002, 297 (5583): 1007-1013. 10.1126/science.1073774.PubMedView ArticleGoogle Scholar
- Sironi M, Menozzi G, Riva L, Cagliani R, Comi GP, Bresolin N, Giorda R, Pozzoli U: Silencer elements as possible inhibitors of pseudoexon splicing. Nucleic Acids Res. 2004, 32 (5): 1783-1791. 10.1093/nar/gkh341.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhang XH, Chasin LA: Computational definition of sequence motifs governing constitutive exon splicing. Genes Dev. 2004, 18 (11): 1241-1250. 10.1101/gad.1195304.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhang XH, Heller KA, Hefter I, Leslie CS, Chasin LA: Sequence information for the splicing of human pre-mRNA identified by support vector machine classification. Genome Res. 2003, 13 (12): 2637-2650. 10.1101/gr.1679003.PubMedPubMed CentralView ArticleGoogle Scholar
- Brudno M, Gelfand MS, Spengler S, Zorn M, Dubchak I, Conboy JG: Computational analysis of candidate intron regulatory elements for tissue-specific alternative pre-mRNA splicing. Nucleic Acids Res. 2001, 29 (11): 2338-2348. 10.1093/nar/29.11.2338.PubMedPubMed CentralView ArticleGoogle Scholar
- Clark F, Thanaraj TA: Categorization and characterization of transcript-confirmed constitutively and alternatively spliced introns and exons from human. Hum Mol Genet. 2002, 11 (4): 451-464. 10.1093/hmg/11.4.451.PubMedView ArticleGoogle Scholar
- Miriami E, Margalit H, Sperling R: Conserved sequence elements associated with exon skipping. Nucleic Acids Res. 2003, 31 (7): 1974-1983. 10.1093/nar/gkg279.PubMedPubMed CentralView ArticleGoogle Scholar
- Zavolan M, Kondo S, Schonbach C, Adachi J, Hume DA, Hayashizaki Y, Gaasterland T: Impact of alternative initiation, splicing, and termination on the diversity of the mRNA transcripts encoded by the mouse transcriptome. Genome Res. 2003, 13 (6B): 1290-1300. 10.1101/gr.1017303.PubMedPubMed CentralView ArticleGoogle Scholar
- Yeo G, Holste D, Kreiman G, Burge CB: Variation in alternative splicing across human tissues. Genome Biol. 2004, 5 (10): R74-10.1186/gb-2004-5-10-r74.PubMedPubMed CentralView ArticleGoogle Scholar
- Lian Y, Garner HR: Evidence for the regulation of alternative splicing via complementary DNA sequence repeats. Bioinformatics. 2005, 21 (8): 1358-1364. 10.1093/bioinformatics/bti180.PubMedView ArticleGoogle Scholar
- Yeo GW, Nostrand EL, Liang TY: Discovery and analysis of evolutionarily conserved intronic splicing regulatory elements. PLoS Genet. 2007, 3 (5): e85-10.1371/journal.pgen.0030085.PubMedPubMed CentralView ArticleGoogle Scholar
- Sorek R, Ast G: Intronic sequences flanking alternatively spliced exons are conserved between human and mouse. Genome Res. 2003, 13 (7): 1631-1637. 10.1101/gr.1208803.PubMedPubMed CentralView ArticleGoogle Scholar
- Goren A, Ram O, Amit M, Keren H, Lev-Maor G, Vig I, Pupko T, Ast G: Comparative analysis identifies exonic splicing regulatory sequences – The complex definition of enhancers and silencers. Mol Cell. 2006, 22 (6): 769-781. 10.1016/j.molcel.2006.05.008.PubMedView ArticleGoogle Scholar
- Itoh H, Washio T, Tomita M: Computational comparative analyses of alternative splicing regulation using full-length cDNA of various eukaryotes. Rna. 2004, 10 (7): 1005-1018. 10.1261/rna.5221604.PubMedPubMed CentralView ArticleGoogle Scholar
- Stadler MB, Shomron N, Yeo GW, Schneider A, Xiao X, Burge CB: Inference of splicing regulatory activities by sequence neighborhood analysis. PLoS Genet. 2006, 2 (11): e191-10.1371/journal.pgen.0020191.PubMedPubMed CentralView ArticleGoogle Scholar
- Mironov AA, Fickett JW, Gelfand MS: Frequent alternative splicing of human genes. Genome Res. 1999, 9 (12): 1288-1293. 10.1101/gr.9.12.1288.PubMedPubMed CentralView ArticleGoogle Scholar
- Kan Z, Rouchka EC, Gish WR, States DJ: Gene structure prediction and alternative splicing analysis using genomically aligned ESTs. Genome Res. 2001, 11 (5): 889-900. 10.1101/gr.155001.PubMedPubMed CentralView ArticleGoogle Scholar
- Modrek B, Lee CJ: Alternative splicing in the human, mouse and rat genomes is associated with an increased frequency of exon creation and/or loss. Nat Genet. 2003, 34 (2): 177-180. 10.1038/ng1159.PubMedView ArticleGoogle Scholar
- Clark TA, Sugnet CW, Ares M: Genomewide analysis of mRNA processing in yeast using splicing-specific microarrays. Science. 2002, 296 (5569): 907-910. 10.1126/science.1069415.PubMedView ArticleGoogle Scholar
- Srinivasan K, Shiue L, Hayes JD, Centers R, Fitzwater S, Loewen R, Edmondson LR, Bryant J, Smith M, Rommelfanger C: Detection and measurement of alternative splicing using splicing-sensitive microarrays. Methods. 2005, 37 (4): 345-359. 10.1016/j.ymeth.2005.09.007.PubMedView ArticleGoogle Scholar
- Pan Q, Shai O, Misquitta C, Zhang W, Saltzman AL, Mohammad N, Babak T, Siu H, Hughes TR, Morris QD: Revealing global regulatory features of mammalian alternative splicing using a quantitative microarray platform. Mol Cell. 2004, 16 (6): 929-941. 10.1016/j.molcel.2004.12.004.PubMedView ArticleGoogle Scholar
- Shai O, Morris QD, Blencowe BJ, Frey BJ: Inferring global levels of alternative splicing isoforms using a generative model of microarray data. Bioinformatics. 2006, 22 (5): 606-613. 10.1093/bioinformatics/btk028.PubMedView ArticleGoogle Scholar
- Sugnet CW, Srinivasan K, Clark TA, O'Brien G, Cline MS, Wang H, Williams A, Kulp D, Blume JE, Haussler D: Unusual intron conservation near tissue-regulated exons found by splicing microarrays. PLoS Comput Biol. 2006, 2 (1): e4-10.1371/journal.pcbi.0020004.PubMedPubMed CentralView ArticleGoogle Scholar
- Cuperlovic-Culf M, Belacel N, Culf AS, Ouellette RJ: Microarray analysis of alternative splicing. Omics. 2006, 10 (3): 344-357. 10.1089/omi.2006.10.344.PubMedView ArticleGoogle Scholar
- Pan Q, Saltzman AL, Kim YK, Misquitta C, Shai O, Maquat LE, Frey BJ, Blencowe BJ: Quantitative microarray profiling provides evidence against widespread coupling of alternative splicing with nonsense-mediated mRNA decay to control gene expression. Genes Dev. 2006, 20 (2): 153-158. 10.1101/gad.1382806.PubMedPubMed CentralView ArticleGoogle Scholar
- Clark TA, Schweitzer AC, Chen TX, Staples MK, Lu G, Wang H, Williams A, Blume JE: Discovery of tissue-specific exons using comprehensive human exon microarrays. Genome Biol. 2007, 8 (4): R64-10.1186/gb-2007-8-4-r64.PubMedPubMed CentralView ArticleGoogle Scholar
- Gardina PJ, Clark TA, Shimada B, Staples MK, Yang Q, Veitch J, Schweitzer A, Awad T, Sugnet C, Dee S: Alternative splicing and differential gene expression in colon cancer detected by a whole genome exon array. BMC Genomics. 2006, 7: 325-10.1186/1471-2164-7-325.PubMedPubMed CentralView ArticleGoogle Scholar
- Ule J, Ule A, Spencer J, Williams A, Hu JS, Cline M, Wang H, Clark T, Fraser C, Ruggiu M: Nova regulates brain-specific splicing to shape the synapse. Nat Genet. 2005, 37 (8): 844-852. 10.1038/ng1610.PubMedView ArticleGoogle Scholar
- Fehlbaum P, Guihal C, Bracco L, Cochet O: A microarray configuration to quantify expression levels and relative abundance of splice variants. Nucleic Acids Res. 2005, 33 (5): e47-10.1093/nar/gni047.PubMedPubMed CentralView ArticleGoogle Scholar
- Heinzen EL, Yoon W, Weale ME, Sen A, Wood NW, Burke JR, Welsh-Bohmer KA, Hulette CM, Sisodiya SM, Goldstein DB: Alternative ion channel splicing in mesial temporal lobe epilepsy and Alzheimer's disease. Genome Biol. 2007, 8 (3): R32-10.1186/gb-2007-8-3-r32.PubMedPubMed CentralView ArticleGoogle Scholar
- Le K, Mitsouras K, Roy M, Wang Q, Xu Q, Nelson SF, Lee C: Detecting tissue-specific regulation of alternative splicing as a qualitative change in microarray data. Nucleic Acids Res. 2004, 32 (22): e180-10.1093/nar/gnh173.PubMedPubMed CentralView ArticleGoogle Scholar
- Hu GK, Madore SJ, Moldover B, Jatkoe T, Balaban D, Thomas J, Wang Y: Predicting splice variant from DNA chip expression data. Genome Res. 2001, 11 (7): 1237-1245. 10.1101/gr.165501.PubMedPubMed CentralView ArticleGoogle Scholar
- Huber W, von Heydebreck A, Sultmann H, Poustka A, Vingron M: Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics. 2002, 18 (Suppl 1): S96-104.PubMedView ArticleGoogle Scholar
- Cline MS, Blume J, Cawley S, Clark TA, Hu JS, Lu G, Salomonis N, Wang H, Williams A: ANOSVA: a statistical method for detecting splice variation from expression data. Bioinformatics. 2005, 21 (Suppl 1): i107-115. 10.1093/bioinformatics/bti1010.PubMedView ArticleGoogle Scholar
- Shapiro SS, Wilk MB: An analysis of variance test for normality (complete samples). Biometrika. 1965, 52: 591-611.View ArticleGoogle Scholar
- Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ: The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 2004, 32 (suppl_1): D493-496. 10.1093/nar/gkh103.PubMedPubMed CentralView ArticleGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.PubMedPubMed CentralView ArticleGoogle Scholar
- GOstat. [http://gostat.wehi.edu.au/]
- Bussemaker HJ, Li H, Siggia ED: Regulatory element detection using correlation with expression. Nat Genet. 2001, 27 (2): 167-171. 10.1038/84792.PubMedView ArticleGoogle Scholar
- Hofmann Y, Lorson CL, Stamm S, Androphy EJ, Wirth B: Htra2-beta 1 stimulates an exonic splicing enhancer and can restore full-length SMN expression to survival motor neuron 2 (SMN2). Proc Natl Acad Sci USA. 2000, 97 (17): 9618-9623. 10.1073/pnas.160181697.PubMedPubMed CentralView ArticleGoogle Scholar
- Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G: A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA. 2004, 101 (16): 6062-6067. 10.1073/pnas.0400782101.PubMedPubMed CentralView ArticleGoogle Scholar
- Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP: Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 2003, 31 (4): e15-10.1093/nar/gng015.PubMedPubMed CentralView ArticleGoogle Scholar
- Hastings ML, Krainer AR: Pre-mRNA splicing in the new millennium. Curr Opin Cell Biol. 2001, 13 (3): 302-309. 10.1016/S0955-0674(00)00212-X.PubMedView ArticleGoogle Scholar
- Vandenbroucke I, Vandesompele J, De Paepe A, Messiaen L: Quantification of NF1 transcripts reveals novel highly expressed splice variants. FEBS Lett. 2002, 522 (1-3): 71-76. 10.1016/S0014-5793(02)02887-9.PubMedView ArticleGoogle Scholar
- Danglot G, Regnier V, Fauvet D, Vassal G, Kujas M, Bernheim A: Neurofibromatosis 1 (NF1) mRNAs expressed in the central nervous system are differentially spliced in the 5' part of the gene. Hum Mol Genet. 1995, 4 (5): 915-920. 10.1093/hmg/4.5.915.PubMedView ArticleGoogle Scholar
- Jouet M, Rosenthal A, Kenwrick S: Exon 2 of the gene for neural cell adhesion molecule L1 is alternatively spliced in B cells. Brain Res Mol Brain Res. 1995, 30 (2): 378-380. 10.1016/0169-328X(95)00027-P.PubMedView ArticleGoogle Scholar
- Strehler EE, James P, Fischer R, Heim R, Vorherr T, Filoteo AG, Penniston JT, Carafoli E: Peptide sequence analysis and molecular cloning reveal two calcium pump isoforms in the human erythrocyte membrane. J Biol Chem. 1990, 265 (5): 2835-2842.PubMedGoogle Scholar
- Fu L, Abu-Khalil A, Morrison RS, Geschwind DH, Kornblum HI: Expression patterns of epidermal growth factor receptor and fibroblast growth factor receptor 1 mRNA in fetal human brain. J Comp Neurol. 2003, 462 (2): 265-273. 10.1002/cne.10727.PubMedView ArticleGoogle Scholar
- Jiao J, Greendorfer JS, Zhang P, Zinn KR, Diglio CA, Thompson JA: Alternatively spliced FGFR-1 isoform signaling differentially modulates endothelial cell responses to peroxynitrite. Arch Biochem Biophys. 2003, 410 (2): 187-200. 10.1016/S0003-9861(02)00681-1.PubMedView ArticleGoogle Scholar
- Bailey T, Elkan C: Learning of Multiple Motifs in Biopolymers using EM. Machine Learning. 1995, 21 (1–2): 51-80.Google Scholar
- Liu X, Brutlag DL, Liu JS: BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac Symp Biocomput. 2001, 127-138.Google Scholar
- Zhang C, Li HR, Fan JB, Wang-Rodriguez J, Downs T, Fu XD, Zhang MQ: Profiling alternatively spliced mRNA isoforms for prostate cancer classification. BMC Bioinformatics. 2006, 7: 202-10.1186/1471-2105-7-202.PubMedPubMed CentralView ArticleGoogle Scholar
- Nagao K, Togawa N, Fujii K, Uchikawa H, Kohno Y, Yamada M, Miyashita T: Detecting tissue-specific alternative splicing and disease-associated aberrant splicing of the PTCH gene with exon junction microarrays. Hum Mol Genet. 2005, 14 (22): 3379-3388. 10.1093/hmg/ddi369.PubMedView ArticleGoogle Scholar
- Ladd AN, Cooper TA: Finding signals that regulate alternative splicing in the post-genomic era. Genome Biol. 2002, 3 (11): reviews0008-10.1186/gb-2002-3-11-reviews0008.PubMedPubMed CentralView ArticleGoogle Scholar
- Minovitsky S, Gee SL, Schokrpur S, Dubchak I, Conboy JG: The splicing regulatory element, UGCAUG, is phylogenetically and spatially conserved in introns that flank tissue-specific alternative exons. Nucleic Acids Res. 2005, 33 (2): 714-724. 10.1093/nar/gki210.PubMedPubMed CentralView ArticleGoogle Scholar
- Das D, Clark TA, Schweitzer A, Yamamoto M, Marr H, Arribere J, Minovitsky S, Poliakov A, Dubchak I, Blume JE: A correlation with exon expression approach to identify cis-regulatory elements for tissue-specific alternative splicing. Nucleic Acids Res. 2007, 35 (14): 4845-4857. 10.1093/nar/gkm485.PubMedPubMed CentralView ArticleGoogle Scholar
- Thanaraj TA, Clark F, Muilu J: Conservation of human alternative splice events in mouse. Nucleic Acids Res. 2003, 31 (10): 2544-2552. 10.1093/nar/gkg355.PubMedPubMed CentralView ArticleGoogle Scholar
- Sorek R, Shamir R, Ast G: How prevalent is functional alternative splicing in the human genome?. Trends Genet. 2004, 20 (2): 68-71. 10.1016/j.tig.2003.12.004.PubMedView ArticleGoogle Scholar
- Yeo GW, van Nostrand E, Holste D, Poggio T, Burge CB: Identification and analysis of alternative splicing events conserved in human and mouse. Proc Natl Acad Sci USA. 2005, 102 (8): 2850-2855. 10.1073/pnas.0409742102.PubMedPubMed CentralView ArticleGoogle Scholar
- Ohler U, Shomron N, Burge CB: Recognition of unknown conserved alternatively spliced exons. PLoS Comput Biol. 2005, 1 (2): 113-122. 10.1371/journal.pcbi.0010015.PubMedGoogle Scholar
- Exon 1.0 ST Array Sample Data. [http://www.affymetrix.com/support/technical/sample_data/exon_array_data.affx]
- Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003, 4 (2): 249-264. 10.1093/biostatistics/4.2.249.PubMedView ArticleGoogle Scholar
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22 (22): 4673-4680. 10.1093/nar/22.22.4673.PubMedPubMed CentralView ArticleGoogle Scholar
- Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res. 2004, 14 (6): 1188-1190. 10.1101/gr.849004.PubMedPubMed CentralView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.