Complex splicing pattern generates great diversity in human NF1 transcripts

Background Mutation analysis of the neurofibromatosis type 1 (NF1) gene has shown that about 30% of NF1 patients carry a splice mutation resulting in the production of one or several shortened transcripts. Some of these transcripts were also found in fresh lymphocytes of healthy individuals, albeit typically at a very low level. Starting from this initial observation, we set out to gain further insight into the complex nature of NF1 mRNA processing. Results We used an RT-PCR plasmid library based method to identify novel NF1 splice variants. Several transcripts with specific insertions/deletions were observed and a survey of them was made. This large group of variants detected in one single gene allows a comparative analysis of the factors involved in splice regulation. Exons that are prone to skipping were systematically analysed for 5' and 3' splice site strength, branch point strength and secondary structure. Conclusion Our study revealed a complex splicing pattern, generating a great diversity in NF1 transcripts. We found that, on average, exons that are spliced out in part of the mRNA have significantly weaker acceptor sites. Some variants identified in this study could have distinct roles and might expand our knowledge of neurofibromin.


Background
The genome of higher eukaryotes codes for most of its proteins through short coding sequences (exons) interrupted by non-coding sequences (introns). These intervening sequences must be removed precisely after transcription of the gene, a process called splicing. Splicing is carried out by the spliceosome, a large complex consisting of proteins and small nuclear RNAs (snRNAs) [1][2][3]. RNA splicing is dependent upon the identity of nucleotide sequences at the exon/intron boundaries. The 5' boundary or donor site of introns in most eukaryotes contains the dinucleotide GU, while the 3' boundary or acceptor site contains the dinucleotide AG. These GU-AG motifs are merely parts of longer consensus sequences that span the 5' and 3' splice sites. With the exception of the conserved canonical GU and AG at the 5' and 3' splice sites, splice site sequences may contain several deviations from the consensus. The match of a splice site to this consensus reflects the strength of the site. In addition, a third conserved intronic sequence that is known to be functionally important in splicing is the branch point, located 10-50 nucleotides upstream from the AG dinucleotide. The branch point is characterised by a rather weak consensus sequence YURAY [4]. In the first step of splicing the 2' hydroxyl of an adenosine of the branch point attacks the phosphate at the 5' splice site, producing a free 5' exon and a lariat intermediate. During the second step, the 3' hydroxyl of the 5' exon attacks the phosphate at the 3' splice site, yielding a ligated mRNA and a lariat intron. Now that the total human genome has been sequenced, the surprising outcome is that differences between humans and other organisms are not reflected by the genome complexity. Alternative splicing might play a role in this apparent discrepancy. Alternative splicing is an important mechanism to create protein diversity and to regulate gene expression in a tissue- or developmental-specific manner.
To understand the mechanism of alternative splice site selection, all cis-acting elements that contribute to the efficiency with which splice sites are recognised must be taken into account. These include 1) the strength of the 3' and 5' splice sites, which can be calculated using the consensus values that were determined by Shapiro and Senapathy [5] or by the Splice Site Prediction by Neural Network program (SSPNN, [http://www.fruitfly.org/seq_tools/splice.html]), which takes more known sequences into account, 2) the branch point sequences and 3) certain sequences such as exonic splicing enhancers (ESE) that can activate the use of a weak 3' splice site or elements such as exonic splicing silencers (ESS) that inhibit splicing [1]. In addition to these cis-acting elements, exon length [6] and secondary structure have also been reported to play a role in splicing [7].
Neurofibromatosis type 1 (NF1) is an autosomal dominantly inherited disorder affecting one in 3500 individuals [8]. The gene spans 350 kb of genomic DNA and encodes an mRNA of approximately 13 kb containing 60 exons. Three exons (9br, 23a and 48a) are known to be subject to alternative splicing and these isoforms were also detected at the protein level with isoform-specific antibodies [9,10]. Four additional alternative transcripts were described (ex29-, ex30-, ex29/30- and the N-isoform) but no further analysis at the protein level was reported [11,12]. During mutation analysis in the NF1 gene we found several additional splice variants in which specific exons were skipped and that were also present in fresh lymphocytes of non-affected persons, albeit typically at a low level [13]. Several other studies also report the existence of NF1 splice variants expressed at low levels. Some of these variants are more abundant when RNA is analysed from aged blood or from blood that was kept at a non-physiological temperature [13][14][15][16]. Starting from this initial observation, we set out to determine the extent of aberrant and/or alternative splicing in NF1. An RT-PCR plasmid library was made to identify new NF1 splice variants. We have made a survey of these variants and searched systematically for a number of factors involved in splice site selection, aiming to explain the observed variants.

Identification of novel NF1 splice variants in normal cells
The entire NF1-coding region from two normal non-affected individuals was amplified in 5 overlapping segments by RT-PCR according to Heim [17] starting from fresh blood leukocytes. As expression of specific NF1 transcripts can change due to environmental changes [13][14][15][16]18], we extracted the RNA immediately after blood drawing, which allowed us to identify splice variants that are not induced by stress factors. Plasmid libraries were created and 150 colonies per fragment were lysed and reamplified with the same primers in order to search for the presence of minor splice variants. The PCR products were digested with a restriction enzyme and separated by electrophoresis on an agarose gel (Fig. 1). For the five fragments, 114 clones out of 750 colonies in total showed a different restriction pattern. Sequence analysis of the inserts revealed that these represented NF1 transcripts with specific insertions/deletions. Some of these resulted from cryptic splice site usage, others were found to have deletions precisely at the exon boundaries ('exon skipping'), and very few resulted from intron retention. All variants that were found are summarised in Fig. 2. Transcripts containing more than one change per single transcript were also identified. Some changes were in frame (14 variants, green in Fig. 2), others were out of frame (32 variants, red in Fig. 2) and hence would lead to a premature stop codon. Altogether, 46 different transcripts were detected. The number

Figure 2
Schematic representation of all splice variants detected in five overlapping NF1 fragments. In frame events are indicated in green whereas out of frame events are red. Transcript variants described in blue are identical to misspliced transcripts, described previously for NF1 patients, caused by a genomic mutation. The number of colonies detected containing a specific variant is indicated between brackets.
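Whether a deletion or insertion preserves the reading frame follows from its length: a change whose length is a multiple of three is in frame, any other length shifts the frame. This can be sketched as follows. The function names and the coordinates in the usage note are hypothetical and serve only as an illustration; this is not the procedure used in the study.

```python
def deletion_is_in_frame(first_deleted, last_deleted):
    """Return True if deleting nucleotides first_deleted..last_deleted
    (inclusive, mRNA numbering) keeps the reading frame."""
    length = last_deleted - first_deleted + 1
    return length % 3 == 0


def insertion_is_in_frame(inserted_length):
    """Return True if an insertion of the given length keeps the reading frame."""
    return inserted_length % 3 == 0
```

For example, a hypothetical deletion of nucleotides 100 to 189 removes 90 nucleotides and stays in frame, whereas a deletion of nucleotides 100 to 190 removes 91 nucleotides and shifts the frame.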

Role of splice site strength in splice site selection
Intensive effort has already been made to understand the mechanisms involved in splice site selection. Because we detected such a large group of variants in one single gene, a detailed examination of the different parameters that play a role in alternative splice regulation could be performed and compared statistically. Early splice site choice in mammals involves protein-protein interactions across the exon, as is proposed in the exon definition model of Berget [21]. Exon definition is mediated in part by SR proteins bridging between the small nuclear ribonucleoprotein (snRNP) U1 at a downstream 5' splice site and the U2AF heterodimer at the upstream 3' splice site. First we calculated the strength of the 5' and 3' splice sites of all constitutive and alternative exons and of all deletions and insertions reported in this study. Calculations were performed using the Splice Site Prediction by Neural Network (SSPNN, [http://www.fruitfly.org/seq_tools/splice.html]) program (Table 1). In general, sequences that show a better match to the consensus sequences, indicated by a higher calculated score, are considered to be 'strong', as specificity of the splicing reaction is mediated in part by RNA-RNA interactions between these sequences and snRNA molecules. The calculated scores demonstrate that deletions or insertions sometimes occur at strong cryptic splice sites (e.g. the NF1-∆2618_2850 variant), whereas in a few cases no or only a weak consensus splice site was found (score of < 0.7: underscored in Table 1, e.g. NF1-∆5701_6946). The latter might result from aberrant splicing or PCR artefacts.
We wanted to examine whether an association exists between the observed splicing pattern and splice site strength. To this end, the scores of those single exons that were spliced out in a fraction of the mRNA (bold in Table 1), including the alternative exons 9br, 23a and 48a, were compared with the mean values for all NF1 exons (Table 2). Skipping of more than one exon probably involves other factors and mechanisms than those based on splice parameters alone. The exon definition model is no longer applicable in that case, as the distance would be too long to accommodate exon-bridging interactions. Therefore, only exons that were spliced out individually were compared to the mean values. The SSPNN program gave the exons that can be skipped a significantly lower score at the acceptor site (0.63, p = 0.03) than the mean acceptor score for all NF1 exons (0.81). No such correlation could be found at the donor site. Exon bridging interactions may occur less efficiently as a result of weak binding of splice factors at the acceptor site, thus modulating the splicing efficiency. This finding underlines the importance of the acceptor site in splice site selection. This is in agreement with a recent study [22] in which it was shown that mutating the acceptor site to a stronger site was more effective for full intron splicing of an alternative intron than strengthening the donor site. Likewise, an inverse relationship was found between the length of the polythymidine tract, which influences the acceptor strength, at the exon 9 acceptor site and the proportion of exon 9 deleted CFTR mRNA transcripts [23].
Our results indicate that acceptor splice site strength is an important parameter in the regulation of NF1 splicing.

Role of branch point position and strength
Some studies have pointed to the importance of the branch point in alternative splicing [24,25]. The distance between the branch point and the 3' splice site and the base pairing of the branch point with the U2 snRNA are important for splicing. The majority of branch points used in vertebrate splicing are between 10 and 50 nucleotides upstream from the acceptor splice site. We have analysed 1) the position of the putative branch point sequence and 2) the match of the putative branch point sequence to the loosely defined consensus YURAY. We developed a program to calculate the strength of a branch point. The region between -50 and -10 upstream from the acceptor site was examined for a branch point signal and the corresponding branch point score was calculated (Table 1). A relatively low score points to a suboptimal branch point or to a branch point situated outside the chosen window (-50, -10). Branch points located at long distances are also suboptimal. However, mean branch point strength (Table 2) was not significantly lower (p = 0.411) for the exons associated with skipping.

Role of secondary structure in splice site selection
Recently, it was suggested that NF1 exon skipping events could be explained by structural alterations in possible higher free-energy structures of the pre-mRNA at the donor site [19]. Herein, it was hypothesised that when an alternative structure is too different from the lowest free-energy structure (ground state) this sequence is not recognised by the splicing machinery and that, as a consequence, the corresponding exon becomes skipped. We further elaborated this hypothesis by comparing the lowest free-energy structures of the donor and acceptor sites of all exons that were prone to skipping (bold in Table 1) to their first higher free-energy structures using the mfold program [26,27]. Some structures showed remarkable alterations between both energy states, but many others did not at all. A representative example is given in figure 3 where a striking structural change was found between the energy states of the donor of exon 7 (Fig. 3A,3B). On the other hand, very similar structures were obtained for the donor of exon 4b (Fig. 3C,3D) although this exon was shown to be skipped in a significant part of the mRNA in several tissues (unpublished results) using the highly accurate real-time PCR technique [28]. Moreover, when the window is displaced over a few nucleotides, as exemplified here for exon 8, the ground state alters considerably (Fig. 3E,3F), demonstrating the difficulty of assigning significance to these in silico results. These data indicate that one has to be extremely careful when correlating exon skipping events to secondary structure predictions. Limitations of the mathematical model, uncertainties in the thermodynamic parameters and influences of the secondary structure in vivo by specific interactions of RNA-protein complexes limit the prediction capacities of secondary structure by these programs.

Figure 3
Simulations of secondary structure. Secondary structure of the sequence surrounding the donor site of exon 7 (A, B), exon 4b (C, D) and exon 8 (E-F). A and C: lowest free-energy structures; B and D: first higher energy-structure; E and F: prediction using a window of (-50, +50) (E) or (-56, +44) (F) with respect to the splice site.

On the other hand, comparison of the ground state of a mutant donor site, as was done for G1185+1U and G1185+1A in [19], to the higher energy structure of the normal donor is superfluous to explain skipping. It is well known that splice sites mutated at their highly conserved AG-GT bases are no longer recognised by the splicing machinery, since recognition of, for example, the splice donor by the U1 snRNP is mainly directed by RNA-RNA base pairing [29,30].

Conclusions
The data presented in this study indicate that splicing in NF1 is extremely complex. Although several alternative transcripts have been found previously for the NF1 gene [9][10][11][12], we identified multiple novel NF1 splice variants and made a survey of them. This large catalogue of variants detected in one single gene allowed us to perform a comparative analysis of the factors involved in splice regulation in order to explain the observed variants. We performed an extensive analysis of different splicing parameters. These include the strength of the 5' and 3' splice site, the branch point strength and secondary structure predictions at the donor and acceptor site. On the basis of the analysed features we can conclude that an interplay of various factors is involved in regulation of NF1 splicing. Competition between the alternative splice sites depends on the relative quality of the different constitutive splice signals and we found that the acceptor strength probably plays a major role in this regulation. The functional significance of the transcripts identified in this study still remains unclear. What fraction of the splicing represents 'noise', caused by the relaxation of the RNA splicing, is currently unknown. Although the RNA was extracted immediately after blood drawing, excluding as far as possible influences due to stress factors, some splice events occurred at non-consensus splice sites and probably represent 'aberrant' splicing. However, some of these newly identified transcripts could potentially encode proteins of different sizes and may have distinct roles. Further research has been started to analyse tissue-specific expression of several transcripts. Preliminary results show that some transcripts are highly expressed in specific tissues, suggesting that splicing in NF1 may be regulated in a tissue-specific manner.
Further analysis of these transcripts is needed and it will remain a challenging task to elucidate the biological significance of all of them.

RNA isolation and cDNA preparation
RNA was isolated from fresh blood leukocytes from two normal individuals using TRIzol LS Reagent (Invitrogen) according to the manufacturer's instructions. RNA extraction was performed immediately after blood drawing, excluding as far as possible influences due to stress factors. Total RNA (2 µg) was reverse transcribed using SuperScript II Reverse transcriptase (Invitrogen) and random hexamers (Amersham Biosciences).

RT-PCR library formation and PCR analysis
Primers used for the amplification of the total NF1 cDNA in 5 overlapping fragments were as described [17]. Reactions were performed as described [13]. The PCR products were cloned in the pCR2.1-TOPO vector (Invitrogen) and transformed into TOP-10F' cells. The colonies were lysed and the subcloned fragments were reamplified with the same primers. The five NF1 fragments (NF1-F1 to NF1-F5) were subsequently digested with different restriction enzymes: ScrFI, StyI, and BsrI for NF1-F1, HaeIII and AvaII for NF1-F2, HaeII for NF1-F3 and NF1-F5, and AvaII for NF1-F4. All transcripts with a different restriction pattern were sequenced (primer sequences available upon request) using the Thermo Sequenase fluorescent labelled primer cycle sequencing kit (Amersham Biosciences) and analysed on an ALF-express automated DNA sequencer (Amersham Biosciences).

Nomenclature for the description of the different NF1 transcripts
In order to clearly describe the identity (ID) of the transcripts, the following nomenclature was used:
• The ID of every transcript variant starts with 'NF1' followed by '-'.
• Sequence changes are all described at the RNA level, with 1 = first base of the methionine codon (ATG) at the start. The numbering system for insertions is based on the NF1 genomic sequence (Genbank Accession Nr. AC004526).
• Deletions are described with a '_' separating the first and the last deleted nucleotide.
• Skipping of a complete exon is designated by an 'E' preceding the exon number. Two consecutive deleted exons are separated by '/'. When more than two consecutive exons were deleted, the first and the last deleted exon are separated by a '_' character, non-consecutive exons are separated by a ',' character to indicate that intermediate exons are not deleted.
• Insertions are described by a '+' or a '-' after the nucleotide flanking the insertion site, followed by the position of the first inserted nucleotide of the intron (beginning of intron = +1, for insertions flanking the acceptor site: end of preceding intron = -1), followed by 'ins' and the length of the inserted sequence.
• Insertions of a complete intron are designated by 'IVS' preceding the intron number.
• When more than one change was detected, these changes are described between square brackets, separated by a ',' character.

Splice site scores
The sequence environment of all donor and acceptor sites was analysed using Splice Site Prediction by Neural Network (SSPNN, URL address: [http://www.fruitfly.org/seq_tools/splice.html]).

Branch point scores
We searched for branch point sequences in a window of -50, -10 with respect to the acceptor site. A program was written in DELPHI to calculate the strength of a branch point. Every 5-nucleotide sequence within the window was evaluated as a potential branch point. The 'standard' branch point sequence weight-table [4] was used to assign a score at every 5-nucleotide sequence within the window. The computed score reflects how closely it resembles known branch point sequences. The sequence with the highest score was selected as the 'putative' branch point.
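The window scan described above can be sketched in Python. The original program was written in DELPHI and used the weight table from [4]; neither is reproduced here. The table below is a toy stand-in that simply gives 1.0 to each position matching the YURAY consensus, and all names (`TOY_WEIGHTS`, `score_pentamer`, `best_branch_point`) are hypothetical.

```python
# Toy position weights for Y-U-R-A-Y (Y = C or U, R = A or G).
# ASSUMPTION: placeholder values, not the published weight table from [4].
TOY_WEIGHTS = [
    {"C": 1.0, "U": 1.0},  # Y
    {"U": 1.0},            # U
    {"A": 1.0, "G": 1.0},  # R
    {"A": 1.0},            # A (branch point adenosine)
    {"C": 1.0, "U": 1.0},  # Y
]


def score_pentamer(pentamer, weights=TOY_WEIGHTS):
    """Sum the per-position weights of a 5-nucleotide sequence."""
    return sum(w.get(base, 0.0) for base, w in zip(pentamer, weights))


def best_branch_point(intron, window=(-50, -10), weights=TOY_WEIGHTS):
    """Score every 5-mer lying inside the window upstream of the acceptor site.

    `intron` is the intron sequence ending at the acceptor AG; its last
    nucleotide is position -1. Returns (score, pentamer, position of the
    first nucleotide of the pentamer), or None if the intron is too short.
    Ties are resolved in favour of the most upstream candidate.
    """
    rna = intron.upper().replace("T", "U")
    n = len(rna)
    start = max(0, n + window[0])  # index of position -50
    last = n + window[1]           # index of position -10 (inclusive)
    best = None
    for s in range(start, last - 4 + 1):
        pentamer = rna[s:s + 5]
        score = score_pentamer(pentamer, weights)
        if best is None or score > best[0]:
            best = (score, pentamer, s - n)
    return best
```

With the toy table, a sequence that contains a perfect match such as CUAAC inside the window receives the maximum score of 5.0 at that position. Replacing `TOY_WEIGHTS` with a real frequency-based table changes the scores but not the scanning logic.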

Secondary structures
The mfold computer program version 3.0 [http://bioinfo.math.rpi.edu/~mfold/rna/form1.cgi] was used to predict the RNA secondary structure and to calculate the folding free energy [26,27]. We computed the minimum free-energy structures for all the donor and acceptor sites of the 60 exons of NF1 within a window of 100 nucleotides. It has been proposed [31] that there is a window of about 100 nucleotides after transcription where the pre-mRNA is free to fold. The size of this window would be defined by the time taken for the protein complexes to bind as the nascent RNA emerges.
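Selecting the 100-nucleotide window around a splice site is a simple slicing step and can be sketched as below. The function name, the 0-based indexing convention and the example sequence are assumptions made for illustration; the folding itself would still be done by an external program such as mfold, which is not called here.

```python
def splice_site_window(sequence, site_index, upstream=50, downstream=50):
    """Return the nucleotides from -upstream to +downstream around a splice site.

    `site_index` is the 0-based index of the first nucleotide downstream of
    the exon/intron junction (an assumed convention for this sketch).
    """
    start = site_index - upstream
    end = site_index + downstream
    if start < 0 or end > len(sequence):
        raise ValueError("window extends beyond the sequence")
    return sequence[start:end]
```

Calling `splice_site_window(seq, site, 50, 50)` and `splice_site_window(seq, site, 56, 44)` gives two windows of the same length that are shifted by six nucleotides, which corresponds to the (-50, +50) and (-56, +44) windows compared for exon 8 in Figure 3.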

Statistical analysis
The one-way ANOVA test was used to assess significance in the comparison of mean values.
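For illustration, the F statistic of a one-way ANOVA can be computed as the ratio of the between-group to the within-group mean square. The sketch below uses only the Python standard library; the function name is hypothetical, the numbers in the usage note are made-up values rather than scores from this study, and the software actually used for the analysis is not stated in the text.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for two or more groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For the made-up groups `[1, 2, 3]` and `[2, 3, 4]` the function returns F = 1.5 with 1 and 4 degrees of freedom. Converting F into a p value requires the F distribution, which is available for example through `scipy.stats.f_oneway`.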