Unusual misregulation of RNA splicing caused by insertion of a transposable element into the T (Brachyury) locus

Background The T Wis mutant allele of the Brachyury, or T, gene was created by insertion of an endogenous retrovirus-like early transposon (ETn) element into the exon 7 splice donor consensus sequence of the 8-exon T locus. While the developmental consequences of this disruption have been well characterized, the molecular consequences have not been previously investigated, and it has been assumed that the insertion results in a truncated protein. This study sought to further characterize the mutant T Wis allele by investigating the nature of the transcripts produced by insertion of this transposable element.
Results Using an RT-PCR based approach, we have shown that at least 8 different mutant transcripts are produced from the T Wis allele. All T Wis transcripts bypass the mutated exon 7 splice donor site, such that wild type T transcripts are not produced from the T Wis allele.
Conclusions This result shows an unsuspected misregulation of RNA splicing caused by insertion of a transposable element, which could have more widespread consequences in the genome.


Background
The history of the T locus began with the discovery in 1927 of a semi-dominant mutation in mice, named Brachyury, or T for tail, that affects both embryonic viability in homozygotes and tail development in heterozygotes [1]. This original T allele represents a deletion spanning 160-200 kb (reviewed in [2]), the developmental effects of which have been well characterized [3][4][5][6][7][8]. Homozygous mutant embryos show a developmental failure of the notochord and posterior mesoderm, and die at midgestation. Heterozygous mutant mice are born with shortened tails and malformed vertebrae. In 1988, another spontaneous Brachyury mutation, called T Wis , was reported [9]. The T Wis homozygous and heterozygous mutant phenotypes are more severe than those of the T deletion, suggesting that the T Wis allele acts as a dominant negative.
Homozygous mutants have no somites at all and heterozygotes have no tail, rather than a shortened tail (reviewed in [2]).
The T gene was cloned in 1990 [10], and its expression pattern was found to correlate with the tissue types affected in T mutants [8]. Subsequently, T protein was shown to bind specifically to DNA and its preferred in vitro target sequence was identified [11]. It was further shown that T encodes a transcription factor capable of regulating expression of a reporter via the identified target sequence [12]. In their report describing the initial cloning of the T gene, Herrmann et al. [10] also demonstrated that the T Wis allele of T results from the insertion of an endogenous retrovirus-like early transposon (ETn) element into the splice donor site of exon 7 of the 8-exon T locus. They showed that the splice site at the 3' end of exon 7 was mutated from TAGGTATGT to TAGGTGTTG (the first three bases, TAG, are the 3' end of exon 7; in the mutant sequence, the last five bases, TGTTG, are ETn insertion sequence) and predicted that this site would be nonfunctional, thereby abolishing splicing of exon 7 to exon 8. They proposed that the T Wis allele would produce a transcript composed of exons 1 through 7 followed by read-through transcription of the ETn element, resulting in a modified C-terminal end, or alternatively, that upstream splice donor sites would be used, shortening the transcript and protein product. In later whole mount in situ analysis, Herrmann [13] confirmed expression of a T locus-derived transcript in T Wis /T Wis embryos, but did not examine its exact nature. The position of the ETn insertion (see Figure 3A) is compatible with the idea that a modified protein could still bind DNA but might compete with the wild type protein or form an inactive complex with it, thus leading to a dominant negative effect.
The aim of this study was to characterize the sequence and structure of the T Wis allele transcript(s), with an eye towards understanding the specific molecular consequences of disruption of the exon 7 splice donor site by the inserted ETn element. Using an RT-PCR-based approach, we have shown that at least 8 different transcripts are produced from the T Wis allele of T. In addition to the mutant transcript predicted by Herrmann et al. [10], 4 transcripts composed solely of T exonic sequences, and 3 transcripts containing ETn sequences spliced between T exonic sequences were identified. All T Wis transcripts bypass the mutated exon 7 splice donor site, such that wild type T transcripts cannot be produced from the T Wis allele.

Identity of the T Wis ETn element
As a first step in characterizing the molecular nature of the T Wis allele of T, the exact identity of the ETn element inserted into the locus was determined. The primer ETn1 was designed against a region conserved among related but distinct ETn elements. PCR was performed on yolk sac lysates from e9.5 T Wis /T Wis mutant embryos using the primer set F3-ETn1 (Figure 1A). Sequence analysis of the ~1300 bp PCR product amplified from T Wis /T Wis DNA in two independent reactions showed that the ETn element of the T Wis allele is identical to the 5542 bp ETn sequence, Genbank accession #Y17106, and that it is inserted in reverse orientation relative to the direction of transcription of the T locus (data not shown).

Transcripts produced from the T Wis allele of T
The next step in characterizing the molecular nature of the T Wis allele was to identify the aberrant transcripts produced from this allele. RT-PCR was performed on RNA from T +/+, T Wis /+, and T Wis /T Wis embryos using forward and reverse primers located in different exons of the T locus, as well as a reverse primer specific to the ETn element of the T Wis insertion (Table 1 and Figure 1B). The RT-PCR products were excised, purified, and sequenced twice on each strand with the primers used to generate them. The F2-R1 products were additionally sequenced one time each with primers F1 and R2.
In this preliminary analysis, no wild type T transcript was identified in T Wis /T Wis RNA using any of the primer sets. These results suggest that no T Wis transcripts maintain normal exon 7 to exon 8 splicing, as postulated by Herrmann et al. [10]. To further clarify this issue, the splicing of exon 7 to exon 8 was specifically examined using the forward primer F3. This primer is located within the last 35 bp of exon 7, and its binding site is therefore deleted in all of the exon 7-containing mutant transcripts already described (see Table 1, Table 2, and Figure 3). RT-PCR analysis with this primer confirmed that while the F1-BGH037 transcript is only produced from the T Wis allele, the F3-R1 transcript is only produced from the wild type T allele (Figure 2).
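The primer logic in this paragraph can be illustrated with a small sketch. It is not the authors' analysis: the segment labels (`7a`, `7b`, `ETn_rt`, `ETn37`) are hypothetical names introduced here, and each transcript lists only the segments that appear in the product names used in this paper. Primer positions follow the text and the Figure 2 legend (F1 in exon 6, F3 in the last 35 nt of exon 7, R1 in exon 8, BGH037 in the ETn read-through).

```python
# Hypothetical segment labels (not from the paper):
#   "7a"     = exon 7 without its last 35 nt
#   "7b"     = last 35 nt of exon 7 (F3 binding site)
#   "ETn_rt" = ETn read-through sequence (BGH037 binding site)
#   "ETn37"  = 37 nt ETn mini-exon
PRIMER_SITE = {"F1": "6", "F3": "7b", "R1": "8", "BGH037": "ETn_rt"}

# Only segments named in the product notation are listed for mutant species.
TRANSCRIPTS = {
    "wild type": ["1", "2", "3", "4", "5", "6", "7a", "7b", "8"],
    "6,7∆35,+37ETn,8": ["6", "7a", "ETn37", "8"],
    "6,7+ETn": ["6", "7a", "7b", "ETn_rt"],
}


def amplifies(segments, forward, reverse):
    """True if both primer sites are present and the forward site is upstream."""
    fwd_site = PRIMER_SITE[forward]
    rev_site = PRIMER_SITE[reverse]
    if fwd_site not in segments or rev_site not in segments:
        return False
    return segments.index(fwd_site) < segments.index(rev_site)


def templates_for(forward, reverse):
    """Names of the listed transcripts that could template a product."""
    return [
        name
        for name, segments in TRANSCRIPTS.items()
        if amplifies(segments, forward, reverse)
    ]


if __name__ == "__main__":
    for pair in [("F1", "BGH037"), ("F3", "R1")]:
        print(pair, templates_for(*pair))
```

Under these assumptions, F1-BGH037 finds a template only in the read-through species and F3-R1 only in the wild type transcript, because the 7∆35 species lacks the F3 site and the read-through species lacks exon 8.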
Figure 1
Genomic PCR of the T Wis allele ETn element insertion and identification of wild type T versus T Wis allele transcripts.
A. Primers F3 and ETn1 were used to amplify a portion of the T Wis allele ETn element. T allele genotype of template genomic DNA is indicated above the lanes. As expected, no product was amplified from T +/+ control samples. Brackets indicate bands excised, purified, and sequenced. Ladder = 100 bp DNA ladder, with sizes indicated to right. B. RT-PCR analysis of T +/+, T Wis /+, and T Wis /T Wis e8.5 embryo RNA. Genotypes of embryos are indicated above the lanes; -cont = no RNA negative control. Primer pairs used to amplify the product are indicated to the right (see also Table 1). Brackets indicate bands excised, purified, and sequenced. Products were named based on the RT-PCR primer set followed by the lower case letter next to the bracket (see Table 2).

In some cases, a prominent band larger than the expected wild type size was amplified by the F3-R1 primers from T Wis /T Wis embryo RNA (Figure 2). Sequence analysis showed that this larger T Wis /T Wis specific band represented two additional aberrant transcripts (Table 2 and Figure 3C). Notably, these two products were never observed in T Wis /+ RNA (Figure 2). A shorter variant contained the intact 3' end of exon 7 and 60 nt of read-through transcription into the ETn element spliced to exon 8 (7+60ETn,8). Given that the BGH037 primer is contained within the 60 nt of ETn read-through, this species is not necessarily distinct from the 6,7+ETn transcript amplified by F1-BGH037, as both could be amplified from the same hypothetical RNA 6,7+60ETn,8. The longer variant contained the intact 3' end of exon 7 and 60 nt of read-through into the ETn element spliced to a 37 nt mini-exon of ETn sequence, then spliced to exon 8 (7+60ETn,+37ETn,8). The 37 nt ETn mini-exon was the same sequence found in the 4,5,6,7∆35,+37ETn,8 and 6,7∆35,+37ETn,8 products described above. This species is also not necessarily distinct from the 6,7+ETn transcript amplified by F1-BGH037, as both could be amplified from the same hypothetical RNA 6,7+60ETn,+37ETn,8. Alternatively, the 6,7+ETn product might have amplified from an RNA 6,7+?ETn, where an unknown amount of the ETn insertion was transcribed and the transcript terminated in ETn sequence without being spliced to exon 8. Overall, wild type exon 7 to exon 8 splicing was never identified in transcripts produced from the T Wis allele of T.
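The 60 nt and 37 nt segment sizes used in the product names can be checked against the sequences and Genbank #Y17106 coordinates given in the Table 2 notes of this paper. The following is a minimal sketch of that arithmetic only; it does not retrieve or align the Y17106 record, and the function and variable names are introduced here for illustration.

```python
# Sequences as given in the Table 2 notes of this paper.
MINI_EXON = "GAAACTCAGAAATGGTCAAGCTGGACCTTCCCTTGCA"
READ_THROUGH = "GTGTTGCGGCCGCCAGCAGCTCGCAACGTGAACGGTTCGACTGAGAAGGCCGCTCGAGCT"
# Mutated exon 7 donor region reported by Herrmann et al. [10].
MUTANT_DONOR = "TAGGTGTTG"


def span_length(first, last):
    """Number of bases in an inclusive coordinate range, in either direction."""
    return abs(first - last) + 1


def check_segments():
    """Compare sequence lengths with the coordinate ranges quoted in the notes."""
    return {
        "mini_exon_len": len(MINI_EXON),
        "mini_exon_span": span_length(4515, 4479),
        "read_through_len": len(READ_THROUGH),
        # First base is the G of T intron 7; the rest is ETn bases 5542-5484.
        "read_through_span": 1 + span_length(5542, 5484),
        # The bases after exon 7's terminal TAG should start the read-through.
        "donor_matches_read_through": MUTANT_DONOR[3:] == READ_THROUGH[:6],
    }


if __name__ == "__main__":
    for key, value in check_segments().items():
        print(key, value)
```

The descending coordinate ranges are consistent with the reverse orientation of the ETn element relative to T transcription reported above.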
The characterization of T Wis allele transcripts presented here confirms that the intron 7 donor splice site is indeed non-functional such that wild type T mRNA is never produced. Furthermore, the exon 1-7+ETn transcript predicted by Herrmann et al. [10] was identified. However, at least 7 other T Wis mutant transcripts are also produced (Figure 3C). All of these transcripts serve to bypass the mutated splice donor site, either by splicing over exon 7, activating a cryptic splice site within exon 7, or transcribing through exon 7 into the adjacent ETn sequences. The cryptic exon 7 splice site used, CCTGTGAGT, is a strong match to the C/A,AGGT,A/G,AGT splice donor consensus. It is important to note that in addition to the exon 4 to 8, exon 5 to 8, and exon 6 to 8 splicing events detected here, the potential remains that transcripts resulting from exon 1 to 8, exon 2 to 8, and/or exon 3 to 8 splicing events are produced but were not discovered.

Table 1
1 [10,16], and Bernhard G. Herrmann personal communication.
2 Assuming proper T exon 6 to 7 splicing, then read-through transcription from exon 7 into adjacent ETn sequences, in T Wis allele transcripts only.

Table 2
Column headings: RT-PCR product, Embryo Genotype, Product Size (in bp)
1 A comma (,) denotes the joining by splicing of non-adjacent sequences. 7∆35 indicates an incomplete exon 7 with a 35 nucleotide deletion at the 3' end. 7+ETn and 7+60ETn indicate read-through transcription from an intact exon 7 into adjacent ETn sequences. The +37ETn mini-exon sequence is Genbank accession #Y17106 bases 4515-4479, 5'-GAAACTCAGAAATGGTCAAGCTGGACCTTCCCTTGCA-3'. The +60ETn read-through sequence is 5'-GTGTTGCGGCCGCCAGCAGCTCGCAACGTGAACGGTTCGACTGAGAAGGCCGCTCGAGCT-3', where the 5' most G is the first base of T intron 7 [10], and the remainder of the sequence is Genbank accession #Y17106 bases 5542-5484.
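The comparison of the cryptic exon 7 donor site with the C/A,AGGT,A/G,AGT splice donor consensus can be made concrete with a simple position-by-position count. The sketch below is an illustration, not the authors' method: it assumes the 9 nt sites quoted in this paper are aligned with three exonic bases followed by six intronic bases, and a plain match count is a much cruder measure than the weighted scoring used by dedicated splice site tools.

```python
# Splice donor consensus quoted in the text: (C/A) A G | G T (A/G) A G T.
# Each entry lists the bases accepted at that position.
CONSENSUS = ["CA", "A", "G", "G", "T", "AG", "A", "G", "T"]

# 9 nt donor sites quoted in this paper (3 exonic + 6 intronic bases assumed).
SITES = {
    "wild type exon 7 donor": "TAGGTATGT",
    "T Wis mutated donor": "TAGGTGTTG",
    "cryptic exon 7 donor": "CCTGTGAGT",
}


def consensus_matches(site):
    """Count positions at which a 9 nt site agrees with the consensus."""
    if len(site) != len(CONSENSUS):
        raise ValueError("site must be 9 nt long")
    return sum(
        base in allowed for base, allowed in zip(site.upper(), CONSENSUS)
    )


def intronic_matches(site):
    """Count matches over the six intronic positions only."""
    if len(site) != len(CONSENSUS):
        raise ValueError("site must be 9 nt long")
    pairs = list(zip(site.upper(), CONSENSUS))[3:]
    return sum(base in allowed for base, allowed in pairs)


if __name__ == "__main__":
    for name, site in SITES.items():
        print(name, site, consensus_matches(site), intronic_matches(site))
```

With this count, the wild type site matches 7 of 9 positions, the mutated site 5 of 9, and the cryptic site 7 of 9, with the cryptic site matching all six intronic positions.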
Genetically, the T Wis allele acts as a dominant negative in that T Wis /T Wis embryos have a more severe phenotype than the null T/T embryos. However, the biochemical nature of this dominant negative effect is not known. This study confirms previous work [13] showing the existence of a transcript that would result in a truncated protein if translated. Because the DNA-binding T-domain would be intact, this hypothetical protein could compete with the wild type protein, producing a dominant negative effect. Alternatively, the presence of heterologous ETn element sequences in T Wis transcripts may stimulate posttranscriptional gene silencing by an RNA-induced silencing complex (RISC)-based RNAi pathway, which could target wild type transcripts for degradation [14].

Conclusions
We have shown that disruption of a single splice donor site within a multi-exon locus can lead to a dramatic misregulation of RNA splicing. A straightforward prediction suggests that in T Wis transcripts, splicing "out of" exon 7 would simply fail and result in a transcript terminating with the ETn element. In reality, the effect is much more complex, which cautions against relying on straightforward assumptions when predicting the molecular consequences of genetic alterations.

Figure 2
Examination of exon 7 to exon 8 splicing in wild type versus T Wis allele-specific transcripts. RT-PCR analysis of T +/+, T Wis /+, and T Wis /T Wis e7.5 embryo RNA. Genotypes of embryos are indicated above the lanes; +cont = T +/+ RNA control; -cont = no RNA negative control. A. Primer pairs used to amplify the product are indicated to the right (see also Table 1). F1-BGH037, an exon 6-ETn element primer pair, only amplifies from T Wis /+ and T Wis /T Wis samples. F3-R1, a 3' end of exon 7 to exon 8 primer pair, only amplifies from +/+ and T Wis /+ samples. B. Additional RT-PCR using primers F3-R1. Size, in base pairs (bp), indicated to the right. * = unincorporated primers visible on the gel. Brackets indicate bands excised, purified, and sequenced. Products were named based on RT-PCR primer set followed by lower case letter next to the bracket (see Table 2).
Figure 3
A. [...] [10,16], and Bernhard G. Herrmann personal communication). Primers used for RT-PCR are indicated (not to scale), where the arrow indicates the approximate primer position within the exon or ETn element. Primer ETn1 used for PCR is located within the ETn insertion approximately 1.3 kb 3' of primer BGH037. B. Wild type T transcript: 5' and 3' UTR sequences are in black; ORF sequences in various colors, with exon numbers indicated. RT-PCR products amplified from this RNA in T +/+ embryo controls are indicated to the right (see Table 2). C. Structures of T Wis RNA species transcribed from T Wis allele as deduced from amplified RT-PCR products (indicated to the right, see Table 2) assuming that transcripts are wild type 5' and 3' of the RT-PCR primers used to amplify products. Note that the RT-PCR product 6,7+ETn could have amplified from any of three transcripts. In no case was wild type T RNA detected in RT-PCR on T Wis /T Wis embryo RNA.