Cloning and characterization of the mouse Mcoln1 gene reveals an alternatively spliced transcript not seen in humans

Background Mucolipidosis type IV (MLIV) is an autosomal recessive lysosomal storage disorder characterized by severe neurologic and ophthalmologic abnormalities. Recently, the MLIV gene, MCOLN1, was identified as a new member of the transient receptor potential (TRP) cation channel superfamily. Here we report the cloning and characterization of the mouse homologue, Mcoln1, and describe a novel splice variant that is not seen in humans. Results The human and mouse genes display a high degree of synteny. Mcoln1 shows 91% amino acid and 86% nucleotide identity to MCOLN1. Mcoln1 maps to chromosome 8 and contains an open reading frame of 580 amino acids, with a transcript length of approximately 2 kb encoded by 14 exons, similar to its human counterpart. The transcript that results from murine-specific alternative splicing encodes a 611 amino acid protein that differs at the C-terminus. Conclusions Mcoln1 is highly similar to MCOLN1, especially in the transmembrane domains and ion pore region. The late endosomal/lysosomal targeting signal is also conserved, supporting the hypothesis that the protein is localized to these vesicle membranes. To date, there are very few reports describing species-specific splice variants. While identification of Mcoln1 is crucial to the development of mouse models for MLIV, the existence of two transcripts in mice suggests an additional or alternate function of the gene that may complicate phenotypic assessment.


Background
Mucolipidosis type IV (MLIV; MIM 252650) is an autosomal recessive lysosomal storage disorder that is characterized by corneal clouding, delayed psychomotor development, and mental retardation, and that usually presents during the first year of life [1]. Another notable clinical characteristic is that patients are constitutively achlorhydric with associated hypergastrinemia [2]. Unlike patients with the other mucolipidoses, patients with MLIV do not show mucopolysaccharide excretion, skeletal changes, or organomegaly. Abnormal lysosomal storage bodies and large vacuoles have been found in skin and conjunctival biopsies using electron microscopy and, prior to gene identification, served as the only means of diagnosis [3][4][5]. A recent report estimates that the carrier frequency of MLIV in the Ashkenazi Jewish population is 1 in 100, and mutations have been reported in Jewish and non-Jewish families [6][7][8][9].
The human gene MCOLN1 (GenBank #AF287270) maps to chromosome 19p13.2-13.3 and encodes a novel protein that is a member of the transient receptor potential (TRP) cation channel gene superfamily [7][8][9][10]. Protein trafficking studies suggest that MLIV is the result of a defect in the late endocytic pathway, in contrast to the other mucolipidoses, which are typically caused by defective lysosomal hydrolases [11,12]. Recent work in Caenorhabditis elegans supports this hypothesis. Loss-of-function mutants of cup-5, the C. elegans homologue of MCOLN1, show an increased rate of endocytosis, accumulation of large vacuoles, and a decreased rate of endocytosed protein breakdown, whereas over-expression of this gene reverses the phenotype [13]. Cloning and characterization of the mouse homologue of MCOLN1 is crucial for the development of mouse models of MLIV to further study this disorder.

Cloning and mapping of the mouse homologue Mcoln1
In order to clone the mouse homologue of MCOLN1, the human amino acid sequence was compared to the high throughput genomic sequence (HTGS) database using TBLASTN, which identified the mouse BAC clone RPCI-23_387H4 (GB No. AC079544.1). Correspondence with the Joint Genome Institute and the Lawrence Livermore National Laboratory (LLNL) Human Genome Center confirmed the location of this BAC to mouse chromosome 8 and allowed us to construct a physical map of this region [14] (Fig. 1A). The BAC sequence was then compared to the mouse EST database using BLASTN, and multiple ESTs and their corresponding I.M.A.G.E. clones were identified.  [15], and the presence of the zinc finger gene and Nte confirms and extends the region of synteny between human chromosome 19 and mouse chromosome 8.

Characterization of Mcoln1
Comparison of the mouse and human peptide sequences showed 91% identity (Fig. 2). The C. elegans homologue cup-5 shows 34% identity with Mcoln1, and BLASTP analysis of Mcoln1 identified a putative Drosophila melanogaster homologue that shows 38% identity (Fig. 2). Interestingly, two MCOLN1 amino acid substitutions that result in MLIV occur at conserved amino acids. TMPred analysis [http://www.ch.embnet.org/software/TMPRED_form.html] predicts a protein structure that is nearly identical to that of MCOLN1, containing 6 transmembrane domains with the N- and C-termini residing in the cytoplasm (Fig. 2) [9].
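As an illustration of what a percent-identity figure such as those above summarizes, the following is a minimal sketch. It assumes a pairwise alignment is already available and counts identity only over columns where neither sequence has a gap, which is one common convention and not necessarily the one used for the values reported here. The function name and the toy sequences are hypothetical and are not real MCOLN1 or Mcoln1 sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns (an assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, for illustration only.
toy_human = "MTAPAG-LRE"
toy_mouse = "MTAPSGSLRE"
print(round(percent_identity(toy_human, toy_mouse), 1))
```

Other conventions (for example, dividing by the full alignment length or by the shorter sequence length) give slightly different values, so identities from different tools are not always directly comparable.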

Expression analysis of Mcoln1
Mouse adult multiple tissue and embryonic Northern blots were hybridized using a probe generated from mouse exon 2 (probe 1, Fig. 1B  UTR, and all probes identified the same two transcripts in the mouse (data not shown). In order to verify the presence of a single mouse locus, we hybridized a mouse Southern blot with the exon 2 probe. Four different restriction enzymes were used, and only the expected size bands for the chromosome 8 locus were detected (data not shown).

Characterization of the Mcoln1 alternative splice variant
In order to determine the coding sequence for the larger transcript, we searched the mouse EST database using each intron as well as the genomic sequence flanking the Mcoln1 gene. Two ESTs were identified that contained sequence from intron 12 (GB No. AI430291 and AA874645), and the corresponding clones were sequenced. Clone 408619 (ESTs: GB No. AI430291, AI429558) begins approximately 1.1 kb before exon 13, continues through the exon, and splices correctly to exon 14. Clone 1281641 (EST: GB No. AA874645) begins 175 bp before exon 13 and also splices correctly to exon 14. A mouse multiple tissue Northern was hybridized using a probe generated from the putative intron sequence in clone 408619 (Probe 2, Fig. 1B), which detected only the 4.4 kb band (Fig. 3C).
In order to determine the sequence of the entire transcript, RT-PCR using primers in exons 10 and 11 paired with a primer in intron 12 was performed using BALB/c mouse brain total RNA, and the resulting products were sequenced. These products show that the larger transcript is due to an alternative splice event that results in an expanded exon 13. Specifically, exon 12 splices at bp 436 of intron 12, creating a large 1614 bp exon 13 that splices correctly to exon 14. The open reading frame of this alternatively spliced transcript is 611 amino acids, 31 amino acids longer than the 580 amino acid open reading frame of the 2.4 kb transcript.
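To make the consequence of an alternative acceptor site concrete, the sketch below translates two toy transcripts that share an upstream exon but differ in whether intronic sequence is included ahead of the downstream exon. All sequences are hypothetical and only a few codons long; they are not Mcoln1 sequence. The codon table is the standard genetic code, and the function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate from the first base in frame, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading residues that two protein sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical toy exons (not real sequence).
upstream_exon = "ATGGCTCTG"      # shared coding sequence: M A L
canonical_exon = "GAACTGTAA"     # E L stop
intronic_extension = "AAAGGTTGA" # K G stop; included only in the longer transcript

isoform1_cds = upstream_exon + canonical_exon
isoform2_cds = upstream_exon + intronic_extension + canonical_exon

p1, p2 = translate(isoform1_cds), translate(isoform2_cds)
print(p1, p2, shared_prefix_len(p1, p2))
```

In this toy case the longer transcript reaches a stop codon within the included intronic sequence, so the two proteins share their N-terminal residues and end in different C-terminal tails even though the longer transcript still contains the canonical downstream exon.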
TMPred analysis predicts that isoform 2 encodes a protein identical in structure to Mcoln1, possessing 6 transmembrane domains and a channel pore; however, the protein sequences diverge at amino acid 526. The 55 amino acid C-terminal cytoplasmic tail encoded by the 2.4 kb transcript is completely different from the 86 amino acid tail encoded by the murine-specific 4.4 kb transcript (Fig. 4). Clontech Mouse RNA Master Blots were hybridized with the exon 2 and intron 12 probes mentioned above in an attempt to determine whether these two transcripts showed differences in expression patterns; however, there was no significant difference in the 22 tissues represented (data not shown).
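The divergence point and tail lengths can be checked against the two open reading frame lengths with simple arithmetic. The sketch below assumes that "diverge at amino acid 526" means residue 526 is the first residue that differs, so that residues 1 to 525 are shared and each tail is counted from residue 526 onward; the variable names are illustrative.

```python
DIVERGENCE_RESIDUE = 526  # first differing residue, as stated in the text
TAIL_ISOFORM_1 = 55       # tail encoded by the 2.4 kb transcript
TAIL_ISOFORM_2 = 86       # tail encoded by the 4.4 kb transcript

# Assumption: residues 1..525 are common to both isoforms.
shared = DIVERGENCE_RESIDUE - 1

length_isoform_1 = shared + TAIL_ISOFORM_1
length_isoform_2 = shared + TAIL_ISOFORM_2
print(length_isoform_1, length_isoform_2)
```

Under that reading, the totals agree with the 580 and 611 amino acid open reading frames reported for the two transcripts.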
Next, we directly compared the nucleotide and amino acid sequence of the alternatively spliced mouse transcript to the entire human MCOLN1 genomic sequence and found no significant similarity. As mentioned previously, Northern blots performed with human MCOLN1 probes show only one 2.4 kb transcript. In addition, we hybridized a human multiple tissue Northern and a human Southern with a probe in human intron 12 that is adjacent to exon 13. The probe was located in the region syntenic to that which encodes the alternate mouse transcript. Only the expected bands were detected on the Southern and no bands were detected on the Northern, confirming that this alternative transcript is specific to murine Mcoln1. Recent BLASTP analysis of the alternate Mcoln1 transcript yields a match to a putative 145 amino acid anonymous protein (GB No. BAB25862) predicted from a RIKEN clone. Our results indicate, however, that the identification of this sequence as a full-length protein is incorrect, since probes unique to the clone, as well as probes containing the Mcoln1 coding sequence, identify the same transcripts.

Conclusions
Comparison of Mcoln1 isoform 1 to its human homologue shows striking similarity at both the amino acid and nucleotide level. All six of the transmembrane domains, as well as the putative cation channel, are highly conserved. The putative di-leucine (L-L-X-X) motif at the C-terminus, which may act as a late endosomal/lysosomal targeting signal, is also conserved [9]. This speculation is supported by work with cup-5 [13], the C. elegans homologue of MCOLN1, since cellular localization studies suggest that the protein is found in the late endosomes and/or lysosomes.
The mouse Mcoln1 gene has two alternatively spliced isoforms, with isoform 2 having a different C-terminal cytoplasmic tail. The unique 86 amino acid C-terminal tail lacks the lysosomal targeting signal and does not contain any conserved domains when compared against the current profile databases. We speculate that this protein may have similar channel function but an alternate subcellular localization; this remains to be tested once isoform-specific antibodies are raised. Our results nonetheless suggest that phenotypic assessment of Mcoln1 knock-out mice may be complicated and that care must be taken when interpreting data on mouse gene expression and phenotype.
Interestingly, the second Mcoln1 isoform is not seen in humans, and the sequence of the alternatively spliced region is not conserved between human and mouse. To date, very few genes have been reported that show species-specific alternative splice variants. MOG, myelin/oligodendrocyte glycoprotein, has many different splice variants in humans that are not found in mice [17]. ATP11B, a P-type ATPase, has a rabbit-specific splice variant that deletes a transmembrane domain and therefore likely alters the putative function of the protein [18]. Sequencing of the human genome has led to estimates of approximately 32,000 genes, a surprise given earlier, significantly higher estimates that were based on the number of expressed sequence tags (ESTs) in the public databases. This apparent disparity suggests a major role for alternative splicing in creating genetic complexity and has brought the study of splicing regulation to the forefront of molecular genetics. It is likely that an abundance of species-specific splice variants will be identified as the characterization of alternatively spliced transcripts progresses.

Figure 4
Peptide sequence comparison of the two alternatively spliced Mcoln1 isoforms. The green box surrounds the divergent C-terminal cytoplasmic tails. The blue lines indicate the transmembrane domains.