Identification of the gene encoding Brain Cell Membrane Protein 1 (BCMP1), a putative four-transmembrane protein distantly related to the Peripheral Myelin Protein 22 / Epithelial Membrane Proteins and the Claudins

Background A partial cDNA clone from dog thyroid showing highly significant similarity to an uncharacterized mouse EST sequence was isolated fortuitously. We report here the identification of the complete mRNA and of the gene, the product of which was termed "brain cell membrane protein 1" (BCMP1). Results The 4 kb-long mRNA sequence exhibited an open reading frame of only 543 bp followed by a 3.2 kb-long 3' untranslated region containing several AUUUA instability motifs. Analysis of the encoded protein sequence identified four putative transmembrane domains. Similarity searches in protein domain databases identified partial sequence conservation with peripheral myelin protein 22 (PMP22)/epithelial membrane proteins (EMPs) and claudins, defining the encoded protein as representative of a novel subclass in this protein family. Northern blot analysis of the expression of the corresponding mRNA in adult dog tissues revealed very large amounts of the 4 kb transcript in the brain. An EGFP-BCMP1 fusion protein expressed in transfected COS-7 cells exhibited a membranous localization, as expected. The sequences encoding BCMP1 were assigned to chromosome X in dog, man and rat using radiation hybrid panels and were partly localized in the currently available human genome sequence. Conclusions We have identified in several mammalian species a gene encoding a putative four-transmembrane protein, BCMP1, which defines a novel subclass in this family of proteins. In dog at least, the corresponding mRNA is highly abundant in brain cells. The chromosomal localization of the gene in man makes it a likely candidate gene for X-linked mental retardation.


Background
We recently developed a screening procedure for the selection of sequences encoding proteins targeted to the cell nucleus. Our method relies on the expression in transfected cells of enhanced green fluorescent protein (EGFP) fusion proteins from cDNA library constructs [1]. The selected clones encode EGFP fusion proteins that accumulate in the cell nucleus. Many of them were shown to harbor cDNA sequences corresponding to nuclear proteins that were translated in frame with the EGFP coding sequence. However, in nearly half of the selected clones, the production of a fusion protein able to accumulate in the nucleus was shown to result from out-of-frame translation of the cDNA sequence fused to the EGFP coding region. Indeed, on average only one out of three cDNAs was positioned in frame with the EGFP coding sequence in the starting library. It was not expected that functional nuclear localization sequences would be generated at random (i.e. by out-of-frame translation of cDNA sequences) as often as was observed.
One clone, called "C60", that was isolated in this approach exhibited a significant DNA sequence similarity with a mouse EST sequence present in the EMBL/GenBank database (clone MNCb-0941, accession #: AU035837) [1]. No open reading frame (ORF) had yet been identified in this sequence, but the comparison of our dog sequence with the one from mouse identified a putative ORF, on the basis that in the 385 bp-long region of similarity most of the differences occurred at the third position of base triplets in frame with a starting ATG codon. However, both sequences diverged before the stop codon was reached. Assuming that this was the correct reading frame, the cDNA portion in our EGFP fusion construct was translated out of frame (frame +2). This out-of-frame translation generated a 201 aa-long sequence presenting several neighbouring clusters of arginine residues, which somewhat resembled basic-type nuclear localization signals. Although this could explain why the cDNA was isolated in the screening, it did not allow us to conclude whether the protein normally encoded by the cDNA is a nuclear protein or not. To further characterize the protein encoded by the cloned sequences, we decided to isolate a complete copy of the corresponding mRNA.
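The reading-frame argument used above (differences between two orthologous sequences clustering at the third position of codons) can be illustrated with a minimal sketch. The function name and the two short sequences are hypothetical toy data, not the dog or mouse sequences, and the sketch assumes a gap-free alignment.

```python
def diffs_by_codon_position(seq_a: str, seq_b: str, atg_index: int = 0) -> list[int]:
    """Count mismatches at codon positions 1, 2 and 3.

    seq_a and seq_b must be aligned without gaps; atg_index is the
    0-based index of the A of the candidate start codon.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = [0, 0, 0]
    for i in range(atg_index, len(seq_a)):
        if seq_a[i].upper() != seq_b[i].upper():
            counts[(i - atg_index) % 3] += 1
    return counts


# Hypothetical toy alignment (NOT the real sequences): all three
# differences are silent third-position changes in frame with the ATG.
toy_a = "ATGGCTCTGAAAGGA"
toy_b = "ATGGCCCTCAAGGGA"

in_frame = diffs_by_codon_position(toy_a, toy_b, atg_index=0)
shifted = diffs_by_codon_position(toy_a, toy_b, atg_index=1)
print("frame of the ATG:", in_frame)
print("shifted frame:   ", shifted)
```

In the correct frame the mismatches accumulate in the third counter, whereas a shifted frame spreads them over another position; this is the kind of signal that suggested the putative ORF.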

Identification of the complete dog BCMP1 mRNA
The random-primed cDNA insert harbored by clone C60 [1] was used as probe to screen a dog thyroid oligo-dT-primed cDNA library in λ ZAPII phage vector [2]. Sixty positive clones were obtained out of the 500,000 cDNA clones screened. The longest insert (from clone C60-1) had a size of 4 kb and was entirely sequenced. Compared to the sequence of the insert of clone C60, this cDNA exhibited a 2 bp extension at the 5' end and a 2,944 bp extension at the 3' end. The 3' poly-A tail was preceded by a correctly placed AATAAA motif (fig. 1). The longest ORF corresponded to the putative ORF identified previously by comparing the sequence from clone C60 with that of the mouse EST present in the database (see background section). It extended over 543 bp (181 aa), from position 193 to 735 in the cDNA sequence. The translation initiator codon was located in a suitable sequence context according to Kozak's rule [3]. As an updated homologous mouse sequence had meanwhile been deposited in the database (clone MNCb-0941, EMBL/GenBank acc. #: AB041540), both sequences could be compared, which revealed that the coding region was entirely conserved in dog and mouse (fig. 2). The 3.2 kb-long sequence located 3' of the TAG codon (3' UTR) in the dog cDNA was distinctly AT-rich and contained 9 ATTTA motifs. These characteristics have been implicated in the rapid decay and restricted translation of mRNA molecules [4,5,6]. This 3' UTR was shorter in the mouse (1.3 kb), but several portions of it exhibited a remarkably high sequence conservation when compared with the dog sequence (fig. 2). In particular, the AT-rich character and the occurrence of multiple ATTTA motifs were preserved. A search in the database also identified a human sequence (DKFZp564E153, EMBL/GenBank acc. #: AL049257) presenting a very high degree of sequence conservation over 2.5 kb with the 3' part of our dog cDNA (fig. 2).
The coding region of the mRNA was not contained in this human sequence, and the observation of such an extended conservation of DNA sequence between UTRs from different species was unexpected. During the preparation of this manuscript, a completed human sequence appeared in the database (DKFZp761J17121, EMBL/GenBank acc. #: AL136550). The coding region was entirely conserved between dog, mouse and man, and most of the ATTTA motifs present in the dog sequence were also preserved in man (fig. 2). This may suggest that BCMP1 mRNA is indeed subjected to tight post-transcriptional controls. However, whether the presence of these sequences really confers instability to the mRNA and restricts its translation remains to be determined experimentally.
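The ORF coordinates quoted above and the kind of motif counting applied to the 3' UTR can be checked with a short sketch. The helper names and the toy UTR string are hypothetical; only the coordinates 193 and 735 come from the text. Overlapping motifs (as in ATTTATTTA) are counted separately here, which is an assumption about how the motifs were tallied.

```python
import re


def orf_length(start: int, end: int) -> tuple[int, int]:
    """Return (length in bp, number of codons) for 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of three")
    return length, length // 3


def count_motif(seq: str, motif: str = "ATTTA") -> int:
    """Count motif occurrences, including overlapping ones, via a lookahead."""
    return len(re.findall(f"(?={re.escape(motif)})", seq.upper()))


def at_fraction(seq: str) -> float:
    """Fraction of A and T residues in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


# Coordinates taken from the text: positions 193 to 735 of the dog cDNA.
bp, codons = orf_length(193, 735)
print(bp, codons)

# Hypothetical toy UTR fragment (NOT the real 3' UTR).
toy_utr = "GCATTTATTTAGGCATTTAC"
print(count_motif(toy_utr), round(at_fraction(toy_utr), 2))
```

Applied to the actual 3' UTR sequence of figure 1, the same counting would be expected to reproduce the reported number of ATTTA motifs, provided the counting convention matches.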
A number of EST sequences from various species that were clearly homologous to dog BCMP1 could be retrieved from the database by BLAST searches. They indicated that the BCMP1 gene must also exist in the rat (e.g. acc. # BG381247), cattle (e.g. acc. # AW352911), pig (e.g. acc. # BF704530) and in the fish Gillichthys mirabilis (acc. # AF266205), in addition to the already cited dog, mouse and man.

Figure 1
Nucleotide sequence of dog BCMP1 cDNA. The amino acid sequence encoded by the ORF appears above the corresponding DNA sequence. The underlined sequence corresponds to the insert of the original clone C60 (see text). ATTTA motifs appear in bold and the polyadenylation signal is highlighted in blue.

Analysis of BCMP1 mRNA expression in the dog
Originally, the cDNA had been isolated from a dog thyroid cDNA library. In order to investigate whether the corresponding mRNA was also present in other cell types, a northern blot experiment was performed using poly-A+ RNA preparations from various dog tissues (fig. 3). Very large amounts of the 4 kb transcript were detected in brain cells. The mRNA was also detectable in most of the other RNA preparations, but at lower levels than in brain RNA. The encoded protein is thus expected to be particularly abundant in the brain, unless the peculiar 3' UTR of the mRNA exerts tight control over its translation (see above).

Prediction of BCMP1 protein structure and subcellular localization of an EGFP-BCMP1 fusion protein
The 181 aa-long protein sequence encoded by the mRNA did not present any significant resemblance to sequences present in protein databases. A search for protein family signatures (PfamHMM on the Expasy server) revealed the occurrence in the novel protein of sequence motifs significantly resembling one of the two motifs specific to the peripheral myelin protein 22 (PMP22) family of proteins and the motif specific to the claudins (fig. 4). The two identified signatures overlapped partially in the novel protein sequence. PMP22 and the related epithelial membrane proteins (EMPs) [7], as well as the claudins [8], all belong to the superfamily of four-transmembrane domain (4TM) proteins. As could be expected, the search for putative transmembrane domains in the novel protein (HMMTOP on the Expasy server) identified four such domains (fig. 5). The protein thus appeared to be a novel member of this large family of proteins, somewhat related to PMP22/EMPs and the claudins. As these are integral membrane proteins, and as the mRNA encoding the novel protein was predominantly found in brain, the newly identified protein was termed "brain cell membrane protein 1" (BCMP1).
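The transmembrane prediction reported here was obtained with HMMTOP. As a rough illustration of the underlying idea only, the sketch below uses a much simpler substitute, a Kyte-Doolittle sliding-window hydropathy scan, which is not the method used in this work. The window size of 19 and the cutoff of 1.6 are commonly quoted values taken here as assumptions, and the test sequence is a hypothetical toy peptide, not BCMP1.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_segments(protein: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge overlapping high-scoring windows into 1-based (start, end) spans.

    Merged spans tend to overestimate the true segment boundaries.
    """
    segments: list[tuple[int, int]] = []
    for i, value in enumerate(hydropathy_profile(protein, window)):
        if value >= threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1] + 1:
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


# Hypothetical toy peptide: a 19-residue hydrophobic core (positions 9-27)
# flanked by polar and charged residues.
toy = "MSDEKRNQ" + "LIVALLFAVLGLIAVALLL" + "RKDEQNSG"
segments = candidate_tm_segments(toy)
print(segments)
```

A hidden Markov model such as HMMTOP additionally models loop lengths and topology, so its predictions should be preferred over this kind of threshold scan.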
According to the putative BCMP1 structure, the extracellular loop between TM1 and TM2 would be larger than the intracellular loop between TM2 and TM3 and the extracellular loop between TM3 and TM4, as is also thought to be the case in PMP22/EMPs and claudins. However, the intracellular amino-terminal arm preceding the first transmembrane domain appeared to be much longer in BCMP1 than in its relatives.
In order to refine the classification of BCMP1 within the four-transmembrane domain protein family, a phylogenetic tree was constructed on the basis of the alignment of the available protein sequences related to the PMP22/EMPs and claudins (fig. 6). Dog BCMP1 and its mouse and human orthologs segregated as a distinct subgroup in the tree. Their closest relatives were the recently identified mouse PERP gene product [9] and the protein encoded by the CG6982 gene in Drosophila (EMBL/GenBank acc. #: AAF56054). This group of proteins thus shared primary structure determinants which defined a distinct subclass in the protein family.
In order to assess experimentally the postulated membranous localization of BCMP1, an EGFP-BCMP1 fusion protein was expressed in transiently transfected COS-7 cells and the subcellular localization of the hybrid protein was observed by fluorescence microscopy (fig. 7A and 7B). A fine granular fluorescence was observed all over the surface of cells expressing the EGFP-BCMP1 fusion protein, consistent with a plasma membrane localization of the tagged protein. A stronger fluorescence surrounding the cell nucleus was also observed. This indicated that a significant part of the expressed fusion protein accumulated in the endoplasmic reticulum. The pattern of EGFP fluorescence remained almost unchanged when the cells were permeabilized with saponin in order to stain the nuclear DNA (fig. 7C and 7D). This indicated that the fusion protein was embedded in the membranes and that it was not able to readily diffuse out of the cell.

Localization of the BCMP1 gene in dog, man and rat
In dog, the BCMP1 coding sequence was typed in duplicate on the 118 cell lines of the RHDF5000-2 radiation hybrid panel [10] and placed on the latest version of the RH map [11]. The BCMP1 gene was linked to chromosome X close to FH2548 with a LOD score of 11.88. Marker FH2548 is located close to the DMD locus in dog (distance: 4.4 cR5000, approx. 500 kb). More information about dog RH maps can be found at http://www-recomgen.univ-rennes1.fr/doggy.html.
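To give a feel for what a distance of a few centiRays means, the following sketch converts between centiRays and the breakage frequency θ. It assumes the mapping function usually applied in radiation hybrid mapping, d = -ln(1 - θ) with d in Rays; the text does not state which function the mapping software used, so this is an assumption. The function names are hypothetical.

```python
import math


def breakage_frequency(distance_cr: float) -> float:
    """Breakage frequency theta for a distance in centiRays (assumed d = -ln(1 - theta))."""
    return 1.0 - math.exp(-distance_cr / 100.0)


def centirays(theta: float) -> float:
    """Inverse conversion: breakage frequency to centiRays."""
    return -100.0 * math.log(1.0 - theta)


def kb_per_centiray(physical_kb: float, distance_cr: float) -> float:
    """Local ratio of physical to RH distance."""
    return physical_kb / distance_cr


# Values quoted in the text: 4.4 cR5000, approximately 500 kb.
theta = breakage_frequency(4.4)
ratio = kb_per_centiray(500, 4.4)
print(f"theta = {theta:.4f}, about {ratio:.0f} kb per cR5000")
```

Under this assumption, 4.4 cR5000 corresponds to a break between the two markers in roughly 4% of hybrids, and the quoted physical estimate implies a local ratio of a little over 100 kb per cR5000.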
The human EST sequence DKFZp564E153 (EMBL/GenBank acc. #: AL049257) that corresponds to the 3' UTR of dog BCMP1 mRNA had been localized on chromosome X. The corresponding human genomic sequence could not be found by BLAST searches against sequences available in the database. However, by using the coding region of dog BCMP1, a significant match was identified with genomic sequences assigned to human chromosome 8 (clone RP11-31H18, EMBL/GenBank acc. #: AC041003). The similarity extended from position 1 to 418 in the human cDNA sequence (fig. 2), which corresponded to the amino-terminal part of the protein up to the first extracellular loop. As an intron was found at this same position in the PMP22, EMP-1 and EMP-3 genes, it appeared likely that we had identified the first coding exon of the human BCMP1 gene. The PMP22, EMP-1 and EMP-3 genes all contain an additional intron separating the sequences encoding the first transmembrane domain and the first extracellular loop into two exons [7]. This intron is clearly not present in the human BCMP1 gene. In order to clarify the location of the gene in the human genome (chromosome X or chromosome 8?), the GeneBridge-4 WGRH panel was used to map the sequences encoding human BCMP1 using a pair of primers directing the amplification of a 666 bp-long fragment encompassing the entire first coding exon and the exon-intron junction. This revealed that the amplified segment was located on chromosome X, 0.20 cR3000 from marker WI-7096 and 6.51 cR3000 from marker DXS1214. This location agreed with the previous assignment of the EST sequence DKFZp564E153. It also corresponds to the cytogenetic location Xp11.4. As the DMD gene maps at Xp21.2 in man, it is thus also close to the BCMP1 gene in this species. The chromosomal localization result revealed unambiguously the existence of a single BCMP1 locus in the human genome.
As a consequence, it indicated that the sequences of the genomic clone RP11-31H18 had been inappropriately assigned to chromosome 8 instead of chromosome X in the database.

Figure 4
Identification of PMP22 and claudin family signatures in BCMP1 primary structure. Conserved residues are coloured (blue: PMP22 signature, red: claudin signature, violet: overlapping PMP22 and claudin signatures) and conserved spaces between specific residues are over- or underlined. The part of the predicted BCMP1 structure (see fig. 5) to which the primary sequence corresponds is shown (TM1 = first transmembrane domain).
In the annotated human genome sequence available on the Ensembl server, the first coding exon of the human BCMP1 gene (gene ID: ENSG00000101959) is present in the chromosome X sequence (the sequences of clone RP11-31H18 have now been properly reassigned to chromosome X; see ContigView on the Ensembl server). Part of the coding region of human BCMP1 and the whole 3' UTR region corresponding to DKFZp564E153 are still missing in the currently available human genome sequence.
Using primers derived from the rat EST236642 (EMBL/GenBank acc. # AI408352), which is 91% identical to the segment 2028-1434 of the mouse brain cDNA MNCb-0941, itself similar to dog BCMP1 cDNA, the rat gene (symbol: Bcmp1) was assigned to chromosome X, between DXRat67 and DXRat28, at 497.9 cR along the MCW map (LOD score: 9.0; the local map is: DXRat67 - 29.9 cR - Bcmp1 - 0.4 cR - DXRat28). The marker DXRat67 co-localizes with the gene Dmd [12], itself cytogenetically assigned to Xq22 [13]. The rat genes Bcmp1 and Dmd are thus closely linked, as was already observed in dog and man.

Conclusions
We have described here the identification of the gene encoding a novel protein, called Brain Cell Membrane Protein 1 (BCMP1), which belongs to the large family of four-transmembrane proteins and appears to be highly expressed in the brain. The gene seems to be conserved on chromosome X within mammals, in close association with the DMD locus in man, rat and dog at least. The encoded BCMP1 protein shares significant resemblances with both PMP22/EMPs [7] and the claudins [8], but exhibits distinct features, notably a predicted intracellular amino-terminal extension, which distinguish it from the other known members of the family.
PMP22/EMPs are integral membrane proteins that seem to be implicated in various cellular processes, such as cellular differentiation, control of proliferation, and apoptosis [7]. PMP22 has been shown to play a critical role in peripheral nerves, where it is involved in the assembly of peripheral nerve myelin and in the regulation of proliferation and differentiation of Schwann cells. The claudins are also integral membrane proteins, which are localized exclusively at tight junctions [8]. Claudin-1, -2 and -3 have been shown to present calcium-independent cell-adhesion activity [14].
Alterations in the PMP22 gene are responsible for hereditary motor and sensory neuropathies in humans and rodents, known as Charcot-Marie-Tooth type 1A (CMT1A) disease and Trembler (Tr) mouse, respectively [7]. Individuals presenting nonsyndromic recessive deafness (autosomal recessive deafness DFNB29) were recently shown to harbor mutations in the gene encoding claudin-14 [15]. The Xp11.4 region of the human genome, which comprises the BCMP1 gene, has been linked to several forms of syndromic X-linked mental retardation, such as MRXS-2, -4, -6 and -10, and to a number of nonsyndromic MRX cases [16]. The TM4SF2 gene, which apparently encodes another member of the superfamily of four-transmembrane proteins, a tetraspanin [17] more distantly related to BCMP1 than are PMP22/EMPs and the claudins, is located very close to the BCMP1 gene in man. Mutations in the TM4SF2 gene and gene inactivation resulting from chromosomal translocation have been shown to be involved in several cases of X-linked mental retardation [18]. Whether the BCMP1 gene is also involved in such genetic disorders, and what the function of the encoded protein is, thus constitute the obvious questions that will guide our future investigations.

DNA constructions
Standard DNA manipulations were conducted according to published procedures [19]. The full-length BCMP1 cDNA clone was obtained by screening a dog thyroid cDNA library in λ ZAPII phage vector [2] using the original clone C60 [1] as probe. The DNA sequences corresponding to the cDNA insert in clone C60 were amplified by PCR using primers complementary to the sequences flanking the insert in the construct, 5' CAGATCTCGACCCACGCG 3' and 5' TACCTGCGGCCGCGATAT 3' respectively, and were labeled with digoxigenin (DIG labeling and detection kit, Boehringer Mannheim). Hybridization, washing and signal detection were performed as recommended by the supplier of the labeling system. The cloned DNA was sequenced on both strands using the Big Dye Terminator methodology and a model 377 DNA sequencer (Applied Biosystems). The construct encoding the EGFP-BCMP1 fusion protein was obtained by inserting a PCR fragment corresponding to the BCMP1 ORF between the EcoRI and BamHI sites in the pEGFP-C1 vector (Clontech). The following primers were used to amplify these sequences from the DNA of clone C60: 5' TTCGAATTCGGCGGGCAGCGGC 3' and 5' TGTGGATCCTAGTAGTAGTCTTC 3'.
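As a quick consistency check of the cloning primers listed above, the sketch below locates the EcoRI (GAATTC) and BamHI (GGATCC) recognition sites in the two primers and reverse-complements the second primer to read it on the coding strand. The primer sequences and enzyme names come from the text; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Recognition sequences of the two enzymes named in the text.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}

# Primers as given in the text (5' to 3').
forward_primer = "TTCGAATTCGGCGGGCAGCGGC"
reverse_primer = "TGTGGATCCTAGTAGTAGTCTTC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def site_positions(primer: str) -> dict[str, int]:
    """Map each enzyme whose site occurs in the primer to its 0-based start index."""
    return {
        name: primer.find(site)
        for name, site in SITES.items()
        if site in primer
    }


fwd_sites = site_positions(forward_primer)
rev_sites = site_positions(reverse_primer)
coding_view = reverse_complement(reverse_primer)
print(fwd_sites, rev_sites)
print(coding_view)
```

Each primer carries exactly one of the two sites after a three-nucleotide 5' overhang. Read on the coding strand, the reverse primer shows a TAG triplet immediately upstream of the BamHI site, which would be consistent with the TAG stop codon of the ORF being retained in the amplified fragment.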

RNA analysis
Northern blot analysis was performed on 4 µg of poly-A+ mRNA from various dog tissues. Acridine orange staining of the gel confirmed that each lane contained identical amounts of RNA. A 32P-labeled PCR fragment corresponding to the BCMP1 ORF was used as probe (see above for preparation of the DNA fragment). Hybridization and washes were conducted in standard conditions in the presence of 50% formamide [19].

Cell transfection
Transfection of COS-7 cells was performed using the DEAE-dextran method [20]. About 200 ng of a crude plasmid DNA preparation was used per dish (diameter: 3 cm). The subcellular localization of EGFP fluorescence was observed 48 hours after transfection using an Eclipse TE300 inverted microscope (Nikon) equipped with NB-2A and NG-2A filter blocks. The transfected cells were permeabilized using saponin (0.075% final concentration) and nuclear DNA was stained with ethidium bromide (1 µg/ml final concentration) in order to visualize the cell nucleus.

Chromosomal localization
Dog BCMP1 could be readily typed on the dog x hamster radiation hybrid panel RHDF5000-2, composed of 118 cell lines from panel RHDF5000 [10]. The following pair of primers, 5' TCTGGAGTGAACTAATGGGCTAA 3' and 5' GCAGTCTGAGATTAGTGGCAAA 3', generated a PCR product of 137 bp on dog genomic DNA. PCR results were scored as present, absent or ambiguous in the 118 hybrid cell lines. The typing data were incorporated into the latest radiation hybrid map [11], using the Multimap package [21]. The GeneBridge 4 human x hamster radiation hybrid panel DNA (Research Genetics Inc.) was screened by PCR using the following primers: 5' GGCAGCGGCATCCAGGAA 3' and 5' TGGGGAAGACCAACAGAGAACC 3'. The PCR results were analyzed according to the instructions of the supplier of the panel DNA.
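The scoring of duplicate PCR typings as present, absent or ambiguous can be sketched as follows. The 0/1/2 encoding and the rule that discordant duplicates are scored as ambiguous are assumptions made for illustration; the text does not specify how the duplicates were reconciled. The two replicate strings are hypothetical and much shorter than the 118-line panel.

```python
def consensus_call(first: str, second: str) -> str:
    """Combine duplicate typings: '1' present, '0' absent, '2' ambiguous.

    Assumed rule: concordant calls are kept, anything else is ambiguous.
    """
    if first == second and first in ("0", "1"):
        return first
    return "2"


def consensus_vector(rep1: str, rep2: str) -> str:
    """Consensus retention vector for two replicate typings of the same panel."""
    if len(rep1) != len(rep2):
        raise ValueError("replicates must cover the same cell lines")
    return "".join(consensus_call(a, b) for a, b in zip(rep1, rep2))


def retention_frequency(vector: str) -> float:
    """Fraction of unambiguously scored hybrids that retain the marker."""
    scored = [call for call in vector if call in ("0", "1")]
    return scored.count("1") / len(scored) if scored else float("nan")


# Hypothetical replicate typings on ten hybrids (NOT real panel data).
rep1 = "0100110010"
rep2 = "0100100010"

vector = consensus_vector(rep1, rep2)
retention = retention_frequency(vector)
print(vector, round(retention, 3))
```

A vector of this form, one character per cell line, is the kind of input that two-point and multipoint RH mapping programs typically take to compute linkage between markers.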
The panel of rat x hamster radiation cell hybrids [12] was typed by PCR with the following primers: 5'-AACTGTGAATACCAATCTAAGT-3' and 5'-GTTTTTCATTATGCAGTTACAG-3'. The mapping results were obtained from the rat radiation hybrid map server at the