The cloning, genomic organization and tissue expression profile of the human DLG5 gene

Background Familial atrial fibrillation, an autosomal dominant disease, was previously mapped to chromosome 10q22. One of the genes mapped to the 10q22 region is DLG5, a member of the MAGUK (Membrane Associated Guanylate Kinase) family, which mediates intracellular signaling. Only a partial cDNA was available for DLG5. To exclude potential disease-inducing mutations, it was necessary to obtain a complete cDNA and genomic sequence of the gene. Methods Northern blot analysis performed using the 3' UTR of this gene indicated a transcript size of about 7.2 kb. Using the RACE technique and library screening, the entire cDNA was cloned. The gene was evaluated by sequencing the coding region and splice junctions in normal and affected members of families with familial atrial fibrillation. Furthermore, haploid cell lines from affected patients were generated and analyzed for deletions that may have been missed by PCR. Results We identified two distinct alternatively spliced transcripts of this gene. The genomic sequence of the DLG5 gene spanned 79 kb with 32 exons, and the gene was shown to have ubiquitous human tissue expression including placenta, heart, skeletal muscle, liver and pancreas. Conclusions The entire cDNA of DLG5 was identified and sequenced, and its genomic organization was determined.


Background
Atrial fibrillation is a chaotic atrial rhythm resulting from abnormal signal generation and conduction in the atria. The molecular basis for atrial fibrillation remains unknown. It usually occurs in association with structural or metabolic abnormalities. Atrial fibrillation also occurs, however, in individuals with none of these abnormalities, in which case it is referred to as lone atrial fibrillation. Recently, we identified several multigenerational families affected with atrial fibrillation with no other underlying cause detected. This subset of familial atrial fibrillation provides an opportunity to explore a molecular basis for atrial fibrillation.
Familial atrial fibrillation segregates as an autosomal dominant disease characterized by atrial fibrillation on an electrocardiogram [1]. We diagnosed atrial fibrillation in five families from Spain in which the arrhythmia was the primary manifestation and was not associated with any gross cardiac or metabolic abnormality [1]. Genetic linkage analysis was performed and the locus responsible for atrial fibrillation in our family was mapped to 10q22 between markers D10S1786 and D10S1630, an area of about 11 cM [1]. We proceeded with the candidate gene approach, and DLG5 was one of the genes mapped to the critical region between the flanking markers [2]. This gene belongs to the MAGUK (Membrane Associated Guanylate Kinase) family of proteins, known to form scaffolds for proteins involved in intracellular signal transduction [2]. The MAGUK family of proteins has been extensively studied in recent years and shown to play a role in the formation of cell junctions, maintenance of cell shape [3][4][5], and clustering of channel proteins at the cell surface [6][7][8]. Given its location in our critical region and its function, DLG5 was selected as a candidate for atrial fibrillation. A partial cDNA sequence of DLG5 (GenBank accession #AB011155) of about 5.3 kb was available. To determine whether a mutation in the gene is responsible for the disease, it is necessary to have the complete cDNA. Furthermore, because the responsible mutation may lie in one of the intron-exon splice junctions, it is also necessary to obtain the corresponding genomic sequence. Thus, we cloned and sequenced the cDNA of the DLG5 gene and, from the complete cDNA sequence, determined the intron-exon boundaries of the gene as well as its human tissue expression profile.

Northern blot analysis
The EST ym59b11 clone (ATCC clone #409942) was confirmed by sequencing to contain part of exon 31 and exon 32 together with the 3' untranslated region of the DLG5 cDNA. It was used as a probe, labeled with 32P-dCTP using the random primer labeling kit from Gibco, and hybridized to an adult multiple-tissue RNA blot from Clontech. Prehybridization and hybridization were performed with ExpressHyb solution (Clontech) using the manufacturer's protocol. The membrane was washed once with 2X SSC/0.1% SDS for 15 minutes at room temperature and twice in 0.2X SSC/0.1% SDS for 20 minutes at 68 °C. Membranes were exposed to X-ray film (Kodak) overnight at -80 °C.

Cloning and sequencing of the cDNA for DLG5
Using the RACE technique (Clontech, Marathon Ready Heart cDNA) as per the manufacturer's protocol, products were generated using primers designed in the 5' end of the published DLG5 cDNA. Nested RACE was performed on the product, and the resulting PCR product was cloned by TA cloning (Invitrogen). The clones were plated on kanamycin IPTG plates. Colonies were picked and grown in 5 ml LB-Kan broth. Following miniprep using the Qiagen Miniprep Kit (Qiagen, Valencia, CA), the clones were sequenced on the ABI310 genetic analyzer using the Big Dye terminator chemistry. The RACE primers are: primer A 5' GCATACACTCCATTCTCCAGACTGATGC 3'; primer B (nested to primer A) 5' CACTCATGATGAGCTTGTACTCGCTGTA 3'; primer F 5' GTGTGGTAGAAGTCAGTCTCCTTGGCCA 3'; and primer E (nested to primer F) 5' GCTGAATGGAGAGGTTCTCCACCTTCTC 3'. We generated an additional 1.702 kb of cDNA for this gene, for a total cDNA length of 7.194 kb. The cDNA generated had an overlap of 305 bp with the published DLG5 cDNA (GenBank Acc. #AB011155) in the 5' end of the gene. In addition to RACE, we also used PCR-based heart library screening (Origene Technologies, Cat#1001). We isolated a clone that matched our RACE clone, which confirmed the cDNA sequence generated by the RACE technique. We used the Sequencer 3.1 program to assemble our sequences. There was considerable overlap between the 305 bp fragment of the new sequence and that of the published cDNA of DLG5. We identified an ORF with a stop codon preceding it, indicating that we had obtained the entire coding region of the gene. The complete cDNA sequence has been submitted to GenBank, Accession Number AF352034.

In silico mapping and genomic organization of DLG5
The newly generated DLG5 cDNA sequence (7.195 kb) was subjected to a search for homology to genomic sequences in GenBank using the BLASTN algorithm on Search Launcher, available at the Molecular Biology Computational Resources (MBCR) server, Baylor College of Medicine (BCM) [www.mbcr.bcm.tmc.edu]. Homologous sequences were identified on BAC 651 c23 (Acc. No. AC013252) and BAC 126h7 (Acc. No. AL391421). The Sequencer 3.1 program was employed to assemble genomic sequences from these BACs and to map the cDNA to these sequences. The genomic sequences for the gene were deposited in GenBank with Accession Number AF352033.

Analysis of patient DNA
Using the available genomic sequences, primers were designed to analyze the patient DNA. Primers were anchored in introns, and PCR products were generated from genomic DNA of patients. PCR was performed with 200 ng genomic DNA, 50 µM each deoxynucleotide triphosphate, 0.2 µM each of forward and reverse primers, and 2.5 units of Taq polymerase (Life Technologies). PCR was performed on a PE9600 or PE9700 thermocycler. Sequencing reactions were performed using fluorescently labeled Big Dye terminator (Applied Biosystems, Foster City, CA) on a PE9600 thermocycler. Each reaction was then cleaned over an Edge Biosystems column and subsequently sequenced on an ABI310 or ABI377 genetic analyzer. Sequences were assembled using the Sequencer 3.1 program.

Use of a novel technique to rule out deletion mutations
We utilized a newly available somatic cell hybrid technique from GMP Genetics, Inc. [9] to obtain haploid cell lines. In brief, this technology separates the two homologous chromosomes and isolates them in separate cell lines. Our affected patient's diploid leucocytic cell line was converted to haploid somatic cell hybrid lines. The hybrid cell lines were screened to identify those that contain the chromosome 10 homologues. We further confirmed by genotyping that the DNA of one of these haploid lines had the alleles segregating with the disease. Thus, if a mutation in the DLG5 gene is responsible for the disease, it would be in this cell line. We then performed PCR using primers anchored in the introns of DLG5 for each of the exons on both the normal haploid DNA and the disease-bearing haploid DNA. Because each haploid line carries only one chromosome 10 homologue, an exon deletion that eluded detection by PCR from diploid genomic DNA would show up as a missing amplification product in the disease-bearing line.

Results and discussion
We previously mapped the locus for familial atrial fibrillation to 10q22-24 between markers D10S1694 and D10S1786. In the NCBI database DLG5 had been mapped to 10q22. We subsequently mapped the entire cDNA onto genomic sequences from BACs RPII651C23 and RPII126H7, which also contain the marker D10S569 that lies between our flanking markers. Thus DLG5 lies within our critical region at 10q22, within 100 kb of D10S569.

Human tissue expression patterns of DLG5
The expression pattern of the DLG5 gene was analyzed in multiple adult human tissues by Northern blot analysis, using as a probe the ATCC clone #409942 (EST ym59b11), which has an insert size of 2.386 kb and contains exons 31 and 32 with the 3' untranslated region of the gene. A 24-hour exposure showed abundant expression in placenta, minimal expression in skeletal muscle, liver, kidney and pancreas, and no expression in heart (data not shown). Exposure for six days showed expression in heart, no expression in the brain and abundant expression in placenta, as shown in Figure 1. We observed a single transcript of approximately 7.2 kb; thus, the published sequence of the cDNA for the DLG5 gene was incomplete.

Cloning and sequencing of the DLG5 cDNA
On Northern blot analysis a single transcript of approximately 7.2 kb was observed. Using the RACE technique on heart cDNA and heart library screening as described above, we characterized the entire cDNA of 7.195 kb: a 5'UTR from bp 1 to 94, an ORF from bp 95 to 5547 and a 3'UTR from bp 5548 to 7195. The translation start codon of the cDNA is at bp 118 in exon 2 and the stop codon is at bp 5545 in exon 32. Among the multiple clones sequenced after RACE, two distinct forms were observed: one with a 12-nucleotide deletion at the beginning of exon 4 and one without the deletion. The deletion did not alter the reading frame of the gene.
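The coordinates reported above can be checked with simple arithmetic. The following Python sketch is an illustration only and was not part of the original analysis; all variable and function names are hypothetical. It assumes that the stop codon occupies bp 5545-5547, so that the coding span runs from the start codon at bp 118 to bp 5547, and tests whether that span and the 12-nucleotide deletion are each a whole number of codons.

```python
# Coordinates taken from the text (1-based, inclusive).
CDNA_LENGTH = 7195
START_CODON_FIRST_BP = 118   # first base of the translation start codon
STOP_CODON_FIRST_BP = 5545   # first base of the stop codon
# Assumption: the stop codon occupies bp 5545-5547, matching the reported ORF end.
STOP_CODON_LAST_BP = STOP_CODON_FIRST_BP + 2
DELETION_LENGTH = 12         # nucleotides absent from exon 4 in the shorter form


def span_length(first_bp: int, last_bp: int) -> int:
    """Length of a 1-based inclusive interval."""
    return last_bp - first_bp + 1


def is_in_frame(length: int) -> bool:
    """True when a length is a whole number of codons."""
    return length % 3 == 0


coding_length = span_length(START_CODON_FIRST_BP, STOP_CODON_LAST_BP)
three_prime_utr = span_length(STOP_CODON_LAST_BP + 1, CDNA_LENGTH)

print("coding span (bp):", coding_length)
print("codons including stop:", coding_length // 3)
print("coding span in frame:", is_in_frame(coding_length))
print("3'UTR length (bp):", three_prime_utr)
print("deletion in frame:", is_in_frame(DELETION_LENGTH))
print("amino acids removed by deletion:", DELETION_LENGTH // 3)
```

Under these assumptions the coding span is 5430 bp (1810 codons including the stop codon), and the 12-nucleotide difference corresponds to four codons, in line with the unchanged reading frame.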

Determining the intron exon boundaries of DLG5
We mapped 31 exons (2-32) of the 7.195 kb cDNA to a genomic region spanning about 79 kb. We could not map bp 1-94 (the 5'UTR) onto the genomic region. This may be because none of these BACs is completely sequenced in GenBank and gaps remain in the genomic sequence. Attempts at direct BAC walking to place these nucleotides were unsuccessful, perhaps because these regions of genomic DNA are difficult to sequence owing to their secondary structure. The details of the genomic organization of the gene are shown in Table 1 and Figure 2. All but two of the resulting intron-exon boundaries conform to the typical splice donor and acceptor motif (GT/AG). The sequences around each splice site are shown in Table 2.
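The GT/AG rule referred to above can be expressed as a simple test on intron sequences. The Python sketch below is illustrative only: the function name is hypothetical and the intron sequences are invented examples, not DLG5 data from Table 2. It flags introns whose first two bases are not GT or whose last two bases are not AG.

```python
def has_canonical_splice_sites(intron: str) -> bool:
    """Return True if an intron starts with GT (donor) and ends with AG (acceptor)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Hypothetical intron sequences for illustration only; not DLG5 data.
example_introns = {
    "intron_a": "GTAAGTCCTTTCTCTTTCAG",
    "intron_b": "gtgagtacccttttgcag",
    "intron_c": "GCAAGTCCTTTCTCTTTCAG",  # non-canonical GC donor
}

noncanonical = [name for name, seq in example_introns.items()
                if not has_canonical_splice_sites(seq)]
print("non-canonical introns:", noncanonical)
```

A check of this kind only reports departures from the consensus dinucleotides; it says nothing about whether a non-canonical junction is functional.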

Determining the functional domains in DLG5
In addition to the domains characteristic of the MAGUK family of proteins [2,10], several domains were identified in the novel cDNA using MOTIF at SEQ WEB. A single repeat unit of the Regulator of Chromosome Condensation (RCC1) domain was detected starting at bp 4333. RCC1 is a eukaryotic protein that has seven tandem repeat units of a domain of 50-60 amino acids. It binds to chromatin and interacts with Ran, a nuclear GTP-binding protein, stimulating guanine nucleotide dissociation from it. Thus, RCC1 probably plays a role as a gene regulator. Only one repeat unit is present in DLG5, which makes it difficult to predict whether DLG5 also has such a gene-regulating function.
A prenyl group-binding site (CAAX box) was also detected starting at bp 5536. A number of proteins are post-translationally modified by the attachment of either a farnesyl or a geranyl-geranyl group to a cysteine residue. The modified cysteine lies three residues from the C-terminal residue, separated from it by two aliphatic amino acids (hence the term CAAX box). Proteins carrying this modification include the Ras proteins and the Ras-like protein Rho, nuclear lamins A and B, and some G protein alpha subunits. These proteins are involved in intracellular signal transduction, so this is another mechanism by which DLG5 may be involved in intracellular signal transduction. In addition to these domains, there are two leucine zippers starting at base pairs 484 and 505, suggesting protein-protein interaction.
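The CAAX definition given above (a cysteine, two aliphatic residues, then the C-terminal residue) translates directly into a pattern match on the end of a protein sequence. The Python sketch below is a minimal illustration and is not the MOTIF search used in this study. The function name, the choice of A, V, L, I and M as the aliphatic set, and the peptide sequences are all assumptions made for the example.

```python
import re
from typing import Optional

# Assumption: "aliphatic" is taken here as A, V, L, I or M; published motif
# definitions differ in the residues they allow at these positions.
CAAX_PATTERN = re.compile(r"C[AVLIM]{2}[A-Z]$")


def find_caax_box(protein: str) -> Optional[str]:
    """Return the C-terminal CAAX tetrapeptide, or None if absent."""
    match = CAAX_PATTERN.search(protein.upper())
    return match.group(0) if match else None


# Hypothetical peptide tails for illustration only; not the DLG5 sequence.
print(find_caax_box("MKTSGQDCVLS"))   # ends in C-V-L-S
print(find_caax_box("MKTSGQDCDES"))   # acidic residues after the cysteine
print(find_caax_box("MKCVLSGQDTT"))   # motif present but not at the C-terminus
```

Anchoring the pattern to the end of the sequence reflects the requirement that the motif sit at the C-terminus; an internal C-A-A-X tetrapeptide is deliberately not reported.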

Exclusion of DLG5 as a candidate for familial atrial fibrillation in our family
The exons and flanking intronic sequences of DLG5 were amplified by PCR and sequenced in four members of the family, two affected and two normal. There were no missense or nonsense mutations segregating with the disease in either the coding regions or the splice junctions of this gene. To exclude a large deletion that might have eluded amplification from diploid DNA by PCR, sequencing was performed on the haploid cell lines, and no mutation was identified. Thus, DLG5 is excluded as a possible cause of familial atrial fibrillation in these families.
Utilizing the available partial cDNA sequence, we confirmed the localization of DLG5 to the locus for atrial fibrillation at 10q22 and subsequently cloned and sequenced the DLG5 gene, which spanned 79 kb. The available 5.5 kb cDNA sequence was extended by RACE and confirmed on Northern analysis to be about 7.2 kb.
The genomic structure of DLG5 was determined to consist of 32 exons, and the gene was shown to have ubiquitous human tissue expression including placenta, heart, skeletal muscle, liver and pancreas, with placenta being the tissue of strongest expression. Exclusion of a deletion in the haploid cell lines confirmed the absence of any mutation, and thus DLG5 was excluded as a cause of the atrial fibrillation in these families. Two distinct transcripts of DLG5 were observed, differing by only 12 bp, which would predict the addition (or deletion) of 4 amino acids but would not alter the translation frame of the gene.
The DLG5 gene encodes a protein that is a member of the MAGUK (Membrane Associated Guanylate Kinase homologs) family of proteins located in the plasma membrane [2]. The MAGUKs are a recently described family of proteins that act as molecular scaffolds for intracellular signaling pathways [11]. The MAGUK family of proteins is characterized by domains that interact with other proteins to create an assembly of large multi-protein complexes [10]. The PDZ (PSD-95, DLG and ZO-1) domains are the best characterized for their binding to the C-terminal region of channels and transmembrane proteins [11]. Usually each protein has 1-3 copies of such domains [8,12,13]. SH3 (Src homology 3) domains are present in proteins that couple transmembrane receptors to signaling molecules [14]. The presence of this domain in MAGUKs increases their possibilities of interacting with signaling pathways. The GUK domain shows a high degree of similarity to the enzyme guanylate kinase, which converts GMP to GDP using ATP as a phosphate donor. The dlg-like MAGUKs, however, do not have kinase activity [15]. Furthermore, this domain has been shown to interact with other proteins [16][17][18]. MAGUKs function by binding to transmembrane proteins at the cytoplasmic side and to other signal transduction proteins, and thus appear to provide the platform for efficient and specific signal interactions between the different components of the signaling pathway. Acting as scaffold proteins, they play a pivotal role in creating an efficient signaling pathway by localizing the signaling molecules at regions of the cell surface preferentially exposed to the ligand, and by spatially restricting the molecules to provide specific downstream responses.
The DLG5 gene encodes 2 PDZ domains, with another region in the N-terminus having very weak homology to an additional two PDZ domains. It is notable that the PDZ domains predicted for the DLG5 protein do not have the GLGF motif [2], which is present in most members of the MAGUK family. This motif helps in the binding of the MAGUK to the C-terminus of proteins that have the consensus E-(S/T)-X-(V/I) motif [19]. However, studies on the Drosophila protein InaD, which also has a PDZ domain, show that there are other targets for the PDZ domain that do not have the consensus C-terminal binding motif [20]. Thus in DLG5 the PDZ domain may bind to other targets that do not require the classic GLGF motif for interaction. Like other members of this family, DLG5 also has one SH3 and one GUK domain. In addition, it has two leucine zippers and an RCC1 domain that represent other potential sites of protein-protein interaction.
Previously reported MAGUKs have been studied for these interactions, and elegant work has shown one of their members, p55, to form a ternary complex that links the cytoskeleton and the plasma membrane to maintain proper cell shape [3]. There is also significant evidence that MAGUKs are active in protein clustering, especially of ion channels and membrane-bound receptors [6]. This interaction is supported by experiments in which Drosophila dlg mutants show loss of clustering of the Shaker channels at the neuromuscular junction [6]. Mammalian PSD-95 is a MAGUK that binds Kir4.1, an inwardly rectifying K+ channel expressed in glial cells, and clusters it at cell membranes in vitro [21]. Thus MAGUKs may regulate both the distribution and the function of transmembrane receptors and channels.

Conclusions
Given these observations and its expression in the adult human heart, we considered DLG5 to be an excellent candidate for familial atrial fibrillation and evaluated it in our family by sequencing. We have ruled out a functional mutation in the coding region and splice junctions, as well as any exon deletion, as a possible cause of the disease in this family. This gene remains an interesting candidate for other inherited cardiac diseases.