Comparative study of methyl-CpG-binding domain proteins

Background Methylation at CpG dinucleotides in genomic DNA is a fundamental epigenetic mechanism of gene expression control in vertebrates. Proteins with a methyl-CpG-binding domain (MBD) can bind to single methylated CpGs and most of them are involved in transcription control. So far, five vertebrate MBD proteins have been described as MBD family members: MBD1, MBD2, MBD3, MBD4 and MECP2. Results We performed database searches for new proteins containing an MBD and identified six amino acid sequences that differ from the previously described ones. Here we present a comparison of their MBD sequences, additional protein motifs and the expression of the encoding genes. A calculated unrooted dendrogram indicates the existence of at least four different groups of MBDs within these proteins. Two of these polypeptides, KIAA1461 and KIAA1887, were only present as predicted amino acid sequences based on a partial human cDNA. We investigated their expression by Northern blot analysis and found transcripts of ~8 kb and ~5 kb, respectively, in all eight normal tissues studied. Conclusions Eleven polypeptides with an MBD could be identified in mouse and man. The analysis of protein domains suggests a role in transcriptional regulation for most of them. The knowledge of additional existing MBD proteins and their expression pattern is important in the context of Rett syndrome.


Background
Methylation at CpG dinucleotides in genomic DNA is a fundamental epigenetic mechanism of gene expression control in vertebrates [1][2][3]. Strong evidence exists for a correlation between DNA hypermethylation, hypoacetylation of histones, tightly packed chromatin, and transcriptional repression. Effects of DNA methylation are mediated through proteins that bind to symmetrically methylated CpGs. Such proteins contain a specific domain, the methyl-CpG-binding domain (MBD), which consists of ~70 residues in an α/β-sandwich fold built of three to four β-twisted sheets and a helix with a characteristic hairpin loop in the opposite layer [4][5][6][7]. Recently, a transcriptional repressor protein, Kaiso, lacking an MBD, has also been shown to bind to methylated CpG dinucleotides. This binding is mediated through zinc finger motifs [8,9]. Members of the MBD protein family are found in different animal species. So far, five vertebrate MBD proteins have been identified as members of the MBD protein family: MBD1, MBD2, MBD3, MBD4 and MECP2 (for review, see: [6,10]). Except for MBD4, all of them are associated with histone deacetylases (HDAC), and a transcriptional repression mechanism mediated by the recruitment of HDACs has been shown for MECP2, MBD1 and MBD2 [11][12][13].
One of these five MBD proteins, MECP2, is implicated in a human neurological disorder called Rett syndrome [14,15]. Symptoms of this syndrome are mental retardation, loss of speech and purposeful hand use, autism, ataxia, and stereotypic hand movements. The similar phenotype of conditional Mecp2 knockout mice and in vitro studies of functional consequences of MECP2 mutations indicate that the disorder is due to a loss of MECP2 function in the nervous system [16][17][18][19][20][21]. It remains unknown why these patients present with a neurological phenotype, although MECP2 is ubiquitously expressed. It has been proposed that MECP2 is complemented by other MBD proteins in non-neural tissues, and this hypothesis was tested for MBD2 by crossing Mecp2 knockout mice with Mbd2 knockout mice [19]. However, no evidence for functional redundancy of these two genes could be found in this way.
Here we report two new polypeptide sequences with an MBD as well as four MBD proteins in man and mouse that had not been mentioned as MBD protein family members to date. Analysis of their amino acid sequences revealed additional domains associated with chromatin and points to a function in transcription control.

Human MBD proteins
We used a bioinformatics approach with the MBD of human MECP2 as query sequence to search for new members of the MBD protein family. Initial standard BLAST searches of the NCBI, Celera and SwissProt databases returned only the five MBD proteins (MECP2, MBD1, MBD2, MBD3 and MBD4) that had previously been described and studied intensively. However, the search of protein domain family databases (NCBI, Pfam, Smart and Prosite) revealed similarities to the MBD of MECP2 for four additional proteins, i.e. BAZ2A/TIP5, BAZ2B, CLLD8 and SETDB1, and for two cDNAs, KIAA1461 and KIAA1887. These databases use Hidden Markov Models (HMM) to detect motifs in amino acid sequences. So far, an MBD has been described in CLLD8, SETDB1 and BAZ2A/TIP5 [22][23][24].
Nine of the eleven MBD-containing protein sequences could also be detected by screening the Sequence Similarity DataBase (SSDB) [25] at GenomeNet. The cDNAs for KIAA1461 and KIAA1887 were not found in the KEGG database that underlies the SSDB. These results are summarized in Tab. 1. MBD amino acid sequences of the five previously published and the six newly described human polypeptides were aligned (Fig. 1). A sequence logo derived from the alignment of all eleven sequences is shown in Fig. 2. These analyses point to a small number of highly conserved and apparently essential amino acids within the domain. Identical amino acids are present at three positions, and conservative substitutions are found at five further positions.
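The two quantities used above, per-position information in bits (as plotted in a sequence logo) and the count of fully conserved columns, can be sketched in a few lines. This is a minimal illustration, not the tool used for Fig. 2: the function names and the toy alignment are hypothetical, gaps are simply ignored, and the small-sample correction that logo programs usually apply is omitted.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column, ignoring gaps.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    observed residue frequencies. No small-sample correction is applied.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def conserved_columns(alignment):
    """Indices of gap-free columns in which all sequences share one residue."""
    return [
        i
        for i, col in enumerate(zip(*alignment))
        if "-" not in col and len(set(col)) == 1
    ]


# Toy alignment of hypothetical sequences (NOT the real MBD alignment).
toy = ["RKSGD", "RKAGE", "RR-GD"]
info = [column_information(col) for col in zip(*toy)]
identical = conserved_columns(toy)
```

A fully conserved column reaches the maximum of log2(20) ≈ 4.32 bits for a 20-letter amino acid alphabet, which is why such positions appear as the tallest stacks in a logo.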
A phylogenetic tree of the MBD amino acid sequences of all eleven polypeptides is shown in Fig. 3. Four major MBD subsets are indicated there. The MBDs of the originally described proteins (MBD1, MBD2, MBD3, MBD4 and MECP2) form one group, alongside a second (BAZ2A/TIP5 and BAZ2B) and a third subset (CLLD8 and SETDB1), which are joined by a very short branch. KIAA1461 and KIAA1887 appear in a fourth branch. MBDs of the original five proteins are more similar to each other than to the novel ones, which explains why BLAST analyses with the MECP2 MBD query failed to identify the second, third or fourth class.

Figure 1
Alignment of the MBD amino acid sequences of the eleven human polypeptides, displayed with ClustalX [50]. Residue conservation above each column indicates: "*" completely conserved; ":" favored substitutions; "." weakly favored substitutions. A quality graph is depicted below the alignment.
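
The observation that some MBDs are more similar to each other than to others can be made concrete with a pairwise distance matrix computed from an alignment. The sketch below assumes the simplest possible measure, the fraction of differing residues over ungapped columns (p-distance); it does not reproduce the distance correction or the neighbor-joining step that ClustalW applies when building a tree, and the sequence names and residues are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns shared by the two sequences")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Pairwise p-distances for a dict mapping name -> aligned sequence."""
    return {
        (m, n): p_distance(aligned[m], aligned[n])
        for m, n in combinations(sorted(aligned), 2)
    }


# Hypothetical toy sequences (NOT real MBD sequences).
toy = {"seqA": "RKSGDW", "seqB": "RKSGEW", "seqC": "RR-ADW"}
dist = distance_matrix(toy)
```

Here seqA and seqB would fall into one group and seqC into another, which mirrors in miniature how a query from one subset can fail to retrieve members of a more distant subset in a pairwise search.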

Domain analysis
An analysis of the amino acid sequences revealed that the MBD was the only domain shared by all eleven sequences.

Figure 2
Sequence logo of the eleven human MBD sequences. The height of the letters corresponds to the frequency of the amino acid at its position. The size of each stack stands for the information present at this position, measured in bits. Top letters represent the consensus sequence. Grey bars indicate gaps in some of the aligned sequences.

In the case of human MBD3 and SETDB1, the MBD has been shown to mediate protein-protein interactions [23,31]. Xenopus MBD3 is exceptional in that it binds methylated CpG, which can be explained by a difference in an amino acid residue within the MBD (Lys30) that is important for DNA binding [31]. It remains to be determined whether the MBDs of BAZ2B, CLLD8, KIAA1461 and KIAA1887 mediate DNA binding or protein-protein interactions. Additional domains found in seven of the eleven polypeptides indicate that they are associated with chromatin and function in epigenetic mechanisms of gene regulation. Some of the proteins are already known to be involved in transcriptional repression, and the domains of the remainder strongly suggest a comparable function.
MECP2 recruits the Sin3A co-repressor complex and MBD2 the NuRD co-repressor complex, which itself contains MBD3. Both complexes contain HDACs, and MBD1 is also associated with HDAC activity although the identity of the deacetylase remains unknown [13]. Within the C-terminal part of MECP2, a histidine and proline-rich region is present which is conserved in certain neural-specific transcription factors [32].
BAZ2A/TIP5 is part of the nucleolar remodeling complex (NoRC), which represses rDNA transcription by recruiting histone methyltransferases, HDACs and DNA methyltransferases. BAZ2B has a domain structure similar to that of BAZ2A/TIP5; both contain a DDT domain (DNA binding homeobox and Different Transcription factors) and a tandem PHD-bromodomain. The PHD domain is a C4HC3 zinc-finger-like motif. The bromodomain consists of 110 amino acids and is found in many chromatin-associated proteins that can interact specifically with acetylated lysine. Tandem PHD-bromodomains have been found in several transcriptional co-repressors [34]. The DDT domain is exclusively associated with nuclear domains in other proteins and was found in different transcription and chromatin remodeling factors [35]. An AT_hook motif (which allows binding to the minor groove of AT-rich DNA regions) was found in BAZ2A/TIP5 but not in BAZ2B.
The SET domain is a signature motif for lysine-specific histone methyltransferases [37,38]. This domain is also present in CLLD8 to which no function has yet been assigned. In the predicted protein sequence of KIAA1887, only a proline-rich extension (http://www.ebi.ac.uk/interpro) but no protein motif as such could be found.
The co-existence of MBDs and domains involved in chromatin modification, present in many of the identified polypeptides, could also point to a connection between the latter mechanism and methylated DNA. Interestingly, a very recent study [41] has shown that Mecp2 is associated with a H3-K9 methyltransferase activity, indicating a link between DNA methylation and histone methylation.

MBD proteins in other species
In the mouse, homologues were found for all human MBD proteins. DNA methylation as a mechanism of gene expression regulation also exists in plants. In our database searches we detected plant MBD proteins as well. The Pfam database contains polypeptides from Arabidopsis thaliana and Triticum aestivum. BLAST analyses revealed additional proteins in Zea mays, Hordeum vulgare and Lycopersicon esculentum. Pfam contains entries for MBD-containing proteins from species ranging from plants through C. elegans to mouse and human.

Expression patterns of MBD genes
Expression analyses had been carried out previously for all genes of the mouse/human MBD family except for KIAA1461 and KIAA1887 (only the abundance of KIAA1887 ESTs in different tissues has been reported [45]). The results of published Northern blot experiments are summarized in Tab. 3. Since expression levels of MBD4 were too low to be detected by Northern blots, only results of RT-PCR studies in three tissues are shown. However, the presence of MBD4 EST sequences from numerous tissues points to a ubiquitous expression (http://www.ncbi.nlm.nih.gov/UniGene).
We performed Northern blot analyses for KIAA1461 and KIAA1887. Strong signals of ~8 kb were detected for KIAA1461 in skeletal muscle, heart, pancreas, kidney and placenta. A faint band could be detected in brain, lung and liver. For KIAA1887, a strong band of ~5 kb was present in heart, kidney, liver, skeletal muscle, placenta and pancreas; weaker signals could be seen for brain and lung tissue (Fig. 4).
Taken together, MBD1, MBD2, MBD3 and MECP2 as well as SETDB1, CLLD8, BAZ2A, KIAA1461 and KIAA1887 show a broad tissue distribution. The expression of BAZ2B is more restricted according to Northern blot results [46]. It is of note that CLLD8, BAZ2A, KIAA1461 and KIAA1887 show a very low expression in brain.

Conclusions
In this study we present four additional proteins and two cDNAs with a methyl-CpG-binding domain in mouse and man. Transcripts of SETDB1, BAZ2A, CLLD8, KIAA1461 and KIAA1887 are found in all adult tissues studied. Among the six proteins described here as new members of the MBD protein family, CLLD8, BAZ2B, KIAA1461 and KIAA1887 show a low expression in brain. MECP2 is found preferentially in mature neurons of the brain [47,48], and the cell types that express CLLD8, BAZ2A/TIP5, SETDB1 as well as the predicted KIAA1461 and KIAA1887 in brain are not known.
Rett syndrome is caused by mutations in MECP2. Even though MECP2 is ubiquitously expressed, the phenotype of the syndrome is restricted to the brain. This could be explained by a greater need of long-lived, non-dividing neuronal cells for a special chromatin state that involves MECP2 and tightly suppresses transcription of undesired genes. Another explanation would be that the loss of function of MECP2 in non-neural tissues is compensated by another protein with similar properties. MBD2 has been studied in this respect, but no evidence for a genetic interaction of the two genes was found by combining Mbd2- and Mecp2-null mice [19]. Based on gene expression studies of Mecp2 knockout mice and biochemical evidence, it has been suggested that the essential function of Mecp2 in the brain might not be transcriptional [49]. In view of this aspect and the protein-protein interaction property of the methyl-CpG-binding domain of human MBD3 and SETDB1, functional compensation would not necessarily require a DNA binding property.
Almost all presented polypeptides are known or predicted to be involved in mechanisms of gene expression regulation. In order to understand the higher-order interplay of MBD proteins and associated complexes, it will be a major task to identify interacting proteins as well as regulated targets of all components. This will help to answer the question of whether some of the polypeptides can functionally complement MECP2 in tissues other than the brain. Homologues of KIAA1461, CLLD8 and KIAA1887 were identified in the ENSEMBL database.

Alignment and Phylogeny
Alignments and phylogenetic trees were computed using the ClustalW program at GenomeNet (http://clustalw.genome.ad.jp/) with standard settings, and the ClustalX program (version 1.8) [50] was used for graphical representation.

Table 3 (footnotes)
[22] Expression in additional tissues has been reported.
d) [46] Expression in additional tissues has been reported.
e) [52] SETDB1 was originally called KIAA0067.
f) Northern blot results of this study.
g) [45] and Northern blot results of this study.

RNA was isolated using the Trizol reagent (Invitrogen, Karlsruhe, Germany). A reverse transcriptase reaction was performed in the presence of 50 ng/µl oligo(dT) and 2 mM dA/C/G/TTP with 10 U/µl SSII reverse transcriptase (Invitrogen, Karlsruhe, Germany). The resulting cDNA was purified with the QIAquick PCR purification kit (QIAGEN, Hilden, Germany). cDNA (200 ng) was used in subsequent 50-µl PCR amplifications with 10 pmol gene-specific primers. Standard PCR conditions and respective annealing temperatures were used.
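
For readers who want the PCR quantities above as final concentrations, the conversion is simple arithmetic. The sketch below assumes that "10 pmol" refers to the amount of each primer per 50-µl reaction, which the text does not state explicitly; the function names are hypothetical.

```python
def molar_concentration_uM(amount_pmol, volume_ul):
    """Final concentration in µM; 1 pmol per µl equals 1 µmol per l."""
    return amount_pmol / volume_ul


def mass_concentration_ng_per_ul(mass_ng, volume_ul):
    """Final mass concentration in ng/µl."""
    return mass_ng / volume_ul


# Assumption: 10 pmol of each primer in a 50-µl reaction.
primer_uM = molar_concentration_uM(10, 50)

# 200 ng of cDNA template in the same 50-µl reaction.
template_ng_per_ul = mass_concentration_ng_per_ul(200, 50)
```

Under that assumption each primer is present at 0.2 µM and the template at 4 ng/µl.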