Identification of four families of yCCR4- and Mg2+-dependent endonuclease-related proteins in higher eukaryotes, and characterization of orthologs of yCCR4 with a conserved leucine-rich repeat essential for hCAF1/hPOP2 binding

Background The yeast yCCR4 factor belongs to the CCR4-NOT transcriptional regulatory complex, in which it interacts, through its leucine-rich repeat (LRR) motif, with yPOP2. Recently, yCCR4 was shown to be a component of the major cytoplasmic mRNA deadenylase complex, and to contain a fold related to the Mg2+-dependent endonuclease core. Results Here, we report the identification of nineteen yCCR4-related proteins in eukaryotes (including yeast, plants and animals), which all contain the yCCR4 endonuclease-like fold, with highly conserved CCR4-specific residues. Phylogenetic and genomic analyses show that they form four distinct families, one of which contains the yCCR4 orthologs. The orthologs in animals possess a leucine-rich repeat domain. We show, using two-hybrid and far-Western assays, that the human member binds to the human yPOP2 homologs, i.e. hCAF1 and hPOP2, in an LRR-dependent manner. Conclusions We have identified the mammalian orthologs of yCCR4 and have shown that the human member binds to the human yPOP2 homologs, thus strongly suggesting conservation of the CCR4-NOT complex from yeast to human. All members of the four identified yCCR4-related protein families show striking conservation of the endonuclease-like catalytic motifs of the yCCR4 C-terminal domain and therefore constitute a new family of potential deadenylases in mammals.


Background
The Carbon Catabolite Repressor 4 factor in Saccharomyces cerevisiae (yCCR4) regulates the expression of a number of genes involved in nonfermentative growth [1], in cell wall integrity [2], in UV sensitivity [3], and in methionine biosynthesis [4]. yCCR4 is an essential component of several complexes involved in transcription. The first is the CCR4-NOT multi-subunit group of proteins, which comprises at least two complexes of 1.0 and 1.9 MDa. The 1.0 MDa complex contains yCCR4, yPOP2 (also referred to as yCAF1) and at least five yNOT1-5 proteins [5][6][7]. In this complex, yPOP2 independently binds to yCCR4 and yNOT1, and is absolutely required for yCCR4 to associate with the 1.0 MDa complex [6,7]. In the 1.9 MDa complex, yCCR4 binds to proteins such as yDBF2, a cell-cycle regulated protein kinase [2], yCAF4 and yCAF16, and is essential for the interaction of both yCAF4 and yCAF16 with ySRB9, a component of the RNA polII holoenzyme [8]. yCCR4 was also reported to be associated with Paf1, Cdc73, and Hpr1 in an RNA polII complex distinct from the SRBP-containing holoenzyme [9]. Accordingly, yCCR4 contains, in its central region, a leucine-rich repeat (LRR) domain [10] which was demonstrated to be necessary for yCCR4 binding to the yPOP2 [5,11], yDBF2 [2], yCAF4 and yCAF16 [8] components of the CCR4-NOT complex. Moreover, its N-terminus contains two activation domains which are required for transcriptional activation [10]. Hypotheses concerning yCCR4 function in yeast have been documented recently. First, Dlakic [12] and Hofmann et al. [13] showed that the yCCR4 C-terminus contains a fold related to the Mg2+-dependent endonuclease core, suggesting that it may function as a nuclease. Second, yCCR4 was shown to be a component, in association with yPOP2, of the major yeast cytoplasmic mRNA deadenylase complex, and to be required for efficient poly(A)-specific mRNA degradation [14,15].
yCCR4 might be a catalytic subunit of this complex, because of its nuclease-like domain. Hence, yCCR4-associated complexes are likely to fulfill fundamental functions in gene regulation, and it is therefore of general interest to determine if similar complexes exist in mammals. Several components of the yCCR4-associated complexes have already been identified in humans: two homologs of yPOP2 (named hCAF1 and hPOP2/hCALIF) [16][17][18][19], and homologs of NOT proteins (named hNOT1, hNOT2, hNOT3, hNOT4) [19]. Moreover, interactions between these proteins were demonstrated [19,20]. Yet, at the present time there has been no report of a human protein that would be structurally and functionally close to yCCR4. We [21] and others [22] previously characterized genes for highly related proteins in X. laevis, M. musculus and H. sapiens (named nocturnin, "mCCR4" and "hCCR4"), displaying circadian expression [22], and with a C-terminus displaying significant similarity with the C-terminus of yCCR4 (close to 30%). Yet, these proteins lack the N-terminal region of yCCR4 (i.e. aa 1 to 505), which contains the two activation domains and the leucine-rich repeat region necessary for protein-protein interactions. To identify orthologs of yCCR4 in higher eukaryotes and unravel the phylogenetic relationships between yCCR4 and the previously identified genes, we made a systematic search for other proteins related to yCCR4. Here we report the identification of nineteen yCCR4-like proteins in eukaryotes and show, by phylogenetic and genomic analyses, that they are grouped into four distinct families, one of which contains the yCCR4 orthologs. These orthologs, in animals, have conserved the yCCR4 leucine-rich repeat, and we show, using two-hybrid assays and far-Western experiments, that the human protein binds to the human yPOP2 homologs, i.e. hCAF1 and hPOP2, in an LRR-dependent manner.
Amino acid alignments show that the endonuclease-like catalytic motifs of the yCCR4 C-terminal domain are strictly conserved among all yCCR4-related proteins and identify CCR4-specific residues in this domain. The results therefore strongly suggest the existence of a conserved CCR4-NOT complex in human cells, and of a new family of potential deadenylases in mammals.

Characterization and phylogeny of yCCR4-related proteins
yCCR4-related proteins were identified by i) searches in databases using the yCCR4 C-terminal domain (aa 505-837) conserved in the three previously identified yCCR4-like proteins [21,22], and ii) PCR-amplification of cDNA libraries or reverse-transcribed RNAs using appropriate primers followed by sequencing, when required (Table 1). The majority of matching hits obtained through BLAST search in eukaryotic databases belong to Arabidopsis thaliana (plants), Saccharomyces cerevisiae (fungi), and Caenorhabditis elegans, Drosophila melanogaster, Mus musculus and Homo sapiens (animals/metazoa), as expected from the abundance of these sequences in present-day databases. Sequences obtained from bacterial databases were not included in the phylogenetic analysis because of their poor lod-score, but were considered in subsequent analyses. After compilation of the positive hits and, for several of them, PCR amplification of the corresponding cDNAs and sequencing, the open reading frame of each of the 19 matching proteins was determined (Fig. 1B, Table 1). A comparative analysis of the full-length sequence of all yCCR4-related proteins indicates that they all share, as expected from the search strategy, a highly conserved yCCR4-related C-terminus (hatched box in Fig. 1B), but, conversely, their N-terminal regions are highly divergent (except for the LRR-containing group of proteins, see below). A multiple sequence alignment performed with the C-terminal regions (see Fig. 2) allowed the establishment of phylogenetic relationships among all proteins (see Materials and Methods). As illustrated in Fig. 1A, the tree constructed using the neighbor-joining method (similar results were obtained using the parsimony method, not shown) indeed shows that the proteins can be classified into four major groups, with significant bootstrap values, which can therefore be considered as distinct protein families.

Figure 1
[...] Table 1 for protein names and sequences. The four protein families which can be defined from the phylogenetic analyses are indicated on the right. (B) Structure of the yCCR4-related proteins. The C-terminal region conserved in all yCCR4-related proteins is represented by a hatched box. The highly conserved leucine-rich repeat (LRR) domain, found for animal members of the CCR4 family, is indicated by a black box. The yCCR4 activation domains, previously delineated in [5], are represented by grey boxes. The amino acid length of the proteins is indicated. (C) Intron/exon structure of the coding regions of the yCCR4-related genes. Sequences shown are those described in (B) for which genomic organization could be determined. Accession numbers or references for genomic sequences are listed in Table 1. Introns are represented by a line and exons by boxes with coding regions in grey. Sequences are drawn to scale, except large introns which are interrupted (but size indicated), introns of undetermined size (question mark in two human genes), and the non-coding terminal exonic sequences (open boxes). Homologous intronic positions within each family are delineated, as well as the 5' boundaries (C-term) of the C-terminal regions, which have been aligned. The leucine-rich repeat (LRR) domains and the primers used to characterize the cDNAs of d.nocturnin, dCCR4 and ceCCR4 are indicated.

Figure 2
Multiple amino acid sequence alignment of the C-terminal region of 23 yCCR4-related proteins, extended to three Mg2+-dependent endonucleases, i.e. HAP1 (human; accession number M80261), exoIII (E. coli; accession number AAC74819) and DNAseI (bovine; accession number M60606). Only blocks containing more than three positions where residues display at least 60% similarity (calculated from a distance BLOSUM62 matrix, SQX program, Infobiogen) are shown, with the remainder of the sequence indicated by residue numbers in parentheses. Positions with strict identity are shaded dark grey, and those with >60% similarity light grey. For the other aligned proteins (HAP1, exoIII, and DNAseI), residues identical to those strictly conserved in the yCCR4-related proteins are also shaded dark grey. Residues conserved in the yCCR4-related proteins but not found in the presently aligned endonucleases or other Mg2+-dependent endonuclease-fold containing proteins (not shown) are marked with an asterisk. The secondary structure elements (α-helix and β-strand) for HAP1 and/or exoIII (see Fig. 4 in [31] and Fig. 1 in [30]) are indicated under the alignment. Important residues involved in the Mg2+-dependent endonuclease activity are indicated as follows: empty circles for phosphate binding residues, filled circles for Mg2+ binding residues, triangles for catalytic residues, and squares for residues involved in orientation and/or stabilization of the catalytic residues. Amino acid positions above the alignment are for h.nocturnin.

The formerly identified "mCCR4", "hCCR4" [21] and nocturnin [22] proteins are grouped within the same family (called hereafter the nocturnin family, with the "mCCR4" and "hCCR4" proteins re-named m.nocturnin and h.nocturnin, and the xenopus nocturnin protein, xe.nocturnin). BLAST searches identified a putative drosophila ortholog from genomic scaffold AE003635.
PCR amplification of drosophila cDNA libraries, using primers (indicated in Fig. 1C) designed from the predicted open reading frame, allowed us to provide the entire coding sequence of this gene (named d.nocturnin; assigned GenBank accession number AY043266; Table 1). Human, murine, drosophila and xenopus nocturnin proteins display significant levels of similarity in their C-terminal region (50-95% among themselves and 24-27% with the yCCR4 C-terminus) (Fig. 1B), but not in their N-terminal part. No ortholog of the nocturnin genes could be found in the C. elegans genome (although completely sequenced), thus strongly suggesting that there is no such gene in this species. Besides proteins from animals, two yeast predicted proteins (yml118 and ymr285; Table 1) with significant similarity with the yCCR4 C-terminus (23 and 24%, respectively) also fell within the nocturnin family. Finally, two arabidopsis predicted proteins (At1g31500 and At1g31530) were found to be close to the nocturnin family members (not shown), but lacked the most 5' part of the C-terminal region, and were therefore excluded.
The second family, hereafter named the 3635 family, contains as yet unknown proteins from C. elegans, D. melanogaster and H. sapiens (Table 1). The caenorhabditis protein (ce3635) corresponds to the CAB07271 predicted protein. The drosophila protein (d3635) corresponds to a putative protein that we predicted from genomic scaffold AE003635 (nt 118477 to 120285; see Additional File 1). The human protein (h3635) was predicted from the human genomic sequence AC016923, the human ESTs AI369813, AW967801, AW968624, AW965496, H06019, and by comparison with the murine EST BE634658 for determination of the putative initiation codon (see Additional File 2). It should be noted that although some mouse ESTs do match in BLAST searches, suggesting the existence of a murine 3635 protein, its C-terminal region could not be predicted in its entirety because of the lack of corresponding overlapping ESTs or genomic sequences. Again, 3635 proteins display a conserved C-terminal region (32-41% similarity among themselves and 25-29% similarity with the yCCR4 C-terminus) but differ in their N-terminal region.
The third family of yCCR4-related proteins (called hereafter the angel family) contains the formerly identified angel drosophila gene product, with as yet no identified function ([26], accession number X85743), as well as M. musculus, H. sapiens, C. elegans and A. thaliana proteins (Table 1). The murine protein (m.angel) was determined from the assembly of several murine overlapping matching ESTs (AI151868, BF318959, AB041602, AA647420; see Additional File 3). The human protein (h.angel) was predicted from a human full-length cDNA (accession number AL079275) and the corresponding genomic sequence (accession number AC025707) (see Additional File 4). The caenorhabditis protein (ce.angel) derives from the AAC17686 putative protein, predicted from cosmid AF067946, with a modification for a misassignment of a 3' exon (see Additional File 5). The arabidopsis protein (ara3g18500) corresponds to the At3g18500 predicted protein. Once more, the human, murine, drosophila, caenorhabditis and arabidopsis angel proteins have a highly conserved C-terminal region (33-88% similarity among themselves and 25-33% similarity with the yCCR4 C-terminus), but differ in their N-terminus.

Figure 3
Alignment of the leucine-rich repeats of the human, murine, drosophila, caenorhabditis and yeast CCR4 family proteins. The consensus shown above is that of the "S. cerevisiae adenylate cyclase" class of 23-amino acid long, proline-containing leucine-rich repeats, which includes the yCCR4 LRR [28]. Conserved or aliphatic residues matching the consensus sequence are boxed or circled, respectively; a denotes an aliphatic residue. Consensus by position: P X X a X X a X X L X X L X L s X N X a X X a.

The fourth family (hereafter named the CCR4 family) contains the formerly identified yeast CCR4 protein and as yet undefined human, murine, drosophila, caenorhabditis and arabidopsis proteins. The human protein (named hCCR4) is putatively encoded by the KIAA1194 full-length cDNA ([27], accession number AB033020). The cDNA encoding the murine protein (named mCCR4) was isolated by us using RT-PCR and primers derived from matching EST clones, and its open reading frame was entirely sequenced (assigned GenBank accession number AY043269; Table 1). Aside from their C-terminal regions, which display the highest similarity with the yCCR4 C-terminus (35-41%), the human and murine proteins display, in their N-terminus, a leucine-rich repeat (LRR) region very similar to that of the yeast CCR4 (Fig. 3 and see below). The drosophila and caenorhabditis proteins were characterized from matching genomic clones (accession number AE003746 and Z68753, respectively) in which we identified a similar LRR domain 20 kb and 4 kb, respectively, upstream of the exons for the yCCR4-like C-terminus. PCR and sequencing using cDNA libraries and primers that mapped upstream of the LRRs and in the C-terminal region (primers indicated in Fig. 1C) demonstrated that the caenorhabditis and drosophila putative proteins (hereafter named ceCCR4 and dCCR4, with assigned GenBank accession numbers AY043268 and AY043267, respectively; Table 1) also possess, in their N-terminal region, an LRR motif related to that of the yCCR4 protein (Figs. 1B and 3). Alignment of the leucine-rich repeats (Fig. 3) of the human, murine, drosophila, caenorhabditis and yeast proteins disclosed high conservation of the residues previously shown to define the 23-amino acid-long LRRs, including those of yCCR4 and of the adenylyl cyclase (PxxaxxaxxLxxLxLsxNxaxxa, [28]), thus suggesting that the function of this motif has been conserved from yeast to mammals. Besides proteins from animals, two arabidopsis predicted proteins (ara3g58580 and ara3g58560; Table 1) also branch, with significant bootstrap values, with the CCR4 family proteins. Interestingly, although their C-terminal regions display significant similarity with the yCCR4 C-terminus (37%), neither of these two proteins possesses an LRR domain (see Fig. 1B and Discussion). Finally, it should be noted that the proteins of the CCR4 family are all significantly smaller than the yeast member, all of them lacking the two activation domains (aa 1 to 350) located in the yCCR4 N-terminal region (see Discussion).
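The consensus quoted above (PxxaxxaxxLxxLxLsxNxaxxa) can be turned into a simple scoring rule, which may help readers who want to scan a sequence for candidate repeats. The following is only an illustrative sketch and not the procedure used in this study: the set of residues counted as aliphatic (L, I, V, M, A), the treatment of "s" as unconstrained (the text does not define it), the function names and the toy segment are all assumptions.

```python
# Consensus quoted in the text for the 23-residue repeat.
CONSENSUS = "PxxaxxaxxLxxLxLsxNxaxxa"

# Assumption: "a" (aliphatic) is read as L, I, V, M or A. The text does not
# define "s", so it is treated like "x" (unconstrained).
ALIPHATIC = frozenset("LIVMA")
UNCONSTRAINED = frozenset("xs")


def score_repeat(segment, consensus=CONSENSUS):
    """Return (matches, constrained) for one segment of consensus length."""
    if len(segment) != len(consensus):
        raise ValueError("segment must be as long as the consensus")
    matches = 0
    constrained = 0
    for residue, symbol in zip(segment.upper(), consensus):
        if symbol in UNCONSTRAINED:
            continue
        constrained += 1
        if symbol == "a":
            matches += residue in ALIPHATIC
        else:
            matches += residue == symbol
    return matches, constrained


def find_repeats(sequence, min_matches=8, consensus=CONSENSUS):
    """List (start, matches) for every window reaching min_matches.

    start is a 0-based offset into sequence.
    """
    width = len(consensus)
    hits = []
    for start in range(len(sequence) - width + 1):
        matches, _ = score_repeat(sequence[start:start + width], consensus)
        if matches >= min_matches:
            hits.append((start, matches))
    return hits


# Invented 23-residue segment that fits every constrained position.
toy_repeat = "PEEIGKLTNLRELDLSGNQLTSL"
print(score_repeat(toy_repeat))
print(find_repeats("GG" + toy_repeat + "GG", min_matches=9))
```

A position-by-position count of this kind is deliberately crude; it ignores gaps and repeat-length variation, so it is best read as a way of making the consensus concrete rather than as a detection method.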
Finally, BLAST searches identified a S. cerevisiae putative protein, yo1042 (accession number NP_014600), and an A. thaliana predicted protein (ara1g02270; accession number At1g02270) which disclose similarity with the C-terminal part of the yCCR4 protein (29% and 25%, respectively), but which do not cluster in an unambiguous manner with any protein from the four families (Fig. 1A). It should be noted that all the A. thaliana, S. cerevisiae, D. melanogaster, C. elegans and H. sapiens genes disclosing similarities with yCCR4 have most probably been identified, since the genomes of these five species have now been entirely sequenced and since a BLAST search using the N-terminal (instead of the C-terminal) sequence of yCCR4 (aa 1-350) did not reveal any significant similarity with any other protein from the databases.

Genomic organization of the yCCR4-related gene families
Phylogenetic relationships between the yCCR4-related proteins were also analysed through the characterization of the genomic organization of the corresponding genes (Fig. 1C). The intron/exon boundaries were determined upon alignment of the coding regions of cDNA sequences with the corresponding genomic sequences available in genomic databases (Table 1). For all families, the genomic organization of several members of the family could be compared. In all cases, the human locus was significantly longer than the drosophila and/or caenorhabditis locus, as a result of an increase in intron number and/or size. The four families do not behave similarly when considering the evolution of the number of exons, at least for the animal genes. In the CCR4, angel and 3635 families, the drosophila genes contain 5, 1 and 1 exons, respectively, whereas the human genes contain 11, 8 and 3 exons, consistent with a gain of introns in evolution. Conversely, a loss of introns can be suspected for the nocturnin family, for which the drosophila and mammalian genes disclose 6 and 3 exons, respectively. In all families, yeast genes are devoid of introns and arabidopsis genes display the most exons. Although the number of exons is clearly not constant within a given family, intron positions are in several cases conserved (see Fig. 1C), except for the 3635 family and for arabidopsis genes, which do not share any intron position with their family members. In the CCR4 family, 4/4 introns of the drosophila gene and 3/7 introns of the caenorhabditis gene have positions homologous to those of the human gene. In the nocturnin and angel families, the drosophila and human genes share one intron position. Conversely, paralogous genes in a given species (i.e. genes belonging to different families) do not have any intron position in common.
These features suggest (at least for the nocturnin, CCR4 and angel families) that proteins which cluster in the same family indeed share a common ancestor, in agreement with the phylogenetic data. They also strongly suggest that introns were independently acquired in the four families. This could be easily accounted for by assuming that the four families were derived by gene duplication from an "ancestral" gene (or genes) devoid of introns at the time of duplication.
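The comparison of intron positions described above amounts to asking how many introns of one gene fall at a position that is also interrupted in another gene, once both coding sequences are placed on a common alignment. A minimal sketch of that count is given below. The coordinates are invented for illustration (they are not taken from Fig. 1C), the function name is hypothetical, and intron phase is ignored; only the intron numbers (4 in the drosophila gene, 7 in the caenorhabditis gene, 10 in the human gene) and the reported ratios follow the text.

```python
def shared_introns(query_positions, reference_positions):
    """Return (shared, total) for the introns of the query gene.

    Positions are columns of a common protein alignment at which the
    coding sequence is interrupted by an intron.
    """
    reference = set(reference_positions)
    shared = sum(1 for position in query_positions if position in reference)
    return shared, len(query_positions)


# Hypothetical alignment columns, chosen only to reproduce the reported
# ratios for the CCR4 family (4/4 and 3/7 relative to the human gene).
human = [55, 120, 180, 250, 310, 365, 400, 450, 500, 540]
drosophila = [120, 250, 310, 400]
caenorhabditis = [55, 90, 180, 230, 365, 420, 515]

print(shared_introns(drosophila, human))
print(shared_introns(caenorhabditis, human))
```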

Conservation, among the yCCR4-related proteins, of a Mg2+-dependent endonuclease-like domain, with CCR4-specific residues
Dlakic [12] and Hofmann et al. [13] recently reported that the xe.nocturnin and yCCR4 proteins contain a domain with similarities with the enzymatic core of Mg2+-dependent endonucleases. Therefore, we investigated whether such sequence similarities also exist for the other yCCR4-related proteins with, possibly, conserved features specific for these proteins. The multiple sequence alignment of all yCCR4-related C-terminal regions (Fig. 2) allowed the identification of several blocks of local homology, with more than 60% similarity among yCCR4-related proteins, as well as residues strictly conserved among these proteins (see below). This alignment could be extended to the enzymatic core of three characteristic Mg2+-dependent endonucleases: the bovine DNAseI [29] and two apurinic/apyrimidinic (AP) DNA-repair endonucleases, i.e. the E. coli Exonuclease III [30] and the human HAP1 protein [31]. These nucleases share the same catalytic residues and form a similar four-layered α/β-sandwich motif. Interestingly, the blocks of homology defined for the yCCR4-related proteins all correspond to secondary-structure elements (β-strand or α-helix) which form the core of AP endonucleases (Fig. 2). In addition, the putative active residues shared by both AP endonucleases and DNAseI [29][30][31][32] are strictly conserved in all the yCCR4-related proteins: these are residues involved in catalysis (indicated by triangles in Fig. 2), Mg2+ binding (indicated by black circles), orientation or stabilization of catalytic residues (indicated by squares), or interaction with the phosphate group (indicated by empty circles). Hence, all the yCCR4-related proteins possess, as observed for yCCR4, a C-terminal Mg2+-dependent endonuclease-like domain. Alignment of the yCCR4-related proteins with the Mg2+-dependent endonucleases discloses another important feature: several residues (Leu 151, Arg 174 and Glu 300, indicated with an asterisk in Fig. 2) can be identified, which are strictly invariant among all the yCCR4-related proteins, but have no equivalent in the presently aligned AP-endonucleases and DNAseI, or in other Mg2+-dependent endonuclease fold-containing proteins previously described (not shown) [12,13]. These CCR4-specific amino acids are most probably relevant to functions common to and specific for all yCCR4-related proteins (see Discussion).
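The criterion used above for CCR4-specific residues (strictly invariant among the yCCR4-related proteins, yet absent from the aligned endonucleases) can be stated as a column-wise test on a multiple alignment. The sketch below illustrates that test under simplifying assumptions: the alignment rows are short invented strings rather than the sequences of Fig. 2, gaps are written as "-", and both function names are hypothetical.

```python
def invariant_columns(rows):
    """Map column index -> residue for columns identical in every row."""
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned rows must have the same length")
    invariant = {}
    for index in range(length):
        residues = {row[index] for row in rows}
        # A column qualifies only if it holds one residue and no gap.
        if len(residues) == 1 and "-" not in residues:
            invariant[index] = rows[0][index]
    return invariant


def family_specific_columns(family_rows, outgroup_rows):
    """Columns invariant in the family whose residue no outgroup row carries."""
    length = len(family_rows[0])
    if any(len(row) != length for row in outgroup_rows):
        raise ValueError("outgroup rows must match the family alignment length")
    return {
        index: residue
        for index, residue in invariant_columns(family_rows).items()
        if all(row[index] != residue for row in outgroup_rows)
    }


# Invented toy alignment: three "family" rows and two "outgroup" rows.
family = ["MLEDR", "VLEDR", "ILENR"]
outgroup = ["AAEDK", "GSE-H"]

print(invariant_columns(family))
print(family_specific_columns(family, outgroup))
```

In this toy case the glutamate column is invariant in the family but shared with the outgroup, so it would correspond to a residue common to the whole fold, whereas the leucine and arginine columns would be flagged as family-specific.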

Interaction of the human homolog of yCCR4 with hCAF1 and hPOP2/CALIF
The search for yCCR4-related proteins identifies possible orthologs of yCCR4, according to the sequence analyses reported above. To confirm these data and gain some hints for the existence of a CCR4-NOT complex in mammals, we investigated whether the interaction between yCCR4 and yPOP2 in the CCR4-NOT complex is conserved for the human ortholog, hCCR4. Two homologs of yPOP2 have been identified in mammals: the mouse/human CAF1 [11,16,17,33] and, more recently, the human homolog named hPOP2 or hCALIF [18,19]. Both mCAF1 [11] and hPOP2 [19] interact with yCCR4, and the yCCR4 LRR is essential for these interactions. To determine whether yCCR4-yPOP2 interactions are evolutionarily conserved, we examined whether hCCR4 can bind hCAF1 and hPOP2 in a two-hybrid assay in mammalian cells. The complete hCCR4 cDNA was obtained (KIAA1194 cDNA, Kazusa DNA Research Institute), and vectors were constructed to express hCCR4 fused to the DNA binding domain of the yeast GAL4 transcription factor (pGal4-hCCR4) (Fig. 4A). A vector expressing hCAF1 (pVP16-hCAF1) or hPOP2 (pVP16-hPOP2) fused to the VP16 activation domain was also produced (Fig. 4A). Two-hybrid protein-protein interaction assays were performed in HeLa cells, with a reporter plasmid (pG4-TK-Luc) containing six GAL4 binding elements upstream of the thymidine kinase (TK) promoter region fused to the luciferase reporter gene (Fig. 4A). In HeLa cells transfected with the fusion constructs and the reporter plasmid, an interaction between GAL4 and VP16 fusion proteins should result in an increase in luciferase expression (Fig. 4A). As observed in Fig. 4B, co-expression of pGal4-hCCR4 with either pVP16-hCAF1 or pVP16-hPOP2 elicited a significant increase (>35-fold) in pG4-TK-Luc expression, thus demonstrating association between hCCR4 and either hCAF1 or hPOP2.
This increase is dependent on the presence of the leucine-rich repeats within hCCR4, since deletion of four out of the five leucine-rich repeats (within pGal4-hCCR4(∆LRR), see Materials and Methods) completely abolished the effect (Fig. 4). Finally, a GAL4 fusion vector expressing a murine paralog of hCCR4, i.e. m.nocturnin, had no effect when co-expressed with either pVP16-hCAF1 or pVP16-hPOP2 (Fig. 4), thus indicating that m.nocturnin does not interact with hCAF1 or hPOP2, as expected from the absence of an LRR domain in this protein. As a control, pGal4-hCCR4 alone, or in association with the pVP16-none plasmid, had no effect.
To verify that hCCR4 can bind directly to hCAF1, we performed a far-Western blot analysis (Fig. 5). Bacterially produced, purified GST-CAF1 and control GST proteins were subjected to SDS-PAGE, transferred onto a membrane and probed with a 35S-methionine-labeled hCCR4 protein synthesized in vitro. As shown in Fig. 5, specific hybridization was observed only with GST-CAF1, and not with GST. As a control, incubation with a 35S-methionine-labeled luciferase protein failed to show any interaction.

Figure 4
Two-hybrid assay for in vivo interaction of hCCR4 with hCAF1 and hCALIF/hPOP2. (A) Schematic representation of the reporter and fusion protein expression vectors, and rationale of the two-hybrid assay. Coding sequences of hCCR4, hCCR4 deleted for the LRR (hCCR4(∆LRR)) and m.nocturnin are inserted in-frame with the GAL4 binding domain. The hCAF1 and hPOP2 coding sequences (or no sequence in the pVP16-none vector) are inserted in-frame with the VP16 transactivation domain. The pG4-TK-Luc reporter plasmid contains six GAL4 binding elements, upstream of the thymidine kinase minimal promoter (TK) region fused to the luciferase reporter gene (Luc). An interaction between the GAL4 and VP16 fusion proteins should result in an increase in luciferase expression. (B) hCCR4 interacts with hCAF1 and hPOP2, in an LRR-dependent manner. HeLa cells were cotransfected with a combination of expression vectors, as indicated by crosses. Luciferase activities were measured two days post transfection (see Materials and Methods) and are expressed as the fold increase over the luciferase activity of the reporter vector alone. Bars indicate standard deviation of the mean for at least three independent transfections.
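The fold-increase values described in this legend can be illustrated with a short calculation. This is a sketch under stated assumptions rather than the analysis actually performed: it assumes that each reading is divided by the mean reading of the reporter vector alone and that the spread is the sample standard deviation across transfections. The function name and all readings are invented.

```python
from statistics import mean, stdev


def fold_increase(sample_readings, reporter_alone_readings):
    """Return (mean fold increase, sample standard deviation).

    Each sample reading is divided by the mean luciferase reading obtained
    with the reporter vector alone.
    """
    if len(sample_readings) < 2:
        raise ValueError("at least two transfections are needed for a deviation")
    baseline = mean(reporter_alone_readings)
    if baseline <= 0:
        raise ValueError("reporter-alone activity must be positive")
    folds = [reading / baseline for reading in sample_readings]
    return mean(folds), stdev(folds)


# Invented luciferase readings (arbitrary units), three transfections each.
reporter_alone = [100.0, 100.0, 100.0]
bait_plus_prey = [3500.0, 4000.0, 4500.0]

average, spread = fold_increase(bait_plus_prey, reporter_alone)
print(average, spread)
```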

Identification of LRR-containing orthologs of yCCR4
In the course of our search for genes encoding proteins displaying significant similarity with the yeast yCCR4 factor, the best hits were found for cDNAs encoding a family of proteins from C. elegans, D. melanogaster, M. musculus, and H. sapiens, that we further characterized by both in silico and in vivo approaches. By several criteria, these proteins appear to be yCCR4 orthologs. First, they cluster together with yCCR4 in a phylogenetic tree constructed for the yCCR4-related proteins. Likewise, analysis of their genomic organization revealed that the caenorhabditis, drosophila and human CCR4 genes had several introns in common, located at homologous positions, suggesting that they arose from a common ancestor. Second, their N-termini contain a domain of five leucine-rich repeats, disclosing a high level of similarity with that of yCCR4, and with strictly conserved amino acids at consensus positions. Third, the ability of yCCR4 to bind mCAF1 and hPOP2 is conserved for the mammalian protein, since we could demonstrate, using two-hybrid assays and far-Western blot analyses, an in vivo and in vitro interaction between hCCR4 and hCAF1 or hPOP2/CALIF. Moreover we could show, using a deletion within the leucine-rich repeat domain, that this motif is essential for the interaction, a key feature of the previously characterized yCCR4 factor.
Interestingly, the putative hCCR4, mCCR4, dCCR4 and ceCCR4 proteins are much smaller than yCCR4. In fact, they contain the central LRR plus C-terminal region of yCCR4, but they lack its N-terminus (residues 1-350). It is noteworthy that, in yCCR4, these two parts of the protein (1-350 and 350-837) have well-defined and distinct functions, which can be uncoupled. The N-terminal region contains two transactivation domains, one of which is glucose-repressed (residues 1-160). This region can, by itself, activate transcription when targeted to the DNA via a heterologous DNA-binding domain [5]. On the other hand, the LRR plus C-terminal region of yCCR4 cannot mediate transcription activation, but is involved in protein-protein interactions [5]. The yCCR4 LRR is necessary for the binding of several components of the CCR4-NOT complex, including yPOP2 (and mCAF1) [11], yDBF2 [2], yCAF4 and yCAF16 [8], and deletion of the yCCR4 N-terminus does not impair these associations. Consistently, the region encompassing the LRR and part of the C-terminus (residues 302-668) has been defined as the smallest region necessary and sufficient for the association of yCCR4 with yPOP2 [11].
Conservation of the yCCR4 LRR plus C-terminus, together with the absence of its N-terminus in the CCR4 proteins of animals, could be interpreted along two lines. First, the N-terminal domain might mediate yeast-specific functions (e.g. glucose-regulated growth), not conserved in animals because they were unnecessary or redundant in these species. Second, the N- and C-terminal domains observed in the yCCR4 multifunctional protein might have been split into two (or more) independent proteins in animals. Although neither interpretation can as yet be favoured, it is noteworthy that yCCR4 evolution parallels that of its partner in the CCR4-NOT complex, namely yPOP2. Indeed, both human homologs of yPOP2, hCAF1 and hPOP2, lack the 148 N-terminal amino acids which are, in yPOP2, required for transcriptional activation, whereas the yPOP2 C-terminus, which is involved in the interaction with yCCR4, is conserved in hCAF1 [11] and hPOP2 [19]. Similarly, hNOT2, the human homolog of the yNOT2 component of the CCR4-NOT complex, harbors an N-terminal domain divergent from that of yNOT2 (required for transcriptional activation and interaction with the yeast ADA2 factor), but has conserved the C-terminal domain which is absolutely required for the yCCR4-associated function of yNOT2 [19,20]. Therefore, it looks as if genes of the CCR4-NOT complex have undergone a "concerted" phylogenetic evolution, with the yeast genes encoding multifunctional proteins and the animal genes only encoding "specialized" proteins specific for the CCR4-NOT complex. It is tempting to speculate that partition of the CCR4 or POP2 functions in the course of evolution has been positively selected as resulting in a gain of efficacy. Along this line, Draper et al. [5] observed that the two transactivation domains present in the yCCR4 N-terminus (residues 1 to 350) are, alone, more potent activators than within the full-length yCCR4 protein.
Likewise, mCAF1, which has lost the N-terminal transactivation domain of yPOP2, binds to yCCR4 with a higher affinity than yPOP2 [11].
In yeast, yCCR4 is one of the components of the 1.0 MDa CCR4-NOT complex, which also includes yPOP2 and at least five yNOT proteins. In this complex, yPOP2 independently binds to yCCR4 and to yNOT1. A previous report [19] identified the human orthologs of yPOP2 (hPOP2/CALIF) and yNOT1 (hNOT1), and showed that they have conserved their ability to bind to each other. Hence, the present identification of the human ortholog of yCCR4 and the demonstration that it binds to hPOP2 now provide an additional piece of the puzzle, strongly suggesting the existence of a CCR4-NOT complex in humans. Additionally, yCCR4 interacts, via its LRR domain, with three other proteins (i.e. yDBF2, yCAF4 and yCAF16) of the 1.9 MDa CCR4-NOT complex. Conservation of the LRR domain in hCCR4 suggests that the interaction with these proteins might be conserved as well in humans. Interestingly, we found that hCCR4 also binds to the second human yPOP2 paralog, hCAF1. However, at variance with hPOP2, hCAF1 is not able to interact with hNOT1 (nor with hNOT3) [19]. This suggests that human cells may contain at least two complexes involving a CCR4/POP2 association: a CCR4-NOT-POP2 complex analogous to the yeast CCR4-NOT complex, and a still unknown CCR4-CAF1-containing complex. Interestingly, in yeast only one CCR4-POP2 complex has been described until now, whereas CCR4/POP2 interactions have been found associated with both transcription regulation and mRNA deadenylase activity. This may suggest a "physical" partition of these two CCR4-POP2-associated functions in the course of evolution.

Four families of yCCR4-related genes: function and origin?
Recently, Dlakic [12] and Hofmann et al [13] showed that the yCCR4 C-terminus contains a fold related to the Mg2+-dependent endonuclease core. yCCR4 was also shown to be a component of the major yeast cytoplasmic mRNA deadenylase complex, and to be required for efficient poly(A)-specific mRNA degradation [14,15]. Interestingly, this yCCR4-dependent activity copurifies with yPOP2. Since the presently characterized mammalian yCCR4 ortholog has conserved the ability to bind yPOP2 homologs and displays high sequence similarity with yCCR4 in the nuclease core domain, the mammalian yCCR4 orthologs might also be associated with a deadenylase function. The three other characterized families of yCCR4-related proteins have also conserved catalytic residues and structural elements of the Mg2+-dependent nuclease core. In addition, their nuclease regions possess residues which are strictly invariant among all the yCCR4-related proteins, from plants to humans, and which are not conserved in any other Mg2+-dependent endonuclease-containing protein, thus strengthening the notion of a common and specific function for the yCCR4-related proteins. Hence, according to the present data, proteins of the angel, 3635 or nocturnin families may serve the same putative function as yCCR4, i.e. deadenylation, or at least share common substrate binding and/or catalytic specificities (for instance preference for RNA and/or poly(A) tail). Along this line, CCR4-specific conserved residues may be important for setting such a specificity, whereas the presence or absence of an LRR domain may determine different associations for these proteins, possibly responsible for differences in cellular localization or in enzymatic activity. Until now, the only known deadenylase in vertebrates is a poly(A)-specific exoribonuclease, PARN, a member of the RNaseD family of exonucleases [34]. The present results therefore propose a series of new putative mammalian deadenylases, and predict a set of residues that should be functionally important for all yCCR4-related proteins.
Finally, the high level of similarity among the C-terminal regions of all paralogous yCCR4-related proteins, along with the divergence of their N-terminal domains, is intriguing in terms of phylogenetic evolution. The strong conservation of the C-terminus leads to the assumption that all the related genes derive by duplication events from a common ancestor, but the identity of this ancestral gene and the date of expansion can only be hypothesized. The occurrence within the CCR4 family of LRR-containing genes both in yeast and animals suggests that an LRR-containing gene existed before the divergence of yeast and animals. Whether such an LRR-containing gene could be the common ancestor for all the yCCR4-related genes (with deletion of the LRR and several duplication events for the angel, 3635 and nocturnin families) is not known. However, the fact that the two Arabidopsis proteins which cluster with the LRR-containing proteins in the CCR4 branch (Fig. 1A) do not possess an LRR domain argues against the existence of an LRR-containing ancestor: in that case, at least two independent LRR deletion events (versus one LRR acquisition event, see below) would be required to generate the angel, 3635 and nocturnin families on the one hand, and the CCR4 Arabidopsis proteins on the other hand. Along this line, BLAST searches in bacterial databases disclosed, in one incompletely sequenced bacterial genome (Paenibacillus azotofixans), an open reading frame (nt 3822 to 4583 in the AJ299453 genomic clone) which covers almost all of, and displays significant similarity with, the yCCR4 C-terminal region, but contains neither an LRR nor an N-terminal domain. It is therefore plausible that a "C-terminal core domain" existed first, which was subsequently duplicated several times, giving rise to the four identified protein families, and which has been well conserved because it served a common enzymatic function, in contrast to the divergent N-terminal regions. An LRR domain would have been acquired in one of the duplicates, prior to the divergence of animals and fungi.

Conclusions
The present characterization, in higher eukaryotes, of proteins related to the yeast yCCR4 factor leads to two main conclusions. Firstly, we identified yCCR4 mammalian orthologs and demonstrated that the human member can interact with the human yPOP2 homologs. This result, along with the identification by others of human homologs of NOT proteins, now provides strong evidence for the existence of a human CCR4-NOT complex, with conserved protein-protein interactions. Secondly, we showed that all members of the four yCCR4-related protein families that we have characterized contain the canonical catalytic residues and motifs of the Mg2+-dependent endonuclease core, as well as strictly conserved yCCR4-specific residues. These proteins might constitute a new family of deadenylases in mammals, for which critical residues can be predicted.

Sequence analyses
DNA and amino-acid sequences were examined for homology with the non-redundant nucleotide, protein, and dbEST databases at the NCBI, using the BLAST 2 program. Multiple alignments of amino-acid sequences were carried out using the CLUSTALW algorithm [23]. The aligned sequences were further analysed, either by the distance method (Neighbor-Joining program) or by the parsimony method (Protpars program, not shown). In the former case, trees were constructed using different parameters, i.e. with or without correction for multiple substitutions, and with or without exclusion of the positions with gaps. The trees obtained under these various conditions were closely related, disclosing only limited variations in the branching pattern of the identified phylogenetic families.
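The two distance-method options mentioned above (correction for multiple substitutions, exclusion of gapped positions) can be illustrated with a minimal Python sketch. It is not the program used in this study: the Poisson correction d = -ln(1 - p) is assumed here as one common correction, gap exclusion is implemented as removal of every alignment column containing a gap, and a gap is treated as an ordinary character state when columns are not excluded. The function names and the toy alignment are hypothetical.

```python
import math

GAP = "-"

# Hypothetical toy alignment, for illustration only (not data from this study).
TOY_ALIGNMENT = {
    "seqA": "MKLV-TA",
    "seqB": "MKIVQTA",
    "seqC": "MRLV-SA",
}


def drop_gap_columns(alignment):
    """Remove every column in which at least one sequence carries a gap."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("aligned sequences must all have the same length")
    keep = [i for i in range(length)
            if all(alignment[name][i] != GAP for name in names)]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


def p_distance(seq_a, seq_b):
    """Proportion of aligned positions at which the two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def poisson_correction(p):
    """Assumed correction for multiple substitutions: d = -ln(1 - p)."""
    if p >= 1.0:
        raise ValueError("correction is undefined for p >= 1")
    return -math.log(1.0 - p)


def distance_table(alignment, exclude_gaps=True, correct=True):
    """Pairwise distances under the chosen gap and correction settings."""
    if exclude_gaps:
        alignment = drop_gap_columns(alignment)
    names = list(alignment)
    table = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            p = p_distance(alignment[first], alignment[second])
            table[(first, second)] = poisson_correction(p) if correct else p
    return table
```

Calling `distance_table(TOY_ALIGNMENT, exclude_gaps, correct)` for the four combinations of settings gives four distance tables, each of which could be passed to a neighbor-joining routine to compare the resulting branching patterns.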

Mammalian Expression Vectors
The mammalian expression vectors for the GAL4 and VP16 fusions are derivatives of the SV40 promoter-driven expression vector pSG5 (Stratagene). The GAL4 fusion plasmids were obtained by subcloning the appropriate cDNAs into the pGaL4PolyII plasmid [24], in-frame with the yeast GAL4 binding domain coding sequence. To generate pGaL4-hCCR4, a fragment corresponding to the entire hCCR4 coding sequence (with the exception of the initiation codon) was obtained upon digestion of the KIAA1194 cDNA (accession number AB033020, gift from the Kazusa Research Institute; putative start codon, Met 19, positioned by homology with mCCR4) by SphI and MscI (nt 423 to 2172 in KIAA1194), and was inserted in-frame into the pGaL4PolyII vector opened by XhoI, after Klenow treatment of both vector and insert. Construction of the deleted fusion protein in which four out of the five hCCR4 leucine-rich repeats are removed (named pGal4-hCCR4(∆LRR)) was performed in three steps. The KIAA1194 cDNA was first 3'-truncated upon restriction with BsmI (nt 3642) and SacII (in pBluescript II SK+ vector), Klenow treatment and self-ligation, to remove undesirable EcoNI restriction sites. Deletion of the LRR domain from position 547 to 838 was then achieved upon digestion by HindIII (position 547 and 638) and EcoNI (position 817 and 838), followed by Klenow treatment and self-ligation. Finally, the deleted open reading frame-containing fragment was excised using SphI and MscI, and inserted into the pGaL4PolyII vector opened by XhoI, after Klenow treatment of both vector and insert. To generate pGal4m.nocturnin, a fragment from the m.nocturnin cDNA [21] corresponding to the entire coding sequence (with 9 amino acids upstream of the initiation codon) was excised from pGEX-3X upon SmaI and DraI digestion, and was inserted into the pGaL4PolyII vector opened by BamHI, after Klenow treatment of both vector and insert. Expression of the recombinant proteins was checked by in vitro transcription-translation assays (Promega). The pVP16-hCAF1 and pVP16-hPOP2 constructs were obtained upon insertion of the hCAF1 and hPOP2 coding regions into the pSG5FNV vector, in-frame with the VP16 transactivation domain coding sequence, as previously described [17,25].

Transfections and luciferase assays
The plasmids used for transfection were prepared by the alkaline/PEG/LiCl method. HeLa cells were grown in DMEM (Gibco/BRL) supplemented with 10% fetal calf serum, seeded at 10 cells/well in 96-well microtiter plates and transfected 8 hours later using Exgen 500 (Euromedex, Souffelweyersheim, France). The DNA for transfection included 100 ng of pG4-TK-Luc reporter plasmid together with 50 ng of GAL4 and/or VP16 fusion vectors, and 10 ng of pCMV-RL vector (cytomegalovirus promoter fused to the Renilla luciferase reporter gene [Promega]) as an internal control for transfection efficiency. The amount of SV40 promoter DNA was kept constant by adding, when necessary, pSG5 vector to the transfection mixture. The cells were washed and collected 48 h after transfection. Luciferase activity was measured in cell lysates using the Dual Luciferase Kit (Promega) following the manufacturer's instructions. In all experiments, luciferase activities were normalized to the Renilla luciferase activity expressed from the pCMV-RL vector. Each set of experiments was performed in quadruplicate and was repeated at least three times.
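The normalization described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis actually performed: each well is assumed to give one firefly and one Renilla reading, the per-well ratios are averaged over the replicate wells, and interactions are expressed as fold activation over a reference condition (for instance the GAL4 fusion alone), a convention that the text does not specify. The function names and readings are hypothetical.

```python
from statistics import mean


def normalized_ratios(firefly, renilla):
    """Per-well firefly luciferase activity divided by Renilla activity."""
    if len(firefly) != len(renilla) or not firefly:
        raise ValueError("need one firefly and one Renilla reading per well")
    if any(r <= 0 for r in renilla):
        raise ValueError("Renilla readings must be positive")
    return [f / r for f, r in zip(firefly, renilla)]


def fold_activation(sample, reference):
    """Mean normalized activity of a sample relative to a reference.

    Both arguments are (firefly_readings, renilla_readings) tuples holding
    one value per replicate well. Using a reference condition is an
    assumption made for this sketch.
    """
    sample_mean = mean(normalized_ratios(*sample))
    reference_mean = mean(normalized_ratios(*reference))
    return sample_mean / reference_mean


# Hypothetical quadruplicate readings, for illustration only.
REFERENCE_WELLS = ([100.0, 120.0, 110.0, 90.0], [50.0, 60.0, 55.0, 45.0])
SAMPLE_WELLS = ([800.0, 900.0, 1000.0, 700.0], [40.0, 45.0, 50.0, 35.0])
```

Dividing by the Renilla signal within each well, before averaging, corrects for well-to-well differences in transfection efficiency rather than only for the average efficiency of a condition.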

Far-Western analyses
For in vitro protein-protein interaction assays, 5 µg of either GST-CAF1 or GST purified proteins (prepared as described in [25]) were subjected to 10% SDS/PAGE and transferred onto a polyvinylidene difluoride membrane (Millipore) by electroblotting.