Fungal CSL transcription factors
© Martin et al. 2007
Received: 12 February 2007
Accepted: 13 July 2007
Published: 13 July 2007
The CSL (CBF1/RBP-Jκ/Suppressor of Hairless/LAG-1) transcription factor family members are well-known components of the transmembrane receptor Notch signaling pathway, which plays a critical role in metazoan development. They function as context-dependent activators or repressors of transcription of their responsive genes, the promoters of which harbor the GTG(G/A)GAA consensus elements. Recently, several studies described Notch-independent activities of the CSL proteins.
We have identified putative CSL genes in several fungal species, showing that this family is not confined to metazoans. We have analyzed their sequence conservation and identified the presence of well-defined domains typical of genuine CSL proteins. Furthermore, we have shown that the candidate fungal protein sequences contain highly conserved regions known to be required for sequence-specific DNA binding in their metazoan counterparts. The phylogenetic analysis of the newly identified fungal CSL proteins revealed the existence of two distinct classes, both of which are present in all the species studied.
Our findings support the evolutionary origin of the CSL transcription factor family in the last common ancestor of fungi and metazoans. We hypothesize that the ancestral CSL function involved DNA binding and Notch-independent regulation of transcription and that this function may still be shared, to a certain degree, by the present CSL family members from both fungi and metazoans.
The CSL (CBF1/RBP-Jκ/Suppressor of Hairless/LAG-1) proteins compose a family of transcription factors essential for metazoan development [1, 2]. They are present in all metazoan genomes studied and show remarkable sequence conservation across phylogeny. They localize predominantly or exclusively in the cell nucleus, where they can either repress or activate transcription depending on the context and the presence of various coregulators. CSL proteins recognize a very tightly defined consensus sequence, GTG(G/A)GAA, in target promoters. Their best characterized function relates to the signaling pathway of the transmembrane receptor Notch, where they mediate the effector nuclear step - activation of Notch-responsive genes. The Notch pathway regulates metazoan embryonic development, cell fate decisions and the specification of tissue boundaries [2, 3]. Its deregulation is implicated in several diseases including cancer and, in addition, several viruses encode factors that misuse this pathway via interaction with CSL proteins.
CSL proteins are essential for the development of the organism as a whole; however, they are dispensable at the cellular level, because CSL knock-out cell lines can be established and do not show any obvious abnormalities. The mutant phenotypes of Notch and CSL genes do not fully overlap, as CSL mutants show more severe developmental perturbations [2, 6]. Recently, several studies reported Notch-independent activities of CSL proteins, indicative of their involvement in yet other signaling pathways [7–10]. In addition to the Notch pathway-dependent CSL proteins of the RBP-Jκ type, at least some metazoan species also have CSL transcription factors called RBP-L, which are only beginning to be characterized. They are highly similar to the RBP-Jκ group but seem to act exclusively in a Notch-independent manner. Unlike the ubiquitous RBP-Jκ type proteins, the expression of RBP-L is confined to only a few tissue types [11, 12].
In contrast to the generally accepted view, the presence of CSL proteins seems not to be confined to metazoan organisms and the Notch pathway. They are indeed absent from plants, but there were indications of CSL proteins in one fungal species - the fission yeast Schizosaccharomyces pombe. We have attempted to confirm the identity of CSL proteins in S. pombe and to further explore the distribution of this transcription factor family in fungi. We have documented the existence of fungal CSL proteins, which indicates that this family originated much earlier in evolution than previously appreciated. We hope that these findings will help to elucidate the ancestral function of the CSL family in cells and to better understand its complex engagements in metazoans.
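The GTG(G/A)GAA consensus quoted above is compact enough to be scanned for directly. The following is a minimal sketch in Python, not a method used in this study: it looks for the consensus on both strands of a DNA string. The function names and the example sequence are hypothetical and serve illustration only.

```python
import re

# CSL consensus as given in the text: GTG(G/A)GAA
CSL_CONSENSUS = re.compile(r"GTG[GA]GAA")

# Translation table for complementing unambiguous DNA bases
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_csl_sites(seq):
    """Return sorted (start, strand, site) tuples for consensus matches.

    `start` is the 0-based position of the leftmost matched base on the
    forward strand, regardless of which strand carries the motif.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for m in CSL_CONSENSUS.finditer(seq):
        hits.append((m.start(), "+", m.group()))
    for m in CSL_CONSENSUS.finditer(reverse_complement(seq)):
        # Map reverse-strand coordinates back onto the forward strand
        hits.append((n - m.end(), "-", m.group()))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical promoter fragment with one site on each strand
    for hit in find_csl_sites("AAGTGGGAATTTTCTCACCC"):
        print(hit)
```

A plain pattern match like this only flags candidate elements; whether a given site is bound in vivo cannot be inferred from sequence alone.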
Fungal CSL proteins
[Table of fungal CSL candidates not recovered; surviving entries: Supercontig 4 (bases 1104530–1107169); Supercontig 5 (bases 726033–727712); Scaffold 6 Contig 19 (bases 50978–54385); protein id "6518"]
Most of the candidates are hypothetical proteins with little or no annotation in the databases. Therefore, we have first verified the quality of each ORF prediction (see Methods). The confidence of exon-intron structure predictions in these less studied organisms is rather limited. Another obstacle is posed by the degree of divergence among the sequences together with the presence of multiple species- and protein-specific insertions. Nevertheless, we were able to construct three completely new gene predictions (designated SjCSL1 and SjCSL2 in S. japonicus, and PcCSL2 in P. chrysosporium) as well as to identify mispredictions and/or possible sequencing errors in four other genes (see Additional files 1 and 2 for a more detailed description). Our corrections consisted of intron inclusion/exclusion, different splice-site selection and exon addition. Some of the intron positions displayed inter-species conservation, which supported our predictions (data not shown). We have also identified a less usual intron with a GC-AG boundary in the R. oryzae RO3G_07636.1 gene. Such introns were found in other fungi as well and are generally a problem for gene prediction algorithms.
Typically, there are two CSL paralogs per genome, differing considerably in length and each belonging to a different class (see below). A notable exception is the genome of R. oryzae, which harbors seven CSL genes, three of them belonging to class F1 and four to class F2. Most candidate CSL proteins are predicted to be nuclear, which supports their putative functioning as transcription factors (see below). SPCC736.08 of S. pombe is the only protein predicted to have exclusively non-nuclear subcellular localization, but it was shown experimentally to be nuclear.
According to the C. elegans LAG-1 protein crystal structure, the CSL fold is related to Rel-domain proteins, but is uniquely composed of three distinct domains. The amino-terminal RHR-N (Rel-homology region) and central BTD (beta-trefoil domain) domains are involved in DNA binding. BTD also serves as an interaction platform for Notch/SMRT coregulators. The carboxy-terminal RHR-C domain displays lower conservation in metazoans and its function is not yet clear; one possibility is its participation in Notch-independent regulation of transcription.
To the best of our knowledge, there have been only two brief mentions of the existence of CSL proteins outside metazoans up to now. One paper showed Southern blot cross-hybridization of a murine RBP-Jκ cDNA probe with S. pombe DNA. The significance of these results is, however, questionable, as the hybridizing chromosomal DNA fragments had lengths differing from those expected for either of the S. pombe CSL genes, SPCC736.08 and SPCC1223.13. Potential CSL homologs in S. pombe were also mentioned in the review of Lai, although no supporting evidence was presented.
We have rigorously searched for CSL proteins in eukaryotic genomes from all kingdoms of life to map their distribution. Apart from the known metazoan proteins, we have found no homologs in either plants or protozoa (data not shown). However, we have succeeded in finding CSL family members in several fungal species of the ascomycetes (the basal subphylum Taphrinomycotina), zygomycetes and basidiomycetes groups. These organisms range in complexity from the simple unicellular fission yeast to the macroscopic, multicellular and highly differentiated C. cinereus. It is of note that the presence of CSL homologs in fungi is not universal, as no representatives were found in either of the later branching ascomycetal groups: Saccharomycotina, which includes the important model organism S. cerevisiae, and Pezizomycotina. Our data support the idea that the ancestral CSL gene originated in the last common ancestor of animals and fungi, thus much earlier than previously assumed. This is in accord with the absence of the CSL family in such large groups as plants and mycetozoa, which branched off earlier in evolution [25, 26]. We hypothesize that the first CSL gene might have been created from a Rel-type transcription factor gene by the insertion of a beta-trefoil domain-encoding DNA sequence in between the amino- and carboxy-terminal Rel domains. Subsequently, a duplication event took place in the fungal lineage, creating the two CSL classes we see today, with class F2 being more similar to the metazoan CSL proteins and class F1 being more fungi-specific (see Fig. 4). We consider this explanation more likely than the alternative, in which the ancestral CSL gene would both originate and undergo duplication in the common ancestor of metazoans and fungi, and one copy would soon be lost again in the metazoan lineage.
Nevertheless, there have been independent losses of CSL genes in the fungal branch. First, we failed to find any CSL homologs in Encephalitozoon cuniculi (data not shown), a parasitic microsporidian and a representative of a group that is sister to fungi. This is probably due to the parasitic lifestyle of these organisms, which often leads to pronounced gene elimination. Second, we have found no evidence of CSL genes in chytridiomycetes (data not shown), a likely polyphyletic group also basal to the fungal lineage. Finally, the CSL family is apparently missing in the later branching ascomycetal fungi of the Saccharomycotina and Pezizomycotina groups, suggestive of one or more further gene losses. The losses may have occurred during the transitions between saprophytic and parasitic nutritional modes, indicating that the CSL genes code for functions in fungi that are not universally required in their life cycles. On the other hand, there have been clade-specific multiplications of CSL genes in fungi, illustrated by the three class F1 and four class F2 CSL genes of Rhizopus oryzae. Evolutionary pressure could have favored proliferation and diversification of the CSL family in this branch of zygomycetes, similarly to the expansions that were documented for other gene families and phyla, e.g., nuclear hormone receptors in nematodes or calmodulin-type proteins in dictyostelids [28, 29]. A history of gene losses and duplications in the fungal lineage has also been described for proteins involved in various RNA silencing phenomena. The metazoan CSL genes (class M) evidently underwent duplication too. It likely occurred in the common ancestor of all vertebrates and gave rise to the RBP-L type of proteins, in addition to the RBP-Jκ type universally present in both vertebrate and invertebrate animals.
It should be noted in this regard that the RBP-L type gene is present in zebrafish, but so far no homologs have been reported in the genetically rather complicated clawed frog Xenopus laevis. We have also failed to identify an RBP-L homolog in the more tractable species X. tropicalis; thus amphibians have likely developed ways to regulate all their CSL-responsive genes using the RBP-Jκ homolog only. In summary, we have found representatives of the important transcription factor family CSL, up to now generally considered metazoan-only, in several groups of fungi and have shown that they are an ancient gene family that originated much earlier than their current metazoan affiliates like Notch or Mastermind.
The degree of conservation of CSL proteins across phylogeny is remarkable, given the evolutionary distances, and points to an important role they likely play in cells. The sequence similarity among metazoan CSL proteins is extremely high and does not allow for finding functionally important regions directly from sequence comparison. On the other hand, the distant CSL homologs from fungi may provide this information more readily. Indeed, we have found that the most prominent conservation occurs in the regions involved in DNA binding, with the critical residues and several motifs being invariant in all proteins analyzed (see Fig. 2 and 3). As expected, when compared to metazoans, the rate of divergence has been much faster in fungi, especially in those having small genomes, i.e. C. neoformans, S. pombe and S. japonicus [31–33]. In fact, the C. neoformans CSL proteins are the most divergent ones among fungi and their position in our phylogenetic tree (Fig. 4) differs from that expected from the fungal tree of life. Such discrepancy has also been reported for other C. neoformans proteins, and it has been demonstrated for S. pombe that various types of proteins might produce inconsistent signals when used for phylogenetic analyses.
There are numerous insertions separating the above-mentioned conserved sequence stretches, but these insertions are often rich in amino acids that are likely to appear in loops and solvent-exposed regions. In addition, such insertions are present, to a lesser degree, also in the C. elegans LAG-1, the most evolutionarily primitive CSL protein studied so far. It may be argued that the fungal insertions could be an artifact produced by ORF misprediction. We cannot rule out this possibility completely, as the tools for identifying exon-intron boundaries optimized for diverse fungal species are limited or lacking. However, many of these insertions are conserved among the classes of CSL proteins and their positions mostly correspond to the LAG-1 loops and regions exposed on the surface of the protein. Thus the general CSL fold may be well preserved in fungi.
Furthermore, the splicing pattern of some fungal CSL genes is partially conserved among species (data not shown) and the ORF predictions used in this study are in good agreement with the multiple sequence alignment of the proteins they encode. Nevertheless, the prediction reliability of the non-conserved amino-terminal extensions found in some fungal CSL proteins remains questionable. The parts of the fungal proteins corresponding to known coregulator interaction sites in metazoans do not appear to show significant sequence similarity. This is not surprising, as these coregulators are frequently involved in the Notch signaling pathway, which is lacking in fungi, or are encoded by mammalian viruses [5, 13]. Also, the less-conserved metazoan RHR-C domain of as yet unknown function is very loosely defined in fungi, as it was identified with confidence only in several class F2 members. Taken together, our data suggest that the fungal CSL proteins may adopt the CSL fold, and we further show that these proteins possess notably conserved regions of functional significance related mostly to their ability to bind DNA in a sequence-specific manner.
Our current knowledge of the CSL family derives exclusively from metazoan model organisms and is based mostly on studies concerning development and the Notch pathway [2, 9, 13]. It is now clear that this is not the whole picture, as we have presented evidence of CSL proteins in several organisms that are evolutionarily distant from animals and lack the critical Notch pathway components. Moreover, recent reports on metazoan model organisms indicate that there are as yet unrecognized CSL activities in animals as well [7, 8, 10, 11]. It is tempting to speculate that the CSL ancestral function is preserved in the fungal proteins of today and maybe even in metazoans, where it might be responsible for some of the Notch-independent activities observed. If this is the case, we would have excellent models, e.g., the genetically tractable fission yeast S. pombe, to study it.
We hypothesize that the ancestral function is likely the regulation of gene expression in response to signals other than Notch receptor activation. Our first clue comes from the analysis of fungal CSL sequence conservation, which clearly indicates their potential to bind DNA. This includes not only DNA binding in general, but extends to the ability to recognize the strict CSL consensus. The second clue derives from the lack of conservation of CSL interacting partners from metazoans. As stated above, the Notch receptor, its ligands and coactivators are not present in fungi. Finally, the metazoan CSL proteins are essential for embryonic development but dispensable in cultured cells. Similarly, S. pombe cells remain viable after the deletion of either or both CSL genes (MP et al., manuscript in preparation; and [36, 37]). This suggests, together with the secondary loss of CSL genes in some fungi (see above), that the proposed ancestral function in gene regulation is not essential.
We also have to account for the existence of two CSL classes in fungi. There is an analogy to the metazoan class M sub-groups, the RBP-Jκ and RBP-L CSL types. Both are involved in transcription regulation, but differ in their interacting partners, their responsiveness to various signals, their expression profiles and their in vivo DNA-binding preferences [11, 12]. The same may be true for class F1 and class F2 fungal CSL proteins. They may all participate in transcription regulation, but have either distinct or only partially overlapping target gene sets. Alternatively, they may differentially regulate the same genes, with the outcome depending on, e.g., environmental conditions. Indeed, whole-genome microarray experiments found that the S. pombe CSL genes display differential expression during sexual differentiation and under various stress conditions [38, 39]. In conclusion, the CSL gene family encodes proteins that are likely universally involved in the regulation of transcription both in animals and fungi.
We have shown the existence of the CSL transcription factor family, known from studies of the metazoan Notch signaling pathway, in several fungal species. We have described conserved features of the fungal proteins supporting their identity as true CSL family members. These findings put the origin of the CSL family further back in evolution than currently understood. We have mapped the history of CSL gene duplication and gene loss events in the fungal lineage, showing the existence of two well-defined CSL classes, class F1 and class F2, with the second class being more similar to the metazoan class M proteins. We hypothesize that the ancestral CSL function involved DNA binding and Notch-independent regulation of transcription and that this function may still be shared, to a certain degree, by the present CSL family members from both fungi and metazoans. If true, that would allow for exploiting the simple fungal models to analyze this function. We are currently studying the role of the CSL proteins in S. pombe and experiments are underway to identify the sets of genes and processes they regulate.
We have searched multiple publicly available fungal genome and protein databases (including NCBI and UniProt) using the appropriate BLAST algorithm with default settings and with the mouse CBF1 protein [GenBank:NP_033061] as a query. Candidate hits containing at least one of the conserved CSL motifs (see Results) were considered and used for further analyses. The BLAST searches were then repeated with all the newly identified CSL sequences as queries until no more new hits were found. In cases where two or more nearly identical candidate sequences, coming from independent sources and obviously representing a single gene, were found, the sequence showing the highest degree of similarity to the fungal CSL consensus was chosen. The final searches were performed between November 24, 2006 and November 30, 2006.
All candidate fungal CSL proteins were checked for the quality of their ORF prediction. We compared each database gene model with GenScan and/or WebGene predictions. The models were also compared to a multiple sequence alignment of other CSL proteins. In some cases, the splicing pattern was corrected manually using the Gene Runner 3.05 software (Hastings Software, Inc.) in order to restore a highly conserved region (see Results and Additional files 1 and 2).
Known domains present in the fungal CSL proteins were searched for using the Search Pfam server. Subcellular localization of each CSL protein was predicted by three independent algorithms, namely SubLoc v1.0, CELLO v.2.5 and PSORT II. Each sequence received a score ranging from '-' to '+++' depending on the number of times the protein was predicted to be nuclear (see Table 1).
Alignments used during the sequence retrieval part of the study were performed using ClustalW. The final alignment of all identified fungal and selected metazoan CSL proteins was based on a ClustalX output (Blosum matrix series), which was then manually edited in BioEdit 220.127.116.11 to correct some obvious alignment errors and to account for the information from the C. elegans CSL protein crystal structure. See Additional file 3 for the final alignment and the list of metazoan sequences used.
For tree construction, all positions containing gaps were removed from the final sequence alignment. An unrooted phylogenetic tree was then generated for the region corresponding to the RHR-N and BTD domains (from helix α2 to just before the βC4 linker, residues 210–535 in the C. elegans LAG-1 reference protein) using the neighbor-joining method in the MEGA 3.1 software package with 2000 bootstrap replicates.
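The repeated searching described above amounts to iterating until the set of hits stops growing. The sketch below illustrates that loop only; it does not call BLAST. The `search` and `has_csl_motif` callables are hypothetical stand-ins for a database search and for the conserved-motif filter, and the toy data in the usage example are invented.

```python
from typing import Callable, Iterable, Set


def iterative_homolog_search(
    seed: str,
    search: Callable[[str], Iterable[str]],
    has_csl_motif: Callable[[str], bool],
) -> Set[str]:
    """Collect hits by re-querying with every accepted hit until none are new.

    `seed` is the initial query identifier, `search` returns hit identifiers
    for a query (hypothetical stand-in for a BLAST run), and `has_csl_motif`
    decides whether a hit is kept (hypothetical stand-in for the motif check).
    """
    found: Set[str] = set()
    queried: Set[str] = set()
    pending = [seed]
    while pending:
        query = pending.pop()
        if query in queried:
            continue
        queried.add(query)
        for hit in search(query):
            # Keep only new, motif-containing hits and queue them as queries
            if hit != seed and hit not in found and has_csl_motif(hit):
                found.add(hit)
                pending.append(hit)
    return found


if __name__ == "__main__":
    # Invented toy "database": query identifier -> identifiers it retrieves
    toy_hits = {
        "seed": ["A", "B", "X"],
        "A": ["B", "C"],
        "B": ["A"],
        "X": ["Y"],
    }
    result = iterative_homolog_search(
        "seed",
        search=lambda q: toy_hits.get(q, []),
        has_csl_motif=lambda h: h not in {"X", "Y"},
    )
    print(sorted(result))
```

In the toy run, hit "C" is reachable only through "A", which is why a single search with the seed alone would miss it; rejected hits such as "X" are never used as queries.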
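The localization score can be expressed as a simple count over the three predictors. The sketch below assumes that one '+' is given per predictor calling the protein nuclear and that '-' means none did; this mapping is our reading of the description, and the function name and the label strings are hypothetical.

```python
def nuclear_score(predictions):
    """Summarize localization calls as '-', '+', '++' or '+++'.

    `predictions` is an iterable of labels, one per predictor
    (e.g. from SubLoc, CELLO and PSORT II). Assumption: one '+' per
    'nuclear' call, '-' when no predictor calls the protein nuclear.
    """
    n_nuclear = sum(1 for p in predictions if p.strip().lower() == "nuclear")
    return "+" * n_nuclear if n_nuclear else "-"


if __name__ == "__main__":
    # Hypothetical calls for one protein from the three predictors
    print(nuclear_score(["nuclear", "cytoplasmic", "Nuclear"]))
```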
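Removing all gap-containing positions before tree construction is a complete-deletion step that is easy to state precisely. The following minimal sketch shows it on a dictionary of aligned sequences; it is an illustration, not the MEGA implementation, and the function name, the gap characters and the toy alignment are assumptions.

```python
from typing import Dict


def strip_gapped_columns(alignment: Dict[str, str], gap_chars: str = "-.") -> Dict[str, str]:
    """Drop every alignment column in which any sequence has a gap.

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Indices of columns that are gap-free in every sequence
    keep = [i for i in range(length) if all(s[i] not in gap_chars for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}


if __name__ == "__main__":
    # Invented toy alignment; only columns 0 and 3 are gap-free
    toy = {"seq1": "MK-LV", "seq2": "MKAL-", "seq3": "M-ALV"}
    print(strip_gapped_columns(toy))
```

Complete deletion discards any column with even a single gap, so alignments with many lineage-specific insertions, as described for the fungal CSL proteins, shrink to their shared conserved core.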
This work was supported by the Grant Agency of the Charles University grant no. 157/2005/B-BIO/PrF, the Czech Science Foundation grant no. 204/03/H066 and the Czech Ministry of Education, Youth and Sport grant no. MSM0021620858.
We would like to thank Marian Novotný and Fatima Cvrčková for their expert help and suggestions in the initial phase of this study.
Data for P. chrysosporium CSL gene model prediction were provided freely by the JGI for use in this publication only.
Data for R. oryzae, C. cinereus and S. japonicus CSL gene model prediction were obtained from the Rhizopus oryzae Sequencing Project, Coprinus cinereus Sequencing Project and Schizosaccharomyces japonicus Sequencing Project, respectively (Broad Institute of Harvard and MIT, http://www.broad.mit.edu).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.