Prediction of the general transcription factors associated with RNA polymerase II in Plasmodium falciparum: conserved features and differences relative to other eukaryotes
© Callebaut et al; licensee BioMed Central Ltd. 2005
Received: 12 March 2005
Accepted: 23 July 2005
Published: 23 July 2005
To date, only a few transcription factors have been identified in the genome of the parasite Plasmodium falciparum, the causative agent of malaria. Moreover, no detailed molecular analysis of its basal transcription machinery, which is otherwise well-conserved in the crown group of eukaryotes, has yet been reported. In this study, we have used a combination of sensitive sequence analysis methods to predict the existence of several parasite encoded general transcription factors associated with RNA polymerase II.
Several orthologs of general transcription factors associated with RNA polymerase II can be predicted among the hypothetical proteins of the P. falciparum genome using the two-dimensional Hydrophobic Cluster Analysis (HCA) together with profile-based search methods (PSI-BLAST). These predicted orthologous genes encoding putative transcription factors include the large subunit of TFIIA and two candidates for its small subunit, the TFIIE β-subunit, which would associate with the previously known TFIIE α-subunit, the TFIIF β-subunit, as well as the p62/TFB1 subunit of the TFIIH core. Within TFIID, the putative orthologs of TAF1, TAF2, TAF7 and TAF10 were also predicted. However, no candidates for TAFs with classical histone fold domain (HFD) were found, suggesting an unusual architecture of TFIID complex of RNA polymerase II in the parasite.
Taken together, these results suggest that more general transcription factors may be present in the P. falciparum proteome than initially thought. The prediction of these orthologous general transcription factors opens the way for further studies dealing with transcriptional regulation in P. falciparum. These alternative and sensitive sequence analysis methods can help to identify candidates for other transcriptional regulatory factors in P. falciparum. They will also facilitate the prediction of biological functions for several orphan proteins from other apicomplexan parasites such as Toxoplasma gondii, Cryptosporidium parvum and Eimeria.
Each year 300–500 million people suffer from malaria while 1.5 to 2 million, mostly children, die as a result of the infection (Global Health Council, 2003). The lethal form of human malaria is caused by infection with the obligate intracellular protozoan parasite Plasmodium falciparum, which displays a developmental life cycle alternating between a vertebrate and an invertebrate host. Infection by the sporozoite form of the parasite occurs after the bite of a female Anopheles mosquito. The parasite then enters hepatocytes and multiplies by an asexual division process named schizogony. The resulting merozoites then invade erythrocytes, where the parasite goes through a series of morphological stages (ring, trophozoite, schizont and merozoite) during massive rounds of asexual division. The intermittent fevers characteristic of malaria infection are attributed to cycles of erythrocyte invasion, asexual reproduction by schizogony, and release of asexual parasites (merozoites) after rupture of infected red blood cells. For completion of the host-vector cycle, some intra-erythrocytic asexual forms do not undergo schizogony but differentiate into sexually dimorphic male and female gametocytes. Gametocytes are taken into the mosquito's midgut during a blood meal and complete their sexual development into gametes, which fuse to form a motile zygote named the ookinete. The ookinete grows into an oocyst, which divides into numerous sporozoites that invade the salivary glands of the mosquito, ready for a new cycle of infection.
During the complex life cycle of P. falciparum, which takes place in both a vertebrate and an invertebrate host, the intracellular development of the different asexual and sexual stages proceeds through a dynamic and multistep process for which the parasite has evolved complex molecular strategies. Several pioneering studies have previously demonstrated that transcriptional regulation is involved in the control of gene expression in the various P. falciparum life cycle forms [2–6]. The recent completion of the full genome sequence of P. falciparum has been useful in studying the global and complex gene expression patterns using microarrays and proteomic approaches. Indeed, these studies suggested that there is a coordinated program of gene expression during the intra-erythrocytic development of the parasite. Microarray data have revealed a sequential expression of transcripts in which messenger RNAs involved in protein synthesis peak first, followed by metabolism-related genes, then adhesion/invasion genes, and lastly protein kinases [7–10]. Global proteome analyses of sporozoites, merozoites, trophozoites, and gametocytes using tandem mass spectrometry have shown that many co-expressed proteins are encoded by genes that are clustered on certain chromosomes [11, 12]. These recent studies on gene expression also show that transcription of multiple genes may be achieved by a single developmental induction event resulting in a cascade of gene expression. This further suggests that only a few specific transcription factors may be required. Nevertheless, it has been established that the gene structure of P. falciparum is similar to that of other eukaryotes [13, 14], with common features including monocistronically transcribed genes, the presence of 5' and 3' untranslated regions, introns, promoter regions and probably the myriad of transcription factors that are involved in eukaryotic gene expression in general.
Transcription of eukaryotic structural genes requires the assembly of RNA polymerase II (RNAP II) and the general transcription factors (GTFs) on the promoter to form a pre-initiation complex. These basic factors include RNA polymerase II itself and at least six GTFs: TFIIA, TFIIB, TFIID, TFIIE, TFIIF and TFIIH, most of which are themselves multiprotein complexes [15, 16]. While RNA polymerases I, II and III and the TATA-binding protein [17–21] have been described in P. falciparum, the elucidation of the mechanisms involved in transcriptional regulation in the parasite is still challenging. For example, the identification of orthologous proteins, including the GTFs involved in the RNAP II transcription machinery, remains elusive. Therefore, the composition and nature of the highly conserved general transcription factors associated with RNAP II are presently unknown in P. falciparum. In contrast to most eukaryotic genomes, the extensive analysis of the P. falciparum genome has only revealed a few general transcription factors, such as TBP and TFIIB. More recently, Coulson et al. have utilized profile Hidden Markov Models (HMMs) of transcriptional regulators and found a relatively low number of malarial transcription-associated proteins (TAPs), including the general transcription factors associated with RNAP II. Only TFIIB, TFIIEα and a few components of TFIIH were identified in P. falciparum. In addition, no homolog of the RNAP II-associated TFIID complex, which is essential for basal transcription in eukaryotes, was found, except the TATA-binding protein (TBP). Therefore, it has been suggested that only a few specific transcription factors may be required for transcription regulation in the parasite. However, the parasite protein levels may also be primarily determined by posttranscriptional mechanisms [9, 10, 23].
The high proportion of orphans in the Plasmodium genome relative to other organisms (~60% of ORFs have no match with any known sequence) suggests that the paucity of recognizable orthologous GTFs associated with RNAP II in P. falciparum may be explained in a different way. As ORF, gene and function predictions have been performed in a similar way in Plasmodium and in other sequenced genomes (such as various predictive tools trained on the Plasmodium sequences; BLAST with default parameters [24, 25]), two hypotheses can be raised. First, it is possible that the parasite proteins have diverged structurally to the point where they cannot be identified by simple similarity searches [23, 26]. Second, the extraordinary bias toward A+T richness (80%) in the nucleotide composition of the parasite may introduce large changes in both DNA and amino acid sequences, which may affect the search procedures. This is particularly striking given the overall high A+T nucleotide content in protein-coding regions, which leads to a remarkable bias toward stretches composed of only a few amino acids. Therefore, it is likely that a substantial number of the unusually high proportion of malarial orphan proteins with no predictable function may actually correspond to "hidden" orthologues. Interestingly, we and others ([22, 27]; our unpublished results) have observed that there is a strong selection against low complexity inserts within the core secondary structure elements of P. falciparum proteins. The low complexity sequences are mostly located between two adjacent globular domains and only infrequently invade globular domains.
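To make the notion of a low complexity stretch concrete, the following minimal sketch flags residues lying in windows of low compositional entropy. It is an illustration only and not the procedure used in this study: the window width, the entropy cutoff, the function names and the toy sequence are all assumptions chosen for the example.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of one window."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_complexity_mask(seq: str, width: int = 12, cutoff: float = 2.2) -> list[bool]:
    """Mark residues covered by at least one window whose entropy is below cutoff.

    width and cutoff are illustrative values, not parameters from the paper.
    """
    mask = [False] * len(seq)
    for start in range(0, len(seq) - width + 1):
        if window_entropy(seq[start:start + width]) < cutoff:
            for i in range(start, start + width):
                mask[i] = True
    return mask


# Hypothetical sequence: an Asn/Lys-rich stretch flanked by mixed residues.
toy = "MKVLAEGFTRDW" + "NNNNKNNNNKNNNNNN" + "LVEAGSTPQRHC"
mask = low_complexity_mask(toy)
masked = "".join("x" if m else aa for aa, m in zip(toy, mask))
print(masked)
```

Because windows overlap, a few residues flanking the biased stretch may also be masked; a probe built this way therefore errs on the side of excluding too much rather than too little.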
In the present study, we postulated that the hydrophobic cores of globular domains in functional proteins of P. falciparum should be largely conserved. Consequently, these hydrophobic cores could be identified using appropriate tools involving the analysis of the secondary structure, which is often much more conserved than the primary structure. We have developed and applied a two-dimensional approach to sequence analysis, called Hydrophobic Cluster Analysis (HCA), which has been useful for the prediction of orthologous proteins in different eukaryotic lineages [29, 30]. HCA is based on the physico-chemical and topological properties underlying the fold of globular domains. It gives direct access to the gravity centers of regular secondary structures (RSSs). This information can be used to pick up hidden relationships within non-significant results provided by standard similarity search methods, which are based on literal approaches. Indeed, the positions of hydrophobic clusters defined using HCA, which differ from simple binary patterns, mainly correspond to those of regular secondary structures [31, 32]. Importantly, HCA is not sensitive to gaps, even large ones, the handling of which is one of the main obstacles for conventional sequence comparison methods. The distribution of the secondary structures also indicates the limits of structured domains. This information can help the computational analysis, in particular for P. falciparum sequences, in which low complexity regions often disturb standard similarity searches. Using the HCA methodology in combination with standard similarity search methods, we have explored the P. falciparum sequences for the presence of subunits of the basal transcription factors and cofactors associated with RNA polymerase II (RNAP II). Our data suggest that several orphan proteins of P. falciparum can be predicted as general transcription factors involved in the parasite RNAP II transcription machinery.
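The idea of a hydrophobic cluster can be illustrated with a deliberately simplified, one-dimensional sketch. Real HCA draws the sequence on a duplicated helical net and delineates clusters in two dimensions; the code below only approximates this by assuming that strong hydrophobic residues (V, I, L, F, M, Y, W) separated by at most three other residues belong to the same cluster and that proline always breaks a cluster. The function name, the `max_gap` parameter and the toy sequence are hypothetical.

```python
HYDROPHOBIC = set("VILFMYW")  # assumed hydrophobic alphabet for this sketch


def hydrophobic_clusters(seq: str, max_gap: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, binary_code) per cluster; end is exclusive.

    binary_code writes 1 for a hydrophobic residue and 0 otherwise.
    """
    clusters = []
    start = last = None  # first and most recent hydrophobic index of the open cluster

    def close():
        nonlocal start, last
        if start is not None:
            code = "".join(
                "1" if seq[k] in HYDROPHOBIC else "0" for k in range(start, last + 1)
            )
            clusters.append((start, last + 1, code))
        start = last = None

    for i, aa in enumerate(seq):
        if aa == "P":
            close()  # proline acts as a cluster breaker
        elif aa in HYDROPHOBIC:
            if last is not None and i - last - 1 > max_gap:
                close()  # too many intervening residues: start a new cluster
            if start is None:
                start = i
            last = i
    close()
    return clusters


# Hypothetical sequence with two clusters separated by a long hydrophilic loop.
print(hydrophobic_clusters("GSLVKALEDGGSGSNPFWTRYP"))
```

Comparing the binary codes and relative positions of clusters between two sequences, rather than the residues themselves, is what allows similarities to survive large insertions in this kind of analysis.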
General transcription factors predicted in Plasmodium falciparum
Nuclear signals prediction
TFIIH core p62/TFB1
MAL3P7.42 *+ (Chr3.phat_258)
TFIIH core p52/TFB2
TFIIH core p44/SSL1
TFIIH core p34/TFB4
TFIIH core TFB5
R, T, G
TFIIH core XPB/SSL2-RAD25
TFIIH CAK MAT1/TFB3
TFIIH CAK Cdk7/KIN28
TFIIH CAK Cyclin H/CCL1
P51946 (cyclin H)
In all cases with marginal similarities (PSI-BLAST E-values > 0.005), the alignment with the candidate hypothetical protein has low Expected E-values, proximal to the threshold value. These values are lower than those observed for alignments with other P. falciparum hypothetical proteins. However, other potential candidates (which have higher Expected E-value) were carefully checked by HCA for similarities that might be supported at the 2D level.
General transcription factors predicted in this study in four Plasmodium species
Order of entries in each row: P. falciparum; P. yoelii yoelii; PC entry; PB entry; B entry (where given)
TFIIA large subunit: MAL7P1.78 197 aa; PY01022 57% N-ter 133 aa*; PC302380.00.0 57% N-ter 133 aa*; PB000347.02.0 59% N-ter 133 aa*
TFIIA small subunit: PFL2435w 131 aa; chrPyl_02265-4-2031-1630 71% tl 134 aa; PC403116.00.0 68% 43 aa (partim); PB101071.00.0 63% 36 aa (partim); B: A.thaliana (gi 1429228)
PFI1630c 184 aa°; PY01831 51% tl 200 aa; PC000365.00.0 80% tl 105 aa; PB001668.02.0 95% tl 200 aa; B: G. theta (gi 4583664)
PFL1645w 3896 aa; PY03752 65% $ 3182 aa; PC000201.00.0 64% $ 1254 aa; PB000870.00.0 64% $ 843 aa
MAL7P1.134 3351 aa; PY03343 80% $ 1684 aa; PC000872.02.0 80% $ 1353 aa; PB000540.02.0 78% $ 926 aa
PY04173 58% tl 321 aa; PC000532.04.0 56% tl 387 aa; PB000149.02.0 54% 325 aa
PFE1110w 116 aa; PB108412.00.0 51% tl 93 aa; B: O.sativa (gi 50726230)
MAL7P1.86 400 aa; PY00824 57% tl 369 aa; PC000361.01.0 64% tl 386 aa; PB000518.01.0 64% tl 381 aa
MAL13P1.360 542 aa; PY01317 53% $ 2329 aa; PC103304.00.0 81% $ 207 aa; PB100065.00.0 54% $ 548 aa; B: S.cerevisiae (sp P36145)
PF11_0458 317 aa; PY03467 60% tl 310 aa; PB000215.00.0 60% tl 175 aa*; B: C.parvum (gi 46228562)
MAL3P7.42 670 aa; PY00359 59% tl 674 aa; PC000077.04.0 62% tl 682 aa; PB000867.00.0 71% tl 343 aa
chrPyl_00238-4-3595-3377 92% tl 73 aa; Pc_1897-6-1673-1455 92% tl 73 aa; PB000215.03.0 91% tl 67 aa
Reciprocal searches were carried out for all the predicted GTF components. In most cases (indicated with a "+" in Table 2), these led to the retrieval, with significant E-values, of the corresponding sequences in other eukaryotes. The reciprocal searches were often conducted using the region of similarity as a probe, excluding the low complexity regions that are abundant in P. falciparum sequences. However, the profiles deduced from the P. falciparum sequences are generally less informative than those constructed using the human or yeast sequences as probes. As a consequence, in a few cases such reciprocal searches retrieved the corresponding sequences in other eukaryotes with marginal, but low, expected E-values (just above the threshold E-value). It should be noted that the sequences of another apicomplexan parasite, Cryptosporidium parvum, often constituted the link between Plasmodium and the crown group eukaryotes.
Additional support for our predictions also comes from other data, such as the prediction of nuclear localisation and nuclear export signals (NLS and NES), as well as the analysis of expression patterns (Table 1). However, it should be mentioned that nuclear factors do not always require the presence of an NLS or NES for their targeting to the nucleus. For instance, it has recently been described that the nuclear transport of human TAF10, which lacks both NLS and NES, is mediated by its interacting partners, which contain the nuclear targeting signals.
Throughout this study, putative transcription factors are designated with the prefix Pf; for example, PfTFIIA denotes the P. falciparum ortholog of TFIIA from higher eukaryotes. The same nomenclature is used for the other basal transcription factors and cofactors identified here.
The TFIIA proteins form a ternary complex with TBP and DNA. TFIIA stabilizes TBP-DNA binding and promotes the binding of the TFIID complex to DNA. Yeast TFIIA is composed of two subunits (TOA1 and TOA2), each of which can be divided into two parts, an N-terminal helical region and a C-terminal region containing beta-strands. The N-terminal regions of the two subunits together form a four-helix bundle, whereas the two C-terminal regions fold as a six-stranded beta-barrel contacting TBP-DNA [34, 35]. The human TFIIA homologue is made of three polypeptide chains: α and β (both derived from the large subunit, which is encoded as a single chain and post-translationally processed) and γ (small subunit).
Using the sequence of the yeast small subunit TOA2, as query in a PSI-BLAST search, no significant sequence similarity could be found with P. falciparum proteins derived from the whole genome databases at convergence by iteration 2. However, a marginal similarity (E-value of 4.6) was highlighted with the PFL2435w hypothetical sequence, over 83 amino acids (22% identity). This similarity was supported at the 2D level using HCA (Fig. 2, panel B). It covers the N-terminal region as well as the two first strands of the C-terminal region. The third strand can be tentatively identified at the C-terminus of the P. falciparum sequence, when a large insertion is made between the second and third beta-strands. This large insertion likely corresponds to a globular sequence, as assessed by the presence of hydrophobic clusters. A large loop region also exists in this location in the human and yeast sequences, but was not observed in the solved corresponding three-dimensional structures. Another marginal similarity was observed at similar level of Expected-value in the PSI-BLAST results with a hypothetical protein of P. yoelii (PY01831; 24% identity over 82 amino acids, E-value= 0.084). However, this hypothetical protein does not correspond to the PFL2435w homolog. Instead, the PY01831 homolog in P. falciparum corresponds to the PFI1630c hypothetical protein (43% identity). This similarity was however not detected by PSI-BLAST because the PFI1630c sequence was incorrectly predicted (part of the coding region was inappropriately predicted as an intron; more explanations are given in the legend of Fig. 2). The corresponding alignment was also supported at the 2D level using HCA (Fig. 2; panel B). The PFI1630c hypothetical protein contains an N-terminal extension, relative to the human TFIIA γ/yeast TOA2 sequences. This suggests that two genes could exist as functional TFIIA small subunits in Plasmodium falciparum. 
Multiple genes that encode general transcription factors have already been described for the TATA-box binding protein (TBP) in several species [38–40] and for TFIIA α/β in humans.
TFIIB, which associates with TFIIA, is the only putative general transcription factor (PFA0525w) identified so far during the annotation of the P. falciparum genome. It was confirmed by specific HMM searches performed by Coulson et al.
Evidence for the presence of some P. falciparum TBP-associated factors (TAFs) involved in the multiprotein PfTFIID complex
The TATA-binding protein (TBP) and many TBP-associated factors (TAFs) form the multimeric TFIID complex. While TBP is sufficient for basal transcription in vitro, the TAF subunits of TFIID are essential cofactors for transcriptional activation, providing interaction sites for activators. Yeast TFIID contains 14 TAFs, and homologues of many of these TAFs are found in metazoans (Table 1). Analysis of the architecture of yeast and metazoan TFIID revealed that more than half of the TAFs contain a histone fold domain (HFD) (Table 1). These HFDs specifically assemble into five histone-like pairs.
While TBP was clearly identified in the P. falciparum genome (PFE0305w, Table 1), no orthologous TAFs have been described so far from the genome sequence data of several apicomplexan parasites, including several Plasmodium species [24, 43] (http://www.plasmodb.org), Cryptosporidium parvum and T. gondii (http://www.toxodb.org). This suggests that this TAF detection failure cannot be ascribed only to the A+T richness of the genome. Indeed, unlike Plasmodium species, the other apicomplexan parasites do not display a bias toward A+T richness. Instead, it is likely that the amino acid sequences of TAFs in apicomplexan parasites have reached a point of divergence that hinders their prediction using classical similarity searches. Here, we searched for the presence in P. falciparum of each of these TAFs, including those which contain histone fold domains.
• PfTAF1 (hTAF250/yTAF145)
• PfTAF2 (hTAF150/yTAF150)
TAF150 proteins have a non-specific aminopeptidase domain in their N-terminal parts. We therefore focused our searches on the C-terminal parts of the proteins. Using PSI-BLAST and the yeast TAF150 C-terminal domain (aa 701 to 1407) as query, a significant hit appeared by iteration 2 with the P. yoelii PY03343 hypothetical protein (E-value 0.002), together with those relative to other metazoan TAF150. The identification of P. yoelii orthologous TAF2 has been used to discover the P. falciparum TAF2, which is currently named in the annotated genome as the hypothetical protein MAL7P1.134. This sequence was scored with a significant E-value (3 × 10⁻⁷) by iteration 3 (Fig. 3, panel B). This similarity is supported at the 2D level and concerns the region which is most conserved in the TAF150 C-terminal domain amongst the different species. This suggests that the P. falciparum protein pinpointed here might be the TAF2 ortholog in the parasite.
• PfTAF7 (hTAF55/yTAF67)
TAF7 proteins possess a conserved domain (TAFII55 protein conserved region), located between amino acids 112 and 305 (yeast) or amino acids 12 and 178 (human). Using this domain as query in PSI-BLAST led to the identification of significant similarities from the second iteration with both P. yoelii PY04173 (E-value 2 × 10⁻⁶) and P. falciparum PFI1425w (E-value 2 × 10⁻⁶) hypothetical proteins. This similarity, limited to the first part of the TAFII55 protein conserved region (PFI1425w aa 161 to 242), is supported at the 2D level (Fig. 3, panel C). This similarity was also retrieved when scanning the Pfam database (pfam04658.5, TAFII55_N). However, the globular domain of the P. falciparum proteins in which the TAFII55-like region is included appears to be larger (aa 148 to ~ 325), and thus might share a similar length to the complete TAFII55 protein conserved region. The region of similarity shared by P. falciparum PFI1425w and other TAF7 was previously shown to be critical for interaction with the bromo domain factor Bdf1 of yeast cells.
• PfTAF10 and the apparent lack of TAFs assembling into histone-like pairs in P. falciparum
The histone fold domain (HFD), the core of which is characterized by three alpha-helices, is a fundamental interaction motif involved in heterodimerization of the core histone (H4-H3, H2A-H2B) and their assembly into a nucleosome octamer. This motif is thought to have arisen from the duplication of a minimal helix-extended-helix structure. The two middle helices of the duplicated structure would have fused to form a long, central helix. The histone fold domain can be accompanied by N- or C-terminal extensions, also made of alpha-helices and is found in several non-histone proteins, in addition to core histones [42, 47].
Analysis of TFIID has shown that more than half of the TAFs constituting this complex are HFD-containing proteins. This led to the first hypothesis of a compact nucleosome-like octamer core in TFIID, which could bind DNA and around which other TAFs could associate (reviewed in [49, 50]). This proposal, however, has to be revisited in light of recent experimental data, which highlight a more complex situation than initially thought. First, irrespective of the nature of the quaternary structure (nucleosome-like octamers, as observed for the TAF4/TAF12 – TAF6/TAF9 assembly, or other structures), it has been shown that surface residues of core histones known to make critical contacts with DNA in the nucleosome are generally not conserved in TAF HFDs [52, 53]. This suggests a role other than DNA binding for the HFDs of TAFs. Second, immunolabeling electron microscopy experiments have demonstrated that the HFD-containing TAFs are located in three distinct substructures of TFIID, which are assembled by thin linker domains in a molecular clamp architecture. The TAF4/TAF12 – TAF6/TAF9 assembly was shown to co-localize in one of the three lobes of native TFIID. These structural data were supported and enriched by additional mapping of other TFIID functional sites.
Our searches for HFD-containing TAFs in the Plasmodium genome did not lead to the identification of any of the five histone-like pairs currently known in other eukaryotic species (TAF6-9, TAF11-13, TAF4-12, TAF3-10 and TAF8-10). These searches were performed using as queries the full-length sequences of yeast and human TAFs, their HFDs, and specific domains accompanying HFDs (e.g. for TAF4, we considered the specific TAF4 domain, including HFD (hTAF135 aa 832 to 1083); the HFD (hTAF135 aa 835 to 950) and the TAFH sequence (hTAF135 TAF homology region, also known as nervy homology region 1 (NHR1); smart00549; aa 590 to 649)).
Taken together, our data strongly suggest an apparent and unexpected lack of HFD-containing TAFs in P. falciparum, except for TAF10. Whether this TAF is a genuine HFD-containing factor, however, remains to be determined both in the parasite and in other eukaryotic cells.
Other undetected TAFs within the multiprotein PfTFIID complex
• PfTAF5 (hTAF100/yTAF90)
This protein, interacting with TFIIFβ, contains WD40 repeats. We specifically limited our searches to the WD40 associated region in TFIID subunit (pfam04494; aa 194–340 of hTAF100), but these did not lead to the identification of potential TAF5 candidates. We were thus unable to report the presence of a putative TAF5 ortholog in P. falciparum.
• PfTAF14 (hENL-AF9/yTAF30)
No P. falciparum orthologue could be found for the second subunit of TFIIE, named TFIIE β, in a first approach using the human and yeast sequences as queries in PSI-BLAST searches. These two sequences display a low level of sequence identity (less than 30%), concentrated mainly in the C-terminal segment. We therefore adopted an iterative strategy, using distant sequences of the family retrieved in the PSI-BLAST data as new queries, and thereby discovered potential relationships with Plasmodium proteins. Hence, using the sequence of the hypothetical protein CNBI3180 from Cryptococcus neoformans, which shares 22% sequence identity with the S. cerevisiae TFA2 over 293 amino acids (E-value at convergence 1 × 10⁻²⁸), we found by iteration 3 a significant similarity with the C-terminal fragment of a hypothetical sequence from P. yoelii (PY01317; 15% identity over 219 amino acids, E-value 2 × 10⁻³⁸). This relationship was supported at the 2D level (Fig. 6, panel B), especially highlighting well conserved hydrophobic clusters common to distant members of the family (for example, see the hydrophobic cluster highlighted with an asterisk in Fig. 6, panel B). The corresponding sequence in P. falciparum, MAL13P1.360, was found by searching the predicted annotated proteins within PlasmoDB (version 4.3, November 2, 2004; 81% sequence identity with PY01317). The comparison of the whole set of sequences of the TFIIE β subunit family revealed a region of variable length in the middle of the TFIIE β domain. This region is particularly rich in cysteine residues in the proteins of apicomplexan parasites such as P. falciparum, P. yoelii and Cryptosporidium (boxed in Fig. 6, panel B).
TFIIF, a tetramer of two subunits, named α (mammalian RAP74/yeast TFG1) and β (mammalian RAP30/yeast TFG2), is intimately associated with the RNA polymerase II enzyme. The TFIIF complex directly binds promoter DNA, TFIIB and the TAF250 subunit of TFIID, and recruits TFIIE and TFIIH to the preinitiation complex.
RAP74 (α subunit) also possesses two globular domains, one N-terminal and one C-terminal, separated by an unstructured linker sequence. As indicated above, the N-terminal domain, which is responsible for TAF250 and RAP30 binding, forms a triple barrel dimerization fold with the RAP30 N-terminal domain. The C-terminal domain, interacting with TFIIB, FCP1 and DNA, folds as a winged helix. We thus restricted our queries to these two domains, the limits of which were identified through HCA (from aa 1 to 180 and 450 to end). However, we found no significant similarity with any parasite proteins, including those of several Plasmodium species and Cryptosporidium parvum. We also did not see any marginal similarity that could be confirmed at the 2D level. Given the single core structure that the two subunits form together, with three interwoven beta-barrels, it seems unlikely that the RAP30 homolog exists alone in apicomplexan parasites. Either the RAP74 subunit is too divergent to be detected using the available tools, including HCA, or it does not exist, in which case the α-β heterodimer might be replaced by a β-β homodimer, given the similar architecture of the two subunits.
The general transcription factor TFIIH is the largest and most complex of all. Indeed, it is composed of nine subunits with a total molecular mass (460 kDa) similar to that of RNAP II, and several subunits have enzymatic activities (reviewed in [67, 73–75]). TFIIH has a dual role in transcription initiation and nucleotide excision repair (NER). It is organized into two structural and functional entities. The first of these, the TFIIH core, includes four polypeptides (named P62, P52, P44 and P34 in human; yeast orthologous sequences are indicated in Table 1) and the xeroderma pigmentosum B (XPB) helicase. The second functional entity, the CDK-Activating Kinase (CAK) complex, is composed of the cyclin-dependent kinase Cdk7, cyclin H and MAT1. The XPD (RAD3) helicase bridges the two complexes, being associated either with the core or with CAK. In addition, a new subunit of the TFIIH core, TFB5, has recently been discovered; in humans it is associated with DNA repair-deficient trichothiodystrophy [76, 77].
Only a few components of the general transcription machinery have been identified to date in P. falciparum. Of the 33 general transcription factors listed in Table 1, only one third (ten subunits) were predicted from simple similarity searches and previous analysis. This percentage may reflect the low proportion of genes with automatically predicted function in the complete parasite genome. Hence, the TATA-binding protein (TBP) is the only component of the TFIID complex identified so far. The multiprotein TFIID complex nevertheless remains essential for accurate and higher transcription levels in eukaryotic cells. Therefore, the paucity of both defined malarial TFIID orthologous components and of general transcription factors contrasts significantly with the situation reported for the crown group eukaryotes, in which TFIID is well conserved even though some differences can be seen for transcription cofactor complexes. Here, the use of the sensitive Hydrophobic Cluster Analysis (HCA) in combination with profile-based search methods suggests that the genome of P. falciparum contains several gene products annotated as hypothetical proteins which can be predicted as putative general transcription factors (GTFs) associated with RNAP II. These include several members of TFIID, even if most of the TAFs containing histone fold domains (HFDs) remain undetected using HCA. Nine other GTFs were predicted in this way, which brings the total number of predicted subunits of the general transcription machinery to approximately 60% of the number observed in the crown group eukaryotes.
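The proportions quoted in this paragraph can be checked with a short calculation. The sketch below assumes that the 33 factors listed in Table 1 serve as the reference count for the crown group eukaryotes and that the nine newly predicted subunits add to the ten previously known ones; the variable names are illustrative.

```python
# Counts taken from the text; using 33 as the denominator is an assumption.
total_gtf_subunits = 33    # general transcription factors listed in Table 1
previously_predicted = 10  # found by simple similarity searches and earlier work
newly_predicted = 9        # additional subunits predicted with HCA and PSI-BLAST

fraction_before = previously_predicted / total_gtf_subunits
fraction_after = (previously_predicted + newly_predicted) / total_gtf_subunits

print(f"{fraction_before:.1%} -> {fraction_after:.1%}")  # 30.3% -> 57.6%
```

Under these assumptions, 19 of 33 subunits (about 58%) are accounted for, consistent with the figure of approximately 60% given above.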
A first original feature of the predicted P. falciparum GTF sequences is the presence of two candidate genes for the TFIIA small subunit. The presence of two genes has already been described for some GTFs. Indeed, TBP-like proteins are found in A. thaliana, D. melanogaster and H. sapiens [38–40]. Moreover, a functional homolog of the TFIIA α/β subunit, which is expressed almost exclusively in testis, has been described. This multiplicity of GTFs is thought to contribute to tissue- and gene-specific regulation. It is therefore possible that the candidate genes for the TFIIA small subunit in P. falciparum are expressed in a stage-specific manner during the parasite life cycle.
Another striking observation is that, among the GTFs associated with RNAP II predicted here, no HFD-containing TAFs could be identified, except for the putative ortholog of TAF10, which was reported by Gangloff and colleagues as containing a potential HFD. However, the potential HFD of TAF10 might correspond to a distant member of the HFD family and remains to be experimentally proven. On the other hand, TAF10 is the only "HFD" protein which is shared by TFIID and SAGA. It is interesting to note the apparent lack of a TAF10 candidate in P. yoelii (Table 2), suggesting that this protein might not be essential in all Plasmodium species. This may be consistent with the apparent absence of other HFD-containing TAFs in all Plasmodium species. The apparent absence of canonical HFD TAFs leads to the hypothesis that proteins of the HFD family have diverged more than other proteins in the Plasmodium genome, beyond the point where they can be identified using even the most sensitive homology searches. Alternatively, if the absence of HFD-containing TAFs is confirmed, this will provide evidence for a striking difference in the quaternary structure of TFIID by comparison with the yeast or human complexes. The latter display a similar architecture, formed by three lobes organized into a molecular clamp [54, 86, 87]. Experimental investigations are needed to further explore this hypothesis. To date, the only Plasmodium proteins with HFDs listed in the histone database correspond to the classic nucleosomal histones H2A, H2B, H3 and H4. The linker histone H1 is not found. This suggests that the apparent absence of histone fold proteins in Plasmodium is not restricted to the TAF proteins of the TFIID complex.
In conclusion, we have shown that more general transcription factors can be predicted in the genome of P. falciparum than initially thought. The HCA method can be expected to serve as an additional and important tool for finding new orthologs among the high proportion of hypothetical proteins or orphans in P. falciparum and other apicomplexan parasites such as Cryptosporidium parvum, Eimeria and Toxoplasma gondii. Virtually nothing is known about transcriptional regulation in these apicomplexan parasites. To our knowledge, this study describes for the first time the prediction of general transcription factors in the genome of P. falciparum using a sensitive predictive method based on secondary structure considerations (HCA). The GTF orthologs predicted here point to some differences in the composition, and probably in the nature, of some multicomplex factors, as illustrated by the possible absence of HFD-containing TAFs in the TFIID complex. Identifying novel transcription elements and understanding how basal transcription differs in the parasite may be exploited to design selective therapeutic agents against P. falciparum. Additionally, further elucidation of the mechanisms controlling transcriptional expression in protozoa may provide a unique perspective on how these systems evolved in early eukaryotic cells.
The non-redundant database (NR; 2 456 374 protein sequences as of May 3, 2005) at NCBI (National Center for Biotechnology Information) was searched using the BLASTP program with default parameters (BlastP 2.2.10, Oct 19, 2004; Blosum 62, gap penalties: existence 11, extension 1). Profile searches were conducted using PSI-BLAST, run until convergence with the default profile inclusion expect (E) value threshold of 0.005. Reciprocal searches were carried out for all the predicted GTF components of Plasmodium falciparum (see comments in the Results section). PlasmoDB (version 4.3, November 2, 2004) [25, 89] was also searched using the same tools (BLASTP 2.1.2). Other databases (Pfam, SMART, CDD) were also searched for the presence of known domains.
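The search logic described above (iterate a profile search until no new sequence passes the inclusion threshold, then confirm each candidate by a reciprocal search) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the `search_round` callable stands in for one PSI-BLAST iteration, and the sequence names and E-values in `TOY_E_VALUES` are invented purely for illustration. Only the 0.005 inclusion threshold comes from the text.

```python
from typing import Callable, Dict, Set

INCLUSION_E_VALUE = 0.005  # profile inclusion threshold quoted in the text

# One search round maps the sequences already in the profile to
# {hit_id: e_value}. Hypothetical stand-in for a real PSI-BLAST iteration.
SearchRound = Callable[[Set[str]], Dict[str, float]]


def iterate_until_convergence(query: str, search_round: SearchRound,
                              max_rounds: int = 20) -> Set[str]:
    """Grow the set of included sequences until a round adds nothing new."""
    included = {query}
    for _ in range(max_rounds):
        hits = search_round(included)
        passing = {h for h, e in hits.items() if e <= INCLUSION_E_VALUE}
        new = passing - included
        if not new:  # convergence: no new sequence passes the threshold
            break
        included |= new
    return included


def is_reciprocal(a: str, b: str, search_round: SearchRound) -> bool:
    """True if b is recovered starting from a, and a starting from b."""
    return (b in iterate_until_convergence(a, search_round)
            and a in iterate_until_convergence(b, search_round))


# Invented pairwise E-values (illustration only, not data from the study).
TOY_E_VALUES = {
    frozenset({"human_TFIIA", "yeast_TFIIA"}): 1e-30,
    frozenset({"yeast_TFIIA", "pf_hypothetical"}): 1e-4,
    frozenset({"human_TFIIA", "pf_hypothetical"}): 0.5,
    frozenset({"human_TFIIA", "unrelated"}): 3.0,
}


def toy_round(included: Set[str]) -> Dict[str, float]:
    """Score each outside sequence by its best E-value to any included one."""
    hits: Dict[str, float] = {}
    for pair, e_value in TOY_E_VALUES.items():
        if len(pair & included) == 1:  # one member inside, one outside
            (outside,) = pair - included
            hits[outside] = min(e_value, hits.get(outside, float("inf")))
    return hits
```

In this toy setting, `is_reciprocal("human_TFIIA", "pf_hypothetical", toy_round)` succeeds only because the intermediate yeast sequence is added to the profile first, which mirrors why iterated profile searches can reach divergent sequences that a single pairwise search misses.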
The two-dimensional Hydrophobic Cluster Analysis (HCA) [29, 30] was used to examine potential sequence and structure relationships at the two-dimensional (2D) level. HCA complements a literal analysis of the sequence with a lexical one, by identifying regular secondary structures from the consideration of a single sequence. Indeed, the positions of hydrophobic clusters were shown to correspond mainly to the positions of regular secondary structures. These non-intertwined binary patterns, constrained by the consideration of a connectivity distance separating two distinct clusters on the two-dimensional plot (the currently used alpha-helix is associated with a connectivity distance of 4), are much more informative than non-constrained ones. Hence, similar structures are often associated with conservation of hydrophobic cluster features, which participate in the protein core, together with sequence similarities. This conservation often helps, or even makes possible, the alignment of highly divergent sequences (typically within and below the twilight level). This approach has been used to identify new domains (e.g. [93, 94]), link orphan sequences to structural and functional families (e.g. [95, 96]) or identify and characterize catalytic sites (e.g. [97–99]). Other examples can be found at .
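The notion of hydrophobic clusters delimited by a connectivity distance can be sketched in a few lines of Python. The sketch below is a simplified, one-dimensional reading of the rule and not the HCA software itself: it assumes the hydrophobic alphabet V, I, L, F, M, Y, W, treats proline as a cluster breaker, and places two hydrophobic residues in different clusters when at least four non-hydrophobic residues separate them. Only the connectivity distance of 4 is taken from the text; the other choices and all function names are assumptions made for illustration.

```python
from typing import List, Tuple

HYDROPHOBIC = set("VILFMYW")   # assumed hydrophobic alphabet
CLUSTER_BREAKER = "P"          # assumption: proline always ends a cluster
CONNECTIVITY_DISTANCE = 4      # value quoted in the text for the alpha-helix


def binary_pattern(seq: str) -> str:
    """Encode a sequence as 1 (hydrophobic) / 0 (anything else)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq.upper())


def hydrophobic_clusters(seq: str,
                         distance: int = CONNECTIVITY_DISTANCE
                         ) -> List[Tuple[int, int, str]]:
    """Return (start, end, pattern) per cluster, 0-based inclusive indices."""
    seq = seq.upper()
    clusters: List[Tuple[int, int, str]] = []
    start = last = None  # first and last hydrophobic position of open cluster

    def close() -> None:
        clusters.append((start, last, binary_pattern(seq[start:last + 1])))

    for i, aa in enumerate(seq):
        if aa == CLUSTER_BREAKER:
            if start is not None:
                close()
                start = last = None
        elif aa in HYDROPHOBIC:
            if start is None:
                start = i
            elif i - last - 1 >= distance:  # gap too long: start a new cluster
                close()
                start = i
            last = i
    if start is not None:
        close()
    return clusters
```

Each cluster is reported as a binary code that begins and ends with a hydrophobic position, which is the kind of pattern that can then be compared between divergent sequences. The real method works on a 2D helical plot and uses additional symbols for particular residues, which this sketch deliberately omits.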
We would like to thank Dr. Steven Ball for critically reading the manuscript and the reviewers for providing helpful comments and suggestions. This work was supported by the CNRS through the interdisciplinary program "Protéomique et Génie des Protéines". The support of the CEA (LRC27V) to I.C., K.P. and J.P.M. is also acknowledged.
- Sinden RE: Molecular interactions between Plasmodium and its vectors. Cell Microbiol. 2002, 4: 713-724. 10.1046/j.1462-5822.2002.00229.x.
- Lanzer M, de Bruin D, Ravetch JV: A sequence element associated with the Plasmodium falciparum KAHRP gene is the site for developmentally regulated protein-DNA interactions. Nucleic Acids Res. 1992, 20: 3051-3056.
- Lanzer M, de Bruin D, Ravetch JV: Transcriptional mapping of a 100 kb locus of Plasmodium falciparum identifies a region in which transcription terminates and reinitiates. EMBO J. 1992, 11: 1949-1955.
- Waters AP: The ribosomal RNA genes of Plasmodium. Adv Parasitol. 1994, 34: 33-79.
- Scherf A, Hernandez-Rivas R, Buffet P, Bottius E, Benatar C, Pouvelle B, Gysin J, Lanzer M: Antigenic variation in malaria: In situ switching, relaxed and mutually exclusive transcription of var genes during intra-erythrocytic development in Plasmodium falciparum. EMBO J. 1998, 17: 5418-5426. 10.1093/emboj/17.18.5418.
- Dechering KJ, Kaan AM, Mbacham W, Wirth DF, Eling W, Konings RM, Stunnenberg HG: Isolation and functional characterization of two distinct sexual-stage-specific promoters of the human malaria parasite Plasmodium falciparum. Mol Cell Biol. 1999, 19: 967-978.
- Hayward RE, DeRisi JL, Alfadhli S, Kaslow DC, Brown PO, Rathod PK: Shotgun DNA microarray and stage-specific gene expression in Plasmodium falciparum malaria. Mol Microbiol. 2000, 35: 6-14. 10.1046/j.1365-2958.2000.01730.x.
- Mamoun C, Gluzman I, Hott C, MacMillan S, Amarakone A, Anderson D, Carlton J, Dame J, Chakrabarti D, Martin R, Brownstein B, Goldberg D: Co-ordinated programme of gene expression during asexual intraerythrocytic development of the human malaria parasite Plasmodium falciparum revealed by microarray analysis. Mol Microbiol. 2001, 39: 26-36. 10.1046/j.1365-2958.2001.02222.x.
- Le Roch KG, Zhou Y, Blair PL, Grainger M, Moch K, Haynes D, De la Vega P, Holder A, Batalov S, Carucci DJ, Winzeler EA: Discovery of gene function by expression profiling of the malaria parasite life cycle. Science. 2003, 301: 1503-1508. 10.1126/science.1087025.
- Bozdech Z, Llinas M, Pulliam B, Wong E, Zhu J, DeRisi JL: The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biol. 2003, 1: E5. 10.1371/journal.pbio.0000005.
- Florens L, Washburn M, Raine D, Anthony R, Grainger M, Haynes D, Moch J, Muster N, Sacci J, Tabb D, Witney A, Wolters D, Wu Y, Gardner M, Holder A, Sinden R, Yates J, Carucci D: A proteomic view of the Plasmodium falciparum life cycle. Nature. 2002, 419: 520-526. 10.1038/nature01107.
- Lasonder E, Ishihama Y, Andersen JS, Vermunt AMW, Pain A, Sauerwein RW, Eling WMC, Hall N, Waters A, Stunnenberg HG, Mann M: Analysis of the Plasmodium falciparum proteome by high-accuracy mass spectrometry. Nature. 2002, 419: 537-542. 10.1038/nature01111.
- Lanzer M, Wertheimer S, De Bruin D, Ravetch JV: Plasmodium: control of gene expression in malaria parasites. J Exp Parasitol. 1993, 77: 121-128. 10.1006/expr.1993.1068.
- Horrocks P, Dechering K, Lanzer M: Control of gene expression in Plasmodium falciparum. Mol Biochem Parasitol. 1998, 95: 171-181. 10.1016/S0166-6851(98)00110-8.
- Hahn S: Structure and mechanism of the RNA polymerase II transcription machinery. Nat Struct Mol Biol. 2004, 11: 394-402. 10.1038/nsmb763.
- Veenstra GJC, Wolffe AP: Gene-selective developmental roles of general transcription factors. Trends Biochem Sci. 2001, 25: 665-671. 10.1016/S0968-0004(01)01970-3.
- McAndrew MB, Read M, Sims PF, Hyde JE: Characterization of the gene encoding an unusually divergent TATA-binding protein (TBP) from the extremely A+T-rich human malaria parasite Plasmodium falciparum. Gene. 1993, 124: 165-171. 10.1016/0378-1119(93)90390-O.
- Hirtzlin J, Farber PM, Franklin RM: Isolation of a novel Plasmodium falciparum gene encoding a protein homologous to the Tat-binding protein family. Eur J Biochem. 1994, 226: 673-680. 10.1111/j.1432-1033.1994.tb20095.x.
- Fox BA, Li WB, Tanaka M, Inselburg J, Bzik DJ: Molecular characterization of the largest subunit of Plasmodium falciparum RNA polymerase I. Mol Biochem Parasitol. 1993, 61: 37-38. 10.1016/0166-6851(93)90156-R.
- Li WB, Bzik DJ, Gu HM, Tanaka M, Fox BA, Inselburg J: An enlarged largest subunit of Plasmodium falciparum RNA polymerase II defines conserved and variable RNA polymerase domains. Nucleic Acids Res. 1989, 17: 9621-9636.
- Li WB, Bzik DJ, Tanaka M, Gu HM, Fox BA, Inselburg J: Characterization of the gene encoding the largest subunit of Plasmodium falciparum RNA polymerase III. Mol Biochem Parasitol. 1991, 46: 229-239. 10.1016/0166-6851(91)90047-A.
- Aravind L, Iyer LM, Wellems TE, Miller LH: Plasmodium biology: genomic gleanings. Cell. 2003, 115: 771-785. 10.1016/S0092-8674(03)01023-7.
- Coulson RM, Hall N, Ouzounis CA: Comparative genomics of transcriptional control in the human malaria parasite Plasmodium falciparum. Genome Res. 2004, 14: 1548-1554. 10.1101/gr.2218604.
- Gardner MJ, Shallom SJ, Carlton JM, Salzberg SL, Nene V, Shoaibi A, Ciecko A, Lynn J, Rizzo M, Weaver B, Jarrahi B, Brenner M, Parvizi B, Tallon L, Moazzez A, Granger D, Fujii C, Hansen C, Pederson J, Feldblyum T, Peterson J, Suh B, Angiuoli S, Pertea M, Allen J, Selengut J, White O, Cummings LM, Smith HO, Adams MD, Venter JC, Carucci DJ, Hoffman SL, Fraser CM: Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002, 419: 498-511. 10.1038/nature01097.
- Bahl A, Brunk B, Crabtree J, Fraunholz MJ, Gajria B, Grant GR, Ginsburg H, Gupta D, Kissinger JC, Labo P, Li L, Mailman MD, Milgram AJ, Pearson DS, Roos DS, Schug J, Stoeckert CJ, Whetzel P: PlasmoDB: the Plasmodium genome resource. A database integrating experimental and computational data. Nucleic Acids Res. 2003, 31: 212-215. 10.1093/nar/gkg081.
- McConkey GA, Pinney JW, Westhead DR, Plueckhahn K, Fitzpatrick TB, Macheroux P, Kappes B: Annotating the Plasmodium genome and the enigma of the shikimate pathway. Trends Parasitol. 2004, 20: 60-65. 10.1016/j.pt.2003.11.001.
- Pizzi E, Frontali C: Low-complexity regions in Plasmodium falciparum proteins. Genome Res. 2001, 11: 218-229. 10.1101/gr.GR-1522R.
- Chothia C, Lesk AM: The relation between the divergence of sequence and structure in proteins. EMBO J. 1986, 5: 823-826.
- Gaboriaud C, Bissery V, Benchetrit T, Mornon JP: Hydrophobic cluster analysis: an efficient new way to compare and analyse amino acid sequences. FEBS Lett. 1987, 224 (1): 149-155. 10.1016/0014-5793(87)80439-8.
- Callebaut I, Labesse G, Durand P, Poupon A, Canard L, Chomilier J, Henrissat B, Mornon JP: Deciphering protein sequence information through hydrophobic cluster analysis (HCA): current status and perspectives. Cell Mol Life Sci. 1997, 53: 621-645. 10.1007/s000180050082.
- Woodcock S, Mornon JP, Henrissat B: Detection of secondary structure elements in proteins by hydrophobic cluster analysis. Protein Eng. 1992, 5: 629-635.
- Hennetin J, Le Tuan K, Canard L, Colloc'h N, Mornon JP, Callebaut I: Non-intertwined binary patterns of hydrophobic/nonhydrophobic amino acids are considerably better markers of regular secondary structures than nonconstrained patterns. Proteins. 2003, 51: 236-244. 10.1002/prot.10355.
- Soutoglou E, Demeny MA, Scheer E, Fienga G, Sassone-Corsi P, Tora L: The nuclear import of TAF10 is regulated by one of its three histone fold domain-containing interaction partners. Mol Cell Biol. 2005, 25: 4092-4104. 10.1128/MCB.25.10.4092-4104.2005.
- Geiger JH, Hahn S, Lee S, Sigler PB: Crystal structure of the yeast TFIIA/TBP/DNA complex. Science. 1996, 272: 830-836.
- Tan S, Hunziker Y, Sargent DF, Richmond TJ: Crystal structure of a yeast TFIIA/TBP/DNA complex. Nature. 1996, 381: 127-151. 10.1038/381127a0.
- Bleichenbacher M, Tan S, Richmond TJ: Novel interactions between the components of human and yeast TFIIA/TBP/DNA complexes. J Mol Biol. 2003, 332: 783-793. 10.1016/S0022-2836(03)00887-8.
- Upadhyaya AB, Lee SH, De Jong J: Identification of a general transcription factor TFIIAa/b homolog selectively expressed in testis. J Biol Chem. 1999, 274: 18040-18048. 10.1074/jbc.274.25.18040.
- Gasch A, Hoffmann A, Horikoshi M, Roeder RG, Chua NH: Arabidopsis thaliana contains two genes for TFIID. Nature. 1990, 346: 390-394. 10.1038/346390a0.
- Crowley TE, Hoey T, Liu JK, Jan YN, Jan LY, Tjian R: A new factor related to TATA-binding protein has highly restricted expression patterns in Drosophila. Nature. 1993, 361: 557-561. 10.1038/361557a0.
- Wieczorek E, Brand M, Jacq X, Tora L: Function of TAF(II)-containing complex without TBP in transcription by RNA polymerase II. Nature. 1998, 393: 187-191. 10.1038/30283.
- Chen B-S, Hampsey M: Transcription activation: unveiling the essential nature of TFIID. Current Biol. 2002, 12: R620-R622. 10.1016/S0960-9822(02)01134-X.
- Gangloff Y-G, Romier C, Thuault S, Werten S, Davidson I: The histone fold is a key structural motif of transcription factor TFIID. Trends Biochem Sci. 2001, 26: 250-257. 10.1016/S0968-0004(00)01741-2.
- Carlton JM, Angiuoli SV, Suh BB, Kooij TW, Pertea M, Silva JC, Ermolaeva MD, Allen JE, Selengut JD, Koo HL, Peterson JD, Pop M, Kosack DS, Shumway MF, Bidwell SL, Shallom SJ, van Aken SE, Riedmuller SB, Feldblyum TV, Cho JK, Quackenbush J, Sedegah M, Shoaibi A, Cummings LM, Florens L, Yates JR, Raine JD, Sinden RE, Harris MA, Cunningham DA, Preiser PR, Bergman LW, Vaidya AB, van Lin LH, Janse CJ, Waters AP, Smith HO, White OR, Salzberg SL, Venter JC, Fraser CM, Hoffman SL, Gardner MJ, Carucci DJ: Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii. Nature. 2002, 419: 512-519. 10.1038/nature01099.
- Abrahamsen MS, Templeton TJ, Enomoto S, Abrahante JE, Zhu G, Lancto CA, Deng M, Liu C, Widmer G, Tzipori S, Buck GA, Xu P, Bankier AT, Dear PH, Konfortov BA, Spriggs HF, Iyer L, Anantharaman V, Aravind L, Kapur V: Complete genome sequence of the apicomplexan, Cryptosporidium parvum. Science. 2004, 304: 441-445. 10.1126/science.1094786.
- Mizzen CA, Yang XJ, Kokubo T, Brownell JE, Bannister AJ, Owen-Hughes T, Workman J, Wang L, Berger SL, Kouzarides T, Nakatani Y, Allis CD: The TAF(II)250 subunit of TFIID has histone acetyltransferase activity. Cell. 1996, 87: 1261-1270. 10.1016/S0092-8674(00)81821-8.
- Matangkasombut O, Buratowski RM, Swilling NW, Buratowski S: Bromodomain factor 1 corresponds to a missing piece of yeast TFIID. Genes Dev. 2000, 14: 951-962.
- Sullivan AS, Aravind L, Makalowska I, Baxevanis AD, Landsman D: The Histone Database: a comprehensive WWW resource for histones and histone fold-containing proteins. Nucleic Acids Res. 2000, 28: 320-322. 10.1093/nar/28.1.320. [http://research.nhgri.nih.gov/histones]
- Hoffmann A, Chiang CM, Oelgeschlager T, Xie X, Burley SK, Nakatani Y, Roeder RG: A histone octamer-like structure within TFIID. Nature. 1996, 380: 356-9. 10.1038/380356a0.
- Albright SR, Tjian R: TAFs revisited: more data reveal new twists and confirm old ideas. Gene. 2000, 242: 1-13.
- Gangloff YG, Sanders SL, Romier C, Kirschner D, Weil PA, Tora L, Davidson I: Histone folds mediate selective heterodimerization of yeast TAF(II)25 with TFIID components yTAF(II)47 and yTAF(II)65 and with SAGA component ySPT7. Mol Cell Biol. 2001, 21: 1841-1853.
- Selleck W, Howley R, Fang Q, Podolny V, Fried MG, Buratowski S, Tan S: A histone fold TAF octamer within the yeast TFIID transcriptional coactivator. Nat Struct Biol. 2001, 8: 695-700. 10.1038/90408.
- Luger K, Richmond TJ: The histone tails of the nucleosome. Curr Opin Genet Dev. 1998, 8: 140-146. 10.1016/S0959-437X(98)80134-2.
- Werten S, Mitschler A, Romier C, Gangloff YG, Thuault S, Davidson I, Moras D: Crystal structure of a subcomplex of human transcription factor TFIID formed by TATA binding protein-associated factors hTAF4 (hTAF(II)135) and hTAF12 (hTAF(II)20). J Biol Chem. 2002, 277: 45502-45509. 10.1074/jbc.M206587200.
- Leurent C, Sanders S, Ruhlmann C, Mallouh V, Weil PA, Kirschner DB, Tora L, Schultz P: Mapping histone fold TAFs within yeast TFIID. EMBO J. 2002, 21: 3424-3433. 10.1093/emboj/cdf342.
- Leurent C, Sanders SL, Demeny MA, Garbett KA, Ruhlmann C, Weil PA, Tora L, Schultz P: Mapping key functional sites within yeast TFIID. EMBO J. 2004, 23: 719-727. 10.1038/sj.emboj.7600111.
- Sanders SL, Klebanow ER, Weil PA: TAF25p, a non-histone-like subunit of TFIID and SAGA complexes, is essential for total mRNA gene transcription in vivo. J Biol Chem. 1999, 274: 18847-18850. 10.1074/jbc.274.27.18847.
- Le Masson I, Yu DY, Jensen K, Chevalier A, Courbeyrette R, Boulard Y, Smith MM, Mann C: Yaf9, a novel NuA4 histone acetyltransferase subunit, is required for the cellular response to spindle stress in yeast. Mol Cell Biol. 2003, 23: 6086-6102. 10.1128/MCB.23.17.6086-6102.2003.
- Ohkuma Y, Sumimoto H, Hoffmann A, Shimasaki S, Horikoshi M, Roeder RG: Structural motifs and potential sigma homologies in the large subunit of human general transcription factor TFIIE. Nature. 1991, 354: 398-401. 10.1038/354398a0.
- Feaver WJ, Henry NL, Bushnell DA, Sayre MH, Brickner JH, Gileadi O, Kornberg RD: Yeast TFIIE. Cloning, expression, and homology to vertebrate proteins. J Biol Chem. 1994, 269: 27549-27553.
- Maxon ME, Goodrich JA, Tjian R: Transcription factor IIE binds preferentially to RNA polymerase IIa and recruits TFIIH: a model for promoter clearance. Genes Dev. 1994, 8: 515-524.
- Ohkuma Y, Hashimoto S, Wang CK, Horikoshi M, Roeder RG: Analysis of the role of TFIIE in basal transcription and TFIIH-mediated carboxy-terminal domain phosphorylation through structure-function studies of TFIIE-alpha. Mol Cell Biol. 1995, 15: 4856-4866.
- Meinhart A, Blobel J, Cramer P: An extended winged helix domain in general transcription factor E/IIE alpha. J Biol Chem. 2003, 278: 48267-48274. 10.1074/jbc.M307874200.
- Okuda M, Tanaka A, Arai Y, Satoh M, Okamura H, Nagadoi A, Hanaoka F, Ohkuma Y, Nishimura Y: A novel zinc finger structure in the large subunit of human general transcription factor TFIIE. J Biol Chem. 2004, 279: 51395-51403. 10.1074/jbc.M404722200.
- Kelley LA, MacCallum RM, Sternberg MJ: Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol. 2000, 299: 499-520. 10.1006/jmbi.2000.3741.
- Kuldell NH, Buratowski S: Genetic analysis of the large subunit of yeast transcription factor IIE reveals two regions with distinct functions. Mol Cell Biol. 1997, 17: 5288-5298.
- Flores O, Ha I, Reinberg D: Factors involved in specific transcription by mammalian RNA polymerase II. Purification and subunit composition of transcription factor IIF. J Biol Chem. 1990, 265: 5629-5634.
- Woychik NA, Hampsey M: The RNA polymerase II machinery: structure illuminates function. Cell. 2002, 108: 453-463. 10.1016/S0092-8674(02)00646-3.
- Fang SM, Burton ZF: RNA polymerase II-associated protein (RAP) 74 binds transcription factor (TF) IIB and blocks TFIIB-RAP30 binding. J Biol Chem. 1996, 271: 11703-11709. 10.1074/jbc.271.20.11703.
- Groft CM, Uljon SN, Wang R, Werner MH: Structural homology between the Rap30 DNA-binding domain and linker histone H5: implications for preinitiation complex assembly. Proc Natl Acad Sci U S A. 1998, 95: 9117-9122. 10.1073/pnas.95.16.9117.
- Gaiser F, Tan S, Richmond TJ: Novel dimerization fold of RAP30/RAP74 in human TFIIF at 1.7 Å resolution. J Mol Biol. 2000, 302: 1119-1127. 10.1006/jmbi.2000.4110.
- Tan S, Garrett KP, Conaway RC, Conaway JW: Cryptic DNA-binding domain in the C terminus of RNA polymerase II general transcription factor RAP30. Proc Natl Acad Sci U S A. 1994, 91: 9808-9812.
- Kamada K, De Angelis J, Roeder RG, Burley SK: Crystal structure of the C-terminal domain of the RAP74 subunit of human transcription factor IIF. Proc Natl Acad Sci U S A. 2001, 98: 3115-3120. 10.1073/pnas.051631098.
- Schultz P, Fribourg S, Poterszman A, Mallouh V, Moras D, Egly JM: Molecular structure of TFIIH. Cell. 2000, 102: 599-607. 10.1016/S0092-8674(00)00082-9.
- Chang WH, Kornberg RD: Electron crystal structure of the transcription factor and DNA repair complex, core TFIIH. Cell. 2000, 102: 609-613. 10.1016/S0092-8674(00)00083-0.
- Zurita M, Merino C: The transcriptional complexity of the TFIIH complex. Trends Genet. 2003, 19: 578-584. 10.1016/j.tig.2003.08.005.
- Ranish JA, Hahn S, Lu Y, Yi EC, Li XJ, Eng J, Aebersold R: Identification of TFB5, a new component of general transcription and DNA repair factor IIH. Nat Genet. 2004, 36: 707-713. 10.1038/ng1385.
- Giglia-Mari G, Coin F, Ranish JA, Hoogstraten D, Theil A, Wijgers N, Jaspers NG, Raams A, Argentini M, van der Spek PJ, Botta E, Stefanini M, Egly JM, Aebersold R, Hoeijmakers JH, Vermeulen W: A new, tenth subunit of TFIIH is responsible for the DNA repair syndrome trichothiodystrophy group A. Nat Genet. 2004, 36: 714-719. 10.1038/ng1387.
- Edwards MC, Wong C, Elledge SJ: Human cyclin K, a novel RNA polymerase II-associated cyclin possessing both carboxy-terminal domain kinase and Cdk-activating kinase activity. Mol Cell Biol. 1998, 18: 4291-4300.
- Doerks T, Huber S, Buchner E, Bork P: BSD: a novel domain in transcription factors and synapse-associated proteins. Trends Biochem Sci. 2002, 27: 168-170. 10.1016/S0968-0004(01)02042-4.
- Gervais V, Lamour V, Jawhari A, Frindel F, Wasielewski E, Dubaele S, Egly JM, Thierry JC, Kieffer B, Poterszman A: TFIIH contains a PH domain involved in DNA nucleotide excision repair. Nat Struct Mol Biol. 2004, 11: 616-622. 10.1038/nsmb782.
- Coulson RM, Ouzounis CA: The phylogenic diversity of eukaryotic transcription. Nucleic Acids Res. 2003, 31: 653-660. 10.1093/nar/gkg156.
- Bastien O, Lespinats S, Roy S, Metayer K, Fertil B, Codani JJ, Marechal E: Analysis of the compositional biases in Plasmodium falciparum genome and proteome using Arabidopsis thaliana as a reference. Gene. 2004, 336: 163-173. 10.1016/j.gene.2004.04.029.
- Soyer A, Chomilier J, Mornon JP, Jullien R, Sadoc JF: Voronoi tessellation reveals the condensed matter character of folded proteins. Phys Rev Lett. 2000, 85: 3532-3535. 10.1103/PhysRevLett.85.3532.
- Pintar A, Carugo O, Pongor S: Atom depth in protein structure and function. Trends Biochem Sci. 2003, 28: 593-597. 10.1016/j.tibs.2003.09.004.
- Pintar A, Carugo O, Pongor S: Atom depth as a descriptor of the protein interior. Biophys J. 2003, 84: 2553-2561.
- Andel F, Ladurner AG, Inouye C, Tjian R, Nogales E: Three-dimensional structure of the human TFIID-IIA-IIB complex. Science. 1999, 286: 2153-2156. 10.1126/science.286.5447.2153.
- Brand M, Leurent C, Mallouh V, Tora L, Schultz P: Three-dimensional structures of TAFII-containing complexes TFIID and TFTC. Science. 1999, 286: 2151-2153. 10.1126/science.286.5447.2151.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped-BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
- Kissinger JC, Brunk BP, Crabtree J, Fraunholz MJ, Gajria B, Milgram AJ, Pearson DS, Schug J, Bahl A, Diskin S, Ginsburg H, Grant GR, Gupta D, Labo P, Li L, Mailman MD, McWeeney SK, Whetzel P, Stoeckert CJ, Roos DS: The Plasmodium genome database. Nature. 2002, 419: 490-492. 10.1038/419490a. [http://www.plasmodb.org]
- Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR: The Pfam protein families database. Nucleic Acids Res. 2004, 32: D138-141. 10.1093/nar/gkh121.
- Letunic I, Copley RR, Schmidt S, Ciccarelli FD, Doerks T, Schultz J, Ponting CP, Bork P: SMART 4.0: towards genomic data integration. Nucleic Acids Res. 2004, 32: D142-144. 10.1093/nar/gkh088.
- Marchler-Bauer A, Anderson JB, Cherukuri PF, DeWeese-Scott C, Geer LY, Gwadz M, He S, Hurwitz DI, Jackson JD, Ke Z, Lanczycki CJ, Liebert CA, Liu C, Lu F, Marchler GH, Mullokandov M, Shoemaker BA, Simonyan V, Song JS, Thiessen PA, Yamashita RA, Yin JJ, Zhang D, Bryant SH: CDD: a Conserved Domain Database for protein classification. Nucleic Acids Res. 2005, 33: D192-D196. 10.1093/nar/gki069.
- Callebaut I, Mornon JP: From BRCA1 to RAP1: a widespread BRCT module closely associated with DNA repair. FEBS Letters. 1997, 400: 25-30. 10.1016/S0014-5793(96)01312-9.
- Callebaut I, Mornon JP: OCRE: a novel domain made of imperfect, aromatic-rich octamer repeats. Bioinformatics. 2005, 21: 699-702. 10.1093/bioinformatics/bti065.
- Callebaut I, Mornon JP: The V(D)J recombination activating protein RAG2 consists of a six-bladed propeller and a PHD fingerlike domain, as revealed by sequence analysis. Cell Mol Life Sci. 1998, 54: 880-891. 10.1007/s000180050216.
- Girault JA, Labesse G, Mornon JP, Callebaut I: The N-termini of FAK and JAKs contains divergent band 4.1 domains. Trends Biochem Sci. 1999, 24: 54-57. 10.1016/S0968-0004(98)01331-0.
- Henrissat B, Callebaut I, Fabrega S, Lehn P, Mornon JP, Davies G: Conserved catalytic machinery and the prediction of a common fold for several families of glycoside hydrolases. Proc Natl Acad Sci USA. 1995, 92: 7090-7094.
- Callebaut I, Moshous D, Mornon JP, de Villartay JP: Metallo-beta-lactamase fold within nucleic acids processing enzymes: the beta-CASP family. Nucleic Acids Res. 2002, 30: 3592-3601. 10.1093/nar/gkf470.
- Callebaut I, Curcio-Morelli C, Mornon JP, Gereben B, Buettner C, Huang S, Castro B, Fonseca TL, Harney JW, Larsen PR, Bianco AC: The iodothyronine selenodeiodinases are thioredoxin-fold family proteins containing a glycoside hydrolase-clan GH-A-like structure. J Biol Chem. 2003, 276: 36887-36896. 10.1074/jbc.M305725200.
- Guermah M, Ge K, Chiang CM, Roeder RG: The TBN protein, which is essential for early embryonic mouse development, is an inducible TAFII implicated in adipogenesis. Mol Cell. 2003, 12: 991-1001. 10.1016/S1097-2765(03)00396-4.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.