cTFbase: a database for comparative genomics of transcription factors in cyanobacteria
© Wu et al; licensee BioMed Central Ltd. 2007
Received: 10 January 2007
Accepted: 18 April 2007
Published: 18 April 2007
Comprehensive identification and classification of the transcription factors (TFs) in a given genome is an important step in understanding the transcriptional regulatory networks of a specific organism. Cyanobacteria are an ancient group of gram-negative bacteria whose genome sizes vary widely, from about 1.6 to 9.1 Mb, yet little is known about their TF repertoires. Therefore, we constructed the cTFbase database to classify and analyze all the putative TFs in cyanobacterial genomes, followed by genome-wide comparative analysis.
In the current release, cTFbase contains 1288 putative TFs identified from 21 fully sequenced cyanobacterial genomes. Through its user-friendly interactive interface, users can employ various criteria to retrieve all TF sequences and their detailed annotation information, including sequence features, domain architecture and sequence similarity against the linked databases. Furthermore, cTFbase provides phylogenetic trees for each TF family, multiple sequence alignments of the DNA-binding domains and ortholog identification across any selected genomes. Comparative analysis revealed great variability of the TF sequences in cyanobacterial genomes. The high variance in gene number and domain organization may be related to their diverse biological functions and their adaptation to various environmental conditions.
cTFbase provides a centralized warehouse for comparative analysis of putative TFs in cyanobacterial genomes. Such an extensive database should be of great interest to researchers working on TFs or transcriptional regulatory networks in cyanobacteria. cTFbase is freely accessible at http://cegwz.com/ and will be updated as newly sequenced cyanobacterial genomes become available.
Deciphering and reconstructing transcriptional regulatory networks is important for better understanding fundamental cellular processes, such as cell division, growth control and gene expression, by which cells adapt to their environment more effectively. The most basic components of transcriptional regulatory networks are the transcription factors (TFs), the TF binding sites upstream and downstream of the target genes, and the target genes themselves. Among them, TFs play the crucial role, enhancing or inhibiting target gene expression by binding to promoter sequences. Studies of TFs are therefore highly significant and can yield more information about the mechanisms of transcriptional regulatory networks. Genome-wide analyses of completely sequenced genomes have revealed that TFs account for a large proportion of all encoded proteins [2–4]. Escherichia coli is one of the best-studied organisms and has been shown to have more than 271 TFs in its genome. Despite their divergent domain organizations and sequence identities, the TFs characterized so far share a significant degree of structural similarity in the DNA-binding domain (DBD), which binds to a specific DNA region. TFs can be classified into several families based on structural differences in the DBD, which include the helix-turn-helix motif, zinc fingers, leucine zippers and the basic helix-loop-helix. The helix-turn-helix motif is the most common DBD structure in prokaryotes [3, 4, 6].
Cyanobacteria are an ancient group of gram-negative bacteria, which exhibit extraordinary diversity in physiological properties, ecological niches and morphology. They survive in different environments, such as fresh and marine waters and extreme conditions. Different members of the cyanobacteria show remarkable variation in genome size, ranging from about 1.6 to 9.1 Mb. For example, the genome of Prochlorococcus sp. 1986 is only 1.75 Mb, which is thought to make it one of the smallest genomes, and the organism one of the most compact oxyphototrophs, discovered to date. Nostoc punctiforme has a large genome and occupies complicated ecological niches, which may suggest a relatively sophisticated organization in this species. Currently, 21 cyanobacterial genomes have been fully sequenced, representing a wide range of species from unicellular to filamentous ones. In addition, more than 20 further cyanobacterial genomes are being finished or sequenced [10, 11]. Such data resources undoubtedly provide an opportunity for genome-wide analysis of TFs in cyanobacteria.
To date, only a few TFs in cyanobacteria have been studied in detail. However, those that have been studied provide useful insights into the crucial biological roles of TFs and into their functions. NtcA, one of the most extensively studied TFs, mediates global nitrogen control, regulates many genes involved in nitrogen assimilation, and has been identified in all cyanobacteria. FUR regulates iron assimilation and storage and modulates the expression of genes involved in the response to different environmental stresses. NtcB, another TF, was found to activate nitrate assimilation. To make all the putative TFs in cyanobacteria available to the scientific community and to fill the gap left by the absence of an online database, we present cTFbase, which will be a valuable resource for further research on TFs and transcriptional regulatory networks in cyanobacteria. Whole-genome comparative analysis revealed great variability of the TFs in cyanobacterial genomes. The high variance in gene number and domain organization may be related to their diverse biological functions and their adaptation to various environmental conditions.
Construction and content
Collection of TFs
Implementation and web interface
MySQL was used as the database engine to store the results. The web interfaces for browsing the database and displaying results were developed using PHP scripts. Through the cTFbase web system, the following main functions are implemented in the current release:
(1) To browse domain architectures of TFs. The repertoire of TFs in a selected genome can be browsed and their domain architectures viewed. Specific TF families and their related domain architectures can also be displayed for one or all of the cyanobacterial genomes;
(2) To identify the orthologs from the selected genomes. Orthology relationships among multiple species were precomputed using the program OrthoMCL;
(3) To browse the phylogeny of individual TF families. Using the neighbor-joining method, the phylogenetic tree for each TF family was constructed from the whole TF sequences. Whole sequences were used instead of the DBD motifs because the DBD motif within a given family is well conserved and quite short, which may leave very few deep nodes supported by high bootstrap values;
(4) To search the database via protein ID, species or family;
(5) To perform a BLAST-based sequence similarity search. Users can search their target sequences or identify homologs in the database;
(6) To perform sequence alignment. The sequence alignment tool MUSCLE was integrated to enable users to align the amino acid sequences of DBDs within specific families;
(7) To link to useful references, including literature and databases;
(8) To download one (or all) specific TF sequence(s), including proteins and/or their corresponding DNA sequences, and phylogenetic trees in phylip format.
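The search function in item (4) can be illustrated with a minimal sketch. The paper does not describe the actual cTFbase schema or PHP code, so everything below is an assumption made for illustration: the table name `tf`, its three columns, the placeholder rows, and the use of Python's built-in `sqlite3` module as a stand-in for the MySQL backend.

```python
import sqlite3

# Hypothetical schema: the real cTFbase tables are not described in the paper.
SCHEMA = """
CREATE TABLE tf (
    protein_id TEXT PRIMARY KEY,
    species    TEXT NOT NULL,
    family     TEXT NOT NULL
);
"""

# Made-up placeholder rows, for illustration only (not real cTFbase entries).
ROWS = [
    ("tf_0001", "Species A", "OmpR"),
    ("tf_0002", "Species A", "LysR"),
    ("tf_0003", "Species B", "OmpR"),
]


def build_demo_db():
    """Create an in-memory database holding the placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO tf VALUES (?, ?, ?)", ROWS)
    return conn


def search_tfs(conn, protein_id=None, species=None, family=None):
    """Return rows matching every criterion that is not None.

    Column names come from a fixed tuple and values are bound as
    parameters, so user input is never interpolated into the SQL text.
    """
    clauses, params = [], []
    criteria = (("protein_id", protein_id), ("species", species), ("family", family))
    for column, value in criteria:
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT protein_id, species, family FROM tf"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY protein_id"
    return conn.execute(sql, params).fetchall()
```

Calling `search_tfs(build_demo_db(), family="OmpR")` would return the two placeholder OmpR rows; combining criteria narrows the result, and calling it with no criteria lists every entry.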
Furthermore, each entry provides the sequence itself and detailed annotations, including basic information, domain architecture assigned by the Pfam database (version 21.0) and the SUPERFAMILY database (release 1.69), and sequence similarity against major databases (PDB as of 11 March 2007, Swiss-Prot release 52.0, RefSeq release 22 and DBD version 2.0). Through links between its own sections and to other databases, cTFbase is a platform in which information on putative TFs in cyanobacteria is highly integrated, and it will serve as a centralized warehouse for comparative genomic analysis. We expect that this database will help to further the understanding of transcriptional regulatory networks in microorganisms.
Utility and discussion
Comparison of TFs among different species of cyanobacteria
Furthermore, we found that 12 putative TF families were present in all cyanobacterial genomes. Among them, four families (BolA, DUF387, SfsA and DnaA) have nearly the same number of gene copies across the genomes, which highlights the fundamental importance of these families. They are presumably very ancient families shared by the most recent common ancestor of cyanobacteria and may not have undergone lineage-specific expansion or loss, or horizontal gene transfer. The remaining eight families (OmpR, GerE, Crp, LysR, ArsR, FUR, GntR, Bac_DNA_binding) exhibit different distribution patterns among the various species. However, we found that a variety of orthologous TFs in these families formed monophyletic clades, which were strongly supported by high bootstrap values (nearly 100% in several clades) in the phylogenetic trees mentioned above (the trees can be queried on the website). Within each of the FUR, Crp, LysR and GntR families, only one such branch is found, whereas two branches are observed in each of the GerE and OmpR phylogenies. The ArsR and Bac_DNA_binding phylogenies, however, do not have any such branches. Previously, Brune et al. and Moreno-Campuzano et al. identified the conserved TFs among Corynebacterium and Firmicutes, respectively. Here, we define for the first time a minimal core of conserved TFs in cyanobacteria: the putative TFs in these nine branches plus the four TF families mentioned above (BolA, DUF387, SfsA and DnaA). These "universal" putative TFs mediate the functions of response regulators of two-component systems (OmpR, GerE), global nitrogen control (Crp), cell-cycle regulation (BolA), sugar fermentation (SfsA), initiation and regulation of chromosomal replication (DnaA), general metabolism (GntR), chromosome condensation and segregation (DUF387), iron homeostasis control (FUR), CO2 fixation (LysR) and so on.
As the physiological function of most TFs in cyanobacteria is still unknown, this identified core set of conserved TFs might thus provide some guidance for further investigations.
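The first step of this comparison, finding families present in every genome and separating those with nearly constant copy number from those that vary, can be sketched as follows. This is only an illustration under stated assumptions: the function name `core_families`, the `max_spread` threshold used to approximate "nearly the same gene copies", and all copy numbers in the example are hypothetical and are not taken from cTFbase.

```python
def core_families(counts, max_spread=1):
    """Split universally present TF families by copy-number stability.

    counts: {genome_name: {family_name: copy_number}}
    max_spread: assumed cutoff; a universal family counts as "stable" when
                max(copies) - min(copies) across genomes is <= max_spread.
    Returns (stable, variable), two sets of family names.
    """
    if not counts:
        return set(), set()

    # Families with at least one copy in every genome.
    present_per_genome = [
        {family for family, n in families.items() if n > 0}
        for families in counts.values()
    ]
    universal = set.intersection(*present_per_genome)

    stable, variable = set(), set()
    for family in universal:
        copies = [families[family] for families in counts.values()]
        if max(copies) - min(copies) <= max_spread:
            stable.add(family)
        else:
            variable.add(family)
    return stable, variable


# Hypothetical copy numbers, invented for illustration only.
example_counts = {
    "genome_A": {"BolA": 1, "DnaA": 1, "OmpR": 5, "FamilyX": 2},
    "genome_B": {"BolA": 1, "DnaA": 1, "OmpR": 12},
    "genome_C": {"BolA": 2, "DnaA": 1, "OmpR": 9, "FamilyX": 1},
}

stable, variable = core_families(example_counts)
```

In this toy input, `FamilyX` is dropped because it is missing from one genome, while the remaining families are split by how much their copy numbers differ. The monophyletic-clade step described in the text additionally needs the phylogenetic trees and is not covered by this sketch.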
Currently, cTFbase is limited to 21 cyanobacterial genomes retrieved from the IMG database and works as a centralized warehouse for the comparative genomic analysis of putative TFs in cyanobacteria. Without regular updates, however, the database would quickly lose its advantages. We therefore plan to update its data on a regular basis, and our update policy covers the following three cases. First, the repertoire of TFs will be identified and integrated into the database when newly completed or draft cyanobacterial genomes become available. Second, novel TFs verified by experiments will also be added to cTFbase. We encourage users to submit new TFs to our database through the interactive web interface. Third, cTFbase will be updated periodically to follow the main source databases, such as Pfam and SUPERFAMILY. Any questions, comments and suggestions are welcome and will be useful feedback for future updates.
Availability and requirements
Project name: cTFbase: a database for comparative genomics of transcription factors in cyanobacteria
Project home page: http://cegwz.com/
Operating system(s): For user: Standard WWW browser (Safari, Mozilla and Internet Explorer); For server: Linux
Programming language: PHP, SQL, Perl and Bioperl
Any restrictions to use by non-academics
We are grateful to Dr. Juyuan Zhang from Huazhong Agricultural University and Mr. Rusty Childers from Wenzhou Medical College for checking the writing of our manuscript. We are indebted to the Institute of Biomedical Informatics and the Zhejiang Provincial Key Laboratory of Medical Genetics (Wenzhou Medical College, China). This work was supported by the National Natural Science Foundation of China (30571009).
1. Babu MM, Luscombe NM, Aravind L, Gerstein M, Teichmann SA: Structure and evolution of transcriptional regulatory networks. Curr Opin Struct Biol. 2004, 14 (3): 283-291. 10.1016/j.sbi.2004.05.004.
2. Brune I, Brinkrolf K, Kalinowski J, Puhler A, Tauch A: The individual and common repertoire of DNA-binding transcriptional regulators of Corynebacterium glutamicum, Corynebacterium efficiens, Corynebacterium diphtheriae and Corynebacterium jeikeium deduced from the complete genome sequences. BMC Genomics. 2005, 6 (1): 86. 10.1186/1471-2164-6-86.
3. Minezaki Y, Homma K, Nishikawa K: Genome-wide survey of transcription factors in prokaryotes reveals many bacteria-specific families not found in archaea. DNA Res. 2005, 12 (5): 269-280. 10.1093/dnares/dsi016.
4. Perez-Rueda E, Collado-Vides J, Segovia L: Phylogenetic distribution of DNA-binding transcription factors in bacteria and archaea. Comput Biol Chem. 2004, 28 (5-6): 341-350. 10.1016/j.compbiolchem.2004.09.004.
5. Aravind L, Anantharaman V, Balaji S, Babu MM, Iyer LM: The many faces of the helix-turn-helix domain: transcription regulation and beyond. FEMS Microbiol Rev. 2005, 29 (2): 231-262. 10.1016/j.femsre.2004.12.008.
6. Huffman JL, Brennan RG: Prokaryotic transcription regulators: more than just the helix-turn-helix motif. Curr Opin Struct Biol. 2002, 12 (1): 98-106. 10.1016/S0959-440X(02)00295-6.
7. Stanier RY, Cohen-Bazire G: Phototrophic prokaryotes: the cyanobacteria. Annu Rev Microbiol. 1977, 31: 225-274. 10.1146/annurev.mi.31.100177.001301.
8. Dufresne A, Salanoubat M, Partensky F, Artiguenave F, Axmann IM, Barbe V, Duprat S, Galperin MY, Koonin EV, Le Gall F, Makarova KS, Ostrowski M, Oztas S, Robert C, Rogozin IB, Scanlan DJ, Tandeau de Marsac N, Weissenbach J, Wincker P, Wolf YI, Hess WR: Genome sequence of the cyanobacterium Prochlorococcus marinus SS120, a nearly minimal oxyphototrophic genome. Proc Natl Acad Sci U S A. 2003, 100 (17): 10020-10025. 10.1073/pnas.1733211100.
9. Meeks JC, Elhai J, Thiel T, Potts M, Larimer F, Lamerdin J, Predki P, Atlas R: An overview of the genome of Nostoc punctiforme, a multicellular, symbiotic cyanobacterium. Photosynth Res. 2001, 70 (1): 85-106. 10.1023/A:1013840025518.
10. NCBI. [http://www.ncbi.nlm.nih.gov/]
11. IMG database. [http://img.jgi.doe.gov/cgi-bin/pub/main.cgi]
12. Herrero A, Muro-Pastor AM, Flores E: Nitrogen control in cyanobacteria. J Bacteriol. 2001, 183 (2): 411-425. 10.1128/JB.183.2.411-425.2001.
13. Ghassemian M, Straus NA: Fur regulates the expression of iron-stress genes in the cyanobacterium Synechococcus sp. strain PCC 7942. Microbiology. 1996, 142 (Pt 6): 1469-1476.
14. Aichi M, Takatani N, Omata T: Role of NtcB in activation of nitrate assimilation genes in the cyanobacterium Synechocystis sp. strain PCC 6803. J Bacteriol. 2001, 183 (20): 5840-5847. 10.1128/JB.183.20.5840-5847.2001.
15. Gough J, Karplus K, Hughey R, Chothia C: Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J Mol Biol. 2001, 313 (4): 903-919. 10.1006/jmbi.2001.5080.
16. Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, Eddy SR, Sonnhammer EL, Bateman A: Pfam: clans, web tools and services. Nucleic Acids Res. 2006, 34 (Database issue): D247-51. 10.1093/nar/gkj149.
17. Eddy SR: Profile hidden Markov models. Bioinformatics. 1998, 14 (9): 755-763. 10.1093/bioinformatics/14.9.755.
18. Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Mazumder R, O'Donovan C, Redaschi N, Suzek B: The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 2006, 34 (Database issue): D187-91. 10.1093/nar/gkj161.
19. Dodd IB, Egan JB: Improved detection of helix-turn-helix DNA-binding motifs in protein sequences. Nucleic Acids Res. 1990, 18 (17): 5019-5026. 10.1093/nar/18.17.5019.
20. Kummerfeld SK, Teichmann SA: DBD: a transcription factor prediction database. Nucleic Acids Res. 2006, 34 (Database issue): D74-81. 10.1093/nar/gkj131.
21. Li L, Stoeckert CJ, Roos DS: OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003, 13 (9): 2178-2189. 10.1101/gr.1224503.
22. Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32 (5): 1792-1797. 10.1093/nar/gkh340.
23. Konstantinidis KT, Tiedje JM: Trends between gene content and genome size in prokaryotic species with larger genomes. Proc Natl Acad Sci U S A. 2004, 101 (9): 3160-3165. 10.1073/pnas.0308653100.
24. Zhang CC, Jang J, Sakr S, Wang L: Protein phosphorylation on Ser, Thr and Tyr residues in cyanobacteria. J Mol Microbiol Biotechnol. 2005, 9 (3-4): 154-166. 10.1159/000089644.
25. Zhao F, Zhang X, Liang C, Wu J, Bao Q, Qin S: Genome-wide analysis of restriction-modification system in unicellular and filamentous cyanobacteria. Physiol Genomics. 2006, 24 (3): 181-190.
26. Bryant DA: The beauty in small things revealed. Proc Natl Acad Sci U S A. 2003, 100 (17): 9647-9649. 10.1073/pnas.1834558100.
27. Dufresne A, Garczarek L, Partensky F: Accelerated evolution associated with genome reduction in a free-living prokaryote. Genome Biol. 2005, 6 (2): R14. 10.1186/gb-2005-6-2-r14.
28. Zhao F, Qin S: Comparative molecular population genetics of phycoerythrin locus in Prochlorococcus. Genetica. 2007, 129 (3): 291-299. 10.1007/s10709-006-0010-9.
29. Galperin MY: Structural classification of bacterial response regulators: diversity of output domains and domain combinations. J Bacteriol. 2006, 188 (12): 4169-4182. 10.1128/JB.01887-05.
30. Ashby MK, Houmard J: Cyanobacterial two-component proteins: structure, diversity, distribution, and evolution. Microbiol Mol Biol Rev. 2006, 70 (2): 472-509. 10.1128/MMBR.00046-05.
31. Moreno-Campuzano S, Janga SC, Perez-Rueda E: Identification and analysis of DNA-binding transcription factors in Bacillus subtilis and other Firmicutes--a genomic approach. BMC Genomics. 2006, 7: 147. 10.1186/1471-2164-7-147.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.