P2CS: a two-component system resource for prokaryotic signal transduction research
- Mohamed Barakat†1,
- Philippe Ortet†1,
- Cécile Jourlin-Castelli2,
- Mireille Ansaldi2,
- Vincent Méjean2 and
- David E Whitworth3
© Barakat et al; licensee BioMed Central Ltd. 2009
Received: 27 January 2009
Accepted: 15 July 2009
Published: 15 July 2009
With the escalation of high throughput prokaryotic genome sequencing, there is an ever-increasing need for databases that characterise, catalogue and present data relating to particular gene sets and genomes/metagenomes. Two-component system (TCS) signal transduction pathways are the dominant mechanisms by which micro-organisms sense and respond to external as well as internal environmental changes. These systems respond to a wide range of stimuli by triggering diverse physiological adjustments, including alterations in gene expression, enzymatic reactions, or protein-protein interactions.
We present P2CS (Prokaryotic 2-Component Systems), an integrated and comprehensive database of TCS signal transduction proteins, which contains a compilation of the TCS genes within 755 completely sequenced prokaryotic genomes and 39 metagenomes. P2CS provides detailed annotation of each TCS gene including family classification, sequence features, functional domains, as well as genomic context visualization. To bypass the generic problem of gene underestimation during genome annotation, we also constituted and searched an ORFeome, which improves the recovery of TCS proteins compared to searches on the equivalent proteomes.
P2CS has been developed for computational analysis of the modular TCSs of prokaryotic genomes and metagenomes. It provides a complete overview of information on TCSs, including predicted candidate proteins and probable proteins, which need further curation/validation. The database can be browsed and queried with a user-friendly web interface at http://www.p2cs.org/.
His-Asp phosphorelays, or two-component system (TCS) signal transduction pathways, are found across all three domains of life, allowing adaptive responses to changes in environmental conditions. However, they are mainly found in bacteria where they control diverse aspects of bacterial metabolism, such as cell differentiation, morphogenesis, central metabolism, motility, biofilm formation and virulence. These systems were classically described as the association of two proteins that communicate through a His-Asp phosphorelay. A typical TCS comprises a histidine kinase (HK) sensor protein, which is capable of autophosphorylation on a conserved His residue, before transferring the phosphoryl group to a conserved Asp residue within the receiver domain (REC) of a response regulator (RR). This 2-step phosphorelay constitutes the basic scheme of TCS signalling. More complex systems utilise a 4-step phosphorelay (His1-Asp1-His2-Asp2) that is made possible by the addition of 2 intermediate phosphorylation domains: a second receiver domain homologous to that of RRs, and a phospho-His domain, called HPT (for Histidine Phosphotransfer), which is the phosphodonor for the terminal RR protein of the pathway. In 4-step phosphorelays, the phosphoacceptor domains can be found distributed across 2 to 4 individual proteins.
The large number of TCS protein sequences available demands user-friendly databases to facilitate inter-genomic and intra-genomic analyses. Currently, databases describing prokaryotic TCSs contain data from only a subset of available genomes (e.g. SENTRA, Genome Atlas) [3, 4], or analyze TCSs on the basis of predicted proteins (e.g. MiST). However, valuable data are often overlooked by protein prediction tools, and, to our knowledge, no database is currently available for performing analysis of metagenomic TCSs.
We have therefore developed a novel resource, the P2CS database, which contains the TCSs of all available bacterial and archaeal genomes, and 39 microbial metagenomes. Our objective was to provide an easy-to-use environment in which all of the data are freely available to, and searchable by, the whole scientific community.
Construction and content
The identification of TCS candidates was accomplished by domain analysis of each predicted protein. The pool of domains used to search for TCS proteins was manually selected from the literature [4, 5, 8–11] and extracted from within the Pfam and SMART libraries. All the data are stored in the P2CS database, accessible via our web interface, GenoBrowser. The P2CS pipeline was developed to search the numerous combinations of TCS modules and to categorize TCS proteins into families based on similarity and/or domain architecture.
Our process identifies a subset of TCSs as 'probable incomplete TCS proteins'. This is the case when a HATPase (HK ATPase) domain is identified in a protein, without the typical N-terminal His-containing phosphoacceptor site (HisKA domain). These incomplete HKs (IHKs) were then analysed for the presence of a probable site of phosphorylation (H-box). First, for all the predicted HKs belonging to each replicon, we calculated the maximal and minimal distances separating the N-terminal extremity of the identified HisKA domains from the C-terminal extremity of the identified HATPase domains. This interval was then used to search for putative phospho-accepting His residues in the IHKs. Secondly, searches for the site of phosphorylation were undertaken by constructing alignments of the 10 amino acid residues surrounding the conserved histidine from all predicted HKs in each replicon, and using the alignment to create a profile. We compared each predicted HK to the profile and calculated a score, which allowed the definition of a minimal value (min score). The same process was repeated for each IHK: when a putative H-box was present (a score at or above the min score), the IHK was reclassified as a HK. As in other studies [3, 5, 12], proteins belonging to the GyrB, MutL or HtpG family were excluded.
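The H-box rescue step described above can be illustrated with a short sketch. It is not the P2CS implementation: the scoring scheme (a log-odds profile with pseudocounts and a uniform background), the window of 5 residues on each side of the histidine, the function names and the toy sequences are all assumptions made for illustration. The `start` and `end` arguments stand in for the search interval derived from the HisKA/HATPase distances.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
FLANK = 5  # assumption: 5 residues on each side of the conserved His


def build_profile(windows, pseudocount=1.0):
    """Build a log-odds profile (one dict per column) from equal-length windows."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    profile = []
    for column in zip(*windows):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log(((counts.get(aa, 0) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_window(profile, window):
    """Sum per-column log-odds; unknown residues get the column's lowest score."""
    return sum(col.get(aa, min(col.values())) for col, aa in zip(profile, window))


def best_hbox(profile, sequence, start, end):
    """Return (score, position) of the best His-centred window in [start, end), or None."""
    best = None
    lo = max(start, FLANK)
    hi = min(end, len(sequence) - FLANK)
    for pos in range(lo, hi):
        if sequence[pos] != "H":
            continue
        window = sequence[pos - FLANK:pos + FLANK + 1]
        score = score_window(profile, window)
        if best is None or score > best[0]:
            best = (score, pos)
    return best


def reclassify(profile, min_score, ihk_sequence, start, end):
    """Label an IHK as 'HK' when a putative H-box reaches the minimal HK score."""
    hit = best_hbox(profile, ihk_sequence, start, end)
    if hit is not None and hit[0] >= min_score:
        return "HK"
    return "IHK"


# Toy H-box windows from the predicted HKs of one replicon (invented sequences).
hk_windows = ["LAAMSHELRTP", "IANISHELRTP", "LASVSHDLRTP", "VAGIAHEIRNP"]
profile = build_profile(hk_windows)
min_score = min(score_window(profile, w) for w in hk_windows)

with_hbox = "MKTLAAMSHELRTPGGG"     # contains an H-box-like window
without_hbox = "MKTGGGGGHGGGGGKLM"  # a His residue in an unrelated context
print(reclassify(profile, min_score, with_hbox, 0, len(with_hbox)))
print(reclassify(profile, min_score, without_hbox, 0, len(without_hbox)))
```

Building one profile per replicon, as the text describes, keeps the threshold specific to the HKs of that replicon rather than relying on a single global cut-off.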
Table: Enzyme list implicated in TCS signal transduction
Column: Present in TCS
- Mitogen-activated protein kinase kinase kinase
- Calcium/calmodulin-dependent protein kinase
- Glutamate dehydrogenase (NAD(P)(+))
All identified enzymes that did not match the selected enzyme list were discarded. For instance, in Burkholderia phymatum STM815, the protein Bphy_5251 contains a REC domain and is annotated as a response regulator in GenBank. However, this protein appears more likely to be involved in an ABC-type transport system.
Finally, the cellular localization of each TCS protein was determined by the presence or absence of transmembrane (TM) segments, using the HMMTOP predictor.
The current version of P2CS contains 53233 TCS proteins, comprising 22376 HKs (Figure 3a), 26892 RRs (Figure 3b) and 1900 phosphotransfer proteins. The ORFeome search allows retrieval of a further 774 supplemental TCS proteins from completely sequenced genomes.
Table: Performance test of P2CS
Columns: Genome, Manually defined TCS proteins, P2CS predicted TCS proteins
- Anaeromyxobacter dehalogenans 2CP-C
- Bacillus anthracis str. Ames
- Bacillus anthracis str. Sterne
- Bacillus cereus ATCC 14579
- Bacillus cereus ATCC 10987
- Bacillus cereus E33L
- Bacillus thuringiensis serovar konkukian str. 97-27
- Escherichia coli str. K-12 substr. MG1655
- Myxococcus xanthus DK1622
- Nitrosospira multiformis ATCC 25196 chromosome 1
- Pseudomonas syringae pv. syringae B728a
- Pseudomonas syringae pv. tomato str. DC3000
- Pseudomonas syringae pv. phaseolicola 1448A
- Sorangium cellulosum So ce56
- Streptomyces coelicolor A3(2)
- Xanthomonas campestris pv. campestris ATCC 33913
- X. campestris pv. campestris 8004
- X. axonopodis pv. citri 306
- X. campestris pv. vesicatoria 85-10
- X. oryzae pv. oryzae KACC10331
- X. oryzae pv. oryzae MAFF 311018
To facilitate the visualization of metagenomes, the different contigs of each metagenome were joined and rebuilt into an artificial, linear chromosome. The positions of genes were then recomputed to allow exploration of genomic contexts. The transition between two different contigs is flagged to avoid misinterpretation of gene organisation.
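The coordinate recomputation described above amounts to adding a per-contig offset to every gene position. The following is a minimal sketch under stated assumptions: the `Gene` record, the function names, the contig ordering and the optional `spacer` are hypothetical and are not taken from the P2CS code.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    locus_tag: str
    contig: str
    start: int  # 1-based, inclusive, relative to its own contig
    end: int


def join_contigs(contig_lengths, genes, spacer=0):
    """Concatenate contigs in the given order and shift gene coordinates.

    contig_lengths: list of (contig_name, length) in the chosen order.
    Returns (shifted_genes, boundaries); boundaries holds the last position
    of every contig except the final one on the artificial chromosome.
    """
    offsets = {}
    boundaries = []
    position = 0
    for index, (name, length) in enumerate(contig_lengths):
        offsets[name] = position
        position += length
        if index < len(contig_lengths) - 1:
            boundaries.append(position)
            position += spacer  # optional gap between contigs (assumption)
    shifted = [
        Gene(g.locus_tag, g.contig, g.start + offsets[g.contig], g.end + offsets[g.contig])
        for g in genes
    ]
    return shifted, boundaries


def crosses_boundary(gene_a, gene_b):
    """True when two neighbouring genes originate from different contigs."""
    return gene_a.contig != gene_b.contig


contigs = [("c1", 1000), ("c2", 500)]
genes = [Gene("g1", "c1", 10, 300), Gene("g2", "c2", 20, 200)]
shifted, boundaries = join_contigs(contigs, genes)
print(shifted[1].start, shifted[1].end, boundaries)
```

Keeping the original contig name on each shifted gene is what makes it possible to flag a contig transition when two adjacent genes are displayed together.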
IHKs in all genomes and metagenomes were not eliminated during the screening process and can therefore be curated by users. Additionally, mis-predicted TCS proteins can be visualized as the longest possible coding sequence, so that users can also make their own start assignments.
Utility and discussion
P2CS provides an integrated environment for exploration, visualization and annotation of TCS proteins from all available bacterial genomes and metagenomes via http://www.p2cs.org/. The P2CS homepage contains a navigation bar that allows database browsing. Among the menus, users will also find P2CS Browse, which links directly to sortable lists of analysed genomes, plasmids and metagenomes. Selecting a microbe or a microbiome displays the result of the P2CS analysis process: global counts of the different categories of TCSs and detailed class counts for each category. Each class result provides a clickable link to a detailed gene list. Selecting an identifier from the list displays a detailed gene description page with an image representing the gene in its genomic context, in the appropriate frame. BLAST searches can be performed with the gene using external links, against the NCBI protein database or the annotated databases Swiss-Prot/TrEMBL. To obtain detailed information on a given gene, the software provides database links to investigate structural and functional domains of the corresponding protein sequence using the Conserved Domain Search service, the Simple Modular Architecture Research Tool and the TMHMM transmembrane topology prediction method. The presence and location of signal peptide cleavage sites in amino acid sequences can also be checked using SIG-Pred.
A second menu, P2CS Search, provides several search modes that allow users to request genes on the basis of their locus-tag, domain possession or TCS class. The search module builds search output as a tabular view that is linked to a full description and genomic context for each selected gene. The gene description page is the core exploration tool, providing several analysis options as described above. Analysis of each gene can be performed and users have the ability to display and propose the modification of any gene.
P2CS was designed to allow download of TCS data in tab-delimited format and generates a file compatible with spreadsheet programs such as Excel. For each genome and metagenome, users can also download the flat format files used for the construction of the database.
P2CS has been developed for computational analysis of the modular TCSs of prokaryotic genomes and metagenomes. It provides a complete overview of information on TCSs, including predicted candidate proteins and probable proteins, which need further curation/validation. The analysis process recovers each protein presenting N-terminal HisKA or C-terminal HATPase domains and classifies them as probable incomplete HK. The status can be changed through the manual curation process.
Users can modify annotation parameters and append comments, which are made available for consultation by other users. To ensure the integrity of the database, we invite interested experts to download the formatted data; after manual curation, the same files can serve as an exchange format for an update of the database by the P2CS team.
One of the most important features of P2CS is the ability to search for TCSs within an ORFeome. A common problem in prokaryotic genome annotation is the accuracy of gene prediction: when the number of genes is underestimated, valuable data are lost. A striking example is the genome of M. magneticum AMB-1, with 23 overlooked TCS genes. A possible explanation for the high number of missing TCS genes is the GC richness of this genome (65%), which may complicate the gene prediction process.
Biologists with little or no computing background have an increasing need for fast and intuitively usable tools, which is why P2CS has been developed as an interactive system for editing and viewing TCS information.
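The text does not detail how the ORFeome was constituted, so the sketch below shows only one generic way of enumerating candidate coding regions: a six-frame, stop-to-stop scan. The function names, the minimum length threshold and the decision to keep open frames that run to the end of a sequence are assumptions for illustration; the stop codons are those of the standard code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def orfs_in_strand(dna, min_codons):
    """Yield (frame, start, end) for stop-to-stop ORFs on one strand.

    Coordinates are 0-based, end-exclusive, and the stop codon is excluded.
    """
    for frame in range(3):
        orf_start = frame
        for pos in range(frame, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - orf_start) // 3 >= min_codons:
                    yield frame, orf_start, pos
                orf_start = pos + 3
        # Keep a frame that stays open up to the last complete codon
        # (useful for contigs that end inside a gene).
        last = frame + 3 * ((len(dna) - frame) // 3)
        if (last - orf_start) // 3 >= min_codons:
            yield frame, orf_start, last


def six_frame_orfs(dna, min_codons=30):
    """Collect ORFs from both strands.

    Minus-strand coordinates refer to the reverse-complemented sequence.
    """
    dna = dna.upper()
    results = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame, start, end in orfs_in_strand(seq, min_codons):
            results.append((strand, frame, start, end, seq[start:end]))
    return results


example = "ATGAAACCCGGGTTTTAAGG"
orfs = six_frame_orfs(example, min_codons=3)
for strand, frame, start, end, seq in orfs:
    print(strand, frame, start, end, seq)
```

Because each ORF extends from one stop codon to the next, it corresponds to the longest possible coding sequence in that frame, and the start codon can be assigned afterwards.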
The current P2CS database contains information on over 53000 predicted TCS proteins. The pipeline used to predict TCSs begins with a domain annotation of all proteins from completely sequenced genomes and metagenomes, searches the numerous combinations of TCS modules and classifies TCS proteins. The P2CS database analyses TCSs in both predicted proteomes and reconstituted ORFeomes. This last process is a unique feature of our system, which allows the recovery of nearly 2% more TCS proteins.
Databases devoted to TCSs are major resources for the signal transduction research community but these currently do not include metagenomic data. P2CS is the first database of its kind that provides metagenomic TCS information, with nearly 17% of identified TCS proteins originating from metagenomes.
P2CS is an open resource for biologists, and results are presented for user exploration through an interactive web interface. The database contains a high-quality automatic analysis of the TCSs of all currently available genomes and metagenomes, including 8 bacterial genomes (Escherichia coli, Bacillus subtilis, Shewanella oneidensis and 5 myxobacterial strains) curated manually by our team.
Availability and requirements
P2CS is publicly available at http://www.p2cs.org/. It runs on most web browsers, including Mozilla Firefox, Safari and Internet Explorer.
We are grateful to DSV/IBITEC-S/GIPSI team and particularly Arnaud Martel for the hosting server installation.
- Kofoid EC, Parkinson JS: Transmitter and receiver modules in bacterial signaling proteins. Proc Natl Acad Sci USA. 1988, 85 (14): 4981-4985. 10.1073/pnas.85.14.4981.
- Hoch JA: Two-component and phosphorelay signal transduction. Curr Opin Microbiol. 2000, 3 (2): 165-170. 10.1016/S1369-5274(00)00070-9.
- D'Souza M, Glass EM, Syed MH, Zhang Y, Rodriguez A, Maltsev N, Galperin MY: Sentra: a database of signal transduction proteins for comparative genome analysis. Nucleic Acids Res. 2007, 35 (Database issue): D271-273. 10.1093/nar/gkl949.
- Kiil K, Ferchaud JB, David C, Binnewies TT, Wu H, Sicheritz-Ponten T, Willenbrock H, Ussery DW: Genome update: distribution of two-component transduction systems in 250 bacterial genomes. Microbiology. 2005, 151 (Pt 11): 3447-3452. 10.1099/mic.0.28423-0.
- Ulrich LE, Zhulin IB: MiST: a microbial signal transduction database. Nucleic Acids Res. 2007, 35 (Database issue): D386-390. 10.1093/nar/gkl932.
- National Center for Biotechnology Information. [http://www.ncbi.nlm.nih.gov/]
- Integrated Microbial Genomes with Microbiome Samples. [http://img.jgi.doe.gov/cgi-bin/pub/main.cgi]
- Maltsev N, Marland E, Yu GX, Bhatnagar S, Lusk R: Sentra, a database of signal transduction proteins. Nucleic Acids Res. 2002, 30 (1): 349-350. 10.1093/nar/30.1.349.
- Ulrich LE, Koonin EV, Zhulin IB: One-component systems dominate signal transduction in prokaryotes. Trends Microbiol. 2005, 13 (2): 52-56. 10.1016/j.tim.2004.12.006.
- Galperin MY: Structural classification of bacterial response regulators: diversity of output domains and domain combinations. J Bacteriol. 2006, 188 (12): 4169-4182. 10.1128/JB.01887-05.
- Lavin JL, Kiil K, Resano O, Ussery DW, Oguiza JA: Comparative genomic analysis of two-component regulatory proteins in Pseudomonas syringae. BMC Genomics. 2007, 8 (1): 397. 10.1186/1471-2164-8-397.
- Galperin MY, Nikolskaya AN: Identification of sensory and signal-transducing domains in two-component signaling systems. Methods Enzymol. 2007, 422: 47-74. 10.1016/S0076-6879(06)22003-2.
- Claudel-Renard C, Chevalet C, Faraut T, Kahn D: Enzyme-specific profiles for genome annotation: PRIAM. Nucleic Acids Res. 2003, 31 (22): 6633-6639. 10.1093/nar/gkg847.
- Tusnady GE, Simon I: The HMMTOP transmembrane topology prediction server. Bioinformatics. 2001, 17 (9): 849-850. 10.1093/bioinformatics/17.9.849.
- Whitworth DE, Cock PJ: Two-component systems of the myxobacteria: structure, diversity and evolutionary relationships. Microbiology. 2008, 154 (Pt 2): 360-372. 10.1099/mic.0.2007/013672-0.
- Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M: From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 2006, 34 (Database issue): D354-357. 10.1093/nar/gkj102.
- Hutchings MI, Hoskisson PA, Chandra G, Buttner MJ: Sensing and responding to diverse extracellular signals? Analysis of the sensor kinases and response regulators of Streptomyces coelicolor A3(2). Microbiology. 2004, 150 (Pt 9): 2795-2806. 10.1099/mic.0.27181-0.
- de Been M, Francke C, Moezelaar R, Abee T, Siezen RJ: Comparative analysis of two-component signal transduction systems of Bacillus cereus, Bacillus thuringiensis and Bacillus anthracis. Microbiology. 2006, 152 (Pt 10): 3035-3048. 10.1099/mic.0.29137-0.
- Norton JM, Klotz MG, Stein LY, Arp DJ, Bottomley PJ, Chain PS, Hauser LJ, Land ML, Larimer FW, Shin MW, et al: Complete genome sequence of Nitrosospira multiformis, an ammonia-oxidizing bacterium from the soil environment. Appl Environ Microbiol. 2008, 74 (11): 3559-3572. 10.1128/AEM.02722-07.
- Qian W, Han Z-J, He C: Two-Component Signal Transduction Systems of Xanthomonas spp.: A Lesson from Genomics. Molecular Plant-Microbe Interactions. 2008, 21 (2): 151-161. 10.1094/MPMI-21-2-0151.
- P2CS Database. [http://www.p2cs.org/]
- Marchler-Bauer A, Bryant SH: CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 2004, 32 (Web Server issue): W327-331. 10.1093/nar/gkh454.
- Letunic I, Copley RR, Pils B, Pinkert S, Schultz J, Bork P: SMART 5: domains in the context of genomes and networks. Nucleic Acids Res. 2006, 34 (Database issue): D257-260. 10.1093/nar/gkj079.
- Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001, 305 (3): 567-580. 10.1006/jmbi.2000.4315.
- Signal Peptide Prediction. [http://bmbpcu36.leeds.ac.uk/prot_analysis/Signal.html]
- Matsunaga T, Okamura Y, Fukuda Y, Wahyudi AT, Murase Y, Takeyama H: Complete genome sequence of the facultative anaerobic magnetotactic bacterium Magnetospirillum sp. strain AMB-1. DNA Res. 2005, 12 (3): 157-166. 10.1093/dnares/dsi002.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.