PePPER: a webserver for prediction of prokaryote promoter elements and regulons
© de Jong et al.; licensee BioMed Central Ltd. 2012
Received: 11 July 2011
Accepted: 13 April 2012
Published: 2 July 2012
Accurate prediction of DNA motifs that are targets of RNA polymerases, sigma factors and transcription factors (TFs) in prokaryotes is difficult, mainly because of as yet undiscovered features in the DNA sequences or structures of promoter regions. Improved prediction and comparison algorithms are currently available for identifying transcription factor binding sites (TFBSs) and their accompanying TFs and regulon members.
Here we extend the current databases of TFs, TFBSs and regulons with our knowledge of Lactococcus lactis and present a webserver for prediction, mining and visualization of prokaryote promoter elements and regulons via a novel concept. This new approach includes an all-in-one method of data mining for TFs, TFBSs, promoters and regulons for any bacterial genome via a user-friendly webserver. We demonstrate the power of this method by mining WalRK regulons in Lactococci and Streptococci and, vice versa, by using L. lactis regulon data (CodY) to mine closely related species.
The PePPER webserver offers, besides the all-in-one analysis method, a toolbox for mining regulons, promoters and TFBSs, and it accommodates a new L. lactis regulon database in addition to already existing regulon data. Biologists can now identify putative regulons and fully annotate intergenic regions in any bacterial genome on the basis of existing knowledge of a related organism, and they can do so for a wide range of regulons. On the basis of the PePPER output, biologists can design experiments to further verify the existence and extent of the proposed regulons. The PePPER webserver is freely accessible at http://pepper.molgenrug.nl.
As early as 1960, the term operon was coined for a group of genes whose expression is coordinated by an operator. Experimental methods such as Electrophoretic Mobility Shift Assays (EMSA), Surface Plasmon Resonance (SPR), nuclease protection assays (DNase footprinting) and Chromatin Immunoprecipitation (ChIP) can all be used to demonstrate that an interaction exists between a transcription factor (TF) and DNA. Experimentally proven TFBSs have been described in the literature and are available via publicly accessible databases such as DBTBS, RegulonDB, PRODORIC, MicrobesOnline, RegTransBase and RegPrecise. Besides experimental proof for the existence of protein-DNA interactions, TFBS discovery algorithms have been developed to uncover conserved regions that might act as TFBSs (MEME, ARCS-Motif, GLAM2, W-AlignACE, GIMSAN, RankMotif++, GAME and Tmod). This so-called motif mining is based on a collection of genes having a certain correlation. Gene-to-gene correlations can be derived, e.g., from transcriptome data or from functional relations such as belonging to the same metabolic pathway or to certain COG or GO classes. Motif mining consists of a search for conserved DNA patterns in the upstream intergenic regions of the genes or of the operons to which the gene(s) belong. A low p-value, indicating that the occurrence of a certain DNA pattern is very specific for a gene set, does not necessarily imply that this motif constitutes a TFBS, but it is a good lead for biological functional analysis.
Genes and operons that are under control of the same TF are members of that TF’s regulon. Although methods for the prediction of regulons have been substantially improved, they are still far from perfect. Comparative genomics tools can be used to predict regulons in bacterial genomes, but the procedure can lead to incorrect regulon calling. Despite this drawback, several regulon databases are available that are based on comparative genomics methods and lack experimental evidence. Probably the most extensive and accurate databases of regulons are DBTBS for B. subtilis and RegulonDB for E. coli. The latest update of DBTBS brought the total number of B. subtilis TFs to 120, promoters to 1475 and regulated operons to 736, of which 463 operons have been experimentally validated. Together, RegulonDB and DBTBS are the major resources for regulon network mining dedicated to prokaryotes. PRODORIC and RegTransBase are the most extensive manually curated databases on gene regulation in prokaryotes in general. Besides regulon information, they include TFBSs and bioinformatics tools for prediction, analysis and visualization of gene regulatory networks using ProdoNet; furthermore, PRODORIC offers the tool “virtual footprint”, which can be used to mine for novel regulons. The in silico prediction of regulons is usually based on operons that share the same TFBS, and the information is supplemented with the results from comparative genomics analysis of known regulons. This method is used in the recently launched webserver RegPrecise, which gives access to a database containing a collection of manually curated regulons grouped together by similar properties such as belonging to the same biological process or metabolic pathway. The database is limited to six closely related bacteria (Shewanella, Thermotogales, Bacillales and Desulfovibrionales).
On the other hand, FITBAR is dedicated to TFBS mining and discovery, whereas RegAnalyst and ProdoNet are webservers that enable integration of data on proteomics and metabolic pathways and provide subsequent graphical representation of networks.
In this work, we designed and developed a novel tool, PePPER, to mine for regulons and TFBSs in any sequenced bacterial genome. As a showcase, we extended the existing regulon databases with a database of L. lactis regulons that is derived from literature on transcriptional regulation. The latter is accessible via the user-friendly PePPER web interface.
MolgenRegDB is an integrated in-house collection of TFs, TFBSs and regulons of L. lactis and is available via the PePPER webserver (http://pepper.molgenrug.nl). In addition, TF and TFBS data were downloaded from RegulonDB (E. coli) and DBTBS (B. subtilis) and subsequently reformatted and integrated together with MolgenRegDB in the PePPER database. Data of all publicly available bacterial genomes are updated daily from NCBI (http://www.ncbi.nlm.nih.gov) and are available via the PePPER webserver.
Overrepresented DNA motifs are identified using MEME, and the position-specific probability matrices (PSPMs) obtained are converted to position weight matrices (PWMs) that are compatible with MOODS. BLAST 2.2 is used for protein comparisons. Glimmer3 is used for automated gene detection (open reading frame or ORF calling) and Ribosomal Binding Sites (RBSs) are detected using RBSfinder. In case of de novo ORF calling, the translation start is adapted to match the RBS prediction; otherwise the original annotation is used. TransTermHP is implemented for the discovery of putative transcription terminators. Possible secondary RNA structures are predicted and plotted using RNAfold and RNAplot of the Vienna package. A new prokaryote promoter prediction tool was developed; it is based on PWMs and Hidden Markov Models (HMMs) of −35 and −10 consensus sequences and various sigma factor binding sites. PWMs and HMMs of B. subtilis and E. coli promoters are used as reference for Gram-positive and Gram-negative bacteria, respectively. A collection of individual tools used by PePPER is accessible via the webserver.
A database of validated L. lactis TFBSs of regulons was compiled from literature data, after which a PSPM was calculated for each TFBS using MEME and subsequently converted to a MOODS-compatible PWM format. To that end, we used the upstream intergenic regions plus the first 20 bases of their genes as input for MEME in order to search for overrepresented DNA motifs. These motifs ranged in length from 6 to 18 bases, and a database of all intergenic regions of L. lactis MG1363 was used as a background model. Subsequently, the overrepresented DNA motifs were manually compared to the literature data. Only those DNA motifs that resemble the experimentally verified TFBSs were included in the database, together with their MOODS cutoff values. An overview of TFBSs of regulons, including WebLogos, is shown in Additional file 1: Table S1; the database containing all the PSPM profiles is available via the PePPER webserver.
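The conversion of a PSPM into a log-odds PWM can be illustrated with a minimal sketch. The function name, the pseudocount and the uniform default background are assumptions made for illustration only; the exact conversion used by PePPER and the file format expected by MOODS are not described here.

```python
import math

def pspm_to_pwm(pspm, background=None, pseudocount=0.01):
    """Convert a position-specific probability matrix into a log-odds PWM.

    pspm: list of dicts, one per motif position, mapping A/C/G/T to a probability.
    background: dict of base frequencies; uniform if omitted (an assumption,
    the text above uses L. lactis MG1363 intergenic regions as background).
    """
    if background is None:
        background = {base: 0.25 for base in "ACGT"}
    pwm = []
    for column in pspm:
        # Smooth every entry so that a probability of zero does not give log(0).
        total = sum(column[base] + pseudocount for base in "ACGT")
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background[base])
            for base in "ACGT"
        })
    return pwm

# Toy two-column PSPM (made-up numbers, not taken from any real TFBS).
example_pspm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},
]
example_pwm = pspm_to_pwm(example_pspm)
```

Positive entries in the resulting matrix mark bases that are enriched relative to the background, negative entries mark depleted bases.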
A toolbox for regulon mining has been created in PePPER and is accessible via the PePPER webserver. After the user selects a regulon on the basis of its TF and one or more genomes, the program performs a Blast analysis between the proteins of the known regulon and all the proteins encoded by the genes in the selected genome(s). PePPER provides a clear color-coded overview of the predicted regulon(s) in each genome, based on the degree of protein homology; detailed information is given in attached tables. More details about scoring and the color scheme are given on the PePPER webserver.
Regulators of which the regulons have been studied in Lactococcus lactis ssp. cremoris MG1363 and Lactococcus lactis ssp. lactis IL1403 and their literature references. -, strain/subspecies not specified
Analysis of regulons
Comparison of the WalRK TCS of B. subtilis to the L. lactis orthologs using PePPER’s multiple genome regulon mining tool
A universal prokaryote transcription initiation DNA motif does not exist, but a common DNA pattern (the Pribnow box) 10 base pairs upstream of the transcription start site (TSS) and a conserved sequence 35 base pairs upstream of the TSS are overrepresented in promoter regions. These patterns are searched for separately, after which putative promoters are only taken into account if the spacing between their −35 and −10 motifs is 16 to 18 bases. Although many different sigma factor binding sites are known (especially from B. subtilis), these are not used in the promoter prediction routine used here; they are implemented as conserved DNA motifs in the TFBS mining tool. The resulting promoter prediction algorithm is universal for prokaryotes, but we do offer the possibility to discriminate between Gram-positive and Gram-negative bacteria to improve the accuracy of the prediction algorithm. Furthermore, “incomplete” promoters, in which only a −35 or a −10 sequence is predicted, are also shown in the results.
PePPER (http://pepper.molgenrug.nl) can be accessed through a user-friendly web interface for querying and browsing. The server runs on a Linux platform (Ubuntu Server 10.04 LTS) with an Apache webserver (version 2.2), a MySQL server (version 5.1) and Blast 2.2. Programming was done using PHP 5.0, Perl 5.12 and BioPerl 1.8. A combination of Joomla and jQuery 1.4 was used to build the web interface.
Each of the 154 known or predicted TFs of L. lactis subsp. cremoris MG1363 will probably regulate the transcription of one or more genes or operons. The functionality of 32 TFs of L. lactis MG1363 and L. lactis subsp. lactis IL1403 has been reported in the literature, using techniques ranging from DNA microarray analysis to DNA footprinting. Although the two lactococcal subspecies are closely related, not every regulator or regulon of one is present or similar in the other. The majority of the TFs in MG1363 and IL1403 show a high degree of mutual similarity. Of the 154 TFs in L. lactis MG1363, 22 are not present in L. lactis IL1403, while 20 of the 143 TFs identified in L. lactis IL1403 are not found in MG1363 (Tables 2 and 3). Analysis performed by PePPER showed that large regulons (those of CodY, CcpA, CmbR, CesSR, ArgR and PurR) as well as some small regulons (those of RcfB, ZirR, BusR and LmrR) are well conserved in the two strains. The conservation of regulons between the closely related subspecies is illustrated by the CmbR regulon of cysteine and methionine biosynthesis, which has been studied in detail in both L. lactis IL1403 and L. lactis MG1363. Analysis of both CmbR regulons shows that 16 out of 17 proteins in the IL1403 CmbR regulon have high similarity to MG1363 proteins (data not shown). Finally, all known TFs and TFBSs of L. lactis were collected in one database, MolgenRegDB. This is currently the most comprehensive manually curated regulon database of L. lactis; it is available via the PePPER webserver (http://pepper.molgenrug.nl).
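The spacing rule described above can be sketched as follows. This is a simplified illustration and not the PePPER algorithm: it replaces the PWM and HMM scoring by a mismatch count against the textbook E. coli sigma70 consensus hexamers TTGACA and TATAAT, and the function names and the mismatch threshold are assumptions.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def find_hits(seq, consensus, max_mismatch):
    """Start positions where seq matches consensus with few mismatches."""
    k = len(consensus)
    return [i for i in range(len(seq) - k + 1)
            if mismatches(seq[i:i + k], consensus) <= max_mismatch]

def predict_promoters(seq, box35="TTGACA", box10="TATAAT",
                      max_mismatch=1, min_spacer=16, max_spacer=18):
    """Pair -35 and -10 hits whose spacer length is 16 to 18 bases."""
    seq = seq.upper()
    hits35 = find_hits(seq, box35, max_mismatch)   # searched separately
    hits10 = find_hits(seq, box10, max_mismatch)   # searched separately
    promoters = []
    for p35 in hits35:
        for p10 in hits10:
            # Spacer = bases between the end of -35 and the start of -10.
            spacer = p10 - (p35 + len(box35))
            if min_spacer <= spacer <= max_spacer:
                promoters.append((p35, p10, spacer))
    return promoters

# Synthetic test sequences (not taken from any genome).
good = "GC" + "TTGACA" + "G" * 17 + "TATAAT" + "GCGC"
bad = "GC" + "TTGACA" + "G" * 15 + "TATAAT" + "GCGC"
good_result = predict_promoters(good)
bad_result = predict_promoters(bad)
```

Hits in `hits35` or `hits10` that never end up in a pair correspond to the "incomplete" promoters mentioned above and could be reported separately.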
The B. subtilis operon walRKyycHIJK is a 6-cistron operon encoding, among others, the two-component system (TCS) WalRK, which controls the expression of 23 genes. These genes represent the WalR regulon [61–64]. This signal transduction pathway is crucial for the regulation of cell wall metabolism and is one of the few TCSs known to be a virulence factor in S. pneumoniae. The presence of the WalR regulon has never been described in L. lactis. We validated PePPER by comparing its results to literature data and subsequently used it to unravel the putative WalR regulons and cognate TFBS in 4 sequenced strains of L. lactis. PePPER showed that the products of 4 of the genes of the walRKyycHIJK operon of B. subtilis are orthologous to those of kinC, llrC, vicX and htrA of L. lactis MG1363 (see Table 3). Furthermore, PePPER showed that 13 out of the 23 proteins of the WalR regulon of B. subtilis show high similarity (Blast e-value < 10^−20) to proteins in L. lactis MG1363; they are organized in 6 operons (Table 2). PePPER’s multiple genome mining tool shows that orthologs of the WalRK TCS and part of the WalR regulon genes of B. subtilis are present in all other fully sequenced L. lactis strains: IL1403, SK11 and KF147 (Table 3).
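The e-value filter used to call highly similar proteins can be sketched as below. The sketch assumes tab-separated Blast output with the standard 12 tabular columns (query, subject, % identity, alignment length, mismatches, gap openings, query start and end, subject start and end, e-value, bit score); the function name and the identifiers in the example rows are hypothetical and do not represent real Blast results.

```python
def best_hits(tabular_lines, max_evalue=1e-20):
    """Return the best subject per query among hits with e-value < max_evalue."""
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])  # column 11 holds the e-value
        if evalue >= max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best

# Made-up rows for illustration only.
rows = [
    "queryA\tsubject1\t62.0\t230\t80\t2\t1\t230\t3\t232\t1e-85\t310",
    "queryA\tsubject2\t40.0\t200\t110\t5\t5\t204\t9\t208\t1e-30\t120",
    "queryB\tsubject3\t28.0\t90\t60\t3\t10\t99\t12\t101\t1e-05\t45",
]
hits = best_hits(rows)
conserved = len(hits)  # number of query proteins with a high-similarity hit
```

Counting the queries that keep a hit gives the kind of summary quoted above (for example, 13 out of 23 regulon proteins); a reciprocal best-hit check would be a stricter test of orthology.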
We used the CodY[MG1363] regulon to screen for the presence of a similar regulon in a less closely related Gram-positive bacterium, the pathogen S. pneumoniae D39. The analysis revealed that seven genes/operons (ilvD, ilvE, asd, hom-thrB, amiACDEF, SPD_1878-thrC, livJHMGF) involved in amino acid transport or biosynthesis carry a sequence closely related to CodY-TFBS[MG1363] in their upstream DNA regions.
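Screening upstream regions for a known TFBS amounts to sliding a PWM over the sequence on both strands and keeping windows that score above a cutoff. The sketch below is a generic illustration of that idea, not the MOODS implementation used by PePPER; the function names, the toy 4-base matrix and the cutoff are assumptions and do not represent the CodY motif.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def scan(seq, pwm, cutoff):
    """Report (strand, position, site, score) for windows scoring >= cutoff.

    Positions on the '-' strand are given in reverse-complement coordinates.
    """
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(base not in "ACGT" for base in window):
                continue  # skip windows with ambiguous bases
            score = sum(pwm[j][window[j]] for j in range(width))
            if score >= cutoff:
                hits.append((strand, i, window, round(score, 2)))
    return hits

def toy_column(preferred):
    """Toy log-odds column: +2 for the preferred base, -2 otherwise."""
    return {base: (2.0 if base == preferred else -2.0) for base in "ACGT"}

toy_pwm = [toy_column(base) for base in "ACGG"]  # hypothetical motif
sites = scan("TTACGGTT", toy_pwm, cutoff=6.0)
```

In practice the cutoff would be chosen per motif, as with the MOODS cutoff values stored alongside each TFBS in the database.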
PePPER uses a novel approach, in which all available information on prokaryotic regulons and TFBSs is used to identify regulons in any query bacterium. In addition, it offers a user-friendly web interface that makes the data provided by PePPER easily accessible to non-bioinformaticians. PePPER offers, next to all fully sequenced bacterial genomes, the possibility to upload un-annotated data, which are then processed automatically. Furthermore, prediction of intergenic region elements such as promoters, transcription terminators, sigma factor binding sites and RBSs, as well as that of possible secondary DNA structures therein, will lead to more detailed knowledge of the DNA regions under study. By adding our knowledge of L. lactis regulons as well as DBTBS and RegulonDB regulon data to the PePPER database, we provide an extended database of bacterial regulons and TFBSs. PePPER can be used to pinpoint a wide range of putative regulons and their cognate TFBSs in any bacterial genome on the basis of existing knowledge. This regulon information can subsequently be used by biologists to help them design experiments to authenticate the proposed regulons.
We thank Tom Eckhardt and Jan Willem Veening for fruitful discussions. This project was partly supported by grants from the Top Institute Food and Nutrition, Wageningen, the Netherlands and The Netherlands Organisation for Scientific Research (NWO), the Netherlands.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.