Escherichia coli genome-wide promoter analysis: Identification of additional AtoC binding target elements

Background: Studies on bacterial signal transduction systems have revealed complex networks of functional interactions, where the response regulators play a pivotal role. The AtoSC system of E. coli activates the expression of the atoDAEB operon genes, and the subsequent catabolism of short-chain fatty acids, upon acetoacetate induction. Transcriptome and phenotypic analyses suggested that atoSC is also involved in several other cellular activities, although we have recently reported a palindromic repeat within the atoDAEB promoter as the single cis-regulatory binding site of the AtoC response regulator. In this work, we used a computational approach to explore the presence of as yet unidentified AtoC binding sites within other parts of the E. coli genome.
Results: Through the implementation of a computational de novo motif detection workflow, a set of candidate motifs was generated, representing putative AtoC binding targets within the E. coli genome. In order to assess the biological relevance of the motifs and to select for experimental validation those sequences robustly related to distinct cellular functions, we implemented a novel approach that applies Gene Ontology Term Analysis to the motif hits and selected those that qualified through this procedure. The computational results were validated using Chromatin Immunoprecipitation assays to assess the in vivo binding of AtoC to the predicted sites. This process verified twenty-two additional AtoC binding sites, located not only within intergenic regions, but also within gene-encoding sequences.
Conclusions: This study, by tracing a number of putative AtoC binding sites, has indicated an AtoC-related cross-regulatory function. This highlights the significance of computational genome-wide approaches in elucidating complex patterns of bacterial cell regulation.


Background
AtoSC is a two-component regulatory system that activates the transcription of the atoDAEB operon genes in E. coli, the products of which are involved in the catabolism of short-chain fatty acids [1][2][3]. Acetoacetate has been identified as the inducer of this system. Additionally, modulation of the poly-hydroxybutyrate (cPHB) biosynthesis in E. coli by AtoC [1], and the effects of spermidine [2] or histamine [3] on this process, suggest a more critical metabolic role for this system.
The AtoC response regulator shares significant homology and characteristic properties with the NtrC-NifA transcriptional factors that activate sigma-54 RNA polymerase [4]. The AtoC binding site was experimentally verified as an inverted 40-bp palindrome (two identical inverted 20-bp sites), located upstream of the transcription initiation site of the atoDAEB operon. These cis-elements were found to control both the promoter inducibility by acetoacetate and AtoC binding. Moreover, chromatin immunoprecipitation (ChIP) experiments confirmed in vivo acetoacetate-inducible AtoC binding [5].
The role of AtoSC, however, appears not to be limited to the regulation of atoDAEB operon expression. Transcriptomic and phenotypic analyses of all two-component systems (TCSs) in E. coli implied communication among several systems [6]. In particular, deletion of the atoSC genes resulted in a drastic alteration of the mRNA profiles, reduced motility, salt sensitivity and susceptibility to certain environmental agents. Despite the documented in vitro cross-regulation between the E. coli TCSs [7], the observation of such phenotypes implies AtoSC-specific effects. These data suggest that other gene targets of the AtoSC TCS might exist and that their identification could shed further light on the role(s) of this system.
Several approaches allow the prediction of transcription factor binding sites (TFBS). The most common of these are based on already existing, experimentally verified gene targets of transcription factors, which contain cis-regulatory elements in their promoter regions. The DNA sequences upstream of the transcription start sites (TSS) of these targets are used as input to various mathematical algorithms for the reconstruction of motif models, for instance Gibbs Sampling [8] and overrepresentation of oligonucleotides [9]. Typically, these motifs are mathematical representations of conserved sub-sequences, emerging from a noisy background. The idea is that conserved DNA patterns would be overrepresented in a group of transcriptionally related sequences compared to a group of unrelated ones. Thus, conserved DNA patterns of co-expressed genes are more likely to be biologically relevant and hence to represent actual cis-regulatory elements. While a common method to define sets of co-expressed genes is transcriptomic analysis [10], the incorporation of phylogenetic information in the form of intergenic sequences upstream of orthologous genes can further enhance the accuracy of the motif detection. This is due to observations that regions playing an important role in gene regulation are more likely to be conserved even among different species [11][12][13]. Other methodologies use pathway or gene function information to assess co-regulation [14][15][16].
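To make the notion of a motif as a "mathematical representation of conserved sub-sequences" concrete, the following minimal sketch builds a position frequency matrix from a handful of aligned sites and computes its information content against a uniform background. The sites are toy sequences invented for illustration, not AtoC binding sites, and the function names are hypothetical; this is not the model used by the tools cited above.

```python
import math
from collections import Counter


def position_frequencies(sites, pseudocount=0.0):
    """Return per-column base frequencies for equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    columns = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        columns.append(
            {b: (counts.get(b, 0) + pseudocount) / total for b in "ACGT"}
        )
    return columns


def information_content(columns, background=0.25):
    """Sum over columns of the relative entropy (in bits) versus the background."""
    bits = 0.0
    for col in columns:
        for p in col.values():
            if p > 0:
                bits += p * math.log2(p / background)
    return bits


# Toy aligned sites (assumption: illustrative only).
sites = ["ACGT", "ACGA", "ACGC", "ACGG"]
cols = position_frequencies(sites)
ic = information_content(cols)
print(f"information content: {ic:.2f} bits")
```

In this toy alignment the first three columns are fully conserved (2 bits each) and the last column is uniform (0 bits), so a strongly conserved motif stands out from the background while a random column contributes nothing.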
The common element in the aforementioned approaches is the assumption of prior knowledge of an adequate population of TFBS or the plethora of available co-regulation data. Neither of these two approaches could be used, however, for the prediction of the AtoC binding sites, since only a single AtoC binding site has been detected and the transcriptomic data are limited. For these reasons we set out to predict and then to detect AtoC binding sites through the setup of an iterative, genome-wide, ab initio detection method for the specific transcription factor, without the requirement of a population of target promoters or co-regulation data. The aims were the in silico prediction of additional binding sites, targeted by the AtoC response regulator and the experimental validation of these sites in vivo, using the ChIP technique.

Ab initio motif detection and evaluation procedure
In order to predict putative AtoC binding sites, we implemented an ab initio motif detection procedure (Figure 1). This procedure involved an iterative pairwise motif sampling process between the promoter of the atoDAEB operon, which was used as the initial qualified sequence, and a pool of E. coli promoters (initial promoter set). The promoter that showed the best homology to the atoDAEB promoter sequences was removed from the promoter pool and, together with the initial qualified promoter sequence, formed the qualified promoter set that was used for the next iteration of the motif-sampling process, against an input promoter set. In each iteration, the input promoter set is depleted by the selected promoter, plus those promoter sequences that scored below a predefined threshold (see Materials and Methods). The goal of this procedure was to yield motifs with maximized log-likelihood (ll) scores at the specified positions, thereby revealing conserved intergenic sites corresponding to putative AtoC binding sites. The ll-score represents an informative index of both the level of motif conservation (information content) and the number of its occurrences in a set of sequences. Promoters containing sequences that yield motifs with high ll-scores are pooled to form the final qualified set of promoters, which is used for the derivation of a best-scoring consensus motif.
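The control flow of the iterative procedure can be sketched as a greedy loop. In this minimal illustration the scoring step is a deliberately simple stand-in (mean base identity of the best ungapped window against the sites qualified so far) and is an assumption for illustration only; the actual workflow scores motifs with MotifSampler log-likelihoods. All names, sequences and the threshold value are hypothetical.

```python
def best_window(sites, promoter):
    """Return (score, window) for the window most similar to the qualified sites.

    The score is the mean number of identical positions between the window
    and each qualified site (a toy stand-in for a motif log-likelihood).
    """
    k = len(sites[0])
    best_score, best_seq = -1.0, ""
    for i in range(len(promoter) - k + 1):
        window = promoter[i:i + k]
        score = sum(
            sum(a == b for a, b in zip(site, window)) for site in sites
        ) / len(sites)
        if score > best_score:
            best_score, best_seq = score, window
    return best_score, best_seq


def iterative_selection(seed, pool, threshold):
    """Greedily grow the qualified set, starting from a single seed site."""
    sites = [seed]          # qualified motif instances, seed first
    selected = []           # names of qualified promoters, in selection order
    remaining = dict(pool)  # input promoter set for the current iteration
    while remaining:
        scored = {n: best_window(sites, s) for n, s in remaining.items()}
        # Deplete the pool of promoters scoring below the threshold.
        remaining = {n: s for n, s in remaining.items()
                     if scored[n][0] >= threshold}
        if not remaining:
            break
        top = max(remaining, key=lambda n: scored[n][0])
        selected.append(top)
        sites.append(scored[top][1])
        del remaining[top]  # the selected promoter leaves the pool
    return selected, sites


# Toy data (assumption: invented sequences, not E. coli promoters).
seed = "ACGTAC"
pool = {"p1": "TTACGTACTT", "p2": "GGACGTTCGG", "p3": "GGGGGGGGGG"}
selected, sites = iterative_selection(seed, pool, threshold=4)
print(selected, sites)
```

The point of the sketch is the bookkeeping: each round re-scores the remaining promoters against the growing qualified set, promotes the best one, and discards promoters that fall below the threshold.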
Because the MotifSampler program [17], which was used in this analysis, implements a heuristic algorithm, the sampling results are very sensitive to the input parameters. In particular, different motifs are obtained at the end of the procedure by slightly varying the parameters, e.g. the motif length, the motif maximizing positions or the initial promoter set (see Materials and Methods). All these motifs represent different instances of sequence conservation between promoters. However, the sequence conservation among promoter regions is relatively small, which implies that other factors, e.g. the chromatin context or the formation of other transcription factor complexes, also contribute to effective AtoC binding. Thus, among different sets of conserved sequences only a few would be biologically relevant. To cope with this issue and capture the highest possible number of biologically relevant motifs, the computational pipeline was applied many times with slight variations in the input parameters. Subsequently, the biological relevance of the obtained motifs was assessed by implementing a Gene Ontology Term statistical analysis (see Materials and Methods).
Three executions of the procedure generated motifs that successfully passed the GOT test and were correlated with gene functions (Table 1). The first execution was initiated by pair-wise motif sampling of the -184 to -165 region of the atoDAEB promoter, strictly containing the half AtoC binding sequence, against all E. coli promoters (initial promoter set, see Materials and Methods). The second execution also used the strict positions of the AtoC binding site but with the initial promoter set narrowed to only the sigma-54-dependent promoters from the RegulonDB database [18], because the atoDAEB promoter is itself sigma-54-dependent. The third execution was performed using another narrowed initial promoter set, consisting of the 57 promoters which are annotated in the E. coli K-12 EcoCyc database [19] as targets of response regulators of other TCSs. The motifs derived from these three executions are presented in Table 2. Besides these three successful executions, other attempts resulted in motifs that failed to yield GOT overrepresentation and were hence discarded (data not shown), with the exception of motif 4 (see below). All retained motifs are provided in Additional File 1.
Figure 1 Workflow chart of the ab initio motif detection procedure. A) The chart describes the algorithmic steps for the ab initio motif detection process. The thick arrows correspond to algorithmic flows after decision points, the dashed arrows represent initialization steps which are performed uniquely and the dotted arrows (Motif Searching part) concern independently implemented alternative data processing scenarios. The details of the procedure are described in Materials and Methods. B) Description of the workflow symbols.
Genome-wide motif searching using the motifs obtained by the iterative procedure, coupled to the Hypergeometric Test for Gene Ontology Term overrepresentation, correlates motifs with gene functions
The biological relevance of the derived motifs was established by an algorithmic procedure, where each motif is used as input for genome-wide screening (Motif Searching) and the genes found to contain matching sequences within their promoter regions are tested through the Hypergeometric Test for Gene Ontology Term overrepresentation. Thus, if overrepresentation of a particular GOT is observed among the genes to which the motif hits were attributed, there is statistical evidence that this gene set was not randomly generated. As the permutation-derived corrected p-values impose significant strictness on the statistical procedure, and since p-value correction discards the majority of GOTs, which lack sufficient representation among the input genes, larger (more permissive) hit populations were required to ensure good statistical representation of all possible outcomes. Thus, the results of the Motif Searching algorithm significantly enriched the hit population (Table 3), since the qualified promoter sets were too small to yield GOT overrepresentation. In this way, the three aforementioned motifs were correlated with gene functions (Table 4) and were selected for experimental validation. It is worth noting here that not all these hits necessarily map to effective, functional binding sites under all experimental conditions; rather, they represent putative targets, based on their sequence content. The statistical correlation with biological functions reveals slightly conserved sequences that are potentially inclined toward interaction with particular transcription factors. The GOT analysis, as implemented in this work, evaluates whether the information content captured by a motif reflects random noise or is biologically relevant.
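The hypergeometric overrepresentation test described above can be illustrated with a short, standard-library sketch. It computes the upper-tail probability of observing at least a given number of genes annotated with a term among the motif hits. The gene counts below are toy numbers chosen for illustration, and the sketch omits the permutation-based p-value correction used in this work.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement.

    population: total number of genes considered
    annotated:  genes in the population carrying the GO term
    sample:     genes attributed to motif hits
    hits:       genes in the sample carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy example (assumption: made-up counts, not data from this study).
p = hypergeom_pvalue(population=10, annotated=3, sample=3, hits=3)
print(f"uncorrected p-value: {p:.4f}")
```

A small p-value indicates that the term is more frequent among the hit genes than expected from random sampling; with many terms tested at once, a multiple-testing correction is still needed before drawing conclusions.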

Evaluation of lower-scoring motifs and retention of Motif 4
In addition to the highest-scoring motifs, we also investigated motifs with intermediary correspondence to the AtoC binding site. These motifs were initially discarded by the automatic procedure due to the stringent parameterization, which retains only the highest-scoring motifs for the specified position. In order to overcome this limitation and test the discovery strengths of the analytic pipeline proposed in this work, we applied it to motifs that scored lower during the Motif Sampling phase (assuming that they reveal weak matching with the atoDAEB promoter binding site) and examined them separately in the GOT analysis. In general, the hits of intermediary motifs either overlapped those of the highest-scoring motifs or did not provide any overrepresented GOT, and so they were discarded (data not shown). Motif 4, however, is an exception. This intermediary motif was retained (Table 2) because it matched an unexpectedly highly conserved repeat. This particular repeat corresponds to an uncommonly conserved site present in many intergenic regions and yielded a significant GOT overrepresentation when submitted individually to the motif evaluation sub-procedure. The enrichment of motif 4 is particularly high within its hits (at 0.7 prior probability motif 4 comprises 346 hits on promoter regions; data provided in Additional File 2) due to the very high sequence conservation, and despite its restrictive character, which yields a lower log-likelihood score (66.3). Therefore, motif 4 was strongly correlated with the GOT Transporter Activity function (Table 4).
Table note: The computational workflow was performed many times with varying parameters. Shown are the three successful executions, which corresponded to three different initial promoter sets.
Motif 4 resides within a ~40-bp highly conserved repeat found in many E. coli promoters
Motif 4 weakly matched the palindromic AtoC binding site in the promoter of the atoDAEB operon [5] and at the same time strongly matched an unexpectedly conserved 40-bp repeat present in a significant number of promoters (motif 4 searching results, Additional File 2). An additional motif, motif 5, was constructed in order to represent this conserved region. The sequence of motif 5 at 0.7 prior probability is shown in Figure 2A. Motif 5 was used for genome-wide screening and matched 85 intergenic sites, whereas it did not match any site within ORFs (data provided in Additional File 2). The high degree of conservation of this 40-bp repeat and its wide representation in the genome (Figure 2B) suggest a more pivotal role in global E. coli regulatory mechanisms. The alignment of the first twenty hits shows unexpected sequence conservation for an intergenic region (Figure 3) where, for example, there are 6 variants of the repeat with at least 38/40 identity in various intergenic positions. The probability of finding a 38-bp DNA sequence repeated by chance two times is $1/0.25^{38} \times 4$ in the E. coli genome comprising $4.6 \times 10^{6}$ nucleotides. This result is even more pronounced if we consider that the repeat is found exclusively within intergenic regions and not within ORFs. Interestingly, this highly conserved sequence was also detected in many intergenic regions of other Enterobacteria such as Shigella flexneri, Salmonella enterica, Klebsiella pneumoniae and Citrobacter koseri (data not shown).
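As a rough order-of-magnitude check on the chance-occurrence argument above, the expected number of exact matches of a given 38-bp sequence in a genome of this size can be estimated as follows. The sketch assumes equiprobable, independent bases and counts both strands; these are simplifying assumptions for illustration and this is not necessarily the calculation performed by the authors.

```python
import math

genome_length = 4.6e6   # nucleotides in the E. coli genome, as stated in the text
k = 38                  # length of the repeated sequence

# Probability that one specific k-mer matches at one given position,
# assuming each base occurs with probability 0.25 (assumption).
p_match = 0.25 ** k

# Number of start positions on both strands (assumption: both strands scanned).
positions = 2 * (genome_length - k + 1)

# Expected number of chance occurrences of that specific k-mer.
expected = positions * p_match

print(f"p_match  = {p_match:.3e}")
print(f"expected = {expected:.3e}")
```

Under these assumptions the expected number of chance matches is many orders of magnitude below one, which is consistent with the argument that several near-identical copies of the repeat are very unlikely to arise by chance.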
Application of the computational procedure on the transcription factor LexA as a demonstration of its general effectiveness
In order to demonstrate that the computational procedure used here for the prediction of AtoC binding elements can be applied to other transcription factors, we analyzed its effectiveness using LexA, a well-characterized transcription factor. LexA, which regulates the transcription of several genes involved in the cellular response (SOS response) to DNA damage or inhibition of DNA replication [20], has been extensively studied, and the multitude of experimentally identified DNA binding sites that predicted its consensus [21] were used as a benchmark to assess the predictive capacity of our computational framework. By initiating the computational analysis with one LexA binding sequence and querying the whole promoter set of Escherichia coli K12 (motif length 20, same parameters as for AtoC), we identified a number of putative DNA binding sites, including thirteen known LexA binding sites. Moreover, the highest-scoring GOTs generated from this procedure were GO:0006974, "response to DNA damage stimulus" and GO:0009432, "SOS response", a finding that is in total agreement with the established biological function of LexA (the findings are presented in Additional File 3).

Experimental validation of the in silico predicted sequences by in vivo binding of recombinant AtoC
ChIP experiments are routinely used to define whether a DNA-binding protein recognizes a particular gene regulatory region in vivo. Regarding the system under study, the specificity of the results obtained with this method has already been demonstrated by the lack of AtoC binding to its defined site in the regulatory region of the atoDAEB operon in the E. coli strain BW28878, an atoSC deletion mutant [6]. In contrast, in the isogenic atoSC+ strain BW25113, AtoC was found to bind this site in vivo and this binding was more pronounced in the presence of inducing acetoacetate [5]. Having confirmed that AtoC binds in vivo to its a priori target, ChIP experiments were used to define whether AtoC binds to any of the putative targets that emerged from the bioinformatics approach. Initially, we analyzed potential AtoC binding to four such sequences, three located within the dmsA, acr and fliA promoter regions, and one located within the fliT gene-encoding region. ChIP experiments indicated that AtoC binds to all four, as well as to the control atoDAEB promoter (data not shown). The specificity of the experimental approach used was confirmed by the lack of any visible signal when the precipitations were performed either in the absence of AtoC-specific antibody or in the atoSC deletion strain BW28878.
To further increase the probability that AtoC would bind in vivo to even relatively low affinity targets predicted by the bioinformatic analysis, the experiments were performed in E. coli BL21[DE3] transformed with plasmid pHis10-AtoC [22] and overexpressing a recombinant His-tagged form of AtoC, recognised by an anti-His probe. Initially, we analyzed potential AtoC binding to the hits of the qualified promoter sets and subsequently to additional hits of the motif derived from each execution. Ten of the qualified promoters of Table 1 were tested and found positive for in vivo AtoC binding, together with the already verified atoD: narZ [23], puuP [24], dmsA [25], crr [26], rtcB-rtcR [27], borD [28] and acrD [29]. ChIP analysis was also extended to accommodate targets derived from the Motif-Searching procedure while their cognate gene products suggested biological relevance (presented in Table 5). Sequences within six intergenic regions, i.e. metR-metE [30], trmA-btuB [31], ykgA-ykgQ [32], aegA-narQ [33,34], ymiA [35], narZ [23], and cpxR-cpxP [36], which were observed as hits by motif 1 searching (Table 2) were selected for testing based both on their scores and their putative functional relevance to the ato genes. The results of the ChIP analysis for all intergenic regions ascribed to motif 1 are included in Figure 4, which clearly demonstrates the in vivo binding of recombinant AtoC. Six additional promoters, i.e. borD [28], putA-putP [37], acrD [29], adhP [38], rhaT-sodA [39,40] and nirB [41], that are likely transcribed by sigma-54 polymerase and which matched motif 2, were similarly tested for in vivo AtoC binding. Finally, both the borD and acrD gene promoters, to which AtoC was found to bind in vivo, share patterns with motif 3. Regarding the top hits of motif 4 (Additional File 2), sseB and yjcH were also among the positive targets.
This result is potentially significant since yjcH is cotranscribed with acs and actP, which encode proteins involved in acetate activation and transport, as part of the acs-yjcHG operon [42][43][44]. All predicted targets matching motifs 2, 3 and 4 that gave a positive signal in genomic DNA (Input Chromatins, see paragraph regarding specificity of AtoC binding) were also analysed by ChIP and the results are presented in Figure 5A. As becomes evident in this figure, not all targets are recognised by AtoC with the same intensity, even though the input chromatin signals share, as stated above, the same strong level of intensity. This allows a certain prioritization of the putative targets according to the strength of the AtoC binding.
In vivo binding of AtoC to gene-encoding regions
An interesting aspect of this bioinformatic analysis is the detection of several potential AtoC binding sites, corresponding to the pattern of motif 1, within gene-encoding regions [...] (Figure 6). Thus, it is most likely that AtoC also binds within the gene-encoding regions predicted by the bioinformatic analysis, although the functional relevance of this binding remains unclear.
Specificity of AtoC binding to its predicted targets: the effect of the acetoacetate inducer in vivo and in vitro DNA-binding competition experiments
In order to assess the specificity of AtoC binding to the predicted operators, we re-examined all ChIP analyses by reducing the PCR cycles to twenty-five, while ChIP signals for both acetoacetate-induced and non-induced cultures were analysed comparatively for each target.
AtoC was found to demonstrate increased affinity for most of the predicted "atypical" sites only when acetoacetate, the inducer, was added to the cultures, while the abundant, over-expressed AtoC appears able to bind the atoDAEB promoter with the same affinity, regardless of the presence of acetoacetate (Figures 4, 5 and 6). In an attempt to ensure consistency of the results through the use of extra controls, chromatin input samples for all target groups were also tested for PCR-positive signals in the absence of the His-probe antibody precipitation. DNA signals of the chromatin input for both induced and non-induced cultures exhibited the same signal intensity (Figure 5B), indicating that the signal intensity corresponding to each target can be attributed solely to AtoC binding, whereas the absence of signal can be considered a sign of absent or weak binding, rather than a mere coincidence caused by primer malfunction. Moreover, low ChIP signals for the non-induced cultures provide negative controls, in the presence of both abundant AtoC and the antibody, directly relating the presence of abundant positive signals to acetoacetate-induced AtoC binding of the predicted targets.
To further address whether AtoC indeed binds to the predicted targets, the aforementioned representative of each of the three groups was also tested, together with atoD, in gel retardation assays combining His10-AtoC with the PCR products that contain the predicted promoters or gene-encoding regions. The specificity of the DNA-AtoC interactions has already been verified by competition experiments to evaluate the association of AtoC with biotinylated atoD probe in the presence of increasing concentrations of unlabeled specific (atoD) or nonspecific DNA competitors [4,5]. As illustrated in Figure 7A, electrophoretic gel retardation assays showed that all DNA targets exhibit binding to AtoC, following incubation on ice for 25 min. However, this binding appears to be less pronounced when compared with the cognate, high-affinity AtoC binding site present in the atoD promoter. The specificity of the results was further verified by also performing DNA binding reactions on ice for 30 min in the presence and absence of DNA competitors as previously described [4,5]. Under these conditions, His10-AtoC binding to the biotin-labelled atoD probe was abolished either by the addition of excess amounts of unlabelled cognate (atoD) competitor or by unlabelled DNA competitors containing the novel, atypical binding sites, i.e. metR-metE, acrD, narG (Figure 7B). The degree of specificity of the AtoC binding to each of the predicted targets remains to be determined.

Discussion
The AtoSC two-component signal transduction system is involved in the expression of the atoDAEB operon encoding proteins required for the catabolism of short-chain fatty acids. Acetoacetate activates the AtoSC TCS, and subsequently, the expression of the atoDAEB operon genes. We have already shown that, upon acetoacetate activation, AtoC binds a 40-bp palindromic sequence (two identical inverted 20-bp sites) located upstream of the atoDAEB transcription start site and activates the sigma-54-dependent expression of this operon [5]. A schematic representation of the atoSC and atoDAEB region, as well as the positions of the verified AtoC targets including the high-affinity binding site, on the E. coli genomic map, is illustrated in Figure 8. In addition to its role as a transcriptional activator, AtoC has a second function as the posttranslational inhibitor of ornithine decarboxylase, the key enzyme in polyamine biosynthesis [45]. These data underline the complexity of the regulatory networks that are involved in maintaining the metabolic pathways and homeostasis in bacteria.
Figure 4 In vivo binding of recombinant AtoC to in silico predicted potential targets of motif 1 in intergenic regions. Results of a ChIP analysis by agarose gel electrophoresis (2% w/v) of PCR amplification products generated by primers corresponding to motif 1 hits and qualified promoters, included in Tables 1 and 5. All ChIP preparations were carried out following acetoacetate induction. Above each PCR product, generated by the specific designated primers, the name of each gene or promoter region comprising the motif sequence is denoted for each pair of lanes. All AtoC additional binding targets are PCR tested (25 cycles) in two ChIP preparations carried out without (-, left lanes) or with (+, right lanes) AcAc induction (AA).
Several lines of evidence indicate that the role of the AtoSC TCS in the regulation of E. coli gene expression might not be limited to the expression of the atoDAEB operon genes. Deletion of the atoSC locus results in the inability of E. coli to catabolize short-chain fatty acids, as well as in a number of other defects including downregulation of flagellar gene expression, defects in motility and chemotaxis, inability to use glucuronamide as carbon source and sensitivity both to high osmotic pressure and to aminoglycoside antibiotics [6]. These pleiotropic effects could involve the phosphorylation of noncognate response regulators by the AtoS kinase and the regulation of the expression of additional genes by AtoC, as part of a cross-talk regulatory network.
Figure 5 Monitoring the AcAc effect on AtoC in vivo binding to its predicted "atypical" targets. A) All predicted targets matching motifs 2, 3 and 4 (Table 5) that gave a positive signal in genomic DNA are ChIP analysed. All AtoC additional binding targets are PCR tested (25 cycles) in two ChIP preparations carried out without (-, left lanes) or with (+, right lanes) AcAc induction (AA). Input Chromatin samples are also included representing positive genomic DNA signals for the sampled targets. B) Three selected targets representing each of the afore-mentioned groups (metR-metE, narG and acrD) were also PCR tested together with atoD (25 cycles) using as templates input chromatin controls (Input Chromatin) and immunoprecipitated preparations (ChIPs) from non-induced (-AA) or AcAc induced (+AA) cultures.
[...] (Table 5) were ChIP analyzed for in vivo AtoC binding by electrophoresis of PCR products generated by primers specifically designed to comprise the target motif sequence. The name of each gene comprising the motif sequence is denoted above each lane. The AcAc effect on the binding to each target is also denoted, as stated in Figure 4.
This study exposed a number of promoters of related genes that may serve as putative AtoC targets. These include fliT that encodes a chaperone of the flagellar export system [46], and the neighboring fliA gene that encodes the sigma-28 polymerase, which activates transcription of both flagellar and chemotaxis genes, including tar, tsr and the flaA operon [47][48][49]. Additional potential AtoC targets include acrD, which encodes a component of the AcrAD-TolC multidrug efflux transport system and is involved in bacterial response to a variety of aminoglycosides [29], puuP encoding an importer for putrescine [24], the enzymatic product of ornithine decarboxylase and the precursor of the polyamines spermine and spermidine [19] as well as yjcH encoding a conserved inner membrane protein involved in acetate transport [44].
A defined consensus sequence for the AtoC binding site would greatly facilitate the identification of potential AtoC binding elements. However, the lack of experimentally verified AtoC binding targets other than that upstream of the atoDAEB promoter [5] necessitates the use of bioinformatics methodologies for predicting potential AtoC binding motifs. Therefore, we implemented an ab initio computational prediction of additional targets, using half the verified palindromic sequence of the AtoC binding site [5] as the sole input. The process, executed many times with slightly varying parameters, generated different motifs, which were tested through statistical GOT enrichment analysis for their biological relevance. The qualified motifs defined a pool of putative AtoC binding sites present within either intergenic or gene-encoding regions of the E. coli genome. The biological validation of the in silico-obtained predictions was performed by ChIP assays that demonstrate AtoC binding in vivo. These experiments validated the computational approach, since AtoC was found to bind in vivo to twenty-one of the predicted target sequences. As predicted by the GOT analysis, many of the targets are transcription factors, transporters and enzymes affecting cellular redox state (Table 5).
As no high homology is shared among the primary sequences of the confirmed AtoC binding sites, the extrapolation of a strict consensus sequence remains elusive. Previous studies have shown that global bacterial transcription factors can recognize multiple noncanonical binding sites that do not conform to a consensus site [50,51]. Our computational workflow bypassed this issue by revealing slightly conserved, yet functionally effective, AtoC binding sites that do not conform to a global consensus. Another interesting aspect of this analysis is that AtoC binds in vivo not only to intergenic sequences but also to a number of gene-encoding regions. It is not clear whether this binding has a regulatory (possibly enhancing) impact on the expression of these genes or whether it serves some other purpose, such as quenching the amounts of available AtoC within the E. coli cell milieu. AtoC, like other response regulators, could in theory affect bacterial gene expression from a distance [52], in a manner analogous to that of eukaryotic enhancer-binding transcription factors [53]. However, this is not necessarily the case, since there is at least one report of response regulator binding to gene-encoding regions without obvious effects on gene expression [54]. The presence of alternative transcriptional factor DNA targets of unknown function has been recently demonstrated [55], showing that CtrA, a bacterial response regulator, binds a second category of weak DNA targets. Questions remain regarding the conditions governing the AtoC binding to these sites in vivo, i.e. whether AtoC binds alone or in conjunction with some other factor(s).
Figure 7 Verification of AtoC in vitro binding to representative "atypical" targets. A) Sample gel retardation experiments illustrating His10-AtoC protein binding to DNA fragments atoD, metR-metE, acrD (promoter) and narG (gene encoding) regions. As described in Materials and Methods, following electrophoresis, gels were first stained with Gel-Red dye and then with Coomassie blue, and side-by-side photographs are shown. The lines above the Gel-Red stained lanes indicate the different probes combined with the indicated His10-AtoC (AtoC) quantities. Arrows indicate bands that were stained with both Gel-Red and Coomassie blue, corresponding to band-shifting caused by the AtoC binding. Lines indicate free proteins, not bound to corresponding DNA particles. B) Band shift assay of His10-AtoC with a biotinylated fragment of the upstream region of the atoDAEB operon. EMSAs were performed as described in Materials and Methods with 10 ng of biotinylated atoD. The addition of His10-AtoC (0.35 μM) and certain amounts of competitive, non-biotinylated atoD, as well as metR-metE, acrD and narG fragments is indicated. 2 μg of sonicated calf thymus competitive, non-specific DNA were added in each reaction.
There is a high probability that existing public microarray data cannot reveal the full extent of the biological relevance of the AtoSC system, since those data were obtained from cells grown in very rich media. Our data clearly indicate that conditions promoting the activation of the AtoSC TCS, i.e. the presence of the acetoacetate inducer, are required to promote AtoC binding to the predicted sequences. In this sense, microarray experiments from cells exposed to AtoSC inducers, and the subsequent correlation of these data with the novel AtoC binding targets presented here, could provide significant information on the biological importance of the AtoSC TCS.

Furthermore, the wealth of novel AtoC-binding sites, some of which are present within gene-encoding sequences, should be further analyzed for their functional significance in gene regulation. While this may not be a trivial task, especially when taking into account the large number of targets, it should provide insight into the regulation of bacterial gene expression from a distance.

Conclusions
The scope of this work was to employ an ab initio motif detection approach, coupled with gene ontology analysis, in order to define novel binding elements for the bacterial transcription factor AtoC. This approach identified a number of sequences which, despite their divergence, were recognized by AtoC in vivo. Overall, this study underlines the emergent complexity of a possible AtoC-related regulatory network which may contribute to the pleiotropic effects of AtoC.

Computational Tools
All procedures were implemented on a Linux workstation using a combination of Perl scripts interacting with the specified INCLUSIVE programs for motif analysis (MotifSampler, MotifScanner, MotifRanking) [17]. All sequences, annotations and intermediary results were stored and queried in a MySQL database using the Perl DBI library.

Selection of promoters
The promoters are defined here as the sequences from -250 to +30 relative to each transcription start site (TSS). The annotations of the TSS and the genomic sequence of Escherichia coli K-12 were extracted from the RefSeq database [56] (RefSeq id: NC_000913) using Perl and BioPerl scripts. No overlap between two promoters was allowed. All intergenic sequences shorter than 500 base pairs that lie between two genes encoded on the -1 and +1 strands, respectively, were included in their entirety and considered as promoters of both flanking genes in the GO Term analysis step.
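The promoter definition above can be illustrated with a minimal Python sketch that computes the genomic window from -250 to +30 around a TSS on either strand. The function name, the treatment of the TSS as offset 0, and the 1-based inclusive coordinates are assumptions made for this example; the Perl and BioPerl scripts used in the original analysis are not reproduced here, and wrap-around on the circular chromosome is ignored.

```python
from typing import Tuple


def promoter_window(tss: int, strand: int,
                    upstream: int = 250, downstream: int = 30) -> Tuple[int, int]:
    """Return (start, end) genome coordinates of a promoter, start <= end.

    Assumption: the TSS is offset 0, so the window spans tss-250 .. tss+30
    on the +1 strand and is mirrored on the -1 strand.
    """
    if strand == 1:
        return tss - upstream, tss + downstream
    if strand == -1:
        # On the reverse strand, "upstream" lies at higher genome coordinates.
        return tss - downstream, tss + upstream
    raise ValueError("strand must be +1 or -1")
```

For a reverse-strand gene, the sequence extracted from the returned window would additionally need to be reverse-complemented before motif sampling.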

Definition of promoter sets
The initial promoter set is defined as the promoter pool that is subjected to motif sampling. The different initial promoter sets used in the three executions of this analysis are presented in Table 1.
The promoters submitted to each reiteration of motif sampling represent the input promoter set. During the first iteration the input promoter set, which equals the initial promoter set, is subjected to motif sampling against atoDAEB promoter sequences representing the initial qualified promoter set. The number of promoters within the input promoter set decreases after each reiteration, since both a qualified promoter, as well as promoters scoring below a threshold value (see below), are eliminated from the pool. The qualified promoter set, ever increasing by one, is then used for the next iteration of the motif sampling process and the procedure advances until no more qualified promoters can be found.

Iterative procedure for de novo motif detection
The de novo motif detection was performed using a Gibbs sampling algorithm (implemented in the INCLUSIVE MotifSampler program [17]) in a multi-step procedure. The first step of the procedure was a pair-wise motif sampling between the atoDAEB operon promoter and each of the other E. coli promoters of the initial promoter set (3789 in total in the case of the full set of E. coli promoters), using the following parameters: one hundred runs (detection of one motif per run permitted); motif length 20; both strands; background model of order 3 (downloaded from the INCLUSIVE [17] web site for E. coli K-12); prior probability value 0.5. The program retained the highest scoring motif.
In each iteration:
- A log-likelihood (ll) score is estimated for every motif derived from the pair-wise motif sampling between each remaining promoter in the input promoter set and the qualified promoter set derived from the previous sampling iterations.
- The promoter of the input promoter set yielding a motif with the maximum ll-score within the specified position of the atoDAEB promoter is retained and added to the qualified promoter dataset. Thus, the qualified promoter set is derived through iterative agglomeration.
- Motif sampling is performed in independent rounds (same parameters as above) on datasets composed of n + 1 sequences, where n is the number of sequences in the dataset of the previous step (starting from 2).
- When the ll-score of a derived motif is lower than the 30th percentile of all scores, the corresponding promoter is immediately eliminated from the input promoter set.
- Additionally, when the position-specific maximum ll-score of a motif within a promoter is equal to or lower than the score derived in the previous step, this promoter is also removed from the input promoter set.
Promoters that are unlikely to maximize the ll-scores of the detected motifs are eliminated in order to reduce the number of motif sampling rounds. Otherwise, as the number of sequences in the qualified promoter dataset increases, the screening procedure becomes excessively time-consuming and may fail to converge to a final qualified promoter set.
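The control flow of the iterative procedure can be sketched as a greedy agglomeration loop. The following Python sketch is only an illustration under stated assumptions: the `score` callable is a hypothetical stand-in for a MotifSampler run that returns the ll-score of the best motif for one candidate promoter against the current qualified set, the percentile uses a nearest-rank definition, and the loop simply stops when the input pool is empty. None of these details are specified by the text.

```python
import math
from typing import Callable, Dict, Hashable, List, Sequence

# Hypothetical scoring interface: (qualified set, candidate) -> ll-score.
Score = Callable[[Sequence[Hashable], Hashable], float]


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def agglomerate(seed: Hashable, candidates: Sequence[Hashable],
                score: Score, drop_percentile: float = 30.0) -> List[Hashable]:
    """Grow the qualified set by one promoter per iteration."""
    qualified: List[Hashable] = [seed]
    pool = list(candidates)
    previous: Dict[Hashable, float] = {}
    while pool:
        # Score every remaining promoter against the current qualified set.
        scores = {p: score(qualified, p) for p in pool}
        cutoff = percentile(list(scores.values()), drop_percentile)
        # The best-scoring promoter joins the qualified set.
        winner = max(pool, key=lambda p: scores[p])
        qualified.append(winner)
        # Drop the winner, low scorers, and promoters that did not improve.
        pool = [
            p for p in pool
            if p != winner
            and scores[p] >= cutoff
            and not (p in previous and scores[p] <= previous[p])
        ]
        previous = scores
    return qualified
```

Because at least the winning promoter leaves the pool in every pass, the loop always terminates; the two elimination rules only shorten it.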

Motif Ranking
The motifs generated by Motif Sampling were ranked and separated by Kullback-Leibler distance (INCLUSIVE MotifRanking program [17]). The threshold was set to 0.9, which means that two motifs are considered similar if the Kullback-Leibler distance between them is smaller than this threshold.
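To make the similarity criterion concrete, the sketch below computes a symmetrised Kullback-Leibler distance between two position-specific probability matrices of equal length, averaged over positions, and compares it with the 0.9 threshold. This is one common formulation chosen for illustration; the exact definition used by MotifRanking (including any handling of shifted alignments) may differ, and the function names and pseudocount are assumptions.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # one row of 4 probabilities (A, C, G, T) per position


def _smooth(column: Sequence[float], pseudo: float) -> List[float]:
    """Add a pseudocount and renormalise so that no probability is zero."""
    total = sum(column) + pseudo * len(column)
    return [(x + pseudo) / total for x in column]


def kl_distance(p: Matrix, q: Matrix, pseudo: float = 1e-6) -> float:
    """Symmetrised KL divergence, averaged over motif positions."""
    if len(p) != len(q):
        raise ValueError("motifs must have the same length")
    total = 0.0
    for col_p, col_q in zip(p, q):
        for a, b in zip(_smooth(col_p, pseudo), _smooth(col_q, pseudo)):
            total += 0.5 * (a * math.log(a / b) + b * math.log(b / a))
    return total / len(p)


def are_similar(p: Matrix, q: Matrix, threshold: float = 0.9) -> bool:
    """Two motifs are treated as similar when their distance is below the threshold."""
    return kl_distance(p, q) < threshold
```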

Motif Evaluation and correlation with biological function
The evaluation of the motifs was performed by coupling Motif Searching with Gene Ontology Term Analysis. The motif searching hits were attributed to genes and these genes were submitted to the Hypergeometric Test for GO Term over-representation, described below.

• Motif Searching
The position-specific probability matrices (PSPM) representing these motifs were used for two types of motif screening (INCLUSIVE MotifScanner program [17]): one including all individually cut intergenic sequences and another including all individually cut gene-encoding sequences. For both screenings the prior probability value was set to 0.7. The motif sequence logos were constructed with the WebLogo program [57]. For the sequence alignment we used the Seaview program [58].
• Gene Ontology Term Analysis of the promoters containing the motifs
In order to assess whether these motifs correspond to a cis-regulatory element, we performed the Hypergeometric Test for GO Term over-representation in the genes having a promoter where a hit was observed at the Motif Searching step. For the implementation of the test, we used the GO::TermFinder Perl module [59]. The GO Term gene annotation used was the EBI Gene Ontology Annotation (GOA) [60] for E. coli K-12 (84.4% coverage). The p-value threshold of the Hypergeometric Test was set to 0.1 for the first test and to 0.05 for the second one (for corrected p-value, using 1000 simulation runs) and we considered as the total gene population the 4339 E. coli K-12 genes annotated with GO terms.
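The uncorrected over-representation p-value of the hypergeometric test can be written out directly. The Python sketch below is an illustration rather than the GO::TermFinder implementation: it computes the upper-tail probability of observing at least k genes annotated with a given GO term among n genes with motif hits, when K of the N genes in the population carry that term. The function and parameter names are assumptions, and the simulation-based corrected p-value mentioned above is not reproduced.

```python
from math import comb


def hypergeom_pvalue(population: int, annotated: int, drawn: int, observed: int) -> float:
    """P(X >= observed) for X ~ Hypergeometric(population, annotated, drawn).

    population: total number of genes (N), e.g. 4339 in the text
    annotated:  genes in the population carrying the GO term (K)
    drawn:      genes whose promoters contain a motif hit (n)
    observed:   genes among the drawn ones carrying the GO term (k)
    """
    upper = min(annotated, drawn)
    # Exact integer arithmetic for the tail sum, one division at the end.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(observed, upper + 1)
    )
    return tail / comb(population, drawn)
```

A GO term would then be reported for a motif when this value falls below the chosen threshold (0.1 for the first test in the text).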

Construction of motif 5
The promoters which contained hits of motif 4 were submitted to MotifSampler with the same parameters as the initial motif samplings, except for the motif length, which was set to 40. For the visualization of the genomic positions of the hits of motif 5 we used the CGview Server [61].

Bacterial strains and growth conditions
Cells were grown to their mid-exponential phase at 37°C in 10 ml of modified M9 mineral medium supplemented with 0.1 mM CaCl2, 1 mM MgSO4, 0.4% (w/v) glucose, 1 μg of thiamine/ml and 80 μg of proline/ml [62], in the absence or presence of 10 mM acetoacetate. E. coli BL21[DE3] cells carrying the pHis10-AtoC plasmid [22] were grown at 37°C in minimal medium containing 100 μg/ml ampicillin. Induction of recombinant AtoC expression was achieved by addition of IPTG (0.25 mM) to the cultures when the OD600 reached 0.25 and it was allowed to take place for 4 hours. Acetoacetate, the known inducer of the AtoSC TCS, was also added during the induction phase of the culture at a concentration of 10 mM.

ChIP assays
ChIP assays were performed using chromatin from E. coli BL21[DE3] over-expressing His-tagged AtoC, grown in the absence or presence of acetoacetate. The assays were performed using the Magna ChIP™ A kit (Millipore) following the manufacturer's instructions. Briefly, in vivo cross-linking of the nucleoproteins took place by the addition of formaldehyde (final concentration 1%) directly to the E. coli cultures (10 ml) grown in modified M9 mineral medium, when the OD600 of the culture reached 0.6. The cross-linking reactions were quenched 20 min later by the addition of glycine and the cells were harvested by centrifugation and washed twice with ice-cold PBS, supplemented with a protease inhibitor cocktail. Cell lysis was performed as described in the kit manual and the cellular DNA was sheared by sonication to an average size of 500 to 1,000 bp. Cell debris was removed by centrifugation and the supernatant was used as the chromatin input for the immunoprecipitation reactions, after an initial stage of pre-incubation with protein A magnetic beads, in the absence of antibodies, to remove the nonspecific binding proteins. The ChIP reactions were then performed by adding the His-probe™ antibody (Santa Cruz, rabbit polyclonal) and fresh protein A magnetic beads to the resulting supernatant and allowing the reactions to take place overnight, with rotation at 4°C. DNA was then obtained from the immunoprecipitated protein/DNA complexes and the presence of the target promoter sequences in the chromatin immunoprecipitates was detected by PCR amplification. Parallel mock precipitations, performed in the absence of added antibody, were used as negative controls for each reaction.

Primer design and PCR reactions
The primers used to monitor the chromatin immunoprecipitations (Additional file 4) were designed, using the Oligo Explorer software, to generate a product of approximately 300 base pairs. All oligonucleotides were synthesized by VBC Genomics, Austria. The number of PCR amplification cycles varied from 25 to 35 to achieve the maximum sensitivity and specificity. The PCR products were analyzed by electrophoresis on 2% (w/v) agarose gels and stained using GelRed (Biotium). DNA polymerase DyNAzyme™ EXT and the reaction buffers used in the PCR process were purchased from Finnzymes. The MspI digest DNA ladder was purchased from New England Biolabs.

Electrophoretic Mobility Shift Assays
We used the electrophoretic mobility shift assay methods of Orchard and May [63], slightly modified, to determine the formation of protein-DNA complexes. The His10-AtoC protein was bound to double-stranded oligonucleotides spanning the representative target sites of the three promoters (atoD, metR-metE and acrD) and the narG gene. Respective quantities of each of the four indicated DNA fragments were combined with increasing His10-AtoC quantities (1, 1.25, 5 and 10 μg) in 25 μl of binding buffer (10 mM Tris [pH 7.6], 10 mM MgCl2, 50 mM NaCl, 5% (v/v) glycerol), and incubated on ice for 25 min, following a 5 min pre-incubation at room temperature before the addition of protein. Samples without protein were quickly mixed with 10× Gel loading buffer (Takara) and all samples were applied to a polyacrylamide gel, pre-run at 100 V, containing 0.5× TBE buffer and prepared with 6.0% (w/v) polyacrylamide (ratio of acrylamide to bisacrylamide, 29:1). Following electrophoresis (120 V, 120 min), the gels were stained first with Gel-Red dye (Biotium) and then with Coomassie blue.
For the DNA-binding experiments 10 ng (3 nM) of biotinylated atoD probe were used in binding buffer consisting of 10 mM Tris-HCl pH 7.4, 50 mM NaCl, 1 mM EDTA, 4 mM DTT and 5% (v/v) glycerol, as previously described [5]. Competitive non-biotinylated atoD was added at the same concentrations (up to 80 nM each) as metR-metE, acrD and narG. Each reaction took place in a final volume of 20 μl containing 0.2 mg/ml BSA and 2 μg of sonicated calf thymus competitive DNA. All the ingredients were added and preincubated for 5 min at room temperature, before the addition of 0.35 μM (2 μg) of the protein and transfer to ice for 30 min. At the end of the binding reactions, the samples were loaded onto a 4% (w/v) acrylamide gel in 0.25× TBE buffer and run at 120 V, with a 30 min prerun at 100 V. DNA was transferred from the gels to Biodyne B membranes (Pall) and detected by the Phototope-Star detection kit (New England Biolabs), as previously described [4,5].

Sciences (e-LICO)" (IST-2007.4.4-231519). This work was also supported by the core funds of the Departments of Chemistry (AIG and DAK) and Pharmacy (CAP) of the Aristotle University of Thessaloniki. Materials essential for the experimental validation in this study were purchased with funds of the project 26515 "BIOPRODUCTION" of the 6th Framework Programme (FP6) of the European Union. The funding bodies had no involvement in the design of this study; in the collection, analysis, and interpretation of data; in the writing of the manuscript; or in the decision to submit the manuscript for publication.

Authors' contributions
EP co-designed and carried out the computational analysis workflow for de novo motif detection of putative AtoC binding targets within the E. coli genome, participated in the interpretation of the results of the computational analysis, and co-drafted and co-revised the manuscript. AC co-designed the computational analysis workflow for de novo motif detection of putative AtoC binding targets within the E. coli genome, supervised the computational analysis and the interpretation of its results, contributed to the evaluation of the results, and co-drafted, co-revised and proofread the manuscript. AIG carried out the experimental assays (ChIP assays, primer design and PCR validations, Electrophoretic Mobility Shift Assays) and co-drafted and co-revised the manuscript. CAP supervised the experimental analysis and interpretation part, contributed to the evaluation of the results, and co-drafted and co-revised the manuscript. FNK contributed to the supervision of the complete work and to the interpretation and evaluation of the results, and revised the manuscript. DAK contributed to the supervision of the complete work and to the interpretation and functional evaluation of the results, and co-drafted, co-revised and proofread the manuscript. All authors have read and approved the final manuscript.