Computational prediction of cAMP receptor protein (CRP) binding sites in cyanobacterial genomes
© Xu and Su. 2009
Received: 31 May 2008
Accepted: 15 January 2009
Published: 15 January 2009
Cyclic AMP receptor protein (CRP), also known as catabolite gene activator protein (CAP), is an important transcriptional regulator widely distributed in many bacteria. The biological processes under the regulation of CRP are highly diverse among different groups of bacterial species. Elucidation of CRP regulons in cyanobacteria will further our understanding of the physiology and ecology of this important group of microorganisms. Previously, CRP has been experimentally studied in only two cyanobacterial strains: Synechocystis sp. PCC 6803 and Anabaena sp. PCC 7120; therefore, a systematic genome-scale study of the potential CRP target genes and binding sites in cyanobacterial genomes is urgently needed.
We have predicted and analyzed the CRP binding sites and regulons in 12 sequenced cyanobacterial genomes using a highly effective cis-regulatory binding site scanning algorithm. Our results show that cyanobacterial CRP binding sites are very similar to those in E. coli; however, the regulons are very different from that of E. coli. Furthermore, CRP regulons in different cyanobacterial species/ecotypes are also highly diversified, with predicted target functions ranging from photosynthesis, carbon fixation and nitrogen assimilation to chemotaxis and signal transduction. In addition, our predictions indicate that the crp genes in modern cyanobacteria were likely inherited from an ancestral gene in their last common ancestor and have been adapted to various cellular functions in different environments, while some cyanobacteria lost their crp genes, as well as the CRP binding sites, during the course of evolution.
The CRP regulons in cyanobacteria are highly diversified, probably as a result of divergent evolution to adapt to various ecological niches. Cyanobacterial CRPs may function as lineage-specific regulators participating in various cellular processes; they are important in some lineages but dispensable in others. The loss of CRP in these species leads to the rapid loss of its binding sites from the genomes.
Cyclic AMP receptor protein (CRP), also known as catabolite gene activator protein (CAP), is an important transcriptional regulator widely distributed in a variety of bacterial groups [1, 2]. The biological processes under the regulation of CRP are highly diverse, including energy metabolism [3, 4], cell division and development, toxin production, competence development, quorum sensing and cellular motility [8, 9]. CRP belongs to the CRP/FNR transcription factor (TF) superfamily, whose members are generally believed to function as global regulators throughout the eubacteria. Each member of the CRP/FNR superfamily contains an N-terminal effector binding domain and a C-terminal helix-turn-helix (HTH) DNA binding domain (DBD). The TFs of this superfamily form homodimers in vivo and are activated by the binding of specific small effector molecules to their effector binding domains. The CRP dimer is activated by the binding of one cAMP molecule to the effector binding domain of each subunit, which causes a conformational change in the DBDs that allows each to bind one half of a specific pseudo-palindromic DNA sequence in the promoters of CRP-regulated genes. Upon binding, CRP interacts with the C-terminal domain of the alpha subunit of RNA polymerase, affecting the binding of RNA polymerase to the promoter and thereby changing the rate of transcription initiation of the target gene [14–18].
The functions of CRP and its target genes (the CRP regulons) have been well studied in E. coli and other heterotrophic bacteria, and all sequenced E. coli genomes appear to encode a single copy of the crp gene. CRP in E. coli is characterized as a global regulator that controls the expression of more than 200 transcriptional units involved in various important biological processes [20, 21]. Through decades of research, 269 CRP binding sites (RegulonDB release 5.8) have been experimentally identified in this species; they show a pseudo-palindromic consensus of the form TGTGAN6TCACA. More recently, slightly different CRP binding sites with the consensus TGCGAN6TCGCA were also identified in E. coli and other γ-proteobacteria. One of the major functions of CRP in E. coli is the transcriptional regulation of genes related to organic carbon assimilation and energy metabolism [3, 4].
Because E. coli and other heterotrophic organisms rely on the assimilation of organic carbon sources from the environment, it is not surprising that CRP works as an important global regulator coordinating a variety of biological processes in these organisms. Cyanobacteria, on the other hand, are autotrophs capable of oxygenic photosynthesis and therefore do not rely on organic carbon sources from the environment. Intriguingly, nearly half of the sequenced cyanobacterial genomes (12 of the 29 analyzed here; see below) encode at least one copy of the crp gene. CRP proteins have been experimentally studied in two cyanobacterial strains, i.e. Synechocystis sp. PCC 6803 (PCC6803) [8, 9, 23–26] and Anabaena sp. PCC 7120 (PCC7120) [27, 28]. In the PCC6803 genome, the open reading frame (ORF) sll1371 encodes a homologue of the E. coli crp gene and has been named sycrp1. The product of this gene, SyCRP1, has been shown to form a homodimer that binds cAMP with high affinity in vitro. Furthermore, in the presence of cAMP, SyCRP1 forms a complex with DNA containing a consensus CRP binding site similar to that of E. coli (TGTGAN6TCACA). Further studies revealed that SyCRP1 is essential for type IV pilus biogenesis and is involved in cell motility in PCC6803 [8, 9, 25, 29, 30]. In the PCC7120 genome, on the other hand, two ORFs, alr0295 and alr2325, were found to encode putative CRPs and were named ancrpA and ancrpB, respectively. Equilibrium dialysis measurements showed that both AnCrpA and AnCrpB (the products of ancrpA and ancrpB, respectively) bind cAMP. Electrophoretic mobility shift assays (EMSA) further demonstrated that AnCrpA binds to the consensus E. coli CRP binding site. Both AnCrpA and AnCrpB have also been reported to be functional in PCC7120: the former regulates the expression of several genes involved in nitrogen fixation, and the latter controls genes induced by nitrogen depletion.
A few CRP binding sites in these two genomes have also been experimentally determined; they form a palindromic motif with the consensus TGTGAN6TCACA, similar to that in E. coli [25, 28]. In addition, the promoter regions of most of the identified CRP-activated genes in these two cyanobacterial genomes also contain an E. coli σ70-like -10 box (TAN3T) located ~22 bp downstream of the CRP binding site. These studies also suggested that CRPs in cyanobacteria might regulate a very different set of genes from those regulated in E. coli. However, a systematic genome-scale study of the potential CRP target genes and CRP binding sites in cyanobacteria is still lacking. In this paper, using a highly effective motif scanning algorithm [32, 33], we have predicted the CRP regulons and binding sites in the 12 sequenced cyanobacterial genomes that encode at least one copy of the crp gene. We have also investigated the degradation of CRP binding sites in the remaining sequenced cyanobacterial genomes, in which the crp genes were lost during the course of evolution.
Genome sequences, predicted ORFs and annotation files of the following 29 cyanobacterial genomes were downloaded from the NCBI website at ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria: Acaryochloris marina MBIC11017 (MBIC11017), Anabaena variabilis ATCC 29413 (ATCC29413), Anabaena sp. PCC 7120 (PCC7120), Gloeobacter violaceus PCC 7421 (PCC7421), Prochlorococcus marinus str. AS9601 (AS9601), Prochlorococcus marinus str. MIT 9211 (MIT9211), Prochlorococcus marinus str. MIT 9215 (MIT9215), Prochlorococcus marinus str. MIT 9301 (MIT9301), Prochlorococcus marinus MIT9303 (MIT9303), Prochlorococcus marinus MIT9312 (MIT9312), Prochlorococcus marinus str. MIT 9313 (MIT9313), Prochlorococcus marinus str. MIT 9515 (MIT9515), Prochlorococcus marinus str. NATL1A (NATL1A), Prochlorococcus marinus str. NATL2A (NATL2A), Prochlorococcus marinus CCMP1375 (CCMP1375), Prochlorococcus marinus MED4 (MED4), Synechococcus elongatus PCC 6301 (PCC6301), Synechococcus elongatus PCC 7942 (PCC7942), Synechococcus sp. CC9311 (CC9311), Synechococcus sp. CC9605 (CC9605), Synechococcus sp. JA-2-3B'a(2–13) (B-Prime), Synechococcus sp. JA-3-3Ab (A-Prime), Synechocystis sp. PCC 6803 (PCC6803), Synechococcus sp. CC9902 (CC9902), Synechococcus sp. RCC307 (RCC307), Synechococcus sp. WH 7803 (WH7803), Synechococcus sp. WH8102 (WH8102), Thermosynechococcus elongatus BP-1 (BP-1), and Trichodesmium erythraeum IMS101 (IMS101).
We predicted operon structures in each cyanobacterial genome using the Operon Finder Software (OFS) developed by Westover et al. OFS predicts operons based on three pieces of information: intergenic distance, functional relatedness of gene annotations, and conserved gene neighborhoods. In this study, both multi-gene operons and singleton operons containing only one gene are referred to as transcription units (TUs).
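As a rough illustration of the intergenic-distance component only, the sketch below groups adjacent same-strand genes into TUs when the gap between them is small. It is not OFS, which also weighs annotation relatedness and conserved gene neighborhoods; the function name `group_into_tus` and the 50 bp threshold are hypothetical choices made for the example.

```python
def group_into_tus(genes, max_gap=50):
    """Group genes into TUs using only strand and intergenic distance.

    genes: iterable of (name, start, end, strand) tuples.
    max_gap: hypothetical distance threshold in bp (not a value from OFS).
    """
    tus = []
    for gene in sorted(genes, key=lambda g: g[1]):
        _, start, _, strand = gene
        if tus:
            _, _, prev_end, prev_strand = tus[-1][-1]
            # Extend the current TU if the gene is on the same strand and close enough
            if strand == prev_strand and start - prev_end <= max_gap:
                tus[-1].append(gene)
                continue
        tus.append([gene])  # start a new TU, possibly a singleton
    return [[g[0] for g in tu] for tu in tus]


# Toy, hypothetical gene coordinates
example = [("g1", 100, 400, "+"), ("g2", 420, 900, "+"),
           ("g3", 1200, 1500, "+"), ("g4", 1520, 1800, "-")]
print(group_into_tus(example))  # [['g1', 'g2'], ['g3'], ['g4']]
```

Singleton TUs fall out naturally, matching the convention above that a single gene also counts as a TU.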
A simple bidirectional best hit (BDBH) approach using BLASTP with an E-value cutoff of 10^-10 was used to predict orthologous genes between each pair of genomes.
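The BDBH rule can be sketched in a few lines, assuming the BLASTP results in each direction have already been parsed into `(query, subject, evalue)` tuples. The helper names and the toy gene identifiers are hypothetical; only the E-value cutoff of 10^-10 comes from the text.

```python
def best_hits(hits, evalue_cutoff=1e-10):
    """Map each query to its single best subject among hits passing the cutoff."""
    best = {}
    for query, subject, evalue in hits:
        if evalue > evalue_cutoff:
            continue  # too weak to be considered
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(hits_a_to_b, hits_b_to_a, evalue_cutoff=1e-10):
    """Pairs (gene_a, gene_b) that are each other's best hit in both directions."""
    a_best = best_hits(hits_a_to_b, evalue_cutoff)
    b_best = best_hits(hits_b_to_a, evalue_cutoff)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)


# Toy, hypothetical BLASTP results: (query, subject, E-value)
a_to_b = [("a1", "b1", 1e-60), ("a1", "b2", 1e-30), ("a2", "b3", 1e-5)]
b_to_a = [("b1", "a1", 1e-58), ("b2", "a1", 1e-29)]
print(bidirectional_best_hits(a_to_b, b_to_a))  # [('a1', 'b1')]
```

Here b2's best hit is a1, but a1's best hit is b1, so only the reciprocal pair (a1, b1) is kept; a2 is dropped because its only hit fails the cutoff.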
To construct the CRP tree, the full-length amino acid sequences of cyanobacterial CRPs were identified by the criteria described above, using SyCRP1 (sll1371) in PCC6803 as the query sequence. Multiple sequence alignment of the identified cyanobacterial CRP sequences and that of E. coli K12 was performed using ClustalW as implemented in MEGA with default settings. A neighbor-joining (NJ) tree with Poisson correction was constructed using MEGA, with the E. coli CRP (GI:16131236) as the outgroup. To construct the tree in Figure S1 (see Additional file 1), we used SyCRP1 as the query to search the RefSeq database using BLASTP with an E-value cutoff of 10^-7. If there were multiple hits from a species, the hit with the smallest E-value was taken as the CRP of that species. The resulting sequences were used to construct an unrooted tree in the same way as described above. To construct the species tree, the 16S rRNA gene sequences of the sequenced cyanobacteria and of E. coli K12 were aligned using ClustalW with manual refinement. After indels were discarded, the final alignment contained 1311 positions. A neighbor-joining tree of the 16S rRNA gene sequences was constructed using the Kimura 2-parameter model, with the E. coli K12 sequence as the outgroup. Statistical significance at each node in the trees was evaluated using 1000 bootstrap resamplings.
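The two distance corrections named above have simple closed forms: the Poisson-corrected protein distance d = -ln(1 - p), where p is the proportion of differing sites, and the Kimura 2-parameter distance d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. The sketch below computes them for a pair of aligned, gap-free sequences; this matches the gap-stripped 16S alignment described here but is an assumption for the protein case, and MEGA's own handling of gaps and missing data is not reproduced. The function names are hypothetical.

```python
import math

TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine-purine, pyrimidine-pyrimidine


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance for two aligned, gap-free DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == y:
            continue
        if frozenset((x, y)) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    p = transitions / len(seq1)    # proportion of transitional differences (P)
    q = transversions / len(seq1)  # proportion of transversional differences (Q)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def poisson_distance(prot1, prot2):
    """Poisson-corrected distance -ln(1 - p) for two aligned, gap-free protein sequences."""
    if len(prot1) != len(prot2):
        raise ValueError("sequences must be aligned to the same length")
    p = sum(a != b for a, b in zip(prot1, prot2)) / len(prot1)
    return -math.log(1 - p)


print(k2p_distance("AAAA", "GAAA"))      # one transition in four sites
print(poisson_distance("ACDE", "ACDF"))  # one difference in four sites
```

Both formulas become undefined (the logarithm argument drops to zero or below) for very divergent sequences, which is why such corrections are only applied to reasonably similar sequences.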
Table 1. Transcriptional units in PCC6803 used for the initial phylogenetic footprinting analysis.
ORF(s) within TU | Putative CRP binding site¹
slr1667 slr1668 ssr2786
slr2015 slr2016 slr2017 slr2018
sll0443 sll0444 sll0445 sll0446 sll0447 sll0448 sll0449
sll1371 sll1372 sll1373
For this purpose, we pooled the entire upstream inter-TU regions of these five TUs in PCC6803, as well as those of TUs containing their orthologous genes in the other cyanobacterial genomes that encode at least one crp gene. Motif-finding programs, including CUBIC and BioProspector, were then applied to predict CRP binding sites in these sequences. CUBIC is a graph-theoretic algorithm that identifies highly similar k-mers in a set of pooled sequences, while BioProspector uses a Gibbs sampling strategy to find overrepresented k-mers in a set of pooled sequences. High-scoring putative CRP binding sites were manually selected from the motif-finding results and used to build a preliminary profile of the CRP binding sites in cyanobacterial genomes.
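The idea of scoring k-mers by how many pooled sequences contain a close match can be illustrated with a brute-force sketch. This is neither the CUBIC graph algorithm nor BioProspector's Gibbs sampler; it simply counts, for every k-mer, the number of sequences that contain a k-mer within a given Hamming distance. The function names and the toy sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def kmer_set(seq, k):
    """All distinct k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similar_kmer_support(seqs, k, max_mismatch=1):
    """Rank every k-mer by how many sequences contain a k-mer within max_mismatch of it."""
    per_seq = [kmer_set(s.upper(), k) for s in seqs]
    candidates = set().union(*per_seq)
    support = {
        cand: sum(any(hamming(cand, other) <= max_mismatch for other in kmers)
                  for kmers in per_seq)
        for cand in candidates
    }
    # Highest support first; ties broken alphabetically for reproducibility
    return sorted(support.items(), key=lambda item: (-item[1], item[0]))


pooled = ["AAATGTGACC", "CCTGTGAGGG", "GGGTGTGATT"]  # toy upstream regions
print(similar_kmer_support(pooled, k=5, max_mismatch=0)[0])  # ('TGTGA', 3)
```

Allowing one or more mismatches (`max_mismatch` > 0) lets degenerate copies of a site support each other, which is the behavior a motif finder needs for real binding sites; the brute-force search, however, scales poorly compared with dedicated tools.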
Since the number of binding sites used to construct the preliminary profile was relatively small (for the reason, see the Results section), we conducted one round of iteration to obtain a more representative profile of the CRP binding sites and to minimize possible bias in binding site sampling. To this end, we scanned the cyanobacterial genomes with the preliminary profile using the techniques described below, and picked the high-scoring sites from each genome to construct a more representative profile. We then used this final profile for genome-scale prediction of CRP binding sites in the cyanobacterial genomes.
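A single refinement round of this kind can be sketched as follows: build a profile from the seed sites, take the best window from each region, keep the top-scoring windows, and rebuild the profile from them. The uniform background, the forward-strand-only scan and the helper names (`build_profile`, `refine_once`, etc.) are simplifying assumptions for illustration, not details taken from the algorithm in [32, 33].

```python
import math

BASES = "ACGT"


def build_profile(sites, pseudocount=1.0):
    """Per-position base frequencies with a pseudocount added to every base."""
    total = len(sites) + 4 * pseudocount
    return [{b: (sum(s[i] == b for s in sites) + pseudocount) / total for b in BASES}
            for i in range(len(sites[0]))]


def log_odds(profile, kmer, background=0.25):
    """Log-odds score of a k-mer against a uniform background (an assumption)."""
    return sum(math.log(profile[i][b] / background) for i, b in enumerate(kmer))


def best_site(profile, region):
    """Highest-scoring window of the profile's width in one region (forward strand only)."""
    width = len(profile)
    windows = (region[i:i + width] for i in range(len(region) - width + 1))
    return max(windows, key=lambda k: log_odds(profile, k))


def refine_once(seed_sites, regions, top_n):
    """One round: scan with the seed profile, keep top_n best sites, rebuild the profile."""
    seed = build_profile(seed_sites)
    hits = [best_site(seed, r) for r in regions if len(r) >= len(seed)]
    hits.sort(key=lambda k: log_odds(seed, k), reverse=True)
    selected = hits[:top_n]
    return build_profile(selected), selected


# Toy, hypothetical seed sites and regions
profile, selected = refine_once(["TGTGA", "TGTGA", "TGCGA"],
                                ["CCTGTGACC", "CCTGTGTCC", "GGGGGGGGG"], top_n=2)
print(selected)  # ['TGTGA', 'TGTGT']
```

Because the final profile is rebuilt from sites found across many regions rather than from the handful of seeds, it is less tied to the particular TUs used to start the search, which is the motivation stated above.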
The whole-genome screening of all possible CRP binding sites was performed using an algorithm that we developed previously. This algorithm is designed to enhance prediction specificity by integrating information on the co-occurrence of binding sites in the upstream region of a gene and in the upstream regions of its orthologues in related genomes. Briefly, the final profile of CRP binding sites obtained above was first used to scan all the inter-TU regions of the cyanobacterial genomes, and the best motif was returned for each inter-TU region. Then the region 19 to 31 bp downstream of each putative binding site was further scanned for an E. coli σ70-like -10 box (TAN3T), to which a σ factor of RNA polymerase is likely to bind to transcribe the downstream TU, using a corresponding profile that we previously constructed from cyanobacterial genomes. Next, the upstream inter-TU regions of the orthologues of the genes in that TU in other cyanobacterial genomes were scanned for similar CRP binding sites and -10-like boxes. A score combining these three pieces of information was computed to rank the putative CRP binding sites for each possible CRP-regulated TU.
where l is the length of the binding sites of M, h is any substring of t of length l, h(i) is the base at the i-th position of h, p(i, b) is the relative frequency of base b at position i in M, q(b) is the background frequency of base b, and n is the number of sequences used to construct M. When computing p(i, b), a pseudocount of 1 is added to the count of each base at each position, and a is a normalization constant that keeps I_i within the range [0, 1].
For this study, we use M_1 for the CRP binding sites and M_2 for the -10-like box (TAN3T).
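The definitions above describe a profile score built from p(i, b), q(b), a pseudocount of 1 and a normalization constant a. As a hedged illustration only, the sketch below assumes one common form of such a score: a log-odds position weight matrix, rescaled to [0, 1] by the minimum and maximum scores attainable with the profile. The normalization in the original method may be defined differently, and the helper names are hypothetical.

```python
import math

BASES = "ACGT"


def profile_from_sites(sites, pseudocount=1):
    """p(i, b) with a pseudocount of 1 added to each base at each position."""
    n = len(sites)
    return [{b: (sum(s[i] == b for s in sites) + pseudocount) / (n + 4 * pseudocount)
             for b in BASES}
            for i in range(len(sites[0]))]


def normalized_score(profile, h, q=None):
    """Log-odds score of substring h rescaled to [0, 1] (assumed normalization)."""
    q = q or {b: 0.25 for b in BASES}  # background frequencies q(b)

    def weight(i, b):
        return math.log(profile[i][b] / q[b])

    raw = sum(weight(i, b) for i, b in enumerate(h))
    best = sum(max(weight(i, b) for b in BASES) for i in range(len(profile)))
    worst = sum(min(weight(i, b) for b in BASES) for i in range(len(profile)))
    if best == worst:  # uninformative profile: every substring scores the same
        return 0.0
    return (raw - worst) / (best - worst)


profile = profile_from_sites(["TGTGA", "TGTGA", "TGTGA"])  # toy sites
for candidate in ("TGTGA", "TGTGC", "CACAC"):
    print(candidate, round(normalized_score(profile, candidate), 3))
```

With this rescaling, the consensus substring scores 1, a substring that avoids the preferred base at every position scores 0, and partial matches fall in between, which makes scores comparable across profiles of different lengths such as M_1 and M_2.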
where d_{i,j,k} is the Hamming distance between the sequence found by using profile M_j in t and o_k(g_i), and l_j is the length of the binding site motif with profile M_j.
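The downstream search for the -10-like box in the screening procedure can be sketched with a plain regular expression for TAN3T in place of the profile M_2. Two assumptions are made here: the 19 to 31 bp window is counted from the first base after the binding site, and exact TAN3T matches are required rather than profile scores. The `find_minus10` name is hypothetical.

```python
import re

MINUS10 = re.compile(r"TA[ACGT]{3}T")  # TAN3T consensus of the -10-like box


def find_minus10(seq, site_start, site_len, min_gap=19, max_gap=31):
    """Start positions of TAN3T boxes lying 19-31 bp downstream of the site end."""
    seq = seq.upper()
    site_end = site_start + site_len  # first base after the binding site
    return [site_end + gap
            for gap in range(min_gap, max_gap + 1)
            if MINUS10.match(seq, site_end + gap)]


# Toy promoter: a 16 bp CRP-like site, a 20 bp spacer, then a TATAAT box
example = "TGTGA" + "N" * 6 + "TCACA" + "C" * 20 + "TATAAT" + "GG"
print(find_minus10(example, 0, 16))  # [36]
```

`Pattern.match(string, pos)` anchors the match at `pos`, so each candidate offset in the window is tested independently; a profile-based scan would instead score every 6-mer in the window with M_2 and keep the best one.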
Since is the probability of type-I error for testing the null hypothesis that I_u does not contain a binding site when is greater than a cutoff s, we used it to estimate the false positive rate of the prediction results. In this way, can also be considered an empirical p-value, and a cutoff of p < 0.01 was used for CRP binding site prediction in each genome.
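The empirical p-value described above can be sketched as the fraction of background scores at or above a given score s. Here the background is assumed to be best-site scores from randomly selected coding regions (the comparison set used for the LOR analysis); the helper names are hypothetical.

```python
import bisect


def empirical_p_value(score, background_scores):
    """Fraction of background scores >= score, an estimate of the type-I error rate."""
    ranked = sorted(background_scores)
    return (len(ranked) - bisect.bisect_left(ranked, score)) / len(ranked)


def score_cutoff(background_scores, alpha=0.01):
    """Lowest observed background score whose empirical p-value is below alpha."""
    ranked = sorted(background_scores)
    n = len(ranked)
    for s in ranked:
        if (n - bisect.bisect_left(ranked, s)) / n < alpha:
            return s
    return None  # too few background samples to resolve this alpha


background = range(1000)  # toy background scores 0..999
print(score_cutoff(background))              # 991
print(empirical_p_value(991, background))    # 0.009
```

Note that the smallest p-value such a procedure can resolve is 1/n for n background samples, so reaching p < 0.01 reliably requires well over 100 background regions per genome.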
When the 18 genes located in the five TUs in PCC6803 (Table 1) that are likely to be regulated by SyCRP1 were searched against the 12 cyanobacterial genomes that encode crp genes, we found a total of 30 orthologues in five genomes. These 30 orthologous genes are located in a total of 10 TUs in these five genomes, indicating that these TUs are not well conserved among the 12 genomes. Using phylogenetic footprinting (see Methods), we predicted a total of eight putative CRP binding sites in the 10 upstream inter-TU regions, suggesting that the CRP binding sites are largely shared by these orthologous genes. To make the profile more representative and to minimize possible bias from our original choice of the five TUs, we performed one round of genome-wide scanning using the preliminary profile constructed from these eight putative CRP binding sites (see Methods). From these preliminary scanning results, we selected a total of 112 high-scoring putative CRP binding sites (Table S1 in Additional file 2) to construct the final profile of CRP binding sites. These sites display a strong pseudo-palindromic structure with the consensus TGTGAN6TCACA (Figure 1c), which is similar to the canonical CRP binding site in E. coli, suggesting that the pattern of CRP binding sites is well conserved between cyanobacteria and E. coli. This result also agrees with the observation that the binding sites of members of the CRP/FNR superfamily maintain a high level of conservation across different lineages.
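The pseudo-palindromic structure can be checked directly: a consensus such as TGTGAN6TCACA reads the same on both strands once the N positions are treated as wildcards. The small sketch below (hypothetical helper names) tests this for the canonical and TGCGA-type consensus sequences, and for a hybrid combination of half-sites that is not palindromic.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string (N stays N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_pseudo_palindrome(motif):
    """True if motif equals its reverse complement, treating N as a wildcard."""
    rc = reverse_complement(motif)
    return all(a == b or "N" in (a, b) for a, b in zip(motif.upper(), rc))


canonical = "TGTGA" + "N" * 6 + "TCACA"   # TGTGAN6TCACA
tgcga_type = "TGCGA" + "N" * 6 + "TCGCA"  # TGCGAN6TCGCA
hybrid = "TGTGA" + "N" * 6 + "TCGCA"      # mismatched half-sites
for motif in (canonical, tgcga_type, hybrid):
    print(motif, is_pseudo_palindrome(motif))
```

This symmetry reflects the binding mode described in the Background: each subunit of the CRP dimer contacts one half-site, so the two halves are (near) reverse complements of each other.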
Table 2. Summary of the genome-wide CRP binding site predictions.
No. of TUs | % genes shared with E. coli K12¹ | Score at p < 0.05 | LOR at p < 0.05 | No. of sites predicted at p < 0.05 | Score at p < 0.01 | LOR at p < 0.01 | No. of sites predicted at p < 0.01
Although CRP has been estimated to control the expression of more than 200 TUs in E. coli, and RegulonDB (release 5.8) contains 288 experimentally verified CRP binding sites, the number of CRP binding sites predicted at p < 0.01 in each cyanobacterial genome is relatively small, ranging from 29 in BP-1 (1075 putative TUs) to 249 in ATCC29413 (3300 putative TUs). This is notable given that the E. coli K12 genome encodes a total of 2070 putative TUs (predicted by the algorithm described in Methods), and that some of these cyanobacterial genomes contain many more TUs/genes than the E. coli K12 genome does (Table 2). This result suggests that cyanobacterial CRPs might regulate fewer genes than the E. coli CRP does. Given that the DBDs of CRPs and their binding sites are highly conserved, we then asked whether the target genes of CRPs are conserved between E. coli and cyanobacteria, as well as among the 12 cyanobacterial genomes.
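Expressing these counts as fractions of TUs makes the comparison explicit. The sketch below uses only numbers quoted in the text; note that the E. coli figure is the literature estimate of more than 200 regulated TUs rather than a p < 0.01 prediction, so the comparison is approximate. The helper name is hypothetical.

```python
def percent_regulated(regulated, total):
    """Percentage of TUs that carry a predicted (or estimated) CRP site."""
    return 100 * regulated / total


counts = {
    "BP-1 (p < 0.01)": (29, 1075),
    "ATCC29413 (p < 0.01)": (249, 3300),
    "E. coli K12 (>200 TUs, estimate)": (200, 2070),
}
for name, (regulated, total) in counts.items():
    print(f"{name}: {percent_regulated(regulated, total):.1f}% of TUs")
```

This prints roughly 2.7%, 7.5% and 9.7%, respectively, so even the cyanobacterial genome with the most predicted sites falls below the lower-bound estimate for E. coli.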
Table 3. The CRP regulons are not conserved between PCC6803, PCC7120 and E. coli K12.
E. coli K12
No. of genes
No. of TUs
No. of CRP-regulated genes
382 (p < 0.05), 149 (p < 0.01)
969 (p < 0.05), 442 (p < 0.01)
No. of CRP-regulated TUs
181 (p < 0.05), 59 (p < 0.01)
537 (p < 0.05), 249 (p < 0.01)
No. of CRP-regulated genes shared with E. coli K12
7 (p < 0.05), 2 (p < 0.01)
17 (p < 0.05), 6 (p < 0.01)
Table 4. Putative CRP-regulated genes involved in different biological processes.
Photosynthesis and carbon fixation
alr0523 alr0524 alr0525
all3335 all3334 all3333 all3332
alr2210 alr2211 alr2212 alr2213
alr2118 alr2119 asr2220
sll1577 sll1578 sll1579 sll1580 ssl3093
slr1838 slr1839 slr1838
slr1452 slr1453 slr1454 slr1455 slr1457 slr1453
tlr2000 tlr2001 tlr2002 tlr2003
Various numbers of genes involved in photosynthesis and carbon fixation were predicted to have a CRP-regulated promoter in the 12 cyanobacterial genomes that encode crp genes. Specifically, a total of 14 photosystem I and II reaction center genes were predicted (at p < 0.01) to be regulated by CRP, including AM1_0526, asr0847, Ava_4451, Ava_0640, CYA_0295, CYB_2824, PMT1665, P9303_22121, P9303_22711, Syncc9605_1640, sll0634, slr1739, and Tery_4669. Several carbon dioxide-concentrating mechanism (ccmK) genes, including alr0317-0318 and slr1838-slr1839, were also predicted to have a CRP binding site. Moreover, putative CRP-regulated promoters were found for phycobiliprotein-family light-harvesting genes, including alr0523-0525, sll1577-1580, ssl3093, slr1459 and tsr0033. Consistent with these results, cellular cAMP levels have been shown to change significantly in response to environmental stimuli such as the light-dark cycle. Thus, cyanobacterial CRPs are likely to participate in regulating photosynthesis.
Three nitrogenase-related genes, AM1_2462, Ava_4669 and alr0874 in MBIC11017, ATCC29413 and PCC7120, respectively, were predicted to be CRP-regulated. Furthermore, an operon encoding a nitrate transporter (all3332-3335) in PCC7120 was predicted to have a CRP binding site. It has been reported that nitrogen starvation results in a 3–4-fold increase in the intracellular cAMP level in Anabaena variabilis [42, 43]. Based on our predictions, one possible scenario for a role of CRP in the signaling pathway of nitrogen assimilation is as follows. Nitrogen starvation increases adenylyl cyclase activity through an as yet unknown mechanism, raising the intracellular cAMP level. The activation of CRP by cAMP then lowers the transcription of genes such as the nitrate transporter (all3332-3335), while enhancing the expression of genes such as alr0874 (nitrogenase reductase) and the nitrogenase genes (AM1_2462, Ava_4669), as the cell switches to the more energy-intensive fixation of atmospheric nitrogen gas. Nonetheless, an in-depth study is needed to elucidate the role that CRP may play in nitrogen fixation in these cyanobacteria. Since not all cyanobacterial species are capable of nitrogen fixation, this role of CRP would be restricted to nitrogen-fixing species such as MBIC11017, ATCC29413 and PCC7120.
A few genes coding for transporters and porins were predicted to be CRP-regulated, including several ion transporters in PCC7120 (alr2210-2213 and alr2118-2120) and PCC6803 (slr1392 and slr1950). In addition, several antiporters and ABC transporters were predicted to be CRP-regulated in various genomes, e.g. CYA_2315, Ava_0687, sll0240, slr1452-1457 and tll0559.
Dozens of genes coding for kinases and two-component signal transduction systems were predicted to be CRP-regulated, suggesting that CRP might play an important role in responses to environmental changes in cyanobacteria. Interestingly, the CRPs in PCC6803 have been reported to be involved in phototaxis, as both sycrp1 and adenylyl cyclase mutants showed impaired phototaxis [9, 29]. However, the genes involved in signal transduction for pilus assembly and phototaxis that were listed in a previous study were not predicted to be CRP-regulated by our algorithm; they might therefore be regulated by CRP indirectly.
Among the top hits of our predictions, a large portion of the putative CRP-regulated genes are species- or lineage-specific. The functions of these genes vary from genome to genome, such as type IV pilus synthesis in PCC6803 (slr1667-1668 and slr2015-2018) and MBIC11017 (AM1_3323-3324); transposases in PCC7120 (all3624), BP-1 (tll2385) and IMS101 (Tery_0925); a methyltransferase in MBIC11017 (AM1_5474); aldo/keto reductases in A/B-Prime (CYA_0976 and CYB_2928); nblA in BP-1 (tsr0033); a TPR repeat-containing protein in ATCC29413 (Ava_3483); a peptidase in MIT9313 (PMT1940); and a nuclease in CC9311 (sync_1258). However, the functions of many other species- or lineage-specific putative CRP-regulated genes are largely unknown; most of them are annotated as hypothetical proteins, such as slr0442, sll1268, ssr2848 and sll1924 in PCC6803, asr4669 in PCC7120, AM1_3950-3951, AM1_4103, AM1_4957 and AM1_2209-2210 in MBIC11017, CYA_0127/CYB_2776 in A/B-Prime, Tery_2530 and Tery_1044 in IMS101, Ava_3757 in ATCC29413, PMT1492 and PMT1223 in MIT9313, P9303_06191, P9303_04111 and P9303_12031 in MIT9303, sync_093 and sync_1261 in CC9311, and Syncc9605_0955 and Syncc9605_0452 in CC9605. It would be interesting to characterize the functions of these genes experimentally, as well as the roles that CRP plays in their transcriptional regulation.
We have previously shown that when a genome loses a TF in the course of evolution, it rapidly loses the cognate binding sites in its inter-TU regions [32, 33]. To test whether this conclusion extends to CRP, we applied our CRP binding site prediction algorithm to the 17 cyanobacterial genomes that do not encode a CRP orthologue. Indeed, in four genomes, namely CC9902, RCC307, WH8102 and WH7803 (Figure S2 in Additional file 3), the LOR function oscillates around zero as the score s increases, indicating that there is no significant difference between the signal of CRP binding sites in the inter-TU regions and that in randomly selected coding regions (see Methods). These results strongly suggest that these four genomes are unlikely to contain functional CRP binding sites, in agreement with our previous observations [32, 33]. However, there is a clear CRP binding site signal in the remaining 13 genomes, as indicated by their relatively high LOR at high scores s, suggesting that CRP-like binding sites exist in the inter-TU regions of these genomes. The reason for this unexpected observation is unknown, but one explanation would be that these sites are bound by a different regulator in these genomes. To identify TFs in these genomes that are likely to bind these CRP-like binding sites, we analyzed the distribution of the CRP/FNR superfamily and found that all of these genomes encode at least one member of the superfamily. These members are likely to recognize the CRP-like binding sites, as members of the CRP/FNR superfamily have been shown to recognize similar consensus sequences, with binding specificity achieved through competitive binding among the members.
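The LOR curve referred to above can be sketched as a log odds ratio comparing, at each score threshold s, how often inter-TU regions and randomly selected coding regions have a best site scoring at least s. The exact definition used in the analysis may differ; the 2 x 2 odds-ratio form, the 0.5 pseudocount and the function names below are assumptions for illustration.

```python
import math


def tail_counts(scores, s):
    """(number of regions scoring >= s, number scoring below s)."""
    above = sum(x >= s for x in scores)
    return above, len(scores) - above


def log_odds_ratio(inter_scores, coding_scores, s, pseudo=0.5):
    """ln of the 2 x 2 odds ratio: inter-TU vs coding regions, score >= s vs < s."""
    a, b = tail_counts(inter_scores, s)
    c, d = tail_counts(coding_scores, s)
    return math.log(((a + pseudo) * (d + pseudo)) / ((b + pseudo) * (c + pseudo)))


def lor_curve(inter_scores, coding_scores, thresholds):
    """LOR at each score threshold; values near zero mean no enrichment."""
    return [(s, log_odds_ratio(inter_scores, coding_scores, s)) for s in thresholds]


# Toy, hypothetical best-site scores
inter = [0.9] * 10 + [0.1] * 10  # half of the inter-TU regions carry a strong site
coding = [0.1] * 20              # background regions carry none
print(lor_curve(inter, coding, [0.5]))  # LOR = ln(41), clearly above zero
print(lor_curve(coding, coding, [0.5])) # identical distributions give LOR = 0
```

Under this reading, a genome that has lost both CRP and its sites behaves like the second call (LOR hovering around zero at every threshold), while a genome retaining CRP-like sites behaves like the first.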
Studies have shown that CRP in E. coli functions as an important global regulator controlling the expression of genes in many pathways, such as the carbon and energy metabolism pathways. CRP has also been reported to regulate a variety of genes in other bacterial species. For instance, a recent study showed that a CRP-like protein regulates genes involved in quorum sensing, motility and intestinal colonization in Vibrio cholerae. Our results suggest that CRP in cyanobacteria has distinct functions. First, the members of the CRP regulons in cyanobacteria have little in common with those in E. coli (Table 3), which is consistent with the observation that the genes whose expression is most affected by sycrp1 disruption in PCC6803 are involved in type IV pilus synthesis [8, 25, 29]. In contrast, genes involved in carbon and energy metabolism, as seen in E. coli, are not significantly affected by sycrp1 disruption [8, 25, 29]. Second, cyanobacterial CRPs seem to regulate distinct sets of genes specific to each lineage or strain, as we cannot clearly identify a particular set of CRP-controlled genes common to most of the 12 cyanobacterial genomes (Figure 3, Table S2 in Additional file 2). Third, more than half (17) of the 29 completely sequenced cyanobacterial genomes do not encode a CRP orthologue, suggesting that CRP is dispensable in these strains/species. Within a closely related group of species, it is unlikely that an essential global regulator could be replaced or lost in some genomes while being retained in others. Therefore, we conclude that CRP is not a global regulator with conserved functions in cyanobacteria; instead, it is likely a lineage-specific global regulator.
In fact, it has been shown that global regulators are not necessarily conserved even in moderately related species. For instance, the 7 global regulators of E. coli and the 6 global regulators of B. subtilis have none in common. Furthermore, it is not surprising that CRP in cyanobacteria does not function as a conserved global regulator as it does in E. coli, given that CRP is mainly involved in regulating genes related to carbon metabolism in E. coli, whereas organic carbon assimilation is not a constraint on the growth of autotrophic cyanobacteria. On the other hand, since nitrogen assimilation is a constraint for cyanobacteria, the nitrogen assimilation regulator NtcA, a member of the CRP/FNR family that is encoded in all 29 sequenced genomes, has been characterized as one of the conserved global regulators in cyanobacteria. Thus, global regulators appear to be often lineage-specific, and the environment seems to play a vital role in determining which TF functions as a global regulator and which genes it regulates.
It has been reported that SyCRP1 regulates cell motility in PCC6803, as sycrp1 disruptants were non-motile and showed reduced type IV pilus biogenesis. In another study, microarray expression profiling showed that the operons slr1667-slr1668 and slr2015-slr2018 in PCC6803, which are involved in type IV pilus biosynthesis [29, 45], were down-regulated in sycrp1 disruptants. Consistent with these findings, we predicted putative CRP binding sites for these genes. However, the slr1667-slr1668 genes are unique to PCC6803, and orthologues of slr2015-slr2018 could be found only in MBIC11017 (Table S2 in Additional file 2) among the 29 cyanobacterial genomes. Thus, this function of CRP is likely to be restricted to these two species/strains. In addition, based on our predictions of CRP regulons in the 12 cyanobacterial genomes, we argue that CRP might also be involved in other functions in different cyanobacterial lineages/strains, including carbon fixation, photosynthesis, nitrogen fixation, ion channels/transporters and two-component signal transduction (Table 4, Tables S3-S14 in Additional file 2). Furthermore, as a large portion of our predicted CRP binding sites are associated with hypothetical proteins, CRP might be involved in regulating other novel functions yet to be discovered. In this regard, we have provided a set of candidates for further experimental characterization. However, owing to the lack of sufficient information about the functions of the CRP target genes, it is currently difficult to derive a general pathway model involving CRP regulons in cyanobacteria. Lastly, AnCrpA was shown to regulate the expression of genes related to nitrogen fixation in PCC7120, including all1517 (nifB), all1439, all1432 (hesA), alr2515 (coax-II) and alr2834 (hepC).
However, our algorithm failed to find high-scoring CRP binding sites for any of these genes; possible reasons are discussed below.
Our failure to identify high-scoring CRP binding sites in the upstream regions of these nitrogen fixation genes in PCC7120 is actually in agreement with Suzuki and coworkers, who suggested that AnCrpA might have a binding site pattern different from the conventional pseudo-palindromic motif. However, an in vitro binding assay using EMSA showed that AnCrpA could bind to the conventional palindromic motif. A possible explanation for this inconsistency is that AnCrpA in PCC7120 could form a monomer or a heterodimer in addition to a homodimer, given that two CRP homologues are encoded in this genome and one of them lacks a DBD. Such a monomer or heterodimer might favor a non-canonical CRP binding site, while the homodimer retains its ability to bind the conventional palindromic motif. Clearly, a more in-depth study is needed to test this hypothesis. This explanation is consistent with our algorithm predicting many statistically significant CRP binding sites for other genes in PCC7120 (Table S7 in Additional file 2).
Because the crp gene is widely distributed in many distantly related bacterial groups, including actinobacteria, aquificales, bacteroidetes, chlamydiae, chloroflexi, cyanobacteria, deinococci, firmicutes, fibrobacteres, planctomycetes, proteobacteria and spirochaetes (Figure S1 in Additional file 1), it was likely present in the common ancestor of the extant eubacterial lineages. Under this scenario, the crp gene in this ancestor was flexible enough to regulate different sets of genes. Alternatively, the crp gene may have originally evolved in a specific bacterial lineage and subsequently spread to other groups via horizontal gene transfer (HGT). Such HGT events may benefit the recipient organism, given the flexibility of CRP in regulating different biochemical activities, which are often coupled to environmental stimuli that lead to the generation of the intracellular signaling molecule cAMP. Under either scenario, the crp gene in the ancestor of modern cyanobacteria was acquired by vertical inheritance or through an ancient HGT event from another lineage. Some cyanobacteria lost their crp genes, possibly because harboring the gene did not increase their fitness in their new environments, whereas in other cyanobacteria the crp genes were adapted to better meet their unique physiological and environmental requirements. In other groups of bacteria, CRP evolved to regulate other lineage- or species-specific functions; for instance, it regulates carbon and energy metabolism in E. coli, cell-cell communication in Stenotrophomonas maltophilia and quorum sensing in Vibrio cholerae.
Evolution of cis-regulatory binding sites is an interesting but not well-studied problem. The 17 cyanobacterial genomes that do not encode a CRP orthologue provided an excellent opportunity to examine the degradation of CRP binding sites. We expected high-scoring CRP binding sites to be absent from these genomes, as previous studies have indicated that binding sites rapidly fade out when the corresponding TF is lost during evolution [32, 47]. We do see such fading in the CC9902, RCC307, WH7803 and WH8102 genomes (Figure S2 in Additional file 3). These observations are consistent with the well-accepted rule that "if a genome lacks a TF, it also lacks the corresponding binding sites". Surprisingly, however, in the remaining 13 genomes, high-scoring CRP-like binding sites seem to appear in the inter-TU regions with a higher probability than in randomly selected coding regions (Figure S2 in Additional file 3), suggesting that sequence patterns similar to CRP binding sites exist in these cyanobacterial genomes. A possible explanation for this unexpected observation is that these genomes encode TFs that recognize binding sites similar to those of CRP. Indeed, at least one member of the CRP/FNR superfamily is encoded in each of these 13 genomes, and it has been shown that the binding sites of members of this TF superfamily are well conserved across many species spanning a wide range of evolutionary distances. Thus, the CRP-like binding sites found in these genomes are likely to be recognized by these non-CRP members of the superfamily. The specificity of these similar binding sites is likely achieved through competition among homologous TFs for the same binding site, which is governed by thermodynamic equilibrium.
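The comparison between inter-TU regions and randomly selected coding regions can be sketched as a test for enrichment of high-scoring hits. The sketch assumes that each region has already been reduced to its best PWM score and that a single score threshold defines a "high-scoring" site. The one-sided Fisher's exact test is a swapped-in illustration, not necessarily the statistic behind Figure S2, and the scores in the example are made up; all function names are hypothetical.

```python
from math import comb


def count_hits(scores, threshold):
    """Number of regions whose best score reaches the threshold, and the total."""
    return sum(1 for s in scores if s >= threshold), len(scores)


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]],
    testing whether row 1 (hits a, non-hits b) is enriched for hits
    relative to row 2 (hits c, non-hits d)."""
    row1 = a + b
    hits = a + c
    total = a + b + c + d
    # Upper tail of the hypergeometric distribution: P(X >= a).
    tail = sum(comb(hits, x) * comb(total - hits, row1 - x)
               for x in range(a, min(hits, row1) + 1))
    return tail / comb(total, row1)


def compare_regions(inter_tu_scores, coding_scores, threshold):
    """Hit fractions in inter-TU vs coding regions plus a one-sided p-value."""
    a, n1 = count_hits(inter_tu_scores, threshold)
    c, n2 = count_hits(coding_scores, threshold)
    return {
        "inter_tu_fraction": a / n1,
        "coding_fraction": c / n2,
        "p_value": fisher_greater(a, n1 - a, c, n2 - c),
    }


if __name__ == "__main__":
    # Made-up best-hit scores, purely for illustration.
    inter_tu = [9.1, 3.2, 8.7, 10.4, 2.0]
    coding = [1.0, 7.9, 2.5, 3.3, 0.4]
    print(compare_regions(inter_tu, coding, threshold=8.0))
```

In practice one would sweep the threshold or compare the full score distributions rather than rely on a single cutoff.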
In this paper, we have predicted CRP binding sites in 12 cyanobacterial genomes that encode a CRP orthologue using a highly accurate motif scanning algorithm. Based on the analysis of these predictions as well as the experimental data available to us, we conclude that 1) CRP has rather different functions in cyanobacteria than in E. coli; 2) cyanobacterial CRP has a very diverse spectrum of functions in different lineages or species/strains, and is even dispensable in some species/strains; 3) CRPs in modern cyanobacteria are likely to be vertically inherited from their last common ancestor, and some cyanobacteria lost their crp genes during evolution as they adapted to new environments; and 4) once the crp gene is lost, its binding sites degrade rapidly. Although many of our predictions still await experimental verification, we believe they provide a high-quality candidate set for further experimental characterization of CRP binding sites and regulons in this important group of bacteria.
ORF: open reading frame
BBH: bidirectional best hit
CRP: cAMP receptor protein
DBD: DNA binding domain
EMSA: electrophoretic mobility shift assay
HGT: horizontal gene transfer
Acaryochloris marina MBIC11017
Anabaena variabilis ATCC 29413
Synechococcus sp. JA-3-3Ab
Synechococcus sp. JA-2-3B'a(2-13)
Gloeobacter violaceus PCC 7421
Nostoc sp. PCC 7120
Prochlorococcus marinus CCMP1375
Prochlorococcus marinus MED4
Prochlorococcus marinus AS9601
Prochlorococcus marinus MIT 9211
Prochlorococcus marinus MIT 9215
Prochlorococcus marinus MIT 9301
Prochlorococcus marinus MIT 9312
Prochlorococcus marinus MIT9303
Prochlorococcus marinus MIT9313
Prochlorococcus marinus MIT 9515
Prochlorococcus marinus NATL1A
Prochlorococcus marinus NATL2A
Synechococcus elongatus PCC 7942
Synechococcus elongatus PCC 6301
Synechococcus sp. RCC307
Synechococcus sp. WH 7803
Synechococcus sp. WH8102
Synechococcus sp. CC9605
Synechococcus sp. CC9902
Synechococcus sp. CC9311
Synechocystis sp. PCC 6803
Thermosynechococcus elongatus BP-1
Trichodesmium erythraeum IMS101
This research was supported by a start-up fund from the University of North Carolina at Charlotte to Z.S. We would like to thank Drs. Anthony Fodor, Devaki Bhaya, and Jingling Huang for their critical reading of this manuscript and suggestions. We would also like to thank the two anonymous reviewers whose comments have greatly improved this paper.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.