Ancient conserved domains shared by animal soluble guanylyl cyclases and bacterial signaling proteins
© Iyer et al 2003
Received: 4 December 2002
Accepted: 3 February 2003
Published: 3 February 2003
Soluble guanylyl cyclases (SGCs) are dimeric enzymes that transduce signals downstream of nitric oxide (NO) in animals. They sense NO by means of a heme moiety that is bound to their N-terminal extensions.
Using sequence profile searches we show that the N-terminal extensions of the SGCs contain two globular domains. The first of these, the HNOB (Heme NO Binding) domain, is a predominantly α-helical domain and binds heme via a covalent linkage to histidine. Versions lacking this conserved histidine are likely to interact with heme non-covalently. We detected HNOB domains in several bacterial lineages, where they occur fused to methyl-accepting domains of chemotaxis receptors or as standalone proteins. The standalone forms are encoded by predicted operons that also contain genes for two-component signaling systems and GGDEF-type nucleotide cyclases. The second domain, the HNOB associated (HNOBA) domain, occurs between the HNOB and the cyclase domains in the animal SGCs. The HNOBA domain is also detected in bacteria and is always encoded by a gene that occurs in the neighborhood of a gene for an HNOB domain.
The HNOB domain is predicted to function as a heme-dependent sensor for gaseous ligands, and transduce diverse downstream signals, in both bacteria and animals. The HNOBA domain functionally interacts with the HNOB domain, and possibly binds a ligand, either in cooperation, or independently of the latter domain. Phyletic profiles and phylogenetic analysis suggest that the HNOB and HNOBA domains were acquired by the animal lineage via lateral transfer from a bacterial source.
Binding and recognizing diverse small molecules is a central aspect of signal transduction mechanisms in all cellular life forms. A variety of environmental small molecules, namely nutrients, xenobiotics and first messengers, are recognized by cells with the help of specialized protein sensors on their surfaces [1, 2]. Additionally, cells use protein-bound small molecules, such as FAD, cinnamic acid, tetrapyrroles and heme, as sensors of photons and the ambient redox states [3, 4]. Within cells, small molecules, such as cyclic nucleotides, are used as intracellular messengers to transduce signals arising from a variety of stimuli [1, 2]. Over the past few years a combination of protein sequence analysis and biochemical studies has revealed several unifying principles that govern the recognition of small molecules by cells [4–9]. A significant component of small molecule-protein interactions is mediated via a relatively small set of ancient conserved protein domains that bind their ligands using specialized pockets [4, 10]. Certain protein folds have given rise to several large superfamilies of ligand-binding domains. These include the PAS-like fold, which is the scaffold of the PAS [11, 12], GAF, and probably the CACHE superfamilies, and the ACT-like fold, which is seen in ACT, ferredoxin and related ligand-binding domains [4, 15]. Identification of such conserved domains has often improved our understanding of the general structural and biochemical features that underlie protein-small molecule interactions.
Typically, these small molecule-binding domains (SMBDs) are combined, within the same polypeptide, with a number of conserved domains that are directly involved in signal transduction. These include enzymatic domains such as nucleotide cyclases, histidine and serine/threonine kinases, cNMP phosphodiesterases, receiver domains and several different kinds of ATPases [3, 4]. Non-catalytic domains, like DNA- or RNA-binding modules (e.g. the helix-turn-helix domains), methyl-accepting domains, and FHA domains that bind phosphorylated peptides, are also often fused to SMBDs [3, 4].
The soluble guanylyl cyclases are important signaling molecules in animals that transduce signals mediated by the gaseous first messenger, nitric oxide [16–19]. In animals, NO functions as a neurotransmitter in both the central and peripheral nervous systems. NO is released by a cell through the action of the nitric oxide synthase (NOS) on its substrate, the amino acid arginine. NO diffuses through the neighboring cells and elicits its functions by activating a soluble guanylyl cyclase (SGC) that synthesizes cGMP using GTP as a substrate [16, 18]. The animal SGCs are dimeric enzymes composed of two homologous subunits, α and β. The sequence similarity between these subunits extends throughout their entire length, and they contain a long extension N-terminal to the cNMP-generating catalytic domain. The N-terminal region of the β subunit forms a covalent link with heme via a histidine (H105 in human SGC1β). Though the porphyrin is identical to that found in hemoglobin, it binds only NO and carbon monoxide, and not oxygen, even though O2 is present at much higher concentrations within the cell [22, 23]. This ligand specificity of the NO-binding domain of animal SGCs is very similar to that of bacterial cytochrome c', which likewise binds only NO and CO. The N-terminal region of the α subunit has been shown to bind certain synthetic pyrazolopyridine ligands, which act as potent NO-independent agonists of cyclase activity [25, 26].
Using sensitive sequence profile analysis we show that the heme-binding domain of the animal SGCs defines a novel family of ligand binding domains that are also widely distributed in bacteria. The SGCs also contain a second globular domain situated in between the heme-binding and cyclase domains. We show that this domain is also present in the bacteria, and present evidence that these two domains are likely to function together both in eukaryotes and bacteria, and activate a diverse set of downstream signals.
Though the α and β subunits of SGCs are homologous throughout their entire length, only the β subunit contains the critical heme-binding residue. As this long homologous N-terminal extension of SGC subunits does not map to any previously characterized domains, we sought to investigate its provenance through sequence analysis methods. The length and conservation pattern of the N-terminal extension of the SGCs suggested that it is likely to comprise at least two globular domains. This conjecture was also supported by an analysis of sequence complexity of the polypeptide using the SEG program.
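The idea behind a sequence-complexity scan can be illustrated with a simple sliding-window Shannon entropy. This is only a sketch of the general principle and not the SEG algorithm itself, which uses a more elaborate compositional complexity measure; the function name and the window length of 12 are assumptions made for illustration. Stretches of consistently high entropy are compatible with globular regions, whereas runs of low entropy indicate compositionally biased, probably non-globular segments.

```python
import math
from collections import Counter
from typing import List


def window_entropy(seq: str, window: int = 12) -> List[float]:
    """Shannon entropy (bits) of residue composition in each sliding window.

    Illustrative stand-in for a complexity scan; NOT the SEG algorithm.
    """
    scores = []
    for i in range(len(seq) - window + 1):
        counts = Counter(seq[i:i + window])
        # Entropy is 0 for a homopolymer and log2(window) when all residues differ.
        h = -sum((c / window) * math.log2(c / window) for c in counts.values())
        scores.append(h)
    return scores
```

Plotting these scores along a polypeptide would show low-complexity linkers as dips between high-entropy blocks, which is the kind of pattern used to support a two-domain interpretation.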
A PSI-BLAST search was initiated with the conserved N-terminal extension of the SGC (human SGC1β, gi: 4504215, region 1–360), using an inclusion threshold of 0.01 and compositional bias-based statistics to eliminate false positives arising from peculiarities of sequence composition. Both the N- and the C-terminal parts of this extension gave several distinct hits to different bacterial proteins, supporting the presence of two distinct globular domains in this extension. Based on these hits we divided the extension into N- and C-terminal parts and initiated separate PSI-BLAST searches with them. Searches with the N-terminal part of the extension gave significant hits to bacterial proteins 180–195 residues in length within the first 3 iterations (e.g. Mdge1313 from Microbulbifer degradans is detected with an expect-value (e) of 10⁻⁴ in the first iteration). This region of similarity encompasses the entire length of these bacterial proteins, includes the heme-liganding histidine of the SGC β subunits, and is likely to define a distinct globular domain. In order to probe this domain further we isolated this region from SGC1β (1–185) and seeded a PSI-BLAST search that was run to convergence. At convergence, this search detected, in addition to the soluble guanylyl cyclases from the animals, several bacterial proteins, including certain methyl-accepting chemotaxis receptors from bacteria such as Desulfovibrio and Clostridium. Transitive searches initiated with the small bacterial proteins that entirely correspond to the N-terminal-most domain of the SGCs also recovered the same set of proteins as observed in the earlier search, with e < 0.01. These observations suggest that the N-terminal-most domain of the SGCs is evolutionarily mobile: it occurs in several distinct contexts with other domains in the same polypeptide, or in a stand-alone form.
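The "iterate until convergence" logic of such a profile search can be sketched as below. This is a minimal abstraction, not PSI-BLAST itself: the real program builds a position-specific scoring matrix from the included sequences, whereas here the profile is represented simply by the set of included sequence identifiers, and `search` is a hypothetical callable supplied by the user. The toy hit table and all identifiers in it are invented for illustration; only the 0.01 inclusion threshold comes from the text.

```python
from typing import Callable, Dict, FrozenSet, Tuple

INCLUSION_E = 0.01  # inclusion threshold stated in the text


def iterate_to_convergence(
    search: Callable[[FrozenSet[str]], Dict[str, float]],
    seed_id: str,
    inclusion_e: float = INCLUSION_E,
    max_iterations: int = 50,
) -> Tuple[FrozenSet[str], int]:
    """Repeat the search until no new sequence passes the inclusion threshold."""
    included = frozenset([seed_id])
    for iteration in range(1, max_iterations + 1):
        hits = search(included)  # maps sequence id -> e-value
        passing = {sid for sid, e in hits.items() if e < inclusion_e}
        new_included = included | passing
        if new_included == included:  # converged: the profile would not change
            return included, iteration
        included = new_included
    return included, max_iterations


def toy_search(profile: FrozenSet[str]) -> Dict[str, float]:
    """Hypothetical hit table standing in for a database search."""
    table = {
        frozenset({"query"}): {"query": 0.0, "bactA": 1e-4, "far": 5.0},
        frozenset({"query", "bactA"}): {"query": 0.0, "bactA": 1e-6, "mcp": 1e-3, "far": 2.0},
    }
    return table.get(profile, {"query": 0.0, "bactA": 1e-8, "mcp": 1e-5, "far": 1.0})


members, n_iter = iterate_to_convergence(toy_search, "query")
```

In the toy run, a distant homolog ("mcp") is only recovered once a closer bacterial sequence has been added to the profile, which mirrors how iterative searches reach more divergent family members.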
PSI-BLAST searches initiated with the C-terminal part of the SGC-specific extension (the region in between the above-detected N-terminal-most domain, and the C-terminal cyclase domain; human SGC1β, region: 200–370) recovered homologous regions from all other animal soluble guanylyl cyclases, and, additionally, N-terminal regions of histidine kinases from Nostoc and Anabaena species (e.g. gi: 17229771, e = 10⁻⁴ in iteration 1) and a diguanylate cyclase from Rhodobacter sphaeroides (gi: 22958462, e = 10⁻³ in iteration 3). The region of similarity shared by these bacterial proteins and animal SGCs encompassed more-or-less the entire middle region, which is present between the two other globular domains. Reciprocal searches with the corresponding region from the cyanobacterial histidine kinases recovered the animal SGCs, thereby confirming the evolutionary relationship between these regions. These observations suggest that the middle region of the animal SGCs defines a second evolutionarily mobile globular domain that is shared with several distinct bacterial proteins.
All versions of the HNOBA domain, except one from Rhodobacter sphaeroides, contain a conserved histidine between the core region and the long C-terminal α-helix that links it to a kinase or cyclase domain (Fig. 2). However, studies on the SGCs provide no evidence for a second covalent heme-binding site, implying that this histidine could have some other function. Interestingly, binding studies on a synthetic pyrazolopyridine SGC agonist, BAY 41-2272, have suggested a role for the region 236–290 of the human SGC1 α-subunit in interacting with this ligand [25, 26]. Though actual cross-linking occurred with cysteine residues situated just N-terminal to the HNOBA domain, it is likely that the rest of the domain is involved in contact with the ligand. This raises the possibility that the HNOBA domain binds some, as yet unknown, endogenous ligand. In such a scenario it could participate in allosteric regulation of the SGCs, similar to the GAF domains of the cyclic nucleotide phosphodiesterases [6, 13].
Genes whose products closely interact with each other in macromolecular complexes or biochemical pathways often cluster together in prokaryotic genomes. These co-functional gene clusters or operons often provide contextual information that may throw light on the functions of uncharacterized proteins [31, 32]. In order to obtain a better understanding of the HNOB domains, we analyzed the neighborhoods of the genes encoding them in bacteria. Genes encoding practically all the solo versions of the HNOB domain occur in the same predicted operon as a histidine kinase or a GGDEF domain (diguanylate cyclase) [33–35]. In some cases these operons also encode receiver domain proteins of the two-component systems, PP2C phosphatases or cyclic diguanylate phosphodiesterases (Fig. 3). Interestingly, the histidine kinases from the two cyanobacteria, and the GGDEF domain protein from Rhodobacter, which co-occur with solo HNOB domain proteins, contain HNOBA domains at their extreme N-terminus (Fig. 3). Thus, even in the bacteria, the HNOBA domain always appears to function in association with the HNOB domain. Given that these cyanobacterial HNOB proteins are also the closest bacterial relatives of the animal HNOB domains, it appears likely that the animal lineage acquired a related assemblage of HNOB and HNOBA domains as a single piece. Thus, the HNOB and HNOBA domains resemble the functionally similar CACHE and CHASE domains, which have also been acquired by certain eukaryotic lineages via lateral transfer from bacteria [14, 36, 37]. A phylogenetic analysis of the guanylyl cyclase domains of the animal SGCs shows that their closest relatives are cyclases from various bacteria such as cyanobacteria and Leptospira (data not shown). Taken together, these observations support a potential bacterial origin for these components of the nitric oxide signaling pathway. This is also consistent with the presence of orthologs of the animal NO synthases in several bacteria [38, 39].
The co-occurrence of the HNOB and the HNOBA domains in either the same protein or proteins encoded by the same operon suggests a strong functional interaction between them. Studies on the human SGC1 suggest that the synthetic pyrazolopyridine ligand binds at a site distinct from NO and heme, but requires the heme-binding site for its action [25, 26]. This could imply a synergy or cooperation between these domains in heme interaction and cyclase activation. The majority of the bacterial HNOB domain proteins are predicted to covalently bind heme. Those versions that lack the heme-binding histidine might either non-covalently interact with heme, as in the case of the SGC α-subunits, or bind some other unknown ligand. They are likely to function as sensors of diffusible gaseous ligands that activate at least three distinct downstream signaling pathways: namely, phosphotransfer through the two-component systems, methyl ester-dependent chemotactic response, and cyclic nucleotide signaling through the diguanylate cyclases/phosphodiesterases. However, further experimental studies would be required to determine if they function as NO or CO sensors, like the animal proteins, or as oxygen sensors.
Using sequence profile searches we identify two conserved domains in the N-terminal extensions of the soluble guanylyl cyclases from animals. One of these, the HNOB domain, contains the heme-binding site of the SGCs and defines a family of small-molecule binding domains occurring exclusively in animals and bacteria. In bacteria the HNOB domain occurs either as a standalone version or fused to the HAMP and methyl-accepting domains of chemotaxis receptors. The bacterial solo versions co-occur in predicted operons with genes for two-component systems and diguanylate cyclases or phosphodiesterases, and are likely to transmit signals to these downstream molecules. The second conserved domain of SGCs, the HNOBA domain, is always associated with the HNOB domain either in the same polypeptide or in polypeptides encoded by neighboring genes in an operon. It is predicted to adopt an α + β fold that is possibly related to the PAS-like fold. The HNOB and HNOBA domains are likely to function synergistically or cooperatively. The latter may either cooperate in binding heme or might bind distinct ligands.
The phyletic pattern of these domains, and phylogenetic analysis of the HNOB domain suggest that the animal versions were probably acquired through lateral transfer from a bacterial source. Identification of the HNOB and HNOBA domains could help in further investigations of the SGCs that are critical components of the nitric oxide signaling pathway in animals. Furthermore, investigating their role in bacteria, such as Vibrio cholerae, could be important in understanding the sensory mechanisms of human pathogens.
The non-redundant (NR) database of protein sequences (National Center for Biotechnology Information, NIH, Bethesda) was searched using the BLASTP and PSI-BLAST programs. Profile searches using the PSI-BLAST program were conducted with either a single sequence or an alignment as the query, with a profile inclusion expectation (E) value threshold of 0.01, and were iterated until convergence. Multiple alignments were constructed using the T_Coffee program, followed by manual correction based on the PSI-BLAST results. Protein secondary structure was predicted using a multiple alignment as the input for the JPRED and PHD programs [42–44]. Preliminary clustering of proteins was done using the BLASTCLUST program with empirically determined length and score threshold cutoff values (for documentation see ftp://ftp.ncbi.nih.gov/blast/documents/README.bcl). Previously known conserved domains were identified using PSI-BLAST-derived profiles for them, with the RPS-BLAST program. Sequence-structure threading was performed using the 3DPSSM server (http://www.sbg.bio.ic.ac.uk/~3dpssm/). Gene neighborhoods were obtained by isolating all conserved genes, in the neighborhood of the gene under consideration, which showed a separation of less than 70 nucleotides between their termini. Genes fulfilling this criterion and occurring in the same direction were considered likely to form operons.
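The operon criterion stated above (same orientation, less than 70 nucleotides between the termini of adjacent genes) can be sketched as follows. This is an illustrative reading of the rule and not the authors' actual pipeline: the `Gene` record, the coordinate convention (1-based, inclusive) and the example coordinates are assumptions, and the additional requirement that the genes be conserved across genomes is not modeled.

```python
from dataclasses import dataclass
from typing import List

MAX_GAP = 70  # nucleotides; threshold stated in the text


@dataclass
class Gene:
    name: str
    start: int   # leftmost coordinate (1-based, inclusive); assumed convention
    end: int     # rightmost coordinate (inclusive)
    strand: str  # "+" or "-"


def predicted_operons(genes: List[Gene], max_gap: int = MAX_GAP) -> List[List[str]]:
    """Group adjacent same-strand genes separated by fewer than max_gap nucleotides."""
    ordered = sorted(genes, key=lambda g: g.start)
    groups: List[List[Gene]] = []
    for gene in ordered:
        if groups:
            prev = groups[-1][-1]
            gap = gene.start - prev.end - 1  # negative when the genes overlap
            if gene.strand == prev.strand and gap < max_gap:
                groups[-1].append(gene)
                continue
        groups.append([gene])
    return [[g.name for g in group] for group in groups]


# Hypothetical neighborhood: a solo HNOB gene followed closely by a histidine kinase.
example = [
    Gene("hnob", 100, 660, "+"),
    Gene("hisK", 700, 2500, "+"),   # 39 nt downstream, same strand
    Gene("orfX", 2700, 3000, "+"),  # 199 nt gap: starts a new group
    Gene("orfY", 3010, 3500, "-"),  # close but on the opposite strand
]
groups = predicted_operons(example)
```

Single-gene groups are returned as well, so a caller interested only in candidate operons would keep groups with two or more members.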
Phylogenetic analysis was performed using the neighbor-joining or least squares method, followed by local rearrangements using the maximum likelihood algorithm to predict the most likely tree. The robustness of tree topology was assessed with 10,000 Resampling of Estimated Log Likelihoods (RELL) bootstrap replicates. The MOLPHY and Phylip software packages were used for phylogenetic analyses [46, 47].
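The principle of RELL bootstrapping can be sketched as below: instead of re-estimating each tree on every pseudo-replicate, the per-site log-likelihoods already computed for each candidate topology are resampled with replacement and summed, and the fraction of replicates in which a topology scores best is taken as its support. This is a minimal illustration under stated assumptions, not the MOLPHY implementation; the function name, the input layout (one list of per-site log-likelihoods per tree) and the fixed random seed are choices made here for the example.

```python
import random
from typing import List, Sequence


def rell_bootstrap(
    site_lnl: Sequence[Sequence[float]],
    replicates: int = 10_000,
    seed: int = 0,
) -> List[float]:
    """Return, for each candidate tree, the fraction of replicates it wins.

    site_lnl[t][s] is the log-likelihood of alignment site s under tree t.
    """
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])
    rng = random.Random(seed)
    wins = [0] * n_trees
    for _ in range(replicates):
        # Resample alignment sites with replacement; reuse stored site likelihoods.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(tree[i] for i in sample) for tree in site_lnl]
        best = max(range(n_trees), key=totals.__getitem__)  # ties go to the first tree
        wins[best] += 1
    return [w / replicates for w in wins]
```

Because no likelihood re-optimization is needed per replicate, large numbers of replicates such as the 10,000 mentioned above are inexpensive to compute.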
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.