Evolution of hedgehog and hedgehog-related genes, their origin from Hog proteins in ancestral eukaryotes and discovery of a novel Hint motif
© Bürglin; licensee BioMed Central Ltd. 2008
Received: 16 March 2007
Accepted: 11 March 2008
Published: 11 March 2008
The Hedgehog (Hh) signaling pathway plays important roles in human and animal development as well as in carcinogenesis. Hh molecules have been found in both protostomes and deuterostomes, but curiously the nematode Caenorhabditis elegans lacks a bona-fide Hh. Instead a series of Hh-related proteins are found, which share the Hint/Hog domain with Hh, but have distinct N-termini.
We performed extensive genome searches of the cnidarian Nematostella vectensis and several nematodes to gain further insights into Hh evolution. We found six genes in N. vectensis with a relationship to Hh: two Hh genes, one gene with a Hh N-terminal domain fused to a von Willebrand factor type A domain (VWA), and three genes containing Hint/Hog domains with distinct novel N-termini. In the nematode Brugia malayi we find the same types of hh-related genes as in C. elegans. In the more distantly related Enoplea nematodes Xiphinema and Trichinella spiralis we find a bona-fide Hh. In addition, T. spiralis also has a quahog gene like C. elegans, and there are several additional hh-related genes, some of which have secreted N-terminal domains of only 15 to 25 residues. Examination of other Hh pathway components revealed that T. spiralis, like C. elegans, lacks some of these components. Extending our search to all eukaryotes, we recovered genes containing a Hog domain similar to Hh from many different groups of protists. In addition, we identified a novel Hint gene family present in many eukaryote groups that encodes a VWA domain fused to a distinct Hint domain we call Vint. Further members of a poorly characterized Hint family were also retrieved from bacteria.
In Cnidaria and nematodes the evolution of hh genes occurred in parallel to the evolution of other genes that contain a Hog domain but have different N-termini. The fact that Hog genes comprising a secreted N-terminus and a Hog domain are also found in many protists suggests that this gene family must have arisen in very early eukaryotic evolution, and eventually gave rise to hh and hh-related genes in animals. The results indicate a hitherto unsuspected ability of Hog domain encoding genes to evolve new N-termini. In one instance in Cnidaria, the Hh N-terminal signaling domain is associated with a VWA domain and lacks a Hog domain, suggesting a modular mode of evolution also for the N-terminal domain. The Hog domain proteins, the inteins and VWA-Vint proteins represent three different families of Hint domain proteins that evolved in parallel in eukaryotes.
The Hedgehog (Hh) signaling pathway has been shown to be of fundamental importance for patterning and cell proliferation in animal development (for review see [1–4]). Mutations in this pathway cause congenital defects and several types of cancer such as basal cell carcinoma and medulloblastoma [5–8]. A key molecule of the pathway is Hh, a secreted ligand that can act as a morphogen. Drosophila melanogaster has a single hedgehog (hh) gene, while mammalian genomes contain three paralogous genes, Sonic Hh (Shh), Desert Hh (Dhh), and Indian Hh (Ihh). In zebrafish, five hh genes are present due to an extra round of genome duplication during evolution of ray-finned fish [10, 11]. The Hh protein is synthesized as a precursor composed of two domains, the N-terminal signaling domain and the C-terminal autoprocessing domain. A substantial part of the autoprocessing domain shares sequence similarity with self-splicing inteins and therefore this domain has been named Hint. C-terminal to the Hint domain is a sterol recognition region (SRR). A crucial function of the autoprocessing domain is to add a cholesterol moiety to the N-terminal signaling domain, which is required for the proper function of the N-terminal ligand [13–16]. In the nematode Caenorhabditis elegans no bona-fide hh is present, i.e. there is no gene that encodes both the N-terminal signaling domain and the C-terminal Hint domain. Instead, ten genes encoding the C-terminal autoprocessing domain are found that, however, have N-terminal regions very distinct from Hh. Furthermore, a large number of additional genes are found that encode only these new N-terminal domains and lack the C-terminal autoprocessing domain. Overall, these genes can be grouped into four families that have been named quahog (qua), warthog (wrt), groundhog (grd) and ground-like (grl) and are collectively referred to as hh-related genes [17–19]. At present it is not clear whether the C-terminal domains of the C. elegans Hh-related proteins can add a cholesterol moiety to the N-terminus analogous to Hh, since there are sequence differences in the SRR equivalent region. Therefore, this region of the Hh-related proteins was named ARR (adduct recognition region); here we refer to the combined Hint/SRR or Hint/ARR region as Hog domain for simplicity, as others have done as well.
The N-terminal domains of the C. elegans hh-related genes were not found in vertebrates and flies using blast searches, giving rise to the notion that these genes were perhaps derived from hh in early nematode evolution [17, 18]. Recently, a Hog domain containing protein, Hoglet, was discovered in the choanoflagellate Monosiga ovata, but its N-terminal region is distinct from Hh and other Hh-related proteins, instead sharing sequence similarity with cellulose-binding domains (CBD). Choanoflagellates are unicellular protists most closely related to multicellular animals [23, 24] and therefore Hoglet might represent an ancestral precursor form of Hh. A Hh protein was also described from the cnidarian Nematostella vectensis [25, 26], indicating that Hh already existed before the rise of bilaterian animals. An EST with sequence similarity to Hh was also recovered from the sponge Oscarella carmela, indicating that the "Hedge" domain originated before the advent of Eumetazoa. In order to understand the origin and evolution of the C. elegans hh-related genes, we had already performed cursory searches of the genome of the parasitic nematode Brugia malayi and found that it also contains several hh-related genes [17, 18]. Here we performed comprehensive searches of the genomes of the cnidarian N. vectensis, the nematodes B. malayi and Trichinella spiralis as well as the NCBI protein, DNA and EST databases to find additional hh and hh-related genes that may shed light on the evolution of these genes. In these searches we found a previously described gene from the fungus Glomus mosseae that shares sequence similarity with Hh through the Hog domain, but has not been considered in recent evolutionary analyses. Furthermore, we found a number of additional genes with similarity to the Hog domain in Alveolata, moss, red algae, and other protists, indicating that the Hog domain had already originated in lower eukaryotes.
As stated above, the Hog domain shares sequence similarity with self-splicing inteins, which have been found in Archaea, Bacteria, as well as fungi, algae and a few protists [30–32]. Recently, two other types of Hint-related domains have been described, primarily from bacteria, that have been named bacterial intein-like proteins (BIL) type A and B [21, 33]. Several conserved sequence motifs within the Hint domain have been described for inteins that have been named motif A, B, E and F [34–37]. Our searches also revealed ORFs in Tetrahymena, fungi and several other protist branches that have similarity to the Hint domain via motifs A and B, but cannot be classified as inteins, Hog, or BIL domains.
Retrieval and analysis of sequences
hh-related genes in B. malayi and other Chromadorea
Table 1. Number of hh and hh-related genes found in different species.
A few ESTs were retrieved from other Chromadorea nematodes. In Meloidogyne incognita we found a gene with similarity to wrt-6 (Figure 2, 3, 4, 5, 6), and one grl gene, Msp3, which is expressed in the esophageal gland cells (Additional file 8). In Parastrongyloides trichosuri a gene with similarity to wrt genes was found (Additional file 7).
hh and hh-related genes in Enoplea nematodes: Xiphinema sp. and Trichinella spiralis
Apart from the similarities in the regions N-terminal to the Hog domain indicated above, the remaining N-terminal sequences show no obvious similarities between each other or to any other proteins. Only the threonine-rich stretch is reminiscent of the 200 residue long threonine stretch in the N-terminal region of the choanoflagellate Hoglet protein. However, this may be a case of convergent evolution. No Wart, Ground, or Ground-like domains could be detected in the genome of T. spiralis or in EST database searches restricted to Enoplea.
Based on phylogenetic analyses of the Hog domains, XC Xhog1, Xhog2, Xhog3, and Ts Qua-1 form a clade with the Quahog proteins (Figure 5, 6, Additional files 5, 6, 7). Although N-terminal sequences for XC Xhog1, 2 and 3 are lacking, they could be bona-fide Quahog proteins. A second, distinct clade is formed by Ts Xhog1, Xhog2, and XC Shog1, Xhog4, Xhog5 and Thog, indicating that they are derived from a common ancestor (Figure 5, 6, Additional files 5, 6, 7). In three cases (Ts Xhog1, Xhog2 and XC Xhog5), a common upstream sequence (Enop) has been identified (Figure 10), which seems to be specific to Enoplea nematodes, suggesting that at least in the cases of XC Shog1 and Thog the N-terminal regions have diverged relatively recently.
Almost all nematode Hh-related proteins form a distinct clade, the only exceptions being the Hh proteins, and Ts Xhog3 and XC S2hog, which are both very divergent and do not fall into the Hh clade of genes either (Figure 5, 6, Additional files 5, 6, 7). Two features distinguish the Hog domains of the nematode Hh-related proteins from those of the Hh proteins (Figure 1, 2, 3, 4, Additional files 1, 2, 3). 1) The regions corresponding to motifs K and L have characteristic differences in their conserved residues in nematode Hh-related proteins. 2) Two conserved cysteine residues are found in the central region of the Hog domain. When these two residues are mapped onto the X-ray structure of the C-terminal autoprocessing domain of Drosophila Hh, it emerges that they lie adjacent to each other and therefore could form a disulfide bond. This feature might stabilize this type of Hog domain in an extracellular environment, and this extra stability might possibly provide some new functionality. It is, however, not unique to nematode Hog domains. Zebrafish ihha and ihhb and fugu dhh (fhh) also have this extra cysteine pair, which must represent convergent evolution. It is worth pointing out that Ts Hh lacks the two cysteine residues and has motifs K and L as expected from a bona-fide Hh molecule. However, the quite divergent Ts Xhog3 protein, which lacks a Hedge domain, also lacks the cysteine residues and has motifs K and L.
hh and hh-related genes in Cnidaria
Last but not least, Nv 200640 is predicted to be 3592 amino acids long and is highly unusual. It is similar to the Hh proteins through the N-terminal Hedge domain (blast expected probability: 1e-18 to Ciona Hh), but no Hog domain follows (Additional files 10, 13). The Hedge domain is encoded by two exons, and after an intron of 600 bp many additional exons continue the ORF of the JGI prediction, but nowhere in this genomic region resides a Hog domain. Analysis of the ORF using the SMART server revealed that these extra exons encode multiple motifs with significant sequence similarity to other proteins (Additional file 13). The first motif, encoded by exons 3 and 4, contains a von Willebrand factor (vWF) type A domain (VWA). For example, the VWA domain of chicken collagen, type XIV, alpha 1 (undulin) is retrieved with a blastp probability of 8e-28. After the VWA domain, 21 CA (Cadherin repeat) domains follow; these occur as repeats in extracellular regions and are thought to mediate cell-cell contact when bound to calcium. They are followed by two Immunoglobulin C-2 Type domains, two EGF repeats, a transmembrane region, and finally an SH2 domain.
The phylogenetic analysis of the cnidarian Hog domains reveals that they cluster primarily with the Hh Hog domains (Figure 5, 6, Additional files 5, 6, 7), albeit mostly with insignificant bootstrap values. The Hog domain of Nv 241466 Hh has the best similarity to the Hh Hog domains, and clusters with the deuterostome Hh proteins. Nv 140260 and Nv 239508 are most similar to each other, suggesting a likely duplication event within the cnidarian lineage. Nv 120428 and Acm DY579185 may also be related to these two proteins via their Hog domain (Figure 5, 6, Additional files 5, 6, 7), but the bootstrap values are not significant. The Nv 95413 Hh protein is rather divergent, and the Hydra sequence Hm CO905822 is also very divergent and does not form a clade with any of the N. vectensis sequences. Therefore, it is not possible to determine whether all the cnidarian Hog genes originated from a single ancestral gene in the cnidarian lineage, or whether hh and other Hog genes were already present before the split of Cnidaria and Bilateria. The Hedge domains of the three N. vectensis ORFs are more divergent than the bilaterian Hedge domains (Figure 9, Additional file 10). The Hedge domain of Nv 241466 Hh is most similar to bilaterian Hh proteins, with a best blast probability of 2e-52 to a fish Hedge domain. Nv 95413 Hh is more divergent, with a best blast probability of 5e-36, and Nv 200640 is the most divergent Hedge domain, with a probability of 1e-18 to a Ciona Hh.
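The ordering of the three Hedge domains by divergence follows directly from the blast probabilities quoted above. As a minimal sketch in Python (the dictionary and function names are illustrative and were not part of the original analysis), the reported values can be ranked and the gap between them expressed in orders of magnitude:

```python
import math

# Best blast probabilities quoted in the text for the three
# N. vectensis Hedge domains (smaller means a better match).
HEDGE_BEST_HITS = {
    "Nv 241466 Hh": 2e-52,
    "Nv 95413 Hh": 5e-36,
    "Nv 200640": 1e-18,
}

def rank_by_similarity(hits):
    """Return names sorted from the smallest (best) to the largest probability."""
    return sorted(hits, key=hits.get)

def log10_gap(hits, better, worse):
    """Orders of magnitude separating the probability of `worse` from `better`."""
    return math.log10(hits[worse]) - math.log10(hits[better])

ranking = rank_by_similarity(HEDGE_BEST_HITS)
gap = log10_gap(HEDGE_BEST_HITS, "Nv 241466 Hh", "Nv 200640")
```

Comparing probabilities on a log scale is only a rough proxy for divergence, since blast probabilities also depend on alignment length and database size.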
Hog genes in lower eukaryotes
In order to detect Hh sequences from lower eukaryotes, tblastn searches were performed using the organism restriction "eukaryotes NOT bilateria". This recovered a number of genomic and EST matches from lower animals, fungi, plants and protists (Figure 2, 3, 4, Additional files 9, 14). One EST was recovered from the sponge Oscarella carmela, which was previously described. Analysis of this sequence shows that, while it does have a Hedge domain, the downstream sequence does not contain the start of a Hog domain in any frame (Additional file 10). No sequence similarity to a VWA domain is detected in that fragment either. Nevertheless, it indicates that, as in the case of Nv 200640, this gene may not contain a Hog domain.
A match was detected to the gene GmGIN1 from the fungus Glomus mosseae, which belongs to the Glomeromycota, a sister group of ascomycetes and basidiomycetes; this gene had already been described as having similarity to Hh. The Hog domain has a blast probability of 7e-18 to the best matching Hh Hog domain, which is much better than the blast probability of choanoflagellate Hoglet to the best matching Hh Hog domains (4e-10). Furthermore, there are good matches to motifs J and K, as well as a region with similarity to motif L. Therefore, GmGIN1 contains a bona-fide Hog domain (Figure 2, 3, 4, Additional file 14). The upstream domain of GmGIN1 shares similarity with Ras-like GTPases, e.g. the Arabidopsis protein AIG1 (avrRpt2-induced gene 1) and the animal IAN/IMAP subfamily. However, this ORF lacks a signal peptide and may therefore not be secreted.
Overall, these results show that Hog domains occur in many different branches of the major groups of eukaryotes. However, multiple losses seem to have occurred, since in many branches we did not detect Hog domains, for example, in Arabidopsis thaliana and other higher plants, or in the currently sequenced ascomycetes and basidiomycetes, or in other sequenced organisms such as Dictyostelium.
Other genes of the Hh pathway in Enoplea and N. vectensis
Table 2. Components of the Hh signaling pathway in N. vectensis and Xiphinema sp. The absence of a gene does not mean that it is not present; it may simply not have been sequenced yet. Numbers indicate the protein prediction in JGI (Nv) or the accession number (XC). For more information on pathway components and C. elegans genes see . Best blast scores are given for the Nv predictions in parentheses.
In N. vectensis we found good orthologues for dispatched, dally-like, patched, smoothened, fused, sufu and Ci (Table 2). No obvious homologue was found for Ihog. For costa, good matches were found to its human homologues, whereas Drosophila costa itself is rather divergent. Recently it has been shown that the mammalian homologues of fused and costa do not play the same key role in the pathway as in flies; instead, sufu plays a major role [42, 43]. Overall, most of the key players of the Hh pathway appear to be present in N. vectensis, indicating that the pathway was already well established before the split of Cnidaria and Bilateria.
Genes with novel Hint-like (Vint) domains
The VWA-Hint proteins do not seem to have a signal peptide for secretion. The VWA domain is located at the N-terminus of the proteins, although in four cases a Ubox precedes the VWA domain (Figure 17, 18, 19). A region of around 300 residues separates the VWA domain from the Hint domain. This region has several small patches of conservation and one large region that we propose to call the Vwaint domain. At the C-terminus a Hint-like domain follows, which is of similar size to a Hog domain. However, the best conserved features are only motifs A and B, i.e. the N-terminal region of the Hint-like domain. One region shares a little similarity with motif F of inteins and BIL-Bs, but motifs J, K and L are lacking (Figure 17, 18, 19). The Hint-like domain is also rather different from inteins or Hog domains; the best blast matches of aTt 00471620 are to honeybee Hh with a probability of 0.013. Therefore, these sequences cannot be classified as intein, Hog or BIL domains, and we refer to these genes as Vint genes. Vint genes are apparently so widespread in eukaryotes that we have to assume that a common ancestor was present in early eukaryotes. However, Vint genes seem to be lacking in Arabidopsis, many fungi (for example, Saccharomyces cerevisiae), and in Metazoa. Multiple independent losses in different lineages seem the most likely explanation for this absence.
Our searches also revealed a group of proteins from bacteria that had a Hint-like domain at their C-terminus and shared some weak sequence similarity in their N-terminal region (Additional files 16, 17). At least some of these proteins are predicted to have signal peptides for secretion, and the upstream region has two cysteine residues conserved between all sequences. The Hint-like domains of these bacterial proteins are also quite divergent from inteins, Hog and BIL domains, and represent yet another subgroup. This subgroup has previously also been detected by Dassa and Pietrokovski. The new members we retrieved here support the notion that this is yet another new type of Hint protein.
Hh and hh-related proteins in nematodes
In the more distantly related Enoplea nematodes Xiphinema sp. and T. spiralis a strikingly different picture emerges. In both species we find both a hh gene as well as several hh-related genes. In T. spiralis we also find a quahog gene, and, based on the phylogenetic analyses, some of the Xiphinema genes could also be quahog genes. Two T. spiralis and one Xiphinema protein share a new motif (Enop motif) upstream of the Hog domain that appears to be specific to Enoplea nematodes. However, there are also a number of instances of N-terminal sequences that are very short. Several of these proteins cluster with the "Enop" proteins in the phylogenetic analyses, suggesting that they diverged from a common ancestor. However, two proteins with short N-terminal regions (Ts Xhog3 and XC Shog2) are rather divergent and do not reliably fall within the clade of nematode-specific Hog proteins ("Nema-Hog" proteins) in phylogenetic analyses. In particular Ts Xhog3 lacks the conserved cysteine pair usually found in Nema-Hog domains, and it shares motifs K and L with Hh Hog domains, indicating it could be derived from a Hh protein. Therefore, while these genes could have diverged from hh or Nema-Hog genes, it may also be possible that they represent ancestral genes that were lost in Chromadorea. In conclusion, we think that there were probably at least three different types of Hog genes in the common ancestor of Enoplea and Chromadorea: one hh gene, one quahog gene and one gene which gave rise to the wrt/grd branch in Chromadorea and the Ts Xhog1/2 branch in Enoplea. Possibly, however, up to five Hog genes could have existed in the common ancestor. The proliferation into further distinct groups such as wrt, ground and ground-like appears to have happened later in a branch-specific manner.
Many different N-termini now exist in Nema-Hog proteins. Two possible mechanisms can explain this diversity: either acquisition of new N-terminal domains, or divergent evolution of existing N-terminal domains. A relatively good case can be made that all Wart, Ground and Ground-like domains arose from a single common ancestor based on weak sequence similarities between the motifs. This relationship is also supported by the phylogenetic analyses of the Hog domains. Therefore, multiple losses of the Hog domain must have occurred secondarily within the wrt and ground families. The presence of the rather short N-termini in Enoplea suggests that these regions have evolved and diverged through mutations, rather than by acquisition of a new domain. The threonine-rich stretch in XC Thog is very likely the result of polymerase slippage, though it is striking that this feature has also evolved separately in the choanoflagellate Hoglet protein. It is also worth mentioning that some of the Caenorhabditis N-terminal domains have repetitive regions outside of the conserved Ground and Ground-like domains, mainly proline, glycine and serine. For example, Ce grl-23 has a 176 residue long stretch upstream of the Ground-like domain containing 125 glycine residues. In conclusion, most of the observed variability in the N-terminal domains of nematode Hh-related proteins is probably the result of sequence divergence from a progenitor, rather than acquisition of new domains. However, loss of N-terminal domains, as in the case of C. elegans Hog-1, as well as loss of Hog domains, did occur.
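To put the Ce grl-23 example in proportion, 125 glycines in a 176 residue stretch corresponds to roughly 71% glycine. The following Python sketch shows how such a low-complexity composition could be quantified for any sequence stretch; the function name is illustrative, and the toy sequence is made up for demonstration and is not the grl-23 sequence:

```python
from collections import Counter

def residue_fraction(sequence, residue):
    """Fraction of positions in `sequence` occupied by `residue` (one-letter code)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return Counter(seq)[residue.upper()] / len(seq)

# Counts quoted in the text for the Ce grl-23 stretch.
grl23_glycine_fraction = 125 / 176

# Made-up low-complexity stretch, used only to exercise the function.
toy_stretch = "GGSGGPGGGAGG"
toy_glycine_fraction = residue_fraction(toy_stretch, "G")
```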
A surprising observation is the fact that T. spiralis has a hh gene, but apparently lacks several components of the Hh pathway, such as Smoothened. Particularly noteworthy is that the components that appear to be missing are the same as in C. elegans. This would suggest that the signaling pathway was modified by loss already before the split of Enoplea and Chromadorea, even though hh was maintained in Enoplea. While one could imagine that Hh could be maintained in an animal parasite such as T. spiralis to affect host cells, this is very unlikely in the case of the plant nematode Xiphinema. It implies that Hh has an important function even in the absence of Smoothened, and it refutes the hypothesis that the Nema-Hog genes evolved directly from hh concomitantly with the other changes in the Hh pathway.
Hog proteins in Cnidaria
In Cnidaria we also encounter a complex situation with both Hh and Hh-related proteins. Both in N. vectensis and A. millepora we find bona-fide hh genes that have a Hedge and a Hog domain. Another gene is well conserved between N. vectensis and A. millepora and has a distinct, novel secreted N-terminal domain. Two further Hh-related proteins in N. vectensis have yet other, distinct N-termini. The upstream regions of the two closely related genes retrieved from Hydra do not share any similarity with those in Nematostella, indicating divergent evolution. No sequence similarity of these new N-terminal motifs has been found outside Cnidaria.
The hh-related genes from Cnidaria are, however, distinct from those in nematodes, since the phylogenetic analyses of the Hog domains do not show them to be closely related. Therefore, we would like to suggest that, as in the case of the nematode hh-related genes, the Cnidarian N-terminal domains have evolved from common ancestors by divergent evolution rather than by domain acquisition.
The case of Nv 200640 is perhaps a special exception. In this protein we find an N-terminal Hedge domain fused to a large extracellular protein that contains a VWA domain as well as CA and EGF repeats, but it clearly lacks a Hog domain. The VWA domain is a 200 residue long domain first identified in von Willebrand Factor [44, 45]. VWA domains are found both in extracellular and intracellular proteins, such as non-fibrillar collagens, plasma proteins such as complement factors and integrins, and they mediate adhesion via metal ion-dependent adhesion sites. Likewise, the CA repeats also mediate adhesion in a Ca2+ dependent fashion. Therefore, the Nv 200640 protein is probably involved in cell adhesion. This shows that the Hedge domain can also evolve in a modular fashion and separate from the Hog domain. The EST recovered from sponges also has a Hedge domain that lacks the immediately following Hog domain, and may perhaps represent also a protein lacking a Hog domain.
Hog proteins in lower eukaryotes
We have recovered a substantial number of Hog domain proteins from many diverse groups of eukaryotes, mostly protists, such as red algae, moss, alveolates (ciliates, dinoflagellates, apicomplexans), cryptophytes, jakobids, haptophytes, cercozoa and Glomeromycota fungi. While some of these Hog sequences are quite divergent, they are invariably most closely related to Hog domain proteins from animals, and not to inteins, such as those found in fungi, or to BIL or Vint domains. Given the widespread occurrence in many of the major groups of eukaryotes, we must conclude that Hog proteins were present already in the earliest eukaryotes. We find diverse N-termini associated with the Hog domain that are only conserved to limited extents within groups (a case in point being the various conserved N-termini in nematodes). Many of these N-termini of limited conservation have conserved cysteine residues, and in cases where one can be quite confident of the start methionine, they start with a good signal peptide for secretion. Only in the case of the fungal protein GmGIN1 and the choanoflagellate Hoglet are distinct other N-terminal domains fused to the Hog domain. Therefore, we postulate that an ancestral Ur-Hog gene existed, with a secreted N-terminal domain and an autoprocessing Hog domain, that may have added a sterol or similar moiety to its secreted N-terminus. This gene evolved in concert with eukaryote evolution and was lost in several branches. In animals, the question arises about the origin of the Hedge domain. Both in sponge and in Nematostella we find a Hedge gene that lacks a Hog domain. Perhaps such a gene merged with a Hog domain in early metazoans. However, the reverse process is also possible: the Hedge domain evolved as an N-terminal variant of a Hog protein in early metazoans, and in the two Hedge genes in sponge and Nematostella the Hog domain was lost later. Both in Cnidaria and nematodes we find both hh and hh-related genes.
Did the hh-related genes evolve twice independently from a hh precursor in each lineage? This is certainly the most parsimonious hypothesis. Nonetheless, in an alternative scenario, a hh and a hh-related gene could have been present in the common ancestor of eumetazoa, and the hh-related gene would have given rise to the cnidarian and nematode hh-related genes. For this hypothesis to be true, we would have to postulate three separate losses of hh-related genes: in deuterostomes, in lophotrochozoa, and in arthropods. While this seems rather unlikely, we do observe many losses of Hog genes in various branches of eukaryotes, as well as loss of the Hog domain only in a number of nematode genes so that such a series of losses may not be totally impossible.
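The parsimony argument above can be made explicit by counting the evolutionary events each scenario requires. The Python sketch below is only an illustration under stated assumptions: the event counts are read from the two scenarios as described in the text, the equal default weights for origins and losses are an assumption, and no such calculation was part of the original analysis.

```python
# Event counts read from the two scenarios described in the text.
SCENARIOS = {
    "independent origin in Cnidaria and nematodes": {"origins": 2, "losses": 0},
    "single origin in the eumetazoan ancestor": {"origins": 1, "losses": 3},
}

def event_count(scenario, origin_weight=1.0, loss_weight=1.0):
    """Weighted number of events a scenario requires (weights are assumptions)."""
    return origin_weight * scenario["origins"] + loss_weight * scenario["losses"]

def most_parsimonious(scenarios, **weights):
    """Name of the scenario with the lowest weighted event count."""
    return min(scenarios, key=lambda name: event_count(scenarios[name], **weights))

equal_weight_choice = most_parsimonious(SCENARIOS)
cheap_loss_choice = most_parsimonious(SCENARIOS, loss_weight=0.25)
```

With equal weights the independent-origin scenario needs two events against four. If losses are treated as much more likely than origins (here a hypothetical weight of 0.25), the single-origin scenario costs 1.75 and becomes the cheaper one, which mirrors the caveat that frequent gene loss makes the alternative harder to rule out.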
Novel Hint genes
Our searches revealed new genes with Hint motifs merged to VWA domains. Given that a Hedge domain was found fused to a VWA domain in Nematostella, we investigated this further and recovered a novel gene family. The well-conserved gene structure consists of a VWA followed by a new domain, termed Vwaint, followed by the "Vint"-type Hint domain. Unlike the Hog proteins, these proteins are most likely not secreted and instead are processed inside the cell. The Vint genes are present in many eukaryotic groups, but must have been lost multiple times, in particular in multicellular eukaryotes. Multiple loss seems to be a common theme also in Hog proteins and especially inteins [21, 32]. Inteins may be subject to special selective pressure for loss [21, 32], and this pressure may also extend to Hog and Vint proteins. However, gene loss is not uncommon. The N. vectensis genome contains a remarkable complexity of highly conserved gene families, and several instances of later gene loss in the protostome or deuterostome lineage, for example in the homeobox gene family, have been found [47, 48], indicating that gene loss later in evolution is feasible.
We find that the evolution of Hh is more complex than anticipated, and that this gene family is not simply derived from an intein in early metazoan evolution. Both in Cnidaria and nematodes parallel evolution between hh and hh-related genes occurred. Given that the nematode-specific Hog domain (Nema-Hog) with its distinct features was already present in the progenitor of two very different nematode branches, it may be possible that both Hh and some other Hog domain protein were already present in protostomes before the emergence of nematodes and that the latter was lost in other lineages such as arthropods. The finding of multiple Hog domain proteins in Cnidaria raises the possibility that multiple distinct types of Hog domain proteins also existed in ancestral Eumetazoa. Snell et al. (2006) suggested that a precursor of a Hedge domain fused to a Hog domain in early Metazoan evolution. However, our discovery that an Ur-Hog gene probably existed in the progenitor of eukaryotes makes it feasible that Hh evolved from an ancestral Hog gene without domain shuffling. In eukaryotes, we now know that at least three different types of Hint domains evolved in parallel: Hog, Vint, and inteins. At present we do not know the origin of the Hog and Vint domains, but perhaps new Hint domains from bacteria, such as described here and by Dassa and Pietrokovski, will shed light on that issue in the future.
Procedures for retrieving and analyzing sequences have been detailed in Hao et al. 2006 and Mukherjee and Bürglin 2007 [38, 48]. Briefly, B. malayi sequences were searched at TIGR. Preliminary sequence data for B. malayi is deposited regularly into the GSS division of GenBank. This sequencing effort is part of the International Brugia Genome Sequencing Project and is supported by an award from the National Institute of Allergy and Infectious Diseases, National Institutes of Health. ESTs, in particular nematode ESTs, were searched at NCBI. The nematode ESTs are generated by the Washington University Parasitic Nematode EST sequencing project. Many of the protist ESTs were generated by the Protist EST program. The T. spiralis genome was searched using the GSC blast server at The Genome Sequencing Center of the Washington University School of Medicine. N. vectensis sequences were searched at Stellabase [28, 54], and at the DOE Joint Genome Institute (JGI). Additional genome sequences such as for Naegleria gruberi, Physcomitrella patens and Monosiga brevicollis were searched at the JGI. Zebrafish sequences were retrieved from ZFIN [56, 57]. The intein database was checked at New England Biolabs InBase [30, 58]. Manual sequence corrections were performed with the help of FGENESH and FGENESH+ at Softberry and PPCMatrix. ESTs representing the same locus were assembled using the CAP3 server at Iowa State University.
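The searches described above were run on public web servers. For readers who wish to repeat a comparable screen locally, the following Python sketch shows one way to pick the best hit per query from BLAST+ tabular output (`-outfmt 6`). It is an illustration only and not the procedure used in this study: the function name, the default threshold of 1e-5 and all sequence identifiers in the sample are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query: (subject, evalue)} with the lowest e-value hit per query.

    Hits with an e-value above `max_evalue` (an assumed cutoff) are ignored.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(FIELDS, row))
        evalue = float(record["evalue"])
        if evalue > max_evalue:
            continue
        current = best.get(record["qseqid"])
        if current is None or evalue < current[1]:
            best[record["qseqid"]] = (record["sseqid"], evalue)
    return best

# Made-up rows with hypothetical identifiers, for demonstration only.
SAMPLE = "\n".join([
    "\t".join(["query_hog", "subject_weak", "28.0", "180", "120", "5",
               "1", "180", "15", "194", "4e-10", "60.2"]),
    "\t".join(["query_hog", "subject_hh_like", "35.0", "200", "130", "4",
               "1", "200", "10", "209", "7e-18", "90.1"]),
    "\t".join(["query_vint", "subject_distant", "24.0", "90", "66", "2",
               "5", "94", "30", "119", "0.013", "35.0"]),
])

strict = best_hits(SAMPLE)
relaxed = best_hits(SAMPLE, max_evalue=0.05)
```

A strict cutoff keeps only clear matches, whereas very divergent domains such as the Vint-type Hint domains would only surface with a relaxed cutoff and therefore require manual inspection of conserved motifs.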
Species abbreviations. Fungi are prefixed with 'f', red algae with 'r', plants with 'p', Alveolata (ciliates, dinoflagellates, Apicomplexa) with 'a', jakobids with 'j', Cercozoa with 'c', Cryptophyta with 'cr', excavates with 'e', haptophytes with 'h', Heterolobosea with 'l', and slime molds with 's'.
Acropora millepora (Cnidaria)
Anopheles gambiae (malaria mosquito)
Achaearanea tepidariorum (common house spider)
Branchiostoma floridae (Florida lancelet, Amphioxus)
Brugia malayi (nematode, Chromadorea)
Capitella sp. I ECS-2004 (polychaete)
Caenorhabditis briggsae (nematode, Chromadorea)
Caenorhabditis elegans (nematode, Chromadorea)
Caenorhabditis remanei (nematode, Chromadorea)
Drosophila melanogaster (fruitfly)
Danio rerio (zebrafish)
Gryllus bimaculatus (two-spotted cricket)
Lytechinus variegatus (green sea urchin)
Hydra magnipapillata (Cnidaria)
Monosiga brevicollis (choanoflagellate)
Meloidogyne incognita (southern root-knot nematode, Chromadorea)
Mus musculus (mouse)
Monosiga ovata (choanoflagellate)
Nematostella vectensis (Cnidaria, starlet sea anemone)
Octopus bimaculoides (mollusc)
Oscarella carmela (sponge)
Parastrongyloides trichosuri (nematode, Chromadorea)
Patella vulgata (common limpet, mollusc)
Strongylocentrotus purpuratus (sea urchin)
Takifugu rubripes (fugu)
Trichinella spiralis (nematode, Enoplea)
Xiphinema index CSEQDL01 (nematode, Enoplea)
Amphidinium carterae (dinoflagellate, Alveolata)
Alexandrium tamarense (dinoflagellate, Alveolata)
Cryptosporidium muris (Apicomplexa, Alveolata)
Cryptosporidium parvum (Apicomplexa, Alveolata)
Karenia brevis (dinoflagellate, Alveolata)
Karlodinium micrum (dinoflagellate, Alveolata)
Tetrahymena thermophila (ciliate, Alveolata)
Bigelowiella natans (Cercozoa)
Guillardia theta (Cryptophyta)
Tritrichomonas foetus (Parabasalidea, excavates)
Ajellomyces capsulatus (ascomycetes, fungus)
Chaetomium globosum (ascomycetes, fungus)
Candida tropicalis (ascomycetes, fungus)
Glomus mosseae (Glomeromycota, fungus)
Gibberella zeae (ascomycetes, fungus)
Magnaporthe grisea (ascomycetes, rice blast fungus)
Neurospora crassa (ascomycetes, fungus)
Pleurochrysis haptonemofera (haptophytes)
Jakoba libera (jakobids)
Naegleria gruberi (heterolobosea)
Arabidopsis thaliana (plants)
Oryza sativa (rice, plants)
Physcomitrella patens (moss, plants)
Selaginella lepidophylla (club moss, plants)
Selaginella moellendorffii (club moss, plants)
Chondrus crispus (carragheen, red algae)
Gracilaria changii (red algae)
Griffithsia japonica (red algae)
Porphyra haitanensis (red algae)
Porphyra yezoensis (red algae)
Physarum polycephalum (slime mold, amoebozoa)
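The prefix convention given under the species abbreviations can be written down as a small lookup, which also makes explicit that 'c' (Cercozoa) and 'cr' (Cryptophyta) share a leading letter. The following is only an illustrative sketch: the prefix table is taken from the text, but the label format (a lowercase group prefix followed by a species code that begins with an uppercase letter), the example labels such as 'crGt', and the function name `split_label` are assumptions made for the example and are not taken from the paper.

```python
import re

# Group prefixes as listed under "Species abbreviations".
GROUP_PREFIXES = {
    "f": "Fungi",
    "r": "red algae",
    "p": "plants",
    "a": "Alveolata",
    "j": "jakobids",
    "c": "Cercozoa",
    "cr": "Cryptophyta",
    "e": "excavates",
    "h": "haptophytes",
    "l": "Heterolobosea",
    "s": "slime molds",
}

# ASSUMPTION: a label is an optional lowercase prefix followed by a
# species code that starts with an uppercase letter (hypothetical format).
LABEL_PATTERN = re.compile(r"([a-z]*)([A-Z]\w*)")


def split_label(label):
    """Split a label into (group name or None, species code).

    The whole leading lowercase run is treated as the prefix, so 'cr'
    is never mistaken for 'c' followed by a code starting with 'r'.
    Labels without a prefix return None as the group.
    """
    match = LABEL_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    prefix, code = match.groups()
    if prefix == "":
        return (None, code)
    if prefix not in GROUP_PREFIXES:
        raise ValueError(f"unknown group prefix: {prefix!r}")
    return (GROUP_PREFIXES[prefix], code)
```

Under these assumptions, `split_label("crGt")` resolves to Cryptophyta and `split_label("cBn")` to Cercozoa, while an unprefixed label such as `"Ce"` is returned with no group.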
I would like to thank Shmuel Pietrokovski for helpful discussions and for sharing information. Preliminary sequence data for B. malayi is deposited regularly into the GSS division of GenBank. This sequencing effort is part of the International Brugia Genome Sequencing Project and is supported by an award from the National Institute of Allergy and Infectious Diseases, National Institutes of Health. The Nematostella sequence data, as well as other genome data such as those of Naegleria gruberi, Physcomitrella patens, and Monosiga brevicollis, were produced by the US Department of Energy Joint Genome Institute. The Trichinella data were produced by the Genome Sequencing Center at Washington University School of Medicine in St. Louis and can be obtained from their Web site. This research was supported by grants from the Swedish Foundation for Strategic Research (SSF) and the Karolinska Institutet.
1. Cohen MM: The hedgehog signaling network. Am J Med Genet A. 2003, 123 (1): 5-28. 10.1002/ajmg.a.20495.
2. Hooper JE, Scott MP: Communicating with Hedgehogs. Nature reviews. 2005, 6 (4): 306-317. 10.1038/nrm1622.
3. Huangfu D, Anderson KV: Signaling from Smo to Ci/Gli: conservation and divergence of Hedgehog pathways from Drosophila to vertebrates. Development (Cambridge, England). 2006, 133 (1): 3-14.
4. Wang Y, McMahon AP, Allen BL: Shifting paradigms in Hedgehog signaling. Current opinion in cell biology. 2007, 19 (2): 159-165. 10.1016/j.ceb.2007.02.005.
5. Beachy PA, Karhadkar SS, Berman DM: Tissue repair and stem cell renewal in carcinogenesis. Nature. 2004, 432 (7015): 324-331. 10.1038/nature03100.
6. Briscoe J, Thérond P: Hedgehog Signaling: From the Drosophila cuticle to anti-cancer drugs. Dev Cell. 2005, 8: 143-151. 10.1016/j.devcel.2005.01.008.
7. McMahon AP, Ingham PW, Tabin CJ: Developmental roles and clinical significance of hedgehog signaling. Current topics in developmental biology. 2003, 53: 1-114.
8. Rubin LL, de Sauvage FJ: Targeting the Hedgehog pathway in cancer. Nature reviews. 2006, 5 (12): 1026-1033. 10.1038/nrd2086.
9. Zardoya R, Abouheif E, Meyer A: Evolution and orthology of hedgehog genes. Trends Genet. 1996, 12 (12): 496-497. 10.1016/S0168-9525(96)20014-9.
10. Avaron F, Hoffman L, Guay D, Akimenko MA: Characterization of two new zebrafish members of the hedgehog family: atypical expression of a zebrafish indian hedgehog gene in skeletal elements of both endochondral and dermal origins. Dev Dyn. 2006, 235 (2): 478-489. 10.1002/dvdy.20619.
11. Meyer A, Schartl M: Gene and genome duplications in vertebrates: the one-to-four (-to-eight in fish) rule and the evolution of novel gene functions. Current opinion in cell biology. 1999, 11 (6): 699-704. 10.1016/S0955-0674(99)00039-3.
12. Hall TMT, Porter JA, Young KE, Koonin EV, Beachy PA, Leahy DJ: Crystal structure of a Hedgehog autoprocessing domain: Homology between Hedgehog and self-splicing proteins. Cell. 1997, 91: 85-97. 10.1016/S0092-8674(01)80011-8.
13. Porter JA, Young KE, Beachy PA: Cholesterol modification of Hedgehog signaling proteins in animal development. Science. 1996, 274: 255-259. 10.1126/science.274.5285.255.
14. Bijlsma MF, Spek CA, Peppelenbosch MP: Hedgehog: an unusual signal transducer. Bioessays. 2004, 26 (4): 387-394. 10.1002/bies.20007.
15. Gallet A, Rodriguez R, Ruel L, Therond PP: Cholesterol modification of hedgehog is required for trafficking and movement, revealing an asymmetric cellular response to hedgehog. Dev Cell. 2003, 4 (2): 191-204. 10.1016/S1534-5807(03)00031-5.
16. Gallet A, Ruel L, Staccini-Lavenant L, Therond PP: Cholesterol modification is necessary for controlled planar long-range activity of Hedgehog in Drosophila epithelia. Development (Cambridge, England). 2006, 133 (3): 407-418.
17. Aspöck G, Kagoshima H, Niklaus G, Bürglin TR: Caenorhabditis elegans has scores of hedgehog-related genes: sequence and expression analysis. Genome Res. 1999, 9 (10): 909-923. 10.1101/gr.9.10.909.
18. Bürglin TR, Kuwabara PE: Homologs of the Hh signalling network in C. elegans. WormBook. 2006, 1-14. 10.1895/wormbook.1.76.1.
19. Hao L, Mukherjee K, Liegeois S, Baillie D, Labouesse M, Bürglin TR: The hedgehog-related gene qua-1 is required for molting in Caenorhabditis elegans. Dev Dyn. 2006, 235 (6): 1469-1481. 10.1002/dvdy.20721.
20. Beachy PA, Cooper MK, Young KE, von Kessler DP, Park WJ, Hall TMT, Leahy DJ, Porter JA: Multiple roles of cholesterol in hedgehog protein biogenesis and signaling. Cold Spring Harb Symp Quant Biol. 1997, 62: 191-204.
21. Dassa B, Pietrokovski S: Origin and evolution of inteins and other Hint domains. In Homing Endonucleases and Inteins. Edited by: Belfort M, Stoddard BL, Wood DW, Derbyshire V. Springer; 2005.
22. Snell EA, Brooke NM, Taylor WR, Casane D, Philippe H, Holland PW: An unusual choanoflagellate protein released by Hedgehog autocatalytic processing. Proc Biol Sci. 2006, 273 (1585): 401-407. 10.1098/rspb.2005.3263.
23. Philippe H, Snell EA, Bapteste E, Lopez P, Holland PW, Casane D: Phylogenomics of eukaryotes: impact of missing data on large alignments. Mol Biol Evol. 2004, 21 (9): 1740-1752. 10.1093/molbev/msh182.
24. James TY, Kauff F, Schoch CL, Matheny PB, Hofstetter V, Cox CJ, Celio G, Gueidan C, Fraker E, Miadlikowska J, Lumbsch HT, Rauhut A, Reeb V, Arnold AE, Amtoft A, Stajich JE, Hosaka K, Sung GH, Johnson D, O'Rourke B, Crockett M, Binder M, Curtis JM, Slot JC, Powell MJ, Taylor JW, McLaughlin DJ, Spatafora JW, Vilgalys R: Reconstructing the early evolution of Fungi using a six-gene phylogeny. Nature. 2006, 443 (7113): 818-822. 10.1038/nature05110.
25. Technau U, Rudd S, Maxwell P, Gordon PM, Saina M, Grasso LC, Hayward DC, Sensen CW, Saint R, Holstein TW, Ball EE, Miller DJ: Maintenance of ancestral complexity and non-metazoan genes in two basal cnidarians. Trends Genet. 2005, 21 (12): 633-639. 10.1016/j.tig.2005.09.007.
26. Walton KD, Croce JC, Glenn TD, Wu SY, McClay DR: Genomics and expression profiles of the Hedgehog and Notch signaling pathways in sea urchin development. Developmental biology. 2006, 300 (1): 153-164. 10.1016/j.ydbio.2006.08.064.
27. Nichols SA, Dirks W, Pearse JS, King N: Early evolution of animal cell signaling and adhesion genes. Proceedings of the National Academy of Sciences of the United States of America. 2006, 103 (33): 12451-12456. 10.1073/pnas.0604065103.
28. Sullivan JC, Ryan JF, Watson JA, Webb J, Mullikin JC, Rokhsar D, Finnerty JR: StellaBase: the Nematostella vectensis Genomics Database. Nucleic Acids Res. 2006, 34 (Database issue): D495-9. 10.1093/nar/gkj020.
29. Requena N, Mann P, Hampp R, Franken P: Early developmentally regulated genes in the arbuscular mycorrhizal fungus Glomus mosseae: identification of GmGIN1, a novel gene with homology to the C-terminus of metazoan hedgehog proteins. Plant Soil. 2002, 244: 129-139. 10.1023/A:1020249932310.
30. Perler FB: InBase: the Intein Database. Nucleic Acids Res. 2002, 30 (1): 383-384. 10.1093/nar/30.1.383.
31. Poulter RT, Goodwin TJ, Butler MI: The nuclear-encoded inteins of fungi. Fungal Genet Biol. 2007, 44 (3): 153-179. 10.1016/j.fgb.2006.07.012.
32. Pietrokovski S: Intein spread and extinction in evolution. Trends Genet. 2001, 17 (8): 465-472. 10.1016/S0168-9525(01)02365-4.
33. Amitai G, Belenkiy O, Dassa B, Shainskaya A, Pietrokovski S: Distribution and function of new bacterial intein-like protein domains. Molecular microbiology. 2003, 47 (1): 61-73. 10.1046/j.1365-2958.2003.03283.x.
34. Pietrokovski S: Conserved sequence features of inteins (protein introns) and their use in identifying new inteins and related proteins. Protein Science. 1994, 3: 2340-2350.
35. Dalgaard JZ, Moser MJ, Hughey R, Mian IS: Statistical modeling, phylogenetic analysis and structure prediction of a protein splicing domain common to Inteins and Hedgehog proteins. Journal of Computational Biology. 1997, 4 (2): 193-214.
36. Perler FB, Olsen GJ, Adam E: Compilation and analysis of intein sequences. Nucl Acids Res. 1997, 25 (6): 1087-1093. 10.1093/nar/25.6.1087.
37. Saleh L, Perler FB: Protein splicing in cis and in trans. Chemical record. 2006, 6 (4): 183-193. 10.1002/tcr.20082.
38. Hao L, Johnsen R, Lauter G, Baillie D, Bürglin TR: Comprehensive analysis of gene expression patterns of hedgehog-related genes. BMC Genomics. 2006, 7: 280. 10.1186/1471-2164-7-280.
39. De Ley P: A quick tour of nematode diversity and the backbone of nematode phylogeny. WormBook. Edited by: Community TCR. 2006, WormBook, 10.1895/wormbook.1.41.1. [http://www.wormbook.org]
40. Huang G, Gao B, Maier T, Allen R, Davis EL, Baum TJ, Hussey RS: A profile of putative parasitism genes expressed in the esophageal gland cells of the root-knot nematode Meloidogyne incognita. Mol Plant Microbe Interact. 2003, 16 (5): 376-381. 10.1094/MPMI.2003.16.5.376.
41. Genome Sequencing Center, Washington University School of Medicine. [http://genome.wustl.edu/]
42. Varjosalo M, Li SP, Taipale J: Divergence of hedgehog signal transduction mechanism between Drosophila and mammals. Dev Cell. 2006, 10 (2): 177-186. 10.1016/j.devcel.2005.12.014.
43. Svard J, Heby-Henricson K, Persson-Lek M, Rozell B, Lauth M, Bergstrom A, Ericson J, Toftgard R, Teglund S: Genetic elimination of Suppressor of fused reveals an essential repressor function in the mammalian Hedgehog signaling pathway. Dev Cell. 2006, 10 (2): 187-197. 10.1016/j.devcel.2005.12.013.
44. Colombatti A, Bonaldo P, Doliana R: Type A modules: interacting domains found in several non-fibrillar collagens and in other extracellular matrix proteins. Matrix. 1993, 13 (4): 297-306.
45. Perkins SJ, Smith KF, Williams SC, Haris PI, Chapman D, Sim RB: The secondary structure of the von Willebrand factor type A domain in factor B of human complement by Fourier transform infrared spectroscopy. Its occurrence in collagen types VI, VII, XII and XIV, the integrins and other proteins by averaged structure predictions. J Mol Biol. 1994, 238 (1): 104-119. 10.1006/jmbi.1994.1271.
46. Baldauf SL: The deep roots of eukaryotes. Science. 2003, 300 (5626): 1703-1706. 10.1126/science.1085544.
47. Ryan JF, Burton PM, Mazza ME, Kwong GK, Mullikin JC, Finnerty JR: The cnidarian-bilaterian ancestor possessed at least 56 homeoboxes. Evidence from the starlet sea anemone, Nematostella vectensis. Genome Biol. 2006, 7 (7): R64. 10.1186/gb-2006-7-7-r64.
48. Mukherjee K, Bürglin TR: Comprehensive Analysis of Animal TALE Homeobox Genes: New Conserved Motifs and Cases of Accelerated Evolution. Journal of molecular evolution. 2007, 65 (2): 137-153. 10.1007/s00239-006-0023-0.
49. J. Craig Venter Institute. [http://www.tigr.org]
50. BLAST: Basic Local Alignment and Search Tool. [http://www.ncbi.nlm.nih.gov/blast/]
51. Wylie T, Martin JC, Dante M, Mitreva MD, Clifton SW, Chinwalla A, Waterston RH, Wilson RK, McCarter JP: Nematode.net: a tool for navigating sequences from parasitic and free-living nematodes. Nucleic Acids Res. 2004, 32 (Database issue): D423-6. 10.1093/nar/gkh010.
52. O'Brien EA, Koski LB, Zhang Y, Yang L, Wang E, Gray MW, Burger G, Lang BF: TBestDB: a taxonomically broad database of expressed sequence tags (ESTs). Nucleic Acids Res. 2007, 35 (Database issue): D445-51. 10.1093/nar/gkl770.
53. GSC: BLAST Server. [http://genome.wustl.edu/tools/blast/]
54. StellaBase: Nematostella vectensis Database. [http://evodevo.bu.edu/stellabase/]
55. DOE Joint Genome Institute. [http://www.jgi.doe.gov/]
56. ZFIN: The Zebrafish Model Organism Database. [http://zfin.org/]
57. Sprague J, Bayraktaroglu L, Clements D, Conlin T, Fashena D, Frazer K, Haendel M, Howe DG, Mani P, Ramachandran S, Schaper K, Segerdell E, Song P, Sprunger B, Taylor S, Van Slyke CE, Westerfield M: The Zebrafish Information Network: the zebrafish model organism database. Nucleic Acids Res. 2006, 34 (Database issue): D581-5. 10.1093/nar/gkj086.
58. NEB Intein Database. [http://www.neb.com/neb/inteins.html]
59. SoftBerry. [http://www.softberry.com]
60. Bürglin TR: PPCMatrix: a PowerPC dotmatrix program to compare large genomic sequences against protein sequences. Bioinformatics. 1998, 14 (8): 751-752.
61. Sequence Assembly at Iowa State University. [http://deepc2.psi.iastate.edu/aat/cap/cap.html]
62. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucl Acids Res. 1997, 25 (24): 4876-4882. 10.1093/nar/25.24.4876.
63. MUSCLE. [http://phylogenomics.berkeley.edu/cgi-bin/muscle/input_muscle.py]
64. Edgar RC: MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004, 5 (1): 113. 10.1186/1471-2105-5-113.
65. Galtier N, Gouy M, Gautier C: SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny. Comput Appl Biosci. 1996, 12 (6): 543-548.
66. Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 2003, 52 (5): 696-704. 10.1080/10635150390235520.
67. SignalP 3.0 Server. [http://www.cbs.dtu.dk/services/SignalP/]
68. Bendtsen JD, Nielsen H, von Heijne G, Brunak S: Improved prediction of signal peptides: SignalP 3.0. J Mol Biol. 2004, 340 (4): 783-795. 10.1016/j.jmb.2004.05.028.
69. LogoBar - Java application for protein sequence Logos. [http://www.biosci.ki.se/groups/tbu/logobar/]
70. Pérez-Bercoff Å, Koch J, Bürglin TR: LogoBar: bar graph visualization of protein logos with gaps. Bioinformatics. 2006, 22 (1): 112-114. 10.1093/bioinformatics/bti761.
71. Letunic I, Copley RR, Schmidt S, Ciccarelli FD, Doerks T, Schultz J, Ponting CP, Bork P: SMART 4.0: towards genomic data integration. Nucleic Acids Res. 2004, 32 (Database issue): D142-4. 10.1093/nar/gkh088.