Prevalence of the EH1 Groucho interaction motif in the metazoan Fox family of transcriptional regulators

Background The Fox gene family comprises a large and functionally diverse group of forkhead-related transcriptional regulators, many of which are essential for metazoan embryogenesis and physiology. Defining conserved functional domains that mediate the transcriptional activity of Fox proteins will contribute to a comprehensive understanding of the biological function of Fox family genes. Results Systematic analysis of 458 protein sequences of the metazoan Fox family was performed to identify the presence of the engrailed homology-1 motif (eh1), a motif known to mediate physical interaction with transcriptional corepressors of the TLE/Groucho family. More than 50% of Fox proteins contain sequences with high similarity to the eh1 motif, including proteins in ten of the nineteen Fox subclasses (A, B, C, D, E, G, H, I, L, and Q) and Fox proteins of early-diverging species such as marine sponge. The eh1 motif is not detected in Fox proteins of the F, J, K, M, N, O, P, R and S subclasses, or in yeast Fox proteins. The eh1-like motifs are positioned C-terminal to the winged helix DNA-binding domain in all subclasses except for FoxG proteins, which have an N-terminal motif. Two similar eh1-like motifs are found in zebrafish FoxQ1 and in the FoxG proteins of sea urchin and amphioxus. The identification of eh1-like motifs by manual sequence alignment was validated by statistical analyses of the Swiss protein database, confirming a high frequency of occurrence of eh1-like sequences in Fox family proteins. Structural predictions suggest that the majority of identified eh1-like motifs are short α-helices, and helical wheel modeling revealed an amphipathicity that supports this secondary structure prediction. Conclusion A search for eh1 Groucho interaction motifs in the Fox gene family has identified eh1-like sequences in more than 50% of Fox proteins. 
The results predict a physical and functional interaction of TLE/Groucho corepressors with many members of the Fox family of transcriptional regulators. Given the functional importance of the eh1 motif in transcriptional regulation, our annotation of this motif in the Fox gene family will facilitate further study of the diverse transcriptional and regulatory roles of Fox family proteins.


Background
DNA-binding transcriptional regulatory proteins have a modular structure composed of a sequence-specific DNA-binding domain and trans-regulatory domains. Multiple studies have shown that short conserved peptide regions mediate the biological functions of trans-regulatory domains. In the case of transcriptional repressors, such short regions can autonomously mediate repression when fused to a heterologous DNA-binding domain [1,2]. These conserved regions appear to form either α-helices or binding pockets that provide specific interaction surfaces for transcriptional corepressors. For instance, the Sin3 interaction motif of NRSF/REST adopts a short amphipathic α-helix that mediates specific physical interactions with the Sin3 transcriptional corepressor [3]. In the present study, we focus on identifying and analyzing the Engrailed homology region-1 (eh1) transcriptional repression motif in the Fox gene family of forkhead-related transcriptional regulators. This motif is known to mediate specific physical interactions between members of several protein families and transcriptional corepressors of the TLE/Groucho family [4-7].
The eh1 motif is composed of eight amino acid residues with the sequence pattern FS(I/V)XXΦΦX, with X representing any non-polar or charged residue and Φ representing branched hydrophobic residues. The eh1 motif was originally identified as a conserved N-terminal sequence shared between the Drosophila Engrailed protein and its vertebrate orthologs [6]. Functional analysis of the Engrailed protein has shown that the eh1 motif is required for active transcriptional repression in vivo, as well as for the physical interaction with Groucho corepressors [7,8]. An eh1-like motif was also identified in eight classes of the homeodomain protein superfamily (Emx, Dlx, Gsc, Hex, Msh, Six, Oct and Vnd) [5,9,10]. Further in vivo and in vitro studies have shown that the eh1-like motifs of Gsc, Nkx, Hex and Six are required for repression in vivo through recruitment of TLE/Groucho corepressors [5,9,11].
Eh1-like motifs have also been found in several members of the Fox family of forkhead-related transcriptional regulators [12]. Fox proteins are essential transcriptional regulators of embryogenesis, homeostasis, metabolism and aging in metazoan organisms [13]. The highly conserved DNA-binding domain of Fox family proteins forms three α-helices, three β-strands and two loops resembling wings [14], hence its designation as the winged helix DNA-binding domain (WHD). The WHD is flanked by N- and C-terminal regions that share low similarity among the Fox protein subclasses. The initial classification of Fox proteins based on sequence relatedness within the WHD established fifteen subclasses of the Fox gene family [15], and four additional Fox subclasses were subsequently identified [16,17]. An updated list of Fox gene family members is available online [18].
Sequence analysis of several Fox proteins revealed that a short conserved C-terminal region of FoxA proteins (conserved region II, or CII) is similar to the eh1 motif [12]. Further biochemical studies showed that FoxA2 physically interacts with TLE1, a mammalian Groucho protein, via the CII region [19]. These data suggest that the CII region resembles the eh1 motif not only in sequence but also in its ability to bind Groucho/TLE corepressors directly. In addition, the Drosophila FoxG ortholog, Slp1, physically interacts with Groucho via an N-terminal eh1-like motif [20]. Furthermore, our recent studies in Xenopus have shown that FoxD3 associates with the Xenopus Groucho ortholog, Grg4, via an eh1-like motif, and that this motif is essential for a functional interaction with Grg4 and for transcriptional repression in vivo [21]. These observations suggest that Groucho corepressors interact with multiple Fox family proteins, and prompted us to systematically examine all subclasses of the Fox gene family for the presence of eh1-like motifs. Given the functional importance of the eh1 motif in transcriptional regulation, annotation of the presence, pattern of distribution and structural characteristics of this motif in the Fox gene family will facilitate further study of the diverse transcriptional and regulatory roles of Fox family proteins.
Here, we present a systematic analysis of the presence of eh1-like motifs in metazoan Fox proteins. Eh1-like motifs are identified in more than 50% of Fox proteins, representing ten Fox subclasses (A, B, C, D, E, G, H, I, L and Q), and statistical analyses of the Swiss protein database confirm a frequent occurrence of the motif in the Fox family. Secondary structure analysis of these Fox proteins predicts that the eh1-like motifs adopt a short amphipathic α-helical structure. Taken together, the results point to a functional interaction of TLE/Groucho corepressors with many members of the Fox family and identify structural features of the eh1 motifs that will facilitate further study of the physical interaction of Fox proteins with TLE/Groucho corepressors.

Identification of eh1-like motifs in ten subclasses of the Fox gene family
We performed a systematic analysis of 458 yeast and metazoan protein sequences belonging to the nineteen subclasses of the Fox family of transcription factors to detect eh1-like motifs. An initial manual search was conducted for sequences of eight amino acids with a highly conserved hydrophobic core matching the eh1 motif pattern FSΦXXΦΦX (X, non-polar or charged residue; Φ, branched hydrophobic residue). Conserved regions of aligned orthologous Fox protein sequences were examined for homology to the eh1 consensus sequence. Eh1-like motifs were identified in Fox protein sequences of ten subclasses (A, B, C, D, E, G, H, I, L and Q), but not in Fox proteins of the F, J, K, M, N, O, P, R and S subclasses (Table 1). Fox proteins containing an eh1-like motif were found across multiple animal phyla, including chordates, hemichordates and a variety of invertebrates, but not in yeast (Tables 2 and 3). The identified motifs share 50-87% similarity with the Drosophila eh1 motif. To summarize the results, a phylogenetic tree of the Fox gene family was constructed in which the presence of an eh1-like motif in individual Fox proteins is indicated [see Additional files 1 and 2].
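The pattern-matching step of such a search can be sketched with a regular expression. This is a minimal sketch, assuming the consensus given in the Methods (F, then S or A, then Φ at positions +2, +5 and +6); the Φ class is assumed here to be I, L, V and M, and X is relaxed to any residue because polar residues are observed at positions 3 and 4. `find_eh1_like` is a hypothetical helper, not the tool used in the study, and a pattern hit alone is not evidence of a functional motif (the study additionally required conservation among aligned orthologs).

```python
import re

# Assumed residue classes (illustrative, not taken from the study's tables).
PHI = "ILVM"  # residues accepted at the Φ positions
# F(0) S/A(+1) Φ(+2) X(+3) X(+4) Φ(+5) Φ(+6) X(+7), with X relaxed to any residue.
# The capturing group inside a lookahead lets overlapping matches be reported.
EH1_PATTERN = re.compile(rf"(?=(F[SA][{PHI}]..[{PHI}][{PHI}].))")


def find_eh1_like(seq):
    """Return (1-based start, motif) for every eh1-like window in a protein sequence."""
    return [(m.start() + 1, m.group(1)) for m in EH1_PATTERN.finditer(seq.upper())]


# Usage: the Drosophila Engrailed eh1 motif embedded in a short toy sequence.
print(find_eh1_like("MKTAFSISNILSEPQ"))  # [(5, 'FSISNILS')]
```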
To validate the results of the manual search for eh1-like motifs, we used the expectation-maximization algorithm in the MEME program [22]. We initially examined 18 FoxD3-related protein sequences, which contain a conserved and functional eh1 motif [21]. As predicted, the analysis identified eh1-like motifs (E-value of 10^-75) at 18 sites corresponding to the previously described eh1 motif of FoxD3. When this approach was extended to the entire Fox family of 458 proteins, eh1-like motifs were identified at 213 sites in ten Fox subclasses (E-value < 10^-16). The eh1-like motifs identified using the expectation-maximization algorithm corresponded to those identified in the manual sequence analysis, as well as to motifs previously identified in the Fox family [12,23].
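The expectation-maximization principle behind this kind of motif discovery can be illustrated with a simplified one-occurrence-per-sequence (OOPS) model, in which each sequence is assumed to contain exactly one motif site. This is only a sketch of the idea, not MEME: it starts from a single user-supplied seed word instead of trying many starting points, runs a fixed number of iterations, and computes no E-values. The functions `oops_em`, `site_ratio` and `best_sites`, the pseudocount, and the toy data are all hypothetical, and sequences must use the 20 standard one-letter codes.

```python
import random

AA = "ACDEFGHIKLMNPQRSTVWY"


def site_ratio(window, pwm, bg):
    """Likelihood ratio of a window under the motif model versus the background."""
    r = 1.0
    for k, a in enumerate(window):
        r *= pwm[k][a] / bg[a]
    return r


def normalize(col):
    total = sum(col.values())
    return {a: v / total for a, v in col.items()}


def oops_em(seqs, seed, iters=30, pseudo=0.5):
    """Simplified OOPS-model EM: one motif site per sequence, seeded from one word."""
    w = len(seed)
    # Background model: overall residue frequencies plus a pseudocount.
    counts = {a: pseudo for a in AA}
    for s in seqs:
        for a in s:
            counts[a] += 1
    bg = normalize(counts)
    # Start with half of each column's probability on the seed residue.
    pwm = [{a: (0.5 if a == c else 0.5 / 19) for a in AA} for c in seed]
    for _ in range(iters):
        expected = [{a: pseudo for a in AA} for _ in range(w)]
        for s in seqs:
            # E-step: posterior probability of each start position in this sequence.
            ratios = [site_ratio(s[j:j + w], pwm, bg) for j in range(len(s) - w + 1)]
            z = sum(ratios)
            # Accumulate expected residue counts for each motif column.
            for j, r in enumerate(ratios):
                for k in range(w):
                    expected[k][s[j + k]] += r / z
        # M-step: re-estimate the motif columns from the expected counts.
        pwm = [normalize(col) for col in expected]
    return pwm, bg


def best_sites(seqs, pwm, bg):
    """0-based start of the most likely motif site in each sequence."""
    w = len(pwm)
    return [max(range(len(s) - w + 1), key=lambda j: site_ratio(s[j:j + w], pwm, bg))
            for s in seqs]


# Toy data: FSISNILS planted at position 20 of ten random sequences.
rng = random.Random(1)
seqs = ["".join(rng.choice(AA) for _ in range(20)) + "FSISNILS"
        + "".join(rng.choice(AA) for _ in range(20)) for _ in range(10)]
pwm, bg = oops_em(seqs, seed="FSVSDLLA")
print(best_sites(seqs, pwm, bg))
```

The seed shares only four of eight residues with the planted motif, yet the iterations pull the model toward FSISNILS because the planted sites are the only windows that recur across all sequences.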
To confirm the statistical significance of the match between the identified eh1-like sequences and the eh1 consensus, a hidden Markov model (HMM) was constructed [24] for the eh1 motif of FoxD3 (eh1 FD3). This model was used to search the SWISS protein database, and the results of the eh1 FD3 HMM analysis are summarized in Table 4. A total of 49,363 matches to the eh1 motif were identified, of which 647 were to proteins belonging to transcription factor families. The mean log-odds score for all transcriptional proteins was 9.07, compared with 6.87 for non-transcriptional proteins. Among transcriptional proteins, Fox family proteins gave the strongest matches to the eh1 motif, with a mean log-odds score of 14.34. The motifs were identified in 9 subclasses of the Fox protein family: A, B, C, D, E, G, H, L and Q (the FoxI subclass is not represented in the current SWISS protein database). The search also identified a significant number of high-scoring matches (mean log-odds score of 11.61) for homeodomain-containing proteins of the ParaHox cluster [25], whereas the score for other non-Fox, non-ParaHox transcriptional proteins was low (7.72). The results of the HMM analysis strongly support the conclusion that eh1-like motifs are present in proteins of the Fox family at a higher frequency than in most transcriptional protein families and in non-transcriptional proteins.
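Because the eh1 motif is short and ungapped, a profile HMM built for it behaves much like a position-specific log-odds matrix, and that simplification is enough to show what a log-odds score measures: for each window, the sum over positions of log2(p_motif / p_background). The sketch below is not HMMER; the uniform background, the pseudocount of 1, and the training sites (the Drosophila FSISNILS motif plus two hypothetical variants) are assumptions for illustration only.

```python
import math

AA = "ACDEFGHIKLMNPQRSTVWY"


def build_log_odds(sites, pseudo=1.0, background=None):
    """Position-specific log-odds matrix (bits) from equal-length, ungapped aligned sites."""
    bg = background or {a: 1 / len(AA) for a in AA}  # uniform background (assumption)
    matrix = []
    for k in range(len(sites[0])):
        col = {a: pseudo for a in AA}
        for s in sites:
            col[s[k]] += 1
        n = sum(col.values())
        matrix.append({a: math.log2((col[a] / n) / bg[a]) for a in AA})
    return matrix


def best_score(seq, matrix):
    """Highest-scoring window as (score, 0-based start); (-inf, None) if seq is too short."""
    w = len(matrix)
    best = (float("-inf"), None)
    for j in range(len(seq) - w + 1):
        score = sum(matrix[k][seq[j + k]] for k in range(w))
        if score > best[0]:
            best = (score, j)
    return best


# Toy training set: FSISNILS plus two hypothetical eh1-like variants.
model = build_log_odds(["FSISNILS", "FSIDSLLG", "FSVESILA"])
print(best_score("GGGGFSISNILSGGGG", model))
```

Positive scores mean the window looks more like the training motifs than like background, which is the sense in which the mean scores in Table 4 can be compared across protein classes.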
To evaluate the statistical significance of the eh1-like motif identification results obtained by HMM, logistic regression analysis was performed. Analysis of the log-odds scores for the transcriptional and non-transcriptional protein classes indicated that the association of eh1-like motifs with transcriptional proteins was highly significant (p < 2 × 10^-9). Furthermore, when the log-odds scores for Fox family proteins were compared with those of other transcriptional protein classes, the association of higher log-odds scores with Fox proteins was also highly significant (p < 2 × 10^-9). These results strongly support the conclusion that eh1 motifs are present in members of the Fox family at high frequency, and suggest that the eh1 motif contributes to the transcriptional function of many Fox family proteins.
Table footnotes: (a) The highly conserved core of the eh1-like motifs is indicated in bold. (b) The percent similarity between the identified Fox eh1-like motifs and the eh1 motif (FSISNILS) of the Drosophila engrailed homeodomain protein [7]. (c) The location of the motifs within the amino acid sequence of the individual Fox proteins.
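The logistic-regression step can be sketched as fitting P(class = 1 | score) = 1 / (1 + exp(-(b0 + b1·score))) and testing whether the slope b1 differs from zero. The sketch below uses plain gradient ascent on the log-likelihood and a Wald test for the slope; the optimizer, learning rate, iteration count and toy scores are assumptions, since the text does not state which software or test statistic was used. A positive, significant slope means that higher log-odds scores are associated with the class coded 1 (for example, Fox proteins versus other transcriptional proteins).

```python
import math


def fit_logistic(x, y, lr=0.1, epochs=5000):
    """Fit P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))) and Wald-test the slope.

    Returns (b0, b1, two-sided p-value for b1 = 0) on the original x scale."""
    n = len(x)
    mean = sum(x) / n
    sd = (sum((v - mean) ** 2 for v in x) / n) ** 0.5 or 1.0
    z = [(v - mean) / sd for v in x]  # standardize for stable gradient ascent

    def probs(b0, b1):
        return [1.0 / (1.0 + math.exp(-(b0 + b1 * zi))) for zi in z]

    b0 = b1 = 0.0
    for _ in range(epochs):
        p = probs(b0, b1)
        b0 += lr * sum(yi - pi for yi, pi in zip(y, p)) / n
        b1 += lr * sum((yi - pi) * zi for yi, pi, zi in zip(y, p, z)) / n
    # Wald test: standard error of b1 from the inverse of the 2x2 Fisher information.
    w = [pi * (1.0 - pi) for pi in probs(b0, b1)]
    i00 = sum(w)
    i01 = sum(wi * zi for wi, zi in zip(w, z))
    i11 = sum(wi * zi * zi for wi, zi in zip(w, z))
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 ** 2))
    p_value = math.erfc(abs(b1 / se_b1) / math.sqrt(2.0))
    # The Wald statistic is scale-invariant; only the coefficients are rescaled.
    return b0 - b1 * mean / sd, b1 / sd, p_value


# Hypothetical log-odds scores; label 1 = Fox protein, 0 = other transcriptional protein.
scores = [6.1, 6.8, 7.2, 7.9, 8.5, 9.0, 10.4, 11.2, 12.5, 14.0]
labels = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
b0, b1, p = fit_logistic(scores, labels)
print(f"slope = {b1:.3f}, Wald p = {p:.3g}")
```

With only ten toy points the p-value will be modest; p-values as small as those reported require large samples such as the 49,363 database matches.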
For most of the Fox proteins analyzed, a single eh1-like motif was located C-terminal to the WHD (Fox subclasses A, B, C, D, E, H, I, L and Q). The zebrafish FoxQ1 protein contains two similar eh1-like motifs, both C-terminal to the WHD. Interestingly, the C. elegans FoxD protein and the sea urchin, amphioxus and zebrafish FoxQ2 proteins contain N-terminal eh1-like motifs, whereas the motif is C-terminal in the other FoxD and FoxQ orthologs. All FoxG proteins contain an eh1-like motif N-terminal to the WHD, and the sea urchin and amphioxus FoxG proteins contain a second eh1-like motif C-terminal to the WHD. The vertebrate FoxG proteins contain a C-terminal sequence that appears to be a remnant of an eh1 motif lacking the conserved phenylalanine. Eh1-like motifs were identified in Fox proteins of several early-diverging species, including sponge (phylum Porifera) FoxD, hydra and sea anemone (phylum Cnidaria) FoxA, and comb jelly (phylum Ctenophora) FoxG. The presence of eh1 motifs in Fox proteins of these phyla suggests that the motif appeared early in the evolution of the Fox gene family, and with it a functional interaction with Groucho-related corepressors.

Loss of eh1-like motifs within Fox gene subclasses
Our sequence analysis indicates that the motif is incompletely distributed within certain Fox subclasses, suggesting loss of the motif in a subset of Fox proteins. A striking example is the FoxE1 proteins of the FoxE subclass. Sequence analysis did not identify a recognizable eh1 motif in seven mammalian FoxE1 proteins, whereas FoxE1 proteins of fish and amphibians, and nine other FoxE proteins, contained the motif. To assess the inheritance and loss of the eh1 motif during the evolution of FoxE proteins, a phylogenetic tree for the FoxE subclass and the FoxC and FoxD outgroups was constructed using the neighbor-joining method (Figure 1). The topology of the tree (bootstrap value 91%) indicates a close relationship among the fish, amphibian and mammalian FoxE1 proteins, suggesting common ancestry. It is therefore reasonable to infer that the ancestral FoxE1 protein contained the motif and that the motif was lost in the mammalian lineage. All other members of the FoxE subclass, including the amphioxus and tunicate proteins as well as mammalian FoxE3 proteins, contain the motif. This suggests that an ancestral FoxE protein most likely contained the motif before the separation and expansion of the FoxE subclass, an idea supported by the presence of the motif in nearly all members of the FoxC and FoxD outgroups.
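The inference that the ancestral FoxE1 protein carried the motif, with a single loss in mammals, is a parsimony argument. It can be made explicit with Fitch's small-parsimony algorithm applied to motif presence or absence on a fixed tree. This sketch is illustrative only: the study built its tree by neighbor-joining and does not report a formal ancestral-state reconstruction, and the simplified topology and leaf labels below are hypothetical stand-ins for the clades in Figure 1.

```python
def fitch(tree, states):
    """Fitch small parsimony on a rooted binary tree of nested tuples.

    Returns (set of most-parsimonious root states, minimum number of changes)."""
    if isinstance(tree, str):  # leaf: look up its observed state
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one change required


# Hypothetical, simplified topology: FoxE1 clade, other FoxE proteins, FoxC/FoxD outgroup.
tree = (
    ((("FoxE1_fish", "FoxE1_amphibian"), "FoxE1_mammal"), ("FoxE3_mammal", "FoxE4_amphioxus")),
    ("FoxC", "FoxD"),
)
states = {leaf: "present" for leaf in
          ["FoxE1_fish", "FoxE1_amphibian", "FoxE3_mammal", "FoxE4_amphioxus", "FoxC", "FoxD"]}
states["FoxE1_mammal"] = "absent"
print(fitch(tree, states))  # ({'present'}, 1)
```

The result says that a single change (the mammalian loss) explains the observed pattern and that the most parsimonious root state is "present", matching the reasoning in the text.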
It should be noted that a cnidarian FoxE-related protein lacks the eh1 motif, which may appear inconsistent with the presence of the eh1 motif in the ancestral FoxE protein. However, phylogenetic analysis indicates that this cnidarian protein is only distantly related to the FoxE subclass, arguing for a different origin. Similarly, the motif is not detected in the N. vectensis FoxD- and FoxC-related proteins, which also appear to have undergone significant sequence divergence. The motif is present in cnidarian FoxA and FoxB proteins, as well as in the FoxC- and FoxD-related (Fox1) proteins of the sponge S. domuncula [see Additional files 1 and 2], suggesting that the ancestral precursors of these subclasses contained the motif and that it was likely lost in a subset of more divergent cnidarian Fox proteins.
No eh1-like motif is detected in the tunicate FoxH-like proteins, whereas nearly all vertebrate FoxH proteins contain the motif. The absence of the eh1 motif in the tunicate FoxH proteins suggests divergence and loss of this motif in the tunicate lineage. However, it is also possible that the ancestral FoxH protein lacked an eh1 motif and that the motif was acquired in the vertebrate lineage. Interestingly, a Xenopus FoxH1 paralog, FoxH3, also lacks the eh1 motif present in other vertebrate FoxH orthologs, again suggesting loss of the motif, perhaps due to functional specialization [see Additional files 1 and 2].

Characteristics of eh1-like motifs in Fox family proteins
For the eh1-like motifs identified, the amino acid frequency at each position of the motif was determined to better define the characteristics of the motif in invertebrate and vertebrate members of the Fox gene family (Figure 2). For this frequency analysis, the positions in the motif are numbered 0 to 7 in N-terminal to C-terminal order. Although this analysis includes Fox proteins of evolutionarily distant organisms, similar residue usage is observed at most positions. Overall, the identified motifs are characterized by a predominance of hydrophobic residues. The aromatic residue phenylalanine is absolutely conserved (100%) at position 0 of the identified motifs in vertebrates and in nearly all invertebrates. The hydrophobic core of the motif (positions 2, 5 and 6) is characterized by the frequent presence of the hydrophobic residues isoleucine, leucine and methionine and, less frequently, valine. In both vertebrates and invertebrates, isoleucine is highly represented at position 2 (75%), and leucine and isoleucine appear at similar frequencies (40-60%) at positions 5 and 6. Serine is highly represented at position 1 (75%) in vertebrate Fox proteins, whereas serine (55%) and threonine (30%) predominate at this position in invertebrates. Although positions 3 and 4 are variable, there is a strong bias toward negatively charged residues at position 3 and toward the uncharged polar residues serine and asparagine at position 4. Position 7 is the most variable position of the eh1-like motifs, and glycine, alanine and serine residues are often present there. Within individual Fox subclasses, residue identity at each position is more highly conserved, reflecting the evolutionary relatedness of the proteins in each subclass as well as the conservation of subclass-specific functional and structural properties of the motifs [see Additional files 2 and 3].
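The per-position frequencies that underlie a sequence logo can be tabulated directly from the aligned 8-residue motifs, using the 0 to 7 numbering above. A minimal sketch, assuming the motifs are already aligned and of equal length; the four example motifs are FSISNILS plus hypothetical variants, not data from the study.

```python
from collections import Counter


def position_frequencies(motifs):
    """Per-position residue frequencies (positions 0..width-1) for aligned motifs."""
    width = len(motifs[0])
    if any(len(m) != width for m in motifs):
        raise ValueError("all motifs must have the same length")
    table = []
    for pos in range(width):
        counts = Counter(m[pos] for m in motifs)
        # Most frequent residue first, expressed as a fraction of all motifs.
        table.append({aa: n / len(motifs) for aa, n in counts.most_common()})
    return table


motifs = ["FSISNILS", "FSIDSLLG", "FTVESILA", "FSIEGMLS"]  # toy input
for pos, freqs in enumerate(position_frequencies(motifs)):
    print(pos, freqs)
```

Logo programs such as WebLogo start from the same counts and additionally scale each column by its information content.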
The conservation of multiple hydrophobic residues in the eh1 motif favors α-helix formation and suggests that the eh1-like motifs identified in Fox family proteins can adopt a hydrophobic α-helical structure. To predict structural characteristics of the motifs, several algorithms (DSC, PHD, MLRC) were used to calculate the propensity for secondary structure formation [26-28]. For several Fox proteins of each subclass, the regions containing the eh1-like motif were analyzed for predicted secondary structure. The results obtained using multiple algorithms predict a high likelihood of α-helical structure in the region of the eh1-like motif for the majority of Fox proteins examined. The highest scores for α-helical propensity were obtained for the eh1-like motifs present in FoxB, FoxE and FoxQ proteins, and α-helical structure was also predicted, albeit with lower propensity scores, for FoxD, FoxA, FoxC and FoxL proteins [see Additional file 4 and data not shown].
Figure 1. A phylogenetic tree for proteins of the FoxE subclass and the FoxC and FoxD outgroups.
In BLAST searches, the eh1-like motifs of several Fox proteins, including FoxB and FoxE proteins, show similarity to hydrophobic regions of several unrelated proteins, including the α-helical regions of the Chlorobium tepidum segregation and condensation protein B (CHPfCT, AAM71720), the Pseudomonas aeruginosa probable transcriptional regulator Pa0477 (2ESND), and the Drosophila ultraspiracle ligand-binding domain (ULBD, 1HG4F) (Figure 3A and data not shown). A BLAST search for sequences related to the N. vectensis Fox1 eh1-like motif identified the α-helical region of the hepatitis C virus RNA polymerase (1YVZA) as the only related sequence (Figure 3B). That eh1-like sequences in proteins unrelated to the Fox family form α-helical structure supports the prediction of α-helical structure for the eh1-like motifs identified in Fox proteins.
Helical wheel analysis of the predicted α-helical regions revealed amphipathicity for a majority of the identified eh1-like motifs. For example, the helical wheel models of the eh1-like motifs of FoxB1 and FoxE4 (Figure 3C,D) display a predicted amphipathic α-helical structure. For both motifs, a hydrophobic surface is formed by isoleucine residues at positions 2, 5 and 6 of the predicted α-helix. The eh1-like motifs of a subset of FoxB1, FoxB2, FoxH1 and FoxQ1 proteins contain an additional hydrophobic residue (alanine or methionine) at position 1 that extends the hydrophobic surface of the predicted α-helix (Figure 3C and data not shown). Opposite the hydrophobic surface is a surface consisting predominantly of hydrophilic, uncharged residues (Figure 3C,D and data not shown). Thus, the majority of the eh1-like motifs identified in Fox proteins are predicted to form an amphipathic α-helix. This prediction is strongly supported by a recent crystallographic study showing that the Goosecoid eh1 motif forms a short amphipathic α-helix when bound to the WD domain of TLE1 [29].
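A helical wheel projection places residue i at an angle of i × 100 degrees, reflecting roughly 3.6 residues per turn of an ideal α-helix, and amphipathicity can be summarized as a hydrophobic moment: the length of the vector sum of per-residue hydrophobicity values pointing in those directions. This is a minimal sketch, assuming a crude binary scale (+1 for A, C, F, I, L, M, V, W and -1 otherwise) in place of a published hydrophobicity scale. For FSISNILS, positions 2, 5 and 6 project to 200, 140 and 240 degrees, so the three core residues cluster on one face of the wheel.

```python
import math

# Crude binary hydrophobicity (an assumption; a published scale would give graded values).
HYDROPHOBIC = set("ACFILMVW")


def wheel_angles(seq, step=100.0):
    """Angular position (degrees) of each residue on an ideal α-helix, 100 degrees per residue."""
    return [(aa, (i * step) % 360) for i, aa in enumerate(seq)]


def hydrophobic_moment(seq, step=100.0):
    """Per-residue length of the hydrophobicity vector sum around the helix axis."""
    sx = sy = 0.0
    for i, aa in enumerate(seq.upper()):
        h = 1.0 if aa in HYDROPHOBIC else -1.0
        theta = math.radians(i * step)
        sx += h * math.cos(theta)
        sy += h * math.sin(theta)
    return math.hypot(sx, sy) / len(seq)


for aa, angle in wheel_angles("FSISNILS"):
    print(aa, angle)
print(round(hydrophobic_moment("FSISNILS"), 3), round(hydrophobic_moment("LLLLLLLL"), 3))
```

A uniformly hydrophobic 8-mer still has a small nonzero moment because eight residues do not complete whole turns, so a segregated pattern such as FSISNILS is best judged against that baseline.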

Positional distribution of C-terminal eh1-like motifs
The eh1-like motifs identified in the Fox family were further analyzed for their position within individual Fox proteins. Because nearly all of the eh1-like motifs identified in the Fox family are positioned C-terminal to the WHD, we limited the analysis to C-terminal motifs. To assess variation in motif position within the C-terminus of Fox proteins, the positional distribution of the eh1-like motifs relative to the WHD was examined. The interval between the WHD and the C-terminal eh1-like motif varied substantially, from 30 to 180 residues (Figure 4). A detailed analysis of the positional distribution of these domains in 89 Fox protein sequences revealed two groups, C-proximal and C-distal, defined by the most frequent interval between the two domains. For the C-proximal eh1 motifs, the most frequent interval is 45-60 residues, with a median of 58 residues (Figure 4A). For the C-distal motifs, the most frequent interval is 100-140 residues, with a median of 120 residues (Figure 4B).
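The grouping into C-proximal and C-distal motifs can be sketched by binning the WHD-to-motif intervals, taking the most populated bin as the interval of maximum occurrence, and reporting the median of each group. The 15-residue bin width matches the 45-60 residue bin mentioned above, but the 80-residue cutoff separating the two groups and the interval values are hypothetical.

```python
from statistics import median

BIN_WIDTH = 15  # residues per histogram bin


def histogram(intervals, bin_width=BIN_WIDTH):
    """Count intervals (residues) in bins [start, start + bin_width)."""
    counts = {}
    for d in intervals:
        start = (d // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


def split_groups(intervals, cutoff=80):
    """Split intervals into C-proximal (< cutoff) and C-distal (>= cutoff) groups."""
    proximal = [d for d in intervals if d < cutoff]
    distal = [d for d in intervals if d >= cutoff]
    return proximal, distal


intervals = [48, 52, 57, 58, 60, 63, 105, 118, 120, 122, 135]  # toy data
proximal, distal = split_groups(intervals)
hist = histogram(proximal)
mode_bin = max(hist, key=hist.get)
print(f"C-proximal: most frequent bin {mode_bin}-{mode_bin + BIN_WIDTH}, median {median(proximal)}")
print(f"C-distal: median {median(distal)}")
```

In practice the cutoff would be placed in the trough between the two modes of the full histogram rather than fixed in advance.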
Positional variation of the C-terminal eh1-like motifs was also examined within Fox ortholog and paralog groups for eight subclasses. This analysis was limited to chordate Fox proteins, as non-chordates lack many Fox subclasses. Proteins of Fox subclasses B, E, H and Q contain C-proximal motifs, whereas C-distal motifs are present in Fox subclasses A, C, D and I. The positional distribution of the motifs in the ortholog groups is shown in Figure 5. The analysis indicates that the position of eh1-like motifs is conserved within individual Fox subclasses across species, but not across subclasses within individual species. This conservation of motif position within each subclass is consistent with the existence of a common ancestral gene for the Fox genes comprising an individual subclass [17], but may also reflect a functional constraint that maintains the position of the eh1 motif. Exceptions to the conservation of motif position are observed for the FoxD and FoxQ subclasses, and for orthologs of FoxA3, FoxC1 and FoxH1. For the FoxD subclass, the motif is shifted towards the C-terminus in chick, mouse and human proteins compared with amphioxus, zebrafish and Xenopus (Figure 5A). A C-terminal shift is also observed for the eh1 motifs of Xenopus, mouse and human FoxQ proteins compared with amphioxus and zebrafish (Figure 5B). Similarly, the eh1 motif of the chick and mammalian FoxC1 orthologs is shifted C-terminally in comparison with the zebrafish and Xenopus orthologs. In contrast, the eh1 motif of mammalian FoxH1 proteins is shifted N-terminally, closer to the WHD, in comparison with the zebrafish and Xenopus proteins.
Figure 2. Amino acid composition of the eh1-like motifs identified in Fox proteins. The amino acid usage frequency of eh1-like motifs identified in invertebrate (A) and vertebrate (B) Fox proteins. The diagrams were generated with the WebLogo program [44].
Figure 3. (A) Multiple sequence alignments of the α-helical region of an ultraspiracle ligand-binding domain from Drosophila (ULBD), the α-helix of a conserved hypothetical protein from C. tepidum (CHPfCT), and the eh1 motifs of human FoxB1, murine FoxB2 and amphioxus FoxE4 proteins, which have a high likelihood of α-helix formation.
For each case where eh1 motif position is not conserved, the shift in motif position correlates with changes in the size of the coding region C-terminal to the WHD. For example, sequence alignment of FoxD subclass proteins reveals the presence of polyalanine, polyglycine and polyproline repeats in the mammalian proteins that are absent in FoxD proteins of lower vertebrates (data not shown).
On the other hand, mammalian FoxH1 proteins lack sequences C-terminal to the WHD that are present in the Xenopus and zebrafish orthologs (data not shown). Thus, insertion or deletion of sequences within the C-terminal domain of these mammalian Fox proteins is likely responsible for the shift of eh1 motif position.

Discussion
In this study, we have identified eh1-like Groucho interaction motifs in ten subclasses of the Fox gene family. The presence of eh1 motifs in more than 50% of Fox family proteins was in marked contrast to other protein families, including both transcriptional and non-transcriptional proteins (Table 4 and data not shown).
The prevalence of eh1-like motifs in the Fox family suggests that Groucho corepressors directly interact with many Fox proteins to mediate transcriptional repression or to inhibit the activation function of other regulatory domains. In a number of cases, the functional importance of the identified eh1-like motifs is confirmed by their presence within defined transcriptional repression domains and by their ability to mediate direct binding to Groucho proteins. The eh1 motifs are present in the C-terminal repression domains of mouse and chick FoxD3 [30,31] and Xenopus FoxD5 [32], as well as in the C-terminal transcriptional inhibitory domain of mouse FoxC1 [33]. Furthermore, the eh1 motifs mediate a functional and direct interaction with Groucho corepressors in mouse FoxA2 [19], Drosophila FoxG/sloppy-paired-1 [20], mouse FoxG1 [34], and Xenopus FoxD3 [21] and FoxH1 (SY and DSK, unpublished). These results confirm the importance of eh1 motifs in Fox family proteins, and suggest that the eh1-like motifs identified in this study may mediate a previously unappreciated interaction of Groucho corepressors with many Fox proteins.
Secondary structure analysis of the eh1-like motifs indicates that a majority of the identified motifs are highly likely to form an α-helical structure. In support of this prediction, a number of the eh1-like motifs exhibit sequence similarity to regions of unrelated proteins with known α-helical structure. In addition, the eh1-like motifs exhibit amphipathicity, which argues in favor of α-helix formation. Structural studies of a number of transcriptional regulators have demonstrated the importance of amphipathic α-helices in binding to transcriptional coregulators. The p53 tumor suppressor binds to its negative regulator, MDM2, via a 13-amino-acid motif. Structural studies have shown that the MDM2 interaction motif of p53 forms an amphipathic α-helix that binds to MDM2 through hydrophobic interactions [35]. In addition, NRSF/REST binds to the Sin3 corepressor via several short amphipathic or hydrophobic α-helices [3]. Therefore, the predicted amphipathic α-helical structure of the eh1 motifs is likely an essential feature for direct, high-affinity binding of Fox proteins to Groucho corepressors. This conclusion is strongly corroborated by recent structural studies showing that the eh1 motif of the human Goosecoid protein forms a short amphipathic α-helix when bound to the WD domain of the Groucho family protein TLE1 [29].
In general, these observations support the idea that diverse families of transcriptional regulators utilize distinct conserved motifs, which adopt a common amphipathic α-helical structure, as adaptors for the physical interaction with transcriptional coregulators.
Eh1-like motifs were identified in Fox proteins of the earliest-diverging animal lineages, including marine sponge (Porifera), comb jelly (Ctenophora) and sea anemone (Cnidaria). The presence of the eh1-like motif in Fox proteins of these organisms likely reflects an early evolutionary origin of the eh1-Groucho interaction module. Eh1-like motifs are also present in other transcriptional regulators of the sponge, including Barx/Bsh1 (AAQ24371) and a ParaHox-related homeodomain protein (CAD37941). Consistent with the presence of eh1-like motifs in transcriptional regulatory proteins of early-diverging species, a Groucho gene (CN626783) has been identified in the cnidarian Hydra. These data suggest an ancient origin for eh1 motif-dependent recruitment of Groucho corepressors, a protein interaction that may have been established as early as the Porifera.
An intriguing question raised by these analyses concerns the origin of the eh1 motifs in the Fox gene family. The motifs identified in all Fox subclasses except FoxG are positioned C-terminal to the WHD. The occurrence of the eh1-like motif N-terminal to the WHD in the FoxG subclass and in FoxQ2 suggests that the N-terminal motif may have arisen independently of the C-terminal motif. In addition, two eh1-like motifs, positioned N-terminal and C-terminal to the WHD, were identified in the sea urchin and amphioxus FoxG1 proteins. The presence of two motifs in distinct regions of a subset of FoxG1 orthologs is consistent with independent origins for the C-terminal and N-terminal eh1 motifs. Given the small size of the eh1 motif (8 residues), the motif may have arisen multiple times in the Fox family: new eh1-like motifs could form through the accumulation of missense mutations, providing a mechanism for multiple independent, convergent appearances of the motif. Alternatively, the Fox genes may have acquired the motif via a non-homologous recombination event that introduced a repression module containing an eh1-like motif. Such a scenario could involve the incorporation of a new exon encoding the repression module. However, since a majority of Fox family genes lack introns, this mechanism would require intron loss subsequent to incorporation of the eh1-encoding exon.
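A rough calculation illustrates why an 8-residue motif could plausibly arise more than once by point mutation. Assuming equal frequencies for all 20 amino acids, Φ = {I, L, V, M}, S or A at +1, and unconstrained X positions (all simplifying assumptions), the chance that a random window matches the consensus is (1/20)(2/20)(4/20)^3 = 4 × 10^-5, or about 0.016 expected chance matches in a hypothetical 400-residue protein. Real amino acid composition and the stricter X definition would change this number, so it is an order-of-magnitude estimate only.

```python
AA = "ACDEFGHIKLMNPQRSTVWY"
UNIFORM = {a: 1 / len(AA) for a in AA}  # assumption: equal residue frequencies
PHI = set("ILVM")  # assumed Φ class
ANY = set(AA)      # X treated as unconstrained (an upper-bound simplification)
PATTERN = [{"F"}, {"S", "A"}, PHI, ANY, ANY, PHI, PHI, ANY]


def match_probability(pattern, freqs):
    """Probability that one random window matches a position-by-position residue pattern."""
    p = 1.0
    for allowed in pattern:
        p *= sum(freqs[a] for a in allowed)
    return p


def expected_matches(length, pattern, freqs):
    """Expected chance matches in a sequence of the given length (windows treated independently)."""
    return max(length - len(pattern) + 1, 0) * match_probability(pattern, freqs)


print(match_probability(PATTERN, UNIFORM))
print(expected_matches(400, PATTERN, UNIFORM))
```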
An apparent loss of eh1 motifs was observed in a subset of FoxD, FoxE, and FoxH proteins. Our analysis indicates that the loss of the motif occurred in a subset of mammalian Fox proteins and we speculate that the motif loss provided a new functional modification for these proteins that was evolutionarily beneficial. Since the presence of an eh1 motif likely mediates a functional interaction with Groucho corepressors, the loss of the motif may represent an alteration of both transcriptional activity and regulatory function for individual Fox proteins. For example, while FoxH1 proteins can function as transcriptional activators or repressors by recruitment of Smad coactivators or Groucho corepressors [36,37] (SY and DSK, unpublished), it is predicted that FoxH3 functions exclusively as an activator in association with Smad coactivators [38]. Thus, the eh1 motif may play an important role in the evolution of the Fox gene family by providing a basis for the evolutionary modification of Fox protein function.

Conclusion
The identification of eh1-like motifs in many members of the Fox gene family provides insight into the potential transcriptional activity of Fox family proteins and a foundation for studying eh1 motif function in this family. Biochemical and transcriptional studies are now needed to determine whether the identified eh1-like motifs mediate a direct physical interaction with Groucho corepressors that confers transcriptional repression activity. Building on our motif analyses, ongoing functional studies should yield a more comprehensive understanding of the evolution, domain organization, and transcriptional activity of the Fox gene family.

Manual sequence analysis
The Fox gene family is subdivided into nineteen subclasses on the basis of homology within the winged helix DNA-binding domain [15]; at the time of this study, these subclasses comprised 458 sequences. To identify eh1-like motifs, we used the eh1 consensus sequence $\mathrm{F}_{0}\,(\mathrm{S/A})_{+1}\,\Phi_{+2}\,\mathrm{X}_{+3}\,\mathrm{X}_{+4}\,\Phi_{+5}\,\Phi_{+6}\,\mathrm{X}_{+7}$ (Φ, branched hydrophobic residues; X, non-polar or charged residues; subscripts give the position relative to the phenylalanine at position 0), which was derived from published data. Yeast and metazoan Fox protein sequences in the SWISS-PROT and NCBI databases were analyzed. To identify eh1-like motifs in protein sequences of the nineteen subclasses, we performed PSI-BLAST searches of the non-redundant databases with an inclusion threshold (E-value) of 0.01, using members of each Fox subclass as queries. In parallel, the sequences of all subclasses were retrieved from the NCBI database, and multiple protein alignments were constructed for each subclass using the CLUSTAL W algorithm in the software package MacVector 7.2.2. Regions conserved within either the N-terminal or the C-terminal region in at least two species were examined for a minimum of 50% similarity to the eh1 consensus. Together, these searches identified conserved sequences matching the eh1 consensus in ten Fox subclasses.
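The final screening step, checking conserved regions for at least 50% similarity to the eh1 consensus, can be sketched as a sliding-window scan. This is an illustration only: the PSI-BLAST and alignment steps are not reproduced, "similarity" is assumed here to mean the fraction of the eight positions whose residue falls in the consensus class, and the residue sets chosen for Φ and X are one hypothetical reading of the definitions above.

```python
# Hypothetical residue classes (assumptions, not taken from the study):
PHI = set("ILV")              # branched hydrophobic residues
X = set("AVLIMFWPGDEKRH")     # one reading of "non-polar or charged"
# Allowed residues at consensus positions 0..7: F, S/A, Φ, X, X, Φ, Φ, X
EH1 = [set("F"), set("SA"), PHI, X, X, PHI, PHI, X]


def similarity(window, consensus=EH1):
    """Fraction of positions whose residue belongs to the consensus class."""
    hits = sum(1 for residue, allowed in zip(window, consensus) if residue in allowed)
    return hits / len(consensus)


def scan_eh1(sequence, min_similarity=0.5, consensus=EH1):
    """Return (start, window, similarity) for every window at or above the cutoff."""
    width = len(consensus)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = similarity(window, consensus)
        if score >= min_similarity:
            hits.append((start, window, score))
    return hits
```

For example, `scan_eh1("GGGFSIKKILKGGG")` reports the hypothetical window `FSIKKILK` at start 3 with similarity 1.0, alongside any weaker windows that pass the 50% cutoff. A 50% cutoff on an 8-residue motif is permissive, which is why the study restricted this check to regions already conserved across species.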

Expectation-maximization and hidden Markov model analyses
The expectation-maximization algorithm of the MEME program (Multiple Em for Motif Elicitation, version 3.5.4) [22,39] was used to search the 458 Fox family proteins for eh1-like motifs. The search parameters were 20-30 motifs per run and a motif width of 8-10 amino acid residues.
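The MEME search parameters above can be expressed as a command line. The sketch below only assembles the argument list, using MEME's `-protein`, `-nmotifs`, `-minw` and `-maxw` options; the input file name is hypothetical, and the option set accepted by version 3.5.4 should be checked against that release's documentation before use.

```python
def meme_command(fasta_path, n_motifs=20, min_width=8, max_width=10):
    """Assemble a MEME command line for a protein motif search.

    fasta_path is a hypothetical FASTA file of Fox protein sequences;
    n_motifs and the width bounds mirror the parameters reported in the text.
    """
    return [
        "meme", fasta_path,
        "-protein",                 # protein alphabet
        "-nmotifs", str(n_motifs),  # number of motifs to report
        "-minw", str(min_width),    # minimum motif width
        "-maxw", str(max_width),    # maximum motif width
    ]
```

Passing the list to `subprocess.run(cmd, check=True)` would execute the search; running it once with `n_motifs=20` and once with `n_motifs=30` would span the reported range.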
An eh1 motif position-specific probability matrix was generated from a set of FoxD3 protein sequences using MEME, and this matrix was used to construct a hidden Markov model of eh1-like motifs with the Meta-MEME program (Motif-based hidden Markov modeling of biological sequences, version 3.2) [24,40]. The SWISS-PROT database was searched with this FoxD3 eh1-like motif model, and sequences with an E-value below $10^{4}$ were reported.
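The core idea behind scoring a candidate window against a motif probability matrix can be illustrated with a simple ungapped log-odds profile. This is a stand-in for, not a reproduction of, Meta-MEME's hidden Markov model: it has no insert or delete states, adds a flat pseudocount, and assumes a uniform background, and the aligned sites in the usage example are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_sites, pseudocount=1.0):
    """Position-specific probability matrix from equal-length aligned sites,
    adding a flat pseudocount to every residue at every position."""
    width = len(aligned_sites[0])
    total = len(aligned_sites) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(width):
        counts = Counter(site[pos] for site in aligned_sites)
        matrix.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return matrix


def log_odds(window, matrix, background=None):
    """Sum of log2(p_motif / p_background) over the window (uniform background by default)."""
    bg = background or {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    return sum(math.log2(column[aa] / bg[aa]) for aa, column in zip(window, matrix))


def best_hit(sequence, matrix):
    """Return (score, start) of the highest-scoring window; standard residues only."""
    width = len(matrix)
    return max(
        (log_odds(sequence[i:i + width], matrix), i)
        for i in range(len(sequence) - width + 1)
    )
```

With `pssm = build_pssm(["FSIKKILK", "FSLEKVLR", "FAIKDILK"])`, calling `best_hit("GGGGFSIKKILKGGGG", pssm)` returns a positive score at start 4. A full HMM adds insert and delete states so that gapped variants of the motif can still be scored.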
Logistic regression analysis was performed to determine whether the log-odds scores from the hidden Markov model analysis were significantly associated with membership in either of two groups: all transcriptional proteins, or Fox family proteins specifically. The dependent variable is a dummy variable y, equal to 1 when the sequence is a transcriptional protein and 0 otherwise; the independent variable is the score x. The estimated logistic regression equation is $\hat{p} = \dfrac{e^{\hat{\beta}_0 + \hat{\beta}_1 x}}{1 + e^{\hat{\beta}_0 + \hat{\beta}_1 x}}$, where x is the score, $\hat{\beta}_0$ and $\hat{\beta}_1$ are the estimated intercept and slope, and $\hat{p}$ is the estimated probability that y = 1, that is, that the sequence is a transcription factor given its score.
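A logistic model of this form is usually fitted by maximum likelihood. The sketch below uses Newton-Raphson (equivalently, iteratively reweighted least squares) with a single predictor, where `b0` and `b1` are the intercept and slope of the logistic model. The text does not state which software performed the fit, and the toy scores and labels in the usage note are invented for illustration.

```python
import math


def fit_logistic(scores, labels, iterations=25):
    """Fit p(y=1 | x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    Assumes the two classes overlap in x; with perfectly separated data the
    maximum-likelihood estimate does not exist and the coefficients diverge.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) @ gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


def predict(b0, b1, x):
    """Estimated probability that y = 1 for score x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
```

For example, fitting `scores = [1, 2, 3, 4, 5, 6, 7, 8]` against `labels = [0, 0, 1, 0, 1, 0, 1, 1]` gives a positive slope, so the predicted probability rises with the score. Significance of the slope would then be assessed with a Wald or likelihood-ratio test.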

Phylogenetic analysis of Fox proteins
A phylogenetic tree for the FoxE subclass was generated from the winged helix DNA-binding domain sequences (100 residues) of FoxC, FoxD, and FoxE subclass proteins. Multiple sequence alignments were constructed using Clustal W [41], and the alignments were converted into a cladogram using MEGA 3.1 [42]. Distances were calculated with the Poisson correction, and the tree topology was constructed by the neighbor-joining method with bootstrap analysis of 1000 replicates.
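The two computational steps named above, Poisson-corrected distances and neighbor-joining, can be sketched as follows. This is an illustration rather than MEGA's implementation: gapped columns are simply skipped (MEGA offers several gap-handling options), bootstrap resampling is omitted, and the four-taxon distance matrix in the usage note is a hypothetical additive example.

```python
import math


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p), where p is the proportion of
    differing sites among aligned columns without gaps (assumed gap handling)."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    p = sum(a != b for a, b in pairs) / len(pairs)
    return -math.log(1.0 - p)


def neighbor_joining(names, dist):
    """Neighbor-joining on a symmetric dict-of-dicts distance matrix (>= 3 taxa).
    Returns an unrooted tree in Newick format with 3-decimal branch lengths."""
    d = {a: dict(dist[a]) for a in names}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all others
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing the Q criterion
        i, j = min(
            ((a, b) for k, a in enumerate(nodes) for b in nodes[k + 1:]),
            key=lambda pair: (n - 2) * d[pair[0]][pair[1]] - r[pair[0]] - r[pair[1]],
        )
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        new = f"({i}:{li:.3f},{j}:{lj:.3f})"
        d[new] = {}
        for k in nodes:
            if k not in (i, j):
                d[new][k] = d[k][new] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    # Join the last three nodes at a single internal node
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({a}:{la:.3f},{b}:{lb:.3f},{c}:{lc:.3f});"
```

On the additive distances AB = 3, AC = 5, AD = 8, BC = 6, BD = 9, CD = 5, the function recovers the generating tree `(C:1.000,D:4.000,(A:1.000,B:2.000):3.000);`. Bootstrap support would come from repeating the whole procedure on alignments resampled by column with replacement.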

Secondary structure analysis
For secondary structure prediction, the C-terminal or N-terminal domain of selected Fox proteins from each subclass was analyzed using algorithms that predict secondary structure with an accuracy of approximately 0.67-0.7. These prediction algorithms are available at the Network Protein Sequence Analysis website [43], and the source code of the combiner used to merge their predictions can be obtained on request for academic use. In addition, software written by M.L. (unpublished) was used to predict the secondary structure of Fox protein sequences. This helix prediction algorithm is based on all available high-resolution structures, and its scoring function compares the homology of query sequences to known helical structures.
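Once a segment is predicted to be helical, its amphipathicity can be summarized by the hydrophobic moment, the vector sum of residue hydrophobicities placed around a helical wheel at 100 degrees per residue. The sketch below is illustrative only: it substitutes the Kyte-Doolittle hydropathy scale because the text does not name the scale used for wheel modeling, and the test peptides are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (a substituted scale, chosen for illustration)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(peptide, scale=KYTE_DOOLITTLE, angle_deg=100.0):
    """Magnitude of the vector sum of hydrophobicities, with residue n placed at
    n * angle_deg degrees (100 degrees per residue for an alpha-helix)."""
    delta = math.radians(angle_deg)
    sin_sum = sum(scale[aa] * math.sin(n * delta) for n, aa in enumerate(peptide))
    cos_sum = sum(scale[aa] * math.cos(n * delta) for n, aa in enumerate(peptide))
    return math.hypot(sin_sum, cos_sum)


def wheel_angles(peptide, angle_deg=100.0):
    """Angular position (degrees, 0-360) of each residue on a helical-wheel projection."""
    return [(aa, (n * angle_deg) % 360) for n, aa in enumerate(peptide)]
```

A peptide such as `LKKLLKKL`, whose leucines cluster on one face of the wheel and lysines on the other, yields a much larger moment than the uniformly hydrophobic `LLLLLLLL`; a large moment relative to the mean hydrophobicity is the quantitative signature of an amphipathic helix.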

Authors' contributions
SY initiated these studies and was involved in all aspects of the design, execution and interpretation of these studies, as well as the writing of the manuscript. AV participated in the motif search and statistical analyses, and contributed to the writing of the manuscript. SS and ML contributed to the secondary structure analysis and amphipathic modeling. DSK contributed to the design and interpretation of these studies, data presentation and writing of the manuscript. All authors read and approved the final manuscript.