- Research article
- Open Access
Reduction/oxidation-phosphorylation control of DNA binding in the bZIP dimerization network
BMC Genomics volume 7, Article number: 107 (2006)
bZIPs are transcription factors that are found throughout the eukarya from fungi to flowering plants and mammals. They contain highly conserved basic region (BR) and leucine zipper (LZ) domains and often function as environmental sensors. Specifically, bZIPs frequently have a role in mediating the response to oxidative stress, a crucial environmental signal that needs to be transduced to the gene regulatory network.
Based on sequence comparisons and experimental data on a number of important bZIP transcription factors, we predict which bZIPs are under redox control and which are regulated via protein phosphorylation. By integrating genomic, phylogenetic and functional data from the literature, we then propose a link between oxidative stress and the choice of interaction partners for the bZIP proteins.
This integration permits the bZIP dimerization network to be interpreted in functional terms, especially in the context of the role of bZIP proteins in the response to environmental stress. This analysis demonstrates the importance of abiotic factors in shaping regulatory networks.
Control of gene expression at the transcriptional level is vital, and several mechanisms exist that may regulate the DNA binding of a transcription factor (TF). These include differential heterodimer formation, methylation of the DNA target site, phosphorylation in the TF DNA-binding domain (DBD), reduction/oxidation (redox) of the DBD, and the concentration of cations (particularly magnesium) in the nuclear environment. A combination of differential heterodimer formation together with the phosphorylation and the redox mechanisms may yield complex behaviours that determine the expression or inhibition of downstream targets. We are particularly interested in the complex behaviour that these three mechanisms create in the bZIPs, since these TFs are involved in cell proliferation and apoptosis.
bZIPs are eukaryotic TFs, found in fungi, plants and animals. They are named for the highly conserved basic region (BR) and leucine zipper (LZ) domains found in all these proteins. Within the bZIP domain, dimerization is mediated by the LZ region and DNA binding is mediated by the BR, which is N-terminal to the LZ. Two mechanisms that control DNA binding by bZIP dimers are phosphorylation and redox. This control is mediated by specific and conserved amino acids in the BR of the protein. Serine and cysteine residues mediate phosphorylation and redox, respectively, at position 19 of the BR domain (Figure 1). We designate these residues as S19 and C19, respectively. Crystallographic analysis of various bZIP proteins, such as JUN-FOS, GCN4, PAP1 and C/EBPα, has shown that the amino acids at this position make contacts with the DNA.
Phosphorylation of S19 in BATF and C/EBP proteins has been shown to add a negative charge to the positive BR and inhibit binding to the negatively charged DNA. Phosphorylation of only one of the two S19 residues in a heterodimer is sufficient to block DNA binding, thus having a dominant effect (Figure 2). It has also been postulated that the corresponding tyrosine residue (Y19) in ATF4 could potentially be phosphorylated.
Another mechanism of switching DNA binding on or off is the reduction/oxidation of the cysteine residue in position 19 of the BR of bZIPs (C19). Oxidation of C19 blocks DNA binding, and this mechanism has been shown to operate for the AP1 proteins. Several mechanisms have been proposed or predicted for the oxidation of C19, such as reversible formation of sulphenic acid, a disulphide bond, S-glutathiolation, or S-cysteinyl cysteinylation. Furthermore, C19 can either be protected from oxidation by the MBF1 co-activator, or it can be switched back from its oxidised to its reduced state by the ref-1 protein.
The importance of these two signalling mechanisms (redox versus phosphorylation) is stressed by the conservation of the cysteine or serine at this position. Deppmann et al. report 55% and 35% occurrence of cysteine and serine, respectively, in position 19 in an alignment of human bZIPs. Here, we provide a more detailed phylogenetic analysis using several vertebrate and invertebrate species and highlight the level of conservation of these amino acids, strongly suggesting functional conservation. In addition to conservation at the sequence level, the same cysteine residue and the redox mechanism have been demonstrated experimentally in both human and Drosophila AP1 proteins. For the phosphorylation mechanism, the importance of S19 has been shown experimentally for BATF and C/EBP proteins. In addition, mutation of the cysteine or serine in position 19 does not affect heterodimerization properties or DNA-binding recognition. This mutation appears to affect the selection of the signal (phosphorylation or oxidative stress) that turns the DNA-binding ability of the dimer on or off. Therefore, the conservation of the amino acids in position 19, and the report of functional conservation across different families, strongly indicate that the same mechanism is preserved from Drosophila to humans across many divergent evolutionary lineages.
In this work, we provide a visualization of the bZIP dimerization network and show the level of conservation of C19 and S19 residues across different phylogenetic lineages. It has yet to be established experimentally, for all bZIP proteins, whether they are under redox or phosphorylation control. However, given that all bZIP DNA-binding regions around the C19 and S19 residues have fundamentally similar properties, and extrapolating from experiments on JUN, FOS, C/EBP and BATF proteins, we predict the bZIP proteins in which the redox and phosphorylation mechanisms are utilised. Based on these predictions, we propose that the control mechanism is linked to the evolutionary history of the bZIP families. Interestingly, certain dimerization types are over-represented, while others are under-represented – suggesting a strong preference for particular interaction patterns. Specifically, C19 monomers tend to dimerize with other C19 monomers, probably in order to retain redox control, and rarely interact with other monomer types. While dimerization with other monomer types can occur, such dimers usually have a repression function, so as to avoid inappropriate gene expression.
Results and discussion
The C19 residue of the BR alignment (Figure 1) is strikingly conserved in all vertebrates and invertebrates in the ATF2, ATF3, NFE2, BACH, CREB, FOS, JUN, L-MAF and S-MAF families (for a complete alignment of the BR domains of all bZIP proteins, see Additional file 1). S19 is also strikingly conserved in the ATF6, BATF, C/EBP, C/EBP-γ, E4BP4, OASIS, OASIS-B and PAR families. Y19, which could potentially be phosphorylated, is conserved in the ATF4 family. The XBP1 family is the only family that has a serine residue in the arthropods; this has been mutated to an alanine residue in chordates, suggesting loss of phosphorylation control in the BR. Interestingly, we see the presence of serine and cysteine residues in fungal bZIPs as well as the presence of phenylalanine (F19), which is restricted to the fungal lineage (Figure 3). This phenylalanine is necessary for recognition of the yap DNA-binding motif by the Yap proteins in fungi. Nevertheless, the DNA-recognition property of this phenylalanine does not necessarily extend to the serine and cysteine residues, since they have no aromatic ring that can get stuck in the DNA. The presence of these particular serine and cysteine residues in both metazoan and fungal proteins emphasises the ancient nature of phosphorylation and redox control in bZIPs.
It appears that the highly conserved nature of these amino acids is due to functional reasons (phosphorylation, redox) and not due to size restrictions at the protein-DNA interface. While cysteine, serine and alanine are all very small amino acids, the presence of tyrosine and phenylalanine in the same position shows clearly that residue size is not a major constraint. Cysteine and serine are only one point mutation apart in the genetic code and so, if size were the only constraint, we might expect these two amino acids to occur with equal frequency at this site, even within families. Thus, the conservation of the same amino acid across all the family members, from vertebrates to invertebrates (with the exception of only the XBP1 family), strongly suggests that biochemical function is involved. Phylogenetic analysis of the BR (Figures 3 & 4) indicates that the use of phosphorylation or redox control has emerged several times, as the various bZIP families emerged in early metazoan evolution. Therefore, it is impossible to determine conclusively whether phosphorylation or redox control is the ancestral state at position 19.
The bZIP network is visualised in Figure 5. The bZIP dimerization network does not have the same statistical properties as the other protein interaction networks studied to date. Unlike them, it is not described by a power-law distribution (see Figure 5), in which the number of proteins with k interactions would decay as a power of k. We have previously described a network of similar magnitude (the bHLH dimerization network) that followed a power-law distribution and which seems to work as a multi-switch. In the case of the bHLH network, the majority of the proteins need to heterodimerize with one of the ubiquitously expressed hub proteins (e.g. E2A, ARNT) in order to form functional dimers. In contrast, in the bZIP network, a large number of proteins can form homodimers. Therefore, if the bZIP network is different, what are its purposes and functions?
Integration of protein interaction data with sequence analysis reveals the different types of dimers and how many of them are formed in humans (Table 1). DNA binding of all dimers that contain at least one S19 is probably controlled by phosphorylation. The same may also happen for dimers that contain Y19, since this tyrosine can potentially be phosphorylated. All the dimers that contain two C19 residues are probably controlled by the redox mechanism. Dimers that contain only one C19 should be deficient in redox control, according to .
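The prediction rules in this paragraph can be written down as a small decision function. The following Python sketch is only an illustration of those rules as stated in the text; the function name and the returned labels are hypothetical and do not come from the original analysis.

```python
def control_mechanism(res_a: str, res_b: str) -> str:
    """Predict how DNA binding of a dimer is controlled, given the
    position-19 residue (one-letter code) of each monomer.

    Hypothetical helper: it encodes the rules stated in the text,
    not a published implementation.
    """
    pair = {res_a.upper(), res_b.upper()}
    # Phosphorylation is dominant: a single S19 is sufficient.
    if "S" in pair:
        return "phosphorylation"
    # Y19 is only postulated to be phosphorylated, so flag it as putative.
    if "Y" in pair:
        return "phosphorylation (putative, Y19)"
    # Redox control requires a cysteine on both monomers.
    if pair == {"C"}:
        return "redox"
    # A single C19 paired with a non-phosphorylatable residue.
    if "C" in pair:
        return "deficient redox control"
    return "no position-19 control predicted"


if __name__ == "__main__":
    for a, b in [("C", "C"), ("C", "S"), ("C", "A"), ("A", "Y")]:
        print(f"{a}19-{b}19: {control_mechanism(a, b)}")
```

Checking serine before cysteine mirrors the dominance of phosphorylation: a C19-S19 dimer is reported as phosphorylation-controlled even though one cysteine is present.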
The immediate question is whether there is a pattern to the dimerization partners that is related to the phosphorylation and redox mechanisms. To test whether the distribution of different types of dimers is significantly different from random (null hypothesis), a chi-squared test was used (Table 2). According to the null hypothesis, the same number of proteins with the same frequency of serine and cysteine residues could dimerize randomly. Note that this statistical analysis must be treated with caution, since differential gene expression and protein expression could distort the null hypothesis. By using the frequency of each amino acid in the 43 human bZIPs for which interaction data are available, we looked for all the types of bZIP dimers that could potentially form if all potential pairs were co-expressed. We were particularly interested in the redox control and therefore focused on the presence of C19-bearing dimers. We observed biases, particularly an over-representation of C-C type dimers that also resulted in an under-representation of C-X dimers (dimers that contain only one C19) and vice versa (Table 2).
It is conceivable that this under-representation of C-X dimers is an artefact and this could have arisen in two different ways. Firstly, we explored the possibility that C19-bearing proteins tend to homodimerize more than the others. Secondly, we explored the possibility that C19-bearing proteins tend to dimerize more with their closest homologues, that is proteins of the same family that also have a cysteine at position 19. In order to exclude biases created by these two options, we also performed the chi-squared test for heterodimers only, and for heterodimers that are not close paralogues (Table 2). In both cases, the over-representation of C-C and under-representation of C-X dimers is statistically significant.
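The logic of the test can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the calculation behind Table 2: monomers are reduced to two classes (C19-bearing "C" and everything else "X"), the expected dimer counts are taken to be proportional to the number of possible pairs of each type, and all function names are hypothetical. The `include_homodimers` flag corresponds to repeating the test on heterodimers only.

```python
def possible_pairs(n_c: int, n_x: int, include_homodimers: bool = True) -> dict:
    """Number of distinct pairs of each type that n_c C-type and n_x X-type
    monomers could form if every pair were equally likely (assumed null model)."""
    # With homodimers: n(n+1)/2 same-class pairs; without: n(n-1)/2.
    self_term = 1 if include_homodimers else -1
    return {
        "C-C": n_c * (n_c + self_term) // 2,
        "C-X": n_c * n_x,
        "X-X": n_x * (n_x + self_term) // 2,
    }


def chi_squared(observed: dict, n_c: int, n_x: int,
                include_homodimers: bool = True) -> float:
    """Chi-squared statistic comparing observed dimer counts with the counts
    expected under random pairing (2 degrees of freedom for three classes)."""
    pairs = possible_pairs(n_c, n_x, include_homodimers)
    total_pairs = sum(pairs.values())
    total_observed = sum(observed.values())
    statistic = 0.0
    for dimer_type, n_possible in pairs.items():
        expected = total_observed * n_possible / total_pairs
        statistic += (observed[dimer_type] - expected) ** 2 / expected
    return statistic


if __name__ == "__main__":
    # Made-up counts for illustration only; they are NOT the values in Table 2.
    toy_observed = {"C-C": 6, "C-X": 2, "X-X": 2}
    print(chi_squared(toy_observed, n_c=2, n_x=2))
```

The statistic would then be compared with the chi-squared distribution for the appropriate degrees of freedom to obtain a P-value.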
In the above statistical tests, we used all the paralogues of each family. Nevertheless, most paralogues have similar dimerization patterns. If position 19 is not actually linked to the dimerization pattern, but is only responsible for family-specific DNA core-site recognition, then gene duplication (within a protein family) could have caused an artefactual connection. In order to exclude this possibility, all the paralogues of each family were collapsed into one, such that the network was then composed of interacting families. We retained the structure of the network among the families, but we shuffled the amino acids of position 19 across the various families 10,000 times. This model showed clearly that fewer than 2% of the shuffled networks gave an under-representation of C-X family dimers similar to that observed in the data (11 or fewer families forming C-X dimer types) (see also Additional file 2). Furthermore, we performed the same analysis for other positions of the BR alignment (positions 9, 13, 15, 16, 23) that are also strongly conserved within each family, but we did not observe any under/over-representation of amino acid combinations at a cut-off level of 5% (see also Additional file 2), in contrast to what we observed for position 19. It should also be stressed that the experimental evidence from Schelcher et al. shows that mutation of C19 did not affect DNA binding, DNA element recognition, or dimerization of the Zta bZIP protein.
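The shuffling procedure can be sketched as a label-permutation test. The Python below is a minimal sketch under assumptions: the family network is given as a list of edges, each family carries a single position-19 residue, and the test statistic is the number of families that take part in at least one C-X dimer. The function names, the toy network and the exact statistic are illustrative choices, not the original implementation.

```python
import random


def families_in_cx_dimers(edges, residue) -> int:
    """Count families that take part in at least one dimer with exactly one C19."""
    involved = set()
    for a, b in edges:
        if (residue[a] == "C") != (residue[b] == "C"):
            involved.update((a, b))
    return len(involved)


def shuffle_test(edges, residue, n_shuffles: int = 10_000, seed: int = 0):
    """Keep the network structure fixed, shuffle position-19 residues across
    families, and return (observed statistic, fraction of shuffled networks
    with a statistic as low as or lower than the observed one)."""
    rng = random.Random(seed)
    families = sorted(residue)
    labels = [residue[f] for f in families]
    observed = families_in_cx_dimers(edges, residue)
    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(labels)  # permute residues, not edges
        shuffled = dict(zip(families, labels))
        if families_in_cx_dimers(edges, shuffled) <= observed:
            as_extreme += 1
    return observed, as_extreme / n_shuffles


if __name__ == "__main__":
    # Toy network with placeholder family names; not the real bZIP data.
    toy_edges = [("F1", "F2"), ("F3", "F4"), ("F1", "F1")]
    toy_residue = {"F1": "C", "F2": "C", "F3": "S", "F4": "S"}
    print(shuffle_test(toy_edges, toy_residue, n_shuffles=2000))
```

Because only the residue labels are permuted, each family keeps its number of partners, so any effect of network topology is preserved in the null model.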
Interestingly, it is apparent that the C-X heterodimer type is not favoured. This can be explained by the dominant nature of phosphorylation over redox control. The presence of only one S19 residue would be sufficient to place the DNA-binding properties of the dimer under phosphorylation control. The loss or decrease of the redox mechanism in the BR is known to increase the transforming activity of the JUN-FOS heterodimer. Thus, it is presumably important to retain the redox mechanism and avoid heterodimerizing with other types (see Figures 4 & 5).
What is the function of the C-type bZIPs that actually dimerize with other types (X-types), thus forming C-X type dimers and exhibiting insufficient redox control? By examining the activation/inhibition activity of the X-type partners in general, it appears that E4BP4, p21-SNFT and BATF have an inhibitory effect when dimerizing with other factors. In the case of E4BP4, this is due to the active repression domain that it possesses. The cases of ATF4, C/EBP-β and C/EBP-γ are more complex because they can exhibit activating or inhibiting effects, depending on post-translational modification, alternative splicing, or the cell type in which they are expressed. Nevertheless, they do have the ability to function as inhibitors. It is reasonable to assume that, for C-type molecules, it is generally acceptable to escape from the redox control, as long as they dimerize with an inhibitor, or if the new heterodimer cannot recognise and bind to promoters of downstream targets that need to be controlled by the redox mechanism – thus preventing uncontrolled activation of downstream targets.
Dimerization is an important mechanism for generating complex behaviour with a limited number of protein "building blocks". Work on other dimerizing TF families, like the bHLH, has revealed a dimerization network with a hub-based structure that seems to work as a multi-switch, especially in development and the cell cycle. A very different network structure was found for the bZIP proteins, despite the fact that they share a similar crystal structure with the bHLH proteins. Interestingly, there seems to be a pattern in the formation of dimers in the bZIPs (Figures 4 & 5). These results indicate that environmental signals (and, particularly, oxidative stress) could have imposed some selective pressure on the dimerizing properties of these proteins. Alternatively, the dimerizing properties of each monomer could have imposed some pressure on the presence of cysteine or serine in position 19 of the BR. The redox mechanism has been implicated in the regulation of DNA binding in other TF proteins: p53, Sp1, NFI, NF-κB, PEBP2/CBF, the nuclear receptor proteins (oestrogen and glucocorticoid receptors) and the bHLH protein, USF (reviewed in ). When cells undergo oxidative stress, the cell cycle is affected and it seems that the redox control of cysteine residues in the DNA-binding domain of various TFs is a simple (but very efficient) mechanism of transducing environmental signals to the transcriptional machinery. In addition, oxidative stress has been implicated in the aetiology of several human diseases, like cancer, ischemia, atherosclerosis, neurodegenerative disorders and ageing (reviewed in ). It will be of great interest to further enhance our understanding of how this mechanism works and affects other dimerizing TF families, like the bHLH and nuclear receptors, determining whether this pattern is global or restricted to the bZIPs.
The integration of genomic, phylogenetic and functional data reveals a preference in the interaction partners of bZIP proteins that is linked to oxidative stress. Specifically, bZIP proteins whose DNA binding is controlled by redox tend to dimerize, more frequently than expected by chance, with other bZIP proteins that are also controlled by redox. These results demonstrate that abiotic factors may play a major role in shaping regulatory networks. While the dimerization networks of bHLH proteins and nuclear receptors are hub-based, that of the bZIP proteins is not. Nevertheless, we have demonstrated that this network is not random. It follows a logic that strongly links its structure with the network's functional role in environmental sensing.
Protein-protein interaction data for all the human bZIPs were taken from Newman and Keating. In that study, the coiled-coils of 43 out of the 51 human bZIPs that we had identified were checked for the presence of an interaction with any of the other bZIP coiled-coils. Each protein was used both as a surface-bait and as a probe. Therefore, a given heterodimerization is represented by two different symmetrical (across the diagonal) points in an interaction matrix. We considered an interaction as valid if its Z-score was greater than 2.5 in both directions, where the Z-score is a measure of the signal-to-noise ratio.
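The reciprocal-threshold rule can be illustrated with a short Python sketch. It assumes the array data are available as a mapping from (surface, probe) pairs to Z-scores; the data layout, the function name and the placeholder protein names are assumptions made for illustration, and the scores shown are invented.

```python
def valid_interactions(z_scores: dict, threshold: float = 2.5) -> set:
    """Return the unordered protein pairs whose Z-score exceeds the threshold
    in both orientations (A on the surface with B as probe, and vice versa).

    z_scores maps (surface, probe) -> Z-score; a pair measured in only one
    orientation is not accepted.
    """
    valid = set()
    for (surface, probe), score in z_scores.items():
        reciprocal = z_scores.get((probe, surface))
        if reciprocal is None:
            continue
        if score > threshold and reciprocal > threshold:
            # frozenset makes (A, B) and (B, A) the same interaction;
            # a homodimer collapses to a one-element set.
            valid.add(frozenset((surface, probe)))
    return valid


if __name__ == "__main__":
    # Invented scores for placeholder proteins P1..P3.
    toy = {
        ("P1", "P2"): 5.1, ("P2", "P1"): 3.0,   # passes in both directions
        ("P1", "P3"): 4.0, ("P3", "P1"): 1.2,   # fails in one direction
        ("P2", "P2"): 2.9,                      # homodimer
    }
    print(valid_interactions(toy))
```

Requiring both orientations to pass trades sensitivity for specificity: an interaction that is strong in only one orientation is discarded.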
bZIP sequences were obtained by genome-wide scanning using custom-made HMMs. The training of the HMM models was based on protein sequences annotated as bZIPs in the TRANSFAC database (version 4). Four vertebrate (Homo sapiens, Gallus gallus, Takifugu rubripes, Danio rerio), four invertebrate (Ciona intestinalis, Drosophila melanogaster, Apis mellifera, Anopheles gambiae) and six fungal (Schizosaccharomyces pombe, Yarrowia lipolytica, Debaryomyces hansenii, Kluyveromyces lactis, Candida glabrata, Saccharomyces cerevisiae) genomes were scanned for bZIP sequences. In addition, cnidarian bZIP sequences were retrieved, using a keyword search, from the NCBI protein database. The inclusion of a sequence as a bZIP 'hit' required the presence of both an LZ and a typical DNA-binding region, as defined by  and the 2ZIP program. This strict criterion was imposed because LZ domains may appear by chance, due to the abundance of the leucine residue and the short length of the domain.
Multiple sequence alignments were performed for family members with T-COFFEE (Notredame et al., 2000) and among different families using CLUSTALW (Thompson et al., 1994). The alignment was based on the BR and LZ domains. Phylogenetic analysis (neighbour-joining) of the BR was performed with the PHYLIP package, using the PROTDIST and NEIGHBOR programs with the JTT model of amino acid replacement. The neighbour-joining tree was visualised with TreeEdit.
Classification of bZIPs in protein families
We classified the 51 human bZIPs into 19 families, based on the neighbour-joining phylogenetic analysis of the BR-LZ domain, combined with the distribution of orthologues and the domain architecture of the whole sequence (G. D. Amoutzias, PhD Thesis, The University of Manchester, 2005). Specifically, the designations of our analysis were based on the presence of invertebrate orthologues and distinct domain architectures for each family. The new designations are: (1) the split of the OASIS family into OASIS and OASIS-B, (2) the split of CNC family into NFE2 and BACH, and (3) the split of the C/EBP family into C/EBP and C/EBP-γ.
Iguchi-Ariga SM, Schaffner W: CpG methylation of the cAMP-responsive enhancer/promoter sequence TGACGTCA abolishes specific factor binding as well as transcriptional activation. Genes Dev. 1989, 3: 612-9.
Mahoney CW, Shuman J, McKnight SL, Chen HC, Huang KP: Phosphorylation of CCAAT-enhancer binding protein by protein kinase C attenuates site-selective DNA binding. J Biol Chem. 1992, 267: 19396-403.
Abate C, Patel L, Rauscher FJ, Curran T: Redox regulation of fos and jun DNA-binding activity in vitro. Science. 1990, 249: 1157-61.
Moll JR, Acharya A, Gal J, Mir AA, Vinson C: Magnesium is required for specific DNA binding of the CREB B-ZIP domain. Nucleic Acids Res. 2002, 30: 1240-6. 10.1093/nar/30.5.1240.
Fujii Y, Shimizu T, Toda T, Yanagida M, Hakoshima T: Structural basis for the diversity of DNA recognition by bZIP transcription factors. Nat Struct Biol. 2000, 7: 889-93. 10.1038/82822.
Miller M, Shuman JD, Sebastian T, Dauter Z, Johnson PF: Structural basis for DNA recognition by the basic region leucine zipper transcription factor CCAAT/enhancer-binding protein alpha. J Biol Chem. 2003, 278: 15178-84. 10.1074/jbc.M300417200.
Deppmann CD, Thornton TM, Utama FE, Taparowsky EJ: Phosphorylation of BATF regulates DNA binding: a novel mechanism for AP-1 (activator protein-1) regulation. Biochem J. 2003, 374: 423-31. 10.1042/BJ20030455.
Xanthoudakis S, Miao G, Wang F, Pan YC, Curran T: Redox activation of Fos-Jun DNA binding activity is mediated by a DNA repair enzyme. EMBO J. 1992, 11: 3323-35.
Marshall HE, Merchant K, Stamler JS: Nitrosation and oxidation in the regulation of gene expression. FASEB J. 2000, 14: 1889-900. 10.1096/fj.00.011rev.
Klatt P, Molina EP, De Lacoba MG, Padilla CA, Martinez-Galesteo E, Barcena JA, Lamas S: Redox regulation of c-Jun DNA binding by reversible S-glutathiolation. FASEB J. 1999, 13: 1481-90.
Jindra M, Gaziova I, Uhlirova M, Okabe M, Hiromi Y, Hirose S: Coactivator MBF1 preserves the redox-dependent AP-1 activity during oxidative stress in Drosophila. EMBO J. 2004, 23: 3538-47. 10.1038/sj.emboj.7600356.
Okuno H, Akahori A, Sato H, Xanthoudakis S, Curran T, Iba H: Escape from redox regulation enhances the transforming activity of Fos. Oncogene. 1993, 8: 695-701.
Morgan IM, Havarstein LS, Wong WY, Luu P, Vogt PK: Efficient induction of fibrosarcomas by v-jun requires mutations in the DNA binding region and the transactivation domain. Oncogene. 1994, 9: 2793-7.
Trautwein C, van der Geer P, Karin M, Hunter T, Chojkier M: Protein kinase A and C site-specific phosphorylations of LAP (NF-IL6) modulate its binding affinity to DNA recognition elements. J Clin Invest. 1994, 93: 2554-61.
Fernandes L, Rodrigues-Pousada C, Struhl K: Yap, a novel family of eight bZIP proteins in Saccharomyces cerevisiae with distinct biological functions. Mol Cell Biol. 1997, 17: 6982-93.
Barabasi AL, Oltvai ZN: Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004, 5: 101-113. 10.1038/nrg1272.
Amoutzias GD, Robertson DL, Oliver SG, Bornberg-Bauer E: Convergent evolution of gene networks by single-gene duplications in higher eukaryotes. EMBO Rep. 2004, 5: 274-9. 10.1038/sj.embor.7400096.
Cinquin O, Demongeot J: High-dimensional switches and the modelling of cellular differentiation. J Theor Biol. 2005, 233: 391-411. 10.1016/j.jtbi.2004.10.027.
Newman JR, Keating AE: Comprehensive identification of human bZIP interactions with coiled-coil arrays. Science. 2003, 300: 2097-101. 10.1126/science.1084648.
Schelcher C, Valencia S, Delecluse HJ, Hicks M, Sinclair AJ: Mutation of a single amino acid residue in the basic region of the Epstein-Barr virus (EBV) lytic cycle switch protein Zta (BZLF1) prevents reactivation of EBV from latency. J Virol. 2005, 79: 13822-8. 10.1128/JVI.79.21.13822-13828.2005.
Cowell IG, Hurst HC: Transcriptional repression by the human bZIP factor E4BP4: definition of a minimal repression domain. Nucleic Acids Res. 1994, 22: 59-65.
Bower KE, Zeller RW, Wachsman W, Martinez T, McGuire KL: Correlation of transcriptional repression by p21(SNFT) with changes in DNA.NF-AT complex interactions. J Biol Chem. 2002, 277: 34967-77. 10.1074/jbc.M205048200.
Hai T, Hartman MG: The molecular biology and nomenclature of the activating transcription factor/cAMP responsive element binding family of transcription factors: activating transcription factor proteins and homeostasis. Gene. 2001, 273: 1-11. 10.1016/S0378-1119(01)00551-0.
Martinez-Jimenez CP, Gomez-Lechon MJ, Castell JV, Jover R: Transcriptional regulation of the human hepatic CYP3A4: Identification of a new distal enhancer region responsive to CCAAT/enhancer binding protein beta isoforms (LAP and LIP). Mol Pharmacol. 2005, 6: 2088-101. 10.1124/mol.104.008169.
Parkin SE, Baer M, Copeland TD, Schwartz RC, Johnson PF: Regulation of CCAAT/enhancer-binding protein (C/EBP) activator proteins by heterodimerization with C/EBPgamma (Ig/EBP). J Biol Chem. 2002, 277: 23563-72. 10.1074/jbc.M202184200.
Morel Y, Barouki R: Repression of gene expression by oxidative stress. Biochem J. 1999, 342 (Pt 3): 481-96. 10.1042/0264-6021:3420481.
Toone WM, Morgan BA, Jones N: Redox control of AP-1-like factors in yeast and beyond. Oncogene. 2001, 20: 2336-2346. 10.1038/sj.onc.1204384.
Eddy SR: Hidden Markov models. Curr Opin Struct Biol. 1996, 6: 361-365. 10.1016/S0959-440X(96)80056-X.
Matys V, Fricke E, Geffers R, Gossling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV, Kloos DU, Land S, Lewicki-Potapov B, Michael H, Munch R, Reuter I, Rotert S, Saxel H, Scheer M, Thiele S, Wingender E: TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 2003, 31: 374-378. 10.1093/nar/gkg108.
Vinson C, Myakishev M, Acharya A, Mir AA, Moll JR, Bonovich M: Classification of human B-ZIP proteins based on dimerization properties. Mol Cell Biol. 2002, 22: 6321-35. 10.1128/MCB.22.18.6321-6335.2002.
Bornberg-Bauer E, Rivals E, Vingron M: Computational approaches to identify leucine zippers. Nucleic Acids Res. 1998, 26: 2740-2746. 10.1093/nar/26.11.2740.
Brendel V, Karlin S: Too many leucine zippers?. Nature. 1989, 341: 574-5. 10.1038/341574a0.
Notredame C, Higgins DG, Heringa J: T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000, 302: 205-17. 10.1006/jmbi.2000.4042.
Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-80.
We thank Elgar Pichler (AstraZeneca) and Marc Robinson-Rechavi (UNIL) for useful comments and discussions. GDA received a CASE studentship from the EPSRC and AstraZeneca, and was also supported by an EPSRC platform grant (GR/R80810/01) to SGO. Work on protein interactions in DLR's and SGO's groups is supported by the BBSRC and DTI/Beacon. GDA gratefully acknowledges support from Dimitris and Vasiliki Amoutzias.
GDA carried out all of the bioinformatics analyses and wrote the manuscript, together with SGO. EBB, SGO and DLR conceived of the study and participated in its design and coordination. EBB and DLR revised and refined the manuscript, which all authors read and approved before submission.