Definition of phosphomotifs
We downloaded the phosphomotifs defined in PhosphoSitePlus and PhosphoELM [Remark 1] on May 8, 2012. From these databases, we extracted phosphomotifs validated in high- and low-throughput experiments and manually combined them into a set of unique motifs. We also added known motifs described in previous studies to this dataset. In total, we obtained 178 known phosphomotifs.
Conservation of motifs in species ranging from yeasts to humans
We downloaded nine genomes from KEGG (May 8, 2012): Homo sapiens (hsa), Pan troglodytes (ptr), Mus musculus (mmu), Canis familiaris (cfa), Danio rerio (dre), Drosophila melanogaster (dme), Caenorhabditis elegans (cel), Schizosaccharomyces pombe (spo), and Saccharomyces cerevisiae (sce). The three-letter code for each genome is the KEGG species identifier. Orthologous genes among these genomes were defined by KEGG OC. For each ortholog cluster, a multiple sequence alignment was constructed with MAFFT, a freely available tool that is fast and reliable compared with other alignment tools. MAFFT was run with the “--auto” option, which automatically selects an appropriate alignment strategy. We searched all species for sequence regions that exactly matched the known phosphomotifs and investigated the conservation of these known and potential phosphorylated sites across species. In this study, potential phosphosites were defined as all serine/threonine/tyrosine residues in human proteins derived from known phosphosites stored in the databases.
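The exact-match search described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: motifs are assumed to be written as fixed-length regular expressions ('.' for any residue, '[ST]' for a choice) with a known offset for the phospho-acceptor, which is a hypothetical convention (PhosphoSitePlus and PhosphoELM use their own notations), and MAFFT output is assumed to have been read into gapped strings.

```python
import re


def find_motif_sites(seq: str, pattern: str, acceptor_offset: int) -> list[int]:
    """Return 0-based positions of acceptor residues inside exact motif matches.

    `pattern` is a fixed-length regex (hypothetical motif notation) and
    `acceptor_offset` is the index of the phospho-acceptor within each match.
    """
    # A zero-width lookahead lets overlapping motif occurrences be reported.
    regex = re.compile(f"(?=({pattern}))")
    return [m.start() + acceptor_offset for m in regex.finditer(seq)]


def ungapped_to_column(aligned: str) -> list[int]:
    """Map each residue index of the ungapped sequence to its alignment column."""
    return [col for col, ch in enumerate(aligned) if ch != "-"]


# Example: a basophilic R-x-x-S motif with the serine as acceptor (offset 3).
aligned = "MR-KASPRQ-LS"
seq = aligned.replace("-", "")
sites = find_motif_sites(seq, "R..S", 3)
columns = [ungapped_to_column(aligned)[i] for i in sites]
```

The alignment column of each human site is what would be looked up in the other species' aligned sequences when assessing conservation.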
We assumed that the evolutionary order of these species follows the generally accepted universal tree, based on evolutionary distance from humans [18, 19]. Accordingly, the most basal organisms were the two yeasts, followed in order by the nematode, fly, fish, and mammals. If the residue in an orthologous protein at the position aligned with a known phosphorylated site in the human protein was not S, T, or Y, we considered the site not conserved in that species. However, if the residue was conserved in a species evolutionarily distant from humans, the site was regarded as conserved even if it was not conserved in the intermediate species. Using the WebLogo application, we created 11-residue sequence logos centered on the known and potential phosphorylated sites of these conserved phosphorylation motifs. We also created sequence logos of the motif regions observed in each genome.
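The conservation rule above (a site conserved in a distant species counts as conserved at all closer levels) can be sketched as a "conservation depth". Grouping the three non-human mammals and the two yeasts into single levels, and counting a level as conserved when any of its species carries S, T, or Y, are assumptions of this sketch.

```python
ACCEPTORS = set("STY")

# Levels ordered from closest to most distant from humans, following the text:
# mammals, fish, fly, nematode, yeasts. The grouping into levels is assumed.
LEVELS = [("ptr", "mmu", "cfa"), ("dre",), ("dme",), ("cel",), ("spo", "sce")]


def conservation_depth(residues: dict[str, str]) -> int:
    """Index of the most distant level at which the aligned residue is S, T, or Y
    in at least one species, or -1 if it is conserved at no level.

    `residues` maps a KEGG species code to the residue aligned with the human
    phosphosite ('-' for a gap; species without an ortholog may be omitted).
    """
    depth = -1
    for i, level in enumerate(LEVELS):
        if any(residues.get(sp, "-") in ACCEPTORS for sp in level):
            depth = i
    return depth


def conserved_levels(residues: dict[str, str]) -> list[bool]:
    """Per-level flags: a site conserved at a distant level is treated as
    conserved at every closer level, even where intermediate species lost it."""
    d = conservation_depth(residues)
    return [i <= d for i in range(len(LEVELS))]


example = {"ptr": "S", "mmu": "S", "cfa": "A", "dre": "-",
           "dme": "G", "cel": "T", "spo": "-", "sce": "-"}
flags = conserved_levels(example)
```

In the example, the loss in fish and fly does not matter because the nematode retains a threonine, so the site counts as conserved down to the nematode level.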
To compare the conservation of phosphosites, we calculated the conservation rate of each motif in each species, defined as the number of occurrences of the motif conserved in that species divided by the number of its occurrences in the human genome. To examine how strongly evolutionary history affected the conservation rate of sequence motifs, we extracted the known motifs whose conservation rate changed by >50% between two evolutionarily adjacent species. The conservation rate patterns across all species were hierarchically clustered in R (http://www.r-project.org/) using Euclidean distance and Ward’s method.
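A minimal sketch of the conservation rate and of the >50% change filter between adjacent species follows. Interpreting ">50%" as an absolute change of 0.5 in the rate is an assumption; the clustering step itself was done in R and is not reproduced here.

```python
def conservation_rates(n_conserved: dict[str, int], n_human: int) -> dict[str, float]:
    """Conservation rate per species: occurrences conserved / occurrences in human."""
    return {sp: n / n_human for sp, n in n_conserved.items()}


def sharp_transitions(rates: list[float], threshold: float = 0.5) -> list[int]:
    """Indices i where the rate changes by more than `threshold` between
    adjacent species i and i + 1 (ordered by evolutionary distance from humans).

    Reading '>50%' as an absolute difference of 0.5 is an assumption.
    """
    return [i for i in range(len(rates) - 1)
            if abs(rates[i + 1] - rates[i]) > threshold]


rates = conservation_rates({"ptr": 40, "sce": 10}, 50)
jumps = sharp_transitions([1.0, 0.95, 0.9, 0.3, 0.2])
```

A motif with at least one index returned by `sharp_transitions` would be one of the motifs extracted in this step.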
We calculated the conservation rates in all species from the yeasts to humans for all known and potential phosphosites. To quantify the difference between known and potential phosphosites, we calculated a conservation index (S), defined as the sum over genomes of the difference between the conservation rate of the phosphosites in a motif (C) and the reference conservation rate (R) of the corresponding amino acid residue obtained from all human proteins:
$$S = \sum_{q \in G} \left( C_q - R_q \right)$$
where $G$ denotes the set of genomes used in the present study, $q$ is a genome in $G$, and $C_q$ and $R_q$ are the conservation rate and the reference conservation rate in $q$, respectively.
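The conservation index is a direct sum over genomes; the sketch below simply evaluates it from per-genome rates, assuming C and R have already been computed as described. The genome codes and numbers in the example are illustrative only.

```python
def conservation_index(c: dict[str, float], r: dict[str, float]) -> float:
    """S = sum over genomes q in G of (C_q - R_q).

    c[q]: conservation rate of the phosphosites in a motif in genome q.
    r[q]: reference conservation rate of the same residue type in genome q,
          computed over all human proteins.
    A positive S means the motif's sites are, on balance, more conserved
    than the background residues.
    """
    if c.keys() != r.keys():
        raise ValueError("C and R must cover the same set of genomes G")
    return sum(c[q] - r[q] for q in c)


s = conservation_index({"mmu": 0.9, "dre": 0.6, "sce": 0.2},
                       {"mmu": 0.8, "dre": 0.5, "sce": 0.1})
```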
We extracted GO annotations for the human proteins with known phosphorylated sites. Each known motif was annotated with the GO biological process terms of the proteins containing it. To clarify the functions of each motif, we performed an enrichment analysis using GoMiner. Significant GO annotations were extracted using cutoffs of FDR = 0.01 and P < 0.01.
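To illustrate what a GO enrichment test for a motif computes, the sketch below swaps in a one-sided hypergeometric test with Benjamini-Hochberg correction. GoMiner computes its own P values and FDR estimates, which may differ from this; the function names and inputs here are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background proteins; K: background proteins annotated with the GO term;
    n: proteins carrying the motif; k: motif proteins annotated with the term.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value downwards
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A GO term would then be reported for a motif when both its raw P value and its adjusted value fall below the chosen cutoffs.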
To explore associations among the proteins sharing a motif, we used protein interaction data downloaded from BioGRID (2.0.58) and STRING (v8.2). From these data, we extracted the interactions among the proteins containing each known motif. We then compared each motif network with randomly generated networks: we randomly selected the same number of human proteins as those containing the motif and extracted the interactions among them, repeating this randomization 100 times. The fold change for a motif was calculated as the number of interactions in the motif network divided by the mean number of interactions in the random networks. Networks were visualized using Cytoscape.
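The randomization and fold change can be sketched as below, assuming interactions are stored as undirected pairs (`frozenset`) and that "interactions in a motif" means interactions with both partners among the motif's proteins. The fixed random seed is added only for reproducibility.

```python
import random


def count_interactions(proteins: set[str], edges: set[frozenset[str]]) -> int:
    """Number of undirected interactions whose partners all lie in `proteins`."""
    return sum(1 for e in edges if e <= proteins)


def fold_change(motif_proteins: set[str], all_proteins: list[str],
                edges: set[frozenset[str]], n_rand: int = 100,
                seed: int = 0) -> float:
    """Observed interactions among motif proteins divided by the mean number of
    interactions among equally sized random sets of human proteins."""
    rng = random.Random(seed)
    observed = count_interactions(motif_proteins, edges)
    k = len(motif_proteins)
    random_counts = [count_interactions(set(rng.sample(all_proteins, k)), edges)
                     for _ in range(n_rand)]
    mean_random = sum(random_counts) / n_rand
    if mean_random == 0:
        return float("nan") if observed == 0 else float("inf")
    return observed / mean_random
```

With 100 randomizations, as in the text, the denominator is the mean over those 100 random protein sets.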
Zinc finger analysis
We used the HMMER programs with default parameters to extract the Pfam motifs corresponding to zinc finger motifs from the genomes, and we counted the number of zinc finger motifs in human proteins.
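Counting zinc finger hits per protein can be sketched from HMMER tabular output. This assumes `hmmscan --domtblout` was run with the proteins as queries against Pfam, so that the first column is the Pfam model name and the fourth is the protein name; treating Pfam models whose names start with "zf-" as zinc fingers is also an assumption.

```python
from collections import Counter


def count_zinc_fingers(domtbl_lines, prefix: str = "zf-") -> Counter:
    """Count zinc finger domain hits per protein from hmmscan --domtblout lines.

    Assumed layout: column 1 = Pfam model name, column 4 = query protein name.
    """
    counts = Counter()
    for line in domtbl_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        fields = line.split()
        model, protein = fields[0], fields[3]
        if model.startswith(prefix):
            counts[protein] += 1
    return counts
```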
We determined the conservation levels of the known phosphosites included in our motifs. We also manually extracted human proteins related to the spliceosome, insulin signaling, and the cytoskeleton based on the KEGG Orthology (KO) definitions and determined their conservation levels, using the conservation levels of all human proteins as a control. In addition, we used the KEGG BRITE functional categories to extract proteins belonging to spliceosome complexes A, B, and C and to the common spliceosome components, and determined their conservation levels.
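One way to compare the conservation levels of a protein set (for example, spliceosome proteins) with the all-protein control is to tabulate, for each evolutionary level, the fraction of items conserved at least that far. Here the conservation level of an item is assumed to be the index of the most distant level at which it is conserved (-1 if none); this definition is an assumption of the sketch.

```python
def fraction_conserved_to_level(depths: list[int], n_levels: int) -> list[float]:
    """Fraction of items conserved at least to each level.

    depths[i]: index of the most distant level at which item i is conserved
    (-1 if none); levels are ordered from closest to most distant from humans.
    """
    if not depths:
        raise ValueError("need at least one item")
    n = len(depths)
    return [sum(d >= lvl for d in depths) / n for lvl in range(n_levels)]


group = fraction_conserved_to_level([4, 4, 0, -1], 5)
```

Computing the same profile for all human proteins gives the control curve against which each functional group can be compared.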
Proportion of proteins shared between motifs
We calculated the proportion of proteins shared between two known motifs. For each known motif, we extracted all human proteins containing it. The proportion was defined as the number of proteins common to both motifs divided by the total number of proteins in the two motifs.
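A minimal sketch of the shared-protein proportion follows, assuming "the total number of proteins in the two motifs" means the number of distinct proteins in their union, which makes the measure a Jaccard index.

```python
def shared_proportion(proteins_a: set[str], proteins_b: set[str]) -> float:
    """Proteins common to two motifs divided by the distinct proteins carrying
    either motif (Jaccard index; reading 'total' as the union is assumed)."""
    union = proteins_a | proteins_b
    return len(proteins_a & proteins_b) / len(union) if union else 0.0
```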
Network expansion of sigmoid-type phosphomotifs
We isolated 585 proteins with phosphosites conserved from the yeasts (spo and sce) to humans and defined the interaction network of these proteins as the core signaling network. In addition, we extracted from BioGRID and STRING the interactions with proteins in the core signaling network, obtaining an additional network of 996 proteins. We then extracted the interaction network of the proteins with sigmoid-type phosphomotifs (motifs 55, 56, and 58 for worm; motifs 82, 93, and 121 for fly; and motifs 46, 135, 140, 159, and 165 for fish). As a control, we constructed random interaction networks from the same number of randomly selected proteins as those with sigmoid-type phosphomotifs in each genome, repeating this randomization 100 times. Finally, we compared the observed numbers of proteins with sigmoid-type phosphomotifs in the core signaling network and in the additional network with those obtained from the random networks.
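The comparison between observed and randomized counts can be sketched as follows. The text does not state which statistic was used; the empirical one-sided P value, (1 + number of random counts at least as large as the observed count) / (1 + number of randomizations), is an added assumption for illustration.

```python
import random


def compare_with_random(motif_proteins: set[str], all_proteins: list[str],
                        network_proteins: set[str], n_rand: int = 100,
                        seed: int = 0) -> tuple[int, float, float]:
    """Count motif proteins in a network (e.g. the core signaling network) and
    compare with equally sized random protein sets.

    Returns (observed count, mean random count, empirical one-sided P value).
    """
    rng = random.Random(seed)
    k = len(motif_proteins)
    observed = len(motif_proteins & network_proteins)
    random_counts = [len(set(rng.sample(all_proteins, k)) & network_proteins)
                     for _ in range(n_rand)]
    mean_random = sum(random_counts) / n_rand
    n_extreme = sum(c >= observed for c in random_counts)
    p_value = (1 + n_extreme) / (1 + n_rand)
    return observed, mean_random, p_value
```

The same function would be called once with the core signaling network and once with the additional network for each genome's set of sigmoid-type motif proteins.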
The cDNAs of 2×C2H2WT, 2×C2H2SN, and 2×C2H2SN were synthesized by Operon Biotechnology Inc. and subcloned into pCXN2-mCFP and/or pCXN2-mVenus, expression vectors encoding monomeric CFP and monomeric Venus (a YFP variant), respectively. The pCXN2 vector, which carries a neomycin resistance gene, is derived from pCAGGS.
The Cos7 cells used in this study were Cos7/E3, a subclone of Cos7 cells established by Y. Fukui. Cells were maintained in Dulbecco’s modified Eagle’s medium (Sigma, St Louis, MO, USA) supplemented with 10% fetal calf serum. For transient expression, cells were transfected using PolyFect (Qiagen) and analyzed 24 h after transfection.
Imaging of the C2H2 zinc finger motif in living cells
Live cell imaging was performed essentially as previously described. In brief, cells plated on a collagen-coated 35-mm-diameter glass-base dish (Asahi Techno Glass Co., Tokyo, Japan) were transfected with the C2H2 zinc finger motif expression vectors and imaged every 2 min using an Olympus IX81 inverted microscope (Olympus Optical Co., Tokyo, Japan) equipped with a cooled CCD camera (CoolSNAP HQ; Roper Scientific, Trenton, NJ) and controlled by MetaMorph software (Universal Imaging, West Chester, PA). For dual-emission ratio imaging of the m1Venus-2×C2H2 and m1CFP-2×C2H2 mutants, we used excitation filters (440AF21 for CFP and S492/18X for YFP), a dichroic mirror (86006bs), and emission filters (480AF30 for CFP and 535AF26 for YFP) (Omega Optical Inc., Brattleboro, VT). The cells were illuminated with a 75-W xenon lamp through a 12% ND filter (Olympus Optical) and visualized using a 40× oil immersion objective lens. After background subtraction, the ratio of the intensity in the nuclear region to that in the whole cell region was calculated using MetaMorph and used as a measure of the nuclear retention efficiency of the C2H2 motifs.
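The nuclear retention measure is a simple ratio; the sketch below computes it from a single-channel image and two masks to make the arithmetic explicit. The measurement itself was done in MetaMorph; using integrated (summed) intensities, a single per-pixel background value, and clipping negative background-subtracted pixels to zero are assumptions of this sketch.

```python
def nuclear_retention_ratio(image: list[list[float]],
                            nucleus_mask: list[list[bool]],
                            cell_mask: list[list[bool]],
                            background: float) -> float:
    """Background-subtracted integrated intensity in the nucleus divided by
    that in the whole cell (masks are boolean images of the same shape)."""
    def integrated(mask: list[list[bool]]) -> float:
        # Sum background-subtracted pixel values inside the mask, clipped at 0.
        return sum(max(px - background, 0.0)
                   for row, mrow in zip(image, mask)
                   for px, inside in zip(row, mrow) if inside)

    whole = integrated(cell_mask)
    return integrated(nucleus_mask) / whole if whole > 0 else float("nan")


img = [[20.0, 10.0], [30.0, 50.0]]
ratio = nuclear_retention_ratio(img, [[False, False], [True, True]],
                                [[True, True], [True, True]], background=10.0)
```

A value close to 1 indicates that nearly all of the fluorescent C2H2 construct is retained in the nucleus.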