- Research article
- Open Access
Investigation of factors affecting prediction of protein-protein interaction networks by phylogenetic profiling
© Karimpour-Fard et al; licensee BioMed Central Ltd. 2007
- Received: 11 May 2007
- Accepted: 29 October 2007
- Published: 29 October 2007
The use of computational methods for predicting protein interaction networks will continue to grow with the number of fully sequenced genomes available. The Co-Conservation method, also known as the Phylogenetic profiles method, is a well-established computational tool for predicting functional relationships between proteins.
Here, we examined how various aspects of this method affect the accuracy and topology of protein interaction networks. We have shown that the choice of reference genome influences the number of predictions involving proteins of previously unknown function, the accuracy of predicted interactions, and the topology of predicted interaction networks. We show that while such results are relatively insensitive to the E-value threshold used in defining homologs, predicted interactions are influenced by the similarity metric that is employed. We show that differences in predicted protein interactions are biologically meaningful, where judicious selection of reference genomes, or use of a new scoring scheme that explicitly considers reference genome relatedness, produces known protein interactions as well as predicted protein interactions involving coordinated biological processes that are not accessible using currently available databases.
These studies should prove valuable for future studies seeking to further improve phylogenetic profiling methodologies as well as for efforts to efficiently employ such methods to develop new biological insights.
- Positive Predictive Value
- Reference Genome
- Protein Interaction Network
- Target Genome
- Phylogenetic Profile
Genome sequencing projects are rapidly increasing the raw data available for predicting protein function and protein interaction networks. The best established method for function prediction is based on sequence homology to proteins of known function. Unfortunately, strictly homology-based predictions are of limited use due to the large number of homologous protein families with no known function for any single member [1–3]. An alternative method for predicting protein function is the Phylogenetic profile method, also known as the Co-Conservation method, which rests on the premise that functionally related proteins are gained or lost together over the course of evolution. This method predicts functional interactions between pairs of proteins in a target organism by determining whether both proteins are consistently present or absent across a set of reference genomes. These protein-protein interactions (PPI) are distinct from physical interactions as they capture putative functional relationships. Sequence similarity is used only to identify homologs, not to infer function. Since it was first introduced by Pellegrini et al., Phylogenetic profiling has been successfully applied to the prediction of protein function by several groups and demonstrated to be more powerful than sequence similarity alone at predicting protein function [5–11].
Currently several web-based databases compile predictions of protein-protein interactions (PPIs), e.g. PLEX, String and Prolinks. These databases either use all available bacterial genomes at the time of implementation or a select subset of bacterial genomes without focusing on how the selection of the bacteria will influence the PPIs. Several groups have attempted to address this issue, including a number of methods that account for genome phylogeny when scoring profile similarities [12, 13]. Barker et al. applied maximum likelihood statistical modeling for predicting functional protein linkages based on Phylogenetic profiling. Their method detected independent instances of the correlated gain or loss of protein pairs on phylogenetic trees, reducing the high rates of false positives observed in conventional across-species methods that do not explicitly incorporate a phylogeny. Jothi et al. evaluated 16 different reference sets of genomes, using combinations of bacterial, archaeal and eukaryotic genomes. They showed that a combination of bacterial and archaeal genomes as a reference set could be enough to make accurate functional linkage predictions. Cokus et al. found phylogenetic relationships between genomes by using the first order of the genomes within profiles and then enumerating runs of consecutive matches to compute the accuracy of the probability of observing these phylogenetic relationships. Zheng et al. constructed Phylogenetic profiles based upon the presence or absence of neighboring protein pairs within a genome. They demonstrated that the inclusion of more genomes (68 vs. 30) resulted in better performance for PPI predictions; however, they did not provide a strategy for bacteria selection. Sun et al. showed that the accuracy of PPI predictions can be improved by using a set of genomes which are maximally distinct from one another.
We have noted the same phenomenon here, but further show that selection of groups of bacteria that are closely related either phenotypically or genotypically generates biologically relevant information that is missed when other methods for grouping bacteria are employed.
It is sensible that inclusion of genomes from organisms that exist in similar environmental niches (e.g. rhizosphere bacteria), share certain phenotypic properties (e.g. motility), or that are from the same species (e.g. different strains of E. coli) might bias protein interaction network predictions in an undesirable manner. The challenge is that the extent of such biases remains uncharacterized, and thus methods for guiding the selection of relevant reference genomes are lacking. Here, we have examined such biases and then used our studies to develop a new scoring scheme to provide guidance for the selection of reference genomes in Phylogenetic profiling efforts.
Phylogenetic profiling methods work by i) creating a Phylogenetic profile vector where Pij = 1 indicates a homolog exists between protein i in the target genome and a protein in a reference genome j, ii) calculating similarity measurements on the profile vectors for each pair of genes in the target genome, and iii) defining protein interactions in the target genome based on proteins sharing a profile similarity value greater than a threshold value. Using E. coli K12 as the target genome, we have evaluated how changing different aspects of this process, including the use of a new metric for defining similarity, affect predicted protein interaction networks.
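The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`binary_profiles`, `predict_pairs`, `pearson`), the toy E-value matrix, and the similarity cutoff of 0.9 are all assumptions made for the example.

```python
import numpy as np

def binary_profiles(evalues, threshold=1e-5):
    """Step i: P[i, j] = 1 if target protein i has a hit in reference
    genome j with an E-value below the threshold, else 0."""
    return (np.asarray(evalues) < threshold).astype(int)

def pearson(x, y):
    """Pearson correlation of two profile vectors (0 if either is constant)."""
    if np.std(x) == 0 or np.std(y) == 0:
        return 0.0
    return float(np.corrcoef(x, y)[0, 1])

def predict_pairs(profiles, similarity, cutoff):
    """Steps ii and iii: score every pair of target proteins and keep
    the pairs whose profile similarity exceeds the cutoff."""
    n = profiles.shape[0]
    pairs = []
    for i in range(n):
        for k in range(i + 1, n):
            if similarity(profiles[i], profiles[k]) > cutoff:
                pairs.append((i, k))
    return pairs

# Made-up best-hit E-values: 4 target proteins x 5 reference genomes.
evalues = np.array([
    [1e-30, 1e-20, 1.0,  1e-12, 5.0],
    [1e-25, 1e-18, 2.0,  1e-10, 3.0],
    [1.0,   1.0,   1e-9, 1.0,   1e-7],
    [1e-8,  1.0,   1e-6, 1.0,   1.0],
])
P = binary_profiles(evalues)
pairs = predict_pairs(P, pearson, cutoff=0.9)
print(P)
print(pairs)
```

In this toy input only proteins 0 and 1 share an identical presence/absence pattern, so they form the single predicted pair.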
Comparison and evaluation of protein-protein interaction
The effect of reference genome selection on interactions with proteins of unknown function
Comparison of different combinations of reference genomes and E-value thresholds
Predicted protein-protein interaction pairs using different sets of reference genomes. [Table columns: No. clusters >2; No. Co-Conserved predicted pairs; No. Co-Conserved proteins. Rows listed: All (Inverse homology); High GC Gram positive; Low GC Gram positive.]
Predicted protein-protein interaction pairs using different sets of reference genomes. [Table columns: No. clusters >2; No. Co-Conserved predicted pairs; No. Co-Conserved proteins. Rows listed: All (Inverse homology); High GC Gram positive; Low GC Gram positive.]
Network topology using different reference genome
Topological analysis of networks predicted using different sets of reference genomes. [Table columns: Average clustering coefficient; No. Co-Conserved proteins. Rows listed: All (Inverse homology).]
Comparison of different scoring schemes
Selection of reference genome affects protein-protein interaction predictions
Figure 4c–d show biofilm related E. coli K12 protein interaction networks developed when either the All or the Motile reference genome set was used. When All was used as the reference genome set, proteins in this cluster had GGDEF (Gly-Gly-Asp-Glu-Phe) or EAL (Glu-Ala-Leu) domains, and the cluster included sensor proteins for the two component regulatory system (Figure 4d). This same result was observed when the Selected or Proteobacteria reference genome sets were used. Previous studies have shown that quorum sensing and two component regulatory systems are involved in biofilm formation [25, 26]. Moreover, our previous experiments have shown that many of these previously uncharacterized GGDEF-containing proteins can contribute to biofilm formation. When the reference genome set was changed to include genomes that shared a biofilm relevant phenotype (i.e. motility), the size of the cluster and the number of proteins within this cluster with different functional categories increased. The cluster still contained proteins with GGDEF or EAL domains but now included the sensors, amino acid biosynthesis proteins, and regulators that may contribute to the expression and regulation of overall biofilm phenotypes in E. coli. This result indicates that in at least some cases the reference selection can point out unique features of target organisms that would be missed had another reference genome set been selected. Moreover, this result demonstrates that the selection of reference genomes can also be used to identify Co-Conserved clusters of proteins that function in distinct pathways (regulators, cyclic-di-GMP metabolism, etc.) yet contribute to a common phenomenon (biofilms). This information is of substantial value to biological studies seeking to decipher complex phenotypes such as the biofilm phenotype examined here.
Comparison to alternative methods
The use of computational methods will continue to grow as more genomes are fully sequenced. Here we examined how differences in key aspects of the Phylogenetic profiles method affected predicted protein interaction networks. We specifically focused on aspects involving the selection of reference genomes and the measurement of similarity among protein Co-Conservation vector profiles.
Phylogenetic profile methods offer an alternative to strictly homology-based approaches. While homology-based methods can be effective for predicting the functions of remote homologs, these methods perform poorly as the evolutionary distance between homologous proteins increases. Even a sophisticated homology-based method fails to successfully assign functions to most of the proteins for a particular organism. Phylogenetic profile methods, on the other hand, are not strictly based on homology and assign function to a protein based on the context of its interactions with other proteins within a cluster. We designed a new system that utilizes different features of this method and showed that these features affect the accuracy of predictions.
Pellegrini et al. introduced Phylogenetic profiles while using 16 fully sequenced organisms. Since more genomes have become available, the choice of reference genomes to use when constructing Phylogenetic profiles has become more important. Specifically, we noted that the number of unclassified proteins varied considerably depending on the reference set of genomes. This result both verified and extended previous results based on relatedness of reference genomes. We also showed that selection of all sequenced bacteria as a reference genome set may not produce the optimal PPV since the set of fully sequenced bacteria is biased towards pathogens and laboratory species (e.g. E. coli) among others. We showed that different sets of reference genomes produce substantially different results in terms of the accuracy of predicted interactions (i.e. PPV), and that such results were relatively independent of the choice of E-value threshold employed to define homologs. This specific result demonstrates the need for flexibility in choosing among reference genomes when initiating Phylogenetic profiling based efforts for predicting protein interaction networks. One clear challenge here is that the selection of such reference genomes is not simple. Rather, this process requires knowledge of the relevant organisms, both in terms of their taxonomy and the specific phenotypes they express. To aid in this process, we introduced a new scoring scheme (Inverse homology) that considers the homology of the target and reference genomes, and thus places some emphasis on the evolutionary relationship of the relevant genomes. We showed that depending on the similarity metric used in combination with our Inverse homology scoring scheme, either the accuracy or the topology of the predicted protein interaction network was altered.
Our final studies were directed at understanding the extent to which these issues affected overall predictions and whether or not any observed differences provide new biological insights. We examined the topology of the network with various reference genomes and noted that when more closely related bacteria were selected, the clusters became larger with a higher degree of interconnectedness when compared to clusters derived from more distantly related reference genomes (Tables 1, 2, 3). In general, small and medium sized clusters tended to contain proteins of known function, in contrast to large clusters that contained proteins that function in distinct but coordinated processes. As such, Phylogenetic profiling-based approaches can benefit from flexibility in selecting and weighting of reference genomes as demonstrated here. We showed that such benefits do indeed generate unique biological insights. In particular, we showed that biofilm relevant protein interaction networks contained a broader range of relevant protein functions when reference genomes were selected based on a shared essential phenotype (motility) as compared to using all available genomes. We extended this result by comparing directly the use of our Inverse homology scoring scheme to the methods used by publicly available databases (i.e. Prolinks or String). The Inverse homology approach predicted known protein interactions in two separate biological processes (flagellum, biofilms) that were not predicted by existing methods.
Overall, we have presented an evaluation of several key criteria affecting the accuracy and topology of protein interaction networks predicted by Phylogenetic profiling methods. We have shown that the choice of reference genome is of key importance and provided guidance, both in terms of different evaluations and the report of a new similarity scoring scheme, for future efforts seeking to further improve computational methods for predicting protein interactions as well as to use such methods for developing new biological understanding.
Reference Genome Selection
At the time of our implementation (June 2006), 268 complete microbial genomes were available through the National Center for Biotechnology Information (NCBI) and were downloaded from their ftp site. Phenotypic information such as motility and oxygen requirement was compiled manually from available data on NCBI. Several different reference genome sets were used in our system: 1) Proteobacteria (130 bacteria), 2) Low G+C Gram positive bacteria (75 bacteria), 3) High G+C Gram positive bacteria (22 bacteria), 4) only one strain from those fully sequenced for each organism (Selected (75 bacteria)), 5) all the fully sequenced bacteria available on NCBI (All (268 bacteria)), 6) sets selected by oxygen requirement (Aerobic (91 bacteria), Anaerobic (31 bacteria), and Facultative (107 bacteria)), and 7) sets selected by motility (Motile (104 bacteria) and Non-Motile (82 bacteria)). In our evaluation we focused on the target genome E. coli K12 because a well curated dataset of protein functions is available and substantial experimental data exist for this bacterium.
Creating Phylogenetic profiles matrix
Proteins existing in more than 90% and less than 10% of genomes, based on different sets of reference genomes (E-value 10^-5). [Table columns: No. proteins appearing in <10%; No. proteins appearing in >90%. Rows listed: High GC Gram positive; Low GC Gram positive.]
Based on the assumption that highly conserved proteins (>90% of genomes evaluated) would be limited to a few functional categories and that poorly conserved proteins (<10% of genomes evaluated) are likely uncharacterized, we eliminated such proteins prior to measuring profile similarities (described below). To check this assumption, we characterized the discarded proteins based on COG classifications as shown in Additional file 2. The majority of proteins that appeared in more than 90% of the reference genomes were involved in translation, ribosomal structure and biogenesis, while the majority of proteins appearing in less than 10% of the reference genomes were unclassified.
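The filtering step can be illustrated with a short sketch. The function name `filter_by_conservation` and the toy matrix are hypothetical; the 10% and 90% bounds come from the text, and treating proteins exactly at a bound as retained is an assumption.

```python
import numpy as np

def filter_by_conservation(profiles, low=0.10, high=0.90):
    """Drop proteins found in fewer than `low` or more than `high`
    of the reference genomes; return the kept rows and their indices."""
    profiles = np.asarray(profiles)
    frac = (profiles > 0).mean(axis=1)      # fraction of genomes with a homolog
    keep = (frac >= low) & (frac <= high)
    return profiles[keep], np.flatnonzero(keep)

# Toy profiles over 10 reference genomes (made-up data).
profiles = np.array([
    [1] * 10,                # present everywhere  -> discarded
    [1] * 5 + [0] * 5,       # present in 50%      -> kept
    [0] * 10,                # present nowhere     -> discarded
    [1] * 8 + [0] * 2,       # present in 80%      -> kept
])
kept, kept_idx = filter_by_conservation(profiles)
print(kept_idx)
```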
Generating weighted Phylogenetic profile vectors using Inverse homology
As an alternative to binary vectors, we also developed a weighting scheme which we refer to as Inverse homology. Inverse homology weights the Phylogenetic profile vector by the homology of the target genome to each reference genome. Given an E-value threshold, the homology Hi,j between target genome i and reference genome j was calculated as the ratio of the number of homologs found in reference organism j to the number of proteins in the target genome i. For each protein i in the target, if there was a homolog in reference genome j (Eij < E-value threshold) then Pij = 1/(Hi,j), otherwise Pij = 0. Calculating the Pij in this way, rather than using binary values as originally proposed or normalizing the E-values, incorporates genome homology information, accounts for phylogenetic relationships between genomes, and improves estimates of profile similarities.
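A minimal sketch of the Inverse homology weighting follows. It reads the homology of reference genome j as the fraction of target proteins that have a homolog in j, which is one interpretation of the definition above; the function name `inverse_homology_profiles` and the toy matrix are hypothetical.

```python
import numpy as np

def inverse_homology_profiles(binary):
    """Weight each column of a binary profile matrix by 1/H[j], where
    H[j] is the fraction of target proteins with a homolog in genome j.
    H should be computed on the full target proteome (assumption)."""
    binary = np.asarray(binary, dtype=float)
    n_target = binary.shape[0]
    H = binary.sum(axis=0) / n_target
    # Genomes with no homologs at all get weight 0 rather than dividing by 0.
    weights = np.divide(1.0, H, out=np.zeros_like(H), where=H > 0)
    return binary * weights, H

# Toy matrix: 4 target proteins x 3 reference genomes (made-up data).
binary = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
])
P, H = inverse_homology_profiles(binary)
print(H)
print(P)
```

A homolog in a closely related genome (H near 1) contributes a weight near 1, while a homolog in a distant genome that shares few proteins with the target contributes a larger weight.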
Measuring profile similarities
Given a set of (weighted) Phylogenetic profiles, we can calculate the similarity between any pair of proteins using either Pearson correlation coefficient or Mutual Information. We describe each below.
Pearson correlation coefficient
I is the sum of PX,j over all reference genomes j, and J is the sum of PY,j over all j. When the vectors are binary, K is the number of genomes that contain homologs of both X and Y and N represents the total number of reference genomes. When the vectors are weighted by Inverse homology, K is the sum of 1/Hi,j over the subset of genomes that contain homologs of both X and Y and N is the sum of 1/Hi,j over all j for target i.
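For binary vectors, the ordinary Pearson coefficient can be written in terms of these quantities as r = (N·K − I·J) / sqrt(I·J·(N − I)·(N − J)). The sketch below assumes that form and checks it against NumPy's general-purpose correlation; whether the weighted variant uses exactly the same expression with the weighted I, J, K and N is an assumption not confirmed here. The function name `pearson_from_counts` is hypothetical.

```python
import math
import numpy as np

def pearson_from_counts(x, y):
    """Pearson correlation of two binary profiles from the counts
    N (genomes), I and J (genomes with X, with Y), K (genomes with both)."""
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    N = len(x)
    I = int(x.sum())
    J = int(y.sum())
    K = int((x & y).sum())
    denom = math.sqrt(I * J * (N - I) * (N - J))
    if denom == 0:
        return 0.0          # a constant profile has no defined correlation
    return (N * K - I * J) / denom

x = [1, 1, 0, 1, 0, 1]
y = [1, 0, 0, 1, 0, 1]
r_counts = pearson_from_counts(x, y)
r_numpy = float(np.corrcoef(x, y)[0, 1])
print(r_counts, r_numpy)
```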
I(X, Y) = I1(X, Y) + I2(X, Y) + I3(X, Y) + I4(X, Y).
Note: we use log2 when the vectors are binary and loge when the vectors are weighted by Inverse homology.
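As an illustration for binary profiles, mutual information can be computed as a sum of four terms, one for each joint presence/absence state of the two proteins. That the terms I1 to I4 above correspond to these four states is an assumption, and the function name `mutual_information` is hypothetical.

```python
import math
import numpy as np

def mutual_information(x, y, base=2.0):
    """Mutual information of two binary profiles, summed over the four
    joint states (1,1), (1,0), (0,1), (0,0). Empty states contribute 0."""
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    n = len(x)
    total = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.sum((x == a) & (y == b)) / n   # joint probability
            p_a = np.sum(x == a) / n                 # marginal for X
            p_b = np.sum(y == b) / n                 # marginal for Y
            if p_ab > 0:
                total += p_ab * math.log(p_ab / (p_a * p_b), base)
    return total

mi_identical = mutual_information([1, 0, 1, 0], [1, 0, 1, 0])
mi_independent = mutual_information([1, 1, 0, 0], [1, 0, 1, 0])
print(mi_identical, mi_independent)
```

Two identical, balanced profiles share one full bit of information, while two profiles whose states are statistically independent share none.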
Generating the protein-protein interaction network
Networks were created and presented as graphs in which each protein is represented as a node and an interaction between proteins is represented by an edge. An edge exists between a pair of proteins whose Phylogenetic profile similarity score exceeds a given threshold. To separate the connected components of the network and build the clusters of proteins, a breadth-first search (BFS) graph algorithm was used. The target E. coli K12 genome was analyzed, and the number of assigned pairs is shown in Table 1 and Table 2. Network graphs were visualized using Cytoscape, an open-source, platform-independent software environment. The lengths of the lines connecting proteins hold no meaning and vary to facilitate viewing of the network.
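The clustering step can be sketched as a standard breadth-first search over the predicted edges. The function name `connected_components` and the protein labels are placeholders for illustration, not predictions from this study.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Group proteins into clusters: each cluster is one connected
    component of the interaction graph, found by breadth-first search."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    clusters = []
    for start in sorted(adj):
        if start in seen:
            continue
        queue = deque([start])
        seen.add(start)
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nb in sorted(adj[node]):
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(sorted(component))
    return clusters

# Placeholder edges between hypothetical proteins p1..p5.
edges = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
clusters = connected_components(edges)
large = [c for c in clusters if len(c) > 2]   # clusters with more than 2 proteins
print(clusters, large)
```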
We examined whether changing the BLASTP threshold E-value would affect the accuracy and performance of the Co-Conservation method. For determination and optimization of the E-value for each organism, four E-values were applied to determine when a homologous protein was present or absent: 1 × 10^-15, 1 × 10^-10, 1 × 10^-5, and 1 × 10^-3. An E-value was considered optimal if it had the maximum number of correctly linked proteins, as ranked by the selected scoring scheme. A correct linkage was defined by two proteins sharing the same biological process. For evaluation purposes we used eight different types of reference genomes. Therefore, 32 combinations of different reference organism sets and E-value thresholds were formed and were evaluated using COG functional categories and EcoCyc.
Analyzing the topology of the network
The degree of a node in a graph is the number of edges connected to that node, and proteins that are joined by an edge are said to be adjacent. A neighbor of a protein i is a protein adjacent to i. The clustering coefficient C indicates the degree to which the neighbors of a particular node are connected to each other. Let ki be the number of neighbors of node i, so that at most ki(ki − 1)/2 edges can exist among those neighbors. The clustering coefficient of node i is given as
Ci = 2ni / (ki (ki − 1))
where ni is the number of edges that exist among the neighbors of i. The average clustering coefficient was then calculated by averaging Ci over all nodes i.
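A short sketch of this calculation, using the Watts and Strogatz form in which ni counts the edges among the ki neighbors of node i. Assigning a coefficient of 0 to nodes with fewer than two neighbors is an assumption, and the function names and toy graph are hypothetical.

```python
from itertools import combinations

def clustering_coefficient(adj, node):
    """C_i = 2 * n_i / (k_i * (k_i - 1)), with n_i the number of edges
    among the k_i neighbors of `node`; 0 if the node has < 2 neighbors."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    n_edges = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * n_edges / (k * (k - 1))

def average_clustering(adj):
    """Mean of C_i over all nodes in the graph."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)

# Toy graph: a triangle a-b-c with one pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
c_a = clustering_coefficient(adj, "a")
avg_c = average_clustering(adj)
print(c_a, avg_c)
```

Node a has three neighbors with only one edge among them (b to c), giving 1/3, while b and c each sit in a closed triangle and score 1.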
Comparison of predicted protein interaction to published data and available resources
In order to measure the performance and reliability of our method relative to previous methods, we compared the number of interacting proteins, the number of predicted unknown proteins and the functional similarity of proteins sharing a protein-protein interaction.
We evaluated the predicted protein-protein interaction data of E. coli K12 based on Clusters of Orthologous Groups of proteins (COG) in NCBI (e.g. for E. coli K12), biological pathway information in KEGG orthology (KO) (five broad functional categories: Metabolism, Genetic Information Processing, Environmental Information Processing, Cellular Processes, and Human Diseases), TIGR functional role categories (18 different functional role categories), and protein complexes, pathways, operons, regulators (pairs of genes A and B, where the product of gene A is a component of a transcription factor that regulates gene B), and paralogous groups in EcoCyc. EcoCyc data (June 2006) were downloaded, and we extracted all information related to protein complexes, pathways, operons, regulators, and paralogous groups.
To find out whether the selection of bacteria makes a difference and what the optimal way to select the bacteria is, we compared the PPV obtained using different types of reference genomes. Predicted pairs where at least one protein was unclassified were removed from the analysis. In addition, interactions that involved proteins classified as "General function" in the COG functional categories were considered unclassified in COG. For COG, a predicted pair was counted as a true positive (TP) when both proteins had the same functional category and as a false positive (FP) when they belonged to different functional categories. Similarly, for the EcoCyc database, proteins that appear in the same complex, pathway, operon, homolog or paralogous group were presumed to be true positives, while the other classified pairs were false positives. Sensitivity and Specificity are useful performance measures, but because a negative data set cannot be defined, we used the Positive Predictive Value (PPV). The PPV was calculated as PPV = TP/(TP+FP). Finally, we compared our results to previous works such as the String and Prolinks databases. Though String and Prolinks employ a variety of methods for predicting interactions, only those interactions based solely on Phylogenetic profiles were extracted. Prolinks makes available all interactions together with a confidence score for each interaction. We compared the top 2,600 interactions obtained by Inverse homology against the top 2,600 interactions from Prolinks by calculating the PPV using the EcoCyc database for E. coli K12 (Figure 5d). We also analyzed several clusters involving well known processes (i.e. flagellum, chemotaxis, and biofilm proteins), as described in the Results section in detail, against interactions from String and Prolinks.
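The PPV calculation for category-based evaluation can be sketched as follows. The function name `positive_predictive_value`, the protein labels and the category assignments are placeholders for illustration; only the rules (skip pairs with an unclassified or "General function" protein, same category is a TP, different categories is an FP) come from the text.

```python
def positive_predictive_value(pairs, category, unclassified=("General function",)):
    """Return (PPV, TP, FP) for predicted pairs, where PPV = TP / (TP + FP).
    Pairs with a missing or unclassified category are left out entirely."""
    tp = fp = 0
    for a, b in pairs:
        ca, cb = category.get(a), category.get(b)
        if ca is None or cb is None or ca in unclassified or cb in unclassified:
            continue                      # at least one protein is unclassified
        if ca == cb:
            tp += 1                       # same functional category
        else:
            fp += 1                       # different functional categories
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return ppv, tp, fp

# Placeholder annotations for hypothetical proteins.
category = {
    "p1": "Cell motility",
    "p2": "Cell motility",
    "p3": "Translation",
    "p4": "General function",
}
pairs = [("p1", "p2"), ("p1", "p3"), ("p2", "p4"), ("p3", "p5")]
ppv, tp, fp = positive_predictive_value(pairs, category)
print(ppv, tp, fp)
```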
We thank Norman Pace for excellent discussions and Sonia Leach for excellent discussions and reading the manuscript. This study was supported by NSF grant BES0228584 and NIH grant K25 AI064338 to RTG, and NIH grants R01-LM-008111 and R01-GM083649 to LH.
- Shah I, Hunter L: Predicting enzyme function from sequence: a systematic appraisal. Proc Int Conf Intell Syst Mol Biol. 1997, 5: 276-283.
- Rost B: Enzyme function less conserved than anticipated. J Mol Biol. 2002, 318 (2): 595-608. 10.1016/S0022-2836(02)00016-5.
- Fraser CM, Eisen JA, Salzberg SL: Microbial genome sequencing. Nature. 2000, 406 (6797): 799-803. 10.1038/35021244.
- Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO: Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci U S A. 1999, 96 (8): 4285-4288. 10.1073/pnas.96.8.4285.
- von Mering C, Huynen M, Jaeggi D, Schmidt S, Bork P, Snel B: STRING: a database of predicted functional associations between proteins. Nucleic Acids Res. 2003, 31 (1): 258-261. 10.1093/nar/gkg034.
- Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D: Detecting protein function and protein-protein interactions from genome sequences. Science. 1999, 285 (5428): 751-753. 10.1126/science.285.5428.751.
- Date SV, Marcotte EM: Protein function prediction using the Protein Link EXplorer (PLEX). Bioinformatics. 2005, 21 (10): 2558-2559. 10.1093/bioinformatics/bti313.
- Bowers PM, Pellegrini M, Thompson MJ, Fierro J, Yeates TO, Eisenberg D: Prolinks: a database of protein functional linkages derived from coevolution. Genome Biol. 2004, 5 (5): R35. 10.1186/gb-2004-5-5-r35.
- Strong M, Mallick P, Pellegrini M, Thompson MJ, Eisenberg D: Inference of protein function and protein linkages in Mycobacterium tuberculosis based on prokaryotic genome organization: a combined computational approach. Genome Biol. 2003, 4 (9): R59. 10.1186/gb-2003-4-9-r59.
- Huynen M, Snel B, Lathe W, Bork P: Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res. 2000, 10 (8): 1204-1210. 10.1101/gr.10.8.1204.
- Eisenberg D, Marcotte EM, Xenarios I, Yeates TO: Protein function in the post-genomic era. Nature. 2000, 405 (6788): 823-826. 10.1038/35015694.
- Vert JP: A tree kernel to analyse phylogenetic profiles. Bioinformatics. 2002, 18 Suppl 1: S276-84.
- Barker D, Pagel M: Predicting functional gene links from phylogenetic-statistical analyses of whole genomes. PLoS Comput Biol. 2005, 1 (1): e3. 10.1371/journal.pcbi.0010003.
- Jothi R, Przytycka TM, Aravind L: Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment. BMC Bioinformatics. 2007, 8: 173. 10.1186/1471-2105-8-173.
- Cokus S, Mizutani S, Pellegrini M: An improved method for identifying functionally linked proteins using phylogenetic profiles. BMC Bioinformatics. 2007, 8 Suppl 4: S7. 10.1186/1471-2105-8-S4-S7.
- Zheng Y, Roberts RJ, Kasif S: Genomic functional annotation using co-evolution profiles of gene clusters. Genome Biol. 2002, 3 (11): RESEARCH0060. 10.1186/gb-2002-3-11-research0060.
- Sun J, Xu J, Liu Z, Liu Q, Zhao A, Shi T, Li Y: Refined phylogenetic profiles method for predicting protein-protein interactions. Bioinformatics. 2005, 21 (16): 3409-3415. 10.1093/bioinformatics/bti532.
- Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N: The use of gene clusters to infer functional coupling. Proc Natl Acad Sci U S A. 1999, 96 (6): 2896-2901. 10.1073/pnas.96.6.2896.
- Oliver S: Guilt-by-association goes global. Nature. 2000, 403 (6770): 601-603. 10.1038/35001165.
- Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, Bumgarner R, Goodlett DR, Aebersold R, Hood L: Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science. 2001, 292 (5518): 929-934. 10.1126/science.292.5518.929.
- Dandekar T, Snel B, Huynen M, Bork P: Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci. 1998, 23 (9): 324-328. 10.1016/S0968-0004(98)01274-2.
- Karp PD, Riley M, Saier M, Paulsen IT, Collado-Vides J, Paley SM, Pellegrini-Toole A, Bonavides C, Gama-Castro S: The EcoCyc Database. Nucleic Acids Res. 2002, 30 (1): 56-58. 10.1093/nar/30.1.56.
- Watts DJ, Strogatz SH: Collective dynamics of 'small-world' networks. Nature. 1998, 393 (6684): 440-442. 10.1038/30918.
- Enault F, Suhre K, Abergel C, Poirot O, Claverie JM: Annotation of bacterial genomes using improved phylogenomic profiles. Bioinformatics. 2003, 19 Suppl 1: i105-7. 10.1093/bioinformatics/btg1013.
- Li YH, Tang N, Aspiras MB, Lau PC, Lee JH, Ellen RP, Cvitkovitch DG: A quorum-sensing signaling system essential for genetic competence in Streptococcus mutans is involved in biofilm formation. J Bacteriol. 2002, 184 (10): 2699-2708. 10.1128/JB.184.10.2699-2708.2002.
- Li YH, Lau PC, Tang N, Svensater G, Ellen RP, Cvitkovitch DG: Novel two-component regulatory system involved in biofilm formation and acid resistance in Streptococcus mutans. J Bacteriol. 2002, 184 (22): 6333-6342. 10.1128/JB.184.22.6333-6342.2002.
- Lynch MD, Warnecke T, Gill RT: SCALEs: multiscale analysis of library enrichment. Nat Methods. 2006.
- Karimpour-Fard A, Detweiler CS, Erickson KD, Hunter L, Gill RT: Cross-species cluster co-conservation: a new method for generating protein interaction networks. Genome Biol. 2007, 8 (9): R185. 10.1186/gb-2007-8-9-r185.
- Saijo-Hamano Y, Uchida N, Namba K, Oosawa K: In vitro characterization of FlgB, FlgC, FlgF, FlgG, and FliE, flagellar basal body proteins of Salmonella. J Mol Biol. 2004, 339 (2): 423-435. 10.1016/j.jmb.2004.03.070.
- Slonim N, Elemento O, Tavazoie S: Ab initio genotype-phenotype association reveals intrinsic modularity in genetic networks. Mol Syst Biol. 2006, 2: 2006 0005. 10.1038/msb4100047.
- NCBI FTP site. [ftp://ftp.ncbi.nih.gov/genomes/Bacteria/]
- NCBI Genbank Protein Annotation. [http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi]
- Essential Proteins. [http://tubic.tju.edu.cn/deg/]
- Wu J: Identification of functional links between genes using phylogenetic profiles. Bioinformatics. 2003, 19: 1524-1530. 10.1093/bioinformatics/btg187.
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13 (11): 2498-2504. 10.1101/gr.1239303.
- COGs Functional annotation. [http://www.ncbi.nlm.nih.gov/COG/old/palox.cgi?fun=all]
- KEGG orthology (KO). [http://www.genome.jp/dbget-bin/get_htext?ko00001.keg]
- TIGR. [http://cmr.tigr.org/tigr-scripts/CMR/shared/RoleList.cgi]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.